On existence

June 29, 2018July 5, 2018 ~ squarishbracket ~ Leave a comment

Epistemic status: This is a line of thought that I’m not fully on board with, but have been taking more seriously recently. I wouldn’t be surprised if I object to all of this down the line.

The question of whether or not a given thing exists is not an empty question or a question of mere semantics. It is a question which you can get empirical evidence for, and a question whose answer affects what you expect to observe in the world.

Before explaining this further, I want to draw an analogy between ontology and causation (and my attitudes towards them).

Early in my philosophical education, my attitude towards causality was sympathetic to the Humean-style eliminativism, in which causality is a useful construct that isn’t reflected in the fundamental structure of the world. That is, I quickly ‘tossed out’ the notion of causality, comfortable to just talk about the empirical regularities governed by our laws of physics.

Later, upon encountering some statisticians that exposed me to the way that causality is actually calculated in the real world, I began to feel that I had been overly hasty. In fact, it turns out that there is a perfectly rigorous and epistemically accessible formalization of causality, and I now feel that there is no need to toss it out after all.

Here’s an easy way of thinking about this: While the slogan “Correlation does not imply causality” is certainly true, the reverse (“Causality does not imply correlation”) is trickier. In fact, whenever you have a causal relationship between variables, you do end up expecting some observable correlations. So while you cannot deductively conclude a causal relationship from a merely correlational one, you can certainly get evidence for some causal models.

This is just a peek into the world of statistical discovery of causal relationships – going further requires a lot more work. But that’s not necessary for my aim here. I just want to express the following parallel:

Rather than trying to set up a perfect set of necessary and sufficient conditions for application of the term ’cause’, we can just take a basic axiom that any account of causation must adhere to. Namely: Where there’s causation, there’s correlation.

And rather than trying to set up a perfect set of necessary and sufficient conditions for the term ‘existence’, we can just take a basic axiom that any account of existence must adhere to. Namely: If something affects the world, it exists.

This should seem trivially obvious. While there could conceivably be entities that exist without affecting anything, clearly any entity that has a causal impact on the world must exist.

The contrapositive of this axiom is that if something doesn’t exist, it does not affect the world.

Again, this is not a controversial statement. And importantly, it makes ontology amenable to scientific inquiry! Why? Because two worlds with different ontologies will have different repertoires of causes and effects. A world in which nothing exists is a world in which nothing affects anything – a dead, static region of nothingness. We can rule out this world on the basis of our basic empirical observation that stuff is happening.

This short argument attempts to show that ontology is a scientifically respectable concept, and not merely a matter of linguistic game-playing. Scientific theories implicitly assume particular ontologies by relying upon laws of nature which reference objects with causal powers. Fundamentally, evidence that reveals the impotence of these supposed causal powers serves as evidence against the ontological framework of such theories.

I think the temptation to wave off ontological questions as somehow disreputable and unscientific actually springs from the fundamentality of this concept. Ontology isn’t a minor add-on to our scientific theories done to appease the philosophers. Instead, it is built in from the ground floor. We can’t do science without implicitly making ontological assumptions. I think it’s better to make these assumptions explicit and debate about the fundamental principles by which we justify them, then it is to do it invisibly, without further analysis.

Concepts we keep and concepts we toss out

June 28, 2018 ~ squarishbracket ~ 4 Comments

Often when we think about philosophical concepts like identity, existence, possibility, and so on, we find ourselves confronted with numerous apparent paradoxes that require us to revise our initial intuitive conception. Sometimes, however, the revisions necessary to render the concept coherent end up looking so extreme as to make us prefer to just throw out the concept altogether.

An example: claims about identity are implicit in much of our reasoning (“I was thinking about this in the morning” implicitly presumes an identity between myself now and the person resembling me in my apartment this morning). But when we analyze our intuitive conception of identity, we find numerous incoherencies (e.g. through Sorites-style paradoxes in which objects retain their identity through arbitrarily small transformations, but then end up changing their identity upon the conjunction of these transformations anyway).

When faced with these incoherencies, we have a few options: first of all, we can decide to “toss out” the concept of identity (i.e. determine that the concept is too fundamentally paradoxical to be saved), or we can decide to keep it. If we keep it, then we are forced to bite some bullets (e.g. by revising the concept away from our intuitions to a more coherent neighboring concept, or by accepting the incoherencies).

In addition, keeping the concept does not mean thinking that the concept actually successfully applies to anything. For instance, one might keep the concept of free will (in that they have a well-defined personal conception of it), while denying that free will exists. This is the difference between saying “People don’t have free will, and that has consequences X, Y, and Z” and saying “I think that contradictions are so deeply embedded in the concept of free will that it’s fundamentally unsavable, and henceforth I’m not going to reason in terms of it.” I often hop back and forth between these positions, but I think they are really quite different.

One final way to describe this distinction: When faced with a statement like “X exists,” we have three choices: We can say that the statement is true, that it is false, or that it is not a proposition. This third category is what we would say about statements like “Arghleschmargle” or “Colorless green ideas sleep furiously”. While they are sentences that we can speak, they just aren’t the types of things that could be true or false. To throw out the concept of existence is to say that a statement like “X exists” is neither true nor false, and to keep it is to treat it as having a truth value.

I have a clear sense for any given concept whether or not I think it’s better to keep or toss out, and I imagine that others can do the same. Here’s a table of some common philosophical concepts and my personal response to each:

Keep
Causality
Existence
Justification
Free will
Time
Consciousness
Randomness
Meaning (of life)
Should (ethical)
Essences
Representation / Intentionality

Toss Out
Knowledge
Identity
Possibility
Objects
Forms
Purposes (in the teleological sense)
Beauty

Many of these I’m not sure about, and I imagine I could have my mind easily changed (e.g. identity, possibility, intentionality). Some I’ve even recently changed my mind about (causality, existence). And others I feel quite confident about (e.g. knowledge, randomness, justification).

I’m curious about how others’ would respond… What philosophical concepts do you lean towards keeping, and which concepts do you lean towards tossing out?

Seeing, Doing, Imagining

June 27, 2018June 27, 2018 ~ squarishbracket ~ Leave a comment

A picture from Pearl that I think nicely sums up the different levels of reasoning:

Screen Shot 2018-06-27 at 1.10.23 AM.png

Mere probability theory gives you only the first level. To get to the level of causal and counterfactual understanding, we must supplement probabilities with graphical models.

Against moral realism

June 27, 2018 ~ squarishbracket ~ 3 Comments

Here’s my primary problem with moral realism: I can’t think of any acceptable epistemic framework that would give us a way to justifiably update our beliefs in the objective truth of moral claims. I.e. I can’t think of any reasonable account of how we could have justified beliefs in objectively true moral principles.

Here’s a sketch of a plausible-seeming account of epistemology. Broad-strokes, there are two sources of justified belief: deduction and induction.

Deduction refers to the process by which we define some axioms and then see what logically follows from them. So, for instance, the axioms of Peano Arithmetic entail the theorem that 1+1=2 – or, in Peano’s language, S(0) + S(0) = S(S(0)). The central reason why reasoning by deduction is reliable is that the truths established are true by definition – they are made true by the way we have constructed our terms, and are thus true in every possible world.

Induction is scientific reasoning – it is the process of taking prior beliefs, observing evidence, and then updating these beliefs (via Bayes’ rule, for instance). The central reason why induction is reliable comes from the notion of causal entanglement. When we make an observation and update our beliefs based upon this observation, the brain state “believes X” has become causally entangled with the truth of the the statement X. So, for instance, if I observe a positive result on a pregnancy test, then my belief in the statement “I am pregnant” has become causally entangled with the truth of the statement “I am pregnant.” It is exactly this that justifies our use of induction in reasoning about the world.

Now, where do moral claims fall? They are not derived from deductive reasoning… that is, we cannot just arbitrarily define right and wrong however we like, and then derive morality from these definitions.

And they are also not truths that can be established through inductive reasoning; after all, objective moral truths are not the types of things that have any causal effects on the world.

In other words, even if there are objective moral truths, we would have no way of forming justified beliefs about this. To my mind, this is a pretty devastating situation for a moral realist. Think about it like this: a moral realist who doesn’t think that moral truths have causal power over the world must accept that all of their beliefs about morality are completely causally independent of their truth. If we imagine keeping all the descriptive truths about the world fixed, and only altering the normative truths, then none of the moral realist’s moral beliefs would change.

So how do they know that they’re in the world where their moral beliefs actually do align with the moral reality? Can they point to any reason why their moral beliefs are more likely to be true than any other moral statements? As far as I can tell, no, they can’t!

Now, you might just object to the particular epistemology I’ve offered up, and suggest some new principle by which we can become acquainted with moral truth. This is the path of many professional philosophers I have talked to.

But every attempt that I’ve heard of for doing this begs the question or resorts to just gesturing at really deeply held intuitions of objectivity. If you talk to philosophers, you’ll hear appeals to a mysterious cognitive ability to reflect on concepts and “detect their intrinsic properties”, even if these properties have no way of interacting with the world, or elaborate descriptions of the nature of “self-evident truths.”

(Which reminds me of this meme)

None of this deals with the central issue in moral epistemology, as I see it. This central issue is: How can a moral realist think that their beliefs about morality are any more likely to be true than any random choice of a moral framework?

Explanation is asymmetric

June 26, 2018July 3, 2018 ~ squarishbracket ~ Leave a comment

We all regularly reason in terms of the concept of explanation, but rarely think hard about what exactly we mean by this explanation. What constitutes a scientific explanation? In this post, I’ll point out some features of explanation that may not be immediately obvious.

Let’s start with one account of explanation that should seem intuitively plausible. This is the idea that to explain X to a person is to give that person some information I that would have allowed them to predict X.

For instance, suppose that Janae wants an explanation of why Ari is not pregnant. Once we tell Janae that Ari is a biological male, she is satisfied and feels that the lack of pregnancy has been explained. Why? Well, because had Janae known that Ari was a male, she would have been able to predict that Ari would not get pregnant.

Let’s call this the “predictive theory of explanation.” On this view, explanation and prediction go hand-in-hand. When somebody learns a fact that explains a phenomenon, they have also learned a fact that allows them to predict that phenomenon.

To spell this out very explicitly, suppose that Janae’s state of knowledge at some initial time is expressed by

K₁ = “Males cannot get pregnant.”

At this point, Janae clearly cannot conclude anything about whether Ari is pregnant. But now Janae learns a new piece of information, and her state of knowledge is updated to

K₂ = “Ari is a male & males cannot get pregnant.”

Now Janae is warranted in adding the deduction

K’ = “Ari cannot get pregnant”

This suggests that added information explains Ari’s non-pregnancy for the same reason that it allows the deduction of Ari’s non-pregnancy.

Now, let’s consider a problem with this view: the problem of relevance.

Suppose a man named John is not pregnant, and somebody explains this with the following two premises:

People who take birth control pills almost certainly don’t get pregnant.
John takes birth control pills regularly.

Now, these two premises do successfully predict that John will not get pregnant. But the fact that John takes birth control pills regularly gives no explanation at all of his lack of pregnancy. Naively applying the predictive theory of explanation gives the wrong answer here.

You might have also been suspicious of the predictive theory of explanation on the grounds that it relied on purely logical deduction and a binary conception of knowledge, not allowing us to accommodate the uncertainty inherent in scientific reasoning. We can fix this by saying something like the following:

What it is to explain X to somebody that knows K is to give them information I such that

(1) P(X | K) is small, and
(2) P(X | K, I) is large.

“Small” and “large’ here are intentionally vague; it wouldn’t make sense to draw a precise line in the probabilities.

The idea here is that explanations are good insofar as they (1) make their explanandum sufficiently likely, where (2) it would be insufficiently likely without them.

We can think of this as a correlational account of explanation. It attempts to root explanations in sufficiently strong correlations.

First of all, we can notice that this doesn’t suffer from a problem with irrelevant information. We can find relevance relationships by looking for independencies between variables. So maybe this is a good definition of scientific explanation?

Unfortunately, this “correlational account of explanation” has its own problems.

Take the following example.

uploaded image

This flagpole casts a shadow of length L because of the angle of elevation of the sun and the height of the flagpole (H). In other words, we can explain the length of the shadow with the following pieces of information:

I₁ = “The angle of elevation of the sun is θ”
I₂ = “The height of the lamp post is H”
I₃ = Details involving the rectilinear propagation of light and the formation of shadows

Both the predictive and correlational theory of explanation work fine here. If somebody wanted an explanation for why the shadow’s length is L, then telling them I₁, I₂, and I₃ would suffice. Why? Because I₁, I₂, and I₃jointly allow us to predict the shadow’s length! Easy.

X = “The length of the shadow is L.”
(I₁ & I₂ & I₃) ⇒ X
So I₁ & I₂ & I₃ explain X.

And similarly, P(X | I₁ & I₂ & I₃) is large, and P(X) is small. So on the correlational account, the information given explains X.

But now, consider the following argument:

(I₁ & I₃ & X) ⇒ I₂
So I₁ & I₃ & X explain I₂.

The predictive theory of explanation applies here. If we know the length of the shadow and the angle of elevation of the sun, we can deduce the height of the flagpole. And the correlational account tells us the same thing.

But it’s clearly wrong to say that the explanation for the height of the flagpole is the length of the shadow!

What this reveals is an asymmetry in our notion of explanation. If somebody already knows how light propagates and also knows θ, then telling them H explains L. But telling them L does not explain H!

In other words, the correlational theory of explanation fails, because correlation possesses symmetry properties that explanation does not.

This thought experiment also points the way to a more complete account of explanation. Namely, the relevant asymmetry between the length of the shadow and the height of the flagpole is one of causality. The reason why the height of the flagpole explains the shadow length but not vice versa, is that the flagpole is the cause of the shadow and not the reverse.

In other words, what this reveals to us is that scientific explanation is fundamentally about finding causes, not merely prediction or statistical correlation. This causal theory of explanation can be summarized in the following:

An explanation of A is a description of its causes that renders it intelligible.

More explicitly, an explanation of A (relative to background knowledge K) is a set of causes of A that render X intelligible to a rational agent that knows K.

Wave function entropy

June 25, 2018May 9, 2019 ~ squarishbracket ~ 4 Comments

Entropy is a feature of probability distributions, and can be taken to be a quantification of uncertainty.

Standard quantum mechanics takes its fundamental object to be the wave function – an amplitude distribution. And from an amplitude distribution Ψ you can obtain a probability distribution Ψ^*Ψ.

So it is very natural to think about the entropy of a given quantum state. For some reason, it looks like this concept of wave function entropy is not used much in physics. The quantum-mechanical version of entropy that is typically referred to is the Von-Neumann entropy, which involves uncertainty over which quantum state a system is in (rather than uncertainty intrinsic to a quantum state).

I’ve been looking into some of the implications of the concept of wave function entropy, and found a few interesting things.

Firstly, let’s just go over what precisely wave function entropy is.

Quantum mechanics is primarily concerned with calculating the wave function Ψ(x), which distributes complex amplitudes over phase space. The physical meaning of these amplitudes is interpreted by taking their absolute square Ψ^*Ψ, which is a probability distribution.

Thus, the entropy of the wave function is given by:

S = – ∫ Ψ^*Ψ ln(Ψ^*Ψ) dx

As an example, I’ll write out some of the wave functions for the basic hydrogen atom:

(Ψ^*Ψ)_1s = e^-2r / π
(Ψ^*Ψ)_2s = (2 – r)² e^-r / 32π(Ψ^*Ψ)_2p = r² e^-r cos(θ) / 32π
(Ψ^*Ψ)_3s = (2r² – 18r + 27)² e^-⅔r / 19683π

With these wave functions in hand, we can go ahead and calculate the entropies! Some of the integrals are intractable, so using numerical integration, we get:

S_1s ≈ 70
S_2s ≈ 470
S_2p ≈ 326
S_3s ≈ 1320

The increasing values for (1s, 2s, 3s) make sense – higher energy wave functions are more dispersed, meaning that there is greater uncertainty in the electron’s spatial distribution.

Let’s go into something a bit more theoretically interesting.

We’ll be interested in a generalization of entropy – relative entropy. This will quantify, rather than pure uncertainty, changes in uncertainty from a prior probability distribution ρ to our new distribution Ψ^*Ψ. This will be the quantity we’ll denote S from now on.

S = – ∫ Ψ^*Ψ ln(Ψ^*Ψ/ρ) dx

Now, suppose we’re interested in calculating the wave functions Ψ that are local maxima of entropy. This means we want to find the Ψ for which δS = 0. Of course, we also want to ensure that a few basic constraints are satisfied. Namely,

∫ Ψ^*Ψ dx = 1
∫ Ψ^*HΨ = E

These constraints are chosen by analogy with the constraints in ordinary statistical mechanics – normalization and average energy. H is the Hamiltonian operator, which corresponds to the energy observable.

We can find the critical points of entropy that satisfy the constraint by using the method of Lagrange multipliers. Our two Lagrange multipliers will be α (for normalization) and β (for energy). This gives us the following equation for Ψ:

Ψ ln(Ψ^*Ψ/ρ) + (α + 1)Ψ + βHΨ = 0

We can rewrite this as an operator equation, which gives us

ln(Ψ^*Ψ/ρ) + (α + 1) + βH = 0
Ψ^*Ψ = ρ/Z e^-βH

Here we’ve renamed our constants so that Z = e^α+1 is a normalization constant.

So we’ve solved the wave function equation… but what does this tell us? If you’re familiar with some basic quantum mechanics, our expression should look somewhat familiar to you. Let’s backtrack a few steps to see where this familiarity leads us.

Ψ ln(Ψ^*Ψ/ρ) + (α + 1)Ψ + βHΨ = 0
HΨ + 1/β ln(Ψ^*Ψ/ρ) Ψ = – (α + 1)/β Ψ

Let’s rename – (α + 1)/β to a new constant λ. And we’ll take a hint from statistical mechanics and call 1/β the temperature T of the state. Now our equation looks like

HΨ + T ln(Ψ^*Ψ/ρ) Ψ = λΨ

This equation is almost the Schrodinger equation. In particular, the Schrodinger equation pops out as the zero-temperature limit of this equation:

As T → 0,
our equation becomes…
HΨ = λΨ

The obvious interpretation of the constant λ in the zero temperature limit is E, the energy of the state.

What about in the infinite-temperature limit?

As T → ∞,
our equation becomes…
Ψ^*Ψ = ρ

Why is this? Because the only solution to the equation in this limit is for ln(Ψ^*Ψ/ρ) → 0, or in other words Ψ^*Ψ/ρ → 1

And what this means is that in the infinite temperature limit, the critical entropy wave function is just that which gives the prior distribution.

We can interpret this result as a generalization of the Schrodinger equation. Rather than a linear equation, we now have an additional logarithmic nonlinearity. I’d be interested to see how the general solutions to this equation differ from the standard equations, but that’s for another post.

HΨ + T ln(Ψ^*Ψ/ρ) Ψ = λΨ

Query sensitivity of evidential updates

June 24, 2018July 3, 2018 ~ squarishbracket ~ Leave a comment

Plausible reasoning, unlike logical deduction, is sensitive not only to the information at hand but also to the query process by which the information was obtained.

Judea Pearl, Probabilistic Reasoning in Intelligent Systems

This quote references an interesting feature of inductive reasoning that’s worth unpacking. It is indicative of the different level of complexity involved in formalizing induction than that involved in formalizing deduction.

A very simple example of this:

You spread a rumor to your neighbor N. A few days later you hear the same rumor from another neighbor N’. Should you increase your belief in the rumor now that N acknowledges it, or should you determine first whether N heard it from N’?

Clearly, if the only source of information for N’ was N, then your belief should not change. But if N’ independently confirmed the validity of the rumor, you have good reason to increase your belief in it.

In general, when you have both top-down (predictive) and bottom-up (explanatory/diagnostic) inferences in evidential reasoning, it is important to be able to trace back queries. If not, one runs the risk of engaging in circular reasoning.

So far this is all fairly obvious. Now here’s an example that’s more subtle.

Three prisoners problem

Three prisoners have been tried for murder, and their verdicts will be read tomorrow morning. Only one will be declared guilty, and the other two will be declared innocent.

Before sentencing, Prisoner A asks the guard (who knows which prisoner will be declared guilty) to do him a favor and give a letter to one of the other two prisoners who will be released (since only one person will be declared guilty, Prisoner A knows that at least one of the other two prisoners will be released). The guard does so, and later, Prisoner A asks him which of the two prisoners (B or C) he gave the letter two. The guard responds “I gave the letter to Prisoner B.”

Now Prisoner A reasons as follows:

“Previously, my chances of being executed were one in three. Now that I know that B will be released, only C and I remain as candidates for being declared guilty. So now my chances are one in two.”

Is this wrong?

Denote “A is guilty” as G_A, and “B is innocent” as I_B. Now, since G_A → I_B, we have that P(I_B | G_A) = 1. This tells us that we can write

P(G_A | I_B) = P(I_B | G_A) P(G_A) / P(I_B)
= P(G_A) / P(I_B) = ⅓ / ⅔ = ½

The problem with this argument is that we have excluded some of the context of the guard’s response, namely, that the guard could only have answered “I gave the letter to Prisoner B” or “I gave the letter to Prisoner C.” In other words, the fact “Prisoner B will be declared innocent” leads to the wrong conclusion about the credibility of A’s guilt.

Let’s instead condition on I_B’ = “Guard says that B will be declared innocent.” Now we get

P(G_A | I_B’) = P(I_B’ | G_A) P(G_A) / P(I_B’) = ½ ⋅ ⅓ / ½ = ⅓.

It’s not sufficient to just condition on what the guard said. We must consider the range of possible statements that the guard could have made.

In general, we cannot only assess the impact of propositions implied by information. We must also consider what information we could have received.

Things get clearer if we consider a similar thought experiment.

1000 prisoners problem

You are one of 1000 prisoners awaiting sentencing, with the knowledge that only one of you will be declared guilty. You come across a slip of paper from the court listing 998 prisoners; each name marked ‘innocent’. You look through all 998 names and find that your name is missing.

This should worry you greatly – your chances of being declared guilty have gone from 1 in 1000 to 1 in 2.

But imagine that you now see the query that produced the list.

Query: “Print the names of any 998 innocent right-handed prisoners.”

If you are the only left-handed prisoner, then you should thank your lucky stars. Why? Because now that you know that the query couldn’t have produced your name, the fact that it didn’t gives you no information. In other words, your chances have gone back from 1:2 to 1:1000.

In this example you can see very clearly why information about the possible outputs of a query is relevant to how we should update on the actual output of the query. We must know the process by which we attain information in order to be able to accommodate that information into our beliefs.

But now what if we don’t have this information? Suppose that you only run into the list of prisoners, and have no additional knowledge about how it was produced. Well, then we must consider all possible queries that might have produced this output!

This is no small matter.

For simplicity, let’s reconsider the simpler example with just three prisoners: Prisoners A, B, and C. Imagine that you are Prisoner A.

You come across a slip of paper from the court containing the statement I, where

I = “Prisoner B will be declared innocent”.

Now, we must assess the impact of I on the proposition G_A = “Prisoner A is guilty.”

P(G_A | I) = P(I | G_A) P(G_A) / P(I) = ⅓ P(I | G_A) / P(I)

The result of this calculation depends upon P(I | G_A), or in other words, how likely it is that the slip would declare Prisoner B innocent, given that you are guilty. This depends on the query process, and can vary anywhere from 0 to 1. Let’s just give this variable a name: P(I | G_A) = p.

We’ll also need to know two other probabilities: (1) that the slip declares B innocent given that B is guilty, and (2) that it declares B innocent given that C is guilty. We’ll assume that the slip cannot be lying (i.e. that the first of these is zero), and name the second probability q = P(I | G_C).

P(I | G_A) = p (slip could declare either B or C innocent)
P(I | G_B) = 0 (slip could declare either A or C innocent)
P(I | G_C) = q (slip could declare either A or B innocent)

Now we have

P(G_A | I) = ⅓ p / P(I)
=⅓ p / [P(I | G_A) P(G_A) + P(I | G_B) P(G_B) + P(I | G_C) P(G_C)]
= ⅓ p / (⅓ p + 0 + ⅓ q)
= p/(p + q)

How do we assess this value, given that p and q are unknown? The Bayesian solution is to treat the probabilities p and q as random variables, and specify probability distributions over their possible values: f(p) and g(q). This distribution should contain all of your prior knowledge about the plausible queries that might have produced I.

The final answer is obtained by integrating over all possible values of p and q.

P(G_A | I) = E[p/(p + q)]
= ∫ p/(p + q) f(p) g(q) dp dq

Supposing that our distributions over p and q are maximally uncertain, the final distribution we obtain is

P(G_A | I) = ∫ p/(p + q) dp dq
= 0.5

Now suppose that we know that the slip could not declare A (yourself) innocent (as we do in the original three prisoners problem). Then we know that q = 1 (since if C is guilty and A couldn’t be on the slip, B is the only possible choice). This gives us

P(G_A | I) = ∫ p/(p + 1) f(p) dp

If we are maximally uncertain about the value of p, we obtain

P(G_A | I) = ∫ p/(p + 1) dp
= 1 – ln(2)
≈ 0.30685

If, on the other hand, we are sure that the value of p is 50% (i.e., we know that in the case that A is guilty, the guard chooses randomly between B and C), we obtain

P(G_A | I) = .5/(.5 + 1) = ⅓

We’ve re-obtained our initial result! Interestingly, we can see that being maximally uncertainty about the guard’s procedure for choosing between B and C gives a different answer than knowing that the guard chooses totally randomly between B and C.

Notice that this is true even though these reflect the same expectation of what choice the guard will make!

I.e., in both cases (total uncertainty about p, and knowledge that p is exactly .5), we should have 50% credence in the guard choosing B. This gives us some insight into the importance of considering different types of uncertainty when doing induction, which is a topic for another post.

Summarizing conscious experience

June 20, 2018June 24, 2018 ~ squarishbracket ~ 1 Comment

There’s a puzzle for implementation of probabilistic reasoning in human beings. This is that the start of the reasoning process in humans is conscious experience, and it’s not totally clear how we should update on conscious experiences.

Jeffreys defined a summary of an experience E as a set B of propositions {B₁, B₂, … B_n} such that for all other propositions in your belief system A, P(A | B) = P(A | B, E).

In other words, B is a minimal set of propositions that fully screens off your experience.

This is a useful concept because summary sentences allow you to isolate everything that is epistemically relevant about conscious experience. if you have a summary B of an experience E, then you only need to know P(A | B) and P(B | E) in order to calculate P(A | E).

Notice that the summary set is subjective; it is defined only in terms of properties of your personal belief network. The set of facts that screens off E for you might be different from the set of facts that screens it off for somebody else.

Quick example.

Consider a brief impression by candlelight of a cloth held some distance away from you. Call this experience E.

Suppose that all you could decipher from E is that the cloth was around 2 meters away from you, and that it was either blue (with probability 60%) or green (with probability 40%). Then the summary set for E might be {“The cloth is blue”, “The cloth is green”, “The cloth is 2 meters away from you”, “The cloth is 3 meters away from you”, etc.}.

If this is the right summary set, then the probabilities P(“The cloth is blue”), P(“The cloth is green”) and P(“The cloth is x meters away from you”) should screen off E from the rest of your beliefs.

One trouble is that it’s not exactly obvious how to go about converting a given experience into a set of summary propositions. We could always be leaving something out. For instance, one more thing we learned upon observing E was the proposition “I can see light.” This is certainly not screened off by the other propositions so far, so we need to add it in as well.

But how do we know that we’ve gotten everything now? If we think a little more, we realize that we have also learned something about the nature of the light given off by the candle flame. We learn that it is capable of reflecting the color of light that we saw!

But now this additional consideration is related to how we interpret the color of the cloth. In other words, not only might we be missing something from our summary set, but that missing piece might be relevant to how we interpret the others.

I’d like to think more about this question: In general, how do we determine the set of propositions that screens off a given experience from the rest of your beliefs? Ultimately, to be able to coherently assess the impact of experiences on your web of beliefs, your model of reality must contain a model of yourself as an experiencer.

The nature of this model is pretty interesting from a philosophical perspective. Does it arise organically out of factual beliefs about the physical world? Well, this is what a physicalist would say. To me, it seems quite plausible that modeling yourself as a conscious experiencer would require a separate set of rules relating physical happenings to conscious experiences. How we should model this set of rules as a set of a priori hypotheses to be updated on seems very unclear to me.

Simple induction

June 17, 2018June 18, 2018 ~ squarishbracket ~ Leave a comment

In front of you is a coin. You don’t know the bias of this coin, but you have some prior probability distribution over possible biases (between 0: always tails, and 1: always heads). This distribution has some statistical properties that characterize it, such as a standard deviation and a mean. And from this prior distribution, you can predict the outcome of the next coin toss.

Now the coin is flipped and lands heads. What is your prediction for the outcome of the next toss?

This is a dead simple example of a case where there is a correct answer to how to reason inductively. It is as correct as any deductive proof, and derives a precise and unambiguous result:

Fixed

This is a law of rational thought, just as rules of logic are laws of rational thought. It’s interesting to me how the understanding of the structure of inductive reasoning begins to erode the apparent boundary between purely logical a priori reasoning and supposedly a posteriori inductive reasoning.

Anyway, here’s one simple conclusion that we can draw from the above image: After the coin lands heads, it should be more likely that the coin will land heads next time. After all, the initial credence was µ, and the final credence is µ multiplied by a value that is necessarily greater than 1.

You probably didn’t need to see an equation to guess that for each toss that lands H, future tosses landing H become more likely. But it’s nice to see the fundamental justification behind this intuition.

We can also examine some special cases. For instance, consider a uniform prior distribution (corresponding to maximum initial uncertainty about the coin bias). For this distribution (π = 1), µ = 1/2 and σ = 1/3. Thus, we arrive at the conclusion that after getting one heads, your credence in the next toss landing heads should be 13/18 (72%, up from 50%).

We can get a sense of the insufficiency of point estimates using this example. Two prior distributions with the same average value will respond very differently to evidence, and thus the final point estimate of the chance of H will differ. But what is interesting is that while the mean is insufficient, just the mean and standard deviation suffice for inferring the value of the next point estimate.

In general, the dynamics are controlled by the term σ/µ. As σ/µ goes to zero (which corresponds to a tiny standard deviation, or a very confident prior), our update goes to zero as well. And as σ/µ gets large (either by a weak prior or a low initial credence in the coin being H-biased), the observation of H causes a greater update.

How large can this term possibly get? Obviously, the updated point estimate should asymptote towards 1, but this is not obvious from the form of the equation we have (it looks like σ/µ can get arbitrarily large, forcing our final point estimate to infinity). What we need to do is optimize the updated point estimate, while taking into account the constraints implied by the relationship between σ and µ.

The North Korea problem isn’t solved

June 14, 2018 ~ squarishbracket ~ Leave a comment

Donald Trump and Kim Jong Un just met and signed a deal committing North Korea to nuclear disarmament. Yay! Problem solved!

Except that there’s a long historical precedent of North Korea signing deals just like this one, only to immediately go back on them. Here’s a timeline for some relevant historical context.

1985: North Korea signs Nuclear Non-Proliferation Treaty
1992: North Korea signs historic agreement to halt nuclear program! (#1)
1993: North Korea is found to be cheating on its commitments under the NPT
1994: In exchange for US assistance in production of proliferation-free nuclear power plants, North Korea signs historic agreement to halt nuclear program! (#2)
1998: North Korea is suspected of having an underground nuclear facility
1998: North Korea launches missile tests over Japan
1999: North Korea signs historic agreement to end missile tests, in exchange for a partial lifting of economic sanctions by the US.
2000: North Korea signs historic agreement to reunify Korea! Nobel Peace Prize is awarded
2002-2003: North Korea admits to having a secret nuclear weapons program, and withdraws from the NPT
2004: North Korea allows an unofficial US delegation to visit its nuclear facilities to display a working nuclear weapon
2005: In exchange for economic and energy assistance, North Korea signs historic agreement to halt nuclear program and denuclearize! (#3)
2006: North Korea fires seven ballistic missiles and conducts an underground nuclear test
2006: North Korea declares support for denuclearization of Korean peninsula
2006: North Korea again supports denuclearization of Korean peninsula
2007: In exchange for energy aid from the US, North Korea signs historic agreement to halt nuclear program! (#4)
2007: N&S Korea sign agreement on reunification
2009: North Korea issues a statement outlining a plan to weaponize newly separated plutonium
2010: North Korea threatens war with South Korea
2010: North Korea again announces commitment to denuclearize
2011: North Korea announces plan to halt nuclear and missile tests
2012: North Korea announces halt to nuclear program
2013: North Korea announces intentions to conduct more nuclear tests
2014: North Korea test fires 30 short-range rockets, as well as two medium missiles into the Sea of Japan
2015: North Korea offers to halt nuclear tests
2016: North Korea announces that it has detonated a hydrogen bomb
2016: North Korea again announces support for denuclearization
2017: North Korea conducts its sixth nuclear test
2018: Kim Jong Un announces that North Korea will mass produce nuclear warheads and ballistic missiles for deployment
2018: In exchange for the cancellation of US-South Korea military exercises, North Korea, once again, commits to “work toward complete denuclearization on the Korean peninsula”

Maybe this time is really, truly different. But our priors should be informed by history, and history tell us that it’s almost certainly not.