Why minimizing sum of squares is equivalent to frequentist inference

(This will be the first in a short series of posts describing how various commonly used statistical methods are approximate versions of frequentist, Bayesian, and Akaike-ian inference)

Suppose that we have some data D = { (x₁, y₁), (x₂, y₂), … , (xɴ, yɴ) }, and a candidate function y = f(x).

Frequentist inference involves the assessment of the likelihood of the data given this candidate function: P(D | f).

Since D is composed of N independent data points, we can assess the probability of each data point separately, and multiply them all together.

P(D | f) = P(x₁, y₁ | f) P(x₂, y₂ | f) … P(xɴ, yɴ | f)

So now we just need to answer the question: What is P(x, y | f)?

f predicts that for the value x, the most likely y-value is f(x).

We assume that the other possible y-values are normally distributed around f(x), with the same standard deviation σ at every x.

The equation for this distribution is a Gaussian:

P(x, y | f) = exp[ -(y – f(x))² / 2σ² ] / √(2πσ²)

Now that we know how to find P(x, y | f), we can easily calculate P(D | f)!

P(D | f) = exp[ -(y₁ – f(x₁))² / 2σ² ] / √(2πσ²) ・ exp[ -(y₂ – f(x₂))² / 2σ² ] / √(2πσ²) … exp[ -(yɴ – f(xɴ))² / 2σ² ] / √(2πσ²)
= exp[ -(y₁ – f(x₁))² / 2σ² ] ・ exp[ -(y₂ – f(x₂))² / 2σ² ] … exp[ -(yɴ – f(xɴ))² / 2σ² ] / (2πσ²)^(N/2)

Products are messy and logarithms are monotonic, so log(P(D | f)) is easier to work with: it turns the product into a sum.

log P(D | f) = log( exp[ -(y₁ – f(x₁))² / 2σ² ] … exp[ -(yɴ – f(xɴ))² / 2σ² ] / (2πσ²)^(N/2) )
= log( exp[ -(y₁ – f(x₁))² / 2σ² ] ) + … + log( exp[ -(yɴ – f(xɴ))² / 2σ² ] ) – N/2 log(2πσ²)
= -(y₁ – f(x₁))² / 2σ² – … – (yɴ – f(xɴ))² / 2σ² – N/2 log(2πσ²)
= -1/2σ² [ (y₁ – f(x₁))² + … + (yɴ – f(xɴ))² ] – N/2 log(2πσ²)

Now notice that the sum of squares just naturally pops out!

SOS = (y₁ – f(x₁))² + … + (yɴ – f(xɴ))²
log P(D | f) = -SOS/2σ² – N/2 log(2πσ²)
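
As a sanity check on the formula above, here is a minimal Python sketch that computes log P(D | f) two ways: by summing the log of each Gaussian density, and by plugging SOS into the closed form. The data points, the candidate function f(x) = x, and the value σ = 0.5 are made up for illustration.

```python
import math

def gaussian_log_pdf(y, mean, sigma):
    """Log of the Gaussian density exp[-(y - mean)^2 / 2σ^2] / sqrt(2πσ^2)."""
    return -((y - mean) ** 2) / (2 * sigma ** 2) - 0.5 * math.log(2 * math.pi * sigma ** 2)

def sum_of_squares(xs, ys, f):
    """SOS = (y1 - f(x1))^2 + ... + (yN - f(xN))^2."""
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys))

def log_likelihood_direct(xs, ys, f, sigma):
    """Sum the log-density of every data point separately."""
    return sum(gaussian_log_pdf(y, f(x), sigma) for x, y in zip(xs, ys))

def log_likelihood_closed_form(xs, ys, f, sigma):
    """-SOS/2σ^2 - N/2 log(2πσ^2), the expression derived above."""
    n = len(xs)
    return -sum_of_squares(xs, ys, f) / (2 * sigma ** 2) - (n / 2) * math.log(2 * math.pi * sigma ** 2)

# Hypothetical data and candidate function, chosen only for illustration.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.9, 2.3, 2.8]
f = lambda x: x
sigma = 0.5

direct = log_likelihood_direct(xs, ys, f, sigma)
closed = log_likelihood_closed_form(xs, ys, f, sigma)
print(direct, closed)  # the two values agree up to floating-point rounding
```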

Frequentist inference chooses f to maximize P(D | f). We can now immediately see why this is equivalent to minimizing SOS!

argmax{ P(D | f) }
= argmax{ log P(D | f) }
= argmax{ – SOS/2σ² – N/2 log(2πσ²) }
= argmin{ SOS/2σ² + N/2 log(2πσ²) }
= argmin{ SOS/2σ² }
= argmin{ SOS }
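
The chain of equalities above can also be checked numerically. The sketch below assumes a hypothetical family of candidate functions f(x) = a·x and some made-up data, then picks the best slope a from a grid in two ways: by maximizing the log-likelihood and by minimizing SOS. The last two steps of the derivation work because N/2 log(2πσ²) does not depend on f and 1/2σ² is a positive constant, so the winner should be the same for every σ.

```python
import math

def sum_of_squares(xs, ys, slope):
    """SOS for the candidate function f(x) = slope * x."""
    return sum((y - slope * x) ** 2 for x, y in zip(xs, ys))

def log_likelihood(xs, ys, slope, sigma):
    """log P(D | f) = -SOS/2σ^2 - N/2 log(2πσ^2)."""
    n = len(xs)
    return -sum_of_squares(xs, ys, slope) / (2 * sigma ** 2) - (n / 2) * math.log(2 * math.pi * sigma ** 2)

# Hypothetical data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# Candidate slopes 1.00, 1.01, ..., 3.00.
candidate_slopes = [i / 100 for i in range(100, 301)]

best_by_sos = min(candidate_slopes, key=lambda a: sum_of_squares(xs, ys, a))

# The likelihood-maximizing slope is the same for every choice of sigma.
best_by_likelihood = {
    sigma: max(candidate_slopes, key=lambda a: log_likelihood(xs, ys, a, sigma))
    for sigma in (0.5, 1.0, 2.0)
}

print(best_by_sos, best_by_likelihood)
```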

Next, we’ll go Bayesian…

History is Lamarckian

I just finished this novel, and loved every bit of it. It’s a plodding epic chronicling the colonization of Mars, and the first of a trilogy (Red Mars, Green Mars, Blue Mars) which I plan to continue.

Here’s one of my favorite exchanges, between the fiery revolution-minded anarchist Arkady and the group of more conventional thinkers among the first one hundred colonists of Mars. While I’m inclined to dismiss Arkady-types in the real world as wild-eyed idealists whose dreams are not anchored to the realities of human history, this was a passage that made me think hard, through the sheer force of its eloquence and originality.

Over a dessert of strawberries, Arkady floated up to propose a toast. “To the new world we now create!”

A chorus of groans and cheers; by now they all knew what he meant. Phyllis threw down a strawberry and said, “Look, Arkady, this settlement is a scientific station. Your ideas are irrelevant to it. Maybe in fifty or a hundred years. But for now, it’s going to be like the stations in Antarctica.”

“That’s true,” Arkady said. “But in fact Antarctic stations are very political. Most of them were built so that countries that built them would have a say in the revision of the Antarctic treaty. And now the stations are governed by laws set by that treaty, which was made by a very political process! So you see, you cannot just stick your head in sand crying ‘I am a scientist, I am a scientist!’ ” He put a hand to his forehead, in the universal mocking gesture of the prima donna. “No. When you say that, you are only saying, ‘I do not wish to think about complex systems!’ Which is not really worthy of true scientists, is it?”

“The Antarctic is governed by a treaty because no one lives there except in scientific stations,” Maya said irritably. To have their final dinner, their last moment of freedom, disrupted like this!

“True,” Arkady said. “But think of the result. In Antarctica, no one can own land. No one country or organization can exploit the continent’s natural resources, without the consent of every other country. No one can claim to own those resources, or take them and sell them to other people, so that some profit from them while others pay for their use. Don’t you see how radically different that is from the way the rest of the world is run? And this is the last area on Earth to be organized, to be given a set of laws. It represents what all governments working together feel instinctively is fair, revealed on land free from claims of sovereignty, or really from any history at all. It is, to say it plainly, Earth’s best attempt to create just property laws! Do you see? This is the way entire world should be run, if only we could free it from the straitjacket of history!”

Sax Russell, blinking mildly, said, “But Arkady, since Mars is going to be ruled by a treaty based on the old Antarctic one, what are you objecting to? The Outer Space Treaty states that no country can claim land on Mars, no military activities are allowed, and all bases are open to inspection by any country. Also no martian resources can become the property of a single nation —the UN is supposed to establish an international regime to govern any mining or other exploitation. If anything is ever done along that line, which I doubt will happen, then it is to be shared among all the nations of the world.” He turned a palm upward. “Isn’t that what you’re agitating for, already achieved?”

“It’s a start,” Arkady said. “But there are aspects of that treaty you haven’t mentioned. Bases built on Mars will belong to the countries that build them, for instance. We will be building American and Russian bases, according to this provision of the law. And that puts us right back into the nightmare of Terran law and Terran history. American and Russian businesses will have the right to exploit Mars, as long as the profits are somehow shared by all the nations signing the treaty. This may only involve some sort of percentage paid to UN, in effect no more than bribe. I don’t believe we should acknowledge these provisions for even a moment!”

Silence followed this remark.

Ann Clayborne said, “This treaty also says we have to take measures to prevent the disruption of planetary environments, I think is how they put it. It’s in Article Seven. That seems to me to expressly forbid the terraforming that so many of you are talking about.”

“I would say that we should ignore that provision as well,” Arkady said quickly. “Our own well-being depends on ignoring it.”

This view was more popular than his others, and several people said so.

“But if you’re willing to disregard one article,” Arkady pointed out, “you should be willing to disregard the rest. Right?”

There was an uncomfortable pause.

“All these changes will happen inevitably,” Sax Russell said with a shrug. “Being on Mars will change us in an evolutionary way.”

Arkady shook his head vehemently, causing him to spin a little in the air over the table. “No, no, no, no! History is not evolution! It is a false analogy! Evolution is a matter of environment and chance, acting over millions of years. But history is a matter of environment and choice, acting within lifetimes, and sometimes within years, or months, or days! History is Lamarckian! So that if we choose to establish certain institutions on Mars, there they will be! And if we choose others, there they will be!” A wave of his hand encompassed them all, the people seated at the tables, the people floating among the vines: “I say we should make those choices ourselves, rather than having them made for us by people back on Earth. By people long dead, really.”

Phyllis said sharply, “You want some kind of communal utopia, and it’s not possible. I should think Russian history would have taught you something about that.”

“It has,” Arkady said. “Now I put to use what it has taught me.”

“Advocating an ill-defined revolution? Fomenting a crisis situation? Getting everyone upset and at odds with each other?”

A lot of people nodded at this, but Arkady waved them away. “I decline to accept blame for everyone’s problems at this point in the trip. I have only said what I think, which is my right. If I make some of you uncomfortable, that is your problem. It is because you don’t like the implications of what I say, but can’t find grounds to deny them.”

“Some of us can’t understand what you say,” Mary exclaimed.

“I say only this!” Arkady said, staring at her bug-eyed: “We have come to Mars for good. We are going to make not only our homes and our food, but also our water and the very air we breathe—all on a planet that has none of these things. We can do this because we have technology to manipulate matter right down to the molecular level. This is an extraordinary ability, think of it! And yet some of us here can accept transforming the entire physical reality of this planet, without doing a single thing to change our selves, or the way we live. To be twenty-first century scientists on Mars, in fact, but at the same time living within nineteenth century social systems, based on seventeenth century ideologies. It’s absurd, it’s crazy, it’s—it’s—” he seized his head in his hands, tugged at his hair, roared “It’s unscientific! And so I say that among all the many things we transform on Mars, ourselves and our social reality should be among them. We must terraform not only Mars, but ourselves.”

History is Lamarckian, in exactly the sense declared by Arkady. But this of course does not imply that the social systems we build are not subject to the same forces of selection that have caused the downfall of so many past societies.

***

Anyway, I highly recommend this book, and to give you a flavor, here are a few more of my favorite quotes, presented with zero context…

“We were too old!”

“We were not too old. We chose not to think of it. Most ignorance is by choice, you know, and so ignorance is very telling about what really matters to people.”

“Come on,” he said. He propped himself up on an elbow to look at her. “You really don’t know what beauty is, do you?”

“I certainly do,” Nadia said mulishly.

Arkady ignored her and said, “Beauty is power and elegance, right action, form fitting function, intelligence, and reasonability.”

“We didn’t mean to be selfish,” Hiroko said slowly. “We wanted to try it, to show by experiment how we can live here. Someone has to show what you mean when you talk about a different life, John Boone. Someone has to live the life.”

Sax Russell rose to his feet. He looked the same as ever, perhaps a bit more flushed than usual, but mild, small, blinking owlishly, his voice calm and dry, as if lecturing on some textbook point of thermodynamics, or enumerating the periodic table.

“The beauty of Mars exists in the human mind,” he said in that dry factual tone, and everyone stared at him amazed. “Without the human presence it is just a concatenation of atoms, no different than any other random speck of matter in the universe. It’s we who understand it, and we who give it meaning. All our centuries of looking up at the night sky and watching it wander through the stars. All those nights of watching it through the telescopes, looking at a tiny disk trying to see canals in the albedo changes. All those dumb sci-fi novels with their monsters and maidens and dying civilizations. And all the scientists who studied the data, or got us here. That’s what makes Mars beautiful. Not the basalt and the oxides.”

He paused to look around at them all. Nadia gulped; it was strange in the extreme to hear these words come out of the mouth of Sax Russell, in the same dry tone that he would use to analyze a graph. Too strange!

“Now that we are here,” he went on, “it isn’t enough to just hide under ten meters of soil and study the rock. That’s science, yes, and needed science too. But science is more than that. Science is part of a larger human enterprise, and that enterprise includes going to the stars, adapting to other planets, adapting them to us. Science is creation. The lack of life here, and the lack of any finding in fifty years of the SETI program, indicates that life is rare, and intelligent life even rarer. And yet the whole meaning of the universe, its beauty, is contained in the consciousness of intelligent life. We are the consciousness of the universe, and our job is to spread that around, to go look at things, to live everywhere we can. It’s too dangerous to keep the consciousness of the universe on only one planet, it could be wiped out. And so now we’re on two, three if you count the moon. And we can change this one to make it safer to live on. Changing it won’t destroy it. Reading its past might get harder, but the beauty of it won’t go away. If there are lakes, or forests, or glaciers, how does that diminish Mars’s beauty? I don’t think it does. I think it only enhances it. It adds life, the most beautiful system of all. But nothing life can do will bring Tharsis down, or fill Marineris. Mars will always remain Mars, different from Earth, colder and wilder. But it can be Mars and ours at the same time. And it will be. There is this about the human mind; if it can be done, it will be done. We can transform Mars and build it like you would build a cathedral, as a monument to humanity and the universe both. We can do it, so we will do it. So—” he held up a palm, as if satisfied that the analysis had been supported by the data in the graph – as if he had examined the periodic table, and found that it still held true – “we might as well start.”

Short and sweet proof of the f(xy) = f(x) + f(y) logarithmic property

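If you want a continuous function f(x) from the positive reals to the reals that has the property that for all positive x and y, f(xy) = f(x) + f(y), then this function must take the form f(x) = k log(x) for some real k. (The domain has to exclude zero: setting x = 0 gives f(0) = f(0) + f(y), which forces f to be zero everywhere.) The proof below additionally assumes that f is twice differentiable.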

A proof of this just popped into my head in the shower. (As always with shower-proofs, it was slightly wrong, but I worked it out and got it right after coming out).

I haven’t seen it anywhere before, and it’s a lot simpler than previous proofs that I’ve encountered.

Here goes:

f(xy) = f(x) + f(y)

differentiate w.r.t. x…
f'(xy) y = f'(x)

differentiate w.r.t. y…
f''(xy) xy + f'(xy) = 0

rearrange, and rename xy to z…
f''(z) = -f'(z)/z

solve for f'(z) with standard 1st order DE techniques…
df'/f' = – dz/z
log(f') = -log(z) + constant
f' = constant/z

integrate to get f…
f(z) = k log(z) for some constant k

And that’s the whole proof!
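
A quick numerical check of the result, as a minimal Python sketch. The helper `satisfies_functional_equation` and the constant k = 2.5 are made up for illustration. The check confirms that k log(x) obeys f(xy) = f(x) + f(y) on random positive inputs, and that another continuous function (the square root) does not. This is not a proof, only a spot check.

```python
import math
import random

def satisfies_functional_equation(g, trials=1000, seed=0):
    """Return True if g(x*y) == g(x) + g(y) (up to rounding) on random positive x, y."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(0.1, 10.0)
        y = rng.uniform(0.1, 10.0)
        if not math.isclose(g(x * y), g(x) + g(y), rel_tol=1e-9, abs_tol=1e-9):
            return False
    return True

def log_candidate(x, k=2.5):
    """f(x) = k log(x), with an arbitrary illustrative constant k."""
    return k * math.log(x)

print(satisfies_functional_equation(log_candidate))  # the logarithm passes
print(satisfies_functional_equation(math.sqrt))      # the square root does not
```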

As for why this is interesting to me… the equation f(xy) = f(x) + f(y) is very easy to arrive at in constructing functions with desirable features. In words, it means that you want the function’s outputs to be additive when the inputs are multiplicative.

One example of this, which I’ve written about before, is formally quantifying our intuitive notion of surprise. We formalize surprise by asking the question: How surprised should you be if you observe an event that you thought had a probability P? In other words, we treat surprise as a function that takes in a probability and returns a scalar value.

We can lay down a few intuitive desiderata for our formalization of surprise, and one such desideratum is that for independent events E and F, our surprise at them both happening should just be the sum of the surprise at each one individually. In other words, we want surprise to be additive for independent events E and F.

But if E and F are independent, then the joint probability P(E, F) is just the product of the individual probabilities: P(E, F) = P(E) P(F). In other words, we want our outputs to be additive, when our inputs are multiplicative!

This automatically gives us that the form of our surprise function must be k log(z). To spell it out explicitly…

Desideratum: Surprise(P(E, F)) = Surprise(P(E)) + Surprise(P(F))

But P(E,F) = P(E) P(F), so…
Surprise(P(E) P(F)) = Surprise(P(E)) + Surprise(P(F))

Renaming P(E) to x and P(F) to y…
Surprise(xy) = Surprise(x) + Surprise(y)

Thus, by the above proof…
Surprise(x) = k log(x) for some constant k
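
To make this concrete, here is a minimal Python sketch of a surprise function. The derivation leaves k free; choosing k = -1/log(2) is an assumption made here so that surprise comes out non-negative and is measured in bits. The probabilities 0.5 and 0.25 are made up for illustration.

```python
import math

# Assumed choice of constant: k = -1/log(2) gives non-negative surprise in bits.
K_BITS = -1.0 / math.log(2)

def surprise(p, k=K_BITS):
    """Surprise(p) = k log(p) for a probability p in (0, 1]."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be a probability in (0, 1]")
    return k * math.log(p)

# Two hypothetical independent events.
p_e = 0.5
p_f = 0.25
p_joint = p_e * p_f  # independence: P(E, F) = P(E) P(F)

# Roughly 1 bit, 2 bits, and 3 bits respectively (up to floating-point rounding).
print(surprise(p_e), surprise(p_f), surprise(p_joint))
```

Surprise at the joint event matches the sum of the individual surprises, which is exactly the desideratum we started from.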

That’s basically why I find this interesting: it’s a strong constraint that comes out of an intuitively weak condition.

Constructing the world

In this six-and-a-half-hour lecture series, David Chalmers describes the concept of a minimal set of statements from which all other truths are a priori “scrutable” (meaning, basically, in-principle knowable or derivable).

What are the types of statements in this minimal set required to construct the world? Chalmers offers up four categories, and abbreviates this theory PQIT.

P

P is the set of physical facts (for instance, everything that would be accessible to a Laplacean demon). It can be thought of as essentially the initial conditions of the universe and the laws governing their changes over time.

Q

Q is the set of facts about qualitative experience. We can see Chalmers’ rejection of physicalism here, as he doesn’t think that Q is subsumed within P. An example of a type of statement that cannot be derived from P without Q: “There is a beige region in the bottom right of my visual field.”

I

Here’s a true statement: “I’m in the United States.” Could this be derivable from P and Q? Presumably not; we need another set of indexical truths that allows us to have “self-locating” beliefs and to engage in anthropic reasoning.

T

Suppose that P, Q, and I really are able to capture all the true statements there are to be captured. Well then, the statement “P, Q, and I really are able to capture all the true statements there are to be captured” is a true statement, and it is presumably not captured by P, Q, and I! In other words, we need some final negative statements that tell us that what we have is enough, and that there are no more truths out there. These “that’s all”-type statements are put into the set T.

⁂⁂⁂

So this is a basic sketch of Chalmers’ construction. I like that we can use these tags like PQIT or PT or QIT as a sort of philosophical zip-code indicating the core features of a person’s philosophical worldview. I also want to think about developing this further. What other possible types of statements are there out there that may not be captured in PQIT? Here is a suggestion for a more complete taxonomy:

p    microphysics
P    macrophysics (by which I mean all of science besides fundamental physics)
Q    consciousness
R    normative rationality
E    normative ethics
C    counterfactuals
L    mathematical / logical truths
I    indexicals
T    “that’s-all” statements

I’ve split P into big-P (macrophysics) and little-p (microphysics) to account for the disagreements about emergence and reductionism. Normativity here is broad enough to include both normative epistemic statements (e.g. “You should increase your credence in the next coin toss landing H after observing it land H one hundred times in a row”) and ethical statements. The others are fairly self-explanatory.

The most ontologically extravagant philosophical worldview would then be characterized as pPQRECLIT.

My philosophical address is pRLIT (with the addendum that I think C comes from p, and am really confused about Q). What’s yours?

Moving Naturalism Forward: Eliminating the macroscopic

Sean Carroll, one of my favorite physicists and armchair philosophers, hosted a fantastic conference on philosophical naturalism and science, and did the world a great favor by recording the whole thing and posting it online. It was a three-day-long discussion on topics like the nature of reality, emergence, morality, free will, meaning, and consciousness. Here are the videos for the first two discussion sections, and the rest can be found by following YouTube links.

Having watched through the entire thing, I have updated a few of my beliefs, plan to rework some of my conceptual schema, and am puzzled about a few things.

A few of my reflections and take-aways:

  1. I am much more convinced than before that there is a good case to be made for compatibilism about free will.
  2. I think there is a set of interesting and challenging issues around the concept of representation and intentionality (about-ness) that I need to look into.
  3. I am more comfortable with strong reductionist claims, like “All facts about the macroscopic world are entailed by the fundamental laws of physics.”
  4. I am really interested in hearing Dan Dennett talk more about grounding morality, because what he said was starting to make a lot of sense to me.
  5. I am confused about the majority attitude in the room that there’s not any really serious reason to take an eliminativist stance about macroscopic objects.
  6. I want to find more details about the argument that Simon DeDeo was making for the undecidability of questions about the relationship between macroscopic theories and microscopic theories (!!!).
  7. There’s a good way to express the distinction between the type of design human architects engage in and the type of design that natural selection produces, which is about foresight and representations of reasons. I’m not going to say more about this, and will just refer you to the videos.
  8. There are reasons to suspect that animal intelligence and capacity to suffer are inversely correlated (that is, the more intelligent an animal, the less capacity to suffer it likely has). This really flips some of our moral judgements on their head. (You must deliver a painful electric shock to either a human or to a bird. Which one will you choose?)

Let me say a little more about number 5.

I think that questions about whether macroscopic objects like chairs or plants really REALLY exist, or whether there are really only just fermions and bosons are ultimately just questions about how we should use the word “exist.” In the language of our common sense intuitions, obviously chairs exist, and if you claim otherwise, you’re just playing complicated semantic games. I get this argument, and I don’t want to be that person that clings to bizarre philosophical theses that rest on a strange choice of definitions.

But at the same time, I see a deep problem with relying on our commonsense intuitions about the existence of the macro world. This is that as soon as we start optimizing for consistency, even a teeny tiny bit, these macroscopic concepts fall to pieces.

For example, here is a trilemma (three statements that can’t all be correct):

  1. The thing I am sitting on is a chair.
  2. If you subtract a single atom from a chair, it is still a chair.
  3. Empty space is not a chair.

These seem to me to be some of the most obvious things we could say about chairs. And yet they are subtly incoherent!

Number 1 is really shorthand for something like “there are chairs.” And the reason why the second premise is correct is that denying it requires that there be a chair such that if you remove a single atom, it is no longer a chair. I take it to be obvious that such things don’t exist. But accepting the first two requires us to admit that as we keep shedding atoms from a chair, it stays a chair, even down to the very last atom, and even once that last atom is gone. At that point we are calling empty space a chair, which contradicts number 3. (By the way, some philosophers do actually deny number 2. They take a stance called epistemicism, which says that concepts like “chair” and “heap” are actually precise and unambiguous, and there exists a precise point at which a chair becomes a non-chair. This is the type of thing that makes me giggle nervously when reflecting on the adequacy of philosophy as a field.)

As I’ve pointed out in the past, these kinds of arguments can be applied to basically everything in the macroscopic world. They wreak havoc on our common sense intuitions and, to my mind, demand rejection of the entire macroscopic world. And of course, they don’t apply to the microscopic world. “If X is an electron, and you change its electric charge a tiny bit, is it still an electron?” No! Electrons are physical substances with precise and well-defined properties, and if something doesn’t have these properties, it is not an electron! So the Standard Model is safe from this class of arguments.

Anyway, this is all just to make the case that upon close examination, our commonsense intuitions about the macroscopic world turn out to be subtly incoherent. What this means is that we can’t make true statements like “There are two cars in the garage”. Why? Just start removing atoms from the cars until you get to a completely empty garage. Since no single-atom change can make the relevant difference to “car-ness”, at each stage, you’ll still have two cars!

As soon as you start taking these macroscopic concepts seriously, you find yourself stuck in a ditch. This, to me, is an incredibly powerful argument for eliminativism, and I was surprised to find that arguments like these weren’t stressed at the conference. This makes me wonder if this argument is as powerful as I think.

Some social justice factoids

Starting on a brief personal note…

I’m a bit disappointed with myself for being absent from this blog for the past few weeks. In a Reddit AMA last week, my favorite blogger said that the limiting factor on his productivity is the amount of time he has in a day. That, to me, is an ideal I wish I could always live up to. The limiting factor on my productivity is almost always my mental capacity to avoid the infinite potential sources of short-term gratification, and to motivate myself to do the things that I get deeper and more long-lived satisfaction out of. Writing this blog is one of those things. My capacity to enforce mental discipline is fairly well correlated with my overall state of mind and mood. I think you could probably track my mental health fairly reliably just by looking at how often I’m posting here!

I’m also disappointed because I have been thinking about a great many interesting things that deserve posts. I like the idea of using this blog as a faithful recording of my intellectual life, and having discontinuities doesn’t help with this. Much of what I’ve been thinking about over the past few weeks is related to meta-ethics, but it also goes more broadly into the nature of philosophy in general. I hope to write up some posts on these soon.

In the meantime, I’ve also been compiling some interesting factoids I’ve recently encountered related to social justice. Here they are, with sources!

Race

  • Bias against blacks in the justice system can be found in sentencing and in arrests for drug use, but not in arrest rates for violent crimes, police shootings, prosecution rates, or conviction rates. Source.
  • Juries in the Deep South were commonly all-white up until the 1986 case Batson v. Kentucky (which closed loopholes that allowed the exclusion of blacks from juries). (from Just Mercy, p. 60)
  • Black Americans graduate from high school at nearly the same rate as white Americans (92.3% vs 95.6%). Source.
    • In 1968, these numbers were 54.4% and 75%.
    • Percentage of college graduates age 25 to 29: 22.8% and 42.1% (a gap of 19.3 percentage points).
  • White adults who don’t graduate high school, don’t get married before having children, and don’t work full time have much greater median wealth than comparable black and Latino adults. Source.
    • Consumption habits can’t explain the wealth gap: white households spend more than black households of comparable incomes.
    • The median white single parent has 2.2 times more wealth than the median black two-parent household and 1.9 times more wealth than the median Latino two-parent household.
  • Poverty rates among African Americans have declined substantially: 34.7% in 1968 to 21.4% in 2016. Source.
    • Among whites: 10% in 1968 to 8.8% in 2016.
  • Great table showing the change in socioeconomic circumstances of blacks and whites in the US from 1968 to 2018: (Source)  
    • Most strikingly in that table… Median household wealth is 10 times higher for white Americans than black Americans (but it used to be 20 times higher).

Gender

  • There is a 7% unexplained wage gap between men and women in the US. Source.
    • Controlling for college major selection, occupational segregation, hours worked, unionization, education, race, ethnicity, age, and marital status.
  • Female leaders are evaluated slightly more negatively than equivalent male leaders (controlling for leadership style). Source.
    • The discrepancy is more pronounced for autocratic leadership styles, and vanishes for democratic leadership styles.
  • Most anthropologists hold that there are no known societies that are/were unambiguously matriarchal. Source.
  • Experiments show that women value temporal flexibility relatively more than men, and men value income growth relatively more than women. This is the most powerful explanation of the wage gap. Source.
    • Right after college, wages are pretty similar between men and women, and the wage gap appears as time passes, indicating that ‘innate’ differences aren’t hugely at play (including bargaining ability and temperament).
    • 75% of the wage gap is due to differences within occupations, and only 25% across occupations.
    • Among the top-paying occupations (salary ≥ $60K), the within-occupation corrected pay gaps are biggest where there’s lots of self-employment (explained by self-employment being more demanding).
  • Symphony orchestras introduced blind auditions in the ‘90s, which served as a natural experiment that found significant gender bias against women. Source.
    • The analysis found that in a blind audition for preliminary rounds, the same woman was 9.3 percentage points more likely to be hired (from 19.3% to 28.6%), and the same man was 2.3 percentage points less likely to be hired.
    • For final rounds, the same woman was 14.8 percentage points more likely to be hired in a blind audition (from 8.7% to 23.5%).
    • Introduction of blind auditions also caused an explosion of female auditions.
  • The rate of false reporting for sexual assault is in the range of 2-8%. Source.
  • Estimates of the prevalence rate of campus sexual assault in the US vary hugely, from 0.61% to 27% of female students, depending on survey definitions and methodology. Source.
  • The percentage of trans men that report lifetime suicide attempts is 46%, trans women is 42%, LGB adults is 10-20%, and among the overall US population is 4.6%. Source.
    • Suicide attempt rates are lower (by about 9%) among trans women that are perceived by others as women, but are the same among trans men.

Other

  • “The IAT is a noisy, unreliable measure that correlates far too weakly with any real-world outcomes to be used to predict individuals’ behavior.” Source.
    • Many early studies on IAT as a predictor of discriminatory behavior had serious methodological problems, including falsification of data by an “overzealous undergraduate”.
    • The IAT has a test-retest reliability of 0.55 on a scale from 0 to 1.
    • Meta-analyses of the IAT-behavior link show that race IAT scores are weak predictors of discriminatory behavior.
    • IATs run on pairs of fictional races, one described as oppressed and the other as privileged, show “implicit bias” against the oppressed group.
    • More noise in the data predictably biases the IAT score downwards.
  • When people hear stereotyping is normal, they may do more of it. Source.
  • The “few antibias trainings that have been proven to change people’s behavior” look at bias as a habit that can be broken. The Prejudice and Intergroup Relations Lab at UW Madison has had promising results with this type of training. Source.

Some takeaways: A lot of the concerns of the social justice movement are clearly very valid and rooted in real issues of societal inequalities that have been handed down to us by previous generations. That said, however, there is a good degree of subtlety required in the analysis of race and gender issues that is missing in the mainstream social justice movement.

The oft-cited 23% gender gap is misleading to say the least, and the actual percentage due to discrimination is unclear but something less than 7%. The focus the Black Lives Matter movement puts on racially biased police shootings is unjustified, and the focus would be better placed on disparate sentencing and drug arrests. And more generally, the overall trends in racial inequality in the United States look extremely positive in virtually every dimension.

It also looks like current methods of identifying and intervening on things like implicit bias and stereotyping leave a lot to be desired. This has some serious implications for questions about actual practical solutions to issues of racism and sexism… even if we acknowledge their existence and seriousness, this does not mean that we should jump on board with any plausible-sounding diversity training program. The question of how to solve these issues is highly nontrivial and deserves a lot of careful attention.

What is integrated information?

Integrated information theory relates consciousness to degrees of integrated information within a physical system. I recently became interested in IIT and found it surprisingly hard to locate a good simple explanation of the actual mathematics of integrated information online.

Having eventually just read through all of the original papers introducing IIT, I discovered that integrated information is closely related to some of my favorite bits of mathematics, involving information theory and causal modeling.  This was exciting enough to me that I decided to write a guide to understanding integrated information. My goal in this post is to introduce a beginner to integrated information in a rigorous and (hopefully!) intuitive way.

I’ll describe it in increasing levels of complexity, so that even if you eventually get lost somewhere in the post, you’ll be able to walk away having learned something. If you get to the end of this post, you should be able to sit down with a pencil and paper and calculate the amount of integrated information in simple systems, and know how to calculate it in principle for any system.

Level 1

So first, integrated information is a measure of the degree to which the components of a system are working together to produce outputs.

A system composed of many individual parts that are not interacting with each other in any way is completely un-integrated – it has an integrated information ɸ = 0. On the other hand, a system composed entirely of parts that are tightly entangled with one another will have a high amount of integrated information, ɸ >> 0.

For example, consider a simple model of a camera sensor.

[Image: a grid of camera sensor pixels]

This sensor is composed of many independent parts functioning completely separately. Each pixel stores a unit of information about the outside world, regardless of what its neighboring pixels are doing. If we were to somehow sever the causal connections between the two halves of the sensor, each half would still capture and store information in exactly the same way.

Now compare this to a human brain.

[Image: a snapshot of neuron activity]

The nervous system is a highly entangled mesh of neurons, each interacting with many many neighbors in functionally important ways. If we tried to cut the brain in half, severing all the causal connections between the two sides, we would get an enormous change in brain functioning.

Makes sense? Okay, on to level 2.

Level 2

So, integrated information has to do with the degree to which the components of a system are working together to produce outputs. Let’s delve a little deeper.

We just said that we can tell that the brain is integrating lots of information, because the functioning would be drastically disrupted if you cut it in half. A keen reader might have realized that the degree to which the functioning is disrupted will depend a lot on how you cut it in half.

For instance, cut off the front half of somebody’s brain, and you will end up with total dysfunction. But you can entirely remove somebody’s cerebellum (which contains well over half of the brain’s neurons), and end up with a person that has difficulty with coordination and is a slow learner, but is otherwise a pretty ordinary person.

Human head, MRI and 3D CT scans

What this is really telling us is that different parts of the brain are integrating information differently. So how do we quantify the total integration of information of the brain? Which cut do we choose when evaluating the decrease in functioning?

Simple: We look at every possible way of partitioning the brain into two parts. For each one, we see how much the brain’s functioning is affected. Then we locate the minimum information partition, that is, the partition that results in the smallest change in brain functioning. The change in functioning that results from this particular partition is the integrated information!

Okay. Now, what exactly do we mean by “changes to the system’s functioning”? How do we measure this?

Answer: The functionality of a system is defined by the way in which the current state of the system constrains the past and future states of the system.

To make full technical sense of this, we have to dive a little deeper.

Level 3

How many possible states are there of a Connect Four board?

(I promise this is relevant)

The board is 6 by 7, and each spot can be either a red piece, a black piece, or empty.

[Image: an empty Connect Four board]

So a simple upper bound on the number of total possible board states is 3⁴² (of course, the actual number of possible states will be much lower than this, since some positions are impossible to get into).
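To get a feel for the size of this bound, here is a quick Python check. It only encodes the counting argument above (three options for each of the 6 × 7 = 42 spots); the variable names are just illustrative.

```python
# Upper bound on Connect Four board states:
# 3 options (red, black, empty) for each of the 6 x 7 = 42 spots.
ROWS, COLS, OPTIONS_PER_SPOT = 6, 7, 3

upper_bound = OPTIONS_PER_SPOT ** (ROWS * COLS)

print(upper_bound)           # exact integer
print(f"{upper_bound:.2e}")  # same number in scientific notation
```

The result is roughly 1.1 × 10²⁰, so even a loose bound like this makes it clear how strongly a single board position narrows down the possibilities.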

Now, consider what you know about the possible past and future states of the board if the board state is currently…

[Image: a Connect Four board containing a single piece]

Clearly there’s only one possible past state:

[Image: an empty Connect Four board]

And there are seven possible future states:

What this tells us is that the information about the current state of the board constrains the possible past and future states, selecting exactly one possible board out of the 3⁴² possibilities for the past, and seven out of 3⁴² possibilities for the future.

More generally, for any given system S we have a probability distribution over past and future states, given that the current state is X.

System

Pfuture(X, S) = Pr( Future state of S | Present state of S is X )
Ppast(X, S) = Pr( Past state of S | Present state of S is X )

For any partition of the system into two components, S1 and S2, we can consider the future and past distributions given that the states of the components are, respectively, X1 and X2, where X = (X1, X2).

System

Pfuture(X, S1, S2) = Pr( Future state of S1 | Present state of S1 is X1 )・Pr( Future state of S2 | Present state of S2 is X2 )
Ppast(X, S1, S2) = Pr( Past state of S1 | Present state of S1 is X1 )・Pr( Past state of S2 | Present state of S2 is X2 )

Now, we just need to compare our distributions before the partition to our distributions after the partition. For this we need some type of distance function D that assesses how far apart two probability distributions are. Then we define the cause information and the effect information for the partition (S1, S2).

Cause information = D( Ppast(X, S), Ppast(X, S1, S2) )
Effect information = D( Pfuture(X, S), Pfuture(X, S1, S2) )

In short, the cause information is how much the distribution over past states changes when you partition off your system into two separate systems. And the effect information is the change in the distribution over future states when you partition the system.

The cause-effect information CEI is then defined as the minimum of the cause information CI and effect information EI.

CEI = min{ CI, EI }

We’ve almost made it all the way to our full definition of ɸ! Our last step is to calculate the CEI for every possible partition of S into two pieces, and then select the partition that minimizes CEI (the minimum information partition MIP).

The integrated information is just the cause-effect information of the minimum information partition!

ɸ = CEI(MIP)

Level 4

We’ve now semi-rigorously defined ɸ. But to really get a sense of how to calculate ɸ, we need to delve into causal diagrams. At this point, I’m going to assume familiarity with causal modeling. The basics are covered in a series of posts I wrote starting here.

Here’s a simple example system:

XOR AND.png

This diagram tells us that the system is composed of two variables, A and B. Each of these variables can take on the values 0 and 1. The system follows the following simple update rule:

A(t + 1) = A(t) XOR B(t)
B(t + 1) = A(t) AND B(t)

We can redraw this as a causal diagram from A and B at time 0 to A and B at time 1:

Causal Diagram

What this amounts to is the following system evolution rule:

AB(t) → AB(t+1)
 00   →   00
 01   →   10
 10   →   10
 11   →   01
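If you'd like to check this table mechanically, here is a minimal Python sketch that generates it from the two update rules. The function name `step` and the dictionary `transition_table` are just illustrative choices.

```python
from itertools import product


def step(a: int, b: int) -> tuple[int, int]:
    """One update of the system: A <- A XOR B, B <- A AND B."""
    return a ^ b, a & b


# Map every current state (A, B) to its successor state.
transition_table = {state: step(*state) for state in product((0, 1), repeat=2)}

for (a, b), (a_next, b_next) in transition_table.items():
    print(f"{a}{b} -> {a_next}{b_next}")
```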

Now, suppose that we know that the system is currently in the state AB = 00. What does this tell us about the future and past states of the system?

Well, since the system evolution is deterministic, we can say with certainty that the next state of the system will be 00. And since there’s only one way to end up in the state 00, we know that the past state of the system was 00.
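The same reasoning can be written as a short Python sketch. One assumption is made explicit here that the text leaves implicit: before looking at the current state, every past state is treated as equally likely (a uniform prior). The function names are illustrative.

```python
from itertools import product

STATES = list(product((0, 1), repeat=2))  # all (A, B) pairs


def step(state):
    """Deterministic update: A <- A XOR B, B <- A AND B."""
    a, b = state
    return (a ^ b, a & b)


def future_distribution(current):
    """All probability sits on the single successor of the current state."""
    return {s: (1.0 if s == step(current) else 0.0) for s in STATES}


def past_distribution(current):
    """Bayes' rule with a uniform prior over past states (an assumption)."""
    weights = {s: (1.0 if step(s) == current else 0.0) for s in STATES}
    total = sum(weights.values())
    if total == 0:
        raise ValueError(f"state {current} has no predecessor")
    return {s: w / total for s, w in weights.items()}


print(future_distribution((0, 0)))
print(past_distribution((0, 0)))
```

For the state 00 both distributions put all of their weight on 00. A state like 10 is less clear-cut: it has two possible predecessors (01 and 10), so its past distribution is split evenly between them.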

We can plot the probability distributions over the past and future distributions as follows:

Probabilities Full System

This is not too interesting a distribution… no information is lost or gained going into the past or future. Now we partition the system:

XOR AND Cut

The causal diagram, when cut, looks like:

Causal Diagram Cut

Why do we have the two “noise” variables? Well, both A and B take two variables as inputs. Since one of these causal inputs has been cut off, we replace it with a random variable that’s equally likely to be a 0 or a 1. This procedure is called “noising” the causal connections across the partition.

According to this diagram, we now have two independent distributions over the two parts of the system, A and B. To get the distribution over the total future state of the system, we multiply them together:

P(A_1, B_1 | A_0, B_0) = P(A_1 | A_0) P(B_1 | B_0)

We can compute the two distributions P(A_1 | A_0) and P(B_1 | B_0) straightforwardly, by looking at how each variable evolves in our new causal diagram.

A_0 = 0 ⇒ A_1 = 0, 1 (½ probability each)
B_0 = 0 ⇒ B_1 = 0

A_0 = 0 ⇒ A_{-1} = 0, 1 (½ probability each)
B_0 = 0 ⇒ B_{-1} = 0, 1 (probabilities ⅔ and ⅓)

This implies the following probability distribution for the partitioned system:

Partitioned System

I recommend you go through and calculate this for yourself. Everything follows from the updating rules that define the system and the noise assumption.
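If you want to check your pencil-and-paper answer, here is one way the calculation might look in Python. It is a sketch under two assumptions: the cut input is replaced by a fair coin (the noising described above), and past values get a uniform prior before applying Bayes' rule. All function names are illustrative.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)


def next_a(a, noise):
    """A's rule (A XOR B) with B's input replaced by noise."""
    return a ^ noise


def next_b(b, noise):
    """B's rule (A AND B) with A's input replaced by noise."""
    return noise & b


def forward(rule, current):
    """P(next value | current value), averaging over the fair-coin noise."""
    dist = {0: Fraction(0), 1: Fraction(0)}
    for noise in (0, 1):
        dist[rule(current, noise)] += HALF
    return dist


def backward(rule, current):
    """P(previous value | current value), via Bayes' rule with a uniform prior."""
    likelihood = {prev: forward(rule, prev)[current] for prev in (0, 1)}
    total = sum(likelihood.values())
    return {prev: p / total for prev, p in likelihood.items()}


def joint(dist_a, dist_b):
    """Product distribution over joint states (a, b)."""
    return {(a, b): dist_a[a] * dist_b[b] for a, b in product((0, 1), repeat=2)}


# Current state AB = 00.
future_q = joint(forward(next_a, 0), forward(next_b, 0))
past_q = joint(backward(next_a, 0), backward(next_b, 0))

print(future_q)
print(past_q)
```

Exact fractions are used instead of floats so that the ½, ⅔ and ⅓ from the text show up unchanged.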

Good! Now we have two distributions, one for the full system and one for the partitioned system. How do we measure the difference between these distributions?

There are a few possible measures we could use. My favorite of these is the Kullback-Leibler divergence D_KL. Technically, this measure is only used in IIT 2.0, not IIT 3.0 (which uses the earth-mover’s distance). I prefer D_KL, as it has a nice interpretation as the amount of information lost when the system is partitioned. I have a post describing D_KL here.

Here’s the definition of D_KL (with the logarithm taken base 2, so that the result is in bits):

D_KL(P, Q) = ∑_i P_i log₂(P_i / Q_i)

We can use this quantity to calculate the cause information and the effect information:

Cause information = 1 · log₂(1 / ⅓) = log₂(3) ≈ 1.6 bits
Effect information = 1 · log₂(1 / ½) = log₂(2) = 1 bit

These values tell us that our partition destroys about .6 more bits of information about the past than it does the future. For the purpose of integrated information, we only care about the smaller of these two (for reasons that I don’t find entirely convincing).

Cause-effect information = min{ 1, 1.6 } = 1
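Here is a small Python sketch of this last step. It takes the distributions from the worked example as given (full system: all probability on 00; partitioned system: the products of the ½, ⅔ and ⅓ values above) and assumes base-2 logarithms, so that the answers come out in bits.

```python
from math import log2


def kl_divergence(p, q):
    """D_KL(P, Q) = sum_i P_i * log2(P_i / Q_i); terms with P_i = 0 contribute 0."""
    return sum(p_i * log2(p_i / q[state]) for state, p_i in p.items() if p_i > 0)


# Full system in state 00: the past and the future are both certain to be 00.
p_full = {"00": 1.0, "01": 0.0, "10": 0.0, "11": 0.0}

# Partitioned (noised) system, using the values from the worked example.
q_future = {"00": 1 / 2, "01": 0.0, "10": 1 / 2, "11": 0.0}
q_past = {"00": 1 / 3, "01": 1 / 6, "10": 1 / 3, "11": 1 / 6}

effect_information = kl_divergence(p_full, q_future)
cause_information = kl_divergence(p_full, q_past)
cause_effect_information = min(cause_information, effect_information)

print(cause_information, effect_information, cause_effect_information)
```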

Now, we’ve calculated the cause-effect information for this particular partition. And since our system has only two variables, this is the only possible partition.

The integrated information is the cause-effect information of the minimum information partition. Since the partition we’ve examined is the only possible one, it must be the minimum information partition. And thus, we’ve calculated ɸ for our system!

ɸ = 1

Level 5

Let’s now define ɸ in full generality.

Our system S consists of a vector of N variables X = (X_1, X_2, X_3, …, X_N), each an element in some space 𝒳. Our system also has an updating rule, which is a function f: 𝒳^N → 𝒳^N. In our previous example, 𝒳 = {0, 1}, N = 2, and f(x, y) = (x XOR y, x AND y).

More generally, our updating rule f can map X to a probability distribution over 𝒳^N. We’ll denote P(X_{t+1} | X_t) as the distribution over the possible future states, given the current state. P is defined by our updating rule: P(X_{t+1} | X_t) = f(X_t). The distribution over possible past states will be denoted P(X_{t-1} | X_t). We’ll obtain this using Bayes’ rule: P(X_{t-1} | X_t) = P(X_t | X_{t-1}) P(X_{t-1}) / P(X_t) = f(X_{t-1}) P(X_{t-1}) / P(X_t), where P(X_{t-1}) is a prior over past states (the Level 4 example implicitly took it to be uniform).

A partition of the system is a subset of {1, 2, 3, …, N}, which we’ll label A. We define B = {1, 2, 3, …, N} \ A. Now we can define X_A = (X_a)_{a∈A}, and X_B = (X_b)_{b∈B}. Loosely speaking, we can say that X = (X_A, X_B), i.e. that the total state is just the combination of the two parts A and B.

We now define the distributions over future and past states in our partitioned system:

Q(X_{t+1} | X_t) = P(X_{A,t+1} | X_{A,t}) P(X_{B,t+1} | X_{B,t})
Q(X_{t-1} | X_t) = P(X_{A,t-1} | X_{A,t}) P(X_{B,t-1} | X_{B,t})

The effect information EI of the partition defined by A is the distance between P(X_{t+1} | X_t) and Q(X_{t+1} | X_t), and the cause information CI is defined similarly. The cause-effect information is defined as the minimum of these two.

CI(f, A, X_t) = D( P(X_{t-1} | X_t), Q(X_{t-1} | X_t) )
EI(f, A, X_t) = D( P(X_{t+1} | X_t), Q(X_{t+1} | X_t) )

CEI(f, A, X_t) = min{ CI(f, A, X_t), EI(f, A, X_t) }

And finally, we define the minimum information partition (MIP) and the integrated information:

MIP = argmin_A CEI(f, A, X_t)
ɸ(f, X_t) = min_A CEI(f, A, X_t)
= CEI(f, MIP, X_t)

And we’re done!
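To make the general definition concrete, here is a rough Python sketch of ɸ for small systems. It is a sketch under several assumptions that go beyond the definitions above: variables are binary, the updating rule is deterministic, D is the KL divergence in bits, variables outside a part are noised by averaging over them uniformly, and past states get a uniform prior. The function names are illustrative, and the brute-force enumeration is only practical for a handful of variables.

```python
from itertools import combinations, product
from math import log2


def kl_bits(p, q):
    """D_KL(P, Q) in bits; zero-probability terms of P contribute nothing."""
    return sum(pi * log2(pi / q[s]) for s, pi in p.items() if pi > 0)


def restrict(state, part):
    """Values of the variables whose indices are listed in `part`."""
    return tuple(state[i] for i in part)


def forward(f, n, part, values):
    """P(next values of `part` | current values of `part`), with every
    variable outside `part` replaced by uniform noise."""
    matching = [s for s in product((0, 1), repeat=n) if restrict(s, part) == values]
    dist = {}
    for s in matching:
        nxt = restrict(f(s), part)
        dist[nxt] = dist.get(nxt, 0.0) + 1.0 / len(matching)
    return dist


def backward(f, n, part, values):
    """P(previous values of `part` | current values), by Bayes' rule with a
    uniform prior over previous values."""
    weights = {prev: forward(f, n, part, prev).get(values, 0.0)
               for prev in product((0, 1), repeat=len(part))}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("current state is unreachable, so its past is undefined")
    return {prev: w / total for prev, w in weights.items()}


def cause_effect_information(f, n, part_a, state):
    """min{CI, EI} for the partition (A, B), with B the complement of A."""
    part_b = tuple(i for i in range(n) if i not in part_a)
    whole = tuple(range(n))
    informations = []
    for conditional in (backward, forward):  # cause first, then effect
        p = conditional(f, n, whole, state)
        q_a = conditional(f, n, part_a, restrict(state, part_a))
        q_b = conditional(f, n, part_b, restrict(state, part_b))
        q = {s: q_a.get(restrict(s, part_a), 0.0) * q_b.get(restrict(s, part_b), 0.0)
             for s in product((0, 1), repeat=n)}
        informations.append(kl_bits(p, q))
    return min(informations)


def phi(f, n, state):
    """Minimum CEI over all ways of splitting n >= 2 variables into two parts."""
    parts = [a for size in range(1, n) for a in combinations(range(n), size)]
    return min(cause_effect_information(f, n, a, state) for a in parts)


def xor_and(state):
    """The Level 4 example: A <- A XOR B, B <- A AND B."""
    a, b = state
    return (a ^ b, a & b)


print(phi(xor_and, 2, (0, 0)))
```

Run on the Level 4 system in state 00, this gives a ɸ of 1 bit, matching the hand calculation. A system whose variables simply copy themselves forward gives ɸ = 0, as you'd hope for completely un-integrated parts.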

Notice that our final result is a function of f (the updating function) as well as the current state of the system. What this means is that the integrated information of a system can change from moment to moment, even if the organization of the system remains the same.

By itself, this is not enough for the purposes of integrated information theory. Integrated information theory uses ɸ to define gradations of consciousness of systems, but the relationship between ɸ and consciousness isn’t exactly one-to-one (briefly, consciousness resides in non-overlapping local maxima of integrated information).

But this post is really meant to just be about integrated information, and the connections to the theory of consciousness are actually less interesting to me. So for now I’ll stop here! 🙂

How to sort a list of numbers

It’s funny how some of the simplest questions you can think of actually have interesting and nontrivial answers.

Take this question, for instance. If I hand you a list of N numbers, how can you quickly sort this list from smallest to largest? Suppose that you are only capable of sorting two numbers at a time.

You might immediately think of some obvious ways to solve this problem. For instance, you could just start at the beginning of the list and repeatedly traverse it, swapping numbers if they’re in the wrong order. This will guarantee that each pass-through, the largest number out of those not yet sorted will “bubble up” to the top. This algorithm is called Bubble Sort.
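Here is a minimal Python sketch of Bubble Sort as just described. The early exit when a pass makes no swaps is a common small optimization rather than part of the description above, and the function name is illustrative.

```python
def bubble_sort(values):
    """Return a sorted copy of `values`, using only adjacent comparisons and swaps."""
    items = list(values)
    # After each pass, the largest not-yet-sorted item has bubbled up to `unsorted_end`.
    for unsorted_end in range(len(items) - 1, 0, -1):
        swapped = False
        for i in range(unsorted_end):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:  # a full pass with no swaps means the list is sorted
            break
    return items


print(bubble_sort([5, 2, 9, 1, 5, 6]))
```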

The problem is that for a sizable list, this will take a long long time. On average, for a list of size N, Bubble Sort takes O(N^2) steps to finish sorting. Can you think of a faster algorithm?

It turns out that we can go much faster with some clever algorithms – from O(N^2) to O(N log N). If you have a little time to burn, here are some great basic videos describing and visualizing different sorting techniques:

Comparisons of Merge, Quick, Heap, Bubble and Insertion Sort

Visualizations

Beautiful visualization of Bubble, Shell, and Quick Sort

(I’d be interested to know why the visualization of Bubble Sort in the final frame gave a curve of unsorted values that looked roughly logarithmic)

Visualization of 9 sorting algorithms

Auditory presentation of 15 sorting algorithms

Bonus clip…

Cool proof of the equivalence of Selection Sort and Insertion Sort
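For a concrete sense of how an O(N log N) sort can work, here is a minimal sketch of Merge Sort, one of the algorithms covered in the videos above. It is just one illustrative way to write it, not the fastest or most memory-efficient one.

```python
def merge_sort(values):
    """Return a sorted copy of `values` by sorting each half and merging them."""
    items = list(values)
    if len(items) <= 1:
        return items

    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    # Merge the two sorted halves, comparing two numbers at a time.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sort([5, 2, 9, 1, 5, 6]))
```

The list is halved about log N times, and each level of halving needs roughly N comparisons to merge back together, which is where the N log N comes from.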

Searching a list

By the way, how about if you have a sorted list and want to find a particular value in it? If we’re being dumb, we could just ignore the fact that our list came pre-sorted and search through every element on the list in order. We’d find our value in O(N). We can go much faster by just starting in the middle of our list, comparing the middle value to our target to determine which half of the list to look at next, and then repeating with the appropriate half of the list. This is called binary search and takes O(log N), a massive speedup.
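Here is a minimal Python sketch of binary search. The function name and the choice to return `None` when the value is missing are illustrative.

```python
def binary_search(sorted_items, target):
    """Return an index of `target` in `sorted_items`, or None if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return None


print(binary_search([1, 3, 4, 7, 9, 11], 7))
```

Each pass through the loop discards half of the remaining list, so a list of a million items needs at most about 20 comparisons.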

Now, let’s look back at the problem of finding a particular value in an unsorted list. Can we think of any techniques to accomplish this more quickly than O(N)?

No. Try as you might, as long as the list has no structure for you to exploit, your optimal performance will be limited to O(N).

Or so everybody thought until quantum mechanics showed up and blew our minds.

It turns out that a quantum computer would be able to solve this exact problem – searching through an unsorted list – in only O(√N) steps! 

This should seem impossible to you. If a friend hands you a random jumbled list of 10,000 names and asks you to locate one particular name on the list, you’re going to end up looking through 5,000 names on average. There’s no clever trick you can use to speed up the process. Except that quantum mechanics says you’re wrong! In fact, if you were a quantum computer, you’d only have to search through ~100 names to find your name.

This quantum algorithm is called Grover’s algorithm, and I want to write up a future post about it. But that’s not even the coolest part! It turns out that if a non-local hidden variable interpretation of quantum mechanics is correct, then a quantum computer could search an N-item database in O(∛N)! You can imagine the triumphal feeling of the hidden variables people if one day they managed to build a quantum computer that simultaneously proved them right and broke the record for search algorithm speed!
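In the meantime, Grover’s algorithm can be simulated on an ordinary computer for small lists, by tracking one amplitude per list item. The sketch below is only an illustration: the simulation itself takes O(N) work per step, so it shows how few oracle calls are needed, not an actual speedup. The function name, the list size and the target index are arbitrary choices; the number of iterations uses the standard value of about (π/4)√N.

```python
import math


def grover_success_probability(n_items, target, iterations):
    """Simulate Grover's algorithm and return the probability of measuring `target`."""
    # Start in a uniform superposition over all items.
    amplitudes = [1 / math.sqrt(n_items)] * n_items
    for _ in range(iterations):
        # Oracle: flip the sign of the target's amplitude.
        amplitudes[target] = -amplitudes[target]
        # Diffusion: reflect every amplitude about the mean amplitude.
        mean = sum(amplitudes) / n_items
        amplitudes = [2 * mean - a for a in amplitudes]
    return amplitudes[target] ** 2


n_items = 1024
iterations = math.floor(math.pi / 4 * math.sqrt(n_items))
probability = grover_success_probability(n_items, target=123, iterations=iterations)

print(iterations, probability)
```

With 1,024 items, 25 iterations are enough to find the target with probability above 99%. For the list of 10,000 names, the same formula gives about 78 oracle calls, in line with the ~100 figure above.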

Defining racism

How would you define racism?

I’ve been thinking about this lately in light of some of the scandal around research into race and IQ. It’s a harder question than I initially thought; many of the definitions that pop to mind end up being either too strong or too weak. The term also functions differently in different contexts (e.g. personal racism, institutional racism, racist policies). In this post, I’m specifically talking about personal racism – that term we use to refer to the beliefs and attitudes of those like Nazis or Ku Klux Klan members (at the extreme end).

I’m going to walk through a few possible definitions. This will be fairly stream-of-consciousness, so I apologize if it’s not incredibly profound or well-structured.

Definition 1 Racism is the belief in the existence of inherent differences between the races.

‘Inherent’ is important, because we don’t want to say that somebody is racist for acknowledging differences that can ultimately be traced back to causes like societal oppression. The problem with this definition is that, well, there are inherent differences between the races.

The Chinese are significantly shorter than the Dutch. Raising a Chinese person in a Dutch household won’t do much to equalize this difference. What’s important, it seems, is not the belief in the existence of inherent differences, but instead the belief in the existence of inherent inferiorities and superiorities. So let’s try again.

Definition 2 Racism is the belief in the existence of inherent racial differences that are normatively significant.

This is pretty much the dictionary definition of the term “racism”. While it’s better, there are still some serious problems. Let’s say that somebody discovered that the Slavs are more inherently prone to violence than, say, Arabs. Suppose that somebody ran across this fact, and that this person also held the ethical view that violent tendencies are normatively important. That is, they think that peaceful people are ethically superior to violent people.

If they combine this factual belief with this seemingly reasonable normative belief, they’ll end up being branded as a racist, by our second definition. This is clearly undesirable… given that the word ‘racism’ is highly normatively loaded, we don’t want it to be the case that somebody is racist for believing true things. In other words, we probably don’t want our definition of racism to ever allow it to be the right attitude to take, or even a reasonable attitude to take.

Maybe the missing step is the generalization of attitudes about Slavs and Arabs to individuals. This is a sentiment that I’ve heard fairly often… racism is about applying generalizations about groups to individuals (for instance, racial profiling). Let’s formalize this:

Definition 3 Racism is about forming normative judgments about individuals’ characteristics on the basis of beliefs about normative group-level differences.

This sounds nice and all, but… you know what another term for “applying facts about groups to individuals” is? Good statistical reasoning.

If you live in a town composed of two distinct populations, the Hebbeberans and the Klabaskians, and you know that Klabaskians are on average twenty times more likely than Hebbeberans to be fatally allergic to cod, then you should be more cautious with serving your extra special cod sandwich to a Klabaskian friend than to a Hebbeberan.

Facts about populations do give you evidence about individuals within those populations, and the mere acknowledgement of this evidence is not racist, for the same reason that rationality is not racist.

So if we don’t want to call rationality racist, then maybe our way out of this is to identify racism with irrationality.

Definition 4 Racism is the holding of irrational beliefs about normative racial differences.

Say you meet somebody from Malawi (a country with an extremely low average IQ). Your first rational instinct might be to not expect too much from them in the way of cognitive abilities. But now you learn that they’re a theoretical physicist who’s recently been nominated for a Nobel prize for their work in quantum information theory. If the average IQ of Malawians is still factoring in at all to your belief about this person’s intelligence, then you’re being racist.

I like this definition a lot better than our previous ones. It combines the belief in racial superiority with irrationality. On the other hand, it has problems as well. One major issue is that there are plenty of cases of benign irrationality, where somebody is just a bad statistical reasoner, but not motivated by any racial hatred. Maybe they over-updated on some piece of information, because they failed to take into account an important base-rate.

Well, the base-rate fallacy is one of the most common cognitive biases out there. Surely this isn’t enough to make them a racist? What we want is to capture the non-benign brand of irrational normative beliefs about race – those that are motivated by hatred or prejudice.

Definition 5 Racism is the holding of irrational normative beliefs about racial differences, motivated by racial hatred or prejudice.

I think this does the best at avoiding making the category too large, but it may be too strong and keep out some plausible cases of racism. I’d like to hear suggestions for improvements on this definition, but for now I’ll leave it there. One potential take-away is that the word ‘racism’ is a nasty combination of highly negatively charged and ambiguous, and that such words are best treated with caution, especially when applying them to edge cases.

Utter confusion about consciousness

I’m starting to get a sense of why people like David Chalmers and Daniel Dennett call consciousness the most mysterious thing known to humans. I’m currently just really confused, and think that pretty much every position available with respect to consciousness is deeply unsatisfactory. In this post, I’ll just walk through my recent thinking.

Against physicalism

In a previous post, I imagined a scientist from the future who told you they had a perfected theory of consciousness, and asked what evidence we could request to confirm it. This theory of consciousness could presumably be thought of as a complete mapping from physical states to conscious states – a set of psychophysical laws. Questions about the nature of consciousness are then questions about the nature of these laws. Are they ultimately the same kind of laws as chemical laws (derivable in principle from the underlying physics)? Or are they logically distinct laws that must be separately listed on the catalogue of the fundamental facts about the universe?

I take physicalism to be the stance that answers ‘yes’ to the first question and ‘no’ to the second. Dualism and epiphenomenalism answer ‘no’ to the first and ‘yes’ to the second, and are distinguished by the character of the causal relationships between the physical and the conscious entailed by the psychophysical laws.

So, is physicalism right? Imagining that we had a perfect mapping from physical states to conscious states, would this mapping be in principle derivable from the Schrodinger equation? I think the answer to this has to be no; whatever the psychophysical laws are, they are not going to be in principle derivable from physics.

To see why, let’s examine what it looks like when we derive macroscopic laws from microscopic laws. Luckily, we have a few case studies of successful reduction. For instance, you can start with just the Schrodinger equation and derive the structure of the periodic table. In other words, the structure and functioning of atoms and molecules naturally pops out when you solve the equation for systems of many particles.

You can extrapolate this further to larger scale systems. When we solve the Schrodinger equation for large systems of biomolecules, we get things like enzymes and cell membranes and RNA, and all of the structure and functioning corresponding to our laws of biology. And extending this further, we should expect that all of our behavior and talk about consciousness will be ultimately fully accounted for in terms of purely physical facts about the structure of our brain.

The problem is that consciousness is something more than just the words we say when talking about consciousness. While it’s correlated in very particular ways with our behavior (the structure and functioning of our bodies), it is by its very nature logically distinct from these. You can tell me all about the structure and functioning of a physical system, but the question of whether or not it is conscious is a further fact that is not logically entailed. The phrase LOGICALLY entailed is very important here – it may be that as a matter of fact, it is a contingent truth of our universe that conscious facts always correspond to specific physical facts. But this is certainly not a relationship of logical entailment, in the sense that the periodic table is logically entailed by quantum mechanics.

In summary, it looks like we have a problem on our hands if we want to try to derive facts about consciousness from facts about fundamental physics. Namely, the types of things we can derive from something like the Schrodinger equation are facts about complex macroscopic structure and functioning. This is all well and good for deriving chemistry or solid-state physics from quantum mechanics, as these fields are just collections of facts about structure and functioning. But consciousness is an intrinsic property that is logically distinct from properties like macroscopic structure and functioning. You simply cannot expect to start with the Schrodinger equation and naturally arrive at statements like “X is experiencing red” or “Y is feeling sad”, since these are not purely behavioral statements.

Here’s a concise rephrasing of the argument I’ve made, in terms of a trilemma. Out of the following three postulates, you cannot consistently accept all three:

  1. There are facts about consciousness.
  2. Facts about consciousness are not logically entailed by the Schrodinger equation (substitute in whatever the fundamental laws of physics end up being).
  3. Facts about consciousness are fundamentally facts about physics.

Denying (1) makes you an eliminativist. Presumably this is out of the question; consciousness is the only thing in the universe that we can know with certainty exists, as it is the only thing that we have direct first-person access to. Indeed, all the rest of our knowledge comes to us by means of our conscious experience, making it in some sense the root of all of our knowledge. The only charitable interpretations I have of eliminativism involve semantic arguments subtly redefining what we mean by “consciousness” away from “that thing which we all know exists from first-hand experience” to something whose existence can actually be doubted.

Denying (2) seems really implausible to me for the considerations given above.

So denying (3) looks like our only way out.

Okay, so let’s suppose physicalism is wrong. This is already super important. If we accept this argument, then we have a worldview in which consciousness is of fundamental importance to the nature of reality. The list of fundamental facts about the universe will be (1) the laws of physics and (2) the laws of consciousness. This is really surprising for anybody like me that professes a secular worldview that places human beings far from the center of importance in the universe.

But “what about naturalism?” is not the only objection to this position. There’s a much more powerful argument.

Against non-physicalism

Suppose we now think that the fundamental facts about the universe fall into two categories: P (the fundamental laws of physics, plus the initial conditions of the universe) and Q (the facts about consciousness). We’ve already denied that P = Q or that there is a logical entailment relationship from P to Q.

Now we can ask about the causal nature of the psychophysical laws. Does P cause Q? Does Q cause P? Does the causation go both ways?

First, conditional on the falsity of physicalism, we can quickly rule out theories that claim that Q causes P (i.e. dualist theories). This is the old Cartesian picture that is unsatisfactory exactly because of the strength of the physical laws we’ve discovered. In short, physics appears to be causally complete. If you fix the structure and functioning on the microscopic level, then you fix the structure and functioning on the macroscopic level. In the language of philosophy, macroscopic physical facts supervene upon microscopic physical facts.

But now we have a problem. If all of our behavior and functioning is fully causally accounted for by physical facts, then what is there for Q (consciousness) to play a causal role in? Precisely nothing!

We can phrase this in the following trilemma (again, all three of these cannot be simultaneously true):

  1. Physicalism is false.
  2. Physics is causally closed.
  3. Consciousness has a causal influence on the physical world.

Okay, so now we have ruled out any theories in which Q causes P. But now we reach a new and even more damning conclusion. Namely, if facts about consciousness have literally no causal influence on any aspect of the physical world, then they have no causal influence, in particular, on your thoughts and beliefs about your consciousness.

Stop to consider for a moment the implications of this. We take for granted that we are able to form accurate beliefs about our own conscious experiences. When we are experiencing red, we are able to reliably produce accurate beliefs of the form “I am experiencing red.” But if the causal relationship goes only from P to Q, then this becomes extremely hard to account for.

What would we expect to happen if our self-reports of our consciousness fell out of line with our actual consciousness? Suppose that you suddenly noticed yourself verbalizing “I’m really having a great time!” when you actually felt like you were in deep pain and discomfort. Presumably the immediate response you would have would be confusion, dismay, and horror. But wait! All of these experiences must be encoded in your brain state! In other words, to experience horror at the misalignment of your reports about your consciousness and your actual consciousness, it would have to be the case that your physical brain state would change in a particular way. And a necessary component of the explanation for this change would be the actual state of your consciousness!

This really gets to the heart of the weirdness of epiphenomenalism (the view that P causes Q, but Q doesn’t causally influence P). If you’re an epiphenomenalist, then all of your beliefs and speculations about consciousness are formed exactly as they would be if your conscious state were totally different. The exact same physical state of you thinking “Hey, this coffee cake tastes delicious!” would arise even if the coffee cake actually tasted like absolute shit.

To be sure, you would still “know” on the inside, in the realm of your direct first-person experience, that there was a horrible mismatch occurring between your beliefs about consciousness and your actual conscious experience. But you couldn’t know about it in any way that could be traced to any brain state of yours. So you couldn’t form beliefs about it, feel shocked or horrified about it, have any emotional reactions to it, etc. And if every part of your consciousness is traceable back to your brain state, then your conscious state must be in some sense “blind” to the difference between your conscious state and your beliefs about your conscious state.

This is completely absurd. On the epiphenomenalist view, any correlation between the beliefs you form about consciousness and the actual facts about your conscious state couldn’t possibly be explained by the actual facts about your consciousness. So they must be purely coincidental.

In other words, the following two statements cannot be simultaneously accepted:

  • Consciousness does not causally influence our behavior.
  • Our beliefs about our conscious states are more accurate than random guessing.

So where does that leave us?

It leaves us in a very uncomfortable place. First of all, we should deny physicalism. But the denial of physicalism leaves us with two choices: either Q causes P or it does not.

We should deny the first, because otherwise we are accepting the causal incompleteness of physics.

And we should deny the second, because it leads us to conclude that essentially all of our beliefs about our conscious experiences are almost certainly wrong, undermining all of our reasoning that led us here in the first place.

So here’s a summary of this entire post so far. It appears that the following four statements cannot all be simultaneously true. You must pick at least one to reject.

  1. There are facts about consciousness.
  2. Facts about consciousness are not logically entailed by the Schrödinger equation (substitute in whatever the fundamental laws of physics end up being).
  3. Physics is causally closed.
  4. Our beliefs about our conscious states are more accurate than random guessing.

Eliminativists deny (1).

Physicalists deny (2).

Dualists deny (3).

And epiphenomenalists must deny (4).

I find that the easiest to deny of these four is (2). This makes me a physicalist, but not because I think that physicalism is such a great philosophical position that everybody should hold. I’m a physicalist because it seems like the least horrible of all the horrible positions available to me.

Counters and counters to those counters

A response that I would have once given when confronted by these issues would be along the lines of: “Look, clearly consciousness is just a super confusing topic. Most likely, we’re just thinking wrong about the whole issue and shouldn’t be taking the notion of consciousness so seriously.”

Part of this is right. Namely, consciousness is a super confusing topic. But it’s important to clearly distinguish between which parts of consciousness are confusing and which parts are not. I’m super confused about how to make sense of the existence of consciousness, how to fit consciousness into my model of reality, and how to formalize my intuitions about the nature of consciousness. But I’m definitely not confused about the existence of consciousness itself. Clearly consciousness, in the sense of direct first-person experience, exists, and is a property that I have. The confusion arises when we try to interpret this phenomenon.

In addition, “X is super confusing” might be a true statement and a useful acknowledgment, but it doesn’t necessarily push us in one direction over another when considering alternative viewpoints on X. So “X is super confusing” isn’t evidence for “We should be eliminativists about X” over “We should be realists about X.” All it does is suggest that something about our model of reality needs fixing, without pointing to which particular component it is that needs fixing.

One more type of argument that I’ve heard (and maybe made in the past, to my shame) is a “scientific optimism” style of argument. It goes:

Look, science is always confronted with seemingly unsolvable mysteries. Brilliant scientists in each generation throw their hands up in bewilderment and proclaim the eternal unsolvability of the deep mystery of their time. But then a few generations later, scientists end up finding a solution, and putting to shame all those past scientists that doubted the power of their art.

Consciousness is just this generation’s “great mystery.” Those that proclaim that science can never explain the conscious in terms of the physical are wrong, just as Lord Kelvin was wrong when he affirmed that the behavior of living organisms cannot be explained in terms of purely physical forces, and required a mysterious extra element (the ‘vital principle’ as he termed it).

I think that as a general heuristic, “Science is super powerful and we should be cautious before proclaiming the existence of specific limits on the potential of scientific inquiry” is pretty damn good.

But at the same time, I think that there are genuinely good reasons, reasons that science skeptics in the past didn’t have, for affirming the uniqueness of consciousness in this regard.

Lord Kelvin was claiming that there were physical behaviors that could not be explained by appeal to purely physical forces. This is very different from the claim that there are phenomena, not logically reducible to the structural properties of matter, that cannot be explained by purely physical forces. This difference, it seems to me, is extremely significant, and gets straight to the crux of the central mystery of consciousness.