Getting evidence for a theory of consciousness

I’ve been reading about the integrated information theory of consciousness lately, and wondering about the following question. In general, what are the sources of evidence we have for a theory of consciousness?

One way to think about this is to imagine yourself teleported hundreds of years into the future and talking to a scientist in this future world. This scientist tells you that in his time, consciousness is fully understood. What sort of experiments would you expect to be able to run to verify for yourself that the future’s theory of consciousness really is sufficient?

One thing you could do is point to a bunch of different physical systems, ask the scientist what his theory of consciousness says about them, and compare its answers to your intuitions. So, for instance, does the theory say that you are conscious? What about humans in general? What about people in deep sleep? How about dogs? Chickens? Frogs? Insects? Bacteria? Are Siri-style computer programs conscious? What about a rock? And so on.

The obvious problem with this is that it assumes the validity of your intuitions about consciousness. Sure, it seems obvious that a rock is not conscious, that humans generally are, and that dogs are conscious but less so than humans. But how do we know that these are trustworthy intuitions?

I think the validity of these intuitions is necessarily grounded in our phenomenology and our observations of how it correlates with our physical substance. So, for instance, I notice that when I fall asleep, my consciousness fades in and out. On the other hand, when I wiggle my big toe, this has an effect on the character of my conscious experience, but doesn’t shut it off entirely. This tells me that something about what happens to my body when I fall asleep is relevant to the maintenance of my consciousness, while the angle of my big toe is not.

In general, we make many observations like these and piece together a general theory of how consciousness relates to the physical world, not just in terms of the existence of consciousness, but also in terms of what specific conscious experiences we expect for a given change to our physical system. It tells us, for instance, that receiving a knock on the head or drinking too much alcohol is sometimes sufficient to temporarily suspend consciousness, while breaking a finger or cutting your hair is not.

Now, since we are able to intervene on our physical body at will and observe the results, our model is a causal model. An implication of this is that it should be able to handle counterfactuals. So, for instance, it can give us an answer to the question “Would I still be conscious if I cut my hair off, changed my skin color, shrunk several inches in height, and got a smaller nose?” This answer is presumably yes, because our theory distinguishes between physical features that are relevant to the existence of consciousness and those that are not.

Extending this further, we can ask if we would still be conscious if we gradually morphed into another human being, with a different brain and body. Again, the answer would appear to be yes, as long as nothing essential to the existence of consciousness is severed along the way. But now we are in a position to be able to make inferences about the existence of consciousness in bodies outside our own! For if I think that I would be conscious if I slowly morphed into my boyfriend, then I should also believe that my boyfriend is conscious himself. I could deny this by denying that the same physical states give rise to the same conscious states, but while this is logically possible, it seems quite implausible.

This gives rational grounds for our belief in the existence of consciousness in other humans, and allows us justified access to all of the work in neuroscience analyzing the connection between the brain and consciousness. It also allows us to have a baseline level of trust in the self-reports of other people about their conscious experiences, given the observation that we are generally reliable reporters of our conscious experience.

Bringing this back to our scientist from the future, I can think of some much more convincing tests I would do than the ‘tests of intuition’ that we did at first. Namely, suppose that the scientist was able to take any description of an experience, translate that into a brain state, and then stimulate your brain in such a way as to produce that experience for you. So over and over you submit requests – “Give me a new color experience that I’ve never had before, but that feels vaguely pinkish and bluish, with a high-pitched whine in the background”, “Produce in me an emotional state of exaltation, along with the sensation of warm wind rushing through my hair and a feeling of motion”, etc. – and over and over the scientist is able to excellently match your request. (Also, wow, imagine how damn cool this would be if we could actually do this.)

You can also run the inverse test: you tell the scientist the details of an experience you are having while your brain is being scanned (in such a way that the scientist cannot see it). Then the scientist runs some calculations using their theory of consciousness and makes some predictions about what they’ll see on the brain scan. Now you check the brain scan to see if their predictions have come true.

To me, repeated success in experiments of this kind would be supremely convincing. If a scientist of the future was able to produce at will any experience I asked for (presuming my requests weren’t so far out as to be physically impossible), and was able to accurately translate facts about my consciousness into facts about my brain, and could demonstrate this over and over again, I would be convinced that this scientist really does have a working theory of consciousness.

And note that since this is all rooted in phenomenology, it’s entirely uncoupled from our intuitive convictions about consciousness! It could turn out that the exact framework the scientist is using to calculate the connections between my physical body and my consciousness ends up necessarily entailing that rocks are conscious and that dolphins are not. And if the framework’s predictive success had been demonstrated with sufficient robustness before, I would just have to accept this conclusion as unintuitive but true. (Of course, it would be really hard to imagine how any good theory of consciousness could end up coming to this conclusion, but that’s beside the point.)

So one powerful source of evidence we have for testing a theory of consciousness is the correlations between our physical substance and our phenomenology. Is that all, or are there other sources of evidence out there?

We can straightforwardly adopt some principles from the philosophy of science, such as the importance of simplicity and avoiding overfitting in formulating our theories. So for instance, one theory of consciousness might just be an exhaustive list of every physical state of the brain and what conscious experience this corresponds to. In other words, we could imagine a theory in which all of the basic phenomenological facts of consciousness are taken as individual independent axioms. While this theory will be fantastically accurate, it will be totally worthless to us, and we’d have no reason to trust its predictive validity.

So far, we really just have three criteria for evidence:

  1. Correlations between phenomenology and physics
  2. Simplicity
  3. Avoiding overfitting

As far as I’m concerned, this is all that I’m really comfortable with counting as valid evidence. But these are very much not the only sources of evidence that get referenced in the philosophical literature. There are a lot of arguments that get thrown around concerning the nature of consciousness that I find really hard to classify neatly, although often these arguments feel very intuitively appealing. For instance, one of my favorite arguments for functionalism is David Chalmers’ ‘Fading Qualia’ argument. It goes something like this:

Imagine that scientists of the future are able to produce silicon chips that are functionally identical to neurons and can replicate all of their relevant biological activity. Now suppose that you undergo an operation in which gradually, every single part of your nervous system is substituted out for silicon. If the biological substrate implementing the functional relationships is essential to consciousness, then by the end of this procedure you will no longer be conscious.

But now we ask: when did the consciousness fade out? Was it a sudden or a gradual process? Both seem deeply implausible. Firstly, we shouldn’t expect a sudden drop-out of consciousness from the removal of a single neuron or cluster of neurons, as this would be a highly unusual level of discreteness. This would also imply the ability to switch on and off the entirety of your consciousness with seemingly insignificant changes to the biological structure of your nervous system.

And secondly, if it is a gradual process, then this implies the existence of “pseudo-conscious” states in the middle of the procedure, where your experiences are markedly distinct from those of the original being but you are pretty much always wrong about your own experiences. Why? Well, the functional relationships have stayed the same! So your beliefs about your conscious states, the memories you form, the emotional reactions you have, will all be exactly as if there has been no change to your conscious states. This seems totally bizarre and, in Chalmers’ words, “we have little reason to believe that consciousness is such an ill-behaved phenomenon.”

Now, this is a fairly convincing argument to me. But I have a hard time understanding why it should be. The argument’s convincingness seems to rely on some very high-level abstract intuitions about the types of conscious experiences we imagine organisms could be having, and I can’t think of a great reason for trusting these intuitions. Maybe we could chalk it up to simplicity, and argue that the notion of consciousness entailed by substrate-dependence must be extremely unparsimonious. But even this connection is not totally clear to me.

A lot of the philosophical argumentation about consciousness feels this way to me; convincing and interesting, but hard to make sense of as genuine evidence.

One final style of argument that I’m deeply skeptical of is arguments from pure phenomenology. This is, for instance, how Giulio Tononi likes to argue for his integrated information theory of consciousness. He starts from five supposedly self-evident truths about the character of conscious experience, then attempts to infer facts about the structure of the physical systems that could produce such experiences.

I’m not a big fan of Tononi’s observations about the character of consciousness. They seem really vaguely worded and hard enough to make sense of that I have no idea if they’re true, let alone self-evident. But it is his second move that I’m deeply skeptical of. The history of philosophers trying to move from “self-evident intuitive truths” to “objective facts about reality” is pretty bad. While we might be plenty good at detailing our conscious experiences, trying to make the inferential leap to the nature of the connection between physics and consciousness is not something you can do just by looking at phenomenology.

The Scourge of Our Time

Human life must be respected and protected absolutely from the moment of conception. From the first moment of his existence, a human being must be recognized as having the rights of a person – among which is the inviolable right of every innocent being to life.

Since it must be treated from conception as a person, the embryo must be defended in its integrity, cared for, and healed, as far as possible, like any other human being.

Catechism of the Catholic Church, #2270, 2274

In this paper, Toby Ord advances a strong reductio ad absurdum of the standard pro-life position that life begins at conception. I’ve heard versions of this argument before, but hadn’t seen it laid out so clearly.

Here’s the argument:

  1. The majority (~62%) of embryos die within a few weeks of conception (mostly from failure to implant in the lining of the uterus wall). A mother of three children could be expected to also have had five spontaneous abortions.
  2. The Catholic Church promotes the premise that an embryo at conception has the same moral worth as a developed human. On this view, more than 60% of the world population dies in their first month of life, making this a more deadly condition than anything else in human history. Saving even 5% of embryos would save more lives than a cure for cancer.

  3. Given the 200 million lives per year at stake, those that think life begins at conception should be directing massive amounts of resources towards ending spontaneous abortion and see it as the Scourge of our time.
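As a rough sanity check on the “mother of three” figure in step 1, here is a small sketch. It assumes the ~62% loss rate applies equally to every embryo and that every embryo surviving those first weeks goes on to be born; both simplifications are mine, not Ord’s, and the function name is just for illustration.

```python
# Rough check of the "mother of three" claim, under the simplifying
# assumptions stated above (uniform loss rate, all survivors are born).
LOSS_RATE = 0.62                  # fraction of embryos lost, from step 1
SURVIVAL_RATE = 1 - LOSS_RATE     # fraction that survive

def expected_losses(live_births: int) -> float:
    """Expected number of lost embryos accompanying `live_births` survivors."""
    expected_conceptions = live_births / SURVIVAL_RATE
    return expected_conceptions - live_births

losses_for_three = expected_losses(3)
print(round(losses_for_three, 1))
```

Under these assumptions the result comes out just under five, which is consistent with the figure quoted in the argument.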

Here are two graphs of the US survival curve: first, as we ordinarily see it, and second, as the pro-lifer is obligated to see it:

[Figure: US survival curve as ordinarily drawn] [Figure: US survival curve as the pro-lifer is obligated to see it]

This is of course a really hard bullet for the pro-life camp to bite. If you’re like me, you see spontaneous abortions as morally neutral. Most of the time they happen before a pregnancy has been detected, leaving the mother unaware that anything even happened. It’s hard then to make a distinction between the enormous number of spontaneous abortions naturally occurring and the comparatively minuscule number of intentional abortions.

I have previously had mixed feelings about abortion (after all, if our moral decision making ultimately comes down to trying to maximize some complicated expected value, it will likely be blind to whether something is a real living being or just a “potential” living being), but this argument pretty much clinches the deal for me.

The problem with philosophy

(Epistemic status: I have a high credence that I’m going to disagree with large parts of this in the future, but it all seems right to me at present. I know that’s non-Bayesian, but it’s still true.)

Philosophy is great. Some of the clearest thinkers and most rational people I know come out of philosophy, and many of my biggest worldview-changing moments have come directly from philosophers. So why is it that so many scientists seem to feel contempt towards philosophers and condescension towards their intellectual domain? I can actually occasionally relate to the irritation, and I think I understand where some of it comes from.

Every so often, a domain of thought within philosophy breaks off from the rest of philosophy and enters the sciences. Usually when this occurs, the subfield (which had previously been stagnant and unsuccessful in its attempts to make progress) is swiftly revolutionized and most of the previous problems in the field are promptly solved.

Unfortunately, what also often happens is that the philosophers who were previously working in the field are unaware of or ignore the change, and end up wasting a lot of time and looking pretty silly. Sometimes they even explicitly challenge the scientists at the forefront of this revolution, like Henri Bergson did with Einstein after Einstein came out with his pesky new theory of time that swept away much of the past work of philosophers in one fell swoop.

Next you get a generation of philosophy students that are taught a bunch of obsolete theories, and they are later blindsided when they encounter scientists that inform them that the problems they’re working on were solved decades ago. And by this point the scientists have left the philosophers so far in the dust that the typical philosophy student is incapable of understanding the answers to their questions without learning a whole new area of math or something. Thus usually the philosophers just keep on their merry way, asking each other increasingly abstruse questions and working harder and harder to justify their own intellectual efforts. Meanwhile scientists move further and further beyond them, occasionally dropping in to laugh at their colleagues that are stuck back in the Middle Ages.

Part of why this happens is structural. Philosophy is the womb inside which develops the seeds of great revolutions of knowledge. It is where ideas germinate and turn from vague intuitions and hotbeds of conceptual confusion into precisely answerable questions. And once these questions are answerable, the scientists and mathematicians sweep in and make short work of them, finishing the job that philosophy started.

I think that one area in which this has happened is causality.

Statisticians now know how to model causal relationships, how to distinguish them from mere regularities, how to deal with common causes and causal pre-emption, how to assess counterfactuals and assign precise probabilities to these statements, and how to compare different causal models and determine which is most likely to be true.

(By the way, guess where I came to be aware of all of this? It wasn’t in the metaphysics class in which we spent over a month discussing the philosophy of causation. No, it was a statistician friend of mine who showed me a book by Judea Pearl and encouraged me to get up to date with modern methods of causal modeling.)

Causality as a subject has firmly and fully left the domain of philosophy. We now have a fully fleshed out framework of causal reasoning that is capable of answering all of the ancient philosophical questions and more. This is not to say that there is no more work to be done on understanding causality… just that this work is not going to be done by philosophers. It is going to be done by statisticians, computer scientists, and physicists.

Another area besides causality where I think this has happened is epistemology. Modern advances in epistemology are not coming out of the philosophy departments. They’re coming out of machine learning institutes and artificial intelligence researchers, who are working on turning the question of “how do we optimally come to justified beliefs in a posteriori matters?” into precise code-able algorithms.

I’m thinking about doing a series of posts called “X for philosophers”, in which I take an area of inquiry that has historically been the domain of philosophy, and explain how modern scientific methods have solved or are solving the central questions in this area.

For instance, here’s a brief guide to how to translate all the standard types of causal statements philosophers have debated for centuries into simple algebra problems:

Causal model

An ordered triple of exogenous variables, endogenous variables, and structural equations for each endogenous variable

Causal diagram

A directed acyclic graph representing a causal model, whose nodes represent the endogenous variables and whose edges represent the structural equations

Causal relationship

A directed edge in a causal diagram

Causal intervention

A mutilated causal diagram in which the edges between the intervened node and all its parent nodes are removed

Probability of A if B

P(A | B)

Probability of A if we intervene on B

P(A | do B) = P(A_B)

Probability that A would have happened, had B happened

P(A_B | -B)

Probability that B is a necessary cause of A

P(-A_{-B} | A, B)

Probability that B is a sufficient cause of A

P(A_B | -A, -B)

Right there is the guide to understanding the nature of causal relationships, and assessing the precise probabilities of causal conditional statements, counterfactual statements, and statements of necessary and sufficient causation.
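To make a few of these definitions concrete, here is a minimal sketch that computes a conditional, an interventional, and a counterfactual probability by brute-force enumeration. The model is a toy of my own invention, not one from Pearl: a common cause C drives both A and B, B has no effect on A, and the probabilities, variable names, and helper functions are all hypothetical. The subscripted A_B is read here as “A in the mutilated model where B has been set by intervention.”

```python
from itertools import product

# Hypothetical toy model: C is a common cause of A and B,
# and B has no causal effect on A whatsoever.
P_C = {1: 0.5, 0: 0.5}   # exogenous common cause
P_N = {1: 0.1, 0: 0.9}   # exogenous noise feeding into A

def solve(c, n, do_b=None):
    """Evaluate the structural equations; do_b overrides B's own equation."""
    b = c if do_b is None else do_b   # intervening cuts the C -> B edge
    a = 1 if (c or n) else 0          # A depends on C and N, never on B
    return a, b

def prob(event, given=lambda a, b: True, do_b=None):
    """P(event in the do_b model | `given` in the unmutilated model)."""
    num = den = 0.0
    for c, n in product((0, 1), repeat=2):
        weight = P_C[c] * P_N[n]
        if given(*solve(c, n)):              # evidence comes from the factual world
            den += weight
            if event(*solve(c, n, do_b)):    # query is asked of the intervened world
                num += weight
    return num / den

a_happens = lambda a, b: a == 1
p_obs = prob(a_happens, given=lambda a, b: b == 1)          # P(A | B)
p_do = prob(a_happens, do_b=1)                              # P(A | do B)
p_cf = prob(a_happens, given=lambda a, b: b == 0, do_b=1)   # P(A_B | -B)
print(p_obs, p_do, p_cf)
```

In this toy model the three numbers come out at roughly 1.0, 0.55, and 0.1: seeing B makes A certain, forcing B tells you nothing about A, and learning that B did not happen lowers the chance that A would have happened had B been forced. Enumerating exogenous variables like this only works for tiny discrete models, but it follows the same condition-then-intervene recipe.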

To most philosophy students and professors, what I’ve written is probably chicken-scratch. But understanding it is crucial if they are not to become obsolete in their causal thinking.

There’s an unhealthy tendency amongst some philosophers to, when presented with such chicken-scratch, dismiss it as not being philosophical enough and then go back to reading David Lewis’s arguments for the existence of possible worlds. It is this that, I think, is a large part of the scientist’s tendency to dismiss philosophers as outdated and intellectually behind the times. And it’s hard not to agree with them when you’ve seen both the crystal-clear beauty of formal causal modeling, and also the debates over things like how to evaluate the actual “distance” between possible worlds.

Artificial intelligence researcher extraordinaire Stuart Russell has said that he knew immediately upon reading Pearl’s book on causal modeling that it was going to change the world. Philosophy professors should either teach graph theory and Bayesian networks, or they should not make a pretense of teaching causality at all.

Argument screens off identity

In his post Argument Screens Off Authority, Eliezer Yudkowsky argues that while a person’s credentials can give you evidence as to the quality of arguments you expect to hear from them, these credentials become irrelevant to the truth of their conclusion once you actually hear and understand their arguments. A true conclusion tends to have good arguments for it, and good arguments tend to become known by well-credentialed persons. But since the causal dependency from “truth” to “credentials” is indirect, you can remove the dependency by simply conditioning on the intermediate (“argument quality”).

I think this is a special case of a more general principle: Argument screens off identity. Consider the following hypothetical exchange.

Person 1: I don’t support affirmative action. Here are some arguments why.

(…)

Person 1: So those are the main reasons for my position.

Person 2: Hmm, while I don’t see anything wrong with your arguments, I think you’re forgetting that you’re not a member of <insert protected identity group>. This invalidates any arguments you make.

Person 1: Um, I don’t think that’s how that works…

Person 2: No, seriously. If you’re not black, Hispanic, Native American, or any other minority group that is affected by affirmative action, then you have no right to be talking about it. You don’t have any credibility on the issue unless you’ve lived it.

Yes, this is a bit of a straw man, but it’s not actually so far from real arguments that I’ve heard before. I don’t know of an exact name for the logical fallacy Person 2 is making here (it’s not exactly an ad hominem), but I think that it is covered nicely by the slogan that is the title of this post.

A white person who has spent lots of time carefully researching and thinking through issues of racial inequality may well have more to say on these issues than a black person who has not done so. A man who has studied the nature of sexism in our society and the social norms surrounding sexual harassment may know more than a woman who has not done so.

More to the point, when somebody presents an argument, the response “Yeah, well you aren’t a member of this specific identity group that your argument refers to, so your argument is not credible” is virtually always a distraction and hinders progress.

More formally, my claim is that in most contexts:

Pr(conclusion | argument, identity) = Pr(conclusion | argument)

While your identity might inform the types of arguments you are able to make, predisposed to make, or willing to make, the arguments themselves are the only pieces of evidence we have in evaluating your conclusion.
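Here is a small sketch of what the equation says, using a made-up joint distribution. It assumes the chain structure borrowed from Yudkowsky’s post (truth, then argument quality, then who ends up making the argument), with identity connected to the conclusion only through the argument. The probabilities and names are hypothetical, and the code only illustrates the consequence of that assumption; it does not show that the assumption holds in any real debate.

```python
from itertools import product

# Hypothetical numbers for a chain: truth -> argument quality -> speaker's group.
P_TRUE = 0.5
P_GOOD_GIVEN_TRUE = {True: 0.8, False: 0.3}     # P(argument is good | conclusion true?)
P_INGROUP_GIVEN_GOOD = {True: 0.7, False: 0.4}  # P(speaker in group | argument good?)

def joint(t, q, i):
    """Joint probability of (conclusion true, argument good, speaker in group)."""
    pt = P_TRUE if t else 1 - P_TRUE
    pq = P_GOOD_GIVEN_TRUE[t] if q else 1 - P_GOOD_GIVEN_TRUE[t]
    pi = P_INGROUP_GIVEN_GOOD[q] if i else 1 - P_INGROUP_GIVEN_GOOD[q]
    return pt * pq * pi

def p_true(**evidence):
    """P(conclusion true | evidence); evidence may fix q and/or i."""
    num = den = 0.0
    for t, q, i in product((True, False), repeat=3):
        world = {"q": q, "i": i}
        if all(world[name] == value for name, value in evidence.items()):
            den += joint(t, q, i)
            if t:
                num += joint(t, q, i)
    return num / den

print(p_true(), p_true(i=True))                  # identity alone shifts the estimate
print(p_true(q=True), p_true(q=True, i=True))    # given the argument, it no longer does
```

With these numbers, knowing only the speaker’s group moves the probability away from the prior, but once the argument’s quality is known, adding the speaker’s group changes nothing. That is all “screening off” means.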

Facts about guns

I’ve recently come across some pretty surprising statistics regarding guns and violence, so I’ve decided to compile some of them here. I might update this if I run across more interesting things in the future.

  • Guns probably save many more lives than they end. Source (CDC and the National Research Council) and source (1995 criminology paper).
    • There are an estimated 500,000 to 3,000,000 defensive gun uses per year, and only about 300,000 violent gun crimes per year.
    • Defensive uses of guns in the US save around 162,000 lives per year (based on self-report), while there are only about 11,000 non-suicide gun deaths per year. Estimates of lives saved don’t include any military service, police work, or work as a security guard.
    • Defensive gun use reliably reduces injury rates among gun-using crime victims.

  • The 1994 imposition of five-day waiting periods for firearms didn’t reduce the overall suicide rate. Source (paper in AMA journal).

  • Homicides have been on the decline for years, and guns aren’t nearly as dangerous as we think. Source (Freakonomics podcast).
    • There have been an average of 2 mass shootings and 16.5 fatalities a year from mass shootings (excluding gang shootings and armed robberies).
    • Any particular handgun in the US will kill somebody about once every 10,000 years.
    • A given swimming pool is 100 times more likely to lead to the death of a child than a particular gun is to lead to the death of a child.
    • Gun buyback programs are horribly ineffective – typically saving an estimated 0.0001 lives.

  • The “more likely to have your gun used against you” meme is super misleading; it refers to the increased chance of suicide in the home for men with guns, not intruders wielding your gun against you. Source for one of the original findings.

Side note: Upon reflection, I’m super suspicious of the 162,000 lives/year saved number. Obviously measuring the counterfactual “would you have died if not for X?” is hard, but the number seems impossibly large when you think about the current murder rate… it corresponds to almost an extra 50 per 100,000 where the current homicide rate is 4.9 per 100,000. The cited study looks at self-reported potential fatality, which seems quite plausibly skewed upwards (if people tend to exaggerate the lethality of their encounters).
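The arithmetic behind that suspicion is easy to reproduce. The sketch below assumes a US population of roughly 325 million, which is my own round figure and not something taken from the cited sources; the other two numbers are the ones quoted above.

```python
# Convert the claimed lives saved into a per-100,000 rate and compare it
# with the homicide rate quoted in the side note.
US_POPULATION = 325_000_000      # assumption: rough US population, not from the sources
LIVES_SAVED_PER_YEAR = 162_000   # self-report-based estimate quoted above
HOMICIDE_RATE = 4.9              # per 100,000, as quoted above

saved_per_100k = LIVES_SAVED_PER_YEAR / US_POPULATION * 100_000
ratio = saved_per_100k / HOMICIDE_RATE
print(round(saved_per_100k, 1), round(ratio, 1))
```

Under that population assumption the claimed figure works out to just under 50 per 100,000, about ten times the quoted homicide rate, which is why it looks implausible.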

“You don’t believe in the God you want to, and I won’t believe in the God I want to”

From my favorite book of all time:

“I’m probably just as good an atheist as you are,” she speculated boastfully. “But even I feel that we all have a great deal to be thankful for and that we shouldn’t be ashamed to show it.”

“Name one thing I’ve got to be thankful for,” Yossarian challenged her without interest.

“Well…” Lieutenant Scheisskopf’s wife mused and paused a moment to ponder dubiously. “Me.”

“Oh, come on,” he scoffed.

She arched her eyebrows in surprise. “Aren’t you thankful for me?” she asked. She frowned peevishly, her pride wounded. “I don’t have to shack up with you, you know,” she told him with cold dignity. “My husband has a whole squadron full of aviation cadets who would be only too happy to shack up with their commanding officer’s wife just for the added fillip it would give them.” Yossarian decided to change the subject. “Now you’re changing the subject,” he pointed out diplomatically. “I’ll bet I can name two things to be miserable about for every one you can name to be thankful for.”

“Be thankful you’ve got me,” she insisted.

“I am, honey. But I’m also goddam good and miserable that I can’t have Dori Duz again, too. Or the hundreds of other girls and women I’ll see and want in my short lifetime and won’t be able to go to bed with even once.”

“Be thankful you’re healthy.”

“Be bitter you’re not going to stay that way.”

“Be glad you’re even alive.”

“Be furious you’re going to die.”

“Things could be much worse,” she cried.

“They could be one hell of a lot better,” he answered heatedly.

“You’re naming only one thing,” she protested. “You said you could name two.”

“And don’t tell me God works in mysterious ways,” Yossarian continued, hurtling on over her objection. “There’s nothing so mysterious about it. He’s not working at all. He’s playing. Or else He’s forgotten all about us. That’s the kind of God you people talk about–a country bumpkin, a clumsy, bungling, brainless, conceited, uncouth hayseed. Good God, how much reverence can you have for a Supreme Being who finds it necessary to include such phenomena as phlegm and tooth decay in His divine system of creation? What in the world was running through that warped, evil, scatological mind of His when He robbed old people of the power to control their bowel movements? Why in the world did He ever create pain?”

“Pain?” Lieutenant Scheisskopf’s wife pounced upon the word victoriously. “Pain is a useful symptom. Pain is a warning to us of bodily dangers.”

“And who created the dangers?” Yossarian demanded. He laughed caustically. “Oh, He was really being charitable to us when He gave us pain! Why couldn’t He have used a doorbell instead to notify us, or one of His celestial choirs? Or a system of blue-and-red neon tubes right in the middle of each person’s forehead. Any jukebox manufacturer worth his salt could have done that. Why couldn’t He?”

“People would certainly look silly walking around with red neon tubes in the middle of their foreheads.”

“They certainly look beautiful now writhing in agony or stupefied with morphine, don’t they? What a colossal, immortal blunderer! When you consider the opportunity and power He had to really do a job, and then look at the stupid, ugly little mess He made of it instead, His sheer incompetence is almost staggering. It’s obvious He never met a payroll. Why, no self-respecting businessman would hire a bungler like Him as even a shipping clerk!” Lieutenant Scheisskopf’s wife had turned ashen in disbelief and was ogling him with alarm. “You’d better not talk that way about Him, honey,” she warned him reprovingly in a low and hostile voice. “He might punish you.”

“Isn’t He punishing me enough?” Yossarian snorted resentfully. “You know, we mustn’t let Him get away with it. Oh, no, we certainly mustn’t let Him get away scot free for all the sorrow He’s caused us. Someday I’m going to make Him pay. I know when. On the Judgment Day. Yes, That’s the day I’ll be close enough to reach out and grab that little yokel by His neck and–”

“Stop it! Stop it!” Lieutenant Scheisskopf’s wife screamed suddenly, and began beating him ineffectually about the head with both fists. “Stop it!” Yossarian ducked behind his arm for protection while she slammed away at him in feminine fury for a few seconds, and then he caught her determinedly by the wrists and forced her gently back down on the bed. “What the hell are you getting so upset about?” he asked her bewilderedly in a tone of contrite amusement. “I thought you didn’t believe in God.”

“I don’t,” she sobbed, bursting violently into tears. “But the God I don’t believe in is a good God, a just God, a merciful God. He’s not the mean and stupid God you make Him out to be.” Yossarian laughed and turned her arms loose. “Let’s have a little more religious freedom between us,” he proposed obligingly. “You don’t believe in the God you want to, and I won’t believe in the God I want to. Is that a deal?”

Joseph Heller, Catch-22

Galileo and the Schelling point improbability principle

An alternative history interaction between Galileo and his famous statistician friend

***

In the year 1609, when Galileo Galilei finished the construction of his majestic artificial eye, the first place he turned his gaze was the glowing crescent moon. He reveled in the crevices and mountains he saw, knowing that he was the first man alive to see such a sight, and his mind expanded as he saw the folly of the science of his day and wondered what else we might be wrong about.

For days he was glued to his telescope, gazing at the Heavens. He saw the planets become colorful expressive spheres and reveal tiny orbiting companions, and observed the distant supernova which Kepler had seen blinking into existence only five years prior. He discovered that Venus had phases like the Moon, that some apparently single stars revealed themselves to be binaries when magnified, and that there were dense star clusters scattered through the sky. All this he recorded in frantic enthusiastic writing, putting out sentences filled with novel discoveries nearly every time he turned his telescope in a new direction. The universe had opened itself up to him, revealing all its secrets to be uncovered by his ravenous intellect.

It took him two weeks to pull himself away from his study room for long enough to notify his friend Bertolfo Eamadin of his breakthrough. Eamadin was a renowned scholar, having pioneered at age 15 his mathematical theory of uncertainty and created the science of probability. Galileo often sought him out to discuss puzzles of chance and randomness, and this time was no exception. He had noticed a remarkable confluence of three stars that were in perfect alignment, and needed the counsel of his friend to sort out his thoughts.

Eamadin arrived at the home of Galileo half-dressed and disheveled, obviously having leapt from his bed and rushed over immediately upon receiving Galileo’s correspondence. He practically shoved Galileo out from his viewing seat and took his place, eyes glued with fascination on the sky.

Galileo allowed his friend to observe unmolested for a half-hour, listening with growing impatience to the ‘oohs’ and ‘aahs’ being emitted as the telescope swung wildly from one part of the sky to another. Finally, he interrupted.

Galileo: “Look, friend, at the pattern I have called you here to discuss.”

Galileo swiveled the telescope carefully to the position he had marked out earlier.

Eamadin: “Yes, I see it, just as you said. The three stars form a seemingly perfect line, each of the two outer ones equidistant from the central star.”

Galileo: “Now tell me, Eamadin, what are the chances of observing such a coincidence? One in a million? A billion?”

Eamadin frowned and shook his head. “It’s certainly a beautiful pattern, Galileo, but I don’t see what good a statistician like myself can do for you. What is there to be explained? With so many stars in the sky, of course you would chance upon some patterns that look pretty.”

Galileo: “Perhaps it seems only an attractive configuration of stars spewed randomly across the sky. I thought the same myself. But the symmetry seemed too perfect. I decided to carefully measure the central angle, as well as the angular distance subtended by the paths from each outer star to the central one. Look.”

Galileo pulled out a sheet of paper that had been densely scribbled upon. “My calculations revealed the central angle to be precisely 180.000º, with an error of ± .003º. And similarly, I found the difference in the two angular distances to be .000º, with a margin of error of ± .002º.”

Eamadin: “Let me look at your notes.”

Galileo handed over the sheets to Eamadin. “I checked over my calculations a dozen times before writing you. I found the angular distances by approaching and retreating from this thin paper, which I placed between the three stars and me. I found the distance at which the thin paper just happened to cover both stars on one extreme simultaneously, and did the same for the two stars on the other extreme. The distance was precisely the same, leaving measurement error only for the thickness of the paper, my distance from it, and the resolution of my vision.”

Eamadin: “I see, I see. Yes, what you have found is a startlingly clear pattern. A similarity in distance and precision of angle this precise is quite unlikely to be the result of any natural phenomenon… ”

Galileo: “Exactly what I thought at first! But then I thought about the vast quantity of stars in the sky, and the vast number of ways of arranging them into groups of three, and wondered if perhaps in fact such coincidences might be expected. I tried to apply your method of uncertainty to the problem, and came to the conclusion that the chance of such a pattern having occurred through random chance is one in a thousand million! I must confess, however, that at several points in the calculation I found myself confronted with doubt about how to progress and wished for your counsel.”

Eamadin stared at Galileo’s notes, then pulled out a pad of his own and began scribbling intensely. Eventually, he spoke. “Yes, your calculations are correct. The chance of such a pattern having occurred to within the degree of measurement error you have specified by random forces is 10⁻⁹.”

Galileo: “Aha! Remarkable. So what does this mean? What strange forces have conspired to place the stars in such a pattern? And, most significantly, why?”

Eamadin: “Hold it there, Galileo. It is not reasonable to jump from the knowledge that the chance of an event is remarkably small to the conclusion that it demands a novel explanation.”

Galileo: “How so?”

Eamadin: “I’ll show you by means of a thought experiment. Suppose that we found that instead of the angle being 180.000º with an experimental error of .003º, it was 180.001º with the same error. The probability of this outcome would be the same as the outcome we found – one in a thousand million.”

Galileo: “That can’t be right. Surely it’s less likely to find a perfectly straight line than a merely nearly perfectly straight line.”

Eamadin: “While that is true, it is also true that the exact calculation you did for 180.000º ± .003º would apply for 180.001º ± .003º. And indeed, it is less likely to find the stars at this precise angle, than it is to find the stars merely near this angle. We must compare like with like, and when we do so we find that 180.000º is no more likely than any other angle!”

Galileo: “I see your reasoning, Eamadin, but you are missing something of importance. Surely there is something objectively more significant about finding an exactly straight line than about a nearly straight line, even if they have the same probability. Not all equiprobable events should be considered to be equally important. Think, for instance, of a sequence of twenty coin tosses. While it’s true that the outcome HHTHTTTTHTHHHTHHHTTH has the same probability as the outcome HHHHHHHHHHHHHHHHHHHH, the second is clearly more remarkable than the first.”

Eamadin: “But what is significance if disentangled from probability? I insist that the concept of significance only makes sense in the context of my theory of uncertainty. Significant results are those that either have a low probability or have a low conditional probability given a set of plausible hypotheses. It is this second class that we may utilize in analyzing your coin tossing example, Galileo. The two strings of tosses you mention are only significant to different degrees in that the second more naturally lends itself to a set of hypotheses in which the coin is heavily biased towards heads. In judging the second to be a more significant result than the first, you are really just saying that you use a natural hypothesis class in which probability judgments are only dependent on the ratios of heads and tails, not the particular sequence of heads and tails. Now, my question for you is: since 180.000º is just as likely as 180.001º, what set of hypotheses are you considering in which the first is much less likely than the second?”

Galileo: “I must confess, I have difficulty answering your question. For while there is a simple sense in which the number of heads and tails is a product of a coin’s bias, it is less clear what would be the analogous ‘bias’ in angles and distances between stars that should make straight lines and equal distances less likely than any others. I must say, Eamadin, that in calling you here, I find myself even more confused than when I began!”

Eamadin: “I apologize, my friend. But now let me attempt to disentangle this mess and provide a guiding light towards a solution to your problem.”

Galileo: “Please.”

Eamadin: “Perhaps we may find some objective sense in which a straight line or the equality of two quantities is a simpler mathematical pattern than a nearly straight line or two nearly equal quantities. But even if so, this will only be a help to us insofar as we have a presumption in favor of less simple patterns inhering in Nature.”

Galileo: “This is no help at all! For surely the principle of Ockham should push us towards favoring more simple patterns.”

Eamadin: “Precisely. So if we are not to look for an objective basis for the improbability of simple and elegant patterns, then we must look towards the subjective. Here we may find our answer. Suppose I were to scribble down on a sheet of paper a series of symbols and shapes, hidden from your view. Now imagine that I hand the images to you, and you go off to some unexplored land. You explore the region and draw up cartographic depictions of the land, having never seen my images. It would be quite a remarkable surprise were you to find upon looking at my images that they precisely matched your maps of the land.”

Galileo: “Indeed it would be. It would also quickly lend itself to a number of possible explanations. Firstly, it may be that you were previously aware of the layout of the land, and drew your pictures intentionally to capture the layout of the land – that is, that the layout directly caused the resemblance in your depictions. Secondly, it could be that there was a common cause between the resemblance and the layout; perhaps, for instance, the patterns that most naturally come to the mind are those that resemble common geographic features. And thirdly, included only for completeness, it could be that your images somehow caused the land to have the geographic features that it did.”

Eamadin: “Exactly! You catch on quickly. Now, this case of the curious coincidence of depiction and reality is exactly analogous to your problem of the straight line in the sky. The straight lines and equal distances are just like patterns on the slips of paper I handed to you. For whatever reason, we come pre-loaded with a set of sensitivities to certain visual patterns. And what’s remarkable about your observation of the three stars is that a feature of the natural world happens to precisely align with these patterns, where we would expect no such coincidence to occur!”

Galileo: “Yes, yes, I see. You are saying that the improbability doesn’t come from any objective unusual-ness of straight lines or equal distances. Instead, the improbability comes from the fact that the patterns in reality just happen to be the same as the patterns in my head!”

Eamadin: “Precisely. Now we can break down the suitable explanations, just as you did with my cartographic example. The first explanation is that the patterns in your mind were caused by the patterns in the sky. That is, for some reason the fact that these stars were aligned in this particular way caused you to be psychologically sensitive to straight lines and equal quantities.”

Galileo: “We may discard this explanation immediately, for such sensitivities are too universal and primitive to be the result of a configuration of stars that has only just now made itself apparent to me.”

Eamadin: “Agreed. Next we have a common cause explanation. For instance, perhaps our mind is naturally sensitive to visual patterns like straight lines because such patterns tend to commonly arise in Nature. This natural sensitivity is what feels to us on the inside as simplicity. In this case, you would expect it to be more likely for you to observe simple patterns than might be naively thought.”

Galileo: “We must deny this explanation as well, it seems to me. For the resemblance to a straight line goes much further than my visual resolution could even make out. The increased likelihood of observing a straight line could hardly be enough to outweigh our initial naïve calculation of the probability being 10⁻⁹. But thinking more about this line of reasoning, it strikes me that you have just provided an explanation of the apparent simplicity of the laws of Nature! We have developed to be especially sensitive to patterns that are common in Nature, we interpret such patterns as ‘simple’, and thus it is a tautology that we will observe Nature to be full of simple patterns.”

Eamadin: “Indeed, I have offered just such an explanation. But it is an unsatisfactory explanation, insofar as one is opposed to the notion of simplicity as a purely subjective feature. Most people, myself included, would strongly suggest that a straight line is inherently simpler than a curvy line.”

Galileo: “I feel the same temptation. Of course, justifying a measure of simplicity that does the job we want of it is easier said than done. Now, on to the third explanation: that my sensitivity to straight lines has caused the apparent resemblance to a straight line. There are two interpretations of this. The first is that the stars are not actually in a straight line, and I only think they are because of my predisposition towards identifying straight lines. The second is that the stars aligned in a straight line because of these predispositions. I’m sure you agree that both can be reasonably excluded.”

Eamadin: “Indeed. Although it may look like we’ve excluded all possible explanations, notice that we only considered one possible form of the common cause explanation. The other two categories of explanations seem more thoroughly ruled out; your dispositions couldn’t be caused by the star alignment given that you have only just found out about it and the star alignment couldn’t be caused by your dispositions given the physical distance.”

Galileo: “Agreed. Here is another common cause explanation: God, who crafted the patterns we see in Nature, also created humans to have similar mental features to Himself. These mental features include aesthetic preferences for simple patterns. Thus God causes both the salience of the line pattern to humans and the existence of the line pattern in Nature.”

Eamadin: “The problem with this is that it explains too much. Based solely on this argument, we would expect that when looking up at the sky, we should see it entirely populated by simple and aesthetic arrangements of stars. Instead it looks mostly random and scattershot, with a few striking exceptions like those which you have pointed out.”

Galileo: “Your point is well taken. All I can imagine now is that there must be some sort of ethereal force that links some stars together, gradually pushing them so that they end up in nearly straight lines.”

Eamadin: “Perhaps that will be the final answer in the end. Or perhaps we will discover that it is the whim of a capricious Creator with an unusual habit for placing unsolvable mysteries in our paths. I sometimes feel this way myself.”

Galileo: “I confess, I have felt the same at times. Well, Eamadin, although we have failed to find a satisfactory explanation for the moment, I feel much less confused about this matter. I must say, I find this method of reasoning by noticing similarities between features of our mind and features of the world quite intriguing. Have you a name for it?”

Eamadin: “In fact, I just thought of it on the spot! I suppose that it is quite generalizable… We come pre-loaded with a set of very salient and intuitive concepts, be they geometric, temporal, or logical. We should be surprised to find these concepts instantiated in the world, unless we know of some causal connection between the patterns in our mind and the patterns in reality. And by Eamadin’s rule of probability-updating, when we notice these similarities, we should increase our strength of belief in these possible causal connections. In the spirit of anachrony, let us refer to this as the Schelling point improbability principle!”

Galileo: “Sounds good to me! Thank you for your assistance, my friend. And now I must return to my exploration of the Cosmos.”

Why “number of parameters” isn’t good enough

A friend of mine recently pointed out a curious fact. Any set of two-dimensional data whatsoever can be perfectly fit by a simple two-parameter sinusoidal model.

y(x) = A sin(Bx)

Sound wrong? Check it out:

[Figure: small-sine-zoom.png]

Zoomed out: [Figure: small-sine.png]

N = 10 points: [Figure: sine-overfit.png]

As you see, as the number of data points goes up, all you need to do to accommodate this is increase the frequency in your sine function, and adjust the amplitude as necessary. Ultimately, you can fit any data set with a ridiculously quickly oscillating and large-amplitude sine function.
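Here is a rough sketch of the idea in Python, in case you want to play with it. The data points are made up (they are not the ones in the figures above), the amplitude is held fixed at 1 for simplicity, and the brute-force grid search over B is just my assumption about the laziest possible way to find a good frequency. The only thing it is meant to show is that letting B range over larger and larger values can only keep the worst-case error the same or shrink it.

```python
import math

def best_sine_fit(xs, ys, amplitude, b_max, step=0.01):
    """Grid-search B in (0, b_max] for y = amplitude * sin(B * x).

    Returns (best_B, worst_case_error), where the error is the largest
    absolute miss over all data points.
    """
    best_b, best_err = None, float("inf")
    n_steps = int(round(b_max / step))
    for i in range(1, n_steps + 1):
        b = i * step
        err = max(abs(amplitude * math.sin(b * x) - y) for x, y in zip(xs, ys))
        if err < best_err:
            best_b, best_err = b, err
    return best_b, best_err

# Hypothetical data, chosen only for illustration.
xs = [1.0, math.sqrt(2), math.sqrt(3)]
ys = [0.3, -0.5, 0.8]
amplitude = 1.0

for b_max in (10, 100, 1000):
    b, err = best_sine_fit(xs, ys, amplitude, b_max)
    print(f"B <= {b_max:>4}: best B = {b:.2f}, worst-case error = {err:.4f}")
```

Both parameters are doing honest work here, but B is doing far more than any slope or intercept ever could.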

Now, most model selection methods explicitly rely on the parameter count to estimate the potential of a model to overfit. For example, if k is the number of parameters in a model, N is the number of data points, and L is the log likelihood of the data given the model, we have (written so that a higher score is better; the conventional definitions are −2 times these):

AIC = L – k
BIC = L – (k/2)・log(N)

This little example represents a fantastic failure of parameter count to successfully do the job AIC and BIC ask of it. Evidently parameter count is too blunt an instrument to do the job we require of it, and we need something with more nuance.
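To make the bluntness concrete, here is a minimal sketch of the two scores as written above. The log likelihood values are invented for illustration, and the function names are my own. Notice that the penalty term is identical for a straight line and for the sine model, since both have k = 2.

```python
import math

def aic_score(log_likelihood, k):
    """AIC in the 'higher is better' form used in this post: L - k."""
    return log_likelihood - k

def bic_score(log_likelihood, k, n):
    """BIC in the 'higher is better' form used in this post: L - (k/2) log(N)."""
    return log_likelihood - 0.5 * k * math.log(n)

# Hypothetical log likelihoods for two 2-parameter models on N = 10 points.
n = 10
models = [
    {"name": "y = a + b x", "L": -12.0, "k": 2},
    {"name": "y = A sin(B x)", "L": -1.0, "k": 2},
]
for m in models:
    print(m["name"], aic_score(m["L"], m["k"]), bic_score(m["L"], m["k"], n))
```

With equal penalties, whichever model hugs the data more tightly wins, which is exactly the behavior we were hoping the penalty would prevent.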

One more example.

For any set of data, if you can perfectly fit a curve to each data point, and if your measurement error σ is an adjustable parameter, then you can take the measurement error to zero to have a fit with infinite accuracy. Now when we evaluate the likelihood, we find it running off to infinity! Thus our ‘fit to data’ term L goes to infinity, while the model complexity penalty stays a small finite number.
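A quick sketch of the blow-up, assuming Gaussian measurement noise (the post doesn't commit to a noise model, so that part is my assumption). With a perfect fit every residual is zero, and the log likelihood reduces to −N log(σ√(2π)), which grows without bound as σ shrinks.

```python
import math

def gaussian_log_likelihood(residuals, sigma):
    """Log likelihood of the residuals under independent Gaussian noise with width sigma."""
    n = len(residuals)
    normalization = -n * math.log(sigma * math.sqrt(2 * math.pi))
    misfit = -sum(r * r for r in residuals) / (2 * sigma ** 2)
    return normalization + misfit

# A curve that passes through every one of 10 points leaves zero residuals.
residuals = [0.0] * 10
for sigma in (1.0, 1e-2, 1e-4, 1e-8):
    print(f"sigma = {sigma:g}: L = {gaussian_log_likelihood(residuals, sigma):.1f}")
```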

Once again, we see the same lack of nuance dragging us into trouble. The number of parameters might do well at estimating overfitting potential for some types of well-behaved parameters, but it clearly doesn’t do the job universally. What we want is some measure that is sensitive to the potential for some parameters to capture “more” of the space of all possible distributions than others.

And lo and behold, we have such a measure! This is the purpose of information geometry, which uses the volume of a model in the space formed by the Fisher information metric as the penalty for overfitting potential. You can learn more about it in a post I wrote here.

Bayesian Occam’s Razor

A couple of days ago I posted a question that has been bugging me; namely, does Bayes overfit, and if not, why not?

Today I post the solution!

There are two parts: first, explaining where my initial argument against Bayes went wrong, and second, describing the Bayesian Occam’s Razor, the key to understanding how a Bayesian deals with overfitting.

Part 1: Why I was wrong

Here’s the argument I wrote initially:

  1. Overfitting arises from an excessive focus on accommodation. (If your only epistemic priority is accommodating the data you receive, then you will over-accommodate the data, by fitting the noise in the data instead of just the underlying trend.)
  2. We can deal with overfitting by optimizing for other epistemic virtues like simplicity, predictive accuracy, or some measure of distance to truth. (For example, minimum description length and maximum entropy optimize for simplicity, and cross validation optimizes for predictive accuracy).
  3. Bayesianism is an epistemological procedure that has two steps, setting of priors and updating those priors.
  4. Updating of priors is done via Bayes’ rule, which rewards theories according to how well they accommodate their data (creating the potential for overfitting).
  5. Bayesian priors can be set in ways that optimize for other epistemic virtues, like simplicity or humility.
  6. In the limit of infinite evidence, differences in priors between empirically distinguishable theories are washed away.
  7. Thus, in the limit, Bayesianism becomes a primarily accommodating procedure, as the strength of the evidential update swamps your initial differences in priors.

Here’s a more formal version of the argument:

  1. The relative probabilities of two models given data are calculated by Bayes’ rule:
    P(M | D) / P(M’ | D) = [P(M) / P(M’)]・[P(D | M) / P(D | M’)]
  2. If M overfits the data and M’ does not, then as the size of the data set |D| goes to infinity, the likelihood factor P(D | M) / P(D | M’) goes to infinity.
  3. Thus the posterior probability P(M | D) should go to 1 for the model that most drastically overfits the data.

This argument is wrong for a couple of reasons. For one, the argument assumes that as the size of the data set grows, the model stays the same. But this is very much not going to be true in general. The task of overfitting gets harder and harder as the number of data points goes up. It’s not that there’s no longer noise in the data; it’s that the signal becomes more and more powerful.

A perfect polynomial fit on 100 data points must have, at the worst, 100 parameters. On 1000 data points: 1000 parameters. Etc. In general, as you add more data points, a model that was initially overfitting (e.g. the 100-parameter distribution) will find that it is harder and harder to ignore the signal for the noise, and the next best overfitting model will have more parameters (e.g. the 1000-parameter distribution).
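Here is a small sketch of the "N points, N parameters" claim, using Lagrange interpolation (my choice of method; any exact polynomial fit would do). The five data points are made up. The polynomial through N points with distinct x values has degree at most N − 1, so it carries at most N coefficients, and adding a sixth point would in general force a new, higher-degree polynomial.

```python
from fractions import Fraction

def lagrange_interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through all points.

    Requires the x values in `points` to be distinct. Uses exact rational
    arithmetic so that "perfect fit" really means zero error.
    """
    x = Fraction(x)
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= (x - xj) / (Fraction(xi) - xj)
        total += term
    return total

# Hypothetical noisy data: 5 points, so a degree-4 polynomial (5 coefficients).
points = [(0, 1), (1, 3), (2, 2), (3, 5), (4, 4)]

# The fit is exact at every data point...
print(all(lagrange_interpolate(points, xi) == yi for xi, yi in points))

# ...but in between, the curve goes wherever the noise drags it.
for x in (Fraction(1, 2), Fraction(7, 2), 5):
    print(x, lagrange_interpolate(points, x))
```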

But now we have a very natural solution to the problem we started with! It is true that as the number of data points increases, the evidential support for the model that overfits the data will get larger and larger. It’s also true that the number of parameters required to overfit the data will grow as well. So if your prior in a model is a decreasing function of the number of parameters in the model, then you can in principle find a perfect balance and avoid overfitting. This perfect balance would be characterized by the following: each time you increase the number of parameters, the prior should decrease by an amount proportional to how much more you get rewarded by overfitting the data with the extra parameters.

How do we find this prior in practice? Beats me… I’d be curious to know, myself.

But what’s most interesting to me is that to solve overfitting as a Bayesian, you don’t even need the priors; the solution comes from the evidential update! It turns out that in fact, the likelihood function for updating credences in a model given data automatically accounts for model overparameterization. Which brings us to part 2!

Part 2: Bayesian Occam’s Razor

That last sentence bears repeating. In reality, although priors can play some role by manually penalizing models with high overfitting potential, the true source of the Bayesian Occam’s razor comes from the evidential update. What we’ll find by the end of this post is that models that overfit don’t actually get a stronger evidential update than models that don’t.

You might wonder how this is possible. Isn’t it practically the definition of overfitting that it is an enhancement of the strength of an evidential update through fitting to noise in the data?

Sort of. It is super important to keep in mind the distinction between a model and a distribution. A distribution is a single probability function over your possible observable data. A model is a set of distributions, characterized by a set of parameters. When we say that some models have the potential to overfit a set of data, what we are really saying is that some models contain distributions that overfit the data.

Why is this important? Because assessing the posterior probability of the model is not the same as assessing the posterior probability of the overfitting distribution within the model! Here’s Bayes’ rule, applied to the model and to the overfitting distribution:

(1) P(M | D) = P(M)・P(D | M) / P(D)

(2) P(theta hat | D) = P(theta hat)・P(D | theta hat) / P(D)

It’s clear how to evaluate equation (2). You have some prior probability assigned to theta hat, you know how to assess the likelihood function P(D | theta hat), and P(D) is an integral that is in principle do-able. In addition, equation (2) has the scary feature we’ve been talking about: the likelihood function P(D | theta hat) is really really large if our parameter theta hat overfits the data, potentially large enough to swamp the priors and screw up our Bayesian calculus.

But what we’re really interested in evaluating is not equation (2), but equation (1)! This is, after all, model selection; we are in the end trying to assess the quality of different models, not individual distributions.

So how do we evaluate (1)? The key term is P(D | M); your prior over the models and the data you receive are not too important for the moment. What is P(D | M)? This question does not actually have an obvious answer… M is a model, a set of distributions, not a single distribution. If we were looking at one distribution, it would be easy to assess the likelihood of the data given that distribution.

So what does P(D | M) really mean?

It represents the average probability of the data, given the model. It’s as if you were to draw a distribution at random from your model, and see how well it fits the data. More precisely, you draw a distribution from your model, according to your prior distribution over the distributions in the model.

That was a mouthful. But the basic idea is simple; a model is an infinite set of distributions, each corresponding to a particular set of values for the parameters that define the model. You have a prior distribution over these values for the parameters, and you use this prior distribution to “randomly” select a distribution in your model. You then assess the probability of the data given that distribution, and voila, you have your likelihood function.

In other words…

P(D | M) = ∫ P(D | θ) P(θ | M) dθ

Now, an overfitting model has a massive space of parameters, and in some small region of this space contains distributions that fit the data really well. On the other hand, a simple model that generalizes well has a small space of parameters, and a region of this space contains distributions that fit the data well (though not as well as the overfitter).

So on average, you are much less likely to select the optimal distribution in the overfitting model than in the generalizable model. Why? Because the space of parameters you must search through to find it is so much larger!

True, when you do select the optimal distribution in the overfitting model, you get rewarded with a better fit to the data than you could have gotten from the nice model. But the balance, in the end, pushes you towards simpler and more general models.

This is the Bayesian Occam’s Razor! Models that are underparameterized do poorly on average, because they just can’t fit the data at all. Models that are overparameterized do poorly on average, because the subset of the parameter space that fits the data well is so tiny compared to the volume of the parameter space as a whole. And the models that strike the perfect balance are those that have enough parameters to fit the data well, but not so many as to excessively bloat the parameter space.
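A toy version of this, as a sketch under my own assumptions: the data is a sequence of coin flips, the "simple" model is a fair coin with no free parameters, and the "flexible" model lets the bias θ be anything in [0, 1] with a uniform prior. The integral P(D | M) = ∫ P(D | θ) P(θ | M) dθ is approximated with a midpoint rule and checked against the closed form h!·t!/(h+t+1)!.

```python
import math

def likelihood(theta, heads, tails):
    """Probability of one particular flip sequence with the given counts."""
    return theta ** heads * (1 - theta) ** tails

def marginal_likelihood_uniform(heads, tails, grid=10_000):
    """Midpoint-rule estimate of the integral of P(D | theta) over [0, 1].

    The prior P(theta | M) is uniform, so it contributes a factor of 1.
    """
    width = 1.0 / grid
    return sum(likelihood((i + 0.5) * width, heads, tails) for i in range(grid)) * width

def exact_marginal_uniform(heads, tails):
    """Closed form of the same integral: h! t! / (h + t + 1)!."""
    return math.factorial(heads) * math.factorial(tails) / math.factorial(heads + tails + 1)

for heads, tails in ((5, 5), (9, 1)):
    fair = likelihood(0.5, heads, tails)                      # P(D | fair-coin model)
    flexible = marginal_likelihood_uniform(heads, tails)      # P(D | any-bias model)
    best = likelihood(heads / (heads + tails), heads, tails)  # best single distribution
    print(f"{heads}H {tails}T: fair = {fair:.6f}, flexible = {flexible:.6f}, best theta = {best:.6f}")
```

With 5 heads and 5 tails, the flexible model contains a distribution that fits exactly as well as the fair coin, yet its average comes out to about 0.00036 against about 0.00098 for the fair coin, so the simpler model gets the stronger update. With 9 heads and 1 tail the fair coin can't keep up, and the flexible model wins at about 0.0091.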

Here are some lecture slides from these great notes that have some helpful visualizations:

[Figures: two lecture slides, Screen Shot 2018-03-16 at 2.33.34 AM and Screen Shot 2018-03-16 at 2.33.50 AM]

Recapping in a few sentences: Simpler models are promoted, simply because they do well on average. And evidential support for a model comes down to the performance on average, not optimal performance. The likelihood in question is not P(data | best distribution in model), it’s P(data | average distribution in model). So overfitting models actually don’t get as much evidential support from data when assessing the model quality as a whole!

Ain’t that cool??

All about IQ

IQ is an increasingly controversial topic these days. I find that when it comes up, different people seem to be extremely confident in wildly different beliefs about the nature of IQ as a measure of intelligence.

Part of this has to do with education. This paper analyzed the top 29 most used introductory psychology textbooks and “found that 79.3% of textbooks contained inaccurate statements and 79.3% had logical fallacies in their sections about intelligence.” [1]

This is pretty insane, and sounds kinda like something you’d hear from an Alex Jones-style conspiracy theorist. But if you look at what the world’s experts on human intelligence say about public opinion on intelligence, they’re all in agreement: misinformation about IQ is everywhere. It’s gotten to the point where world-famous respected psychologists like Steven Pinker are being blasted as racists in articles in mainstream news outlets for citing basic points of consensus in the scientific literature.

The reasons for this are pretty clear… people are worried about nasty social and political implications of true facts about IQ. There are worthwhile points to be made about morally hazardous beliefs and the possibility that some truths should not be publicly known. At the same time, the quantification and study of human intelligence is absurdly important. The difference between us and the rest of the animal world, the types of possible futures that are open to us as a civilization, the ability to understand the structure of the universe and manipulate it to our ends; these are the types of things that the subject of human intelligence touches on. In short, intelligence is how we accomplish anything as a civilization, and the prospect of missing out on ways to reliably intervene and enhance it because we avoided or covered up research that revealed some inconvenient truths seems really bad to me.

Overall, I lean towards thinking that the misinformation is so great, and the truth so important, that it’s worthwhile to attempt to clear things up. So! The purpose of this post is just to sort through some of the mess and come up with a concise and referenced list of some of the most important things we know about IQ and intelligence.

IQ Basics

  • The most replicated finding in all of psychology is that good performance on virtually all cognitively demanding tasks is positively correlated. The name for whatever cognitive faculty causes this correlation is “general intelligence”, or g.
  • A definition of intelligence from 52 prominent intelligence researchers: [2]

Intelligence is a very general capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience. It is not merely book learning, a narrow academic skill, or test‑taking smarts. Rather, it reflects a broader and deeper capability for comprehending our surroundings—‘catching on’, ‘making sense’ of things, or ‘figuring out’ what to do. Intelligence, so defined, can be measured, and intelligence tests measure it well.

  • IQ tests are among the most reliable and valid of all psychological tests and assessments. [3]
    • They are designed to test general intelligence, and not character or personality.
    • Modern IQ tests have a standard error of measurement of about 3 points.
  • The distribution of IQs in a population nicely fits a bell curve.
    • IQ is defined in such a way as to make the population mean exactly 100, and the standard deviation 15.
  • People with high IQs tend to be healthier, wealthier, live longer, and have more successful careers. [4][5][6]
    • IQ is highly predictive of educational aptitude and job performance. [7][8][9][10][11]
    • Longitudinal studies have shown that IQ “is a causal influence on future achievement measures whereas achievement measures do not substantially influence future IQ scores.” [12]
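For a feel of what the mean of 100 and standard deviation of 15 mean in practice, here is a minimal sketch that converts scores to percentiles, assuming the bell curve fit is exact. The ±1.96 standard error band at the end additionally assumes normally distributed measurement error, which is my assumption and not something the sources above state.

```python
from statistics import NormalDist

# IQ as defined above: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

for score in (70, 85, 100, 115, 130):
    print(f"IQ {score}: higher than about {iq.cdf(score):.1%} of the population")

# Rough 95% band around an observed score, given a standard error of about 3 points.
observed, sem = 120, 3
low, high = observed - 1.96 * sem, observed + 1.96 * sem
print(f"Observed {observed}: roughly {low:.0f} to {high:.0f}")
```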

Average adult combined IQs associated with real-life accomplishments by various tests

Accomplishment | IQ
MDs, JDs, and PhDs | 125
College Graduates | 115
1–3 years of college | 104
Clerical and sales workers | 100–105
High school graduates, skilled workers (e.g., electricians, cabinetmakers) | 97
1–3 years of high school (completed 9–11 years of school) | 94
Semi-skilled workers (e.g. truck drivers, factory workers) | 90–95
Elementary school graduates (completed eighth grade) | 90
Elementary school dropouts (completed 0–7 years of school) | 80–85
Have 50/50 chance of reaching high school | 75

(table from Wiki)

 

Table 25.1 Relationship between intelligence and measures of success (Results from meta-analyses)
Measure of success | r | k | N | Source
Academic performance in primary education | 0.58 | 4 | 1791 | Poropat (2009)
Educational attainment | 0.56 | 59 | 84828 | Strenze (2007)
Job performance (supervisory rating) | 0.53 | 425 | 32124 | Hunter and Hunter (1984)
Occupational attainment | 0.43 | 45 | 72290 | Strenze (2007)
Job performance (work sample) | 0.38 | 36 | 16480 | Roth et al. (2005)
Skill acquisition in work training | 0.38 | 17 | 6713 | Colquitt et al. (2000)
Degree attainment speed in graduate school | 0.35 | 5 | 1700 | Kuncel et al. (2004)
Group leadership success (group productivity) | 0.33 | 14 | | Judge et al. (2004)
Promotions at work | 0.28 | 9 | 21290 | Schmitt et al. (1984)
Interview success (interviewer rating of applicant) | 0.27 | 40 | 11317 | Berry et al. (2007)
Reading performance among problem children | 0.26 | 8 | 944 | Nelson et al. (2003)
Becoming a leader in group | 0.25 | 65 | | Judge et al. (2004)
Academic performance in secondary education | 0.24 | 17 | 12606 | Poropat (2009)
Academic performance in tertiary education | 0.23 | 26 | 17588 | Poropat (2009)
Income | 0.20 | 31 | 58758 | Strenze (2007)
Having anorexia nervosa | 0.20 | 16 | 484 | Lopez et al. (2010)
Research productivity in graduate school | 0.19 | 4 | 314 | Kuncel et al. (2004)
Participation in group activities | 0.18 | 36 | | Mann (1959)
Group leadership success (group member rating) | 0.17 | 64 | | Judge et al. (2004)
Creativity | 0.17 | 447 | | Kim (2005)
Popularity among group members | 0.10 | 38 | | Mann (1959)
Happiness | 0.05 | 19 | 2546 | DeNeve & Cooper (1998)
Procrastination (needless delay of action) | 0.03 | 14 | 2151 | Steel (2007)
Changing jobs | 0.01 | 7 | 6062 | Griffeth et al. (2000)
Physical attractiveness | -0.04 | 31 | 3497 | Feingold (1992)
Recidivism (repeated criminal behavior) | -0.07 | 32 | 21369 | Gendreau et al. (1996)
Number of children | -0.11 | 3 | | Lynn (1996)
Traffic accident involvement | -0.12 | 10 | 1020 | Arthur et al. (1991)
Conformity to persuasion | -0.12 | 7 | | Rhodes and Wood (1992)
Communication anxiety | -0.13 | 8 | 2548 | Bourhis and Allen (1992)
Having schizophrenia | -0.26 | 18 | | Woodberry et al. (2008)

(from Gwern)
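One way to get a feel for these correlations is to square them. Under a simple linear reading (my assumption, and a rough one for meta-analytic estimates), r squared is the share of variance in the outcome that lines up with IQ. The sketch below does this for a handful of rows copied from Table 25.1; the helper name is made up for illustration.

```python
# A few (measure, r) pairs copied from Table 25.1 above.
ROWS = [
    ("Educational attainment", 0.56),
    ("Job performance (supervisory rating)", 0.53),
    ("Income", 0.20),
    ("Happiness", 0.05),
    ("Having schizophrenia", -0.26),
]


def variance_explained(r: float) -> float:
    """Share of outcome variance associated with IQ in a simple linear model."""
    return r * r


if __name__ == "__main__":
    for name, r in ROWS:
        print(f"{name}: r = {r:+.2f}, r^2 = {variance_explained(r):.1%}")
```

The sign of r drops out when squaring, so a correlation of -0.26 carries about as much information as one of +0.26. Even the strongest rows leave most of the variance to other factors.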

Nature of g

  • IQ scores are very stable across lifetime. [13]
    • This doesn’t mean that 30-year-old you is no smarter than 10-year-old you. It means that if you test the IQ of a bunch of children, and then later test them as adults, the rank order will remain roughly the same. A smarter-than-average 10 year old becomes a smarter-than-average 30 year old.
  • After your mid-20s, crystallized intelligence plateaus and fluid intelligence starts declining. Obligatory terrifying graph: (source)

  • High IQ is correlated with more gray matter in the brain, larger frontal lobes, and a thicker cortex. [14][15]
    • “There is a constant cascade of information being processed in the entire brain, but intelligence seems related to an efficient use of relatively few structures, where the more gray matter the better.” [16]
  • “Estimates of how much of the total variance in general intelligence can be attributed to genetic influences range from 30 to 80%.” [17]
    • Twin studies show the same results; there are substantial genetic influences on human intelligence. [18]
    • The genetic component of IQ is highly polygenic, and no specific genes have been robustly associated with human intelligence. The best we’ve found so far is a single gene that accounts for 0.1% of the variance in IQ. [17]
  • Many genes have been weakly associated with IQ. “40% of the variation in crystallized-type intelligence and 51% of the variation in fluid-type intelligence between individuals” is accounted for by genetic differences. [19]
    • Scientists can predict your IQ by looking only at your genes (not perfectly, but significantly better than random). [19]
      • This study analyzed 549,692 SNPs (single-nucleotide polymorphisms) and found an R = .11 mean correlation between their predictions and the actual fluid intelligence of over 3500 unrelated adults. [19]
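A quick worked number helps put that correlation next to the 40% and 51% figures. Reading R as an ordinary correlation coefficient (my assumption here), the share of variance in measured fluid intelligence captured by the gene-based predictions is its square:

```latex
R = 0.11 \quad\Longrightarrow\quad R^2 = 0.11^2 \approx 0.012
```

So the predictions picked up roughly 1% of the variance, far below the 51% attributed to genetic differences overall. As I understand it, the two numbers answer different questions: one is how much variation the measured genetic variants account for jointly, the other is how well a predictor built from them performs on new people.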

You might be wondering at this point what all the controversy regarding IQ is about. Why are so many people eager to dismiss IQ as a valid measure of intelligence? Well, we now dive straight into the heart of the controversy: intergroup variation in IQ.

It’s worth noting that, as Scott Alexander puts it: society is fixed, while biology is mutable. The fear that if biology factors into the underperformance of some groups, then such differences are intrinsically unalterable, makes little sense. We can do things to modify biology just as we can do things to modify society, and in fact the former is often much easier and more effective than the latter.

Anyway, prelude aside, we dive into the controversy.

Group differences in IQ

  • Yes, there are racial differences in IQ, both globally and within the United States. This has been studied to death, and is a universal consensus; you won’t find a single paper in a reputable psychology journal denying the numerical differences. [20]
  • Within the United States, there is a long-standing 1 SD (15 to 18 point) IQ difference between African Americans and White Americans. [2]
    • The tests in which these differences are most pronounced are those that most closely correspond to g, like Raven’s Progressive Matrices. [6] This test is also free of culturally-loaded knowledge, and only requires being able to solve visual pattern-recognition puzzles like these:

      • Controlling for the way the tests are formulated and administered does not affect this difference. [2]
      • IQ scores predict success equally accurately regardless of race or social class. This provides some evidence that the test is not culturally biased as a predictor. [2] [19]
  • Internationally, the lowest average IQs are found in sub-Saharan Africa and the highest average IQs are found in East Asia. The variations span a range of three standard deviations (45 IQ points). [21]
    • Malawi has an estimated average IQ of 60.
    • Singapore and Hong Kong have estimated IQs around 108.

(image from here)

  • A large survey published in one of the top psychology journals polled over 250 experts on IQ and international intelligence differences. [21]
    • On possible causes of cross-national differences in cognitive ability: “Genes were rated as the most important cause (17%), followed by educational quality (11.44%), health (10.88%), and educational quantity (10.20%).”
    • “Around 90% of experts believed that genes had at least some influence on cross-national differences in cognitive ability.”
  • Men and women have equal average IQs.
    • But: “most IQ tests are constructed so that there are no overall score differences between females and males.” [6]
    • They do this by removing items that show significant sex differences. So, for instance, men have a 1 SD (15 point) advantage on visual-spatial tasks over women. Thus mental rotation tests have been removed, in order to reduce the perception of bias. [22]
    • Males also do better on proportional and mechanical reasoning and mathematics, while females do better on verbal tests. [22]
  • Hormones are thought to play a role in sex differences in cognitive abilities. [23]
    • Females who are exposed to male hormones in utero have higher spatiotemporal reasoning scores than females who are not. [24]
    • The same thing is seen in men who have higher testosterone levels, and in older males given testosterone. [25]
  • There is also some evidence of men having a higher IQ variance than women, but this seems to be disputed. If true, it would indicate more men at the very bottom and the very top of the IQ scale (helping to explain sex disparities in high-IQ professions). [26]
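The variance point is easy to underestimate, so here is a minimal sketch of the arithmetic. It compares two normal distributions with the same mean but slightly different spreads and reports how over-represented the wider one is above various cutoffs. The two SD values are purely hypothetical numbers picked for illustration; they are not estimates from [26] or any other source cited here.

```python
import math

MEAN = 100.0
SD_NARROW = 15.0  # hypothetical spread of group A
SD_WIDE = 16.0    # hypothetical spread of group B (illustrative only)


def upper_tail(cutoff: float, sd: float, mean: float = MEAN) -> float:
    """Fraction of a normal(mean, sd) population scoring above `cutoff`."""
    z = (cutoff - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def tail_ratio(cutoff: float) -> float:
    """How many times more common the wider group is above `cutoff`."""
    return upper_tail(cutoff, SD_WIDE) / upper_tail(cutoff, SD_NARROW)


if __name__ == "__main__":
    for cutoff in (115, 130, 145):
        print(f"Above {cutoff}: wider group over-represented x{tail_ratio(cutoff):.2f}")
```

The ratio grows the further out you go, and by symmetry the same holds below the mean. That is why a small variance difference, if real, would matter most at the extremes.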

IQ Trends

  • In the developed world, average IQ has been increasing by 2 to 3 points per decade since 1930. This is called the Flynn effect.
    • The average IQ in the US in 1932, as measured by a 1997 IQ test, would be around 80. People with IQ 80 and below correspond to the bottom 9% of the 1997 population. [27]
  • Some studies have found that the Flynn effect seems to be waning in the developed world, and beginning in the developing world. [28]
  • A large survey of experts found that most attribute the Flynn effect to “better health and nutrition, more and better education and rising standards of living.” [29]
  • The Flynn effect is not limited to IQ tests, but is also found in memory tests, object naming, and other commonly used neuropsychological tests. [30]
  • Many studies indicate that the black-white IQ gap in the United States is closing. [23]
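The 1932 figure can be sanity-checked with a few lines of arithmetic. This sketch assumes the drift is linear over the whole period (a simplification of mine) and reuses the mean-100, SD-15 norming to convert a score into a population share.

```python
import math


def flynn_drift(points_per_decade: float, start_year: int, end_year: int) -> float:
    """Total IQ points gained between two years, assuming a constant linear rate."""
    return points_per_decade * (end_year - start_year) / 10.0


def share_at_or_below(iq: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Fraction of a normal(mean, sd) population scoring at or below `iq`."""
    z = (iq - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    for rate in (2.0, 3.0):
        gain = flynn_drift(rate, 1932, 1997)
        print(f"{rate:.0f} pts/decade: 1932 average on 1997 norms = {100 - gain:.1f}")
    print(f"Share of 1997 population at or below 80: {share_at_or_below(80):.1%}")
```

At the upper end of the quoted range, 65 years of drift comes to about 20 points, which matches the "around 80" figure; at the lower end the 1932 average would sit closer to 87.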

Can IQ be increased?

  • There are no known interventions that reliably cause long-term increases in IQ (although decreasing it is easy).
    • Essentially, you can do a handful of things to ensure that your child’s IQ is not low (give them access to education, provide them good nutrition, prevent iodine deficiency, etc.), but you can’t do much beyond these.
  • Educational intervention programs have fairly unanimously failed to show long-term increases in IQ in the developed world. [23]
    • The best prekindergarten programs have a substantial short-term effect on IQ, but this effect fades by late elementary school.

Random curiosities

  • Several large-scale longitudinal studies have found that children with higher IQ are more likely to have used illegal drugs by middle age. This association is stronger for women than men. [31][32]
    • This actually makes some sense, given that IQ is positively correlated with Openness (in the Big Five personality traits breakdown).
  • The average intelligence of Marine officers has been significantly declining since 1980. [33]
  • “The US military has minimum enlistment standards at about the IQ 85 level. There have been two experiments with lowering this to 80 but in both cases these men could not master soldiering well enough to justify their costs.” (from Wiki)
    • This is fairly terrifying when you consider that roughly 9% of the US population has an IQ of 80 or below; evidently, this enormous segment of humanity has an extremely limited capacity to do useful work for society.
  • Researchers used to think that IQ declined significantly starting around age 20. Subsequently this was found to be mostly a product of the Flynn effect: as average IQ increases, the normed IQ value inflates, so a constant IQ looks like it decreases. (from Wiki)
  • The popular idea that listening to classical music increases IQ has not been borne out by research. (Wiki)
  • There’s evidence that intelligence is part of the explanation for differential health outcomes across socioeconomic class.
    • “…Health workers can diagnose and treat incubating problems, such as high blood pressure or diabetes, but only when people seek preventive screening and follow treatment regimens. Many do not. In fact, perhaps a third of all prescription medications are taken in a manner that jeopardizes the patient’s health. Non-adherence to prescribed treatment regimens doubles the risk of death among heart patients (Gallagher, Viscoli, & Horwitz, 1993). For better or worse, people are substantially their own primary health care providers.”

      “For instance, one study (Williams et al., 1995) found that, overall, 26% of the outpatients at two urban hospitals were unable to determine from an appointment slip when their next appointment was scheduled, and 42% did not understand directions for taking medicine on an empty stomach. The percentages specifically among outpatients with inadequate literacy were worse: 40% and 65%, respectively. In comparison, the percentages were 5% and 24% among outpatients with adequate literacy. In another study (Williams, Baker, Parker, & Nurss, 1998), many insulin-dependent diabetics did not understand fundamental facts for maintaining daily control of their disease: Among those classified as having inadequate literacy, about half did not know the signs of very low or very high blood sugar, and 60% did not know the corrective actions they needed to take if their blood sugar was too low or too high. Among diabetics, intelligence at time of diagnosis correlates significantly (.36) with diabetes knowledge measured 1 year later (Taylor, Frier, et al., 2003).” [34]
  • IQ differences might be able to account for a significant portion of global income inequality.
    • “… in a conventional Ramsey model, between one-fourth and one-half of income differences across countries can be explained by a single factor: The steady-state effect of large, persistent differences in national average IQ on worker productivity. These differences in cognitive ability – which are well-supported in the psychology literature – are likely to be malleable through better nutrition, better education, and better health care in the world’s poorest countries. A simple calibration exercise in the spirit of Bils and Klenow (AER, 2000) and Castro (Rev. Ec. Dyn., 2005) is conducted. According to the model, a move from the bottom decile of the global IQ distribution to the top decile will cause steady-state living standards to rise by between 75 and 350 percent. I provide evidence that little of IQ-productivity relationship is likely to be due to reverse causality.” [35]
  • Exposure to lead hampers cognitive development and lowers IQ. You can calculate the economic boost the US received as a result of the dramatic reduction in children’s exposure to lead since the 1970s and the resulting increase in IQs.
    • “The base-case estimate of $213 billion in economic benefit for each cohort is based on conservative assumptions about both the effect of IQ on earnings and the effect of lead on IQ.” [36]
    • Yes. $213 billion.
  • In a 113-country analysis, IQ has been found to positively affect all main measures of institutional quality.
    • “The results show that average IQ positively affects all the measures of institutional quality considered in our study, namely government efficiency, regulatory quality, rule of law, political stability and voice and accountability. The positive effect of intelligence is robust to controlling for other determinants of institutional quality.” [37]
  • High IQ people cooperate more in repeated prisoner’s dilemma experiments; 5% to 8% more cooperation per 100-point increase in SAT score (7 pt IQ increase). [38][39]
    • The second paper also shows more patience and higher savings rates for higher IQ. [39]
  • Embryo selection is a possible way to enhance the IQ of future generations, and is already technologically feasible.
    • “Biomedical research into human stem cell-derived gametes may enable iterated embryo selection (IES) in vitro, compressing multiple generations of selection into a few years or less.” [40]
      Selection | Average IQ gain (points)
      1 in 2 | 4.2
      1 in 10 | 11.5
      1 in 100 | 18.8
      1 in 1000 | 24.3
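The shape of that table (big gains from the first few embryos, then sharply diminishing returns) is what you would expect from picking the best of n draws from a bell curve. Here is a minimal sketch of that calculation. It assumes the predicted IQ of sibling embryos is normally distributed with a standard deviation of 7.5 points; that value is an assumption of mine, chosen because it lands close to the table, and is not stated anywhere above.

```python
import math

EMBRYO_SD = 7.5  # assumed SD of predicted IQ among sibling embryos (not from the text)


def std_normal_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_max(n: int, lo: float = -8.0, hi: float = 8.0, steps: int = 16000) -> float:
    """E[max of n iid standard normals], by trapezoid-rule integration of
    x * n * pdf(x) * cdf(x)**(n - 1), the density of the maximum times x."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * x * n * std_normal_pdf(x) * std_normal_cdf(x) ** (n - 1)
    return total * h


def expected_gain(n: int) -> float:
    """Expected IQ gain from selecting the top embryo out of n."""
    return EMBRYO_SD * expected_max(n)


if __name__ == "__main__":
    for n in (2, 10, 100, 1000):
        print(f"1 in {n}: about {expected_gain(n):.1f} points")
```

Going from 2 to 10 embryos adds more than going from 100 to 1000, because the expected maximum of a normal sample grows only slowly with sample size.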

Sources

There is a ridiculous amount of research out there on IQ, and you can easily reach any conclusion you want by just finding some studies that agree with you. I’ve tried to stick to relying on large meta-analyses, papers of historical significance, large surveys of experts, and summaries by experts of consensus views.

[1] Warne, R. T., Astle, M. C., & Hill, J. C. (2018). What Do Undergraduates Learn About Human Intelligence? An Analysis of Introductory Psychology Textbooks. Archives of Scientific Psychology, 6(1), 32-50.

[2] Gottfredson, L. S. (1997). Mainstream science on intelligence: An editorial with 52 signatories, history and bibliography. Intelligence, 24(1), 13-23.

[3] Colom, R. (2004). Intelligence Assessment. Encyclopedia of Applied Psychology, 2(2), 307–314.

[4] Batty, D. G., Deary, I. J., Gottfredson, L. S. (2007). Premorbid (early life) IQ and Later Mortality Risk: Systematic Review. Annals of Epidemiology, 17(4), 278–288.

[5] Gottfredson, L. S. (1997). Why g Matters: The Complexity of Everyday Life. Intelligence, 24(1), 79-132.

[6] Neisser, U., et al. (1996). Intelligence: Knowns and Unknowns. American Psychological Association. American Psychologist, 51(2), 77-101.

[7] Deary, I. J., et al. (2007). Intelligence and educational achievement. Intelligence, 35(1), 13-21.

[8] Dumfart, B., & Neubauer, A. C. (2016). Conscientiousness is the most powerful noncognitive predictor of school achievement in adolescents. Journal of Individual Differences, 37(1), 8-15.

[9] Kuncel, N. R., & Hezlett, S. A. (2010). Fact and Fiction in Cognitive Ability Testing for Admissions and Hiring Decisions. Current Directions in Psychological Science, 19(6), 339-345.

[10] Schmidt, F. L., Hunter, J. E. (1998). The Validity and Utility of Selection Methods in Personnel Psychology: Practical and Theoretical Implications of 85 Years of Research Findings. Psychological Bulletin, 124(2), 262-274.

[11] Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative predictors of job performance. Psychological Bulletin, 96(1), 72-98.

[12] Watkins, M. W., Lei, P., Canivez, G. L. (2007). Psychometric intelligence and achievement: A cross-lagged panel analysis. Intelligence, 35(1), 59-68.

[13] Deary, I. J., et al. (2000). The stability of individual differences in mental ability from childhood to old age: follow-up of the 1932 Scottish Mental Survey. Intelligence, 28(1), 49–55.

[14] Frangou, S., Chitins, X., Williams, S. C. R. (2004). Mapping IQ and gray matter density in healthy young people. NeuroImage, 23(3), 800-805.

[15] Narr, K., et al. (2007). Relationships between IQ and Regional Cortical Gray Matter Thickness in Healthy Adults. Cerebral Cortex, 17(9), 2163–2171.

[16] University Of California – Irvine. “Human Intelligence Determined By Volume And Location Of Gray Matter Tissue In Brain.” ScienceDaily, 20 July 2004.

[17] Deary, I. J., Penke, L., Johnson, W. (2010). The neuroscience of human intelligence differences. Nature Reviews Neuroscience, 11(3), 201–211.

[18] Deary, I. J., Johnson, W., Houlihan, L. M. (2009). Genetic foundations of human intelligence. Human Genetics, 126(1), 215-232.

[19] Davies, G., et al. (2011). Genome-wide association studies establish that human intelligence is highly heritable and polygenic. Mol Psychiatry, 16(10), 996–1005.

[20] Rushton, J. P., Jensen, A. R. (2005). Thirty Years of Research on Race Differences in Cognitive Ability. Psychology, Public Policy, and Law, 11(2), 235-294.

[21] Rindermann, H., Becker, D., Coyle, T. R. (2016). Survey of Expert Opinion on Intelligence: Causes of International Differences in Cognitive Ability Tests. Frontiers in Psychology, 7.

[22] Ellis, L., et al. (2008). Sex Differences: Summarizing More than a Century of Scientific Research. Psychology Press.

[23] Nisbett, R. E., et al. (2012). Intelligence: New Findings and Theoretical Developments. American Psychologist, 67(2), 129.

[24] Resnick, S. M., et al. (1986). Early hormonal influences on cognitive functioning in congenital adrenal hyperplasia. Developmental Psychology, 22(2), 191-198.

[25] Janowsky, J. S., Oviatt, S. K., Orwoll, E. S. (1994). Testosterone influences spatial cognition in older men. Behavioral Neuroscience, 108(2), 325-332.

[26] Lynn, R., Kanazawa, S. (2011). A longitudinal study of sex differences in intelligence at ages 7, 11 and 16 years. Personality and Individual Differences, 51(3), 321–324.

[27] Neisser, U. (1997). Rising Scores on Intelligence Tests. American Scientist, 85(5), 440-447.

[28] Pietschnig, J., Voracek, M. (2015). One Century of Global IQ Gains: A Formal Meta-Analysis of the Flynn Effect (1909-2013). Perspectives on Psychological Science, 10(3), 282-306.

[29] Rindermann, H., Becker, D., Coyle, T. R. (2017). Survey of expert opinion on intelligence: The Flynn effect and the future of intelligence. Personality and Individual Differences, 106, 242-247.

[30] Trahan, L. H., et al. (2014). The Flynn Effect: A Meta-analysis. Psychological Bulletin, 140(5), 1332-1360.

[31] White, J., Gale, C. R., Batty, D. G. (2012). Intelligence quotient in childhood and the risk of illegal drug use in middle-age: the 1958 National Child Development Survey. Annals of Epidemiology, 22(9), 654-657.

[32] White, J., Batty, D. G. (2011). Intelligence across childhood in relation to illegal drug use in adulthood: 1970 British Cohort Study. Journal of Epidemiology & Community Health, 66(9).

[33] Cancian, M. F., Klein, M. W. (2015). Military Officer Quality in the All-Volunteer Force. National Bureau of Economic Research, WP 21372.

[34] Gottfredson, L. S. (2004). Intelligence: Is it the epidemiologists’ elusive fundamental cause of social class inequalities in health? Journal of Personality and Social Psychology, 86(1), 174-199.

[35] Jones, G. (2005). IQ in the Ramsey Model: A Naive Calibration. George Mason University.

[36] Grosse, S. D., et al. (2002). Economic Gains Resulting from the Reduction in Children’s Exposure to Lead in the United States. Environmental Health Perspectives, 110(6), 563-569.

[37] Kalonda-Kanyama, I. & Kodila-Tedika, O. (2012). Quality of Institutions: Does Intelligence Matter? Working Papers Series in Theoretical and Applied Economics 201206, University of Kansas, Department of Economics.

[38] Jones, G. (2008). Are Smarter Groups More Cooperative? Evidence from Prisoner’s Dilemma Experiments, 1959-2003. Journal of Economic Behavior & Organization, 68(3–4), 489-497.

[39] Jones, G. (2011). National IQ and National Productivity: The Hive Mind Across Asia. Asian Development Review, 28(1), 51-71.

[40] Shulman, C. & Bostrom, N. (2014). Embryo Selection for Cognitive Enhancement: Curiosity or Game-changer? Global Policy, 5(1), 85-92.