If you’ve watched some of the popular movies out there about the 2008 financial crisis, chances are that you’ve been misled about one or two things. (I’m looking at you, Big Short.) For example:
Entertaining? Sure! Accurate? No, very much not so. This analogy is very much off the mark, as you’ll see in a minute.
Here’s a quote from Inside Job, often described as the most rigorous and well-researched of the popular movies on the crisis:
In the early 2000s, there was a huge increase in the riskiest loans, called subprime. But when thousands of subprime loans were combined to create CDOs, many of them still received AAA ratings.
This is said in a tone of disbelief at the idea that combining subprime loans could produce extremely safe investments. And maybe this idea does sound pretty crazy if you haven’t studied much finance! But it’s actually correct. By combining subprime loans you can generate enormously safe investments, and so the central conceit of a CDO is entirely feasible.
The overall attitude taken by many of these movies is that the financial industry in the early 2000s devoted itself to the scamming of investors for short-term profits through creation of complicated financial instruments like CDOs. As these movies describe, the premise of a CDO is that by combining a bunch of risky loans and slicing-and-dicing them a bit, you can produce a mixture of new investment opportunities including many that are extremely safe. This is all described in a tone that is supposed to convey a sense that this premise is self-evidently absurd.
I want to convince you that the premise of CDOs is not self-evidently absurd, and that in fact it is totally possible to pool risky mortgages to generate extremely safe investments.
So, why think that it should be possible to pool risky investments and decrease overall risk? Well, first of all, that’s just what happens when you pool assets! When you pool equally risky assets, the risk of the pool is never higher than that of the individual assets, and it is strictly lower unless the assets are all perfectly correlated (which never happens in real life anyway).
As an example, imagine that we have two independent and identical bets, each giving a 90% chance of a $1000 return and a 10% chance of nothing.
Now put these two together, and split the pool into two new bets, each an average of the original two:
Take a look at what we’ve obtained. Now we have only a 1% chance of getting nothing (because both bets have to fail for this to happen). We do, however, have only an 81% chance of getting $1000, as opposed to the 90% we had earlier. But what about risk? Is our risk higher or lower than before?
The usual way of measuring risk is to look at standard deviations. So what are the standard deviations of the original bet and the new one?
Original bet:
Mean = 90% ($1000) + 10% ($0) = $900
Variance = 90% (100^2) + 10% (900^2) = 90,000
Standard deviation = $300

Pooled bet:
Mean = 81% ($1000) + 18% ($500) + 1% ($0) = $900
Variance = 81% (100^2) + 18% (400^2) + 1% (900^2) = 45,000
Standard deviation = $212.13
And look at what we see: risk has dropped, and fairly dramatically so, just by pooling independent bets! This concept is one of the core lessons of financial theory, and it goes by the name of diversification. The more bets we pool, the further the risk goes down, and in the limit of infinite independent bets, the risk goes to zero.
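A quick way to check the claim that risk keeps falling as more bets are pooled is to compute the distribution exactly. The sketch below is Python and assumes the same bets as above (each independently pays $1000 with probability 0.9), with the pool split into n identical shares; the name `pooled_bet_stats` is just for illustration.

```python
from math import comb, sqrt

def pooled_bet_stats(n, payout=1000.0, p=0.9):
    """Mean and standard deviation of one share of a pool of n independent bets.

    Each bet pays `payout` with probability p and nothing otherwise. The pool is
    split into n identical shares, so each share pays the average of the n bets.
    """
    outcomes = []
    for k in range(n + 1):  # k = number of bets that pay out
        prob = comb(n, k) * p**k * (1 - p) ** (n - k)
        outcomes.append((prob, payout * k / n))
    mean = sum(prob * value for prob, value in outcomes)
    var = sum(prob * (value - mean) ** 2 for prob, value in outcomes)
    return mean, sqrt(var)

for n in (1, 2, 10, 100):
    mean, sd = pooled_bet_stats(n)
    print(f"{n:>3} bets pooled: mean ${mean:.2f}, standard deviation ${sd:.2f}")
```

For n = 1 and n = 2 this reproduces the $300 and $212.13 figures, and in general the standard deviation comes out to $300/√n, which shrinks toward zero as n grows.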
So if you’re watching a movie and somebody says something about combining risky assets to produce safe assets as if that idea is self-evidently absurd, you know that they have no understanding of basic financial concepts, and especially not a complex financial instrument like a CDO.
In fact, let’s move on to CDOs now. The setup I described above of simply pooling bets does decrease risk, but it’s not how CDOs work. At least, not entirely. CDOs still take advantage of diversification, but they also incorporate an additional clever trick to distribute risk.
The idea is that you take a pool of risky assets and create from it a new set of assets with a spectrum of risk profiles. Previously, all of the assets we generated were identical to each other. With a CDO, we instead split the pool into non-identical assets in a way that allocates the risk unevenly: some of the new assets have most of the risk allocated to them and are very risky, while others carry very little risk.
Alright, so that’s the idea: from a pool of equally risky assets, you can get a new pool of assets that vary in riskiness. Some of them are very safe, and some of them are very, very risky. How do we do this? Let’s go back to our starting example, where we had two identical bets, each with a 90% chance of paying out $1000, and put them together in a pool. But this time, instead of creating two new identical bets, we are going to differentiate the two new bets by giving them a payout priority. One bet will be called the “senior tranche” and will be paid first. The other will be called the “junior tranche” and will be paid only if there is still money left over after the senior tranche has been paid. What do the payouts for these two new bets look like?
The senior tranche gets paid as long as at least one of the two bets pays out, which happens with 99% probability. Remember, we started with only a 90% probability of paying out. This is a dramatic change! In terms of standard deviation, this is $99.50, less than a third of what we started with!
And what about the junior tranche? It gets paid only if neither of the underlying bets fails, which happens with probability 81%. And its risk has gone up, with a standard deviation of $392.30. So essentially, all we’ve done is split up our risk. We originally had 90%/90%, and now we have 99%/81%. In the process, we’ve created a very, very safe bet and a very, very risky bet.
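The tranche numbers can be checked the same way. This is a minimal Python sketch of the two-bet example, assuming the bets are independent; `tranche_stats` is a hypothetical helper name, and each tranche is treated as paying either $1000 or nothing.

```python
from math import sqrt

P_PAY = 0.9       # each underlying bet pays out with probability 0.9
PAYOUT = 1000.0   # ...and pays $1000 when it does

def tranche_stats(prob_paid, payout=PAYOUT):
    """Mean and standard deviation of a tranche paying `payout` with probability prob_paid, else $0."""
    mean = prob_paid * payout
    sd = payout * sqrt(prob_paid * (1 - prob_paid))
    return mean, sd

# Senior tranche: paid first, so it only misses out if both bets fail.
p_senior = 1 - (1 - P_PAY) ** 2
# Junior tranche: paid only if money is left after the senior tranche, i.e. both bets pay.
p_junior = P_PAY ** 2

senior_mean, senior_sd = tranche_stats(p_senior)
junior_mean, junior_sd = tranche_stats(p_junior)
print(f"senior: P(paid) = {p_senior:.2f}, sd = ${senior_sd:.2f}")
print(f"junior: P(paid) = {p_junior:.2f}, sd = ${junior_sd:.2f}")
```

The expected payouts of the two tranches ($990 and $810) add up to $1800, exactly the expected payout of the two original bets: tranching moves risk from one claim to the other without changing the total expected payout.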
The important thing is that these two bets have to both be sold. You can’t just sell the senior tranche to people who want safe things (pension funds), you have to also sell the junior tranche. So how do you do that? Well, you just lower its price! A higher rate of return awaits the taker of the junior tranche in exchange for taking on more risk.
Now if you think about it, this new lower level of risk we’ve obtained, this 1% chance of defaulting that we got out of two bets that had a 10% chance of defaulting each, is a real thing! There really is a 1% chance that both bets default if they are independent, and so the senior tranche really can expect to get paid 99% of the time! There isn’t a lie or a con here: a pension fund that buys these senior tranches of CDOs is actually getting a safe bet! It’s just a clever way of divvying up risk between two assets.
I think the idea of a CDO is cool enough by itself, but the especially cool thing about CDOs is that they open up the market to new customers. Previously, if you wanted to get a mortgage, then you basically had to find a bank that was willing to accept your level of risk, whatever it happened to be. If you were too high risk, then nobody might want to give you a mortgage, and you’d just be out of luck. Even before CDOs, when there was mortgage pooling but no payment priority, you needed investors who were interested in the level of risk of your pool. The novelty of CDOs is that they let you alter the risk profile of your pool of mortgages at will.
Let’s say that you have 100 risky loans, and there’s only enough demand for you to sell 50 of them as they are. What you can do is pool them into a CDO whose tranches amount to 50 loans’ worth of safe assets and 50 loans’ worth of risky ones. Now you get to not only sell your risky tranches, but you can also sell your safe tranches to interested customers like pension funds! This is the primary benefit of the new financial technology of CDOs: it allows banks to generate tailor-made risk levels for the set of investors that are interested in buying, so that they can sell more mortgage-backed securities and get more people homes. And if everything is done exactly as I described it, then everything should work out fine.
But of course, things weren’t done exactly as I described them. Individual mortgages were given increasingly optimistic risk ratings, even as stated-income loans, no-down-payment loans, and no-income no-asset loans became common. CDOs were complex and their risk level was often difficult to assess, resulting in less transparency and more room for banks to overstate their safety. And crucially, the different tranches of any given CDO are highly dependent on each other, even after they’ve been sold to investors that have nothing to do with each other.
Let’s go back to our simple example of the two $1000 bets for illustration. Suppose that one of the two bets doesn’t pay out (which could correspond to one home-owner defaulting on their monthly payment). Now the senior tranche owner’s payment is entirely dependent on how the other bet performs. The senior tranche owner will get $1000 only if that remaining bet pays out, which happens with just 90% probability. So his chance of getting $1000 has dropped from 99% to 90%.
That’s a simple example of a more general point: in a CDO, once the riskier tranches fail, the originally safe tranches suddenly become a lot riskier (so that what was originally AA is now maybe BBB). This helps to explain why, once the housing bubble had popped, all levels of CDOs began losing value, not just the junior levels. Ordinary mortgage-backed securities don’t behave this way! A AA-rated mortgage is rated that way because of some actual underlying fact about the reliability of the homeowner, which doesn’t necessarily change when less reliable homeowners start defaulting. A AA-rated CDO tranche might be rated that way entirely because it has payment priority, even though all the mortgages in its pool are risky.
Another way to say this: an ordinary mortgage-backed security gets its lower risk purely from diversification (many mortgages pooled together make for a less risky bet than a single mortgage). But a CDO gets its lower risk from both diversification and (in the upper tranches) order priority (getting paid first). In both cases, as some of the mortgages in the pool fail, you lose some of the diversification benefit. But in the CDO case, you also lose the order priority benefit in the upper tranches (because, for example, if it takes 75 defaults in your pool for you to lose your money and 50 have already failed, then you are at a much higher risk of losing your money than if none of them have failed). Thus safe CDO tranches lose more value than safe MBSs as default rates rise.
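The loss of order priority can be quantified with a binomial tail probability. The Python sketch below assumes that loans default independently with a fixed probability (real mortgage defaults are correlated, which only makes things worse); the 100-loan pool and the 0.3 default probability are hypothetical numbers, chosen only to mirror the 75-default example in the text.

```python
from math import comb

def prob_at_least(k, n, q):
    """P(at least k of n independent loans default), each defaulting with probability q."""
    return sum(comb(n, j) * q**j * (1 - q) ** (n - j) for j in range(k, n + 1))

# Two-bet example: the senior tranche loses only if both bets fail (q = 0.1).
print(f"{prob_at_least(2, 2, 0.1):.2f}")  # before either bet fails: prints 0.01
print(f"{prob_at_least(1, 1, 0.1):.2f}")  # after one bet has failed: prints 0.10

# Hypothetical pool: 100 loans, and the senior tranche loses money once 75 default.
q = 0.3  # assumed per-loan default probability, for illustration only
before = prob_at_least(75, 100, q)
after = prob_at_least(25, 50, q)  # 50 already defaulted; 25 more of the other 50 needed
print(f"before any defaults: {before:.1e}; after 50 defaults: {after:.1e}")
```

The two-bet case reproduces the jump from a 1% to a 10% chance of loss, and in the larger hypothetical pool the senior tranche’s chance of loss rises by many orders of magnitude once the first 50 defaults have happened.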
Merry Christmas! In the spirit of the season, let’s talk about altruism.
Anybody familiar with the effective altruism movement knows about the concept of earning to give. The idea is that for some people, the ideal altruistic career path might involve them making lots of money in a non-charitable role and then donating a significant fraction of that to effective charities. Estimates from GiveWell for the current lowest cost to save a life put it around $2,300, which indicates that choosing a soulless corporate job that allows you to donate $150,000 a year could actually be better than choosing an altruistic career in which you are on average saving 65 lives per year, or more than a life per week!
What I’m curious about lately is the concept of earning to save up to give. At the time of writing, the rate of return on US treasury bills is 1.53%. US treasury bills are considered to be practically riskless – you get a return on your investment no matter what happens to the economy. Assuming that this rate of return and the $2,300 figure both stay constant, what this means is that by holding off on donating your $150,000 for five years, you expect to be able to donate about $11,830 more, which corresponds to saving 5 extra lives. And if you hold off for twenty years, you will be able to save 23 more lives!
If you choose to take on bigger risk, you can do even better than this. Instead of investing in treasury bills, you could put your money into a diversified portfolio of stocks and expect to get a rate of return of around 7% (the average annualized total return of the S&P 500 for the past 90 years, adjusted for inflation). Now you run the risk of the stock market being in a slump at the end of five years, but even if that happens you can just wait it out longer until the market rises again. In general, as your time horizon for when you’re willing to withdraw your investment expands, your risk drops. If you invest your $150,000 in stocks and withdraw after five years, you expect to save 26 more lives than you would by donating right away!
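The arithmetic in the last two paragraphs is easy to reproduce. A minimal Python sketch, assuming (as the text does) that the rate of return and the $2,300 cost per life both stay constant, and that returns compound annually; the function name is illustrative.

```python
COST_PER_LIFE = 2300.0  # GiveWell-based estimate quoted above, assumed constant

def extra_lives_from_waiting(amount, annual_return, years, cost_per_life=COST_PER_LIFE):
    """Additional lives saved by investing `amount` for `years` before donating,
    assuming a constant annual return and a constant cost per life saved."""
    gain = amount * ((1 + annual_return) ** years - 1)
    return gain / cost_per_life

donation = 150_000
print(f"Donating now: {donation / COST_PER_LIFE:.1f} lives")
for rate, years in [(0.0153, 5), (0.0153, 20), (0.07, 5)]:
    extra = extra_lives_from_waiting(donation, rate, years)
    print(f"{rate:.2%} for {years} years: {extra:.1f} extra lives")
```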
In general, you can choose your desired level of risk and get the best available rate of return by investing in an appropriate linear combination of treasury bills and stocks. And if you want a higher rate of return than 7%, you can take on more risk by leveraging the market (short selling the risk-free asset and using the money to invest more in stocks). In this model, the plan for maximizing charitable giving would be to continuously invest your income at your chosen level of risk, donating nothing until some later date at which time you make an enormous contribution to your choice of top effective charities. In an extreme case, you could hold off on donations your entire life and then finally make the donation in your will!
Alright, so let’s discuss some factors for and against this plan.
Factors in favor
Decreasing moral and factual uncertainty
Shrinking frontier of low hanging fruit
Personal moral regression
Compounding interest and vanishing low hanging fruits
First, and most obviously, waiting to give means more money, and more money means more lives. And since your investment grows exponentially, being sufficiently patient can mean large increases in amount donated and lives saved. There’s a wrinkle in this argument though. While your money grows with time, it might be that the effective cost of improving the world grows even quicker. This could result from the steady improvement of the world: problems are getting solved and low hanging altruistic fruits are being taken one at a time, leaving us with a set of problems that take more money to solve, making them less effective in terms of impact per dollar donated. If the benefit of waiting is a 5% annual return on your investment, but the cost is a 6% decrease in the effective number of lives saved per dollar donated, then it is best to donate as soon as possible, so as to ensure that your dollars have the greatest impact.
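To make the 5%-versus-6% example concrete, here is a small Python sketch. It assumes both rates compound yearly, so that after t years your money has grown by a factor of 1.05^t while lives saved per dollar have shrunk by a factor of 0.94^t; the parameter names are hypothetical.

```python
def relative_impact_of_waiting(years, annual_return=0.05, cost_growth=0.06):
    """Lives saved by waiting `years` to donate, relative to donating now (1.0 = equal).

    Money grows by a factor (1 + annual_return) each year, while lives saved per
    dollar shrink by a factor (1 - cost_growth) each year.
    """
    return ((1 + annual_return) * (1 - cost_growth)) ** years

for years in (1, 5, 10):
    print(years, round(relative_impact_of_waiting(years), 3))
```

Because 1.05 × 0.94 = 0.987 < 1, every year of waiting costs about 1.3% of the impact, and the loss compounds.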
Estimates of the trends in effectiveness of top charities are hard to come by, but crucially important for deciding when to give and when to wait.
Moral and factual uncertainty
Say that today you donate 50,000 dollars to Charity X, and tomorrow an exposé comes out revealing that this charity is actually significantly less effective than previously estimated. That money is gone now, and you can’t redirect it to a better charity in response to this new information. But if you had waited to donate, you would be able to respond and more accurately target your money to the most effective charities. The longer you wait to donate, the more time there is to gather the relevant data, analyze it, and come to accurate conclusions about the effectiveness of charities. This is a huge benefit of waiting to give.
That was an example of factual uncertainty. You might also be concerned about moral uncertainty, and think that in the future you will have better values than today. For instance, you may think that your future values, after having had more time to reflect and consider new arguments, will be more rational and coherent than your current values. This jumps straight into tricky meta-ethical territory; if you are a moral realist then it makes sense to talk about improving your values, but as a non-realist this is harder to make sense of. Regardless, there certainly is a very common intuition that individuals can make moral progress, and that we can “improve our values” by thinking deeply about ethics.
William MacAskill has talked about a related concept, applied on a broader civilizational level. Here’s a quote from his 80,000 Hours interview:
Different people have different sets of values. They might have very different views for what an optimal future looks like. What we really want ideally is a convergent goal between different sorts of values so that we can all say, “Look, this is the thing that we’re all getting behind that we’re trying to ensure that humanity…” Kind of like this is the purpose of civilization. The issue, if you think about purpose of civilization, is just so much disagreement. Maybe there’s something we can aim for that all sorts of different value systems will agree is good. Then, that means we can really get coordination in aiming for that.
I think there is an answer. I call it the long reflection, which is you get to a state where existential risks or extinction risks have been reduced to basically zero. It’s also a position of far greater technological power than we have now, such that we have basically vast intelligence compared to what we have now, amazing empirical understanding of the world, and secondly tens of thousands of years to not really do anything with respect to moving to the stars or really trying to actually build civilization in one particular way, but instead just to engage in this research project of what actually is a value. What actually is the meaning of life? And have, maybe it’s 10 billion people, debating and working on these issues for 10,000 years because the importance is just so great. Humanity, or post-humanity, may be around for billions of years. In which case spending a mere 10,000 is actually absolutely nothing.
Personal moral regression
On the other side of the issue of changing values over time, we have the problem of personal moral regression. If I’m planning to save up for an eventual donation decades down the line, I might need to seriously worry about the possibility that when the time comes to donate I have become a more selfish person or have lost interest in effective altruism. Plausibly, as you age you might become more attached to your money or expect a higher standard of living than when you were younger. This is another of these factors that is hard to estimate, and depends a lot on the individual.
Earning to give is already a concept that draws criticism in some quarters, and I think that waiting to give may look worse in some ways. I could easily see the mainstream revile the idea of a community of self-proclaimed altruists that mostly sit around and build up wealth with the promise to donate it some time into the future. Tied in with this is the concern that by removing the signaling value of your form of altruism, some of the motivation to actually be altruistic in the first place is lost.
Economists commonly talk about temporal discounting, the idea of weighting current value more heavily than future value. Give somebody a choice between an ice cream today and two ice creams in a month, and they will likely choose the ice cream today. This indicates that people apply some discount rate to future value: a future reward has to be larger than a present one before they become indifferent between the two. This discount rate is often thought of purely descriptively, as a way to model a particular aspect of human psychology, but it is also sometimes factored into recommendations for policy.
For instance, some economists talk about a social discount rate, which represents the idea of valuing future generations less than the current generation. This discount rate actually also factors importantly into the calculation of the appropriate value of a carbon tax. Most major calculations of the carbon tax explicitly use a non-zero social discount rate, meaning that they assume that future generations matter less than current generations. For instance, William Nordhaus’s hugely influential work on carbon pricing used a “3 percent social discount rate that slowly declines to 1 percent in 300 years.”
I don’t think that this makes much moral sense. If you apply a constant discount rate to the future, you can end up saying things like “I’d rather get a nickel today than save the entire planet a thousand years from now.” It seems to me that this form of discounting is simply prejudice against the people who are most remote from us: those who are in our future and as such do not exist yet. This paper by Tyler Cowen and Derek Parfit argues against a social discount rate. From the paper:
Remoteness in time roughly correlates with a whole range of morally important facts. So does remoteness in space. Those to whom we have the greatest obligation, our own family, often live with us in the same building. We often live close to those to whom we have other special obligations, such as our clients, pupils, or patients. Most of our fellow citizens live closer to us than most aliens. But no one suggests that, because there are such correlations, we should adopt a spatial discount rate. No one thinks that we would be morally justified if we cared less about the long-range effects of our acts, at some rate of n percent per yard. The temporal discount rate is, we believe, as little justified.
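The nickel example comes from how fast a constant rate compounds. Here is a short Python sketch of the weight a constant annual discount rate puts on value far in the future; the 1% and 3% rates echo the Nordhaus figures above, but applying them as constant rates over these horizons is a simplification for illustration (his rate declines over time).

```python
def discount_weight(rate, years):
    """Weight on value received `years` from now under a constant annual discount rate."""
    return (1 + rate) ** -years

for rate in (0.01, 0.03):
    for years in (100, 1000):
        w = discount_weight(rate, years)
        # A nickel today is worth as much as 0.05 / w dollars `years` from now.
        print(f"rate {rate:.0%}, {years} years: weight {w:.2e}, "
              f"a nickel today ~ ${0.05 / w:.3g} then")
```

At a constant 3%, value a thousand years out is weighted by roughly 10^-13, so a nickel today outweighs hundreds of billions of dollars of benefit then.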
There’s one final consideration I want to bring up, which is more abstract than the previous ones. Earlier I imagined somebody who decides to save up their entire life and make their donation in their will. But we can naturally ask: why not set up your will to wait another twenty years and then donate to the top-rated charities of whichever charity evaluator is judged most reliable at that time? And then why stop at just twenty years? Why not keep on investing, letting your money build up more and more, all to the end of making some huge future donation that provides an absolutely enormous benefit to some future generation? The concern is that this line of reasoning might never terminate.
While this is a fun thought experiment, I think there are a few fairly easy holes that can be poked in it. In reality, you will eventually run up against the decreasing marginal value of your money. At some point, the extra money you get by waiting another year doesn’t go far enough to make up for the human suffering you could have prevented. Additionally, the issue of vanishing low hanging fruit will become more and more pressing, pushing you towards giving sooner rather than later.
We can neatly sum up the previous considerations with the following formula.
$$V = Q - (1 - p)(1 - d)(1 + R)(Q + I - F)$$
If V > 0, then giving now is preferable to giving in a year. If V < 0, then you should wait for at least a year.
Q = Current lives saved per dollar
R = Rate of return on investment
I = Reflection factor: yearly increase in lives saved per dollar from better information and longer reflection
p = Regression factor: expected percentage less you are willing to give in a year
F = Low hanging fruit factor: yearly decrease in lives saved per dollar
d = Temporal discount factor: percentage less that lives are valued each year
The ideal setting for waiting to give is where F is near zero (the world’s issues are only very slowly being sorted out), I is large (time for reflection is sorely needed to bring empirical data and moral clarity), p is near zero (no moral regression), R is large (you get a high return on your investment), and d is near zero (you have little to no moral preference for helping current people over future people).
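The formula can be turned into a small calculator. This Python sketch reads the definitions literally: a dollar invested for a year becomes (1 + R) dollars (treating R as a net rate such as 0.05), you end up giving a fraction (1 − p) of it, each dollar given then saves Q + I − F lives, and those lives are weighted by (1 − d). The inputs in the example calls are hypothetical.

```python
def value_of_giving_now(Q, R, I, p, F, d):
    """V > 0 favors giving now; V < 0 favors waiting at least a year.

    Model (an assumption read off the definitions above): a dollar invested for a
    year becomes (1 + R) dollars, you give a fraction (1 - p) of it, each dollar
    then saves Q + I - F lives, and those lives are weighted by (1 - d).
    """
    lives_if_wait = (1 + R) * (1 - p) * (Q + I - F) * (1 - d)
    return Q - lives_if_wait

# Hypothetical inputs, in units where giving now saves Q = 1 life per unit of money.
print(value_of_giving_now(Q=1.0, R=0.05, I=0.0, p=0.0, F=0.06, d=0.0))  # > 0: give now
print(value_of_giving_now(Q=1.0, R=0.07, I=0.02, p=0.0, F=0.0, d=0.0))  # < 0: wait
```

The first call is the 5%-return, 6%-cost scenario from earlier, which favors giving now; the second is close to the ideal setting just described, which favors waiting.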
“The only free lunch in finance is diversification”
Suppose you have two assets that you can invest in, and $1000 total to split between them. By the end of year 1, Asset A doubles in price and Asset B halves in price. And by the end of year 2, A halves and B doubles (so that they each return to their starting point).
If you had initially invested all $1000 in Asset A, then after year 1 you would have $2000 total. But after year 2, that $2000 is cut in half and you end up back at $1000. If you had invested the $1000 in Asset B, then you would go down to $500 after year 1 and then back up to $1000 after year 2. Either way, you don’t get any profit.
Now, the question is: can you find some way to distribute the $1000 across Assets A and B such that by the end of year 2 you have made a profit? For fairness’ sake, you cannot change your distribution at the end of year 1 (as that would allow you to take advantage of advance knowledge of how the prices will change). Whatever weights you choose initially for Assets A and B, at the end of year 1 you must move your money around so that it’s still distributed with the exact same weights as before.
So what do you think? Does it seem impossible to make a profit without changing your distribution of money between years 1 and 2? Amazingly, the answer is that it’s not impossible; you can make a profit!
Consider a 50/50 mix of Assets A and B. So $500 initially goes into Asset A and $500 to Asset B. At the end of year 1, A has doubled in price (netting you $500) and B has halved (losing you $250). So at the end of year 1 you have gained 25%.
To keep the weights the same at the start of year 2 as they were at the start of year 1, you redistribute your new total of $1250 across A and B according to the same 50/50 mix ($625 in each). What happens now? Now Asset A halves (losing you $312.50) and Asset B doubles (gaining you $625). And by the end of year 2, you end up with $312.50 in A and $1250 in B, for a total of $1562.50! You’ve gained $562.50 by investing in two assets whose prices ultimately are the same as where they started!
This is the magic of diversification. By investing in multiple assets whose prices don’t move in lockstep, instead of just one, you can increase your rate of return and decrease your risk, sometimes dramatically.
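Here is a small Python simulation of the rebalancing rule from the puzzle, assuming the exact price path described (A doubles then halves, B halves then doubles) and rebalancing to fixed weights at the start of each year; the function name is illustrative.

```python
def rebalanced_value(initial, weight_a, price_moves):
    """Portfolio value when the money is rebalanced to `weight_a` in Asset A (and the
    rest in Asset B) at the start of every year. `price_moves` lists each year's
    (multiplier for A, multiplier for B)."""
    value = initial
    for mult_a, mult_b in price_moves:
        value *= weight_a * mult_a + (1 - weight_a) * mult_b
    return value

moves = [(2.0, 0.5), (0.5, 2.0)]  # year 1: A doubles, B halves; year 2: the reverse
for w in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"{w:.0%} in A: ${rebalanced_value(1000, w, moves):,.2f}")
```

For this price path the two-year growth factor is (0.5 + 1.5w)(2 − 1.5w) for a weight w in Asset A, which peaks at w = 0.5: the 50/50 mix turns $1000 into $1562.50, while putting everything in either asset leaves you at $1000.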
Another example: In front of you are two coins, A and B. You have $100 to distribute between these two coins. You get to choose how to distribute the $100, but at the end every dollar bill must be beside one coin or the other.
Coin A has a 60% chance of landing H. If it lands H, the amount of money placed beside it will be multiplied by 2.1 and returned to you. But if it lands T, then the money placed beside it will be lost.
Coin B has only a 40% chance of landing H. But the H outcome also has a higher reward! If the coin lands H, then the amount of money beside it will be multiplied by 2.5 and returned to you. And just like Coin A, if it lands T then the money beside it will be destroyed.
Coin A: .6 chance of getting 2.1x return
Coin B: .4 chance of getting 2.5x return
The coins are totally independent. How should you distribute your money in order to maximize your return, given a specific level of risk?
(Think about it for a moment before reading on.)
If you looked at the numbers for a few moments, you might have noticed that Coin A has a higher expected return than Coin B (126% vs 100%) and is also the safer of the two. So perhaps your initial guess was that putting everything into Coin A would minimize risk and maximize return. Well, that’s incorrect! Let’s do the math.
We’ll start by giving the relevant quantities some names.
X = amount of money that is put beside Coin A
Y = amount of money that is put beside Coin B
(All your money is put down, so Y = 100 – X)
We can easily compute the expected amount of money you end up with, as a function of X:

$$E(X) = (0.6)(2.1)X + (0.4)(2.5)(100 - X) = 1.26X + (100 - X) = 100 + 0.26X$$
Alright, so clearly your expected return is maximized by making X as large as possible (by putting all of your money by Coin A). This makes sense, since Coin A’s expected return is higher than Coin B’s. But we’re not just interested in return, we’re also interested in risk. Could we possibly find a better combination of risk and reward by mixing our investments? It might initially seem like the answer is no; after all Coin A is the safer of the two. How could we possibly decrease risk by mixing in a riskier asset to our investments?
The key insight is that even though Coin A is the safer of the two, the risk of Coin B is uncorrelated with the risk of Coin A. If you invest everything in Coin A, then you have a 40% chance of losing it all. But if you split your investments between Coin A and Coin B, then you only lose everything if both coins come up tails (which happens with probability .4*.6 = 24%, much lower than 40%!)
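Let’s go through the numbers. We’ll measure risk by the standard deviation of the possible outcomes. Since the coins are independent, the variances of the two payouts simply add:

$$\text{Risk}(X)^2 = (0.6)(0.4)(2.1)^2 X^2 + (0.4)(0.6)(2.5)^2 (100 - X)^2 = 1.0584X^2 + 1.5(100 - X)^2$$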
$\text{Risk}(X)^2$ is a parabola in X, which means that the graph of Risk(X) itself is one branch of a hyperbola. Here’s a plot of Risk(X) vs X (amount placed beside A):
Looking at this plot, you can see that risk is actually minimized at a roughly even mix of A and B, with slightly more in A (about $59 beside A and $41 beside B). You can also see this minimum risk on a plot of return vs risk:
Notice that somebody who puts most of their money on Coin B (these mixes are in the bottom half of the curve) is making a choice that is strictly dominated. That is, they could choose a different mix that has a higher rate of return for the same risk!
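The coin calculations can be reproduced in a few lines. This Python sketch uses the payoffs above and the independence of the coins; the helper names are just for illustration. Because the variance is a parabola in X, its minimum has a closed form: X = 100·Var_B / (Var_A + Var_B), where Var_A and Var_B are the per-dollar payout variances of the two coins.

```python
from math import sqrt

P_A, MULT_A = 0.6, 2.1   # Coin A: 60% chance of a 2.1x payout
P_B, MULT_B = 0.4, 2.5   # Coin B: 40% chance of a 2.5x payout
TOTAL = 100.0            # all of the money must be placed

# Variance of each coin's payout per dollar placed: multiplier^2 * p * (1 - p).
VAR_A = MULT_A**2 * P_A * (1 - P_A)
VAR_B = MULT_B**2 * P_B * (1 - P_B)

def coin_stats(x):
    """Expected final amount and its standard deviation with x dollars beside
    Coin A and TOTAL - x beside Coin B. The coins are independent, so the
    variances of the two payouts simply add."""
    y = TOTAL - x
    mean = P_A * MULT_A * x + P_B * MULT_B * y
    return mean, sqrt(VAR_A * x**2 + VAR_B * y**2)

# Setting the derivative of VAR_A*x^2 + VAR_B*(TOTAL - x)^2 to zero:
x_min_risk = TOTAL * VAR_B / (VAR_A + VAR_B)

for label, x in [("all on A", TOTAL), ("minimum risk", x_min_risk)]:
    mean, risk = coin_stats(x)
    print(f"{label}: X = {x:.2f}, mean {mean:.2f}, risk {risk:.2f}")

# Dominance: the variance is symmetric about x_min_risk, so a B-heavy mix has the
# same risk as its mirror image on the A-heavy side, but a lower expected value.
x_low = 30.0
x_high = 2 * x_min_risk - x_low
print(coin_stats(x_low), coin_stats(x_high))
```

With these numbers the minimum-risk split puts about $58.63 beside Coin A, with a standard deviation of roughly $78.77 versus about $102.88 for going all-in on A, at the cost of a lower expected outcome ($115.24 instead of $126). The last lines illustrate the dominance point: a mix with less than the minimum-risk amount beside A has the same risk as its mirror image on the other side of the minimum, but a lower expected return.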
Little did you know, but you’ve actually just gotten an introduction to modern portfolio theory! Instead of putting money beside coins, portfolio managers consider investing in assets with various risks and rates of return. The curve of reward vs risk is famous in finance as the Markowitz Bullet. The upper half of the curve is the set of portfolios that are not strictly dominated. This section of the curve is known as the efficient frontier, the basic idea being that no rational investor would put themselves on the lower half.
Let’s reframe the problem in terms that would be familiar to somebody in finance.
We have two assets, A and B. We’ll model our knowledge of the rate of return of each asset as a normal distribution with some known mean and standard deviation. The mean of the distribution represents the expected rate of return on a purchase of the asset, and the standard deviation represents the risk of purchasing the asset. Asset A has an expected rate of return of 1.2, which means that for every dollar you put in you expect (on average) to get back $1.20 a year from now, a gain of $0.20. Asset B’s expected rate of return is 1.3, so it has a higher average payout. But Asset B is riskier; the standard deviation for A is 0.5, while B’s standard deviation is 0.8. There’s also a risk-free asset that you can invest in, which we’ll call Asset F. This asset has an expected rate of return of 1.1.
Asset F: R = 1.1, σ = 0
Asset A: R = 1.2, σ = 0.5
Asset B: R = 1.3, σ = 0.8
Suppose that you have $1000 that you want to invest in some combination of these three assets in such a way as to maximize your expected rate of return and minimize your risk. Since rate of return and risk will in general be positively correlated, you have to decide the highest risk that you’re comfortable with. Let’s say that you decide that the highest risk you’ll accept is 0.6. Now, how much of each asset should you purchase?
First of all, let’s disregard the risk-free asset and just consider combinations of A and B. A portfolio of A and B is represented by the weighted sum of A and B. $w_A$ is the fraction of your investment that goes to A, and $w_B$ is the fraction that goes to B. Since we’re just considering a combination of A and B for now, $w_A + w_B = 1$. The mean and standard deviation of the new distribution for this portfolio will in general depend on the correlation between A and B, which we’ll call $\rho$. Perfectly correlated assets have $\rho = 1$, uncorrelated assets have $\rho = 0$, and perfectly anti-correlated assets have $\rho = -1$. Positive correlation between assets is bad for investors, because it erodes the benefit of diversification. If two assets are perfectly correlated, then you don’t get any lower risk by combining them. On the other hand, if they are perfectly anti-correlated, you can entirely cancel the risk from one with the risk from the other, and get fantastically low risks for great rates of return.
$$R_P = w_A R_A + w_B R_B$$
$$\sigma_P^2 = w_A^2 \sigma_A^2 + w_B^2 \sigma_B^2 + 2\rho\, w_A w_B \sigma_A \sigma_B$$
Let’s suppose that the correlation between A and B is ρ = 0.2. Since both RP and σP are functions of wA and wB, we can visualize the set of all possible portfolios as a curve on a plot of return-vs-risk:
The Markowitz Bullet again! Each point on the curve represents the rate of return and risk of a particular portfolio obtained by mixing A and B. Just like before, some portfolios dominate others and thus should never be used, regardless of your desired level of risk. In particular, for any portfolio that weights asset A (the less risky one) too highly, there are other portfolios that give a higher rate of return with the exact same risk.
In other words, you should pretty much never purchase only a low-risk low-return item. If your portfolio consists entirely of Asset A, then by mixing in a little bit of the higher-risk item, you can actually decrease your risk and raise your rate of return at the same time. Of course, this drop in risk is only because the two assets are not perfectly correlated. And it would be even more extreme if we had negatively correlated assets; indeed with perfect negative correlation (as we saw in the puzzle I started this post with), your risk can drop to zero!
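To check this claim with the numbers above, here is a minimal sketch, assuming the example’s values (RA = 1.2, σA = 0.5, RB = 1.3, σB = 0.8, ρ = 0.2). The function name `portfolio_stats` is just illustrative. It finds the minimum-variance mix of A and B by setting the derivative of the two-asset variance formula to zero.

```python
import math

# Example parameters from the text; rho = 0.2 is the assumed correlation.
R_A, SIGMA_A = 1.2, 0.5
R_B, SIGMA_B = 1.3, 0.8
RHO = 0.2
COV_AB = RHO * SIGMA_A * SIGMA_B


def portfolio_stats(w_b):
    """Return (expected return, risk) with weight w_b in B and 1 - w_b in A."""
    w_a = 1.0 - w_b
    ret = w_a * R_A + w_b * R_B
    var = w_a**2 * SIGMA_A**2 + w_b**2 * SIGMA_B**2 + 2 * w_a * w_b * COV_AB
    return ret, math.sqrt(var)


# Weight on B that minimizes variance: solve d(var)/d(w_b) = 0.
w_b_min = (SIGMA_A**2 - COV_AB) / (SIGMA_A**2 + SIGMA_B**2 - 2 * COV_AB)

for label, w_b in [("all A", 0.0), ("min-variance mix", w_b_min)]:
    ret, risk = portfolio_stats(w_b)
    print(f"{label}: w_B={w_b:.4f}, R={ret:.4f}, sigma={risk:.4f}")
```

With these inputs, putting roughly 23% of the money into B lowers the risk from 0.5 to about 0.459 while raising the expected return from 1.2 to about 1.223, so the all-A portfolio is dominated.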
Now, we can get our desired portfolio with a risk of 0.6 by just looking at the point on this curve that has σP = 0.6 and calculating which values of wA and wB give us this value. But notice that we haven’t yet used our riskless asset! Can we do better by adding in a little of Asset F to the mix? It turns out that yes, we can. In fact, every portfolio made of A and B alone is weakly dominated by some mix of a risky portfolio and the riskless asset!
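We can easily calculate what we get by combining a riskless asset with some other asset X (which can in general be a portfolio consisting of multiple assets). With weights wX + wF = 1, and using σF = 0 so that the variance and correlation terms for F vanish:
$$R_P = w_X R_X + w_F R_F, \qquad \sigma_P^2 = w_X^2 \sigma_X^2$$
So σP = wXσX (for wX ≥ 0), from which we get
$$R_P = \frac{\sigma_P}{\sigma_X} R_X + \left(1 - \frac{\sigma_P}{\sigma_X}\right) R_F = R_F + \sigma_P \frac{R_X - R_F}{\sigma_X}$$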
What we find is that RP(σP) is just a line whose slope depends on the rate of return and risk of asset X. So essentially, for any risky asset or portfolio you choose, you can easily visualize all the possible ways it can be combined with a risk-free asset by stretching a line from (0, RF) – the risk and return of the risk-free asset – to (σX, RX) – the risk and return of the risky asset. We can even stretch the line beyond this second point by borrowing at the risk-free rate in order to buy more of the risky asset, which corresponds to a negative weighting wF.
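A quick numerical check that mixing with the riskless asset really traces out this straight line, as a sketch assuming RF = 1.1 and using Asset A as X. The helper names `mix_with_riskless` and `line_return` are illustrative only, and the sketch restricts itself to wX ≥ 0 (the case where σP = wXσX).

```python
R_F = 1.1  # the riskless Asset F


def mix_with_riskless(w_x, r_x, sigma_x, r_f=R_F):
    """(return, risk) for weight w_x >= 0 in risky asset X and w_F = 1 - w_x in F.

    w_x > 1 means w_F < 0, i.e. borrowing at the risk-free rate to buy more X.
    """
    if w_x < 0:
        raise ValueError("this sketch only covers w_x >= 0")
    ret = w_x * r_x + (1.0 - w_x) * r_f
    risk = w_x * sigma_x  # F has zero variance, so only X contributes risk
    return ret, risk


def line_return(sigma_p, r_x, sigma_x, r_f=R_F):
    """Return on the straight line from (0, R_F) through (sigma_x, R_X)."""
    return r_f + sigma_p * (r_x - r_f) / sigma_x


# Asset A (R = 1.2, sigma = 0.5) as X; w_x = 1.5 borrows 0.5 at R_F.
for w_x in (0.0, 0.5, 1.0, 1.5):
    ret, risk = mix_with_riskless(w_x, 1.2, 0.5)
    print(w_x, round(risk, 4), round(ret, 4), round(line_return(risk, 1.2, 0.5), 4))
```

The last two columns agree for every weight, including the leveraged case (risk 0.75, return 1.25).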
So, we have a quadratic curve representing the possible portfolios obtained from two risky assets, and a line representing the possible portfolios obtained from a risky asset and a risk-free asset. What we can do now is consider the line that starts at (0, RF) and just barely brushes against the quadratic curve – the tangent line to the curve that passes through (0, RF). This point where the curves meet is known as the tangency portfolio.
Every point on this line is a possible combination of Assets A, B, and F. Why? Well, because the points on the line can be thought of as portfolios consisting of Asset F and the tangency portfolio. And here’s the crucial point: this line is above the curve everywhere except at that single point! What this means is that the combination of A, B, and F dominates combinations of just A and B. For virtually any level of desired risk, you do better by choosing a portfolio on the line than by choosing the portfolio on the quadratic curve! (The only exception to this is the point at which the two curves meet, and in that case you do equally well.)
And that is how you optimize your rate of return for a desired level of risk! First generate the hyperbola for portfolios made from your risky assets, then find the tangent to that curve that passes through the point representing the risk-free asset, and then use that line to calculate the optimal portfolio at your chosen level of risk!
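The whole procedure can be sketched for the running example, assuming the numbers above (RF = 1.1, ρ = 0.2, target risk 0.6). For the tangency step the sketch swaps in the standard closed-form result that tangency weights are proportional to Cov⁻¹(R − RF), normalized to sum to 1, rather than finding the tangent geometrically. All variable names are illustrative.

```python
import math

# Example inputs: risk-free F, risky A and B, assumed correlation 0.2.
R_F = 1.1
R_A, SIGMA_A = 1.2, 0.5
R_B, SIGMA_B = 1.3, 0.8
RHO = 0.2
TARGET_RISK = 0.6
COV_AB = RHO * SIGMA_A * SIGMA_B


def ab_stats(w_b):
    """(return, risk) of a portfolio with w_b in B and 1 - w_b in A."""
    w_a = 1.0 - w_b
    ret = w_a * R_A + w_b * R_B
    var = w_a**2 * SIGMA_A**2 + w_b**2 * SIGMA_B**2 + 2 * w_a * w_b * COV_AB
    return ret, math.sqrt(var)


# Step 1: tangency portfolio, weights proportional to Cov^{-1} (R - R_F).
z_a = SIGMA_B**2 * (R_A - R_F) - COV_AB * (R_B - R_F)
z_b = SIGMA_A**2 * (R_B - R_F) - COV_AB * (R_A - R_F)
w_b_tan = z_b / (z_a + z_b)
ret_tan, risk_tan = ab_stats(w_b_tan)

# Step 2: walk along the line from (0, R_F) through the tangency portfolio
# until the risk equals TARGET_RISK (w_tan > 1 means borrowing at R_F).
w_tan = TARGET_RISK / risk_tan
w_f = 1.0 - w_tan
ret_with_f = R_F + TARGET_RISK * (ret_tan - R_F) / risk_tan

# Comparison: best A+B-only portfolio at the same risk, i.e. the upper root
# of var(w_b) = TARGET_RISK**2, where var is quadratic in w_b.
qa = SIGMA_A**2 + SIGMA_B**2 - 2 * COV_AB
qb = -2 * (SIGMA_A**2 - COV_AB)
qc = SIGMA_A**2 - TARGET_RISK**2
w_b_only = (-qb + math.sqrt(qb**2 - 4 * qa * qc)) / (2 * qa)
ret_ab_only, _ = ab_stats(w_b_only)

print(f"tangency: w_B={w_b_tan:.4f}, R={ret_tan:.4f}, sigma={risk_tan:.4f}")
print(f"with F:   w_T={w_tan:.4f}, w_F={w_f:.4f}, R={ret_with_f:.4f}")
print(f"A+B only: w_B={w_b_only:.4f}, R={ret_ab_only:.4f}")
```

With these inputs the tangency portfolio is 8/15 A and 7/15 B, and reaching a risk of 0.6 means borrowing about 20% at the risk-free rate. That gives an expected return of roughly 1.276, versus about 1.269 for the A+B-only portfolio with the same risk.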
If in the future we develop the ability to make accurate simulations of other humans, a lot of things will change for the weirder. In many situations where agents with access to simulations of each other interact, a strange type of apparent backward causality will arise. For instance…
Imagine that you’re competing in a prisoner’s dilemma against an agent that you know has access to a scarily accurate simulation of you. Prior to your decision to either defect or cooperate, you’re mulling over arguments for one course of action over the other. In such a situation, you have to continuously face the fact that for each argument you come up with, your opponent has already anticipated and taken measures to respond to it. As soon as you think up a new line of reasoning, you must immediately update as if your opponent has just heard your thoughts and adjusted their strategy accordingly. Even if your opponent’s decision has already been made and is set in stone, you have no ability to do anything that your opponent hasn’t already incorporated into their decision. Though you are temporally ahead of them, they are in some sense causally ahead of you. Their decision is a response to your (yet-to-be-made) decision, and your decision is not a response to theirs. (I don’t actually believe and am not claiming that this apparent backwards causality is real backwards causality. The arrow of time still points in only one direction and causality follows suit. But it’s worth it in these situations to act as if your opponent has backwards causation abilities.)
When you have two agents that both have access to simulations of the other, things get weird. In such situations, there’s no clear notion of whose decision is a response to the other (as both are responding to each other’s future decision), and so there’s no clear notion of whose decision is causally first. But the question of “who comes first” (in this strange non-temporal sense) turns out to be very important to what strategy the various agents should take!
Let’s consider some examples.
Two agents are driving head-on towards each other. Each has a choice to swerve or to stay driving straight ahead. If they both stay, then they crash and die, the worst outcome for all. If one stays and the other swerves, then the one that swerves pays a reputational cost and the one that stays gains some reputation. And if both swerve, then neither gains or loses any reputation. To throw some numerical values on these outcomes, here’s a payoff matrix:
This is the game of chicken. It is an anti-coordination game, in that if one side knows what the other is going to do, then they want to do the opposite. The (swerve, swerve) outcome is unstable, as both players are incentivized to stay if they know that their opponent will swerve. But so is the (stay, stay) outcome, as this is the worst possible outcome for both players and they both stand to gain by switching to swerve. There are two pure-strategy Nash equilibria, (swerve, stay) and (stay, swerve), and one mixed-strategy equilibrium (with the payoff matrix above, it corresponds to swerving with probability 90% and staying with probability 10%).
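These equilibria can be checked mechanically. The sketch below uses swerve/swerve = 0, swerve against stay = −1, and stay against swerve = +1, as in the text. The crash payoff of −10 is an assumption: it is the value that reproduces the 90%/10% mixed equilibrium quoted above. The function names are illustrative.

```python
from fractions import Fraction

SWERVE, STAY = "swerve", "stay"
ACTIONS = (SWERVE, STAY)

# Payoff to the player choosing the first action against the second.
# The -10 crash payoff is an ASSUMED value consistent with the 90% mix.
PAYOFF = {
    (SWERVE, SWERVE): Fraction(0),
    (SWERVE, STAY): Fraction(-1),
    (STAY, SWERVE): Fraction(1),
    (STAY, STAY): Fraction(-10),
}


def pure_equilibria(payoff):
    """Action profiles where neither player gains by unilaterally deviating."""
    found = []
    for mine in ACTIONS:
        for theirs in ACTIONS:
            i_ok = all(payoff[(mine, theirs)] >= payoff[(alt, theirs)] for alt in ACTIONS)
            they_ok = all(payoff[(theirs, mine)] >= payoff[(alt, mine)] for alt in ACTIONS)
            if i_ok and they_ok:
                found.append((mine, theirs))
    return found


def mixed_swerve_probability(payoff):
    """Swerve probability p that leaves the other player indifferent (symmetric game)."""
    a, b = payoff[(SWERVE, SWERVE)], payoff[(SWERVE, STAY)]
    c, d = payoff[(STAY, SWERVE)], payoff[(STAY, STAY)]
    # p*a + (1-p)*b == p*c + (1-p)*d  =>  p = (d - b) / ((a - c) + (d - b))
    return (d - b) / ((a - c) + (d - b))


print(pure_equilibria(PAYOFF))           # [('swerve', 'stay'), ('stay', 'swerve')]
print(mixed_swerve_probability(PAYOFF))  # 9/10
```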
That’s all standard game theory, in a setting where you don’t have access to your opponent’s algorithm. But now let’s take this thought experiment to the future, where each player is able to simulate the other. Imagine that you’re one of the agents. What should you do?
The first thought might be the following: you have access to a simulation of your opponent. So you can just observe what the simulation of your opponent does, and do the opposite. If you observe the simulation swerving you stay, and if you observe the simulation staying you swerve. This has the benefit of avoiding the really bad (stay, stay) outcomes, while also exploiting opponents that decide to swerve.
The issue is that this strategy is exploitable. While you’re making use of your ability to simulate your opponent, you are neglecting the fact that your opponent is also simulating you. Your opponent can see that this is your strategy, so they know that whatever they decide to play, you’ll play the opposite. So if they decide to tear off their steering wheel to ensure that they will not swerve no matter what, they know that you’ll fall in line and swerve, thus gaining them +1 utility and leaving you with -1. This is a precommitment: a strategy that an agent uses to restrict the choices available to them in the future. It’s quite unintuitive and cool that this sort of tying-your-hands ends up being an enormously powerful weapon for those that have access to it.
In other words, if Agent 1 sees that Agent 2 is treating their decision as a fixed fact and responding to it accordingly, then Agent 1 gets an advantage, as they can precommit to staying and force Agent 2 to yield to them. But if Agent 2 now sees Agent 1 as responding to their algorithm rather than the other way around, then Agent 2 benefits by precommitting to stay. If there’s a fact about which agent precommits “first”, then we can conclusively say that this agent does better, as they can force the outcome they want. But again, this is not a temporal first. Suppose that Agent 2 is asleep at the wheel, about to wake up, and Agent 1 is trying to decide what to do. Agent 1 simulates them and sees that once they wake up they will tear out their steering wheel without even considering what Agent 1 does. Now Agent 1’s hand is forced; he will swerve in response to Agent 2’s precommitment, even though it hasn’t yet been made. It appears that for two agents in a chicken-like scenario, with access to simulations of one another, the best action is to precommit as quickly and firmly as possible, with as little regard for their opponents’ precommitments as they can manage (the best-performing agent is the one that tears off their steering wheel without even simulating their opponent and seeing their precommitments, as this agent puts themselves fully causally behind anybody that simulates them). But this obviously just leads straight to the (stay, stay) outcome!
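A toy sketch of this regress, assuming strategies are plain functions that receive the opponent’s strategy as their "simulation" of the opponent. The names `wheel_tearer` and `opposite_of_simulation` are hypothetical, and the −10 crash payoff is an assumed illustrative value. The simulate-and-do-the-opposite strategy loses to the wheel-tearer, while two wheel-tearers crash.

```python
SWERVE, STAY = "swerve", "stay"

# Payoff to the first action against the second; -10 for a crash is ASSUMED.
PAYOFF = {(SWERVE, SWERVE): 0, (SWERVE, STAY): -1, (STAY, SWERVE): 1, (STAY, STAY): -10}


def wheel_tearer(opponent):
    """Precommitted: stays no matter what, without simulating the opponent."""
    return STAY


def opposite_of_simulation(opponent):
    """Simulates the opponent and does the opposite of whatever it sees.

    Only safe against opponents that do not simulate back; two of these
    playing each other would recurse forever.
    """
    return SWERVE if opponent(opposite_of_simulation) == STAY else STAY


def play(strategy_1, strategy_2):
    """Each strategy is handed the other one to simulate and returns an action."""
    move_1, move_2 = strategy_1(strategy_2), strategy_2(strategy_1)
    return (move_1, move_2), (PAYOFF[(move_1, move_2)], PAYOFF[(move_2, move_1)])


print(play(opposite_of_simulation, wheel_tearer))  # (('swerve', 'stay'), (-1, 1))
print(play(wheel_tearer, wheel_tearer))            # (('stay', 'stay'), (-10, -10))
```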
This pattern of precommitting, then precommitting to not respond to precommitments, then precommitting to not respond to precommitments to not respond to precommitments, and so on, shows up all over the place. Here’s another example, from the realm of economics.
Company Coordination and Boycotts
In my last post, I talked about the Cournot model of firms competing to produce a homogeneous good. We saw that competing firms face a coordination problem with respect to the prices they set: every firm sees it in their rational self-interest to undercut other firms to take their customers, but then other firms follow suit, ending up with the price dropping for everybody. That’s good for consumers, but bad for producers! The process of undercutting and then re-equilibrating continues until the price is at the bare minimum that it takes for a producer to be willing to make the good – essentially just minutely above the cost of production. At this point, producers are making virtually no profit and consumer surplus is maximized.
This coordination problem, like all coordination problems, could be solved if only the firms had the ability to precommit. Imagine that the heads of all the companies meet up at some point. They all see the problem that they’re facing, and recognize that if they can stop the undercutting, they’ll all be much richer. So they sign on to a vow to never undercut each other. Of course, signing a piece of paper doesn’t actually restrict your future options. Every company is still just as incentivized as before to break the agreement and undercut their competitors. It helps if they have plausible deniability; the ability to say that their price drop was actually not intended to undercut, but a response to some unrelated change in the market. All that the meeting does is introduce some social cost to undercutting and breaking the vow that wasn’t there before.
To actually permanently fix the coordination problem, the companies need to be able to sign on to something that truly and irrevocably ties their hands, giving them no ability to back out later on (equivalent to the tearing-off-the-steering-wheel as a credible precommitment). Maybe they all decide to put some money towards the creation of a final-check mechanism that looks over all price changes and intervenes to stop any changes that it detects to be intended to undercut opponents. This is precommitment in the purer sense of literally removing an option that the firms previously had. And if this type of tying-of-hands was actually possible, then each company would be rationally incentivized to sign on! (Of course, they’d all be looking for ways to cheat the system and break the mechanism at every step, which would make its actual creation a tad bit difficult.)
So, if you give all companies the ability to jointly sign on to a credible precommitment to not undercut their opponents, then they will take that opportunity. This will keep prices high and keep profits flowing into the companies. Producer surplus will be maximized, and consumers will get the short end of the stick. Is there any way for the consumers to fight back?
Sure there is! All they need is the ability to precommit as well. Suppose that all consumers are now given the opportunity to come together and boycott any and all companies that precommit to not undercutting each other. If every consumer signs on, and if producers know this, then it’s no longer worth it for them to put in place the price-monitoring mechanism, as they’d just lose all their customers! Of course, the consumers now face their own coordination problem; many of them will still value the product at a price higher than that which is being offered by the companies, even if they’re colluding. And each individual reasons that as long as everybody else is still boycotting the companies, it makes little difference if just one mutually beneficial trade is made with them. So the consumers will themselves face the problem of how to enforce the boycott. But let’s assume that the consumers work this out so that they credibly precommit to never buying from a company that credibly precommits to not undercutting its competitors. Now the market price swings back in their favor, dropping to the cost of production! The consumers win! Whoohoo!
But we’re not done yet. It was only worth it for the consumers to sign on to this precommitment because they predicted that the companies would respond to their precommitment. But what if the companies, seeing the boycott-tactic coming, credibly precommit to never yielding to boycotters? Then the consumers, responding to this precommitment, will realize that boycotting will have no effect on prices, and will just cause them all to lose out on mutually beneficial trades! So they won’t boycott, and therefore the producers get the surplus once more. And just like before, this swings back and forth, with the outcome at each stage depending on which agent treats the other agent’s precommitment as being more primal. But if they each run their apparently-best strategy (that is, making their precommitments with no regard to the precommitments of the other party so as to force their hand and place their own precommitments at the beginning of the causal chain), then we end up with the worst possible outcome for all: producers don’t produce anything and consumers don’t consume, and everybody loses out.
This question of how agents that can simulate one another AND precommit to courses of action should ultimately behave is something that I find quite puzzling and am not sure how to resolve.
The Cournot model is a simple economic model used to describe what happens when multiple companies compete with one another to produce some homogeneous product. I’ve been playing with it a bit and ended up solving the general linear case. I assume that this solution is already known by somebody, but couldn’t find it anywhere. So I will post it here! It gives some interesting insight into the way that less-than-perfectly-competitive markets operate. First let’s talk about the general structure of the Cournot model.
Suppose we have n firms. Each produces some quantity of the product; we’ll label the output of the kth firm as \(q_k\). The total amount of product on the market will be given the label \(Q = q_1 + q_2 + \dots + q_n\). Since the firms are all selling identical products, it makes sense to assume that the consumer demand function will just be a function of the total quantity of the product that is on the market: \(P = P(Q)\), the price at which consumers will buy a total quantity Q. (This means that we’re also disregarding effects like customer loyalty to a particular company or geographic closeness to one company location over another. Essentially, the only factor in a consumer’s choice of which company to go to is the price at which that company is selling the product.)
For each firm, there is some cost to producing the good. We capture this by giving each firm a cost function \(C_k(q_k)\). Now we can figure out the profit of each firm for a given set of output values \(q_1, \dots, q_n\). We’ll label the profit of the kth firm as \(\Pi_k\). This profit is just the amount of money they get by selling the product minus the cost of producing the product: \(\Pi_k = P(Q)\,q_k - C_k(q_k)\).
If we now assume that all firms are maximizing profit, we can find the outputs of each firm by taking the derivative of each firm’s profit with respect to its own output and setting it to zero: \(\frac{\partial \Pi_k}{\partial q_k} = P(Q) + P'(Q)\,q_k - C_k'(q_k) = 0\). This is a set of n equations with n unknowns, so solving this will fully specify the behavior of all firms!
Of course, without any more assumptions about the functions \(P\) and \(C_k\), we can’t go too much further with solving this equation in general. To get some interesting general results, we’ll consider a very simple set of assumptions. Our assumptions will be that both consumer demand and producer costs are linear. This is the linear Cournot model, as opposed to the more general Cournot model.
In the linear Cournot model, we write that \(P(Q) = a - bQ\) (for some a and b) and \(C_k(q_k) = c_k q_k\). As an example, we might have that P(Q) = $100 – $2 × Q, which would mean that at a price of $40, 30 units of the good will be bought total.
The constants \(c_k\) represent the marginal cost of production for each firm, and the linearity of the cost function means that the cost of producing the next unit is always the same, regardless of how many have been produced before. (This is unrealistic, as generally it’s cheaper per unit to produce large quantities of a good than to produce small quantities.)
Now we can write out the profit-maximization equations for the linear Cournot model: \(\frac{\partial \Pi_k}{\partial q_k} = a - bQ - b q_k - c_k = 0\). Rewriting, we get \(q_k + Q = \frac{a - c_k}{b}\). We can’t immediately solve this for \(q_k\), because remember that Q is the sum of all the quantities produced. All n of the quantities we’re trying to solve for are in each equation, so to solve the system of equations we have to do some linear algebra!
Translating this to a matrix equation…
$$\begin{pmatrix} 2 & 1 & \cdots & 1 \\ 1 & 2 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 2 \end{pmatrix} \begin{pmatrix} q_1 \\ q_2 \\ \vdots \\ q_n \end{pmatrix} = \frac{1}{b} \begin{pmatrix} a - c_1 \\ a - c_2 \\ \vdots \\ a - c_n \end{pmatrix}$$
Now if we could only find the inverse of the first matrix, we’d have our solution!
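I found the inverse of this matrix (call it \(M\)) by using the symmetry in the matrix to decompose it into two matrices that were each easier to work with, the identity \(I\) and the \(n \times n\) matrix of all 1s, \(J\):
$$M = I + J$$
As a hypothesis, suppose that the inverse matrix has a similar form (one value for the diagonal elements, and another value for all off-diagonal elements). This allows us to write an equation for the inverse matrix:
$$M^{-1} = xI + yJ, \qquad (I + J)(xI + yJ) = I$$
To solve this, we’ll use the following easily proven identities:
$$IJ = JI = J, \qquad J^2 = nJ$$
Expanding gives \(xI + (x + y + ny)J = I\), so \(x = 1\) and \(y = -\frac{1}{n+1}\); that is, \(M^{-1} = I - \frac{1}{n+1}J\).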
Alright awesome! Our hypothesis turned out to be true! (And it would have even if the entries in our matrix hadn’t been 1s and 2s. This is a really cool general method to find inverses of this family of matrices.) Now we just use this inverse matrix to solve for the output from each firm!
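Carrying the calculation through gives the result below. This is a reconstruction from the first-order conditions \(q_k + Q = \frac{a - c_k}{b}\) and the inverse \((I + J)^{-1} = I - \frac{1}{n+1}J\), where \(J\) is the all-ones matrix. Summing the outputs gives total quantity and price:

$$q_k = \frac{1}{b}\left[(a - c_k) - \frac{1}{n+1}\sum_{j=1}^{n}(a - c_j)\right] = \frac{a - (n+1)\,c_k + \sum_{j=1}^{n} c_j}{(n+1)\,b}$$

$$Q = \sum_{k=1}^{n} q_k = \frac{n a - \sum_{j=1}^{n} c_j}{(n+1)\,b}, \qquad P = a - bQ = \frac{a + \sum_{j=1}^{n} c_j}{n+1}$$

One consequence worth noting: \(P - c_k = b\,q_k\), so every firm that produces a positive quantity sells above its own marginal cost.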
And there we have it, the full solution to the general linear Cournot model! Let’s discuss some implications of these results. First of all, let’s look at the two extreme cases: monopoly and perfect competition.
Monopoly: n = 1
Perfect Competition: n → ∞
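Evaluating the general linear solution \(P = \frac{a + \sum_j c_j}{n+1}\), \(Q = \frac{na - \sum_j c_j}{(n+1)b}\) at these two extremes gives the following (a reconstruction; \(\bar c = \frac{1}{n}\sum_j c_j\) is the average production cost, held fixed as firms are added):

$$n = 1: \qquad q_1 = Q = \frac{a - c_1}{2b}, \qquad P = \frac{a + c_1}{2}$$

$$n \to \infty: \qquad P = \frac{a + n\bar c}{n+1} \to \bar c, \qquad Q = \frac{n\,(a - \bar c)}{(n+1)\,b} \to \frac{a - \bar c}{b}$$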
The first observation is that the behavior of the market under monopoly looks very different from the case of perfect competition. For one thing, notice that the price under perfect competition is always going to be lower than the price under monopoly. This is a nice demonstration of the so-called monopoly markup. The quantity \(a\) intuitively corresponds to the highest possible price you could get for the product (the most that the highest bidder would pay). And the quantity \(c_1\), the production cost, is the lowest possible price at which the product would be sold. So the monopoly price is the average of the highest price you could get for the good and the lowest price at which it could be sold.
The flip side of the monopoly markup is that less of the good is produced and sold under a monopoly than under perfect competition. There are trades that could be happening (trades which would be mutually beneficial!) which do not occur. Think about it: the monopoly price is halfway between the cost of production and the highest bidder’s price. This means that there are a bunch of people that would buy the product at above the cost of production but below the monopoly price. And since the price they would buy it for is above the cost of production, this would be a profitable exchange for both sides! But alas, the monopoly doesn’t allow these trades to occur, as it would involve lowering the price for everybody, including those who are willing to pay a higher price, and thus decreasing net profit.
Things change as soon as another firm joins the market. This firm can profitably sell the good at a lower price than the monopoly price and snatch up all of their business. This introduces a downward pressure on the price. Here’s the exact solution for the case of duopoly.
Duopoly: n = 2
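Setting \(n = 2\) in the general linear solution \(q_k = \frac{a - (n+1)c_k + \sum_j c_j}{(n+1)b}\) gives (our reconstruction):

$$q_1 = \frac{a - 2c_1 + c_2}{3b}, \qquad q_2 = \frac{a - 2c_2 + c_1}{3b}, \qquad P = \frac{a + c_1 + c_2}{3}$$

Since \(P - c_1 = b\,q_1\) and \(P - c_2 = b\,q_2\), the price stays above the marginal cost of each firm that is actually producing.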
Interestingly, in the duopoly case the market price still rests at a value above the marginal cost of production for either firm. As more and more firms enter the market, competition pushes the price down further and further until, in the limit of perfect competition, it converges to the cost of production.
The implication of this is that in the limit of perfect competition, firms do not make any profit! This may sound a little unintuitive, but it’s the inevitable consequence of the line of argument above. If a bunch of companies were all making some profit, then their price is somewhere above the cost of production. But this means that one company could slightly lower its price, thus snatching up all the customers and making massively more money than its competitors. So its competitors will all follow suit, pushing down their prices to get back their customers. And in the end, all the firms will have just decreased their prices and their profits, even though every step in the sequence appeared to be the rational and profitable action by each firm! This is just an example of a coordination problem. If the companies could all just agree to hold their price fixed at, say, the monopoly price, then they’d all be better off. But each individual has a strong monetary incentive to lower their price and gather all the customers. So the price will drop and drop until it can drop no more (that is, until it has reached the cost of production, at which point it is no longer profitable for a company to lower their price).
This implies that in some sense, the limit of perfect competition is the best possible outcome for consumers and the worst outcome for producers. Every consumer that values the product above the cost of its production will get it, and they will all get it at the lowest possible price. So the consumer surplus will be enormous. And companies producing the product make no net profit; any attempt to do so immediately loses them their entire customer base. (In which case, what is the motivation for the companies to produce the product in the first place? This is known as the Bertrand paradox.)
We can also get the easier-to-solve special case where all firms have the same cost of production.
Equal Production Costs
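With every \(c_k = c\), the general linear solution \(q_k = \frac{a - (n+1)c_k + \sum_j c_j}{(n+1)b}\) simplifies as follows (our reconstruction, including the per-firm profit \(\Pi_k = (P - c)\,q_k\)):

$$q_k = \frac{a - c}{(n+1)\,b}, \qquad Q = \frac{n\,(a - c)}{(n+1)\,b}, \qquad P = \frac{a + nc}{n+1}, \qquad \Pi_k = \frac{(a - c)^2}{(n+1)^2\, b}$$

Per-firm profit falls like \(1/(n+1)^2\), and total industry profit \(\frac{n(a - c)^2}{(n+1)^2 b}\) goes to zero as \(n \to \infty\).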
It’s curious that in the Cournot model, prices don’t immediately drop to the cost of production as soon as you go from a monopoly to a duopoly. After all, the intuitive argument I presented before works for two firms: if both firms are pricing the goods at any value above the cost of production, then each stands to gain by lowering the price a slight bit and getting all the customers. And this continues until the price settles at the cost of production. We didn’t build any ability of the firms to collude into the model, so what gives? What the Cournot model tells us is certainly more realistic (we don’t expect a duopoly to behave like a perfectly competitive market), but where does this realism come from?
The answer is that in a certain sense we did build in collusion between firms from the start, in the form of agreement on what price to sell at. Notice that our model did not allow different firms to set different prices. In this model, firms compete only on quantity of goods sold, not prices. The price is set automatically by the consumer demand function, and no single individual can unilaterally change their price. This constraint is what gives us the more realistic-in-character results that we see, and also what invalidates the intuitive argument I’ve made here.
One final observation. Consider the following procedure. You line up a representative from each of the n firms, as well as the highest bidder for the product (representing the highest price at which the product could be sold). Each of the firms states their cost of production (the lowest they could profitably bring the price to), and the highest bidder states the amount that he values the product (the highest price at which he would still buy it). Now all of the stated costs are averaged, and the result is set as the market price of the good. Turns out that this procedure gives exactly the market price that the linear Cournot model predicts! This might be meaningful or just a curious coincidence. But it’s quite surprising to me that the slope of the demand curve (\(b\)) doesn’t show up at all in the ultimate market price, only the value that the highest bidder puts on the product!
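Here is a small exact-arithmetic check of this averaging property, as a sketch. It uses the text’s demand curve P(Q) = 100 − 2Q, while the three production costs (10, 20, 30) are hypothetical. `linear_cournot` is an illustrative name; it applies the closed-form inverse \((I + J)^{-1} = I - \frac{1}{n+1}J\) and then confirms each firm’s first-order condition directly.

```python
from fractions import Fraction


def linear_cournot(a, b, costs):
    """Equilibrium of the linear Cournot model with P(Q) = a - bQ and C_k(q) = c_k q.

    Uses (I + J)^{-1} = I - J/(n+1), which gives
    q_k = (a - (n+1) c_k + sum(costs)) / ((n+1) b).
    Assumes every q_k comes out non-negative (all firms stay in the market).
    """
    n = len(costs)
    total = sum(costs)
    q = [(a - (n + 1) * c + total) / ((n + 1) * b) for c in costs]
    price = a - b * sum(q)
    return q, price


# Demand from the text's example, P(Q) = 100 - 2Q; the costs are hypothetical.
a, b = Fraction(100), Fraction(2)
costs = [Fraction(10), Fraction(20), Fraction(30)]
q, price = linear_cournot(a, b, costs)

# Each firm's first-order condition a - bQ - b*q_k - c_k = 0 holds exactly.
Q = sum(q)
assert all(a - b * Q - b * qk - ck == 0 for qk, ck in zip(q, costs))

# Compare the price with the average of the bidder's value a and the costs.
print(q, price, (a + sum(costs)) / (len(costs) + 1))
```

Here the firms produce 15, 10, and 5 units (30 in total), and both the equilibrium price and the simple average of 100, 10, 20, and 30 come out to 40.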
Is there a paradox in the continued existence of prediction markets? Recently I’ve been wondering this. Let me start with a little background for those that are unfamiliar with the concept of prediction markets.
Prediction markets are markets that allow you to bet on the outcomes of real-life events. This gives financial incentives to predict accurately, and as such the market price of a given bet reflects a kind of aggregate credence for that event occurring. There is a whole body of results, theoretical and applied, indicating that prediction markets serve to give robustly accurate probability estimates for real-world events.
Here’s a great paper by Robin Hanson about a political system based on prediction markets, named futarchy. Essentially, the idea is that voters determine a nation’s values, so as to generate some average national welfare metric, and then betting markets are used to decide policy. Some quotes:
On info-failures as a primary problem for democracy
According to many experts in economics and development, governments often choose policies that are “inefficient” in the sense that most everyone could expect to gain from other feasible policies. Many other kinds of experts also see existing policies as often clearly inferior to known alternatives.
If inferior policies would not have been adopted had most everyone known they are inferior, and if someone somewhere knew or could have learned that they are inferior, then we can blame inferior policies on a failure of our “info” institutions. By “info” here I just mean clues and analysis that should change our beliefs. Our info institutions are those within which we induce, express, and evaluate the acquiring and sharing of info. They include public relations teams, organized interest groups, news media, conversation forums, think tanks, universities, journals, elite committees, and state agencies. Inferior policies happen because our info institutions fail to induce people to acquire and share relevant info with properly-motivated decision makers.
Where might we find better info institutions? According to most experts in economics and finance, speculative markets are exemplary info institutions. That is, active speculative markets do very well at inducing people to acquire info, share it via trades, and collect that info into consensus prices that persuade wider audiences. This great success suggests that we should consider augmenting our political info institutions with speculative market institutions. That is, perhaps we should encourage people to create, trade in, and heed policy-relevant speculative markets, instead of discouraging such markets as we do today via anti-gambling laws.
Laying out the proposal
In futarchy, democracy would continue to say what we want, but betting markets would now say how to get it. That is, elected representatives would formally define and manage an after-the-fact measurement of national welfare, while market speculators would say which policies they expect to raise national welfare. The basic rule of government would be:
When a betting market clearly estimates that a proposed policy would increase expected national welfare, that proposal becomes law.
Futarchy is intended to be ideologically neutral; it could result in anything from an extreme socialism to an extreme minarchy, depending on what voters say they want, and on what speculators think would get it for them.
Futarchy seems promising if we accept the following three assumptions:
Democracies fail largely by not aggregating available information.
It is not that hard to tell rich happy nations from poor miserable ones.
Betting markets are our best known institution for aggregating information.
On the success of prediction markets
Betting markets, and speculative markets more generally, seem to do very well at aggregating information. To have a say in a speculative market, you have to “put your money where your mouth is.” Those who know they are not relevant experts shut up, and those who do not know this eventually lose their money, and then shut up. Speculative markets in essence offer to pay anyone who sees a bias in current market prices to come and correct that bias.
Speculative market estimates are not perfect. There seems to be a long-shot bias when there are high transaction costs, and perhaps also excess volatility in long term aggregate price movements. But such markets seem to do very well when compared to other institutions. For example, racetrack market odds improve on the predictions of racetrack experts, Florida orange juice commodity futures improve on government weather forecasts, betting markets beat opinion polls at predicting U.S. election results, and betting markets consistently beat Hewlett Packard official forecasts at predicting Hewlett Packard printer sales. In general, it is hard to find information that is not embodied in market prices.
On the possibility of manipulation of prediction markets
We want policy-related info institutions to resist manipulation, that is, to resist attempts to influence policy via distorted participation. Speculative markets do well here because they deal well with “noise trading,” that is, trading for reasons other than info about common asset values. When other traders can’t predict noise trading exactly, they compensate for its expected average by an opposite average trade, and compensate for its expected variation by trading more, and by working harder to find relevant info. Theory says that if trader risk-aversion is mild, and if more effort gives more info, then increased noise trading increases price accuracy. And in fact, the most accurate real speculative markets tend to be those with the most noise trading.
What do noise traders have to do with manipulators? Manipulators, who trade hoping to distort prices, are noise traders, since they trade for reasons other than asset value info. Thus adding manipulators to speculative markets doesn’t reduce average price accuracy. This has been verified in theory, in laboratory experiments, and in the field.
Futarchy remains for me one of the coolest and most exciting ideas I’ve heard in political philosophy, and prediction markets fascinate me. But for today, I have the following question about their feasibility:
If the only individuals that are able to consistently profit off the prediction market are the best predictors, then why wouldn’t the bottom 50% of predictors continuously drop out as they lose money on the market? If so, then as the population of market participants dwindles you would end up with a small fraction of really good predictors, each of whom sometimes gets lucky and makes money and sometimes is unlucky and loses some. On average, these people won’t be able to make money any more (as the ability to make money relies on the participation of inferior predictors in the market), so they’ll drop out as well.
If this line of reasoning is right, then it seems like prediction markets should inevitably collapse as their user base drops out. Why, then, do sites like PredictIt keep functioning?
One possibility is that there’s something wrong with the argument. This is honestly where most of my credence lies; tons of smart people endorse the idea, and this seems like too obvious and central a flaw for all of them to have missed. If this argument isn’t wrong, though, then we have an interesting phenomenon to explain.
One explanation that came to my mind is that the continued survival of prediction markets is only possible because of a bug in human psychology, namely, a lack of epistemic humility. People are on average overly confident in their beliefs, and so uninformed people will continue confidently betting on propositions, even when they are generally betting against individuals with greater expertise.
Is this really what’s going on? I’m not sure. I would be surprised if humans were actually overconfident enough to keep betting on a market where they consistently lose money. Maybe they’d find some way to rationalize dropping out of the market that doesn’t amount to admitting “My opinion is not worth as much as I thought it was”, but surely they would eventually stop betting after enough losses (putting aside whatever impulses drive people to gamble on guaranteed negative-expected-value games until they lose all their money). On the other hand, it could be that the less-informed traffic doesn’t consist of the same individuals betting over and over, but of a constant stream of new sheep coming in to be exploited by the more knowledgeable. What do you think? How do you explain this?
IQ is an increasingly controversial topic these days. I find that when it comes up, different people seem to be extremely confident in wildly different beliefs about the nature of IQ as a measure of intelligence.
Part of this has to do with education. This paper analyzed the 29 most widely used introductory psychology textbooks and “found that 79.3% of textbooks contained inaccurate statements and 79.3% had logical fallacies in their sections about intelligence.”
This is pretty insane, and sounds kinda like something you’d hear from an Alex Jones-style conspiracy theorist. But if you look at what the world’s experts on human intelligence say about public opinion on intelligence, they’re all in agreement: misinformation about IQ is everywhere. It’s gotten to the point where world-famous respected psychologists like Steven Pinker are being blasted as racists in articles in mainstream news outlets for citing basic points of consensus in the scientific literature.
The reasons for this are pretty clear… people are worried about nasty social and political implications of true facts about IQ. There are worthwhile points to be made about morally hazardous beliefs and the possibility that some truths should not be publicly known. At the same time, the quantification and study of human intelligence is absurdly important. The difference between us and the rest of the animal world, the types of possible futures that are open to us as a civilization, the ability to understand the structure of the universe and manipulate it to our ends; these are the types of things that the subject of human intelligence touches on. In short, intelligence is how we accomplish anything as a civilization, and the prospect of missing out on ways to reliably intervene and enhance it because we avoided or covered up research that revealed some inconvenient truths seems really bad to me.
Overall, I lean towards thinking that the misinformation is so great, and the truth so important, that it’s worthwhile to attempt to clear things up. So! The purpose of this post is just to sort through some of the mess and come up with a concise and referenced list of some of the most important things we know about IQ and intelligence.
The most replicated finding in all of psychology is that good performance on virtually all cognitively demanding tasks is positively correlated. The name for whatever cognitive faculty causes this correlation is “general intelligence”, or g.
A definition of intelligence from 52 prominent intelligence researchers: 
Intelligence is a very general capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience. It is not merely book learning, a narrow academic skill, or test‑taking smarts. Rather, it reflects a broader and deeper capability for comprehending our surroundings—‘catching on’, ‘making sense’ of things, or ‘figuring out’ what to do. Intelligence, so defined, can be measured, and intelligence tests measure it well.
IQ tests are among the most reliable and valid of all psychological tests and assessments. 
They are designed to test general intelligence, and not character or personality.
Modern IQ tests have a standard error of measurement of about 3 points.
The distribution of IQs in a population nicely fits a bell curve.
IQ is defined in such a way as to make the population mean exactly 100, and the standard deviation 15.
People with high IQs tend to be healthier, wealthier, live longer, and have more successful careers. 
IQ is highly predictive of educational aptitude and job performance. 
Longitudinal studies have shown that IQ “is a causal influence on future achievement measures whereas achievement measures do not substantially influence future IQ scores.” 
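To make the norming described above concrete, here is a minimal Python sketch that treats IQ as exactly normal with mean 100 and standard deviation 15 (real score distributions only approximate this). It converts a score into a population share and builds a rough interval from the roughly 3-point standard error of measurement. The function names are illustrative, not from any IQ-testing library; `statistics.NormalDist` is part of the Python standard library (3.8+).

```python
from statistics import NormalDist

# Assumption: scores are exactly normal with the stated norming (mean 100, SD 15).
IQ_DIST = NormalDist(mu=100, sigma=15)


def fraction_at_or_below(score: float) -> float:
    """Expected share of the population scoring at or below `score`."""
    return IQ_DIST.cdf(score)


def percentile_rank(score: float) -> float:
    """Percentile rank (0 to 100) of `score`."""
    return 100 * IQ_DIST.cdf(score)


def score_interval(observed: float, sem: float = 3.0, level: float = 0.95):
    """Approximate interval for a test-taker's true score: observed +/- z * SEM.
    The default SEM of 3 points follows the figure quoted for modern tests."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return observed - z * sem, observed + z * sem


if __name__ == "__main__":
    print(f"At or below 80: {fraction_at_or_below(80):.1%}")  # about 9.1%
    print(f"At or below 85: {fraction_at_or_below(85):.1%}")
    print(f"Percentile of 130: {percentile_rank(130):.1f}")
    low, high = score_interval(110)
    print(f"95% interval for an observed 110: {low:.1f} to {high:.1f}")
```

Under these assumptions a score of 80 sits at roughly the 9th percentile, and a single observed score is only pinned down to within about 6 points either way.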
Average adult combined IQs associated with real-life accomplishments by various tests
MDs, JDs, and PhDs
1–3 years of college
Clerical and sales workers
High school graduates, skilled workers (e.g., electricians, cabinetmakers)
1–3 years of high school (completed 9–11 years of school)
This doesn’t mean that 30-year-old you is no smarter than 10-year-old you. It means that if you test the IQ of a bunch of children, and then later test them as adults, the rank order will remain roughly the same. A smarter-than-average 10-year-old becomes a smarter-than-average 30-year-old.
After your mid-20s, crystallized intelligence plateaus and fluid intelligence starts declining.
High IQ is correlated with more gray matter in the brain, larger frontal lobes, and a thicker cortex. 
“There is a constant cascade of information being processed in the entire brain, but intelligence seems related to an efficient use of relatively few structures, where the more gray matter the better.” 
“Estimates of how much of the total variance in general intelligence can be attributed to genetic influences range from 30 to 80%.” 
Twin studies show the same results; there are substantial genetic influences on human intelligence. 
The genetic component of IQ is highly polygenic, and no specific genes have been robustly associated with human intelligence. The best we’ve found so far is a single gene that accounts for 0.1% of the variance in IQ. 
Many genes have been weakly associated with IQ. “40% of the variation in crystallized-type intelligence and 51% of the variation in fluid-type intelligence between individuals” is accounted for by genetic differences. 
Scientists can predict your IQ by looking only at your genes (not perfectly, but significantly better than random). 
This study analyzed 549,692 base pairs and found a mean correlation of R = .11 between their predictions and the actual fluid intelligence of over 3500 unrelated adults.
You might be wondering at this point what all the controversy regarding IQ is about. Why are so many people eager to dismiss IQ as a valid measure of intelligence? Well, we now dive straight into the heart of the controversy: intergroup variation in IQ.
It’s worth noting that, as Scott Alexander puts it, society is fixed, while biology is mutable. The fear that if biology factors into the underperformance of some groups, then such differences are intrinsically unalterable, makes little sense. We can do things to modify biology just as we can do things to modify society, and in fact the first is often much easier to do and more effective than the second.
Anyway, prelude aside, we dive into the controversy.
Group differences in IQ
Yes, there are racial differences in IQ, both globally and within the United States. This has been studied to death, and is a universal consensus; you won’t find a single paper in a reputable psychology journal denying the numerical differences. 
Within the United States, there is a long-standing 1 SD (15 to 18 point) IQ difference between African Americans and White Americans. 
The tests in which these differences are most pronounced are those that most closely correspond to g, like Raven’s Progressive Matrices. This test is also free of culturally loaded knowledge, and only requires solving visual pattern-recognition puzzles.
Controlling for the way the tests are formulated and administered does not affect this difference. 
IQ scores predict success equally accurately regardless of race or social class. This provides some evidence that the test is not culturally biased as a predictor.  
Internationally, the lowest average IQs are found in sub-Saharan Africa and the highest average IQs are found in East Asia. The variations span a range of three standard deviations (45 IQ points). 
Malawi has an estimated average IQ of 60.
Singapore and Hong Kong have estimated IQs around 108.
A large survey published in one of the top psychology journals polled over 250 experts on IQ and international intelligence differences. 
On possible causes of cross-national differences in cognitive ability: “Genes were rated as the most important cause (17%), followed by educational quality (11.44%), health (10.88%), and educational quantity (10.20%).”
“Around 90% of experts believed that genes had at least some influence on cross-national differences in cognitive ability.”
Men and women have equal average IQs.
But: “most IQ tests are constructed so that there are no overall score differences between females and males.” 
They do this by removing items that show significant sex differences. So, for instance, men have a 1 SD (15 point) advantage on visual-spatial tasks over women. Thus mental rotation tests have been removed, in order to reduce the perception of bias. 
Males also do better on proportional and mechanical reasoning and mathematics, while females do better on verbal tests. 
Hormones are thought to play a role in sex differences in cognitive abilities. 
Females that are exposed to male hormones in utero have higher spatiotemporal reasoning scores than females that are not. 
The same thing is seen with men that have higher testosterone levels, and older males given testosterone. 
There is also some evidence of men having a higher IQ variance than women, but this seems to be disputed. If true, it would indicate more men at the very bottom and the very top of the IQ scale (helping to explain sex disparities in high-IQ professions). 
In the developed world, average IQ has been increasing by 2 to 3 points per decade since 1930. This is called the Flynn effect.
The average IQ in the US in 1932, as measured by a 1997 IQ test, would be around 80. People with IQ 80 and below correspond to the bottom 9% of the 1997 population. 
Some studies have found that the Flynn effect seems to be waning in the developed world and beginning in the developing world.
A large survey of experts found that most attribute the Flynn effect to “better health and nutrition, more and better education and rising standards of living.” 
The Flynn effect is not limited to IQ tests, but is also found in memory tests, object naming, and other commonly used neuropsychological tests. 
Many studies indicate that the black-white IQ gap in the United States is closing. 
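The 1932 figure above is a back-projection. Here is a rough sketch, assuming the gain was constant over the whole 65 years between 1932 and 1997 (real gains varied by decade and by test); the helper name is hypothetical.

```python
def projected_mean(cohort_year: int, norm_year: int, gain_per_decade: float) -> float:
    """Mean IQ of a cohort tested in `cohort_year`, scored against norms set in
    `norm_year`, assuming a constant Flynn gain (a simplifying assumption)."""
    decades = (norm_year - cohort_year) / 10
    return 100 - gain_per_decade * decades


if __name__ == "__main__":
    # The post's stated range is 2 to 3 points per decade.
    for gain in (2.0, 2.5, 3.0):
        mean_1932 = projected_mean(1932, 1997, gain)
        print(f"{gain} points/decade: 1932 mean on 1997 norms = {mean_1932:.1f}")
```

At 3 points per decade the projection lands at 80.5; at 2 points per decade it lands at 87, so the "around 80" figure corresponds to the upper end of the range.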
Can IQ be increased?
There are no known interventions that reliably cause long-term increases (although decreasing it is easy).
Essentially, you can do a handful of things to ensure that your child’s IQ is not low (give them access to education, provide them good nutrition, prevent iodine deficiency, etc), but you can’t do much beyond these.
Educational intervention programs have fairly unanimously failed to show long-term increases in IQ in the developed world. 
The best prekindergarten programs have a substantial short-term effect on IQ, but this effect fades by late elementary school.
Several large-scale longitudinal studies have found that children with higher IQ are more likely to have used illegal drugs by middle age. This association is stronger for women than men. 
This actually makes some sense, given that IQ is positively correlated with Openness (in the Big Five personality traits breakdown).
The average intelligence of Marines has been significantly declining since 1980. 
“The US military has minimum enlistment standards at about the IQ 85 level. There have been two experiments with lowering this to 80 but in both cases these men could not master soldiering well enough to justify their costs.” (from Wiki)
This is fairly terrifying when you consider that roughly 9% of the US population has an IQ of 80 or below; evidently, this enormous segment of humanity has an extremely limited capacity to do useful work for society.
Researchers used to think that IQ declined significantly starting around age 20. This was subsequently found to be mostly a product of the Flynn effect: as average performance rises from one generation to the next, tests are re-normed upward, so a person whose ability stays constant appears to decline relative to the newer norms. (from Wiki)
The popular idea that listening to classical music increases IQ has not been borne out by research. (Wiki)
There’s evidence that intelligence is part of the explanation for differential health outcomes across socioeconomic class.
“…Health workers can diagnose and treat incubating problems, such as high blood pressure or diabetes, but only when people seek preventive screening and follow treatment regimens. Many do not. In fact, perhaps a third of all prescription medications are taken in a manner that jeopardizes the patient’s health. Non-adherence to prescribed treatment regimens doubles the risk of death among heart patients (Gallagher, Viscoli, & Horwitz, 1993). For better or worse, people are substantially their own primary health care providers.” “For instance, one study (Williams et al., 1995) found that, overall, 26% of the outpatients at two urban hospitals were unable to determine from an appointment slip when their next appointment was scheduled, and 42% did not understand directions for taking medicine on an empty stomach. The percentages specifically among outpatients with inadequate literacy were worse: 40% and 65%, respectively. In comparison, the percentages were 5% and 24% among outpatients with adequate literacy. In another study (Williams, Baker, Parker, & Nurss, 1998), many insulin-dependent diabetics did not understand fundamental facts for maintaining daily control of their disease: Among those classified as having inadequate literacy, about half did not know the signs of very low or very high blood sugar, and 60% did not know the corrective actions they needed to take if their blood sugar was too low or too high. Among diabetics, intelligence at time of diagnosis correlates significantly (.36) with diabetes knowledge measured 1 year later (Taylor, Frier, et al., 2003).” 
IQ differences might be able to account for a significant portion of global income inequality.
“… in a conventional Ramsey model, between one-fourth and one-half of income differences across countries can be explained by a single factor: The steady-state effect of large, persistent differences in national average IQ on worker productivity. These differences in cognitive ability – which are well-supported in the psychology literature – are likely to be malleable through better nutrition, better education, and better health care in the world’s poorest countries. A simple calibration exercise in the spirit of Bils and Klenow (AER, 2000) and Castro (Rev. Ec. Dyn., 2005) is conducted. According to the model, a move from the bottom decile of the global IQ distribution to the top decile will cause steady-state living standards to rise by between 75 and 350 percent. I provide evidence that little of IQ-productivity relationship is likely to be due to reverse causality.” 
Exposure to lead hampers cognitive development and lowers IQ. You can calculate the economic boost the US received as a result of the dramatic reduction in children’s exposure to lead since the 1970s and the resulting increase in IQs.
“The base-case estimate of $213 billion in economic benefit for each cohort is based on conservative assumptions about both the effect of IQ on earnings and the effect of lead on IQ.” 
Yes. $213 billion.
In a 113-country analysis, IQ has been found to positively affect all main measures of institutional quality.
“The results show that average IQ positively affects all the measures of institutional quality considered in our study, namely government efficiency, regulatory quality, rule of law, political stability and voice and accountability. The positive effect of intelligence is robust to controlling for other determinants of institutional quality.” 
High-IQ people cooperate more in repeated prisoner’s dilemma experiments: 5% to 8% more cooperation per 100-point increase in SAT score (about a 7-point IQ increase).
The second paper also shows more patience and higher savings rates for higher IQ. 
Embryo selection is a possible way to enhance the IQ of future generations, and is already technologically feasible.
“Biomedical research into human stem cell-derived gametes may enable iterated embryo selection (IES) in vitro, compressing multiple generations of selection into a few years or less.” 
Average IQ gain
1 in 2
1 in 10
1 in 100
1 in 1000
There is a ridiculous amount of research out there on IQ, and you can easily reach any conclusion you want by just finding some studies that agree with you. I’ve tried to stick to relying on large meta-analyses, papers of historical significance, large surveys of experts, and summaries by experts of consensus views.
(This post is a summary of the main things I found while diving into the economics literature on income inequality. Will try to condense my findings as much as possible, but there’s a lot to talk about. TL;DR at the end for lazy folk)
First, a note on terminology
Before getting into the published research on this topic, I started by surveying articles from popular news sources. I was curious to ultimately compare the standard media presentation to what I’d find in the scientific literature.
A large portion of what I read consisted of debates about the meanings of terms – one person says that capitalism is a lightly regulated market with a social safety net, another says any social safety net is socialism and therefore not capitalism, another says that a free market with any form of government regulation is corporatism, not capitalism, and they all yell at each other about terms and don’t get anything done.
By contrast, the terminology used in the economics and public policy literature was consistent, straightforward, and clear. I’ll define the controversial terms right here at the start to avoid confusion. These definitions are in line with the way that the terms are used in the literature.
Economic freedom: A combination of factors including limited regulation of businesses, protected rights to own private property, trade freedom, and small government.
(Preliminary post – am planning to write this all up more digestibly in a future post)
Free markets and income inequality
Capital in the 21st Century (Piketty)
When the rate of return on capital is greater than the rate of economic growth (as tends to occur in a free market given time), this leads to a concentration of wealth.
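A toy illustration of this mechanism, not Piketty's own model: assume a fortune that compounds at r with every dollar of return reinvested, while wages grow with the economy at g. The function name and the rates used below are hypothetical.

```python
def fortune_to_wage_ratio(r: float, g: float, years: int, start_ratio: float = 100.0) -> float:
    """Ratio of a fully reinvested fortune to an annual wage after `years` years.

    Hypothetical assumptions: the fortune compounds at rate r with no consumption,
    taxes, or inheritance splits; the wage grows with the economy at rate g.
    """
    return start_ratio * ((1 + r) / (1 + g)) ** years


if __name__ == "__main__":
    # Illustrative rates only: r well above g, then r equal to g.
    for r, g in [(0.05, 0.015), (0.03, 0.03)]:
        path = [round(fortune_to_wage_ratio(r, g, t)) for t in (0, 25, 50, 100)]
        print(f"r={r:.3f}, g={g:.3f}: fortune/wage over time = {path}")
```

Whenever r exceeds g the ratio grows geometrically; with r equal to g it stays put. Real fortunes are also consumed, taxed, and divided among heirs, which this sketch ignores.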
Wage Inequality: A Story of Policy Choices (Mishel, Schmitt, Shierholz)
Income inequality is the result of erosion of the minimum wage’s value, decreased union power, industrial deregulation, trade policy, failure to use fiscal spending to stimulate the economy, bad monetary policy by the Fed, and rent-seeking behaviors from CEOs.
Controversies about the Rise of American Inequality: A Survey (Gordon, Dew-Becker)
Rising inequality is due to a low minimum wage, the decline in unionization, audience magnification, generous stock options, and unregulated corporate wage practices, not imports, immigration, or a lower labor share of income.
Declining Labor and Capital Shares (Barkai)
Capital shares have declined faster than labor shares in the last 30 years, and the decline of labor shares is due entirely to an increase in markups, which decreases output and consumer welfare.
Why Hasn’t Democracy Slowed Rising Inequality? (Bonica, McCarty, Poole, Rosenthal)
Democracy hasn’t slowed the rise in inequality because of a political acceptance of free-market capitalism, immigration and a low turnout of poor voters, rising real income and wealth making social insurance less attractive, money influencing politics, and distortion of democracy through gerrymandering.
Billionaire Bonanza (Collins, Hoxie)
The people at the top are crazy rich and we should tax them.
Economic Freedom of the World: 2017 Annual Report (Gwartney, Lawson, Hall)
Economic freedom is strongly correlated with rapid growth, higher average income per capita, lower poverty rates, higher income amount/share for the poorest 10%, higher life expectancy, more civil liberties and political rights, more gender equality, greater happiness, and better access to electricity, gas, and water supplies.
The great Chinese inequality turnaround (Kanbur, Wang, Zhang)
Drop in Chinese inequality is due to tightening of rural labor markets from migration, government investment in infrastructure in the rural sector, minimum wage policies, and social programs.
(Some speculative rambling about stuff I’ve been thinking about recently.)
There’s a fallacy that I have committed hundreds of times, and that I have only really recently internalized as a fallacy. Perhaps it is not a fallacy, but a confused pattern of thought. In any case, I’ll call it “the incomprehensibility of the complex.”
Here’s the context in which I would make the mistake:
Somebody brings up some political or economic question, say “Should we have left Iraq?” or “Should we raise the minimum wage?”
This sparks a fierce debate. Somebody says that removing the troops left the region defenseless against takeover by extremist groups, or that extra wages given to workers go back into the economy and stimulate it. Another objects that our troops were ultimately the source of the instability, or cites the broken-window fallacy.
And I would think: “The world is crazily complicated. Physicists can barely understand complex atoms. Now scale that complexity up to interactions between hundreds of millions of humans, each one a system of a hundred trillion trillion atoms. This should put into perspective the proper degree of epistemic humility we should hold when discussing the minimum wage.”
Basically: If we can’t understand atoms, then we sure as hell can’t understand economic systems or international relations.
Observing that this is a bad argument is not too profound or interesting.
What’s interesting to me is the fact that this is a bad argument. That is, the fact that we can scale up the complexity of the system we are studying by a factor of 10^30, squint our eyes, and then get to work at creating fantastically simple and accurate models of the system. This is absolutely insane, and tells us something about the type of universe that we live in.
Recently I watched a lecture on Marginal Revolution University about gun buyback programs and slave redemption policies. The gist of it is this:
Starting in 1993, some humanitarian groups got it into their heads that they could save Sudanese slaves by buying them from their owners and then freeing them. This maybe sounds like a good idea, until you learn about supply and demand curves.
In truth, what the slave redeemers ended up doing was increasing demand for slaves, resulting in new slaves being captured and tens of thousands of dollars ending up in the hands of slave-owners. Fresh revenue funded weapons purchases, further enabling slave traders to raid villages and capture new slaves.
A similar thing can happen with gun buyback programs. These programs involve the buying of guns in large quantities from gun owners in order to melt them down, the thought being that this will get the guns off of the street. The effect of this?
Well, the gun producers thank their new customers for the money and start manufacturing more guns to supply their larger customer base. In some cases violent crime rates jumped, and a study measuring if these programs actually decrease violent crime rates overall found no statistically significant effects.
Now, I’m ashamed to say that these programs actually initially seemed like fine ideas to me. This is really a statement of my failure to have internalized how supply and demand curves work. In my defense, this is not always a totally horrible policy idea. When demand is much more elastic than supply, the price of the good will jump and many of the original buyers will be priced out of the market. In other words, if the producers have a harder time scaling up their operations than the consumers have buying less of the good, then the world will actually end up freer of slaves/guns.
But that is not how these markets actually work. Demand for guns is in fact less elastic than supply of guns, so the gun nuts are barely affected and the ungun-nuts are handing over free money to the gun manufacturers.
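The buyback argument can be checked with straight-line supply and demand curves. In this sketch (all numbers hypothetical, and using slopes as a rough stand-in for elasticities), a buyback program purchases a fixed number of units at whatever the market price turns out to be, and we ask how many units ordinary buyers give up as a result.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LinearMarket:
    """Hypothetical straight-line market: demand Qd = a - b*P, supply Qs = c + d*P.

    Larger b means buyers drop out faster as price rises (more elastic demand);
    larger d means producers expand output faster (more elastic supply).
    """
    a: float
    b: float
    c: float
    d: float

    def equilibrium(self, buyback: float = 0.0) -> tuple[float, float]:
        """Return (price, units bought by ordinary buyers) when a buyback program
        purchases a fixed `buyback` units regardless of price."""
        # Market clears when ordinary demand plus the buyback equals supply:
        # a - b*P + buyback = c + d*P  =>  P = (a - c + buyback) / (b + d)
        price = (self.a - self.c + buyback) / (self.b + self.d)
        return price, self.a - self.b * price


def displaced_units(market: LinearMarket, buyback: float) -> float:
    """Units ordinary buyers give up because of the buyback (b / (b + d) of it)."""
    _, before = market.equilibrium()
    _, after = market.equilibrium(buyback)
    return before - after


if __name__ == "__main__":
    # Intercepts are chosen so that every quantity stays positive.
    elastic_demand = LinearMarket(a=2000, b=10, c=0, d=1)
    inelastic_demand = LinearMarket(a=1000, b=1, c=0, d=10)
    for name, m in [("elastic demand", elastic_demand), ("inelastic demand", inelastic_demand)]:
        print(f"{name}: a 110-unit buyback displaces {displaced_units(m, 110):.0f} units")
```

With responsive demand (b = 10, d = 1), a 110-unit buyback prices ordinary buyers out of 100 units; with unresponsive demand (b = 1, d = 10), they give up only 10, and producers simply make 100 more.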
And one more example from Marginal Revolution. Sorry, but we’re on the topic of unintuitive basic econ and it’s just too good to leave out.
In 1990 the United States passed a policy that applied a tax on luxury goods like yachts. The idea, it seems, was, “The federal budget deficit is too high, and if we tax the rich on their fancy luxury goods, we can reduce the deficit without really hurting anybody.” Sounds good, yes?
But what actually happened was that as the price of yachts increased, rich people bought less, and thousands of laborers in the yacht industry lost their jobs. When all was said and done, the government ended up paying more in increased unemployment benefits than they gained in tax revenue from the policy! The government quickly wised up and repealed the tax a few years after it was put in place.
How to understand this? Easy! Draw a graph of supply and demand. Which one has a steeper slope? Well, rich people can fairly easily spend their money differently if yacht prices increase. Giving up one yacht matters much less to them than losing the wage from building that yacht matters to the workers who live off it.
So the yacht-buyers will more easily leave the market than the yacht-producers, which means the demand for yachts is more elastic than the supply, which means that the producers are hurt more by the tax.
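The slope comparison has a standard quantitative form: for a small per-unit tax, the share paid by consumers is approximately the supply elasticity divided by the sum of the two elasticities (in magnitude), and producers pay the rest. A sketch, with hypothetical yacht-market elasticities:

```python
from __future__ import annotations


def tax_incidence(supply_elasticity: float, demand_elasticity: float) -> tuple[float, float]:
    """Approximate (consumer_share, producer_share) of a small per-unit tax.

    Uses the textbook approximation consumer share = Es / (Es + |Ed|).
    Elasticities are converted to magnitudes, so a negative demand elasticity is fine.
    """
    es, ed = abs(supply_elasticity), abs(demand_elasticity)
    if es + ed == 0:
        raise ValueError("at least one elasticity must be nonzero")
    consumer_share = es / (es + ed)
    return consumer_share, 1 - consumer_share


if __name__ == "__main__":
    # Hypothetical numbers: yacht buyers switch easily, yacht builders cannot.
    consumers, producers = tax_incidence(supply_elasticity=0.5, demand_elasticity=-3.0)
    print(f"consumers bear {consumers:.0%}, producers bear {producers:.0%}")
```

With these made-up numbers consumers bear about 14% of the tax and the production side about 86%, which is the yacht story in miniature.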
The point is, the model works! It makes weird-sounding and unintuitive predictions, and it turns out to be right. Literally just draw two lines and assess their relative slopes, and you can understand why a tax will sometimes burn consumers and other times burn producers. (You can also do better than the US government in 1990 apparently, but maybe this shouldn’t be surprising)
A simple model of our economy as a bunch of supply and demand curves with varying elasticities has enormous explanatory power. This is a breathtakingly simple model of a breathtakingly complex system. And it tells us something important about the world that it works at all.
Okay, enough fun with econ. All of this was just to say that I feel thoroughly rebutted in my old view that things like interactions of humans are too complex to be understood by anybody. So we have our mystery: how does simplicity arise out of complexity?
Here’s my attempt at an answer: simplicity arises when the universe is playing an optimization game with a simple target.
If every few seconds God scanned the universe, erased the least macroscopically circular shapes, and duplicated the rest, then you would quickly expect the universe to consist of only circles. More to the point, it would quickly become possible to accurately model the universe as a bunch of circles of various sizes at various locations.
The clearest real world example of something like this is natural selection. Natural selection is a process that is optimizing biological systems for a simple target – reproductive fitness. It kills off variation and only lets those few forms that are able to reproduce successfully survive into the next generation.
In this sense, natural selection prunes down the complexity of the world, replacing the incomprehensible with the comprehensible. What was initially a high-entropy system, describable only at the level of fundamental physics, becomes a low-entropy system, describable by a few simple biological principles. Instead of having to describe the organism in full glorious detail at the level of quarks and electrons, we just need to explain how it won the optimization game of natural selection.
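A toy version of this pruning, assuming a one-dimensional "trait" and fitness measured as closeness to a single target value: each generation the fitter half survives and leaves two slightly mutated copies. This is a hypothetical illustration of truncation selection, not a model of real biology.

```python
from __future__ import annotations

import random
import statistics


def evolve(pop_size: int = 1000, generations: int = 30, target: float = 0.0,
           mutation_sd: float = 0.01, seed: int = 1) -> list[float]:
    """Toy truncation selection. Returns the population's standard deviation
    at the start and after each generation."""
    rng = random.Random(seed)
    population = [rng.uniform(-10, 10) for _ in range(pop_size)]
    spreads = [statistics.pstdev(population)]
    for _ in range(generations):
        # Fitness is just closeness to one simple target; the fitter half survives.
        population.sort(key=lambda x: abs(x - target))
        survivors = population[: pop_size // 2]
        # Each survivor leaves two offspring with a small random mutation.
        population = [x + rng.gauss(0, mutation_sd) for x in survivors for _ in range(2)]
        spreads.append(statistics.pstdev(population))
    return spreads


if __name__ == "__main__":
    spreads = evolve()
    print(f"spread at start: {spreads[0]:.2f}, after 30 generations: {spreads[-1]:.4f}")
```

The spread starts near 5.8 (a uniform draw on [-10, 10]) and collapses within a few dozen generations to roughly the size of the mutation noise, so a population that began as a smear of arbitrary values becomes describable by a single number: the target.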
Gravity gives us another example of an optimization game our universe plays. Once you get enough mass in one place, gravity will crush it inward towards the center of mass, gradually inching diverse macroscopic shapes towards sphericity.
Which is why every large object you’ll see in the sky looks perfectly spherical. Any large objects that started off clunky and non-spherical were ruthlessly optimized into sphericity. (Actually they are oblate spheroids, but that’s because technically the optimization game they’re playing is gravity + angular momentum)
So why do supply and demand curves do a great job at predicting interactions between massive numbers of humans? The implied answer is that humans are the result of an optimization game that has made our behaviors simply describable in terms of supply and demand curves.
What exactly does this mean? Perhaps a trait that enhances reproductive fitness in organisms like us is the cognitive skill to make tradeoffs between different desires, and this gives rise to some type of universal comparison metric between very different goods. Now we can sensibly say things like “I want ice cream less than I want to enjoy a beautiful sunset. Except orange custard chocolate chip ice cream. I’d trade off the sunset for orange custard chocolate chip ice cream any day.”
Then somebody comes along with a bright idea called ‘money’, and suddenly we have a great generalization about human behavior: “Everybody wants more money.” From this, some basic notions like a downward-sloping demand curve, an upward-sloping supply curve, and a push towards equilibrium follow quite nicely. And we have a crazily simple high-level explanation of the crazily complex phenomenon of human interaction.