Six Case Studies in Consequentialist Reasoning

Consequentialism is a family of moral theories that say that an act is moral or immoral based on its consequences. If an act has overall good consequences then it is moral, and if it has bad consequences then it is immoral. What precisely counts as a “good” or “bad” consequence is what distinguishes one consequentialist theory from another. For instance, act utilitarians say that the only morally relevant feature of the consequences of our actions is the aggregate happiness and suffering produced, while preference utilitarians say that the relevant feature of the consequences is the number and strength of desires satisfied. Another form of consequentialism might strike a balance between aggregate happiness and social equality.

What all these different consequentialist theories have in common is that the ultimate criterion used to evaluate the moral status of an action is a function only of the consequences of that action, as opposed to, say, the intentions behind the action, or whether the action is an instance of a universalizable Kantian rule.

In this essay, we’ll explore some puzzles in consequentialist theories that force us to take a more nuanced and subtle view of consequentialism. These puzzles are all adapted from Derek Parfit’s Reasons and Persons, with very minor changes.

First, we’ll consider a simple puzzle regarding how exactly to evaluate the consequences of one’s actions, when one is part of a collective that jointly accomplishes some good.

Case 1: There are 100 miners stuck in a mineshaft with flood waters rising. These men can be brought to the surface in a lift raised by weights on long levers. The leverage is such that just four people can stand on a platform and provide sufficient weight to raise the lift and save the lives of the hundred men. But if any fewer than four people stand on the platform, it will not be enough to raise the lift. As it happens, you and three other people are standing nearby. The four of you stand on the platform, raising the lift and saving the lives of the hundred men.

The question for us to consider is, how many lives did you save by standing on the platform? The answer to this question matters, because to be a good consequentialist, each individual needs to be able to compare their contribution here to the contribution they might make by going elsewhere. As a first thought, we might say that you saved 100 lives by standing on the platform. But the other three people were in the same position as you, and it seems a little strange to say that all four of you saved 100 lives each (since there weren’t 400 lives saved total). So perhaps we want to say that each of you saved one quarter of the total: 25 lives each.

Parfit calls this the Share-of-the-Total View. We can characterize this view as saying that in general, if you are part of a collective of N people who jointly save M lives, then your share of lives saved is M/N.

There are some big problems with this view. To see this, let’s amend Case 1 slightly by adding an opportunity cost.

Case 2: Just as before, there are 100 miners stuck in a mineshaft with flood waters rising, and they can be saved by four or more people standing on a platform. This time though, you and four other people happen to be standing there. The other four are going to stand on the platform no matter what you do. Your choice is either to stand on the platform, or to go elsewhere to save 10 lives. What should you do?

The correct answer here is obviously that you should leave to save the 10 lives. The 100 miners will be saved whether you stay or leave, and the 10 lives will be lost if you stick around. But let’s consider what the Share-of-the-Total View says. According to this view, if you stand on the platform, your share of the lives saved is 100/5 = 20. And if you leave to go elsewhere, you only save 10 lives. So you save more lives by staying and standing on the platform!

This is a reductio of the Share-of-the-Total View. We must revise this view to get a sensible consequentialist theory. Parfit’s suggestion is that when you join others who are doing good, the good that you do is not just your own share of the total benefit. You should also add to your share the change that your joining causes in the shares of each of the others. On their own, the four would each have a share of 25 lives. By joining, you take a share of 20 lives, but you also reduce the share of each of the other four by 5. So the good you do by joining is 20 – 5(4) lives, in other words, 0 lives. And of course, this is the right answer, because you have done nothing at all by stepping onto the platform!

Applying our revised view to Case 1, we see that if you hadn’t stepped onto the platform, zero lives would have been saved. By stepping onto the platform, 100 lives are saved. So your share of those lives is 25, plus 25 for each of the others, whose shares would have been zero without you. So your share is actually 100 lives! The same applies to the others, so on our revised view, each of the four is responsible for saving all 100 lives. Perhaps on reflection this is not so unintuitive; after all, it’s true for each of them that if they had changed their behavior, 100 lives would have been lost.
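To make the arithmetic explicit, here is a tiny sketch of the revised view as I’ve described it; the function is just my own formalization of “your share plus the change you cause in everyone else’s shares.”

```python
# Parfit's revised view, as described above: your contribution is your share
# of the benefit plus the change your joining causes in the others' shares.
def revised_contribution(benefit_with_you, n_with_you, benefit_without_you):
    n_others = n_with_you - 1
    share_with_you = benefit_with_you / n_with_you
    share_without_you = benefit_without_you / n_others if n_others else 0.0
    return share_with_you + n_others * (share_with_you - share_without_you)

# Case 1: without you, the other three can't raise the lift and nobody is saved.
print(revised_contribution(100, 4, 0))     # 100.0
# Case 2: the other four save the 100 miners with or without you.
print(revised_contribution(100, 5, 100))   # 0.0
```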

Case 3: Just as in Case 2, there are 100 miners stuck in a mineshaft. You and four others are standing on the platform while the miners are slowly being raised up. Each of you knows of an opportunity to save 10 lives elsewhere (a different 10 lives for each of you), but to successfully save those lives you have to leave immediately, before the miners are rescued. The five of you have to make your decision right away, without communicating with each other.

We might think that if each of the five of you reasons as before, each of you will go off and save the other 10 lives (since by staying, each sees that they are saving zero lives). In the end, 50 lives will be saved and 100 lost. This is not good! But in fact, it’s not totally clear that this is the fault of our revised view. The problem here is lack of information. If each of the five knew what the other four planned on doing, then they would make the best decision (if all four planned to stay then the fifth would leave, and if one of the other four planned to leave then the fifth would stay). As things stand, perhaps the best outcome would be that all five stay on the platform (losing the opportunity to save 10 extra lives, but ensuring the safety of the 100). If they can use a randomized strategy, then the optimal strategy is for each to stay on the platform with probability 97.2848%, saving an expected 100.66 lives.
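As a check on that last figure, here is a small numerical sketch under the same assumptions as the case itself: each of the five independently stays with probability p, at least four must stay to save the 100 miners, and each person who leaves saves 10 other lives.

```python
# Numerical check of the optimal stay-probability in Case 3.
from math import comb

def expected_lives(p, n=5, needed=4, miners=100, per_leaver=10):
    # Probability that at least `needed` of the n people stay on the platform.
    p_enough = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(needed, n + 1))
    # Each person who leaves saves `per_leaver` lives elsewhere.
    return miners * p_enough + per_leaver * n * (1 - p)

best = max((i / 100_000 for i in range(100_001)), key=expected_lives)
print(best, expected_lives(best))   # ~0.9728 and ~100.66
```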


Let’s move on to another type of scenario.

Case 4: X and Y simultaneously shoot and kill me. Either shot, by itself, would have killed.

The consequence of X’s action is not that I die, because if X had not shot, I would have died by Y’s bullet. And the same goes for Y. So if we’re evaluating the morality of X’s or Y’s action based on its consequences, it seems that we have to say that neither one did anything immoral. But of course, the two of them collectively did do something immoral by killing me. What this tells us is that the consequentialist’s creed cannot be “an act is immoral if its consequences are bad”; an act can also be immoral if it is part of a set of acts whose collective consequences are bad.

Inheriting immorality from group membership has some problems, though. X and Y collectively did something immoral. But what about the group X, Y, and Barack Obama, who was napping at home when this happened? The collective consequences of their actions were bad as well. So did Obama do something immoral too? No. We need to restrict our claim to the following:

“When some group together harm or benefit other people, this group is the smallest group of whom it is true that, if they had all acted differently, the other people would not have been harmed, or benefited.” -Parfit

Our final pair of cases involves the morality of actions that produce imperceptible consequences.

Case 5: One million torturers stand in front of one million buttons. Each button, if pushed, induces a tiny stretch in each of a million racks, each of which has a victim on it. The stretch induced by a single press of the button is so minuscule that it is imperceptible. But the stretch induced by a million button presses produces terrible pain in all the victims.

Clearly we want to say that each torturer is acting immorally. But the problem is that the consequences of each individual torturer’s action are imperceptible! It’s only when enough of the torturers press the button that the consequence becomes perceptible. So what we seem to be saying is that it’s possible to act immorally, even though your action produces no perceptible change in anybody’s conscious experience, if your action is part of a collection of actions that together produce negative changes in conscious experiences.

This is already unintuitive. But we can make it even worse.

Case 6: Consider the final torturer of the million. At the time that he pushes his button, the victims are all in terrible agony, and his press doesn’t make their pain any perceptibly worse. Now, imagine that instead of there being 999,999 other torturers, there are zero. There is just the one torturer, and the victims have all awoken this morning in immense pain, caused by nobody in particular. The torturer presses the button, causing no perceptible change in the victims’ conditions. Has the torturer done something wrong?

It seems like we have to say the same thing about the torturer in Case 6 as we did in Case 5. The only change is that Nature has done the rest of the harm instead of other human beings, but this can’t matter for the morality of the torturer’s action. But if we believe this, then the scope of our moral concerns is greatly expanded, to a point that seems nonsensical. My temptation here is to say “all the worse for consequentialism, then!” and move to a theory that inherently values intentions, but I am curious if there is a way to make a consequentialist theory workable in light of these problems.


Will we ever build a Galactic Empire?

I devoured science fiction as a kid. One of my favorite books was Asimov’s Foundation, which told the story of a splintering Galactic Empire and the seeds of a new civilization that would grow from its ashes. I marveled and was filled with joy at the notion that in our far future humanity might spread across the galaxy, settling countless new planets and bringing civilization to every corner of it.

Then I went to college and learned physics, especially special relativity and cosmology, and learned that there are actually some hugely significant barriers in the way of these visions ever being manifest in reality. Ever since, it has seemed obvious to me that the idea of a galactic civilization, or of humans “colonizing the universe”, is pure fantasy. But I am continuously surprised when I hear those interested in futurism throw around these terms casually, as if their occurrence were an inevitability so long as humans stay around long enough. This is deeply puzzling to me. Perhaps these people are just being loose with their language, and when they refer to a galactic civilization, they mean something much more modest, like a loosely-connected civilization spread across a few nearby stars. Or maybe there is a fundamental misunderstanding of both the limitations imposed on us by physics and the enormity of the scales in question. Either way, I want to write a post explaining exactly why these science fiction ideas are almost certainly never going to become reality.

My argument in brief:

  1. We are really slow.
  2. The galaxy is really big.
  3. The universe is even bigger.

Back in 1977, humans launched the Voyager 1 spacecraft, designed to explore the outer solar system and then to head for the stars. Its trajectory was coordinated to slingshot around Jupiter and then Saturn, picking up speed at each stage, and then finally to launch itself out of the solar system into the great beyond. 35 years later, in 2012, it finally passed the Sun’s heliopause, marking our first steps into interstellar space. It is now 11.7 billion miles from Earth, which sounds great until you realize that this distance is still less than two-tenths of a single percent of one light year.

Compare this to, say, the distance to the nearest star system, Alpha Centauri: Voyager has traveled less than .05% of the way there. At this rate, it would take another 80,000 years to make contact with Alpha Centauri (if it were aimed in that direction)! On the scale of distances between stars, our furthest current exploration has gotten virtually nowhere. It’s the equivalent of somebody who started in the center of the Earth hoping to burrow all the way up to the surface, and who takes 5 days to travel a single meter. Over 42 years, they would have travelled just under 3000 meters.

OK, you say, but that’s unfair. Voyager was designed in the 70s! Surely we could do better with a modern spacecraft. To which my reply is: sure, we can do better now, but not better enough. The fastest object humans have ever built is the Helios 2 probe, which hit a speed of 157,078 mph at its fastest (.023% of the speed of light). If we packed some people into a craft like that (which we couldn’t) and sent it off towards the nearest star (assuming that it stayed at this max speed the entire journey), guess how long it would take? Over 18 thousand years.

And keep in mind, this is only talking about the nearest star, let alone spreading across the galaxy! It should be totally evident to everybody that the technology we would need to be able to reach the stars is still far, far away. But let’s put aside the limits of our current technology. We are still in our infancy as a space-faring species in many ways, and it is totally unfair to treat our current technology level as if it’s something set in stone. Let’s give the futurists the full benefit of the doubt, and imagine that in the future humans will be able to harness incredible quantities of energy that vastly surpass anything we have today. If we keep increasing our energy capacity more and more, is there any limitation on how fast we could get a spacecraft going? Well, yes there is! There is a fundamental cosmic speed limit built into the structure of the universe, which is of course the speed of light. The speed-vs-energy curve has a vertical asymptote at this value; no finite amount of energy can get you past this speed.

So much for arbitrarily high speeds. What can we do with spacecraft traveling near the speed of light? It turns out, a whole lot more! Suppose we can travel at 0.9 times the speed of light, and grant also that the time to accelerate up to this speed is insignificant. Now it only takes us 4.7 years to get to the nearest star! The next closest star takes us 6.6 years. Next is 12.3 years. This is not too bad! Humans in the past have made years-long journeys to discover new lands. Surely we could do it again to settle the stars.

But now let’s consider the trip to the center of the galaxy. At 90% of the speed of light, this journey would take 29,000 years. As you can see, there’s a massive difference between jumping to nearby stars and attempting to actually traverse the galaxy. The trouble is that most people don’t realize just how significant this change in distance scale is. When you hear “distance to Alpha Centauri” and “distance to the center of the Milky Way”, you are probably not intuitively grasping how hugely different these two quantities are. Even if we had a ship traveling at 99.99% of the speed of light, it would take over 100,000 years to travel the diameter of the Milky Way.

You might be thinking “Ok, that is quite a long time. But so what! Surely there will be some intrepid explorers that are willing to leave behind the safety of settled planets to bring civilization to brand new worlds. After all, taking into account time dilation, a 1000 year journey from Earth’s perspective only takes 14 years to pass from the perspective of a passenger on a ship traveling at 99.99% of the speed of light.”

And I accept that! It doesn’t seem crazy that if humans develop ships that travel at a significant fraction of the speed of light, we could end up with human cities scattered all across the galaxy. But now the issue is with the idea of a galactic CIVILIZATION. Any such civilization faces a massive problem, which is the issue of communication. Even having a shared society between Earth and a planet around our nearest star would be enormously tricky, since any information sent from one planet to the other would take more than four years to arrive. The two societies would be perpetually four years behind each other, and this raises some serious issues for any central state attempting to govern both. And it’s safe to say that no trade can exist between two planets that have a 10,000 year delay in communication. Nor can any diplomacy, or common leadership, or shared culture or technology, or ANY of the necessary prerequisites for a civilization.

I think that for these reasons, it’s evident that the idea of a Galactic Empire like Asimov’s could never come into being. The idea of such a widespread civilization must rely on an assumption that humans will at some point learn to send faster than light messages. Which, sure, we can’t rule out! But everything in physics teaches us that we should be betting very heavily against it. Our attitude towards the idea of an eventual real life Galactic Empire should be similar to our attitude towards perpetual motion machines, as both rely on a total rewriting of the laws of physics as we know them.

But people don’t stop at a Galactic Empire. Not naming names, but I hear people that I respect throwing around the idea that humans can settle the universe. This is total madness. The jump in distance scale from the diameters of galaxies to the size of the observable universe is 40 times larger than the jump in distance scale we previously saw from nearby stars to galactic diameters. The universe is really, really big, and as it expands, every second more of it is vanishing from our horizon of observable events.

Ok, so let’s take the claim to be something much more modest than the literal ‘settle galaxies all across the observable universe’. Let’s take it that they just mean that humans will spread across our nearest neighborhood of galaxies. The problem with this is again that the distance scale in question is just unimaginably large. Clearly we can’t have a civilization across galaxies (as we can’t even have a civilization across a single galaxy). But now even making the trip is a seemingly insurmountable hurdle. The Andromeda Galaxy (the nearest spiral galaxy to the Milky Way) is 2.5 million light years away. Even traveling at 99.99% of the speed of light, this would be a 35,000 year journey for those within the ship. From one edge of the Virgo Supercluster (our cluster of neighboring galaxies) to the other takes 110 million years at the speed of light. To make this trip doable in a single lifetime requires a speed of 99.999999999999% of c (which would result in the trip taking 16 years inside the spacecraft). The kinetic energy required to get a mass m to these speeds is given by KE = (γ − 1)mc². If our ship and all the people on it have a mass of, say, 1000 kg, the required energy ends up being approximately the entire energy output of the Sun per second.
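For anyone who wants to check these numbers, here is a quick sketch of the calculations. The solar output of roughly 3.8 × 10²⁶ joules per second is a standard figure I’m supplying; everything else comes from the distances and speeds quoted above.

```python
# A quick check of the relativistic figures quoted above.
import math

C = 299_792_458                    # speed of light, m/s
SUN_JOULES_PER_SECOND = 3.8e26     # approximate solar luminosity

def gamma(beta):
    """Lorentz factor for a ship moving at beta times the speed of light."""
    return 1 / math.sqrt(1 - beta**2)

# A 1000-year journey (Earth frame) at 0.9999c, as experienced on the ship:
print(1000 / gamma(0.9999))        # ~14 years

# Crossing the Virgo Supercluster (110 million light years) at 99.999999999999% of c:
beta = 1 - 1e-14
print(110e6 / gamma(beta))         # ~16 years of ship time

# Kinetic energy of a 1000 kg ship at that speed, KE = (gamma - 1) * m * c^2,
# compared with one second of the Sun's output:
ke = (gamma(beta) - 1) * 1000 * C**2
print(ke / SUN_JOULES_PER_SECOND)  # ~1.7, i.e. on the order of one second of sunlight
```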

Again, it’s possible that if humans survive long enough and somehow get our hands on the truly enormous amounts of energy required to get this close to the speed of light, then we could eventually have human societies springing up in nearby galaxies, in total isolation from one another. But (1) it’s far from obvious that we will ever be capable of mastering such enormous amounts of energy, and (2) I think that this is not what futurists are visualizing when they talk about settling the universe.

Let me just say that I still love science fiction, love futurism, and stand in awe when I think of all the incredible things the march of technological progress will likely bring us in the future. But this seems to be one area where the way that our future prospects are discussed is really far off the mark, and where many people would do well to significantly adjust their estimates of the plausibility of the future trajectories they are imagining. Though science and future technology may bring us many incredible new abilities and accomplishments as a species, a galactic or intergalactic civilization is almost certainly not one of them.

Earning to give or waiting to give

Merry Christmas! In the spirit of the season, let’s talk about altruism.

Anybody familiar with the effective altruism movement knows about the concept of earning to give. The idea is that for some people, the ideal altruistic career path might involve making lots of money in a non-charitable role and then donating a significant fraction of it to effective charities. Estimates from GiveWell put the current lowest cost to save a life at around $2,300, which indicates that choosing a soulless corporate job that allows you to donate $150,000 a year (the equivalent of saving, on average, 65 lives per year, or more than a life per week!) could actually be better than choosing a directly altruistic career.

What I’m curious about lately is the concept of earning to save up to give. At the time of writing, the rate of return on US treasury bills is 1.53%. US treasury bills are considered to be practically riskless – you get a return on your investment no matter what happens to the economy. Assuming that this rate of return and the $2,300 figure both stay constant, what this means is that by holding off on donating your $150,000 for five years, you expect to be able to donate about $11,830 more, which corresponds to saving 5 extra lives. And if you hold off for twenty years, you will be able to save 23 more lives!

If you choose to take on bigger risk, you can do even better than this. Instead of investing in treasury bills, you could put your money into a diversified portfolio of stocks and expect to get a rate of return of around 7% (the average annualized total return of the S&P 500 for the past 90 years, adjusted for inflation). Now you run the risk of the stock market being in a slump at the end of five years, but even if that happens you can just wait it out longer until the market rises again. In general, as the time horizon over which you’re willing to withdraw your investment expands, your risk drops. If you invest your $150,000 in stocks and withdraw after five years, you expect to save 26 more lives than you would by donating right away!
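Here is a quick sketch of that arithmetic, holding the $2,300-per-life figure and the rates of return fixed, as the estimates above do.

```python
# Extra lives saved by investing a donation for some years before giving.
def extra_lives(amount=150_000, rate=0.0153, years=5, cost_per_life=2_300):
    extra_money = amount * ((1 + rate) ** years - 1)
    return extra_money / cost_per_life

print(extra_lives(rate=0.0153, years=5))    # ~5 extra lives (treasury bills)
print(extra_lives(rate=0.0153, years=20))   # ~23 extra lives
print(extra_lives(rate=0.07, years=5))      # ~26 extra lives (stocks at ~7%)
```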

In general, you can choose your desired level of risk and get the best available rate of return by investing in an appropriate linear combination of treasury bills and stocks. And if you want a higher rate of return than 7%, you can take on more risk by leveraging the market (short selling the risk-free asset and using the money to invest more in stocks). In this model, the plan for maximizing charitable giving would be to continuously invest your income at your chosen level of risk, donating nothing until some later date at which time you make an enormous contribution to your choice of top effective charities. In an extreme case, you could hold off on donations your entire life and then finally make the donation in your will!

Alright, so let’s discuss some factors in favor and against this plan.

Factors in favor
Compounding interest
Decreasing moral and factual uncertainty

Factors against
Shrinking frontier of low hanging fruit
Personal moral regression
Bad PR
Future discounting
Infinite procrastination

Compounding interest and vanishing low hanging fruits

First, and most obviously, waiting to give means more money, and more money means more lives. And since your investment grows exponentially, being sufficiently patient can mean large increases in amount donated and lives saved. There’s a wrinkle in this argument though. While your money grows with time, it might be that the effective cost of improving the world grows even quicker. This could result from the steady improvement of the world: problems are getting solved and low hanging altruistic fruits are being taken one at a time, leaving us with a set of problems that take more money to solve, making them less effective in terms of impact per dollar donated. If the benefit of waiting is a 5% annual return on your investment, but the cost is a 6% decrease in the effective number of lives saved per dollar donated, then it is best to donate as soon as possible, so as to ensure that your dollars have the greatest impact.

Estimates of the trends in effectiveness of top charities are hard to come by, but crucially important for deciding when to give and when to wait.

Moral and factual uncertainty

Say that today you donate 50,000 dollars to Charity X, and tomorrow an exposé comes out revealing that this charity is actually significantly less effective than previously estimated. That money is gone now, and you can’t redirect it to a better charity in response to this new information. But if you had waited to donate, you would be able to respond and more accurately target your money to the most effective charities. The longer you wait to donate, the more time there is to gather the relevant data, analyze it, and come to accurate conclusions about the effectiveness of charities. This is a huge benefit of waiting to give.

That was an example of factual uncertainty. You might also be concerned about moral uncertainty, and think that in the future you will have better values than today. For instance, you may think that your future values, after having had more time to reflect and consider new arguments, will be more rational and coherent than your current values. This jumps straight into tricky meta-ethical territory; if you are a moral realist then it makes sense to talk about improving your values, but as a non-realist this is harder to make sense of. Regardless, there certainly is a very common intuition that individuals can make moral progress, and that we can “improve our values” by thinking deeply about ethics.

William MacAskill has talked about a related concept, applied on a broader civilizational level. Here’s a quote from his 80,000 Hours interview:

Different people have different sets of values. They might have very different views for what an optimal future looks like. What we really want ideally is a convergent goal between different sorts of values so that we can all say, “Look, this is the thing that we’re all getting behind that we’re trying to ensure that humanity…” Kind of like this is the purpose of civilization. The issue, if you think about purpose of civilization, is just so much disagreement. Maybe there’s something we can aim for that all sorts of different value systems will agree is good. Then, that means we can really get coordination in aiming for that.

I think there is an answer. I call it the long reflection, which is you get to a state where existential risks or extinction risks have been reduced to basically zero. It’s also a position of far greater technological power than we have now, such that we have basically vast intelligence compared to what we have now, amazing empirical understanding of the world, and secondly tens of thousands of years to not really do anything with respect to moving to the stars or really trying to actually build civilization in one particular way, but instead just to engage in this research project of what actually is a value. What actually is the meaning of life? And have, maybe it’s 10 billion people, debating and working on these issues for 10,000 years because the importance is just so great. Humanity, or post-humanity, may be around for billions of years. In which case spending a mere 10,000 is actually absolutely nothing.

Personal moral regression

On the other side of the issue of changing values over time, we have the problem of personal moral regression. If I’m planning to save up for an eventual donation decades down the line, I might need to seriously worry about the possibility that when the time comes to donate I have become a more selfish person or have lost interest in effective altruism. Plausibly, as you age you might become more attached to your money or expect a higher standard of living than when you were younger. This is another of these factors that is hard to estimate, and depends a lot on the individual.

Bad PR

Earning to give is already a concept that draws criticism in some quarters, and I think that waiting to give may look worse in some ways. I could easily see the mainstream revile the idea of a community of self-proclaimed altruists that mostly sit around and build up wealth with the promise to donate it some time into the future. Tied in with this is the concern that by removing the signaling value of your form of altruism, some of the motivation to actually be altruistic in the first place is lost.

Future discounting

Economists commonly talk about temporal discounting, the idea of weighting current value higher than future value. Give somebody a choice between an ice cream today and two ice creams in a month, and they will likely choose the ice cream today. This indicates that people apply some discount rate to future value: a future reward must be made larger than a present one before they become indifferent between the two. This discount rate is often thought of purely descriptively, as a way to model a particular aspect of human psychology, but it is also sometimes factored into recommendations for policy.

For instance, some economists talk about a social discount rate, which represents the idea of valuing future generations less than the current generation. This discount rate actually also factors importantly into the calculation of the appropriate value of a carbon tax. Most major calculations of the carbon tax explicitly use a non-zero social discount rate, meaning that they assume that future generations matter less than current generations. For instance, William Nordhaus’s hugely influential work on carbon pricing used a “3 percent social discount rate that slowly declines to 1 percent in 300 years.”

I don’t think that this makes much moral sense. If you apply a constant discount rate to the future, you can end up saying things like “I’d rather get a nickel today than save the entire planet a thousand years from now.” It seems to me that this form of discounting is simply prejudice against those people who are most remote from us: those that are in our future and as such do not exist yet. This paper by Tyler Cowen and Derek Parfit argues against a social discount rate. From the paper:

Remoteness in time roughly correlates with a whole range of morally important facts. So does remoteness in space. Those to whom we have the greatest obligation, our own family, often live with us in the same building. We often live close to those to whom we have other special obligations, such as our clients, pupils, or patients. Most of our fellow citizens live closer to us than most aliens. But no one suggests that, because there are such correlations, we should adopt a spatial discount rate. No one thinks that we would be morally justified if we cared less about the long-range effects of our acts, at some rate of n percent per yard. The temporal discount rate is, we believe, as little justified.

Infinite procrastination

There’s one final consideration I want to bring up, which is more abstract than the previous ones. Earlier I imagined somebody who decides to save up their entire life and make their donation in their will. But we can naturally ask: why not set up your will to wait another twenty years and then donate to the top-rated charities from your estimate of the most reliable charity evaluator? And then why stop at just twenty years? Why not keep on investing, letting your money build up more and more, all to the end of making some huge future donation that confers an absolutely enormous benefit on some future generation? The concern is that this line of reasoning might never terminate.

While this is a fun thought experiment, I think there are a few fairly easy holes that can be poked in it. In reality, you will eventually run up against decreasing marginal value of your money. At some point, the extra money you get by waiting another year actually doesn’t go far enough to make up for the human suffering you could have prevented. Additionally, the issue of vanishing low hanging fruits will become more and more pressing, pushing you towards donating sooner rather than later.

Summing up

We can neatly sum up the previous considerations with the following formula.

V = Q − (1 − p)(1 − d)(1 + R)(Q + I − F)
If V > 0, then giving now is preferable to giving in a year. If V < 0, then you should wait for at least a year.

Q = Current lives saved per dollar
R = Rate of return on investment
I = Reflection factor: yearly increase in lives saved per dollar from better information and longer reflection
p = Regression factor: expected percentage less you are willing to give in a year
F = Low hanging fruit factor: yearly decrease in lives saved per dollar
d = Temporal discount factor: percentage less that lives are valued each year

The ideal setting for waiting to give is where F is near zero (the world’s issues are only very slowly being sorted out), I is large (time for reflection is sorely needed to bring empirical data and moral clarity), p is near zero (no moral regression), R is large (you get a high return on your investment), and d is near zero (you have little to no moral preference for helping current people over future people).
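To make the comparison concrete, here is a minimal sketch that evaluates the formula for one set of parameter values. None of these numbers are estimates; they are purely illustrative.

```python
# Give now vs. wait a year, using the formula above. Positive V favors giving now.
def V(Q, R, I, p, F, d):
    return Q - (1 - p) * (1 - d) * (1 + R) * (Q + I - F)

Q = 1 / 2300   # current lives saved per dollar
print(V(Q, R=0.07, I=0.05 * Q, p=0.02, F=0.02 * Q, d=0.0))
# Negative with these made-up numbers, so waiting a year comes out ahead.
```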

Diversification is magic

“The only free lunch in finance is diversification”

Harry Markowitz

Suppose you have two assets that you can invest in, and $1000 total to split between them. By the end of year 1, Asset A doubles in price and Asset B halves in price. And by the end of year 2, A halves and B doubles (so that they each return to their starting point).


If you had initially invested all $1000 in Asset A, then after year 1 you would have $2000 total. But after year 2, that $2000 is cut in half and you end up back at $1000. If you had invested the $1000 in Asset B, then you would go down to $500 after year 1 and then back up to $1000 after year 2. Either way, you don’t get any profit.

Now, the question is: can you find some way to distribute the $1000 across Assets A and B such that by the end of year 2 you have made a profit? For fairness’ sake, you cannot change your weights at the end of year 1 (as that would allow you to take advantage of the advance knowledge of how the prices will change). Whatever weights you choose initially for Assets A and B, at the end of year 1 you must rebalance your money so that it is still distributed with the exact same weights as before.

So what do you think? Does it seem impossible to make a profit without changing your distribution of money between years 1 and 2? Amazingly, the answer is that it’s not impossible; you can make a profit!

Consider a 50/50 mix of Assets A and B. So $500 initially goes into Asset A and $500 to Asset B. At the end of year 1, A has doubled in price (netting you $500) and B has halved (losing you $250). So at the end of year 1 you have gained 25%.

To keep the weights the same at the start of year 2 as they were at the start of year 1, you redistribute your new total of $1250 across A and B according to the same 50/50 mix ($625 in each). What happens now? Now Asset A halves (losing you $312.50) and Asset B doubles (gaining you $625). And by the end of year 2, you end up with $312.50 in A and $1250 in B, for a total of $1562.50! You’ve gained $562.50 by investing in two assets whose prices ultimately are the same as where they started!


This is the magic of diversification. By investing in multiple independent assets instead of just one, you can up your rate of return and decrease your risk, sometimes dramatically.
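Here is the same rebalancing calculation written out as a tiny sketch, just to make the bookkeeping explicit.

```python
# Rebalance a 50/50 mix of A and B each year; A doubles then halves, B does the opposite.
yearly_multipliers = [(2.0, 0.5), (0.5, 2.0)]   # (Asset A, Asset B)

total = 1000.0
for a_mult, b_mult in yearly_multipliers:
    a, b = total / 2, total / 2     # rebalance to equal weights
    total = a * a_mult + b * b_mult

print(total)   # 1562.50, even though each asset ends the two years where it started
```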

Another example: In front of you are two coins, A and B. You have $100 in hand to distribute between the two coins however you like, so long as at the end all $100 is on the table, with every dollar bill beside one coin or the other. Each coin will then be tossed. If a coin lands heads, the money beside it is multiplied and returned to you; if it lands tails, the money beside it is destroyed.

Coin A has a 60% chance of landing H. If it lands H, the amount of money placed beside it will be multiplied by 2.1 and returned to you. But if it lands T, then the money placed beside it will be lost.

Coin B has only a 40% chance of landing H. But the H outcome also has a higher reward! If the coin lands H, then the amount of money beside it will be multiplied by 2.5 and returned to you. And just like Coin A, if it lands T then the money beside it will be destroyed.

Coin A: .6 chance of getting 2.1x return
Coin B: .4 chance of getting 2.5x return

The coins are totally independent. How should you distribute your money in order to maximize your return, given a specific level of risk?

(Think about it for a moment before reading on.)

If you looked at the numbers for a few moments, you might have noticed that Coin A has a higher expected return than Coin B (126% vs 100%) and is also the safer of the two. So perhaps your initial guess was that putting everything into Coin A would minimize risk and maximize return. Well, that’s incorrect! Let’s do the math.

We’ll start by giving the relevant quantities some names.

X = amount of money that is put by Coin A
Y = amount of money that is put by Coin B
(All your money is put down, so Y = 100 – X)

We can easily compute the expected amount of money you end up with, as a function of X:

Expected return (X)
= (0.6)(0.4)(2.1X + 2.5Y) + (0.6)(0.6)(2.1X) + (0.4)(0.4)(2.5Y) + (0.4)(0.6)(0)
= 100 + .26 X

Alright, so clearly your expected return is maximized by making X as large as possible (by putting all of your money by Coin A). This makes sense, since Coin A’s expected return is higher than Coin B’s. But we’re not just interested in return, we’re also interested in risk. Could we possibly find a better combination of risk and reward by mixing our investments? It might initially seem like the answer is no; after all Coin A is the safer of the two. How could we possibly decrease risk by mixing in a riskier asset to our investments?

The key insight is that even though Coin A is the safer of the two, the risk of Coin B is uncorrelated with the risk of Coin A. If you invest everything in Coin A, then you have a 40% chance of losing it all. But if you split your investments between Coin A and Coin B, then you only lose everything if both coins come up tails (which happens with probability .4 × .6 = 24%, much lower than 40%!)

Let’s go through the numbers. We’ll measure risk by the standard deviation of the possible outcomes.

Risk(X)²
= (0.6)(0.4)(2.1X + 2.5Y − 100 − .26X)² + (0.6)(0.6)(2.1X − 100 − .26X)² + (0.4)(0.4)(2.5Y − 100 − .26X)² + (0.4)(0.6)(0 − 100 − .26X)²

This function is just a parabola (to be precise, Risk(X)² is a parabola, which means that Risk(X) is a hyperbola). Here’s a plot of Risk(X) vs X (amount placed beside A):

[Figure: Risk(X) vs X, the amount placed beside Coin A]

Looking at this plot, you can see that risk is actually minimized at a roughly even mix of A and B, with slightly more in A (around $59 on A and $41 on B). You can also see this minimum risk on a plot of return vs risk:

[Figure: expected return vs risk for mixes of the two coins]

Notice that somebody that puts most of their money on Coin B (these mixes are in the bottom half of the curve) is making a strategic choice that is strictly dominated. That is, they could choose a different mix that has a higher rate of return for the same risk!
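If you’d like to check the shape of that curve yourself, here is a small sketch that computes the expected return and risk for each allocation and finds the risk-minimizing mix.

```python
# Expected return and risk (standard deviation) of putting X dollars on Coin A
# and the remaining 100 - X dollars on Coin B.
def stats(X, total=100.0):
    Y = total - X
    outcomes = [
        (0.6 * 0.4, 2.1 * X + 2.5 * Y),   # A heads, B heads
        (0.6 * 0.6, 2.1 * X),             # A heads, B tails
        (0.4 * 0.4, 2.5 * Y),             # A tails, B heads
        (0.4 * 0.6, 0.0),                 # A tails, B tails: everything is lost
    ]
    mean = sum(p * v for p, v in outcomes)
    variance = sum(p * (v - mean) ** 2 for p, v in outcomes)
    return mean, variance ** 0.5

print(stats(100))                                    # all on A: return 126, risk ~103
print(stats(0))                                      # all on B: return 100, risk ~122
best_X = min(range(101), key=lambda X: stats(X)[1])
print(best_X, stats(best_X))                         # risk is lowest around X = 59
```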

Little did you know, but you’ve actually just gotten an introduction to modern portfolio theory! Instead of putting money beside coins, portfolio managers consider investing in assets with various risks and rates of return. The curve of reward vs risk is famous in finance as the Markowitz Bullet. The upper half of the curve is the set of portfolios that are not strictly dominated. This section of the curve is known as the efficient frontier, the basic idea being that no rational investor would put themselves on the lower half.

Let’s reframe the problem in terms that would be familiar to somebody in finance.

We have two assets, A and B. We’ll model our knowledge of the rate of return of each asset as a normal distribution with some known mean and standard deviation. The mean of the distribution represents the expected rate of return on a purchase of the asset, and the standard deviation represents the risk of purchasing the asset. Asset A has an expected rate of return of 1.2, which means that for every dollar you put in, you expect (on average) to get back $1.20 a year from now. Asset B’s expected rate of return is 1.3, so it has a higher average payout. But Asset B is riskier; the standard deviation for A is 0.5, while B’s standard deviation is 0.8. There’s also a risk-free asset that you can invest in, which we’ll call Asset F. This asset has an expected rate of return of 1.1.

Asset F: R = 1.1, σ = 0
Asset A: R = 1.2, σ = 0.5
Asset B: R = 1.3, σ = 0.8

Suppose that you have $1000 that you want to invest in some combination of these three assets in such a way as to maximize your expected rate of return and minimize your risk. Since rate of return and risk will in general be positively correlated, you have to decide the highest risk that you’re comfortable with. Let’s say that you decide that the highest risk you’ll accept is 0.6. Now, how much of each asset should you purchase?

First of all, let’s disregard the risk-free asset and just consider combinations of A and B. A portfolio of A and B is represented by a weighted sum of the two: w_A is the percentage of your investment in the portfolio that goes to A, and w_B is the percentage that goes towards B. Since we’re just considering a combination of A and B for now, w_A + w_B = 1. The mean and standard deviation of the new distribution for this portfolio will in general depend on the correlation between A and B, which we’ll call ρ. Perfectly correlated assets have ρ = 1, uncorrelated assets have ρ = 0, and perfectly anti-correlated assets have ρ = -1. Correlation between assets is bad for investors, because it destroys the benefit of diversification. If two assets are perfectly correlated, then you don’t get any lower risk by combining them. On the other hand, if they are perfectly anti-correlated, you can entirely cancel the risk from one with the risk from the other, and get fantastically low risks for great rates of return.

R_P = w_A R_A + w_B R_B
σ_P² = w_A² σ_A² + w_B² σ_B² + 2ρ w_A w_B σ_A σ_B

Let’s suppose that the correlation between A and B is ρ = .2. Since both R_P and σ_P are functions of w_A and w_B, we can visualize the set of all possible portfolios as a curve on a plot of return-vs-risk:

[Figure: return vs risk for portfolios of A and B (the Markowitz bullet)]

The Markowitz Bullet again! Each point on the curve represents the rate of return and risk of a particular portfolio obtained by mixing A and B. Just like before, some portfolios dominate others and thus should never be used, regardless of your desired level of risk. In particular, for any portfolio that weights asset A (the less risky one) too highly, there are other portfolios that give a higher rate of return with the exact same risk.

In other words, you should pretty much never purchase only a low-risk low-return item. If your portfolio consists entirely of Asset A, then by mixing in a little bit of the higher-risk item, you can actually end up massively decreasing your risk and upping your rate of return. Of course, this drop in risk is only because the two assets are not perfectly correlated. And it would be even more extreme if we had negatively correlated assets; indeed with perfect negative correlation (as we saw in the puzzle I started this post with), your risk can drop to zero!

Now, we can get our desired portfolio with a risk of 0.6 by just looking at the point on this curve that has σ_P = 0.6 and calculating which values of w_A and w_B give us this value. But notice that we haven’t yet used our riskless asset! Can we do better by adding in a little of Asset F to the mix? It turns out that yes, we can. In fact, every portfolio is weakly dominated by some mix of that portfolio and a riskless asset!

We can easily calculate what we get by combining a riskless asset with some other asset X (which can in general be a portfolio consisting of multiple assets):

R_P = w_X R_X + w_F R_F
σ_P² = w_X² σ_X² + w_F² σ_F² + 2ρ w_X w_F σ_X σ_F = w_X² σ_X² (since σ_F = 0)

So σ_P = w_X σ_X, from which we get that R_P = (σ_P/σ_X) R_X + (1 − σ_P/σ_X) R_F = R_F + σ_P (R_X − R_F)/σ_X

What we find is that R_P as a function of σ_P is just a line whose slope depends on the rate of return and risk of asset X. So essentially, for any risky asset or portfolio you choose, you can easily visualize all the possible ways it can be combined with a risk-free asset by stretching a line from (0, R_F) – the risk and return of the risk-free asset – to (σ_X, R_X) – the risk and return of the risky asset. We can even stretch the line beyond this second point by borrowing some of the risk-free asset in order to buy more of the risky asset, which corresponds to a negative weighting w_F.

So, we have a quadratic curve representing the possible portfolios obtained from two risky assets, and a line representing the possible portfolios obtained from a risky asset and a risk-free asset. What we can do now is consider the line that starts at (0, R_F) and just barely brushes against the quadratic curve – the tangent line to the curve that passes through (0, R_F). This point where the curves meet is known as the tangency portfolio.

[Figure: the tangent line from (0, R_F) to the Markowitz bullet, meeting it at the tangency portfolio]

Every point on this line is a possible combination of Assets A, B, and F. Why? Well, because the points on the line can be thought of as portfolios consisting of Asset F and the tangency portfolio. And here’s the crucial point: this line is above the curve everywhere except at that single point! What this means is that the combination of A, B, and F dominates combinations of just A and B. For virtually any level of desired risk, you do better by choosing a portfolio on the line than by choosing the portfolio on the quadratic curve! (The only exception to this is the point at which the two curves meet, and in that case you do equally well.)

And that is how you optimize your rate of return for a desired level of risk! First generate the hyperbola for portfolios made from your risky assets, then find the tangent to that curve that passes through the point representing the risk-free asset, and then use that line to calculate the optimal portfolio at your chosen level of risk!
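Here is a rough sketch of that whole recipe applied to the numbers above, finding the tangency portfolio by brute force rather than by any closed-form formula.

```python
import numpy as np

R_A, R_B, R_F = 1.2, 1.3, 1.1   # expected returns of A, B, and the risk-free asset
s_A, s_B = 0.5, 0.8             # risks (standard deviations) of A and B
rho = 0.2                       # correlation between A and B

def portfolio(w_A):
    """Risk and expected return of a mix of A and B with weight w_A on A."""
    w_B = 1 - w_A
    ret = w_A * R_A + w_B * R_B
    var = w_A**2 * s_A**2 + w_B**2 * s_B**2 + 2 * rho * w_A * w_B * s_A * s_B
    return np.sqrt(var), ret

# Tangency portfolio: the mix of A and B that maximizes the slope of the line
# drawn from the risk-free point (0, R_F).
weights = np.linspace(0, 1, 100_001)
risk, ret = np.array([portfolio(w) for w in weights]).T
w_tan = weights[np.argmax((ret - R_F) / risk)]
s_tan, r_tan = portfolio(w_tan)
print(f"tangency portfolio: {w_tan:.3f} in A, {1 - w_tan:.3f} in B")   # ~0.533 in A

# Portfolio at the desired risk of 0.6: hold the tangency portfolio scaled up or
# down with the risk-free asset. Here 0.6 exceeds the tangency risk (~0.5), so the
# weight is above 1, i.e. we borrow at the risk-free rate to buy more of it.
target = 0.6
w_risky = target / s_tan
print(f"hold {w_risky:.2f}x the tangency portfolio; expected return "
      f"{R_F + target * (r_tan - R_F) / s_tan:.3f}")                   # ~1.276
```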

 

Paradoxical Precommitment

If in the future we develop the ability to make accurate simulations of other humans, a lot of things will change for the weirder. In many situations where agents with access to simulations of each other interact, a strange type of apparent backward causality will arise. For instance…

Imagine that you’re competing in a prisoner’s dilemma against an agent that you know has access to a scarily accurate simulation of you. Prior to your decision to either defect or cooperate, you’re mulling over arguments for one course of action over the other. In such a situation, you have to continuously face the fact that for each argument you come up with, your opponent has already anticipated and taken measures to respond to it. As soon as you think up a new line of reasoning, you must immediately update as if your opponent has just heard your thoughts and adjusted their strategy accordingly. Even if your opponent’s decision has already been made and is set in stone, you have no ability to do anything that your opponent hasn’t already incorporated into their decision. Though you are temporally ahead of them, they are in some sense causally ahead of you. Their decision is a response to your (yet-to-be-made) decision, and your decision is not a response to theirs. (I don’t actually believe and am not claiming that this apparent backwards causality is real backwards causality. The arrow of time still points in only one direction and causality follows suit. But it’s worth it in these situations to act as if your opponent has backwards causation abilities.)

When you have two agents that both have access to simulations of the other, things get weird. In such situations, there’s no clear notion of whose decision is a response to the other (as both are responding to each other’s future decision), and so there’s no clear notion of whose decision is causally first. But the question of “who comes first” (in this strange non-temporal sense) turns out to be very important to what strategy the various agents should take!

Let’s consider some examples.

Chicken

Two agents are driving head-on towards each other. Each has a choice to swerve or to stay driving straight ahead. If they both stay, then they crash and die, the worst outcome for all. If one stays and the other swerves, then the one that swerves pays a reputational cost and the one that stays gains some reputation. And if both swerve, then neither gains or loses any reputation. To throw some numerical values on these outcomes, here’s a payoff matrix:

[Figure: payoff matrix for the game of chicken]

This is the game of chicken. It is an anti-cooperation game, in that if one side knows what the other is going to do, then they want to do the opposite. The (swerve, swerve) outcome is unstable, as both players are incentivized to stay if they know that their opponent will swerve. But so is the (stay, stay) outcome, as this is the worst possible outcome for both players and they both stand to gain by switching to swerve. There are two pure strategy Nash equilibria, (swerve, stay) and (stay, swerve), and one mixed strategy equilibrium (with the payoff matrix above, it corresponds to swerving with probability 90% and staying with probability 10%).
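As a sanity check on that 90/10 claim, here is a tiny sketch. The ±1 payoffs for the swerve/stay outcomes and the 0 for mutual swerving come from the setup described here and below; the crash payoff of −10 is my own assumption, chosen to match the stated equilibrium.

```python
# Symmetric mixed-strategy equilibrium of chicken: each player swerves with the
# probability p that makes the opponent indifferent between swerving and staying.
BOTH_SWERVE = 0.0      # neither gains nor loses reputation
SWERVE_VS_STAY = -1.0  # you swerve, they stay
STAY_VS_SWERVE = 1.0   # you stay, they swerve
CRASH = -10.0          # both stay (assumed value)

# Indifference: p*BOTH_SWERVE + (1-p)*SWERVE_VS_STAY = p*STAY_VS_SWERVE + (1-p)*CRASH
p = (CRASH - SWERVE_VS_STAY) / (CRASH - SWERVE_VS_STAY + BOTH_SWERVE - STAY_VS_SWERVE)
print(p)   # 0.9 -- swerve 90% of the time, stay 10%
```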

That’s all standard game theory, in a setting where you don’t have access to your opponent’s algorithm. But now let’s take this thought experiment to the future, where each player is able to simulate the other. Imagine that you’re one of the agents. What should you do?

The first thought might be the following: you have access to a simulation of your opponent. So you can just observe what the simulation of your opponent does, and do the opposite. If you observe the simulation swerving you stay, and if you observe the simulation staying you swerve. This has the benefit of avoiding the really bad (stay, stay) outcomes, while also exploiting opponents that decide to swerve.

The issue is that this strategy is exploitable. While you’re making use of your ability to simulate your opponent, you are neglecting the fact that your opponent is also simulating you. Your opponent can see that this is your strategy, so they know that whatever they decide to play, you’ll play the opposite. So if they decide to tear off their steering wheel to ensure that they will not swerve no matter what, they know that you’ll fall in line and swerve, thus winning them +1 utility and losing you -1 utility. This is a precommitment: a strategy that an agent uses that restricts the number of future choices available to them. It’s quite unintuitive and cool that this sort of tying-your-hands ends up being an enormously powerful weapon for those that have access to it.

In other words, if Agent 1 sees that Agent 2 is treating their decision as a fixed fact and responding to it accordingly, then Agent 1 gets an advantage, as they can precommit to staying and force Agent 2 to yield to them. But if Agent 2 now sees Agent 1 as responding to their algorithm rather than the other way around, then Agent 2 benefits by precommitting to stay. If there’s a fact about which agent precommits “first”, then we can conclusively say that this agent does better, as they can force the outcome they want. But again, this is not a temporal first. Suppose that Agent 2 is asleep at the wheel, about to wake up, and Agent 1 is trying to decide what to do. Agent 1 simulates them and sees that once they wake up they will tear out their steering wheel without even considering what Agent 1 does. Now Agent 1’s hand is forced; he will swerve in response to Agent 2’s precommitment, even though it hasn’t yet been made. It appears that for two agents in a chicken-like scenario, with access to simulations of one another, the best action is to precommit as quickly and firmly as possible, with as little regard for their opponents’ precommitments as they can manage (the best-performing agent is the one that tears off their steering wheel without even simulating their opponent and seeing their precommitments, as this agent puts themselves fully causally behind anybody that simulates them). But this obviously just leads straight to the (stay, stay) outcome!

This pattern of precommitting, then precommitting to not respond to precommitments, then precommitting to not respond to precommitments to not respond to precommitments, and so on, shows up all over the place. Let’s have another example, from the realm of economics.

Company Coordination and Boycotts

In my last post, I talked about the Cournot model of firms competing to produce a homogenous good. We saw that competing firms face a coordination problem with respect to the prices they set: every firm sees it in their rational self-interest to undercut other firms to take their customers, but then other firms follow suit, ending up with the price dropping for everybody. That’s good for consumers, but bad for producers! The process of undercutting and then re-equilibrating continues until the price is at the bare minimum that it takes for a producer to be willing to make the good – essentially just minutely above the cost of production. At this point, producers are making virtually no profit and consumer surplus is maximized.

This coordination problem, like all coordination problems, could be solved if only the firms had the ability to precommit. Imagine that the heads of all the companies meet up at some point. They all see the problem that they’re facing, and recognize that if they can stop the undercutting, they’ll all be much richer. So they sign on to a vow to never undercut each other. Of course, signing a piece of paper doesn’t actually restrict your future options. Every company is still just as incentivized as before to break the agreement and undercut their competitors. It helps if they have plausible deniability; the ability to say that their price drop was actually not intended to undercut, but a response to some unrelated change in the market. All that the meeting does is introduce some social cost to undercutting and breaking the vow that wasn’t there before.

To actually permanently fix the coordination problem, the companies need to be able to sign on to something that truly and irrevocably ties their hands, giving them no ability to back out later on (equivalent to the tearing-off-the-steering-wheel as a credible precommitment). Maybe they all decide to put some money towards the creation of a final-check mechanism that looks over all price changes and intervenes to stop any changes that it detects to be intended to undercut opponents. This is precommitment in the purer sense of literally removing an option that the firms previously had. And if this type of tying-of-hands was actually possible, then each company would be rationally incentivized to sign on! (Of course, they’d all be looking for ways to cheat the system and break the mechanism at every step, which would make its actual creation a tad bit difficult.)

So, if you give all companies the ability to jointly sign on to a credible precommitment to not undercut their opponents, then they will take that opportunity. This will keep prices high and keep profits flowing in to the companies. Producer surplus will be maximized, and consumers will get the short end of the stick. Is there any way for the consumers to fight back?

Sure there is! All they need is the ability to precommit as well. Suppose that all consumers are now given the opportunity to come together and boycott any and all companies that precommit to not undercutting each other. If every consumer signs on, and if producers know this, then it’s no longer worth it for them to put in place the price-monitoring mechanism, as they’d just lose all their customers! Of course, the consumers now face their own coordination problem; many of them will still value the product at a price higher than that which is being offered by the companies, even if they’re colluding. And each individual reasons that as long as everybody else is still boycotting the companies, it makes little difference if just one mutually beneficial trade is made with them. So the consumers will themselves face the problem of how to enforce the boycott. But let’s assume that the consumers work this out so that they credibly precommit to never buying from a company that credibly precommits to not undercutting its competitors. Now the market price swings back in their favor, dropping to the cost of production! The consumers win! Whoohoo!

But we’re not done yet. It was only worth it for the consumers to sign on to this precommitment because they predicted that the companies would respond to their precommitment. But what if the companies, seeing the boycott-tactic coming, credibly precommit to never yielding to boycotters? Then the consumers, responding to this precommitment, will realize that boycotting will have no effect on prices, and will just cause them all to lose out on mutually beneficial trades! So they won’t boycott, and therefore the producers get the surplus once more. And just like before, this swings back and forth, with the outcome at each stage depending on which agent treats the other agent’s precommitment as being more primal. But if they each run their apparently-best strategy (that is, making their precommitments with no regard to the precommitments of the other party so as to force their hand and place their own precommitments at the beginning of the causal chain), then we end up with the worst possible outcome for all: producers don’t produce anything and consumers don’t consume, and everybody loses out.

This question of how agents that can simulate one another AND precommit to courses of action should ultimately behave is something that I find quite puzzling and am not sure how to resolve.

Solving the Linear Cournot Model

The Cournot model is a simple economic model used to describe what happens when multiple companies compete with one another to produce some homogeneous product. I've been playing with it a bit and ended up solving the general linear case. I assume that this solution is already known by somebody, but I couldn't find it anywhere. So I will post it here! It gives some interesting insight into the way that less-than-perfectly-competitive markets operate. First let's talk about the general structure of the Cournot model.

Suppose we have n firms. Each produces some quantity of the product, which we’ll label as q_1, q_2, ..., q_n . The total amount of product on the market will be given the label Q = q_1 + q_2 + ... + q_n . Since the firms are all selling identical products, it makes sense to assume that the consumer demand function P(q_1, q_2, ..., q_n) will just be a function of the total quantity of the product that is on the market: P(q_1, q_2, ..., q_n) = P(q_1 + q_2 + ... + q_n) = P(Q). (This means that we’re also disregarding effects like customer loyalty to a particular company or geographic closeness to one company location over another. Essentially, the only factor in a consumer’s choice of which company to go to is the price at which that company is selling the product.)

For each firm, there is some cost to producing the good. We capture this by giving each firm a cost function C_1(q_1), C_2(q_2), ..., C_n(q_n) . Now we can figure out the profit of each firm for a given set of output values q_1, q_2, ..., q_n . We’ll label the profit of the kth firm as \Pi_k . This profit is just the amount of money they get by selling the product minus the cost of producing the product: \Pi_k = q_k P(Q) - C_k(q_k).

If we now assume that all firms are maximizing profit, we can find the outputs of each firm by taking the derivative of the profit and setting it to zero. \frac{d\Pi_k}{dq_k} = P(Q) + q_k \frac{dP}{dQ} - \frac{dC_k}{dq_k} = 0. This is a set of n equations with n unknowns, so solving it will fully specify the behavior of all firms!

Of course, without any more assumptions about the functions P and C_k , we can’t go too much further with solving this equation in general. To get some interesting general results, we’ll consider a very simple set of assumptions. Our assumptions will be that both consumer demand and producer costs are linear. This is the linear Cournot model, as opposed to the more general Cournot model.

In the linear Cournot model, we write that P(Q) = a - bQ (for some a and b) and C_k(q_k) = c_k q_k . As an example, we might have that P(Q) = $100 – $2 × Q, which would mean that at a price of $40, 30 units of the good will be bought total.

[Figure: example linear demand curve]

The constants c_k represent the marginal cost of production for each firm, and the linearity of the cost function means that the cost of producing the next unit is always the same, regardless of how many have been produced before. (This is unrealistic, as generally it’s cheaper per unit to produce large quantities of a good than to produce small quantities.)

Now we can write out the profit-maximization equations for the linear Cournot model. \frac{d\Pi_k}{dq_k} = P(Q) + q_k \frac{dP}{dQ} - \frac{dC_k}{dq_k} = a - bQ - b q_k - c_k = 0 . Rewriting, we get q_k + Q = \frac{a - c_k}{b}. We can’t immediately solve this for q_k , because remember that Q is the sum of all the quantities produced. All n of the quantities we’re trying to solve are in each equation, so to solve the system of equations we have to do some linear algebra!

2q_1 + q_2 + q_3 + ... + q_n = \frac{a - c_1}{b} \\ q_1 + 2q_2 + q_3 + ... + q_n = \frac{a - c_2}{b} \\ q_1 + q_2 + 2q_3 + ... + q_n = \frac{a - c_3}{b} \\ \ldots \\ q_1 + q_2 + q_3 +... + 2q_n = \frac{a - c_n}{b}

Translating this to a matrix equation…

\begin{bmatrix} 2 & 1 & 1 & 1 & 1 & \ldots \\ 1 & 2 & 1 & 1 \\ 1 & 1 & 2 & & \ddots \\ 1 & 1 &  & \ddots \\ 1 &  & \ddots \\ \vdots \end{bmatrix} \begin{bmatrix} q_1 \\ q_2 \\ q_3 \\ \vdots \\ q_{n-2} \\ q_{n-1} \\ q_n \end{bmatrix}  = \frac{1}{b} \begin{bmatrix} a - c_1 \\ a - c_2 \\ a - c_3 \\ \vdots \\ a - c_{n-2} \\ a - c_{n-1} \\ a - c_n \end{bmatrix}

Now if we could only find the inverse of the first matrix, we’d have our solution!

\begin{bmatrix} q_1 \\ q_2 \\ q_3 \\ \vdots \\ q_{n-2} \\ q_{n-1} \\ q_n \end{bmatrix}  = \begin{bmatrix} 2 & 1 & 1 & 1 & 1 & \ldots \\ 1 & 2 & 1 & 1 \\ 1 & 1 & 2 & & \ddots \\ 1 & 1 &  & \ddots \\ 1 &  & \ddots \\ \vdots \end{bmatrix} ^{-1} \frac{1}{b} \begin{bmatrix} a - c_1 \\ a - c_2 \\ a - c_3 \\ \vdots \\ a - c_{n-2} \\ a - c_{n-1} \\ a - c_n \end{bmatrix}

I found the inverse of this matrix by using the symmetry in the matrix to decompose it into two matrices that were each easier to work with:

\mathcal{I} = \begin{bmatrix} 1 & 0 & 0 & 0 \ldots \\ 0 & 1 & 0 \\ 0 & 0 & \ddots \\ 0 \\ \vdots \end{bmatrix} 

\mathcal{J} = \begin{bmatrix} 1 & 1 & 1 & 1 \ldots \\ 1 & 1 & 1 \\ 1 & 1 & \ddots \\ 1 \\ \vdots \end{bmatrix} 

\begin{bmatrix} 2 & 1 & 1 & 1 & 1 & \ldots \\ 1 & 2 & 1 & 1 \\ 1 & 1 & 2 & & \ddots \\ 1 & 1 &  & \ddots \\ 1 &  & \ddots \\ \vdots \end{bmatrix} = \mathcal{I} + \mathcal{J} 

As a hypothesis, suppose that the inverse matrix has a similar form (one value for the diagonal elements, and another value for all off-diagonal elements). This allows us to write an equation for the inverse matrix:

(\mathcal{I} + \mathcal{J}) (A \mathcal{I} + B \mathcal{J}) = \mathcal{I}

To solve this, we’ll use the following easily proven identities.

\mathcal{I} \cdot \mathcal{I} = \mathcal{I} \\ \mathcal{I} \cdot \mathcal{J} = \mathcal{J} \\ \mathcal{J} \cdot \mathcal{I} = \mathcal{J} \\ \mathcal{J} \cdot \mathcal{J} = n \mathcal{J} \\

(\mathcal{I} + \mathcal{J}) (A \mathcal{I} + B \mathcal{J}) \\ = A \mathcal{I} + A \mathcal{J} + B \mathcal{J} + nB \mathcal{J} \\ = A \mathcal{I} + \left( A + B(n+1) \right) \mathcal{J} \\ = \mathcal{I}

A = 1 \\ A + B(n+1) = 0

A = 1 \\ B = - \frac{1}{n+1}

(\mathcal{I} + \mathcal{J})^{-1} = \mathcal{I} - \frac{1}{n+1} \mathcal{J} = \frac{1}{n+1} \begin{bmatrix} n & -1 & -1 & -1 & -1 & \ldots \\ -1 & n & -1 & -1 \\ -1 & -1 & n & & \ddots \\ -1 & -1 &  & \ddots \\ -1 &  & \ddots \\ \vdots \end{bmatrix}
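As a quick sanity check, here's a small numerical verification of this inverse formula (a sketch in Python with numpy; the size n = 5 is arbitrary):

import numpy as np

n = 5                                      # arbitrary example size
I = np.eye(n)
J = np.ones((n, n))
M = I + J                                  # the coefficient matrix from our system
M_inv = I - J / (n + 1)                    # the claimed closed-form inverse
print(np.allclose(M @ M_inv, np.eye(n)))   # True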

Alright awesome! Our hypothesis turned out to be true! (And it would have even if the entries in our matrix hadn’t been 1s and 2s. This is a really cool general method to find inverses of this family of matrices.) Now we just use this inverse matrix to solve for the output from each firm!

\begin{bmatrix} q_1 \\ q_2 \\ q_3 \\ \vdots \\ q_{n-2} \\ q_{n-1} \\ q_n \end{bmatrix} = (\mathcal{I} - \frac{1}{n+1} \mathcal{J}) \ \frac{1}{b} \begin{bmatrix} a - c_1 \\ a - c_2 \\ a - c_3 \\ \vdots \\ a - c_{n-2} \\ a - c_{n-1} \\ a - c_n \end{bmatrix}  

Define: \mathcal{C} = \sum_{i=1}^{n}{c_i}

q_k = \frac{1}{b} (a - c_k - \frac{1}{n+1} \sum_{i=1}^{n}{(a - c_i)}) \\ ~~~~ = \frac{1}{b} (a - c_k - \frac{1}{n+1} (an - \mathcal{C})) \\ ~~~~ = \frac{1}{b} (\frac{a + \mathcal{C}}{n+1} - c_k)

Q^* = \sum_{k=1}^n {q_k} \\ ~~~~~ = \frac{1}{b} \sum_{k=1}^n ( \frac{a + \mathcal{C}}{n+1} - c_k ) \\ ~~~~~ = \frac{1}{b} \left( \frac{n}{n+1} (a + \mathcal{C}) - \mathcal{C} \right) \\ ~~~~~ = \frac{1}{b} \left( \frac{n}{n+1} a - \frac{\mathcal{C}}{n+1} \right) \\ ~~~~~ = \frac {an - \mathcal{C}} {b(n+1)}

P^* = a - bQ^* \\ ~~~~~ = a - \frac{an - \mathcal{C}}{n+1} \\ ~~~~~ = \frac{a + \mathcal{C}}{n+1}

\Pi_k^* = q_k^* P^* - c_k q_k^* \\ ~~~~~ = \frac{1}{b} (\frac{a+\mathcal{C}}{n+1} - c_k) \frac{a + \mathcal{C}}{n+1} - \frac{c_k}{b} (\frac{a + \mathcal{C}}{n+1} - c_k) \\ ~~~~~ = \frac{1}{b} \left( \left( \frac{a + \mathcal{C}}{n+1} \right)^2 - 2c_k\left( \frac{a + \mathcal{C}}{n+1} \right) + c_k^2 \right) \\ ~~~~~ = \frac{1}{b} \left( \frac{a + \mathcal{C}}{n+1} - c_k \right)^2
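To double-check the algebra, here's a small Python sketch (the demand parameters and costs are made up) that solves the first-order conditions directly as a linear system and compares the result against the closed-form expressions above:

import numpy as np

a, b = 100.0, 2.0                        # hypothetical demand parameters
c = np.array([10.0, 12.0, 15.0, 20.0])   # hypothetical marginal costs
n = len(c)
C = c.sum()

# Solve the first-order conditions (I + J) q = (a - c)/b directly
M = np.eye(n) + np.ones((n, n))
q_direct = np.linalg.solve(M, (a - c) / b)

# Closed-form solution derived above
q_closed = ((a + C) / (n + 1) - c) / b
print(np.allclose(q_direct, q_closed))                          # True

Q = q_closed.sum()
P = a - b * Q
print(np.isclose(P, (a + C) / (n + 1)))                         # True
profit = q_closed * (P - c)
print(np.allclose(profit, ((a + C) / (n + 1) - c) ** 2 / b))    # True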

And there we have it, the full solution to the general linear Cournot model! Let’s discuss some implications of these results. First of all, let’s look at the two extreme cases: monopoly and perfect competition.

Monopoly: n = 1
Q^* = \frac{1}{2b} (a - c) \\ P^* = \frac{1}{2} (a + c) \\ \Pi^* = \frac{1}{b} \left( \frac{a - c}{2} \right)^2

Perfect Competition: n → ∞ (writing \bar c = \mathcal{C}/n for the average marginal cost)
q_k^* \rightarrow \frac{1}{b} (\bar c - c_k) \\ Q^* \rightarrow \frac{1}{b} (a - \bar c) \\ P^* \rightarrow \bar c \\ \Pi_k^* \rightarrow \frac{1}{b} (\bar c - c_k)^2

The first observation is that the behavior of the market under monopoly looks very different from the case of perfect competition. For one thing, notice that the price under perfect competition is always going to be lower than the price under monopoly. This is a nice demonstration of the so-called monopoly markup. The quantity a intuitively corresponds to the highest possible price you could get for the product (the most that the highest bidder would pay). And the quantity c , the production cost, is the lowest possible price at which the product would be sold. So the monopoly price is the average of the highest price you could get for the good and the lowest price at which it could be sold.

The flip side of the monopoly markup is that less of the good is produced and sold under a monopoly than under perfect competition. There are trades that could be happening (trades which would be mutually beneficial!) which do not occur. Think about it: the monopoly price is halfway between the cost of production and the highest bidder’s price. This means that there are a bunch of people that would buy the product at above the cost of production but below the monopoly price. And since the price they would buy it for is above the cost of production, this would be a profitable exchange for both sides! But alas, the monopoly doesn’t allow these trades to occur, as it would involve lowering the price for everybody, including those who are willing to pay a higher price, and thus decreasing net profit.

Things change as soon as another firm joins the market. This firm can profitably sell the good at a lower price than the monopoly price and snatch up all of their business. This introduces a downward pressure on the price. Here’s the exact solution for the case of duopoly.

Duopoly: n = 2
q_1 = \frac{1}{3b} (a - 2c_1 + c_2) \\ q_2 = \frac{1}{3b} (a + c_1 - 2c_2) \\ Q^* = \frac{1}{3b} (2a - c_1 - c_2) \\ P^* = \frac{1}{3} (a + c_1 + c_2) \\ \Pi_1^* = \frac{1}{9b} (a - 2c_1 + c_2)^2 \\ \Pi_2^* = \frac{1}{9b} (a + c_1 - 2c_2)^2 \\

Interestingly, in the duopoly case the market price still rests at a value above the marginal cost of production for either firm. As more and more firms enter the market, competition pushes the price down further and further until, in the limit of perfect competition, it converges to the cost of production.

The implication of this is that in the limit of perfect competition, firms do not make any profit! This may sound a little unintuitive, but it’s the inevitable consequence of the line of argument above. If a bunch of companies were all making some profit, then their price is somewhere above the cost of production. But this means that one company could slightly lower its price, thus snatching up all the customers and making massively more money than its competitors. So its competitors will all follow suit, pushing down their prices to get back their customers. And in the end, all the firms will have just decreased their prices and their profits, even though every step in the sequence appeared to be the rational and profitable action by each firm! This is just an example of a coordination problem. If the companies could all just agree to hold their price fixed at, say, the monopoly price, then they’d all be better off. But each individual has a strong monetary incentive to lower their price and gather all the customers. So the price will drop and drop until it can drop no more (that is, until it has reached the cost of production, at which point it is no longer profitable for a company to lower their price).

This implies that in some sense, the limit of perfect competition is the best possible outcome for consumers and the worst outcome for producers. Every consumer that values the product above the cost of its production will get it, and they will all get it at the lowest possible price. So the consumer surplus will be enormous. And companies producing the product make no net profit; any attempt to do so immediately loses them their entire customer base. (In which case, what is the motivation for the companies to produce the product in the first place? This is known as the Bertrand paradox.)

We can also get the easier-to-solve special case where all firms have the same cost of production.

Equal Production Costs
\forall k (c_k = c)
q_k^* = \frac{1}{n+1} \frac{a - c}{b} \\ Q^* = \frac{n}{n+1} \frac{a - c}{b} \\ P^* = \frac{a + nc}{n + 1} \\ \Pi^* = \frac{1}{b} \left( \frac{a - c}{n+1} \right)^2
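Here's a tiny sketch (with made-up numbers) of how the equal-cost price formula interpolates between the monopoly price and the competitive price as n grows:

a, c = 100.0, 20.0                 # hypothetical demand intercept and common marginal cost
for n in [1, 2, 5, 20, 100, 1000]:
    P = (a + n * c) / (n + 1)
    print(n, round(P, 2))
# The price falls from the monopoly price (a + c)/2 = 60 toward the cost of production c = 20.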

It's curious that in the Cournot model, prices don't immediately drop to the cost of production as soon as you go from a monopoly to a duopoly. After all, the intuitive argument I presented before works for two firms: if both firms are pricing the good at any value above the cost of production, then each stands to gain by lowering the price a slight bit and getting all the customers. And this continues until the price settles at the cost of production. We didn't build any ability of the firms to collude into the model, so what gives? What the Cournot model tells us is certainly more realistic (we don't expect a duopoly to behave like a perfectly competitive market), but where does this realism come from?

The answer is that in a certain sense we did build in collusion between firms from the start, in the form of agreement on what price to sell at. Notice that our model did not allow different firms to set different prices. In this model, firms compete only on quantity of goods sold, not prices. The price is set automatically by the consumer demand function, and no single individual can unilaterally change their price. This constraint is what gives us the more realistic-in-character results that we see, and also what invalidates the intuitive argument I’ve made here.

One final observation. Consider the following procedure. You line up a representative from each of the n firms, as well as the highest bidder for the product (representing the highest price at which the product could be sold). Each of the firms states their cost of production (the lowest they could profitably bring the price to), and the highest bidder states the amount that he values the product (the highest price at which he would still buy it). Now all n+1 of the stated numbers are averaged, and the result is set as the market price of the good. It turns out that this procedure gives exactly the market price that the linear Cournot model predicts! This might be meaningful or just a curious coincidence. But it's quite surprising to me that the slope of the demand curve (b) doesn't show up at all in the ultimate market price; the price depends only on the firms' costs and the value that the highest bidder puts on the product!

Building an analog calculator

Op-amps are magical little devices that allow us to employ the laws of electromagnetism for our purposes, ultimately giving us the power to do any computation using physics. To explain this, we need to take three steps: first transistors, then differential amplifiers, and then operational amplifiers. The focus of this post will be on purely analog calculations, as that turns out to be the most natural type of computation you get out of op-amps.

1. Transistor

Here’s a transistor:

[Figure: a transistor, with its base, collector, and emitter labelled]

It has three ports, named (as labelled) the base, the collector and the emitter. Intuitively, you can think about the base as being like a dial that sensitively controls the flow of large quantities of current from the collector to the emitter. This is done with only very small trickles of current flowing into the base.

More precisely, a transistor obeys the following rules:

  1. The transistor is ON if the voltage at the collector is higher than the voltage at the emitter. It is OFF otherwise.
  2. When the transistor is in the ON state, which is all we’ll be interested in, the following regularities are observed:
    1. Current flows along only two paths: base-to-emitter and collector-to-emitter. A transistor does not allow current to flow from the emitter to either the collector or the base.
    2. The current into the collector is proportional to the current into the base, with a constant of proportionality labelled β whose value is determined by the make of the transistor. β is typically quite large, so that the current into the collector is much larger than the current into the base.
    3. The voltage at the emitter is 0.6V less than the voltage at the base.

[Figure: illustration of the transistor rules]

Combined, these rules allow us to solve basic transistor circuits. Here’s a basic circuit that amplifies input voltages using a transistor:

[Figure: a basic transistor amplifier circuit]

Applying the above rules, we derive the following relationship between the input and output voltages:

[Figure: derivation of the relationship between the input and output voltages]

The +15V port at the top is important for the functioning of the amplifier because of the first rule of transistors, regarding when they are ON and OFF. If we just had it grounded, then the voltage at the collector would be negative (after the voltage drop from the resistor), so the transistor would switch off. Having the voltage start at +15V allows VC to be positive even after the voltage drop at the resistor (although it does set an upper bound on the input currents for which the circuit will still operate).

And the end result is that any change in the input voltage will result in a corresponding change in the output voltage, amplified by a factor of approximately -RC/RE. Why do we call this amplification? Well, because we can choose whatever values of resistance for RC and RE that we want! So we can make RC a thousand times larger than RE, and we have created an amplifier that takes millivolt signals to volt signals. (The amplification tops out at +15V, because a signal larger than that would result in the collector voltage going negative and the transistor turning off.)
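Here's a little Python sketch of this amplifier that uses nothing but the three transistor rules above (the resistor values, supply voltage, and β are made up):

beta = 100.0
R_C, R_E = 1_000.0, 100.0     # hypothetical resistor values
V_supply = 15.0

def v_out(v_in):
    v_e = v_in - 0.6                  # rule 2.3: the emitter sits 0.6V below the base
    i_e = v_e / R_E                   # Ohm's law across the emitter resistor
    i_c = (beta / (beta + 1)) * i_e   # rules 2.1-2.2: i_e = i_b + i_c and i_c = beta * i_b
    return V_supply - R_C * i_c       # voltage drop across the collector resistor

gain = (v_out(1.01) - v_out(1.00)) / 0.01
print(gain, -R_C / R_E)               # about -9.9, close to -R_C/R_E = -10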

Ok, great, now you’re a master of transistor circuits! We move on to Step 2: the differential amplifier.

2. Differential Amplifier

A differential amplifier is a circuit that amplifies the difference between two voltages and suppresses the commonalities. Here’s how to construct a differential amplifier from two transistors:

[Figure: a differential amplifier built from two transistors]

Let’s solve this circuit explicitly:

Vin = 0.6 + RE iE + Rf (iE + iE')
Vin' = 0.6 + RE iE' + Rf (iE + iE')

∆(Vin – Vin') = RE (iE – iE') = RE (β+1) (iB – iB')
∆(Vin + Vin') = (RE + 2Rf)(iE + iE') = (RE + 2Rf)(β+1)(iB + iB')

Vout = 15 – RC iC
Vout' = 15 – RC iC'

∆(Vout – Vout') = – RC (iC – iC') = – RC β (iB – iB')
∆(Vout + Vout') = – RC (iC + iC') = – RC β (iB + iB')

ADM = Differential Amplification = ∆(Vout – Vout') / ∆(Vin – Vin') = -β/(β+1) RC/RE
ACM = "Common Mode" Amplification = ∆(Vout + Vout') / ∆(Vin + Vin') = -β/(β+1) RC/(RE + 2Rf)

We can combine these to get the change in one particular output voltage:

∆Vout = ½ [ ADM ∆(Vin – Vin') + ACM ∆(Vin + Vin') ]

To make a good differential amplifier, we require that ADM be large and ACM be small. Looking at the two expressions, we can achieve this if Rf >> RC >> RE. Notice that since the amplification is a function of the ratio of our resistors, we can easily make the differential amplification absolutely enormous.
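To put numbers on this, here's a quick sketch with made-up component values satisfying Rf >> RC >> RE:

beta = 100.0
R_C, R_E, R_f = 1_000.0, 10.0, 100_000.0     # hypothetical values with R_f >> R_C >> R_E

A_DM = -(beta / (beta + 1)) * R_C / R_E
A_CM = -(beta / (beta + 1)) * R_C / (R_E + 2 * R_f)
print(A_DM, A_CM)   # roughly -99 and -0.005: differences amplified, common signals suppressed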

Here’s one way this might be useful: say that Alice wants to send a signal to Bob over some distance, but there is a source of macroscopic noise along the wire connecting the two. Perhaps the wire happens to pass through a region in which large magnetic disturbances sometimes occur. If the signal is encoded in a time-varying voltage on Alice’s end, then what Bob gets may end up being a very warped version of what Alice sent.

But suppose that instead Alice sends Bob two signals, one with the original message and the other just a static fixed voltage. Now the difference between the two signals represents Alice’s message. And crucially, if these two signals are sent on wires that are right side-by-side, then they will pick up the same noise!

[Figure: two side-by-side wires picking up the same noise]

This means that while the original message will be warped, so will the static signal, and by roughly the same amount! Which means that the difference between the two signals will still carry the information of the original message. This allows Alice to communicate with Bob through the noise, so long as Bob takes the two wires on his end and plugs them into a differential amplifier to suppress the common noise factor.

3. Operational Amplifier

To make an operational amplifier, we just need to make some very slight modifications to our differential amplifier. The first is that we'll make our amplification as large as possible. We can get this by chaining multiple stages of differential amplifiers in series, one feeding into the next (so if your differential amplifier amplifies by 100, two stages will amplify by 10,000, and three by 1,000,000).

What'll happen now if we send in two voltages, with Vin very slightly larger than Vin'? Well, suppose that the difference is on the order of a single millivolt (mV). Then the output voltage will be on the order of 1 mV multiplied by our amplification factor of 1,000,000. This will be around 1000V. So… do we expect to get out an output signal of 1000V? No, clearly not, because our maximum possible voltage at the output is +15V! We can't create energy from nowhere. Remember that the output voltage Vout is equal to 15V – RC iC, and that the transistor only accepts current traveling from the collector to the emitter, not the other way around. This means that iC cannot be negative, so Vout cannot be larger than +15V. In addition, we know that Vout cannot be smaller than 0V (as this would turn off the transistor via Rule 1 above).

What this all means is that if Vin is even the slightest bit smaller than Vin‘, Vout will “max out”, jumping to the peak voltage of +15V (remember that ADM is negative). And similarly, if Vin is even the slightest bit larger than Vin‘, then Vout will bottom out, dropping to 0V. The incredible sensitivity of the instrument is given to us by the massive amplification factor, so that it will act as a binary measurement of which of the voltages is larger, even if that difference is just barely detectable. Often, the bottom voltage will be set to -15V rather than ground (0V), so that the signal we get out is either +15V (if Vin is smaller than Vin‘) or -15V (if Vin is greater than Vin‘). That way, perfect equality of the inputs will be represented as 0V. That’s the convention we’ll use for the rest of the post. Also, instead of drawing out the whole diagram for this modified differential amplifier, we will use the following simpler image:

[Figure: simplified symbol for the high-gain differential amplifier]

Ok, we’re almost to an operational amplifier. The final step is to apply negative feedback!

[Figure: the amplifier with its output wired back to the Vin' input]

What we're doing here is just connecting the output voltage to the Vin' input. Let's think about what this does. Suppose that Vin is larger than Vin'. Then Vout will quickly rise, approaching +15V. But as Vout increases, so does Vin'! So Vin' will increase, getting closer to Vin. Once it passes Vin, the quantity Vin – Vin' suddenly becomes negative, so Vout will change direction, becoming more negative and approaching -15V. So Vin' will begin to decrease! This will continue until it passes Vin again, at which point Vout will change signs again and Vin' will start increasing. The result of this process will be that no matter what Vin' starts out as, it will quickly adjust to match Vin to a degree dictated by the resolution of our amplifier. And we've already established that the amplifier can be made to have an extremely high resolution. So this device serves as an extremely precise voltage-matcher!
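Here's the steady state worked out in a couple of lines (a sketch, treating the amplifier as having a huge but finite open-loop gain A and ignoring the transient wiggling; since the input here is well within the ±15V rails, the clipping never binds):

A = 1_000_000.0        # hypothetical open-loop gain
v_in = 3.2             # the voltage we want to match (arbitrary)

# In equilibrium the fed-back input equals the output, so v_out = A * (v_in - v_out),
# which gives v_out = v_in * A / (1 + A): a hair below v_in.
v_out = v_in * A / (1 + A)
print(v_in - v_out)    # about 3.2e-06 volts: the two inputs match to within roughly v_in / A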

This device, a super-sensitive differential amplifier with negative feedback, is an example of an operational amplifier. It might not be immediately obvious to you what’s so interesting or powerful about this device. Sure it’s very precise, but all it does is match voltages. How can we leverage this to get actually interesting computational behavior? Well, that’s the most fun part!

4. Let’s Compute!

Let’s start out by seeing how an op-amp can be used to do calculus. Though this might seem like an unusually complicated starting place, doing calculus with op amps is significantly easier and simpler than doing something like multiplication.

As we saw in the last section, if we simply place a wire between Vout and Vin‘ (the input labelled with a negative sign on the diagram), then we get a voltage matcher. We get more interesting behavior if we place other circuit components along this wire. The negative feedback still exists, which means that the circuit will ultimately stabilize to a state in which the two inputs are identical, and where no current is flowing into the op-amp. But now the output voltage will not just be zero.

Let’s take a look at the following op-amp circuit:

[Figure: an op-amp circuit built from a resistor and a capacitor]

Notice that we still have negative feedback, because the V– input is connected to the output, albeit not directly. This means that the two inputs to the op amp must be equal, and since V+ is grounded, the other must be at 0V as well. It also means that no current is flowing into the op amp, as current only flows in while the system is stabilizing.

Those two pieces of information – that the input voltages are equal and that no current flows through the device – are enough to allow us to solve the circuit.

[Figure: solving the circuit; the output is the integral of the input voltage]

And there we have it, a circuit that takes in a voltage function that changes with time and outputs the integral of this function! A natural question might be “integral from what to what”? The answer is just: the integral from the moment the circuit is connected to the present! As soon as the circuit is connected it begins integrating.
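For concreteness, here's a small numerical sketch of this behavior, assuming an ideal op-amp with an input resistor R and a feedback capacitor C (the component values and the 50 Hz test signal are made up). All of the input current v_in/R has to flow onto the capacitor, which is what makes the output the negative, scaled integral of the input:

import numpy as np

R, C = 10_000.0, 1e-6                 # hypothetical values: 10 kΩ and 1 µF, so RC = 10 ms
dt = 1e-5
t = np.arange(0.0, 0.05, dt)
v_in = np.sin(2 * np.pi * 50 * t)     # a 50 Hz test input

# The current v_in / R flows onto the capacitor, so dv_out/dt = -v_in / (R*C)
v_out = -np.cumsum(v_in) * dt / (R * C)

expected = -(1 - np.cos(2 * np.pi * 50 * t)) / (2 * np.pi * 50 * R * C)
print(np.max(np.abs(v_out - expected)))   # small: the output tracks -(1/RC) times the integral of v_in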

Alright, now let’s just switch the capacitor and resistor in the circuit:

[Figure: the same circuit with the capacitor and resistor swapped]

See if you can figure out for yourself what this circuit does!

 

 

 

Alright, here’s the solution.

[Figure: solving the swapped circuit; the output is the derivative of the input voltage]

We have a differentiator! Feed in some input voltage, and you will get out a precise measurement of the rate of change of this voltage!
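The corresponding sketch for the swapped circuit (same made-up component values; for an ideal op-amp the input current C·dv_in/dt now flows through the feedback resistor, giving v_out = -RC·dv_in/dt):

import numpy as np

R, C = 10_000.0, 1e-6                  # same hypothetical values as before (RC = 10 ms)
dt = 1e-5
t = np.arange(0.0, 0.05, dt)
v_in = np.sin(2 * np.pi * 50 * t)

v_out = -R * C * np.gradient(v_in, dt)                          # claimed behavior: -RC * dv_in/dt
expected = -R * C * 2 * np.pi * 50 * np.cos(2 * np.pi * 50 * t)
print(np.max(np.abs(v_out - expected)))                         # small: the output is the scaled derivative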

I find it pretty amazing that doing integration and differentiation is so simple and natural. Addition is another easy one:

[Figure: an op-amp adder circuit]

We can use diodes to produce exponential and logarithm calculators. We utilize the ideal diode equation:

[Figure: the ideal diode equation]

This equation can now be used to produce circuits for calculating exponentials and logarithms:

[Figures: exponential and logarithm circuits]

Again, these are all fairly simple-looking circuits. But now let’s look at the simplest way of computing multiplication using an op amp. Schematically, it looks like this:

[Figure: schematic for multiplication using op-amps]

And here’s the circuit in full detail:

[Figure: the multiplication circuit in full detail]

There’s another method besides this one that you can use to do multiplication, but it’s similarly complicated. It’s quite intriguing that multiplication turns out to be such an unnatural thing to get nature to do to voltages, while addition, exponentiation, integration, and the rest are so simple.

What about boolean logic? Well, it turns out that we already have a lot of it. Our addition corresponds to an OR gate if the input voltages are binary signals. We also already have an AND gate, because we have multiplication! And depending on our choice of logic levels (which voltage corresponds to True and which corresponds to False), we can really easily sketch a circuit that does negation:

[Figure: a circuit computing negation]

And of course if we have NOT and we have AND, then we have NAND, and if we have NAND, then we have all the rest of binary logic. We can begin to build flip-flop gates out of the NAND gates to get simple memory cells, and we’re on our way to Turing-completeness!

Complex numbers in physics

“You cannot observe an imaginary number in reality.” Have you ever heard this claim before? It has a nice ring to it, and sounds almost tautological. I’ve personally heard the claim made by several professors of physics, alongside a host of less qualified people. So let’s take a close look at it and see if it holds up to scrutiny. Can you in fact observe imaginary numbers in reality?

First of all, let's ask a much simpler sounding question. Can you ever observe a real number in reality? Well, yeah. Position, time, charge, mass, and so on; pretty much any physical quantity you name can be represented by real numbers. But if we're going to be pedantic, when we measure the position of a particle, we are not technically observing a number. We are observing a position, a physical quantity, which we find has a useful representation as a real number. Color is another physical phenomenon that is usually represented mathematically as a real number (the frequency of emitted light). But we do not necessarily want to say that color is a number. No, we say that color is a physical phenomenon, which we find is usefully described as a real number.

More specifically, we have some physical phenomena whose structure contains many similarities to the abstract structure of these mathematical objects known as real numbers, so it behooves us to translate statements about the phenomena over to the platonic realm of math, where the work we do is precise and logically certain. Once we have done the mathematical manipulations we desire to get useful results, we translate our statements about numbers back into statements about the physical phenomena. There are really just two possible failure points in this process. First, it might be that the mathematical framework actually doesn’t have the same structure as the physical phenomena we have chosen it to describe. And second, it might be that the mathematical manipulations we do once we have translated our physical statements into mathematical statements contain some error (like maybe we accidentally divided by zero or something).

So on one (overly literal) interpretation, when somebody asks whether a certain abstract mathematical object exists in reality, the answer is always no. Numbers and functions and groups and rings don’t exist in reality, because they are by definition abstract objects, not concrete ones. But this way of thinking about the relationship between math and physics does give us a more useful way to answer the question. Do real numbers exist in reality? Well, yes, they exist insofar as their structure is mirrored in the structure of some real world phenomena! If we’re being careful with our language, we might want to say that real numbers are instantiated in reality instead of saying that they exist.

So, are imaginary numbers instantiated in reality? Well, yes, of course they are! The wave function in quantum mechanics is an explicitly complex entity — try doing quantum mechanics with only real-valued wave functions, and you will fail, dramatically. The existence of imaginary values of the wave function is absolutely necessary in order to get an adequate description of our reality. So if you believe quantum mechanics, then you believe that imaginary numbers are actually embedded in the structure of the most fundamental object there is: the wave function of the universe.

A simpler example: any wave-like phenomenon is best described in the language of complex numbers. A ripple in the water is described as a complex function of position and time, where the complex phase of the function represents the propagation of the wave through space and time. Any time you see a wave-like phenomenon (by which I mean any process that is periodic, including something as prosaic as a ticking clock), you are looking at a physical process that really nicely mirrors the structure of complex numbers.

Now we’ll finally get to the main point of this post, which is to show off a particularly elegant and powerful instance of complex numbers applying to physics. This example is in the realm of electromagnetism, specifically the approximation of electromagnetism that we use when we talk about circuits, resistors, capacitors, and so on.

Suppose somebody comes up to you in the street, hands you the following circuit diagram, and asks you to solve it:

[Figure: the circuit diagram to be solved]

If you’ve never studied circuits before, you might stare at it blankly for a moment, then throw your hands up in puzzlement and give them back their sheet of paper.

If you’ve learned a little bit about circuits, you might stare at it blankly for a few moments, then write down some complicated differential equations, stare at them for a bit, and then throw your hands up in frustration and hand the sheet back to them.

And if you know a lot about circuits, then you’ll probably smile, write down a few short lines of equations involving imaginary numbers, and hand back the paper with the circuit solved.

Basically, the way that students are initially taught to solve circuits is to translate them into differential equations. These differential equations quickly become immensely difficult to solve (as differential equations tend to do). And so, while a few simple circuits are nicely solvable with this method, any interesting circuits that do nontrivial computations are at best a massive headache to solve with this method, and at worst actually infeasible.

This is the real numbers approach to circuits. But it’s not the end of the story. There’s another way! A better and more beautiful way! And it uses complex numbers.

Here’s the idea: circuits are really easy to solve if all of your circuit components are just resistors. For a resistor, the voltage across it is just linearly proportional to the current through it. No derivatives or integrals required, we just use basic algebra to solve one from the other. Furthermore, we have some nice little rules for simplifying complicated-looking resistor circuits by finding equivalent resistances.

[Figure: rules for combining resistors into equivalent resistances]

The problem is that interesting circuits don’t just involve resistors. They also contain things like inductors and capacitors. And these circuit elements don’t have that nice linear relationship between current and voltage. For capacitors, the relationship is between voltage and the integral of current. And for inductors, the relationship is between voltage and the derivative of current. Thus, a circuit involving a capacitor, a resistor, and an inductor, is going to be solved by an equation that involves the derivative of current, the current itself, and the integral of current. In other words, a circuit involving all three types of circuit elements is in general going to be solved by a second-order differential equation. And those are a mess.

The amazing thing is that if instead of treating current and voltage as real-valued functions, you treat them as complex-valued functions, what you find is that capacitors and inductors behave exactly like resistors. Voltage and current become related by a simple linear equation once more, with no derivatives or integrals involved. And the distinguishing characteristic of a capacitor or an inductor is that the constant of proportionality between voltage and current is an imaginary number instead of a real number. More specifically, a capacitor is a circuit element for which voltage is equal to a negative imaginary number times the current. An inductor is a circuit element for which voltage is equal to a positive imaginary number times the current. And a resistor is a circuit element for which voltage is equal to a positive real number times the current.

Suppose our voltage is described by a simple complex function: V_0 exp(iωt). Then we can describe the relationship between voltage and current for each of these circuit elements as follows:

[Figure: voltage–current relations for resistors, capacitors, and inductors]
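For reference, the standard relations (which is what I believe the figure shows, up to notation) are, for voltage and current proportional to e^{i\omega t} :

V = R \, I \text{ (resistor)} \\ V = \frac{1}{i\omega C} I = -\frac{i}{\omega C} I \text{ (capacitor)} \\ V = i\omega L \, I \text{ (inductor)}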

Notice that now the equations for all three circuit components look just like resistors! Just with different constants of proportionality relating voltage to current. We can even redraw our original diagrams to make the point:

[Figure: the three components redrawn as resistors with real or imaginary resistances]

Fourier showed that essentially any function can be described as a sum of functions that look like V_0 exp(iωt). And there's a nice theorem called the superposition theorem that allows us to use this to solve any linear circuit, no matter what the voltage is.

So, let’s go back to our starting circuit.

[Figure: the original circuit]

What we can now do is just redraw it, with all capacitors and inductors replaced by resistors with imaginary resistances:

[Figure: the circuit redrawn with imaginary resistances]

This may look just as complicated as before, but it’s actually much much simpler. We can use the rules for reducing resistor equations to solve an immensely simpler circuit:

[Figure: reducing the circuit to a single equivalent resistance]

Our circuit is now much simpler, but the original complexity had to be offloaded somewhere. As you can see, it’s been offloaded onto the complex (in both senses of the word) value of the resistance of our simplified circuit. But this type of complexity is much easier to deal with, because it’s just algebra, not calculus! To solve the current in our circuit now, we don’t need to solve a single differential equation, we just have to do some algebraic rearranging of terms. We’ll give the final equivalent resistance the name “Z”.

[Figure: solving for the current in terms of the equivalent resistance Z]

Now suppose that our original voltage was just the real part of V (that is, V_0 cos(ωt)). Then the current will also be the real part of I. And if our original voltage was just the imaginary part of V, then the current will be the imaginary part of I. And there we have it! We've solved our circuit without struggling over a single differential equation, simply by assuming that our quantities were complex numbers instead of real numbers.
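As an illustration of how pleasant this is computationally, here's a Python sketch that solves a made-up series circuit (one resistor, one inductor, one capacitor; all values invented) using nothing but complex arithmetic:

import numpy as np

R, L, C = 100.0, 0.5, 1e-5       # hypothetical component values
omega = 120.0                    # hypothetical driving frequency (rad/s)
V0 = 10.0                        # drive: V0 * exp(i*omega*t)

Z = R + 1j * omega * L + 1 / (1j * omega * C)    # impedances in series just add
I0 = V0 / Z                                      # "Ohm's law" with complex numbers

print(abs(I0), np.angle(I0))                     # amplitude and phase of the current
t = np.linspace(0.0, 0.2, 5)
print(np.real(I0 * np.exp(1j * omega * t)))      # the physical current if the drive is V0*cos(omega*t)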

This is one of my favorite examples of complex numbers playing an enormously important role in physics. It's true that it's a less clear-cut example of complex numbers being necessary to describe a physical phenomenon, because we could in principle have done everything with purely real-valued functions by solving some differential equations, but it also highlights the way in which accepting a complex view of the world can simplify your life.

The laugh-hospital of constructive mathematics

In Raymond Smullyan’s story there was a planet on which the concept of humor was unknown and laughter was treated as a disease. Those who laughed were sent to live in laugh-hospitals, away from normal people. As time passed ever more people contracted laughter and the laugh-hospitals grew into whole laugh-communities, until the few remaining normal people pretended to understand humor just so that they could join the rest. What if constructive models are like the laugh-hospitals? What if not understanding constructivism is like not having a sense of humor?

This is a fun talk. It’s a defense of constructive mathematics, which is what mathematics becomes when you reject the law of the excluded middle (henceforth LEM), which is the claim that “φ∨¬φ” is a tautology. Here’s a link to the paper corresponding to the talk (from which the starting quote comes).

Some highlights:

  • The rejection of LEM is not the same as the claim that there are some propositions that are neither true nor false. That is, denying that φ∨¬φ is a tautology is not the same as saying that there is some proposition φ for which ¬(φ∨¬φ). This second claim can actually be proven to be false in constructive mathematics.
  • Both the law of double negation (which is the claim that ¬¬φ implies φ) and the axiom of choice imply LEM. So to deny LEM, one must also deny the law of double negation and the axiom of choice.
  • In constructive math, proof by negation is valid, but not proof by contradiction. Proof by negation is the argument structure “Suppose φ, (argument reaching contradiction), therefore ¬φ.” Proof by contradiction is the argument structure “Suppose ¬φ, (argument reaching contradiction), therefore φ.”
    • This might seem strange; can’t you just substitute each φ in the first argument for ¬φ to get the second argument? The reason you can’t is that the second argument involves removing a negation, while the first involves introducing one. Using proof by negation and starting with ¬φ, we get ¬¬φ, and of course to a constructivist this is not equivalent to φ.
  • LEM is equivalent to the claim that all subsets of a finite set must be finite (you can prove LEM from this claim, and can prove this claim from LEM).
  • In particular, in the Brouwer–Heyting–Kolmogorov interpretation of constructive mathematics one cannot prove that “all subsets of the set {0, 1} are finite”.
  • φ is a classical tautology if and only if ¬¬φ is an intuitionistic tautology. (Not in this talk, but a fun one nonetheless.)

So how do we make sense of all this? Well, there are different ways of understanding propositions in intuitionistic logic. One simple one is to reinterpret the assertion of P as meaning 'P is proven' rather than 'P is true'. On this interpretation, the law of the excluded middle says something like 'every proposition is either provably true or provably false.' This is the ambitious claim of Hilbert's program that we now know to be false; for instance, Gödel's sentence is a counterexample to the law of the excluded middle. We cannot prove that Gödel's sentence is true, and we cannot prove that it is false. And note that Gödel's sentence is true! Just not provably so. As for double negation, ¬¬φ means "any proof that φ entails a contradiction would itself entail a contradiction", and this is clearly not equivalent to φ (proving that proofs of ¬φ inevitably lead to contradiction is not the same as proving φ).

In some interpretations of intuitionistic logic, the interpretation of a proposition is literally tied to its current epistemic status (whether we’ve proven it one way or another yet). So in these interpretations, P=NP is a counterexample to the law of the excluded middle (not because it’s neither true nor false, nor because it’s impossible to prove, but because we don’t currently have such a proof). 

As a final note, let’s just quickly show that we can prove the law of the excluded middle from lambda calculus. We’ll formalize the law of the excluded middle as a function that takes in a proposition p and returns the function corresponding to p∨¬p. If we can show that on each possible value of the input proposition p (either T or F), the function returns T, then we have proven LEM.

[Figure: lambda-calculus check of p∨¬p on both truth values]
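For what it's worth, here's roughly what such a check looks like written out in Python with Church-encoded booleans (my own encoding, which may differ in details from the screenshot):

TRUE  = lambda x: lambda y: x          # Church encoding of True
FALSE = lambda x: lambda y: y          # Church encoding of False
NOT   = lambda p: p(FALSE)(TRUE)
OR    = lambda p: lambda q: p(TRUE)(q)
LEM   = lambda p: OR(p)(NOT(p))        # p or not-p

print(LEM(TRUE) is TRUE, LEM(FALSE) is TRUE)   # True True: the check passes on both truth values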

I’m not totally sure about the relationship between lambda calculus and intuitionistic logic, so this apparent proof of the law of the excluded middle is curious to me.

Can chess be solved?

There are some chess positions in which one player can force a win. Here’s an extremely simple example:

[Figure: a position with an easy win for White]

White just moves their queen up to a8 and checkmates Black. Some winning positions are harder to see than this. Take a look at the following position. Can you find a guaranteed two-move checkmate for White?

[Figure: puzzle position, White to mate in two]

And for fun, here’s a harder one, again a guaranteed two-move checkmate, this time by Black:

[Figure: puzzle position, Black to mate in two]

Notice that in this last one, the opponent had multiple possible moves to choose from. A forced mate does not necessarily mean restricting your opponent to exactly one move on each of their turns. It just means that no matter what they do, you can still guarantee a win. Forced wins can become arbitrarily complicated and difficult to see if you’re looking many moves down the line, as you have to consider all the possible responses your opponent has at each turn. The world record for the longest forced win is the following position:

[Figure: the position with the 549-move forced win]

It's White's move, and White does have a strategy for a forced win. It just takes 549 turns to actually do this! (This strategy does violate the 50-move rule, which says that after 50 consecutive moves with no pawn moves or captures, the game can be claimed as a draw.) At this link you can watch the entire 549 move game. Most of it is totally incomprehensible to human players, and apparently top chess players that look at this game have reported that the reasoning behind the first 400 moves is opaque to them. Interestingly, White gets a pawn promotion after six moves, and promotes it to a knight instead of a queen! It turns out that promoting to a queen actually loses for White, and their only way to victory is the knight promotion!

This position is the longest forced win with 7 pieces on the board. There are a few others that are similarly long. All of them represent a glimpse at the perfect play we might expect to see if a hypercomputer could calculate the whole game tree for chess and select the optimal move.

A grandmaster wouldn’t be better at these endgames than someone who had learned chess yesterday. It’s a sort of chess that has nothing to do with chess, a chess that we could never have imagined without computers. The Stiller moves are awesome, almost scary, because you know they are the truth, God’s Algorithm – it’s like being revealed the Meaning of Life, but you don’t understand one word.

Tim Krabbe

With six pieces on the board, the longest mate takes 262 moves (you can play out this position here). For five pieces, it’s 127 moves, for four it’s 43 moves, and the longest 3-man mate takes 28 moves.

[Figure: longest forced mates by number of pieces on the board]

But now a natural question arises. We know that a win can be forced in some positions. But how about the opening position? That is, is there a guaranteed win for White (or for Black) starting in this position?

[Figure: the standard chess starting position]

Said more prosaically: Can chess be solved?

Zermelo’s theorem, published in “On an Application of Set Theory to the Theory of the Game of Chess” (1913), was the first formal theorem in game theory. It predated the work of von Neumann (the so-called “father of game theory”) by 15 years. It proves that yes, it is in fact possible to solve chess. We don’t know what the solution is, but we know that either White can force a win, or Black can force a win, or the result will be a draw if both play perfectly.
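The content of the theorem is essentially backward induction on the (finite) game tree. Here's a minimal sketch of that idea in Python for a generic finite two-player game; moves and result are hypothetical stand-ins for the rules of whatever game you're analyzing, and of course for real chess the tree is far too large for this to actually run:

def value(state, white_to_move, moves, result):
    # result(state) returns +1 (White wins), -1 (Black wins), 0 (draw) at terminal
    # states, and None otherwise; moves(state, white_to_move) lists successor states.
    r = result(state)
    if r is not None:
        return r
    children = [value(s, not white_to_move, moves, result)
                for s in moves(state, white_to_move)]
    return max(children) if white_to_move else min(children)

# With perfect play, the value of the game is value(initial_state, True, moves, result):
# +1 means White can force a win, -1 means Black can, and 0 means best play is a draw.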

Of course, the guarantee that in principle there is a solution to chess doesn’t tell us much in practice. The exponential blowup in the number of possible games is so enormous that humans will never find this solution. Nonetheless, I still find it fascinating to think that the mystery of chess is ultimately a product of computational limitations, and that in principle, if we had a hypercomputer, we could just find the unique best chess game and watch it play out, either to a win by one side or to a draw. That would be a game that I would love to see.

✯✯✯

Here’s another fun thing. There’s an extremely bizarre variant on chess called suicide chess (or anti-chess). The goal of suicide chess is to lose all your pieces. Of course, if all the rules of play were the same, it would be practically impossible to win (since your opponent could always just keep refusing to take a piece that you are offering them). To remedy this, in suicide chess, capturing is mandatory! And if you have multiple possible captures, then you can choose among them.

Suicide chess gameplay is extremely complicated and unusual looking, and evaluating who is winning at any given moment tends to be really difficult, as sudden turnarounds are commonplace compared to ordinary chess. But one simplifying factor is that it tends to be easier to restrict your opponents’ moves. In ordinary chess, you can only restrict your opponents’ moves by blocking off their pieces or threatening their king. But in suicide chess, your opponents’ moves are restricted ANY time you put one of your pieces in their line of fire! This feature of the gameplay makes the exponential blow up in possible games more manageable.

Given this, it probably won’t be much of a surprise that suicide chess is, just like ordinary chess, in principle solvable. But here’s the crazy part. Suicide chess is solved!!

That's right: it was proven a few years ago that White can force a win by opening with e3!

[Figure: the position after White opens with e3]

Here’s the paper. The proof amounts to basically running a program that looks at all possible responses to e3 and expands out the game tree, ultimately showing that all branches can be terminated with White losing all pieces and winning the game.

Not only do we know that by starting with e3, White is guaranteed a win, we also know that Black can force a win if White starts with any of the following moves: a3, b4, c3, d3, d4, e4, f3, f4, h3, h4, Nc3, Nf3. As far as I was able to tell, there are only a few opening moves remaining for which we don't know whether White wins, Black wins, or they draw, among them a4, b3, c4, g3, and g4.

✯✯✯

Alright, final chess variant trivia. Infinite chess is just chess, but played on an infinite board.

[Figure: a position on an infinite chess board]

There’s a mind-blowing connection between infinite chess and mathematical logic. As a refresher, a little while back I discussed the first-order theory of Peano arithmetic. This is the theory of natural numbers with addition and multiplication. If you recall, we found that Peano arithmetic was incomplete (in that not all first-order sentences about the natural numbers can be proven from its axioms). First order PA is also undecidable, in that there exists no algorithm that takes in a first order sentence and returns whether it is provable from the axioms. (In fact, first order logic in general is undecidable! To get decidability, you have to go to a weaker fragment of first order logic known as monadic predicate calculus, in which predicates take only one argument and there are no functions. As soon as you introduce a single binary predicate, you lose decidability.)

Okay, so first order PA (the theory of natural numbers with addition and multiplication) is incomplete and undecidable. But there are weaker fragments of first order PA that are decidable! Take away multiplication, and you have Presburger arithmetic, the theory of natural numbers with addition. Take away addition, and you have Skolem arithmetic, the theory of natural numbers with multiplication. Both of these fragments are significantly weaker than Peano arithmetic (each is unable even to state general claims about the missing operation; Presburger arithmetic, for instance, cannot express that multiplication is commutative). But in exchange for this weakness, you get both decidability and completeness!

How does all this relate to infinite chess? Well, consider the problem of determining whether there exists a checkmate in n turns from a given starting position. This seems like a really hard problem, because unlike in ordinary chess, now it’s possible for there to be literally infinite possible moves for a given player from a position. (For instance, a queen on an empty diagonal can move to any of the infinite locations on this diagonal.) So apparently, the game tree for infinite chess, in general, branches infinitely. Given this, we might expect that this problem is not decidable.

Well, it turns out that any instance of this problem (any particular board setup, with the question of whether there’s a mate-in-n for one of the players) can be translated into a sentence in Presburger arithmetic. You do this by translating a position into a fixed length sequence of natural numbers, where each piece is given a sequence of numbers indicating its type and location. The possibility of attacks can be represented as equations about these numbers. And since the distance pieces (bishops, rooks, and queens – those that have in general an infinite number of available moves) all move in straight lines, there are simple equations expressible in Presburger arithmetic that describe whether these pieces can attack other pieces! From the attack relations, you can build up more complicated relations, including the mate-in-n relation.

So we have a translation from the mate-in-n problem to a sentence in Presburger arithmetic. But Presburger arithmetic is decidable! So there must also be a decision procedure for the mate-in-n problem in infinite chess. And not only is there a decision procedure for the mate-in-n problem, but there’s an algorithm that gives the precise strategy that achieves the win in the fewest number of moves!

Here’s the paper in which all of this is proven. It’s pretty wild. Many other infinite chess problems can be proven to be decidable by the same method (demonstrating interpretability of the problem in Presburger arithmetic). But interestingly, not all of them! This has a lot to do with the limitations of first-order logic. The question of whether, in general, there is a forced win from a given position can not be shown to be decidable in this way. (This relates to the general impossibility in first-order logic of expressing infinitely long statements. Determining whether a given position is a winning position for a given player requires looking at the mate-in-n problem, but without any upper bound on what this n is – on how many moves the win may take.) It’s not even clear whether the winning-position problem can be phrased in first-order arithmetic, or whether it requires going to second-order!

The paper takes this one step further. This proof of the decidability of the mate-in-n problem for infinite chess doesn’t crucially rest upon the two-dimensionality of the chess board. We could easily translate the proof to a three-dimensional board, just by changing the way we code positions! So in fact, we have a proof that the mate-in-n problem for k-dimensional infinite chess is decidable!

I’ll leave you with this infinite chess puzzle:

[Figure: the infinite chess puzzle position]

It’s White’s turn. Can they guarantee a checkmate in 12 moves or less?