# Building an analog calculator

Op-amps are magical little devices that allow us to employ the laws of electromagnetism for our purposes, giving us the power to do ultimately any computation using physics. To explain this, we need to take three steps: first transistors, then differential amplifiers, and then operational amplifiers. The focus of this post will be on purely analog calculations, as that turns out to be the most natural type of computation you get out of op amps.

## 1. Transistor

Here’s a transistor:

It has three ports, named (as labelled) the base, the collector and the emitter. Intuitively, you can think about the base as being like a dial that sensitively controls the flow of large quantities of current from the collector to the emitter. This is done with only very small trickles of current flowing into the base.

More precisely, a transistor obeys the following rules:

1. The transistor is ON if the voltage at the collector is higher than the voltage at the emitter. It is OFF otherwise.
2. When the transistor is in the ON state, which is all we’ll be interested in, the following regularities are observed:
1. Current flows along only two paths: base-to-emitter and collector-to-emitter. A transistor does not allow current to flow from the emitter to either the collector or the base.
2. The current into the collector is proportional to the current into the base, with a constant of proportionality labelled β whose value is determined by the make of the transistor. β is typically quite large, so that the current into the collector is much larger than the current into the base.
3. The voltage at the emitter is 0.6V less than the voltage at the base.

Combined, these rules allow us to solve basic transistor circuits. Here’s a basic circuit that amplifies input voltages using a transistor:

Applying the above rules, we derive the following relationship between the input and output voltages:

The +15V port at the top is important for the functioning of the amplifier because of the first rule of transistors, regarding when they are ON and OFF. If we just had it grounded, then the voltage at the collector would be negative (after the voltage drop from the resistor), so the transistor would switch off. Having the voltage start at +15V allows VC to be positive even after the voltage drop at the resistor (although it does set an upper bound on the input currents for which the circuit will still operate).

And the end result is that any change in the input voltage will result in a corresponding change in the output voltage, amplified by a factor of approximately -RC/RE. Why do we call this amplification? Well, because we can choose whatever values of resistance for RC and RE that we want! So we can make RC a thousand times larger than RE, and we have created an amplifier that takes millivolt signals to volt signals. (The amplification tops out at +15V, because a signal larger than that would result in the collector voltage going negative and the transistor turning off.)

Ok, great, now you’re a master of transistor circuits! We move on to Step 2: the differential amplifier.

## 2. Differential Amplifier

A differential amplifier is a circuit that amplifies the difference between two voltages and suppresses the commonalities. Here’s how to construct a differential amplifier from two transistors:

Let’s solve this circuit explicitly:

Vin = 0.6  + REiE+ Rf (iE+ iE’)
Vin’ = 0.6 + REiE’ + R(iE+ iE’)

∆(Vin – Vin’) = RE(i– iE’) = R(β+1)(iB – iB’)
∆(Vin + Vin’) = (RE + 2Rf)(iE + iE’) = (RE + 2Rf)(β+1)(iB + iB’)

Vout = 15 – RiC
Vout’ = 15 – RC iC

∆(Vout – Vout’) = – RC(i– iC’) = – Rβ (iB– iB’)
∆(Vout + Vout’) = – RC(i+ iC’) = – RC β (i+ iB’)

ADM = Differential Amplification = ∆(Vout – Vout’) / ∆(Vin – Vin’) = -β/(β+1) RC/RE
ACM = “Common Mode” Amplification = ∆(Vout + Vout’) / ∆(Vin + Vin’) = -β/(β+1) RC/(R+ 2Rf)

We can solve this directly for changes in one particular output voltage:

∆Vout = ADM ∆(Vin – Vin‘) + ACM ∆(Vin + Vin‘)

To make a differential amplifier, we require that ADM be large and ACM be small. We can achieve this if Rf >> R>> RC. Notice that since the amplification is a function of the ratio of our resistors, we can easily make the amplification absolutely enormous.

Here’s one way this might be useful: say that Alice wants to send a signal to Bob over some distance, but there is a source of macroscopic noise along the wire connecting the two. Perhaps the wire happens to pass through a region in which large magnetic disturbances sometimes occur. If the signal is encoded in a time-varying voltage on Alice’s end, then what Bob gets may end up being a very warped version of what Alice sent.

But suppose that instead Alice sends Bob two signals, one with the original message and the other just a static fixed voltage. Now the difference between the two signals represents Alice’s message. And crucially, if these two signals are sent on wires that are right side-by-side, then they will pick up the same noise!

This means that while the original message will be warped, so will the static signal, and by roughly the same amount! Which means that the difference between the two signals will still carry the information of the original message. This allows Alice to communicate with Bob through the noise, so long as Bob takes the two wires on his end and plugs them into a differential amplifier to suppress the common noise factor.

## 3. Operational Amplifier

To make an operational amplifier, we just need to make some very slight modifications to our differential amplifier. The first is that we’ll make our amplification as large as possible. We can get this by putting multiple stages of differential amplifiers side by side (so if your differential amplifier amplifies by 100, two of them will amplify by 10,000, and three by 1,000,000).

What’ll happen now if we send in two voltages, with Vin very slightly larger than Vin‘? Well, suppose that the difference is on the order of a single millivolt (mV). Then the output voltage will be on the order of 1 mV multiplied by our amplication factor of 1,000,000. This will be around 1000V. So… do we expect to get out an output signal of 1000V? No, clearly not, because our maximum possible voltage at the output is +15V! We can’t create energy from nowhere. Remember that the output voltage Vout is equal to 15V – RCiC, and that the transistor only accepts current traveling from the collector to the emitter, not the other way around. This means that iC cannot be negative, so Vout cannot be larger that +15V. In addition, we know that Vout cannot be smaller than 0V (as this would turn off the transistor via Rule 1 above).

What this all means is that if Vin is even the slightest bit smaller than Vin‘, Vout will “max out”, jumping to the peak voltage of +15V (remember that ADM is negative). And similarly, if Vin is even the slightest bit larger than Vin‘, then Vout will bottom out, dropping to 0V. The incredible sensitivity of the instrument is given to us by the massive amplification factor, so that it will act as a binary measurement of which of the voltages is larger, even if that difference is just barely detectable. Often, the bottom voltage will be set to -15V rather than ground (0V), so that the signal we get out is either +15V (if Vin is smaller than Vin‘) or -15V (if Vin is greater than Vin‘). That way, perfect equality of the inputs will be represented as 0V. That’s the convention we’ll use for the rest of the post. Also, instead of drawing out the whole diagram for this modified differential amplifier, we will use the following simpler image:

Ok, we’re almost to an operational amplifier. The final step is to apply negative feedback!

What we’re doing here is just connecting the output voltage to the Vin‘ input. Let’s think about what this does. Suppose that Vin is larger than Vin‘. Then Vout will quickly become negative, approaching -15V. But as Vout decreases, so does Vin‘! So Vin‘ will decrease, getting closer to Vin. Once it passes Vin, the quantity Vin – Vin‘ suddenly becomes negative, so Vout will change direction, becoming more positive and approaching +15V. So Vin‘ will begin to increase! This will continue until it passes Vin again, at which point Vout will change signs again and Vin‘ will start decreasing. The result of this process will be that no matter what Vin‘ starts out as, it will quickly adjust to match Vin to a degree dictated by the resolution of our amplifier. And we’ve already established that the amplifier can be made to have an extremely high resolution. So this device serves as an extremely precise voltage-matcher!

This device, a super-sensitive differential amplifier with negative feedback, is an example of an operational amplifier. It might not be immediately obvious to you what’s so interesting or powerful about this device. Sure it’s very precise, but all it does is match voltages. How can we leverage this to get actually interesting computational behavior? Well, that’s the most fun part!

## 4. Let’s Compute!

Let’s start out by seeing how an op-amp can be used to do calculus. Though this might seem like an unusually complicated starting place, doing calculus with op amps is significantly easier and simpler than doing something like multiplication.

As we saw in the last section, if we simply place a wire between Vout and Vin‘ (the input labelled with a negative sign on the diagram), then we get a voltage matcher. We get more interesting behavior if we place other circuit components along this wire. The negative feedback still exists, which means that the circuit will ultimately stabilize to a state in which the two inputs are identical, and where no current is flowing into the op-amp. But now the output voltage will not just be zero.

Let’s take a look at the following op-amp circuit:

Notice that we still have negative feedback, because the V input is connected to the output, albeit not directly. This means that the two inputs to the op amp must be equal, and since V+ is grounded, the other must be at 0V as well. It also means that no current is flowing into the op amp, as current only flows in while the system is stabilizing.

Those two pieces of information – that the input voltages are equal and that no current flows through the device – are enough to allow us to solve the circuit.

And there we have it, a circuit that takes in a voltage function that changes with time and outputs the integral of this function! A natural question might be “integral from what to what”? The answer is just: the integral from the moment the circuit is connected to the present! As soon as the circuit is connected it begins integrating.

Alright, now let’s just switch the capacitor and resistor in the circuit:

See if you can figure out for yourself what this circuit does!

Alright, here’s the solution.

We have a differentiator! Feed in some input voltage, and you will get out a precise measurement of the rate of change of this voltage!

I find it pretty amazing that doing integration and differentiation is so simple and natural. Addition is another easy one:

We can use diodes to produce exponential and logarithm calculators. We utilize the ideal diode equation:

The logarithm can now be used to produce circuits for calculating exponentials and logarithms:

Again, these are all fairly simple-looking circuits. But now let’s look at the simplest way of computing multiplication using an op amp. Schematically, it looks like this:

And here’s the circuit in full detail:

There’s another method besides this one that you can use to do multiplication, but it’s similarly complicated. It’s quite intriguing that multiplication turns out to be such an unnatural thing to get nature to do to voltages, while addition, exponentiation, integration, and the rest are so simple.

What about boolean logic? Well, it turns out that we already have a lot of it. Our addition corresponds to an OR gate if the input voltages are binary signals. We also already have an AND gate, because we have multiplication! And depending on our choice of logic levels (which voltage corresponds to True and which corresponds to False), we can really easily sketch a circuit that does negation:

And of course if we have NOT and we have AND, then we have NAND, and if we have NAND, then we have all the rest of binary logic. We can begin to build flip-flop gates out of the NAND gates to get simple memory cells, and we’re on our way to Turing-completeness!

# Complex numbers in physics

“You cannot observe an imaginary number in reality.” Have you ever heard this claim before? It has a nice ring to it, and sounds almost tautological. I’ve personally heard the claim made by several professors of physics, alongside a host of less qualified people. So let’s take a close look at it and see if it holds up to scrutiny. Can you in fact observe imaginary numbers in reality?

First of all, let’s ask a much simpler sounding question. Can you ever observe a real number in reality? Well, yeah. Position, time, charge, mass, and so on; pretty much any physical quantity you name can be represented by real numbers. But if we’re going to be pedantic, when we measure the position of a particle, we are not technically observing a number. We are observing a position, a physical quantity, which we find has a useful representation as a real number. Color is another physical phenomena that is usually represented mathematically as a real number (the frequency of emitted light). But we do not necessarily want to say that color is a number. No, we say that color is a physical phenomena, which we find is usefully described as a real number.

More specifically, we have some physical phenomena whose structure contains many similarities to the abstract structure of these mathematical objects known as real numbers, so it behooves us to translate statements about the phenomena over to the platonic realm of math, where the work we do is precise and logically certain. Once we have done the mathematical manipulations we desire to get useful results, we translate our statements about numbers back into statements about the physical phenomena. There are really just two possible failure points in this process. First, it might be that the mathematical framework actually doesn’t have the same structure as the physical phenomena we have chosen it to describe. And second, it might be that the mathematical manipulations we do once we have translated our physical statements into mathematical statements contain some error (like maybe we accidentally divided by zero or something).

So on one (overly literal) interpretation, when somebody asks whether a certain abstract mathematical object exists in reality, the answer is always no. Numbers and functions and groups and rings don’t exist in reality, because they are by definition abstract objects, not concrete ones. But this way of thinking about the relationship between math and physics does give us a more useful way to answer the question. Do real numbers exist in reality? Well, yes, they exist insofar as their structure is mirrored in the structure of some real world phenomena! If we’re being careful with our language, we might want to say that real numbers are instantiated in reality instead of saying that they exist.

So, are imaginary numbers instantiated in reality? Well, yes, of course they are! The wave function in quantum mechanics is an explicitly complex entity — try doing quantum mechanics with only real-valued wave functions, and you will fail, dramatically. The existence of imaginary values of the wave function is absolutely necessary in order to get an adequate description of our reality. So if you believe quantum mechanics, then you believe that imaginary numbers are actually embedded in the structure of the most fundamental object there is: the wave function of the universe.

A simpler example: any wave-like phenomena is best described in the language of complex numbers. A ripple in the water is described as a complex function of position and time, where the complex phase of the function represents the propagation of the wave through space and time. Any time you see a wave-like phenomena (by which I mean any process that is periodic, including something as prosaic as a ticking clock), you are looking at a physical process that really nicely mirrors the structure of complex numbers.

Now we’ll finally get to the main point of this post, which is to show off a particularly elegant and powerful instance of complex numbers applying to physics. This example is in the realm of electromagnetism, specifically the approximation of electromagnetism that we use when we talk about circuits, resistors, capacitors, and so on.

Suppose somebody comes up to you in the street, hands you the following circuit diagram, and asks you to solve it:

If you’ve never studied circuits before, you might stare at it blankly for a moment, then throw your hands up in puzzlement and give them back their sheet of paper.

If you’ve learned a little bit about circuits, you might stare at it blankly for a few moments, then write down some complicated differential equations, stare at them for a bit, and then throw your hands up in frustration and hand the sheet back to them.

And if you know a lot about circuits, then you’ll probably smile, write down a few short lines of equations involving imaginary numbers, and hand back the paper with the circuit solved.

Basically, the way that students are initially taught to solve circuits is to translate them into differential equations. These differential equations quickly become immensely difficult to solve (as differential equations tend to do). And so, while a few simple circuits are nicely solvable with this method, any interesting circuits that do nontrivial computations are at best a massive headache to solve with this method, and at worst actually infeasible.

This is the real numbers approach to circuits. But it’s not the end of the story. There’s another way! A better and more beautiful way! And it uses complex numbers.

Here’s the idea: circuits are really easy to solve if all of your circuit components are just resistors. For a resistor, the voltage across it is just linearly proportional to the current through it. No derivatives or integrals required, we just use basic algebra to solve one from the other. Furthermore, we have some nice little rules for simplifying complicated-looking resistor circuits by finding equivalent resistances.

The problem is that interesting circuits don’t just involve resistors. They also contain things like inductors and capacitors. And these circuit elements don’t have that nice linear relationship between current and voltage. For capacitors, the relationship is between voltage and the integral of current. And for inductors, the relationship is between voltage and the derivative of current. Thus, a circuit involving a capacitor, a resistor, and an inductor, is going to be solved by an equation that involves the derivative of current, the current itself, and the integral of current. In other words, a circuit involving all three types of circuit elements is in general going to be solved by a second-order differential equation. And those are a mess.

The amazing thing is that if instead of treating current and voltage as real-valued functions, you treat them as complex-valued functions, what you find is that capacitors and inductors behave exactly like resistors. Voltage and current become related by a simple linear equation once more, with no derivatives or integrals involved. And the distinguishing characteristic of a capacitor or an inductor is that the constant of proportionality between voltage and current is an imaginary number instead of a real number. More specifically, a capacitor is a circuit element for which voltage is equal to a negative imaginary number times the current. An inductor is a circuit element for which voltage is equal to a positive imaginary number times the current. And a resistor is a circuit element for which voltage is equal to a positive real number times the current.

Suppose our voltage is described by a simple complex function: Vexp(iωt). Then we can describe the relationship between voltage and current for each of these circuit elements as follows:

Notice that now the equations for all three circuit components look just like resistors! Just with different constants of proportionality relating voltage to current. We can even redraw our original diagrams to make the point:

Fourier showed that any function whatsoever can be described as a sum of functions that look like Vo exp(iωt). And there’s a nice theorem called the superposition theorem that allows us to use this to solve any circuit, no matter what the voltage is.

So, let’s go back to our starting circuit.

What we can now do is just redraw it, with all capacitors and inductors substituted for resistors with imaginary resistances:

This may look just as complicated as before, but it’s actually much much simpler. We can use the rules for reducing resistor equations to solve an immensely simpler circuit:

Our circuit is now much simpler, but the original complexity had to be offloaded somewhere. As you can see, it’s been offloaded onto the complex (in both senses of the word) value of the resistance of our simplified circuit. But this type of complexity is much easier to deal with, because it’s just algebra, not calculus! To solve the current in our circuit now, we don’t need to solve a single differential equation, we just have to do some algebraic rearranging of terms. We’ll give the final equivalent resistance the name “Z”.

Now suppose that our original voltage was just the real part of V (that is, Vcos(ωt)). Then the current will also be the real part of I. And if our original voltage was just the imaginary part of V, then the current will be the imaginary part of I. And there we have it! We’ve solved our circuit without struggling over a single differential equation, simply by assuming that our quantities were complex numbers instead of real numbers.

This is one of my favorite examples of complex numbers playing an enormously important role in physics. It’s true that it’s a less clear-cut example of complex numbers being necessary to describe a physical phenomena, because we could have in principle done everything with purely real-valued functions by solving some differential equations, but it also highlights the way in which accepting a complex view of the world can simplify your life.

# The laugh-hospital of constructive mathematics

In Raymond Smullyan’s story there was a planet on which the concept of humor was unknown and laughter was treated as a disease. Those who laughed were sent to live in laugh-hospitals, away from normal people. As time passed ever more people contracted laughter and the laugh-hospitals grew into whole laugh-communities, until the few remaining normal people pretended to understand humor just so that they could join the rest. What if constructive models are like the laugh-hospitals? What if not understanding constructivism is like not having a sense of humor?

This is a fun talk. It’s a defense of constructive mathematics, which is what mathematics becomes when you reject the law of the excluded middle (henceforth LEM), which is the claim that “φ∨¬φ” is a tautology. Here’s a link to the paper corresponding to the talk (from which the starting quote comes).

Some highlights:

• The rejection of LEM is not the same as the claim that there are some propositions that are neither true nor false. That is, denying that φ∨¬φ is a tautology is not the same as saying that there is some proposition φ for which φ∧¬φ. This second claim can actually be proven to be false in constructive mathematics.
• Both the law of double negation (which is the claim that ¬¬φ implies φ) and the axiom of choice imply LEM. So to deny LEM, one must also deny the law of double negation and the axiom of choice.
• In constructive math, proof by negation is valid, but not proof by contradiction. Proof by negation is the argument structure “Suppose φ, (argument reaching contradiction), therefore ¬φ.” Proof by contradiction is the argument structure “Suppose ¬φ, (argument reaching contradiction), therefore φ.”
• This might seem strange; can’t you just substitute each φ in the first argument for ¬φ to get the second argument? The reason you can’t is that the second argument involves removing a negation, while the first involves introducing one. Using proof by negation and starting with ¬φ, we get ¬¬φ, and of course to a constructivist this is not equivalent to φ.
• LEM is equivalent to the claim that all subsets of a finite set must be finite (you can prove LEM from this claim, and can prove this claim from LEM).
• In particular, in the Brouwer–Heyting–Kolmogorov interpretation of constructive mathematics one cannot prove that “all subsets of the set {0, 1} are finite”.
• φ is a classical tautology if and only if ¬¬φ is an intuitionistic tautology. (Not in this talk, but a fun one nonetheless.)

So how do we make sense of all this? Well, there are different ways of understanding propositions in intuitionistic logic. One simple one is to reinterpret the assertion that P as meaning ‘P is proven’ rather than ‘P is true’. In this interpretation, the law of the excluded middle in this interpretation is something like ‘every proposition is either provably true or provably false.’ This is the ambitious claim of Hilbert’s program that we now know to be false; for instance, Gödel’s sentence is a counterexample to the law of the excluded middle. We cannot prove that Gödel’s sentence is true, and we cannot prove that it is false. And note that Gödel’s sentence is true! Just not provably so. As for double negation, ¬¬φ means “any proof that φ entails a contradiction, would itself entail a contradiction”, and this is clearly not equivalent to φ (proving that proofs of ¬φ inevitably lead to contradiction is not the same as proving that φ).

In some interpretations of intuitionistic logic, the interpretation of a proposition is literally tied to its current epistemic status (whether we’ve proven it one way or another yet). So in these interpretations, P=NP is a counterexample to the law of the excluded middle (not because it’s neither true nor false, nor because it’s impossible to prove, but because we don’t currently have such a proof).

As a final note, let’s just quickly show that we can prove the law of the excluded middle from lambda calculus. We’ll formalize the law of the excluded middle as a function that takes in a proposition p and returns the function corresponding to p∨¬p. If we can show that on each possible value of the input proposition p (either T or F), the function returns T, then we have proven LEM.

I’m not totally sure about the relationship between lambda calculus and intuitionistic logic, so this apparent proof of the law of the excluded middle is curious to me.

# Can chess be solved?

There are some chess positions in which one player can force a win. Here’s an extremely simple example:

White just moves their queen up to a8 and checkmates Black. Some winning positions are harder to see than this. Take a look at the following position. Can you find a guaranteed two-move checkmate for White?

And for fun, here’s a harder one, again a guaranteed two-move checkmate, this time by Black:

Notice that in this last one, the opponent had multiple possible moves to choose from. A forced mate does not necessarily mean restricting your opponent to exactly one move on each of their turns. It just means that no matter what they do, you can still guarantee a win. (Edit: with best play from White, this is not actually a forced mate.) Forced wins can become arbitrarily complicated and difficult to see if you’re looking many moves down the line, as you have to consider all the possible responses your opponent has at each turn. The world record for the longest forced win is the following position:

It’s White’s move, and White does have a strategy for a forced win. It just takes 549 turns to actually do this! (This strategy does violate the 50-move rule, which says that after 50 turns with no pawn moves or capture the game is drawn.) At this link you can watch the entire 549 move game. Most of it is totally incomprehensible to human players, and apparently top chess players that look at this game have reported that the reasoning behind the first 400 moves is opaque to them. Interestingly, White gets a pawn promotion after six moves, and it promotes it to a knight instead of a queen! It turns out that promoting to a queen actually loses for White, and their only way to victory is the knight promotion!

This position is the longest forced win with 7 pieces on the board. There are a few others that are similarly long. All of them represent a glimpse at the perfect play we might expect to see if a hypercomputer could calculate the whole game tree for chess and select the optimal move.

A grandmaster wouldn’t be better at these endgames than someone who had learned chess yesterday. It’s a sort of chess that has nothing to do with chess, a chess that we could never have imagined without computers. The Stiller moves are awesome, almost scary, because you know they are the truth, God’s Algorithm – it’s like being revealed the Meaning of Life, but you don’t understand one word.

Tim Krabbe

With six pieces on the board, the longest mate takes 262 moves (you can play out this position here). For five pieces, it’s 127 moves, for four it’s 43 moves, and the longest 3-man mate takes 28 moves.

But now a natural question arises. We know that a win can be forced in some positions. But how about the opening position? That is, is there a guaranteed win for White (or for Black) starting in this position?

Said more prosaically: Can chess be solved?

Zermelo’s theorem, published in “On an Application of Set Theory to the Theory of the Game of Chess” (1913), was the first formal theorem in game theory. It predated the work of von Neumann (the so-called “father of game theory”) by 15 years. It proves that yes, it is in fact possible to solve chess. We don’t know what the solution is, but we know that either White can force a win, or Black can force a win, or the result will be a draw if both play perfectly.

Of course, the guarantee that in principle there is a solution to chess doesn’t tell us much in practice. The exponential blowup in the number of possible games is so enormous that humans will never find this solution. Nonetheless, I still find it fascinating to think that the mystery of chess is ultimately a product of computational limitations, and that in principle, if we had a hypercomputer, we could just find the unique best chess game and watch it play out, either to a win by one side or to a draw. That would be a game that I would love to see.

✯✯✯

Here’s another fun thing. There’s an extremely bizarre variant on chess called suicide chess (or anti-chess). The goal of suicide chess is to lose all your pieces. Of course, if all the rules of play were the same, it would be practically impossible to win (since your opponent could always just keep refusing to take a piece that you are offering them). To remedy this, in suicide chess, capturing is mandatory! And if you have multiple possible captures, then you can choose among them.

Suicide chess gameplay is extremely complicated and unusual looking, and evaluating who is winning at any given moment tends to be really difficult, as sudden turnarounds are commonplace compared to ordinary chess. But one simplifying factor is that it tends to be easier to restrict your opponents’ moves. In ordinary chess, you can only restrict your opponents’ moves by blocking off their pieces or threatening their king. But in suicide chess, your opponents’ moves are restricted ANY time you put one of your pieces in their line of fire! This feature of the gameplay makes the exponential blow up in possible games more manageable.

Given this, it probably won’t be much of a surprise that suicide chess is, just like ordinary chess, in principle solvable. But here’s the crazy part. Suicide chess is solved!!

That’s right: it was proven a triple of years ago that White can force a win by moving first with e3!

Here’s the paper. The proof amounts to basically running a program that looks at all possible responses to e3 and expands out the game tree, ultimately showing that all branches can be terminated with White losing all pieces and winning the game.

Not only do we know that by starting with e3, White is guaranteed a win, we also know that Black can force a win if White starts with any of the following moves: a3, b4, c3, d3, d4, e4, f3, f4, h3, h4, Nc3, Nf3. As far as I was able to tell, there are only six opening moves remaining for which we don’t know if White wins, Black wins, or they draw: a4, b3, c4, e3, g3, and g4.

✯✯✯

Alright, final chess variant trivia. Infinite chess is just chess, but played on an infinite board.

There’s a mind-blowing connection between infinite chess and mathematical logic. As a refresher, a little while back I discussed the first-order theory of Peano arithmetic. This is the theory of natural numbers with addition and multiplication. If you recall, we found that Peano arithmetic was incomplete (in that not all first-order sentences about the natural numbers can be proven from its axioms). First order PA is also undecidable, in that there exists no algorithm that takes in a first order sentence and returns whether it is provable from the axioms. (In fact, first order logic in general is undecidable! To get decidability, you have to go to a weaker fragment of first order logic known as monadic predicate calculus, in which predicates take only one argument and there are no functions. As soon as you introduce a single binary predicate, you lose decidability.)

Okay, so first order PA (the theory of natural numbers with addition and multiplication) is incomplete and undecidable. But there are weaker fragments of first order PA that are decidable! Take away multiplication, and you have Presburger arithmetic, the theory of natural numbers with addition. Take away addition, and you have Skolem arithmetic, the theory of natural numbers with multiplication. Both of these fragments are significantly weaker than Peano arithmetic (each is unable to prove general statements about the missing operation, like that multiplication is commutative for Presburger arithmetic). But in exchange for this weakness, you get both decidability and completeness!

How does all this relate to infinite chess? Well, consider the problem of determining whether there exists a checkmate in n turns from a given starting position. This seems like a really hard problem, because unlike in ordinary chess, now it’s possible for there to be literally infinite possible moves for a given player from a position. (For instance, a queen on an empty diagonal can move to any of the infinite locations on this diagonal.) So apparently, the game tree for infinite chess, in general, branches infinitely. Given this, we might expect that this problem is not decidable.

Well, it turns out that any instance of this problem (any particular board setup, with the question of whether there’s a mate-in-n for one of the players) can be translated into a sentence in Presburger arithmetic. You do this by translating a position into a fixed length sequence of natural numbers, where each piece is given a sequence of numbers indicating its type and location. The possibility of attacks can be represented as equations about these numbers. And since the distance pieces (bishops, rooks, and queens – those that have in general an infinite number of available moves) all move in straight lines, there are simple equations expressible in Presburger arithmetic that describe whether these pieces can attack other pieces! From the attack relations, you can build up more complicated relations, including the mate-in-n relation.

So we have a translation from the mate-in-n problem to a sentence in Presburger arithmetic. But Presburger arithmetic is decidable! So there must also be a decision procedure for the mate-in-n problem in infinite chess. And not only is there a decision procedure for the mate-in-n problem, but there’s an algorithm that gives the precise strategy that achieves the win in the fewest number of moves!

Here’s the paper in which all of this is proven. It’s pretty wild. Many other infinite chess problems can be proven to be decidable by the same method (demonstrating interpretability of the problem in Presburger arithmetic). But interestingly, not all of them! This has a lot to do with the limitations of first-order logic. The question of whether, in general, there is a forced win from a given position can not be shown to be decidable in this way. (This relates to the general impossibility in first-order logic of expressing infinitely long statements. Determining whether a given position is a winning position for a given player requires looking at the mate-in-n problem, but without any upper bound on what this n is – on how many moves the win may take.) It’s not even clear whether the winning-position problem can be phrased in first-order arithmetic, or whether it requires going to second-order!

The paper takes this one step further. This proof of the decidability of the mate-in-n problem for infinite chess doesn’t crucially rest upon the two-dimensionality of the chess board. We could easily translate the proof to a three-dimensional board, just by changing the way we code positions! So in fact, we have a proof that the mate-in-n problem for k-dimensional infinite chess is decidable!

I’ll leave you with this infinite chess puzzle:

It’s White’s turn. Can they guarantee a checkmate in 12 moves or less?

# For Loops and Bounded Quantifiers in Lambda Calculus

Lambda calculus is an extremely minimal language that is powerful enough to express every computation that a Turing machine can do. Here is a fantastic video explaining the basic “rules of the game”:

Figuring out how to translate programs into lambda calculus can be a challenging puzzle. I recently discovered a nice way to use lambda calculus to generate for loops and bounded quantifiers, thus allowing recursion without requiring the Y Combinator. I’m sure similar things have been discovered by others, but things feel cooler than they are when you’ve found them, so I’ll present this function here in this post. 🙂

First, let me lay out some of the most important basic functions in lambda calculus. (A lot of this will be rehashing bits of the video lecture.)

Useful Tools

Propositional Logic

To give you a sense of how proving things works in lambda calculus, here’s a quick practice proof that ¬¬T is the same thing as T. We’ll prove this by reducing ¬¬T to T, which will show us that the two functions are extensionally equivalent (they return the same values on all inputs). Each step, we will either be substituting in the definition of some symbol, or evaluating a function by substituting its inputs in.

Another more convoluted approach is to show that the function (¬¬T ↔ T) reduces to the function T, which is to say that the two are equivalent. Notice that we’re not saying that the expression (¬¬T ↔ T) “has the truth value True”. There is no such thing as expressions with truth values in lambda calculus, there are just functions, and some functions are equivalent to the True function.

To shorten this proof, I’ll start out by proving the equivalence of ¬T and F, as well as the equivalence of ¬F and T. This will allow me to translate between these functions in one step.

Got the hang of it? Ok, on to the natural numbers!

Natural Numbers

In lambda calculus, numbers are adverbs. 1 is not “one”, it’s “once”. 1 is a function that takes in a function f and tells you to apply it one time. 2 is “twice”; it’s a function that tells you to apply f twice. And so on.

Defining things this way allows us to have beautifully simple definitions of addition, multiplication, and exponentiation. For instance, the successor of n is just what you get by one more application of f on top of n applications of f. And n plus m is just n S m, because this means that S (the successor function) should be applied n times to m. See if you can see for yourself why the definitions of multiplication and exponentiation make sense (and how they’re different! It’s not just the order of the n and m!).

Let’s prove that 1 + 1 = 2 using lambda calculus!

One more: Here’s a proof that 0 + k = k.

Interestingly, formally proving that k + 0 = k is significantly more convoluted, and there’s no obvious way to do it in general for all k (as opposed to producing a separate proof for each possible value of k). In addition, the proof length will end up being the size of k.

Okay, let’s dive deeper by looking at our first non-trivial lambda calculus data structure, the pair.

Pair Data Structure

The pair function can be fed two values (a and b) which can then be referenced at a later time by feeding it one more function. Feeding the function either of T or F will just select either the first or the second element in the pair. It might not be immediately obvious, but having this ability to, as it were, store functions “in memory” for later reference gives us enormous power. In fact, at this point we have everything we need to introduce the magical function which opens the door to quantification and recursion:

Magic Function

All that Φ does is take a pair of functions (a, b) to a new pair (f(a, b), g(a, b)). Here it’s written in more familiar notation:

Φ: (a, b)  (f(a, b), g(a, b))

Different choices of f and g give us very different types of behavior when we repeatedly apply Φ. First of all, let’s use Φ to subtract and compare numbers.

Subtraction and Comparison

φ1 simply chooses f and g to get the following behavior:

φ1: (a, b)  (b, b+1)

Here’s what happens with repeated application of φ1 to the pair (0, 0):

(0, 0) → (0, 1) → (1, 2) → (2, 3) → …

Looking at this, you can see how n applications of φ1 to (0, 0) gives the pair (n – 1, n), which justifies our definition of the predecessor function.

Our power is enhancing every minute; now, we create for loops!

For Loops and Quantifiers

B S F is the composition of successor and True, so it takes two inputs, fetches the value of the first input, and then adds one to it. Since B S T is the first input to Φ, it will be the function that determines the value of the first member of the output pair. In other words, φ2 gives the following mapping:

φ2: (a, b)  (a + 1, f(a, b))

To get a for loop, we now just iterate this function the appropriate number of times!

The for function I’ve written takes in four parameters (n, m, f, a), and is equivalent to the following program:

The  function is equivalent to the following:

𝑥 ∈ [n, m-1] θ(𝑥)

In code, this is:

And the  function is equivalent to:

𝑥 ∈ [n, m-1] θ(𝑥)

With these powerful tools in hand, we can now define more interesting functions, like one that detects if an input number is prime!

Let’s use lambda calculus to see if 2 is prime!

That got a little messy, but it’s nothing compared to the computation of whether 3 is prime!

This might sound strange, but I actually find it amazing how short and simple this is. To be fair, I skipped steps if all that they did was substitute in the definition of a function, instead opting to just immediately apply the definition and cut the number of steps in half. The full proof would actually be twice as long!

But nonetheless, try writing a full proof of the primality of 3 using only ZFC set theory in anywhere near as few lines! It’s interesting to me that a language as minimal and bare-bones as that of lambda calculus somehow manages to produce such concise proofs of interesting and complicated statements.

# There’s a problem with infinity

Last post I described the Ross-Littlewood paradox, in which an ever-expanding quantity of numbered billiard balls are placed into a cardboard box in such a way that after an infinite number of steps the box ends up empty. Here’s a version of this paradox:

Process 1
Step 1: Put 1 through 9 into the box.
Step 2: Take out 1, then put 10 through 19 into the box.
Step 3: Take out 2, then put 20 through 29 into the box.
Step 4: Take out 3, then put 30 through 39 into the box.
And so on.

Box contents after each step
Step 1: 1 through 9
Step 2: 2 through 19
Step 3: 3 through 29
Step 4: 4 through 39
And so on.

Now take a look at a similar process, where instead of removing balls from the box, we just change the number that labels them (so, for example, we paint a 0 after the 1 to turn “Ball 1” to “Ball 10″).

Process 2
Step 1: Put 1 through 9 into the box
Step 2: Change 1 to 10, then put 11 through 19 into the box.
Step 3: Change 2 to 20, then put 21 through 29 in.
Step 3: Change 3 to 30, then put 31 through 39 in.
And so on.

Box contents after each step
Step 1: 1 through 9
Step 2: 2 through 19
Step 3: 3 through 29
Step 4: 4 through 39
And so on.

Notice that the box contents are identical after each step. If that’s all that you are looking at (and you are not looking at what the person is doing during the step), then the two processes are indistinguishable. And yet, Process 1 ends with an empty box, and Process 2 ends with infinitely many balls in the box!

Why does Process 2 end with an infinite number of balls in it, you ask?

Process 2 ends with infinitely many balls in the box, because no balls are ever taken out. 1 becomes 10, which later becomes 100 becomes 1000, and so on forever. At infinity you have all the natural numbers, but with each one appended an infinite number of zeros.

So apparently the method you use matters, even when two methods provably get you identical results! There’s some sort of epistemic independence principle being violated here. The outputs of an agent’s actions should be all that matters, not the specific way in which the agent obtains those outputs! Something like that.

Somebody might respond to this: “But the outputs of the actions aren’t the same! In Process 1, each step ten are added and one removed, whereas in Process 2, each step nine are added. This is the same with respect to the box, but not with respect to the rest of the universe! After all, those balls being removed in Process 1 have to go somewhere. So somewhere in the universe there’s going to be a big pile of discarded balls, which will not be there in Process 2.

This responds holds water as long as our fictional universe doesn’t violate conservation of information, as if not, these balls can just vanish into thin air, leaving no trace of their existence. But that rebuttal feels cheap. Instead, let’s consider another variant that gets at the same underlying problem of “relevance of things that should be irrelevant”, but avoids this problem.

Process 1 (same as before)
Step 1: Put 1 through 9 into the box.
Step 2: Take out 1, then put 10 through 19 into the box.
Step 3: Take out 2, then put 20 through 29 into the box.
Step 4: Take out 3, then put 30 through 39 into the box.
And so on.

Box contents after each step
Step 1: 1 through 9
Step 2: 2 through 19
Step 3: 3 through 29
Step 4: 4 through 39
And so on.

And…

Process 3
Step 1: Put 1 through 9 into the box.
Step 2: Take out 9, then put 10 through 19 into the box.
Step 3: Take out 19, then put 20 through 29 into the box.
Step 4: Take out 29, then put 30 through 39 into the box.
And so on.

Box contents after each step
Step 1: 1 through 9
Step 2: 1 to 8, 10 to 19
Step 3: 1 to 8, 10 to 18, 20 to 29
Step 4: 1 to 8, 10 to 18, 20 to 28, 30 to 39
And so on

Okay, so as I’ve written it, the contents of each box after each step are different in Processes 1 and 3. Just one last thing we need to do: erase the labels on the balls. The labels will now just be stored safely inside our minds as we look over the balls, which will be indistinguishable from one another except in their positions.

Now we have two processes that look identical at each step with respect to the box, AND with respect to the external world. And yet, the second process ends with an infinite number of balls in the box, and the first with none! (Every number that’s not one less than a multiple of ten will be in there.) It appears that you have to admit that the means used to obtain an end really do matter.

But it’s worse than this. You can arrange things so that you can’t tell any difference between the two processes, even when observing exactly what happens in each step. How? Well, if the labelling is all in your heads, then you can switch around the labels you’ve applied without doing any harm to the logic of the thought experiment. So let’s rewrite Process 3, but fill in both the order of the balls in the box and the mental labelling being used:

Process 3
1 2 3 4 5 6 7 8 9
Mentally rotate labels to the right:
9 1 2 3 4 5 6 7 8
Remove the furthest left ball:
1 2 3 4 5 6 7 8
Add the next ten balls to the right in increasing order:
1 2 3 4 5 6 7 8 10 11 12 13 14 15 16 17 18 19
Repeat!

Compare this to Process 1, supposing that it’s done without any relabelling:

Process 1
1 2 3 4 5 6 7 8 9
Remove the furthest left ball:
2 3 4 5 6 7 8 9
Add the next tell balls to the right in increasing order:
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 18 19
Repeat!

If the labels are all in your head, then these two processes are literally identical except for how a human being is thinking about them.

But looking at Process 3, you can prove that after Step 1 there will always be a ball labelled 1 in the box. Same with 2, 3, 4, and all other numbers that are not a multiple of 10 minus one. Even though we remove an infinity of balls, there are ball numbers that are never removed. And if we look at the pile of discarded balls, we’ll see that it consists of 9, 19, 29, 39, and so on, but none of the others. Unless some ball numbers vanish in the process (which they never do!), all the remainders must still be sitting in the box!

So we have two identical-in-every-relevant-way processes, one of which ends with an infinite number of balls in the box and the other with zero. Do you find this troubling? I find this very troubling. If we add some basic assumption that an objective reality exists independent of our thoughts about it, then we’ve obtained a straightforward contradiction.

✯✯✯

Notice that it’s not enough to say “Well, in our universe this process could never be completed.” This is for two reasons:

First of all, it’s actually not obvious that supertasks (tasks involving the completion of an infinite number of steps in a finite amount of time) cannot be performed in our universe. In fact, if space and time are continuous, then every time you wave your hand you are completing a sort of supertask.

You can even construct fairly physically plausible versions of some of the famous paradoxical supertasks. Take the light bulb that blinks on and off at intervals that get shorter and shorter, such that after some finite duration it has blinked an infinity of times. We can’t say that the bulb is on at the end (as that would seem to imply that the sequence 0101010… had a last number) or that it is off (for much the same reason). But these are the only two allowed states of the bulb! (Assume the bulb is robust against bursting and all the other clever ways you can distract from the point of the thought experiment.)

Now, here’s a variant that seems fairly physically reasonable:

A ball is dropped onto a conductive plate that is attached by wire to a light bulb. The ball is also wired to the bulb, so that when the ball contacts the plate, a circuit is completed that switches the light bulb on. Each bounce, the ball loses some energy to friction, cutting its velocity exactly in half. This means that after each bounce, the ball hangs in the air for half as long as it did the previous bounce.

Suppose the time between the first and second bounce was 1 second. Then the time between the second and third will be .5 seconds. And next will be .25 seconds. And so on. At 2 seconds, the ball will have bounced an infinite number of times. So at 2 seconds, the light bulb will have switched on and off an infinite number of times.

And of course, at 2 seconds the ball is at rest on the plate, completing the circuit. So at 2 seconds, upon the completion of the supertask, the light will be on.

Notice that there are no infinite velocities here, or infinite quantities of energy. Just ordinary classical mechanics applied to a bouncing ball and a light bulb. What about infinite accelerations? Well even that is not strictly speaking necessary; we just imagine that each velocity reversal takes some amount of time, which shrinks to zero as the velocity shrinks to zero in such a way as to keep all accelerations finite and sum to a finite total duration.

All this is just to say that we shouldn’t be too hasty in dismissing the real-world possibility of apparently paradoxical supertasks.

But secondly, and more importantly, physical possibility is not the appropriate barometer of whether we should take a thought experiment seriously. Don’t be the person that argues that the fat man wouldn’t be sufficient to stop a trolley’s momentum. When we find that some intuitive conceptual assumptions lead us into trouble, the takeaway is that we need to closely examine and potentially revise our concepts!

Think about Russell’s paradox, which showed that some of our most central intuitions about the concept of a set lead us to contradiction. Whether or not the sets that Bertie was discussing can be pointed to in the physical world is completely immaterial to the argument. Thinking otherwise would have slowed down progress in axiomatic set theory immensely!

These thought experiments are a problem if you believe that it is logically possible for there to be a physical universe in which these setups are instantiated. That’s apparently all that’s required to get a paradox, not that the universe we live in happens to be that one.

So it appears that we have to conclude some limited step in the direction of finitism, in which we rule out a priori the possibility of a universe that allows these types of supertasks. I’m quite uncomfortable with this conclusion, for what it’s worth, but I don’t currently see a better option.

## The Puzzle

You have in front of you an empty box. You also have on hand an infinite source of billiard balls, numbered 0, 1, 2, 3, 4, and so on forever.

At time zero, you place balls 0 and 1 in the box.

In thirty minutes, you remove ball 0 from the box, and place in two new balls (2 and 3).

Fifteen minutes after that, you remove ball 1 from the box, and place in two new balls (4 and 5).

7.5 minutes after that, you remove ball 2 and place in balls 6 and 7.

And so on.

After an hour, you will have taken an infinite number of steps. How many billiard balls will be in the box?

✯✯✯

At time zero, the box contains two balls (0 and 1). After thirty minutes, it contains three (1, 2, and 3). After 45 minutes, it contains four (2, 3, 4, and 5). You can see where this is going…

Naively taking the limit of this process, we arrive at the conclusion that the box will contain an infinity of balls.

But hold on. Ask yourself the following question: If you think that the box contains an infinity of balls, name one ball that’s in there. Go ahead! Give me a single number such that at the end of this process, the ball with that number is sitting in the box.

The problem is that you cannot do this. Every single ball that is put in at some step is removed at some later step. So for any number you tell me, I can point you to the exact time at which that ball was removed from the box, never to be returned to it!

But if any ball that you can name can be proven to not be in the box.. and every ball you put in there was named… then there must be zero balls in the box at the end!

In other words, as time passes and you get closer and closer to the one-hour mark, the number of balls in the box appears to be growing, more and more quickly each moment, until you hit the one-hour mark. At that exact moment, the box suddenly becomes completely empty. Spooky, right??

Let’s make it weirder.

What if at each step, you didn’t just put in two new balls, but one MILLION? So you start out at time zero by putting balls 0, 1, 2, 3, and so on up to 1 million into the empty box. After thirty minutes, you take out ball 1, but replace it with the next 1 million numbered balls. And at the 45-minute mark, you take out ball 2 and add the next 1 million.

What’ll happen now?

Well, the exact same argument we gave initially applies here! Any ball that is put in the box at any point, is also removed at a later point. So you literally cannot name any ball that will still be in the box after the hour is up, because there are no balls left in the box! The magic of infinity doesn’t care about how many more balls you’ve put in than removed at any given time, it still delivers you an empty box at the end!

Now, here’s a final variant. What if, instead of removing the smallest numbered ball each step, you removed the largest numbered ball?

So, for instance, at the beginning you put in balls 0 and 1. Then at thirty minutes you take out ball 1, and put in balls 2 and 3. At 45 minutes, you take out ball 3, and put in balls 4 and 5. And so on, until you hit the one hour mark. Now how many balls are there in the box?

Infinity! Why not zero like before? Well, because now I can name you an infinity of numbers whose billiard balls are still guaranteed to be in the box when the hour’s up. Namely, 0, 2, 4, 6, and all the other even numbered balls are still going to be in there.

Take a moment to reflect on how bizarre this is. We removed the exact same number of balls each step as we did last time. All that changed is the label on the balls we removed! We could even imagine taking off all the labels so that all we have are identical plain billiard balls, and just labeling them purely in our minds. Now apparently the choice of whether to mentally label the balls in increasing or decreasing order will determine whether at the end of the hour the box is empty or packed infinitely full. What?!? It’s stuff like this that makes me sympathize with ultrafinitists.

One final twist: what happens if the ball that we remove each step is determined randomly? Then how many balls will there be once the hour is up? I’ll leave it to you all to puzzle over!

# Can an irrational number raised to an irrational power be rational?

There’s a wonderful proof that yes, indeed, this is possible, and it goes as follows:

Let’s consider the number $\sqrt 2 ^ {\sqrt 2}$. This number is either rational or irrational. Let’s examine each case.

## Case 1: $\sqrt 2 ^ {\sqrt 2}$$\sqrt 2 ^ {\sqrt 2}$ is rational.

Recall that $\sqrt 2$ is irrational. So if $\sqrt 2 ^ {\sqrt 2}$ is rational, then we have proven that it’s possible to raise an irrational number to an irrational power and get a rational value. Done!

## Case 2: $\sqrt 2 ^ {\sqrt 2}$$\sqrt 2 ^ {\sqrt 2}$ is irrational.

In this case, $\sqrt 2 ^ {\sqrt 2}$ and $\sqrt 2$ are both irrational numbers. So what if we raise the $\sqrt 2 ^ {\sqrt 2}$ to the power of $\sqrt 2$?

$\left( \sqrt 2 ^ {\sqrt 2} \right) ^{\sqrt 2} = \sqrt 2 ^ {\sqrt 2 \cdot \sqrt 2} = \sqrt 2 ^ 2 = 2$

So in this case, we have again found a pair of irrational numbers such that one raised to the power of the other is a rational number! Proof complete!

✯✯✯

One thing that’s fun about this proof is that the result is pretty surprising. I would not have guessed a priori that you could get a rational by raising one irrational to another; it just seems like irrationality is the type of thing that would be closed under ordinary arithmetic operations.

But an even cooler thing is that it’s a non-constructive proof. By the end of the proof, we know for sure that there is a pair of irrational numbers such that one raised to the other gives us a rational number, but we have no idea whether it’s ($\sqrt 2$, $\sqrt 2$) or ($\sqrt 2 ^ {\sqrt 2}$,$\sqrt 2$).

(It turns out that it’s the second. The Gelfond–Schneider theorem tells us that for any two non-zero algebraic numbers a and b with a ≠ 1 and b irrational, the number ab is irrational. So $\sqrt 2 ^ {\sqrt 2}$ is in fact irrational.)

Now, most mathematicians are totally fine with non-constructive proofs, as long as they follow all the usual rules of proofs. But there is a branch of mathematics known as constructive mathematics that only accepts constructive proofs of existence. Within constructive mathematics, this proof is not valid!

Now, it so happens that you can prove the irrationality of $\sqrt 2 ^ {\sqrt 2}$ by purely constructive means, but that’s besides the point. To my eyes, the refusal to accept such an elegant and simple proof because it asserts a number’s existence without telling us exactly what it is just looks a little silly!

Along similar lines, here’s one more fun problem.

## Are $\pi + e$$\pi + e$ and $\pi - e$$\pi - e$ transcendental?

We know that $\pi$ and $e$ are both transcendental numbers (i.e. they cannot be expressed as the roots of any polynomial with rational coefficients). But are $\pi + e$ and $\pi - e$ both transcendental?

It turns out that this amazingly simple sounding problem is unsolved to this day! But one thing that we do know is that it can’t be that neither of them are transcendental. Because if this was the case, then their sum $(\pi + e) + (\pi - e) = 2 \pi$ would also not be transcendental, which we know is false! So we know that at least one of them has to be true, using a proof that doesn’t guarantee the truth of either of them! Cool, right?