Building the diffusion operator

Part of the quantum computing series
Part 1: Quantum Computing in a Nutshell
Part 2: More on quantum gates
Part 3: Deutch-Josza Algorithm
Part 4: Grover’s algorithm
Part 5: Building the quantum oracle

Let’s more precisely define the Grover diffusion operator D we used for Grover’s algorithm, and see why it functions to flip amplitudes over the average amplitude.

First off, here’s a useful bit of shorthand we’ll use throughout the post. We define the uniform superposition over states as |s⟩:

Screen Shot 2018-07-17 at 1.32.40 PM

We previously wrote that flipping an amplitude ax over the average of all amplitudes ā involved the transformation ax → 2ā – ax. This can be understood by a simple geometric argument:


Now, the primary challenge is to figure out how to build a quantum gate that returns the average amplitude of a state. In other words, we want to find an operator A such that acting on a state ⟩ gives:

Screen Shot 2018-07-17 at 2.25.12 PM

If we can find this operator, then we can just define D as follows:

Screen Shot 2018-07-17 at 1.47.26 PM

It turns out that we can define A solely in terms of the uniform superposition.

Screen Shot 2018-07-17 at 2.00.08 PM

As a matrix, A would look like:

Screen Shot 2018-07-17 at 3.03.23 PM.png

Proof that this satisfies the definition:

Screen Shot 2018-07-17 at 2.41.44 PM.pngScreen Shot 2018-07-17 at 2.38.38 PM

Thus we have our full definition of D!

Screen Shot 2018-07-17 at 2.44.58 PM.png

As a matrix, D looks like:

Screen Shot 2018-07-17 at 3.05.26 PM

Building the quantum oracle

Part 1: Quantum Computing in a Nutshell
Part 2: More on quantum gates
Part 3: Deutch-Josza Algorithm
Part 4: Grover’s algorithm

The quantum gate Uf was featured centrally in both of the previous algorithms I presented. Remember what it does to a qubit in state |x⟩, where x ∈ {0, 1}N:

Screen Shot 2018-07-16 at 11.48.24 PM

I want to show here that this gate can be constructed from a simpler more intuitive version of a quantum oracle. This will also be good practice for getting a deeper intuition about how quantum gates work.

This will take three steps.

1. Addition Modulo 2

First we need to be able implement addition modulo 2 of two single qubits. This operation is defined as follows:

Screen Shot 2018-07-17 at 11.39.33 AM

An implementation of this operation as a quantum gate needs to return two qubits instead of just one. A simple choice might be:

Screen Shot 2018-07-17 at 11.56.51 AM

Screen Shot 2018-07-17 at 11.54.16 AM


2. Oracle

Next we’ll need a straight-forward implementation of the oracle for our function f as a quantum gate. Remember that f is a function from {0, 1}N → {0, 1}. Quantum gates must have the same number of inputs and outputs, and f takes in N bits and returns only a single bit, so we have to improvise a little. A simple implementation is the following:

Screen Shot 2018-07-17 at 12.12.26 PM

In other words, we start with N qubits encoding the input to x, as well as a “blank” qubit that starts as |0⟩. Then we leave the first N qubits unchanged, and encode the value of f(x) in the initially blank qubit.

3. Flipping signs

Finally, we’ll use a clever trick. Let’s take a second look at the ⊕ gate.


Suppose we start with

Screen Shot 2018-07-17 at 12.36.10 PM

Then we get:


Let’s consider both cases, f(x) = 0 and f(x) = 1.

Screen Shot 2018-07-17 at 12.44.59 PM

Also we can notice that we can get the state |y⟩ by applying a Hadamard gate to a qubit in the state |1⟩. Thus we can draw:


Putting it all together

We combine everything we learned so far in the following way:


Screen Shot 2018-07-17 at 12.56.21 PM.png

If we now ignore the last two qubits, as they were only really of interest to us for the purposes of building U, we get:

Screen Shot 2018-07-17 at 12.57.30 PM

And there we have it! We have built the quantum gate Uf that we used in the last two posts.



Grover’s algorithm

This is part 4 of my series on quantum computing. The earlier posts provide necessary background material.
Part 1: Quantum Computing in a Nutshell
Part 2: More on quantum gates
Part 3: Deutch-Josza Algorithm

Grover’s algorithm

This algorithm involves searching through an unsorted list for a particular item. Let’s first look at how this can be optimally solved classically.

Suppose you have private access to the following list:

  1. apple
  2. banana
  3. grapefruit
  4. kiwi
  5. guava
  6. mango
  7. lemon
  8. papaya

A friend of yours wants to know where on the list they could find guava. They are allowed to ask you questions like “Is the first item on the list guava?” and “Is the nth item on the list gauva?”, for any n they choose. Importantly, they have no information about the ordering of your list.

How many queries do they need to perform in order to answer this question?

Well, since they have no information about its placement, they can do no better than querying random items in the list and checking them off one by one. In the best case, they find it on their first query. And in the worst case, they find it on their last query. Thus, if the list is of length N, the number of queries in the average case is N/2.

Grover’s algorithm solves the same problem with roughly √N queries. This is only a quadratic speedup, but still should seem totally impossible. What this means is that in a list of 1,000,000 items, you can find any item of your choice with only about 1,000 queries.

Let’s see how.

Firstly, we can think about the search problem as a function-analysis problem. We’re not really interested in what the items in the list are, just whether or not the item is guava. So we can transform our list of items into a simple binary function: 0 if the input is not the index of the item ‘guava’ and 1 if the input is the index of the item ‘guava’. Now our list looks like:

Screen Shot 2018-07-17 at 12.40.53 AM

Our challenge is now to find out for which value of x f(x) returns 1.

Now, this algorithm uses three quantum gates. Two of them we’ve already seen: HN and Uf. I’ll remind you what these two do:

Screen Shot 2018-07-16 at 11.48.15 PM
Screen Shot 2018-07-16 at 11.48.24 PM

The third is called the Grover diffusion operator, D. In words, what D does to a state Ψ is reflect it over the average amplitude in Ψ. Visually, this looks like:


Mathematically, this transformation can be defined as follows:

Screen Shot 2018-07-17 at 12.24.52 AM

Check for yourself that 2ā – ax flips the amplitude ax over ā. We are guaranteed that D is a valid quantum gate because it keeps the state normalized (the average amplitude remains the same after the flip).

Now, with HNU, and D in hand, we are ready to present the algorithm:


We start with all N qubits in the state |00…0⟩, and apply the Hadamard gate to put them in a uniform superposition. Then we apply U to flip the amplitude of the desired input, and D to flip the whole superposition over the average. We repeat this step square root of the length of the list times, and then measure the qubits.

Here’s a visual intuition for why this works. Let’s suppose that N=3 (so our list contains 8 items), and the item we’re looking for is the 5th in the list (100 in binary). Here is each step in the algorithm:


You can see that what the combination of U and D does to the state is that it magnifies the amplitude of the desired value of f. As you do this again and again, the amplitude is repeatedly magnified, reaching a maximum after sqrt(2N) repeats.

And that’s Grover’s algorithm! Once again we see that the key to it is the unusual phenomenon of superposition. By putting the state into superposition in our first step, we manage to make each individual query give us information about all possible inputs. And through a little cleverness, we are able to massage the amplitudes of our qubits in order that our desired output becomes overwhelmingly likely.

A difference between Grover’s algorithm and the Deutsch-Josza algorithm is that Grover’s algorithm is non-deterministic. It isn’t guaranteed with 100% probability of giving the right answer. But each run of the algorithm only delivers an incorrect answer with probability 1/2N, which is an exponentially small error. Repeating the algorithm n times gives you the wrong answer with only (1/2N)2 probability. And so on.

While it is a more technically complicated algorithm than the Deutsch-Josza algorithm, I find Grover’s algorithm easier to intuitively understand. Let me know in the comments if anything remains unclear!

Deutch-Josza Algorithm

Part 3 of the series on quantum computing.
Part 1: Quantum Computing in a Nutshell
Part 2: More on quantum gates

Okay, now that we’re familiar with qubits and some basic quantum gates, we’re ready to jump into quantum algorithms. First will be the Deutch-Josza algorithm, famous for its exponential speedup.

The problem to solve: You are handed a function f from {0,1}N to {0,1}, and guaranteed that it is either balanced or constant. A balanced function outputs an equal number of 0s and 1s across all inputs, and a constant function outputs all 0s or all 1s. For instance, if N = 3:

Screen Shot 2018-07-16 at 8.23.43 PM

f1 is balanced, and f2 is constant.

You are given an oracle that allows you to query f on an input of your choice. Your goal is to find out if f is balanced or constant with as few queries to the oracle as possible.

Let’s first figure out the number of queries classically required to solve this problem.

If f is balanced, then the minimum amount of queries required to determine that it is not constant is two – a 0 and a 1. And the worst case is that f is constant. In this case, we can only be sure it is not balanced by searching through half of all possible inputs, plus one. (For instance, we might get the series of outputs 0, 0, 0, 0, 0.)

So, since the number of possible inputs is 2N, the number of queries is between 2 and 2N-1 + 1. The average case thus involves roughly 2N-2 queries.

How about if we try to go about solving this problem with a quantum computers? How many queries to the oracle need we perform in order to determine if the function is balanced or constant?

Just one!

It turns out that there exists an algorithm that uses the oracle a single time, and always determines with 100% confidence whether f is balanced or constant.

Let’s prove it.

The algorithm uses only two different gates: the Hadamard gate H, and the quantum version of the oracle gate.

We’ll call the oracle gate Uf. Acting on the qubit state |x⟩, it returns (-1)f(x)|x⟩. That is, it flips the sign of the state if and only if evaluating f on the input x outputs 1. Otherwise it leaves the state of the qubits unchanged. So we have:

Screen Shot 2018-07-16 at 9.44.14 PM

In addition, we will use the general N-qubit Hadamard gate HN. We discussed how to get the matrix for the 2-qubit Hadamard gate last time. The general form of the Hadamard gate is:

Screen Shot 2018-07-16 at 9.59.39 PM.png

(You can prove this by showing it’s true for N=1, and then showing that if it’s true for N, then it’s true for N+1.)

Without further ado, here is the Deutch-Josza algorithm:


In words, you start with all N qubits in the state |00…0⟩. Then you apply HN to all of them. Then you apply Uf, and finally HN again.

Upon measurement, if you find the qubits in the state |00…0⟩, then f is a constant function. If you find them in any other state, then f is guaranteed to be a balanced function.

Why is this so? Let’s prove it by tracking the states through each step.

At first, the state is |00…0⟩. After applying the Hadamard gate, we get:

Screen Shot 2018-07-16 at 10.36.43 PM

In words, the qubit enters a uniform superposition over all possible states.

Now, applying Uf to the new state, we get:

Screen Shot 2018-07-16 at 10.34.49 PM

And finally, applying the Hadamard gate to this state, we get:

Screen Shot 2018-07-16 at 10.35.18 PM

For the purposes of this algorithm, we’re really only interested in the amplitude of the |00…0⟩ state. This amplitude is simply:

Screen Shot 2018-07-16 at 10.46.04 PM

What we want to show is that the square of this amplitude is 0 if f is balanced and 1 if f is constant. (I.e. the probability of observing |00…0⟩ is 0% for a balanced function and 100% for a constant function). Let’s look at both cases in turn.

Case 1: f is constant

If f is constant, than all the terms in the sum are the same: either 1/2N or -1/2N. Since there are 2N values for y to be summed over, the amplitude is either +1 or -1. And since the probability is the amplitude squared, this probability is guaranteed to be 100%!

Case 2: f is balanced

If f is balanced, then there are an equal number of 1/2N terms and -1/2N terms. I.e. every 1/2N term cancels out a -1/2N term. Thus, the amplitude is zero, and so is the probability!

In conclusion, we’ve now seen how a quantum computer can take a problem that can only be classically solved with an average of 2N-2 queries, and solves it with exactly 1 query. I.e. quantum algorithms can provide exponential speedups over classical algorithms!

Looking carefully at the inner workings of this algorithm, you can get a clue as to how it achieves this speedup. The key is in the ability to superpose a single qubit into multiple observable states. This allowed us to apply our oracle operator to all possible inputs at once, in exactly one step.

Somebody that believed in the many-worlds interpretation of quantum mechanics would say that when we put our qubit into superposition, we were creating 2N worlds, each of which contained a different possible input value. Then we leveraged this exponential number of worlds for our computational benefit, applying the operation in all the worlds at once before collapsing the superposition back to one single world with the final Hadamard gate.

It’s worth making a historical note here. David Deutsch was one of the two people that discovered this algorithm, which was the first example where a quantum computer would be able to surpass the ordinary limits on classical computers. He is also a hardcore proponent of the many-worlds interpretation, and has stated that the fact that quantum computers can do more than classical computers is strong evidence for this interpretation. His argument is that the only possible explanation for quantum computers surpassing the classical limits is that some of the computation is being offloaded into other universes.

Anyway, next we’ll look at a more famous algorithm, Grover’s algorithm. This one only provides a quadratic (not exponential) speed up. However, it solves a more useful problem: how to search through an unsorted list to find a particular item. See you next time!

More on quantum gates

Context for this post.

In the last post, we described what qubits are and how quantum computing involves the manipulation of these qubits to perform useful calculations.

In this post, we’ll abstract away from the details of the physics of qubits and just call the two observable states |0⟩ and |1⟩, rather than |ON⟩ and |OFF⟩. This will be useful for ultimately describing quantum algorithms. But before we get there, we need to take a few more steps into the details of quantum gates.

Recap: the general description of a qubit is |Ψ⟩ = α|0⟩ + β|1⟩, where α and β are called amplitudes, and |α|² and |β|² are the probabilities of observing the system in the state |0⟩ and |1⟩, respectively.

We can also express the states of qubits as vectors, like so:

Screen Shot 2018-07-15 at 12.03.18 AM.png

Quantum gates are transformations from quantum states to other quantum states. We can express these transformations as matrices, which when applied to state vectors yield new state vectors. Here’s a simple example of a quantum gate called the X gate:

Screen Shot 2018-07-14 at 11.50.10 PM

Applied to the states |0⟩ and |1⟩, this gate yields

Screen Shot 2018-07-15 at 12.05.21 AM.png

Applied to any general state, this gate yields:

Screen Shot 2018-07-15 at 12.05.12 AM.png

Another gate that is used all the time is the Hadamard gate, or H gate.

Screen Shot 2018-07-15 at 12.08.14 AM

Let’s see what it does to the |0⟩ and |1⟩ states:

Screen Shot 2018-07-15 at 12.12.08 AM.png

In words, H puts ordinary states into superposition. Superposition is the key to quantum computing. Without it, all we have is a fancier way of talking about classical computing. So it should make sense that H is a very useful gate.

One more note on H: When you apply it to a state twice, you get back the state you started with. A simple proof of this comes by just multiplying the H matrix by itself:

Screen Shot 2018-07-15 at 12.48.13 AM.png

Okay, enough with single qubits. While they’re pretty cool as far as they go, any non-trivial quantum algorithm is going to involve multiple qubits.

It turns out that everything we’ve said so far generalizes quite nicely. If we have two qubits, we describe the combined system by smushing them together with what’s called a tensor product (denoted ⊗). What this ends up looking like is the following:

Screen Shot 2018-07-15 at 12.53.42 AM.png

The first number refers to the state of the first qubit, and the second refers to the state of the second.

Let’s smush together two arbitrary qubits:

Screen Shot 2018-07-15 at 1.00.08 AM

This is pretty much exactly what we should have expected combining qubit states would look like.

The amplitude for the combined state to be |00⟩ is just the product of the amplitude for the first qubit to be |0⟩ and the second to be |0⟩. The amplitude for the combined state to be |01⟩ is just the product of the amplitude for the first qubit to be |0⟩ and second to be |1⟩. And so on.

We can write a general two qubit state as a vector with four components.

Screen Shot 2018-07-15 at 1.05.34 AM

And as you might expect by now, two-qubit gates are simply 4 by 4 matrices that act on such vectors to produce new vectors. For instance, we can calculate the 4×4 matrix corresponding to the action of a Hadamard gate on both qubits:

Screen Shot 2018-07-15 at 1.21.28 AM.png

Why the two-qubit Hadamard gate has this exact form is a little beyond the scope of this post. Suffice it to say that this is the 4×4 matrix that successfully transforms two qubits as if they had each been put through a single-qubit Hadamard gate. (You can verify this for yourself by simply applying H to each qubit individually and then smushing them together in the way we described above.)

Here’s what the two-qubit Hadamard gate does to the four basic two-qubit states.

Screen Shot 2018-07-15 at 1.25.51 AM.png

Here’s a visual representation of this transformation using bar graphs:


We can easily extend this further to three, four, or more qubits. The state vector describing a N-qubit system must consider the amplitude for all possible combinations of 0s and 1s for each qubit. There are 2ᴺ such combinations (starting at 00…0 and ending at 11…1). So the vector describing an N-qubit system is composed of 2ᴺ complex numbers.

If you’ve followed everything so far, then we are now ready to move on to some actual quantum algorithms! In the next post, we’ll see first how qubits can be used to solve problems that classical bits cannot, and then why quantum computers have this enhanced problem-solving ability.

Quantum Computing in a Nutshell

You can think about classical bits as like little switches. They have two possible states, ON and OFF, and are in either one or the other.


Then we have logic gates, which are like operations we perform on the switches. If we have one switch, we might flip it from whatever state it’s in to the opposite state. If we have two, we might flip the second switch conditional on the first switch being ON, and otherwise do nothing. And so on.


Put together a bunch of these bits and gates and you get an algorithm. If you arrange them cleverly enough, you end up getting interesting computations like adding binary numbers and factoring primes and playing Skyrim.

Quantum computing is a fundamentally different beast. To start with, the fundamental units of quantum computation are non-deterministic. We call them qubits, for quantum bits. While a qubit can, like a classical bit, be in the state ON and the state OFF, it can also be in other more exotic states:


Important! Just like a classical bit, a qubit will only ever be observed in two possible states, ON and OFF. We never observe a qubit being in the state “36% ON and 64% OFF.” But if we prepare a qubit in that particular state, and then measure it over and over again, we could find that 36% of the time it is ON and 64% of the time it is OFF. That is, the state of a qubit specifies the probability distribution over possible observed outcomes.

Let’s compare the complexity of bits to the complexity of qubits.

To specify the state of a classical bit, we only need to answer a single yes or no question: Is it ON?

To specify the state of a quantum bit, we need to answer a more complicated question: What is the probability that it is ON?

Since probabilities are real numbers, answering the second question requires infinitely more information than answering the first. (In general, specifying a real number to perfect precision requires answering an unbounded number of yes or no questions.) The implication of this is that in some sense, the state of a qubit contains infinitely more information than the state of a classical bit. The trick of quantum computing is to exploit this enhanced information capacity in order to build quantum algorithms that beat out the best classical algorithms.

Moving on! The set of all states that a qubit can be in is the set of all probability distributions over the outcomes ON and OFF.


Now, we could describe a simple probabilistic extension to classical computation, where logic gates transform probability distributions into different probability distributions, and algorithms neatly combine these gates to get useful computations, and be done with it. But it turns out that things are a slight bit more complicated than this. Having described this probabilistic extension, we would still not have quantum computing.

In fact, the states of qubits are specified not by probability distributions over the observations ON and OFF, but an amplitude distribution over these observations.

What are these amplitudes? An amplitude is simply a thing that you square to get the probability of the observation. Amplitudes have a wider range of possible values than probabilities. While a probability has to be greater than zero, an amplitude can be negative (since the square of a negative number will still be positive). In fact, amplitudes can even be complex numbers, in which case the square of the amplitude is just the square of its distance from zero in the complex plane.


Now, why does it matter that the set of quantum states be described by amplitudes rather than probabilities? After all, the amplitudes will always just be converted back to probabilities when we observe the qubits, so what difference does it make if the probability came from a negative amplitude or a positive one? The answer to this question is difficult, but it comes down to this:

For some reason, it appears that on the quantum level, the universe calculates the states of particles in terms of these complex numbers we call amplitudes, not probabilities. This was the discovery of experimentalists in the 1900s that looked at things like the double slit experiment, the Stern-Gerlach experiment, and so on. If you try to just analyze everything in terms of probabilities instead of amplitudes, you will get the wrong answer.

We’ll write the state of a qubit that has an amplitude alpha of being ON and an amplitude beta of being OFF as follows:


It’s useful to have a few different ways of visualizing the state of a qubit.


Now, quantum gates are transformations that take amplitude distributions to different amplitude distributions.


The set of physically realizable quantum gates is the set of all transformations that take normalized states (|α|² + |β|² = 1) to other normalized states. Some of these gates are given special names like the Hadamard gate or the CNOT gate and crop up all the time. And just like with classical states, you can put together these gates in clever ways to construct algorithms that do interesting computations for you.

So, that’s quantum computing!

Now, the first thing to notice is that the set of all things that quantum computers can do must contain the set of all things that classical computers can do. Why? Because classical computers are just a special case of quantum computers where states are only allowed to be either ON or OFF. Every classical logic gate has a parallel quantum logic gate, and so every classical algorithm is a quantum algorithm is disguise.

But quantum computers can also do more than classical computers. In the next posts, I’ll give two examples of quantum algorithms we’ve discovered that solve problems at speeds that are classically impossible: the Deutsch-Jozsa algorithm and Grover’s algorithm. Stay tuned!

Matter and interactions in quantum mechanics

Here’s another post on how quantum mechanics is weird.

Let’s consider the operation of swapping around different quantum particles. If the coordinates of the first particle are denoted x1 and the second denoted x2, we’ll be interested in the transformation:

Ψ(x1, x2) → Ψ(x2, x1)

Let’s just give this transformation a name. We’ll call it S, for “swap”. S is an operator that is defined as follows:

S Ψ(x1, x2) = Ψ(x2, x1)

Now, clearly if you swap the particles’ coordinates two times, you get back the same thing you started with. That is:

S2 Ψ(x1, x2) = Ψ(x1, x2)

This implies that S2 is an operator with eigenvalue +1. What does this tell us about the spectrum of eigenvalues of S?

Suppose S has an eigenvalue λ. Then, from the above, we see:

S2 Ψ = Ψ
S SΨ = Ψ
S λΨ = Ψ
λ SΨ = Ψ
λ2 Ψ = Ψ

So λ = ±1.

In other words, the only possible eigenvalues of S are +1 and -1.

This tells us that the wave functions of particles must obey one of the following two equations:

SΨ = +Ψ
SΨ = -Ψ

Said another way…

Ψ(x1, x2) = Ψ(x2, x1)
Ψ(x1, x2) = -Ψ(x2, x1)

This is pretty important! It says that by the nature of what it means to swap particles around and the structure of quantum mechanics, it must be the case that the wave function obtained by switching around coordinates is either the same as the original wave function, or its negative.

This is a huge constraint on the space of functions we’re considering; most functions don’t have this type of symmetry (e.g. x2 + y ≠ y2 + x).

So we have two possible types of wave functions: those that flip signs upon coordinate swaps, and those that don’t. It turns out that these two types of wave functions describe fundamentally different kinds of particles. The first type of wave functions describes fermions, and the second type describes bosons.

All particles in the standard model of particle physics are either fermions or bosons (as expected by the above argument, since there are only two possible eigenvalues of S). Fermions include electrons, quarks, and neutrinos. Bosons include photons, gravitons, and the Higgs boson. This sampling of particles from each category should give you the intuitive sense that fermions are “matter-like” and bosons “force-like.”

Of course, the definition of fermions and bosons we gave above doesn’t reference matter-like or force-like behavior. Somehow the qualitative difference between fermions and bosons must arise from this simple sign difference. How?

We’ll start exploring this by examining the concept of non-separability of wave functions.

A wave function is separable if it is possible to write it as a product of two individual wave functions, one for each coordinate:

Ψ(x1, x2) = Ψ1(x1) Ψ2(x2)

This is a really nice property. For one thing, it allows us to solve the Schrodinger equation extremely easily by using the differential equations method of variable separation. For another, it tells us that the wave function is simple in a way. If we’re interested in only one set of coordinates, say x1, then we can easily disregard the whole rest of the wave function and just look at Ψ1.

But what does it mean? Well, if a wave function is separable, then it is possible to sensibly ask questions about the properties of individual particles, independent of each other. Why? Because you can just look at the component of the wave function corresponding to the particle you’re interested in, and the rest behaves just like a constant for all you care.

If a wave function is non-separable (i.e. if there aren’t any two functions Ψ1 and Ψ2 for which Ψ(x1, x2) = Ψ1(x1) Ψ2(x2)), then the story is trickier. For all intents and purposes, it loses meaning to talk about one particle independent of the other.

This is hard to wrap our classical brains around. If the wave function cannot be separated, then the particles just don’t have positions, momentums, and so on that can be described independently.

Now, it turns out that most of matter is composed of non-separable wave functions. Pretty much any time you have particles interacting, their wave function cannot be written as a product of independent particles. A lot of the time, wave functions are very approximately separable (consider the wave function describing two electrons light years away). But in such cases, when we talk about the two electrons as two distinct entities, we’re really using an approximation that is not fundamentally correct.

Now, this all relates back to the fermion/boson distinction in the following way. Suppose we had a system that was described by a separable wave function.

Ψ(x1, x2) = Ψ1(x1) Ψ2(x2)

Now what happens when we apply the swap operator to Ψ?

S Ψ(x1, x2) = Ψ(x2, x1) = Ψ1(x2) Ψ2(x1)

As we’d expect, we get the particle described by x2 in the wave function initially occupied by the particle described by x1, and vice versa. But now, our system must obey one of two constraints:

Since SΨ = ±Ψ,

Ψ1(x2) Ψ2(x1) = Ψ1(x1) Ψ2(x2)
Ψ1(x2) Ψ2(x1) = – Ψ1(x1) Ψ2(x2)

Let’s take the second possibility for fermions first. It turns out that this constraint can never be satisfied. Why? Look:

Suppose f(x) g(y) = – f(y) g(x)
Then we have:  f(x)/g(x) = -f(y)/g(y)

Since the left is a pure function of x, and the right of y, this implies that they are both equal to a constant.
I.e. f(x)/g(x) = k, for some constant k.
But then f(y)/g(y) = k as well.

Thus k = -k,
which can only be true if k = 0.

This argument tells us that the only function that satisfies (1) Ψ describes a fermion and (2) Ψ is separable, is Ψ(x1, x2) = 0. Of course, this is not a wave function, since it is not normalizable. In other words, fermion wave functions are always non-separable!

What about boson wave functions? The equation

Ψ1(x2) Ψ2(x1) = Ψ1(x1) Ψ2(x2)

does have some solutions, but they are highly constrained. Essentially, for this to be true for all x1 and x2, Ψ1 and Ψ2 must be the same function.

In other words, bosons are separable only in the case that their independent wave functions are completely identical.

So. If fermions cannot be described by a separable wave function, how can we describe them? We can be clever and notice the following:

While Ψ(x1, x2) = Ψ1(x1) Ψ2(x2) does not obey the requirement that SΨ = -Ψ,
Ψ(x1, x2) = Ψ1(x1) Ψ2(x2) – Ψ2(x1) Ψ1(x2) does!

Let’s check and see that this is right:

S Ψ(x1, x2)
= S [Ψ1(x1) Ψ2(x2) – Ψ2(x1) Ψ1(x2)]
= Ψ1(x2) Ψ2(x1) – Ψ2(x2) Ψ1(x1)
= -Ψ(x1, x2).

Aha! It works.

Now we have a general description for fermion wave functions that does not violate the swap constraints. We can do something similar for bosons, giving us:

Ψ(x1, x2) = Ψ1(x1) Ψ2(x2) – Ψ2(x1) Ψ1(x2)

Ψ(x1, x2) = Ψ1(x1) Ψ2(x2) + Ψ2(x1) Ψ1(x2)

(Note: These are not all possible fermion/boson wave functions. They only capture one set of allowable wave functions.)

Now, let’s notice a peculiar feature of the fermion wave function. What happens if we ask for the probability amplitude of the two particles being at the same place at the same time? We find out by taking the fermion equation and setting x1 = x2 = x:

Ψ(x, x) = Ψ1(x) Ψ2(x) – Ψ2(x) Ψ1(x)
= 0

More generally, we can prove that any fermion wave function must have this same property.

SΨ(x1, x2) = -Ψ(x1, x2)
Ψ(x2, x1) = -Ψ(x1, x2)
Ψ(x, x) = -Ψ(x, x)
Ψ(x, x) = 0

Crazy! Apparently, two fermions cannot be at the same position. And since wave functions are smooth, fermions will generally have a small probability of being close together.

This is the case even if the fermions are interacting by a strong attractive force. It’s almost as if there is an intrinsic repulsive force keeping fermions away from each other. But this would be the wrong way to think about it. This feature of the natural world is not discovered by exploring different possible forces and potentials. Instead we got here by reasoning purely abstractly about the nature of swapping particles. We are a priori guaranteed this unusual feature: that if fermions exist, they must have zero probability of being located at the same point in space.

The name for this property is the Pauli exclusion principle. It’s the reason why electrons in atoms spread out in ever-larger orbitals around nuclei instead of all settling down to the nucleus. It is responsible for the entire structure of the periodic table, and without it we couldn’t have chemistry and biology.

(Quick side note: You might have thought of something that seems to throw a wrench in this – namely, that the ground state of atoms can hold two electrons. In general, electron orbitals in atoms are populated by more than one electron. This is possible because the wave function has extra degrees of freedom such as spin, the details of which determine how many electrons can fit in the same spatial orbital. I.e. two fermions can have the same spatial amplitude distribution, so far as they have some other distinct property.)

Mathematically, this arises because of interference effects. As the two fermions get closer and closer to each other, their wave functions interfere with one another more and more, turning their joint probability to zero. Fermions destructively interfere.

And bosons? They constructively interfere! At x1 = x2 = x, for bosons, we have:

Ψ(x, x) = Ψ1(x) Ψ2(x) + Ψ2(x) Ψ1(x)
= 2 Ψ1(x) Ψ2(x)

In other words, while fermions disperse, bosons cluster! Just like how fermions seemed to be pulled away from each other with an imaginary force, bosons seem to be pulled together! (Once again, this is only a poetic description, not to be taken literally. There is no extra force concentrating bosons, as can be evidenced by the straight trajectory of parallel beams of light).

I’m not aware of a name for this principle to complement the Pauli exclusion principle. But it explains phenomena like lasers, where enormous numbers of photons concentrate together to form a powerful beam of light. By contrast, an “electron laser” that concentrates enormous numbers of electrons into a single beam would be enormously difficult to create.

The intrinsic tendency for fermions to repel each other leads them to form complicated structures that spread out through space, and give them the qualitative material resistance to objects passing through them. You just can’t pack them arbitrarily close together – eventually you’ll run out of degrees of freedom and the particles will push each other away.

Bosons, on the other hand, are like ghosts – they can vanish into each others wave functions, congregate together in large numbers and behave as a single entity, and so on.

These differences between fermions and bosons pop straight out of their definitions, and lead us to the qualitative differences between matter and forces!

Deriving the Schrodinger equation

This video contains a really short and sweet derivation of the form of the Schrodinger equation from some fundamental principles. I want to present it here because I like it a lot.

I’m going to assume a lot of background knowledge of quantum mechanics for the purposes of this post, so as to keep it from getting too long. If you want to know more QM, I highly highly recommend Leonard Susskind’s online video lectures.

So! Very brief review of basic QM:

In quantum mechanics, the state of a system is described by a vector in a complex vector space. These vectors are all unit length, and encode all of the observable information about the system. The notation used for a state vector is |Ψ⟩, and is read as “the state vector psi”. By analogy with complex conjugation of numbers, you can also conjugate vectors. These conjugated vectors are written like ⟨φ|. Similarly, any operator A has a conjugate operator A*.

Inner products between vectors are expressed like ⟨φ|Ψ⟩, and represent the “closeness” between these states. If ⟨φ|Ψ⟩ = 0, then the states φ and Ψ are called orthogonal, and are as different as can be. (In particular, there is zero probability of either state being observed as the other.) And if ⟨φ|Ψ⟩ = 1, then the states are indistinguishable, and |Ψ⟩ =|φ⟩.

Now, we’re interested in the dynamics of quantum systems. How do they change in time?

Well, since we’re dealing with vectors, we can very generally suppose that there exists some operator that will take any state vector to the state vector that it evolves into after some amount of time t. Let’s just give this operator a name: U(t). We express the notion of U(t) as a time-evolution operator by writing

U(t)|Ψ(0)⟩ = |Ψ(t)⟩

In other words, take the state Ψ at time 0, apply the operator U(t) to it, and you get back the state Ψ at time t.

Now, what are some basic things we can say about the time-evolution operator?

First: if we evolve forwards in time by a length of time equal to zero, the state will not change. (This is basically definitional.)

I.e. U(0) = I (where I is the identity operator).

Second: Time evolution is always continuous, in that an evolution forwards by an arbitrarily small time period ε will change the state by an amount proportional to ε.

I.e. U(ε) = I + εG (where G is some other operator).

Third: Time evolution preserves orthogonality. If two states are ever orthogonal, then they are always orthogonal. (This is an assumption of conservation of information – the laws of physics don’t cause information to disappear or new information to pop up out of nowhere.)

I.e. ⟨φ(0)|Ψ(0)⟩ = 0  ⇒ ⟨φ(t)|Ψ(t)⟩ = 0

From this we can actually derive a stronger statement, which is that all inner products are conserved over time. (The intuition for this is that if all our orthogonal basis vectors stay orthogonal when we evolve forward in time, then time evolution is something like a rotation, and rotations preserve all inner products.)

I.e. ⟨φ(0)|Ψ(0)⟩ = ⟨φ(t)|Ψ(t)⟩

So our starting point is:

  1. U(t)|Ψ(0)⟩ = |Ψ(t)⟩
  2. U(0) = I
  3. U(ε) = I + εG
  4. ⟨φ(0)|Ψ(0)⟩ = ⟨φ(t)|Ψ(t)⟩

From (1) and (4), we get

U(t)|Ψ(0)⟩ = |Ψ(t)⟩
U(t)|φ(0)⟩ = |φ(t)⟩
⟨φ(t)|Ψ(t)⟩ = ⟨φ(0)|U*(t) U(t)|Ψ(0)⟩
= ⟨φ(0)|Ψ(0)⟩
and therefore…
U*U = I

Operators that satisfy the identity on the final line this are called unitary – they are analogous to complex numbers of unit length.

Let’s use this identity together with (3):

U*(ε) U(ε) = I
(I + εG)(I + εG*) = I
I + ε(G + G*) + ε² G*G = I
ε(G + G*) + ε² G*G = 0
G + G* ≈ 0

In the last line, I’ve used the assumption that ε is arbitrarily small, so that we can throw out factors of ε².

Now, what does this final line tell us? Well, it says that the operator G (which dictates the change in state over an infinitesimal time) is purely imaginary. By analogy, any purely imaginary number y = ix satisfies the identity:

y + y* =  ix + (ix)* = ix – ix = 0

So if G is purely imaginary, it is convenient to consider a new purely real operator H = iG. This operator is Hermitian by construction – it is equal to its complex conjugate. Substituting this operator into our infinitesimal time evolution equation, we get

U(ε) = I – iεH

Now, let’s consider the derivative of a quantum state.

d|Ψ⟩/dt = (|Ψ(t + ε)⟩ – |Ψ(t)⟩) / ε
= (U(ε) – I)|Ψ(t)⟩ / ε
= -iεH|Ψ(t)⟩ / ε
= -iH|Ψ(t)⟩

Thus we get…

d/dt|Ψ⟩ = -iH|Ψ⟩

This is the time-dependent Schrodinger equation, although we haven’t yet specified what this operator H is supposed to be. However, since we know H is Hermitian, we also know that H corresponds to some observable quantity.

It turns out that if we multiply this operator by Planck’s constant ħ, it becomes the Hamiltonian – the operator that corresponds to the observable energy. We’ll just change notation subtly by taking H to be the Hamiltonian – that is, what we would previously have called ħH. Then we get the more familiar form of the time-dependent Schrodinger equation:

iħ d/dt|Ψ⟩ = H|Ψ⟩


Wave function entropy

Entropy is a feature of probability distributions, and can be taken to be a quantification of uncertainty.

Standard quantum mechanics takes its fundamental object to be the wave function – an amplitude distribution. And from an amplitude distribution Ψ you can obtain a probability distribution Ψ*Ψ.

So it is very natural to think about the entropy of a given quantum state. For some reason, it looks like this concept of wave function entropy is not used much in physics. The quantum-mechanical version of entropy that is typically referred to is the Von-Neumann entropy, which involves uncertainty over which quantum state a system is in (rather than uncertainty intrinsic to a quantum state).

I’ve been looking into some of the implications of the concept of wave function entropy, and found a few interesting things.

Firstly, let’s just go over what precisely wave function entropy is.

Quantum mechanics is primarily concerned with calculating the wave function Ψ(x), which distributes complex amplitudes over phase space. The physical meaning of these amplitudes is interpreted by taking their absolute square Ψ*Ψ, which is a probability distribution.

Thus, the entropy of the wave function is given by:

S = – ∫ Ψ*Ψ ln(Ψ*Ψ) dx

As an example, I’ll write out some of the wave functions for the basic hydrogen atom:

*Ψ)1s = e-2r / π
*Ψ)2s = (2 – r)2 e-r / 32π

*Ψ)2p = r2 e-r cos(θ) / 32π
*Ψ)3s = (2r2 – 18r + 27)2 e-⅔r / 19683π

With these wave functions in hand, we can go ahead and calculate the entropies! Some of the integrals are intractable, so using numerical integration, we get:

S1s ≈ 70
S2s ≈ 470
S2p ≈ 326
S3s ≈ 1320

The increasing values for (1s, 2s, 3s) make sense – higher energy wave functions are more dispersed, meaning that there is greater uncertainty in the electron’s spatial distribution.

Let’s go into something a bit more theoretically interesting.

We’ll be interested in a generalization of entropy – relative entropy. This will quantify, rather than pure uncertainty, changes in uncertainty from a prior probability distribution ρ to our new distribution Ψ*Ψ. This will be the quantity we’ll denote S from now on.

S = – ∫ Ψ*Ψ ln(Ψ*Ψ/ρ) dx

Now, suppose we’re interested in calculating the wave functions Ψ that are local maxima of entropy. This means we want to find the Ψ for which δS = 0. Of course, we also want to ensure that a few basic constraints are satisfied. Namely,

∫ Ψ*Ψ dx = 1
∫ Ψ*HΨ = E

These constraints are chosen by analogy with the constraints in ordinary statistical mechanics – normalization and average energy. H is the Hamiltonian operator, which corresponds to the energy observable.

We can find the critical points of entropy that satisfy the constraint by using the method of Lagrange multipliers. Our two Lagrange multipliers will be α (for normalization) and β (for energy). This gives us the following equation for Ψ:

Ψ ln(Ψ*Ψ/ρ) + (α + 1)Ψ + βHΨ = 0

We can rewrite this as an operator equation, which gives us

ln(Ψ*Ψ/ρ) + (α + 1) + βH = 0
Ψ*Ψ = ρ/Z e-βH

Here we’ve renamed our constants so that Z =  eα+1 is a normalization constant.

So we’ve solved the wave function equation… but what does this tell us? If you’re familiar with some basic quantum mechanics, our expression should look somewhat familiar to you. Let’s backtrack a few steps to see where this familiarity leads us.

Ψ ln(Ψ*Ψ/ρ) + (α + 1)Ψ + βHΨ = 0
HΨ + 1/β ln(Ψ*Ψ/ρ) Ψ = – (α + 1)/β Ψ

Let’s rename – (α + 1)/β to a new constant λ. And we’ll take a hint from statistical mechanics and call 1/β the temperature T of the state. Now our equation looks like

HΨ + T ln(Ψ*Ψ/ρ) Ψ = λΨ

This equation is almost the Schrodinger equation. In particular, the Schrodinger equation pops out as the zero-temperature limit of this equation:

As T → 0,
our equation becomes…
HΨ = λΨ

The obvious interpretation of the constant λ in the zero temperature limit is E, the energy of the state. 

What about in the infinite-temperature limit?

As T → ∞,
our equation becomes…
Ψ*Ψ = ρ

Why is this? Because the only solution to the equation in this limit is for ln(Ψ*Ψ/ρ) → 0, or in other words Ψ*Ψ/ρ → 1

And what this means is that in the infinite temperature limit, the critical entropy wave function is just that which gives the prior distribution.

We can interpret this result as a generalization of the Schrodinger equation. Rather than a linear equation, we now have an additional logarithmic nonlinearity. I’d be interested to see how the general solutions to this equation differ from the standard equations, but that’s for another post.

HΨ + T ln(Ψ*Ψ/ρ) Ψ = λΨ

Solution: How change arises in QM

Previously I pointed out that if you drew out the wave function of the entire universe by separating out its different energy components and shading each according to its amplitude, you would find that the universe appears completely static.

Energy superposition

This is correct according to standard quantum mechanics. If you looked at how much amplitude the universe had in any particular energy level, you would find that this amplitude was not changing in size.

The only change you would observe would be in the direction, or phase, of the amplitude in the complex plane. And directions of amplitudes in the complex plane are unphysical. Right?

No! While there is an important sense in which the direction of an amplitude is unphysical (the universe ultimately only computes magnitudes of amplitudes), there is a much much more important sense in which the direction of an amplitude contains loads of physical information.

This is because when the universe is in a superposition of different energy states, the amplitudes of these states can interfere.

It is here that we can find the answer to the question I posed in the previous post. Physical changes come from interference between the amplitudes of all the energy states that the universe is in superposition over.

One consequence of all of this is that if the universe did happen to be in a pure energy state, and not in a superposition of multiple energy levels, then change would be impossible.

From which we can conclude: The universe is in a superposition of energy levels, not in any clearly defined single energy level! (Proof: Look around and notice that stuff is happening)

This doesn’t mean, by the way, that the universe is actually in one of the energy levels and we just don’t know which. It also doesn’t mean that the universe is in some other distinct state found by averaging over all of the different energy states. “Superposition” is one of these funny words in quantum mechanics that doesn’t have an analogue in natural language. The best we can say is that the universe really truly is in all of the states in the superposition at once, and the degree to which it is in any particular state is the amplitude of that state.


Let’s imagine a simple toy universe with one dimension of space and one of time.

This universe is initially in an equal superposition of two pure energy states Φ0(x) and Φ1(x), each of which is a real function (no imaginary components). The first has zero energy, and we choose our units so that the second has an energy level equal to exactly 1.

So the wave function of our universe at time zero can be written Ψ = Φ0 + Φ1. (I’m ignoring normalization factors because they aren’t really crucial to the point here)

And from this we can conclude that our probability density is:

P(x) = Ψ*·Ψ = Φ02 + Φ12 + 2·Φ0·Φ1

Now we advance forward in time. Applying the Schrodinger equation, we find:

Φ0(x, t) = Φ0(x)
Φ1(x, t) = Φ1(x) · e-it

Notice that both of these energy states have a time-independent magnitude. The first one is obvious – it’s just completely static. The second one you can visualize as a function spinning in the complex plane, going from purely real and positive to purely imaginary to purely real and negative, et cetera. The magnitude of the function is just what you’d get by spinning it back to its positive real value.

From our two energy functions, we can find the total wave function of the universe:

Ψ(x, t) = Φ0(x) + Φ1(x) · e-it

Already we can see that our time-dependent wave function is not a simple product of our time-independent wave function and a phase.

We can see the consequences of this by calculating the time-dependent probability density:

P(x, t) = Φ0(x)2 + Φ1(x)2 + Φ0(x) · Φ1(x) · (e-it + eit)


P(x, t) = |Φ0|2 + |Φ1|2 + 2 · Φ0(x) · Φ1(x) · cos(t)

And in our final result, we can see a clear time dependence of the spatial probability distribution over the universe. The last term will grow and shrink, oscillating over time and giving rise to dynamics.


We can visualize what’s going on here by looking at the time evolution of each pure energy state as if it’s spinning in the complex plane. For instance, if the universe was in a superposition of the lowest four energy levels we would see something like:


The length of the arrow represents the amplitude of that energy level – “how much” the universe is in that energy state. The arrows are spinning in the complex plane with a speed proportional to the energy level they represent.

The wave function of the universe is represented by the sum of all of these arrows, as if you stacked each on the head of the previous. And this sum is changing!

For instance, in the universe’s first moment, the superposition looks like this:

4-Rotating T=0

And later the universe looks like this:

4-Rotating T=1

If we plotted out the first two energy states scaled by their amplitudes, we might see the following spatial distributions, initially and finally:

Even though there have been no changes in the magnitudes of the arrows (the degree to which the universe exists in each energy level) we get a very different looking universe.

This is the basic idea that explains all change in the universe, from the rising and falling of civilizations to the births and deaths of black holes: they are results of the complex patterns of interference produced by spinning amplitudes.