Deriving the Schrodinger equation

This video contains a really short and sweet derivation of the form of the Schrodinger equation from some fundamental principles. I want to present it here because I like it a lot.

I’m going to assume a lot of background knowledge of quantum mechanics for the purposes of this post, so as to keep it from getting too long. If you want to know more QM, I highly highly recommend Leonard Susskind’s online video lectures.

So! Very brief review of basic QM:

In quantum mechanics, the state of a system is described by a vector in a complex vector space. These vectors are all unit length, and encode all of the observable information about the system. The notation used for a state vector is |Ψ⟩, and is read as “the state vector psi”. By analogy with complex conjugation of numbers, you can also conjugate vectors. These conjugated vectors are written like ⟨φ|. Similarly, any operator A has a conjugate operator A*.

Inner products between vectors are expressed like ⟨φ|Ψ⟩, and represent the “closeness” between these states. If ⟨φ|Ψ⟩ = 0, then the states φ and Ψ are called orthogonal, and are as different as can be. (In particular, there is zero probability of either state being observed as the other.) And if ⟨φ|Ψ⟩ = 1, then the states are indistinguishable, and |Ψ⟩ =|φ⟩.

Now, we’re interested in the dynamics of quantum systems. How do they change in time?

Well, since we’re dealing with vectors, we can very generally suppose that there exists some operator that will take any state vector to the state vector that it evolves into after some amount of time t. Let’s just give this operator a name: U(t). We express the notion of U(t) as a time-evolution operator by writing

U(t)|Ψ(0)⟩ = |Ψ(t)⟩

In other words, take the state Ψ at time 0, apply the operator U(t) to it, and you get back the state Ψ at time t.

Now, what are some basic things we can say about the time-evolution operator?

First: if we evolve forwards in time by a length of time equal to zero, the state will not change. (This is basically definitional.)

I.e. U(0) = I (where I is the identity operator).

Second: Time evolution is always continuous, in that an evolution forwards by an arbitrarily small time period ε will change the state by an amount proportional to ε.

I.e. U(ε) = I + εG (where G is some other operator).

Third: Time evolution preserves orthogonality. If two states are ever orthogonal, then they are always orthogonal. (This is an assumption of conservation of information – the laws of physics don’t cause information to disappear or new information to pop up out of nowhere.)

I.e. ⟨φ(0)|Ψ(0)⟩ = 0  ⇒ ⟨φ(t)|Ψ(t)⟩ = 0

From this we can actually derive a stronger statement, which is that all inner products are conserved over time. (The intuition for this is that if all our orthogonal basis vectors stay orthogonal when we evolve forward in time, then time evolution is something like a rotation, and rotations preserve all inner products.)

I.e. ⟨φ(0)|Ψ(0)⟩ = ⟨φ(t)|Ψ(t)⟩

So our starting point is:

  1. U(t)|Ψ(0)⟩ = |Ψ(t)⟩
  2. U(0) = I
  3. U(ε) = I + εG
  4. ⟨φ(0)|Ψ(0)⟩ = ⟨φ(t)|Ψ(t)⟩

From (1) and (4), we get

U(t)|Ψ(0)⟩ = |Ψ(t)⟩
U(t)|φ(0)⟩ = |φ(t)⟩
so…
⟨φ(t)|Ψ(t)⟩ = ⟨φ(0)|U*(t) U(t)|Ψ(0)⟩
= ⟨φ(0)|Ψ(0)⟩
and therefore…
U*U = I

Operators that satisfy the identity on the final line are called unitary; they are analogous to complex numbers of unit length.
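If you want to see this concretely, here is a minimal sketch in plain Python. The particular 2×2 matrix U and the two states phi and psi are my own illustrative choices, not anything from the video; the point is only that U*U comes out as the identity and that the inner product is the same before and after applying U.

```python
import cmath
import math

def dagger(m):
    """Conjugate transpose of a square matrix given as a list of rows."""
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(m, v):
    """Apply the matrix m to the vector v."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]

def inner(phi, psi):
    """Inner product <phi|psi>, conjugating the first argument."""
    return sum(complex(p).conjugate() * q for p, q in zip(phi, psi))

# Assumed example: a rotation by theta combined with a phase alpha.
theta, alpha = 0.7, 0.3
c, s = math.cos(theta), math.sin(theta)
U = [[c, -s * cmath.exp(1j * alpha)],
     [s * cmath.exp(-1j * alpha), c]]

# Two assumed unit-length states.
phi = [0.6, 0.8j]
psi = [1 / math.sqrt(2), 1j / math.sqrt(2)]

UdU = matmul(dagger(U), U)                        # should be the identity
before = inner(phi, psi)                          # <phi(0)|psi(0)>
after = inner(apply(U, phi), apply(U, psi))       # <phi(t)|psi(t)>
print(UdU)
print(before, after)
```

The two printed inner products should agree up to floating-point rounding.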

Let’s use this identity together with (3):

U*(ε) U(ε) = I
(I + εG*)(I + εG) = I
I + ε(G + G*) + ε² G*G = I
ε(G + G*) + ε² G*G = 0
G + G* ≈ 0

In the last line, I’ve divided through by ε and used the assumption that ε is arbitrarily small, so that the leftover term εG*G (which came from the ε² term) can be thrown out.

Now, what does this final line tell us? Well, it says that the operator G (which dictates the change in state over an infinitesimal time) is anti-Hermitian, which is the operator version of being purely imaginary. By analogy, any purely imaginary number y = ix satisfies the identity:

y + y* =  ix + (ix)* = ix – ix = 0

So if G is purely imaginary in this sense (G* = -G), it is convenient to consider a new operator H = iG that plays the role of a purely real quantity. This operator is Hermitian by construction: it is equal to its own conjugate, since H* = (iG)* = -iG* = iG = H. Substituting G = -iH into our infinitesimal time evolution equation, we get

U(ε) = I – iεH
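Here is a small numerical sanity check of that step. I'm assuming a toy two-level system with H = A·I + B·σx (the numbers A and B are arbitrary), for which the exact evolution operator has a simple closed form. The sketch extracts G = (U(ε) - I)/ε for shrinking ε and looks at how close G + G* is to zero and how close iG is to H.

```python
import cmath
import math

A, B = 0.4, 1.3   # assumed parameters of the toy Hamiltonian H = A*I + B*sigma_x

def U(eps):
    """Exact evolution exp(-i*eps*H) for H = A*I + B*sigma_x."""
    phase = cmath.exp(-1j * eps * A)
    c, s = math.cos(eps * B), math.sin(eps * B)
    return [[phase * c, phase * -1j * s],
            [phase * -1j * s, phase * c]]

def generator(eps):
    """G = (U(eps) - I) / eps, the operator appearing in U(eps) = I + eps*G."""
    u = U(eps)
    return [[(u[i][j] - (1 if i == j else 0)) / eps for j in range(2)]
            for i in range(2)]

def anti_hermitian_defect(g):
    """Largest entry of G + G-dagger; zero would mean G is exactly anti-Hermitian."""
    return max(abs(g[i][j] + g[j][i].conjugate())
               for i in range(2) for j in range(2))

defects = {eps: anti_hermitian_defect(generator(eps)) for eps in (1e-1, 1e-2, 1e-3)}
g_small = generator(1e-3)
h_small = [[1j * g_small[i][j] for j in range(2)] for i in range(2)]  # H = iG

print(defects)   # shrinks in proportion to eps
print(h_small)   # close to [[A, B], [B, A]]
```

The defect shrinking linearly with ε is exactly the dropped ε² term showing up.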

Now, let’s consider the derivative of a quantum state.

d|Ψ⟩/dt = (|Ψ(t + ε)⟩ – |Ψ(t)⟩) / ε
= (U(ε) – I)|Ψ(t)⟩ / ε
= -iεH|Ψ(t)⟩ / ε
= -iH|Ψ(t)⟩

Thus we get…

d/dt|Ψ⟩ = -iH|Ψ⟩

This is the time-dependent Schrodinger equation, although we haven’t yet specified what this operator H is supposed to be. However, since we know H is Hermitian, we also know that H corresponds to some observable quantity.

It turns out that if we multiply this operator by the reduced Planck constant ħ, it becomes the Hamiltonian, the operator that corresponds to the observable energy. We’ll just change notation subtly by taking H to be the Hamiltonian, that is, what we would previously have called ħH. Then we get the more familiar form of the time-dependent Schrodinger equation:

iħ d/dt|Ψ⟩ = H|Ψ⟩
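To see the infinitesimal picture and the finished equation agree, here is a rough sketch that applies U(ε) = I - iεH many times in a row and compares the result with the exact solution. The two-level Hamiltonian, the starting state, and the choice of units with ħ = 1 are all assumptions I've made for illustration.

```python
import cmath
import math

A, B = 0.4, 1.3   # assumed toy Hamiltonian H = A*I + B*sigma_x, in units where hbar = 1
H = [[A, B],
     [B, A]]

def euler_evolve(psi0, t, steps):
    """Apply U(eps) = I - i*eps*H to the state `steps` times, with eps = t/steps."""
    eps = t / steps
    psi = list(psi0)
    for _ in range(steps):
        h_psi = [H[0][0] * psi[0] + H[0][1] * psi[1],
                 H[1][0] * psi[0] + H[1][1] * psi[1]]
        psi = [psi[0] - 1j * eps * h_psi[0],
               psi[1] - 1j * eps * h_psi[1]]
    return psi

def exact_evolve_from_up(t):
    """Closed-form exp(-iHt) applied to the state (1, 0), valid for this particular H."""
    phase = cmath.exp(-1j * A * t)
    return [phase * math.cos(B * t), phase * -1j * math.sin(B * t)]

t = 1.0
approx = euler_evolve([1.0, 0.0], t, steps=20000)
exact = exact_evolve_from_up(t)
error = max(abs(a - e) for a, e in zip(approx, exact))
norm = math.sqrt(sum(abs(a) ** 2 for a in approx))
print(error, norm)
```

The small-step product is only approximately unitary, so the norm drifts very slightly away from 1; the drift shrinks as the number of steps grows.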

 

Wave function entropy

Entropy is a feature of probability distributions, and can be taken to be a quantification of uncertainty.

Standard quantum mechanics takes its fundamental object to be the wave function – an amplitude distribution. And from an amplitude distribution Ψ you can obtain a probability distribution Ψ*Ψ.

So it is very natural to think about the entropy of a given quantum state. For some reason, it looks like this concept of wave function entropy is not used much in physics. The quantum-mechanical version of entropy that is typically referred to is the von Neumann entropy, which involves uncertainty over which quantum state a system is in (rather than uncertainty intrinsic to a quantum state).

I’ve been looking into some of the implications of the concept of wave function entropy, and found a few interesting things.

Firstly, let’s just go over what precisely wave function entropy is.

Quantum mechanics is primarily concerned with calculating the wave function Ψ(x), which distributes complex amplitudes over configuration space. The physical meaning of these amplitudes is interpreted by taking their absolute square Ψ*Ψ, which is a probability distribution.

Thus, the entropy of the wave function is given by:

S = – ∫ Ψ*Ψ ln(Ψ*Ψ) dx

As an example, I’ll write out the probability densities Ψ*Ψ for some of the basic hydrogen atom states (in units where the Bohr radius is 1):

(Ψ*Ψ)1s = e^(-2r) / π
(Ψ*Ψ)2s = (2 - r)² e^(-r) / 32π
(Ψ*Ψ)2p = r² e^(-r) cos²(θ) / 32π
(Ψ*Ψ)3s = (2r² - 18r + 27)² e^(-2r/3) / 19683π

With these wave functions in hand, we can go ahead and calculate the entropies! Some of the integrals are intractable, so using numerical integration, we get:

S1s ≈ 70
S2s ≈ 470
S2p ≈ 326
S3s ≈ 1320

The increasing values for (1s, 2s, 3s) make sense – higher energy wave functions are more dispersed, meaning that there is greater uncertainty in the electron’s spatial distribution.
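For anyone who wants to play with this, here is a minimal sketch of the kind of numerical integration involved, for the two spherically symmetric states 1s and 2s. It makes assumptions of its own: atomic units, the three-dimensional volume element 4πr² dr, and a plain Simpson's rule with a cutoff radius. The absolute value of a continuous entropy shifts with the choice of length unit and integration measure, so I would not expect this sketch to reproduce the figures quoted above; the ordering S1s < S2s is the thing to look at.

```python
import math

def rho_1s(r):
    """Probability density of the 1s state in atomic units."""
    return math.exp(-2 * r) / math.pi

def rho_2s(r):
    """Probability density of the 2s state in atomic units."""
    return (2 - r) ** 2 * math.exp(-r) / (32 * math.pi)

def radial_integral(f, r_max=60.0, n=60000):
    """Composite Simpson's rule for the integral of f(r) * 4*pi*r^2 from 0 to r_max."""
    h = r_max / n                      # n must be even for Simpson's rule
    total = 0.0
    for i in range(n + 1):
        r = i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * f(r) * 4 * math.pi * r * r
    return total * h / 3

def entropy(rho):
    """S = -integral of rho * ln(rho) over all space, for a spherically symmetric rho."""
    def integrand(r):
        p = rho(r)
        return -p * math.log(p) if p > 0 else 0.0   # p*ln(p) -> 0 where the density vanishes
    return radial_integral(integrand)

norm_1s = radial_integral(rho_1s)     # should be 1
s_1s = entropy(rho_1s)                # analytically 3 + ln(pi) under these assumptions
s_2s = entropy(rho_2s)
print(norm_1s, s_1s, s_2s)
```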

Let’s go into something a bit more theoretically interesting.

We’ll be interested in a generalization of entropy – relative entropy. This will quantify, rather than pure uncertainty, changes in uncertainty from a prior probability distribution ρ to our new distribution Ψ*Ψ. This will be the quantity we’ll denote S from now on.

S = – ∫ Ψ*Ψ ln(Ψ*Ψ/ρ) dx

Now, suppose we’re interested in calculating the wave functions Ψ that are local maxima of entropy. This means we want to find the Ψ for which δS = 0. Of course, we also want to ensure that a few basic constraints are satisfied. Namely,

∫ Ψ*Ψ dx = 1
∫ Ψ*HΨ dx = E

These constraints are chosen by analogy with the constraints in ordinary statistical mechanics – normalization and average energy. H is the Hamiltonian operator, which corresponds to the energy observable.

We can find the critical points of entropy that satisfy the constraint by using the method of Lagrange multipliers. Our two Lagrange multipliers will be α (for normalization) and β (for energy). This gives us the following equation for Ψ:

Ψ ln(Ψ*Ψ/ρ) + (α + 1)Ψ + βHΨ = 0
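For readers who want the intermediate step, here is a sketch of the variation written out in LaTeX. It assumes the usual trick of treating Ψ and Ψ* as independent variables, and the name 𝓛 for the constrained functional is introduced here only for illustration.

```latex
\begin{aligned}
\mathcal{L}[\Psi,\Psi^*] &= -\int \Psi^*\Psi \,\ln\!\frac{\Psi^*\Psi}{\rho}\,dx
  \;-\; \alpha\left(\int \Psi^*\Psi\,dx - 1\right)
  \;-\; \beta\left(\int \Psi^* H \Psi\,dx - E\right) \\[4pt]
\frac{\delta \mathcal{L}}{\delta \Psi^*} &= -\Psi \ln\!\frac{\Psi^*\Psi}{\rho} \;-\; \Psi \;-\; \alpha\Psi \;-\; \beta H\Psi \;=\; 0 \\[4pt]
\Longrightarrow\quad & \Psi \ln\!\frac{\Psi^*\Psi}{\rho} + (\alpha+1)\,\Psi + \beta H \Psi = 0
\end{aligned}
```

The extra Ψ in the second line comes from differentiating the logarithm: the factor Ψ*Ψ out front cancels the 1/(Ψ*Ψ) from the derivative of ln, leaving a single Ψ.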

We can rewrite this as an operator equation, which gives us

ln(Ψ*Ψ/ρ) + (α + 1) + βH = 0
Ψ*Ψ = (ρ/Z) e^(-βH)

Here we’ve renamed our constants so that Z = e^(α+1) is a normalization constant.
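The same Lagrange-multiplier argument can be tested numerically in a stripped-down setting. What follows is a sketch of the ordinary discrete, classical analogue (a probability vector over a handful of energy levels), not of the wave function problem itself, and the energies, prior, and β are made-up numbers. It builds p = ρ·e^(-βE)/Z and checks that random small perturbations which keep both constraints satisfied never increase the relative entropy.

```python
import math
import random

energies = [0.0, 1.0, 2.0, 3.5]   # assumed energy levels
prior = [0.4, 0.3, 0.2, 0.1]      # assumed prior distribution rho
beta = 0.8                        # assumed value of the energy multiplier

def gibbs(prior, energies, beta):
    """p_i = rho_i * exp(-beta * E_i) / Z."""
    weights = [r * math.exp(-beta * e) for r, e in zip(prior, energies)]
    z = sum(weights)
    return [w / z for w in weights]

def relative_entropy(p, prior):
    """S = -sum_i p_i * ln(p_i / rho_i)."""
    return -sum(pi * math.log(pi / ri) for pi, ri in zip(p, prior))

def constrained_direction(rng, energies):
    """Random d with sum(d) = 0 and sum(d * E) = 0, so both constraints stay satisfied."""
    n = len(energies)
    d = [rng.uniform(-1, 1) for _ in range(n)]
    mean_d = sum(d) / n
    d = [x - mean_d for x in d]                       # remove the normalization direction
    mean_e = sum(energies) / n
    e_c = [e - mean_e for e in energies]
    k = sum(x * y for x, y in zip(d, e_c)) / sum(y * y for y in e_c)
    return [x - k * y for x, y in zip(d, e_c)]        # remove the energy direction

rng = random.Random(0)
p = gibbs(prior, energies, beta)
s_max = relative_entropy(p, prior)
worst = max(
    relative_entropy(
        [pi + 1e-3 * di for pi, di in zip(p, constrained_direction(rng, energies))],
        prior,
    )
    for _ in range(200)
)
print(s_max, worst)   # every perturbed entropy stays below s_max
```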

So we’ve solved the wave function equation… but what does this tell us? If you’re familiar with some basic quantum mechanics, our expression should look somewhat familiar to you. Let’s backtrack a few steps to see where this familiarity leads us.

Ψ ln(Ψ*Ψ/ρ) + (α + 1)Ψ + βHΨ = 0
HΨ + 1/β ln(Ψ*Ψ/ρ) Ψ = – (α + 1)/β Ψ

Let’s rename – (α + 1)/β to a new constant λ. And we’ll take a hint from statistical mechanics and call 1/β the temperature T of the state. Now our equation looks like

HΨ + T ln(Ψ*Ψ/ρ) Ψ = λΨ

This equation is almost the time-independent Schrodinger equation. In particular, the time-independent Schrodinger equation pops out as the zero-temperature limit of this equation:

As T → 0,
our equation becomes…
HΨ = λΨ

The obvious interpretation of the constant λ in the zero temperature limit is E, the energy of the state. 

What about in the infinite-temperature limit?

As T → ∞,
our equation becomes…
Ψ*Ψ = ρ

Why is this? Because the only solution to the equation in this limit is for ln(Ψ*Ψ/ρ) → 0, or in other words Ψ*Ψ/ρ → 1

And what this means is that in the infinite temperature limit, the critical entropy wave function is just that which gives the prior distribution.

We can interpret this result as a generalization of the Schrodinger equation. Rather than a linear equation, we now have an additional logarithmic nonlinearity. I’d be interested to see how the general solutions to this equation differ from the standard equations, but that’s for another post.

HΨ + T ln(Ψ*Ψ/ρ) Ψ = λΨ

Solution: How change arises in QM

Previously I pointed out that if you drew out the wave function of the entire universe by separating out its different energy components and shading each according to its amplitude, you would find that the universe appears completely static.

[Figure: Energy superposition]

This is correct according to standard quantum mechanics. If you looked at how much amplitude the universe had in any particular energy level, you would find that this amplitude was not changing in size.

The only change you would observe would be in the direction, or phase, of the amplitude in the complex plane. And directions of amplitudes in the complex plane are unphysical. Right?

No! While there is an important sense in which the direction of an amplitude is unphysical (the universe ultimately only computes magnitudes of amplitudes), there is a much much more important sense in which the direction of an amplitude contains loads of physical information.

This is because when the universe is in a superposition of different energy states, the amplitudes of these states can interfere.

It is here that we can find the answer to the question I posed in the previous post. Physical changes come from interference between the amplitudes of all the energy states that the universe is in superposition over.

One consequence of all of this is that if the universe did happen to be in a pure energy state, and not in a superposition of multiple energy levels, then change would be impossible.

From which we can conclude: The universe is in a superposition of energy levels, not in any clearly defined single energy level! (Proof: Look around and notice that stuff is happening)

This doesn’t mean, by the way, that the universe is actually in one of the energy levels and we just don’t know which. It also doesn’t mean that the universe is in some other distinct state found by averaging over all of the different energy states. “Superposition” is one of these funny words in quantum mechanics that doesn’t have an analogue in natural language. The best we can say is that the universe really truly is in all of the states in the superposition at once, and the degree to which it is in any particular state is the amplitude of that state.

***

Let’s imagine a simple toy universe with one dimension of space and one of time.

This universe is initially in an equal superposition of two pure energy states Φ0(x) and Φ1(x), each of which is a real function (no imaginary components). The first has zero energy, and we choose our units so that the second has an energy level equal to exactly 1.

So the wave function of our universe at time zero can be written Ψ = Φ0 + Φ1. (I’m ignoring normalization factors because they aren’t really crucial to the point here)

And from this we can conclude that our probability density is:

P(x) = Ψ*·Ψ = Φ0² + Φ1² + 2·Φ0·Φ1

Now we advance forward in time. Applying the Schrodinger equation, we find:

Φ0(x, t) = Φ0(x)
Φ1(x, t) = Φ1(x) · e^(-it)

Notice that both of these energy states have a time-independent magnitude. The first one is obvious – it’s just completely static. The second one you can visualize as a function spinning in the complex plane, going from purely real and positive to purely imaginary to purely real and negative, et cetera. The magnitude of the function is just what you’d get by spinning it back to its positive real value.

From our two energy functions, we can find the total wave function of the universe:

Ψ(x, t) = Φ0(x) + Φ1(x) · e^(-it)

Already we can see that our time-dependent wave function is not a simple product of our time-independent wave function and a phase.

We can see the consequences of this by calculating the time-dependent probability density:

P(x, t) = Φ0(x)² + Φ1(x)² + Φ0(x) · Φ1(x) · (e^(-it) + e^(it))

Or…

P(x, t) = |Φ0|² + |Φ1|² + 2 · Φ0(x) · Φ1(x) · cos(t)

And in our final result, we can see a clear time dependence of the spatial probability distribution over the universe. The last term will grow and shrink, oscillating over time and giving rise to dynamics.
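Here is the toy universe as a few lines of Python. The specific shapes I've picked for Φ0 and Φ1 (the first two sine modes on the interval from 0 to 1) are just assumptions to have something real-valued to plug in; as in the text, normalization is ignored and the energies are 0 and 1.

```python
import cmath
import math

def phi0(x):
    """Assumed real energy state with energy 0."""
    return math.sin(math.pi * x)

def phi1(x):
    """Assumed real energy state with energy 1."""
    return math.sin(2 * math.pi * x)

def psi(x, t):
    """Total wave function: phi0 stays put, phi1 picks up the phase exp(-i t)."""
    return phi0(x) + phi1(x) * cmath.exp(-1j * t)

def density_direct(x, t):
    """Probability density computed as the absolute square of psi."""
    return abs(psi(x, t)) ** 2

def density_formula(x, t):
    """The closed form: phi0^2 + phi1^2 + 2*phi0*phi1*cos(t)."""
    return phi0(x) ** 2 + phi1(x) ** 2 + 2 * phi0(x) * phi1(x) * math.cos(t)

for t in (0.0, math.pi / 2, math.pi):
    print(t, density_direct(0.25, t), density_formula(0.25, t))
```

At x = 0.25 the density swings from its largest value at t = 0 to nearly zero at t = π, even though neither energy state changes in magnitude.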

***

We can visualize what’s going on here by looking at the time evolution of each pure energy state as if it’s spinning in the complex plane. For instance, if the universe was in a superposition of the lowest four energy levels we would see something like:

[Animation: 4-Rotating.gif]

The length of the arrow represents the amplitude of that energy level – “how much” the universe is in that energy state. The arrows are spinning in the complex plane with a speed proportional to the energy level they represent.

The wave function of the universe is represented by the sum of all of these arrows, as if you stacked each on the head of the previous. And this sum is changing!
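The arrow picture is easy to mimic in code. In this sketch the four amplitudes and the four energy levels are numbers I've made up; each arrow is a complex number spinning at a rate set by its energy, and the total is the head-to-tail sum.

```python
import cmath
import math

amplitudes = [0.5, 0.5, 0.5, 0.5]   # assumed weights of the four energy levels
energies = [1.0, 2.0, 3.0, 4.0]     # assumed energies (spin rates of the arrows)

def arrows(t):
    """Each arrow at time t: its amplitude times the phase exp(-i E t)."""
    return [a * cmath.exp(-1j * e * t) for a, e in zip(amplitudes, energies)]

def arrow_lengths(t):
    """Lengths of the individual arrows; these never change."""
    return [abs(z) for z in arrows(t)]

def total(t):
    """Head-to-tail sum of all the arrows."""
    return sum(arrows(t))

for t in (0.0, 1.0, math.pi):
    print(t, arrow_lengths(t), abs(total(t)))
```

With these numbers the arrows all line up at t = 0 and cancel completely at t = π, while every individual length stays at 0.5 throughout.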

For instance, in the universe’s first moment, the superposition looks like this:

[Figure: 4-Rotating, T=0]

And later the universe looks like this:

[Figure: 4-Rotating, T=1]

If we plotted out the first two energy states scaled by their amplitudes, we might see the following spatial distributions, initially and finally:

Even though there have been no changes in the magnitudes of the arrows (the degree to which the universe exists in each energy level) we get a very different looking universe.

This is the basic idea that explains all change in the universe, from the rising and falling of civilizations to the births and deaths of black holes: they are results of the complex patterns of interference produced by spinning amplitudes.

Is quantum mechanics simpler than classical physics?

I want to make a few very fundamental comparisons between classical and quantum mechanics. I’ll be assuming a lot of background in this particular post to prevent it from getting uncontrollably long, but am planning on writing a series on quantum mechanics at some point.

***

Let’s assume that the universe consists of N simple point particles (where N is an ungodly large number), each interacting with each other in complicated ways according to their relative positions. These positions are written as x1, x2, …, xN.

The classical description for this simple universe makes each position a function of time, and gives the following set of N equations of motion, one for each particle:

Fk(x1, x2, …, xN) = mk · d²xk/dt²

Each force function Fk will be a horribly messy nonlinear function of the positions of all the particles in the universe. These functions encode the details of all of the interactions taking place between the particles.

Analytically solving this equation is completely hopeless – It’s a set of N separate equations, each one a highly nonlinear second order differential equation. You couldn’t solve any of them on their own, and on top of that, they are tightly entangled together, making it impossible to solve any one without also solving all the others.

So if you thought that Newton’s equation F = ma was simple, think again!
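To make the contrast concrete: in practice, equations like these get stepped forward numerically, one small time increment after another. Here is a bare-bones sketch for three particles on a line. The force law (a softened inverse-square attraction), the masses, the starting positions, and the velocity Verlet integrator are all my own illustrative choices, not anything specific to the argument.

```python
import math

SOFT = 0.1   # assumed softening length, keeps the force finite at zero separation

def forces(xs, masses):
    """F_k for every particle: the sum of pairwise attractions from all the others."""
    n = len(xs)
    f = [0.0] * n
    for k in range(n):
        for j in range(n):
            if j != k:
                d = xs[j] - xs[k]
                f[k] += masses[k] * masses[j] * d / (d * d + SOFT * SOFT) ** 1.5
    return f

def total_energy(xs, vs, masses):
    """Kinetic plus potential energy for the force law above."""
    kinetic = sum(0.5 * m * v * v for m, v in zip(masses, vs))
    potential = 0.0
    for k in range(len(xs)):
        for j in range(k + 1, len(xs)):
            d = xs[j] - xs[k]
            potential -= masses[k] * masses[j] / math.sqrt(d * d + SOFT * SOFT)
    return kinetic + potential

def velocity_verlet_step(xs, vs, masses, dt):
    """Advance all positions and velocities by one time step dt."""
    f_old = forces(xs, masses)
    xs = [x + v * dt + 0.5 * f / m * dt * dt
          for x, v, f, m in zip(xs, vs, f_old, masses)]
    f_new = forces(xs, masses)
    vs = [v + 0.5 * (fo + fn) / m * dt
          for v, fo, fn, m in zip(vs, f_old, f_new, masses)]
    return xs, vs

masses = [1.0, 2.0, 1.5]
xs = [-1.0, 0.2, 1.5]
vs = [0.0, 0.0, 0.0]
e_start = total_energy(xs, vs, masses)
for _ in range(500):
    xs, vs = velocity_verlet_step(xs, vs, masses, dt=1e-3)
e_end = total_energy(xs, vs, masses)
momentum = sum(m * v for m, v in zip(masses, vs))
print(xs, e_end - e_start, momentum)
```

Notice that every single step needs the forces on all particles at once, which is the coupling described above showing up in code.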

Compare this to how quantum mechanics describes our universe. The state of the universe is described by a function Ψ(x1, x2, …, xN, t). This function changes over time according to the Schrödinger equation:

∂Ψ/∂t = -i·H[Ψ]

H is a differential operator that is a complicated function of all of the positions of all the particles in the universe. It encodes the information about particle interactions in the same way that the force functions did in classical mechanics.

I claim that Schrodinger’s equation is infinitely easier to solve than Newton’s equation. In fact, I will by the end of this post write out the exact solution to the wave function of the entire universe.

At first glance, you can notice a few features of the equation that make it look potentially simpler than the classical equation. For one, there’s only one single equation, instead of N entangled equations.

Also, the equation is only first order in time derivatives, while Newton’s equation is second order in time derivatives. This is extremely important. The move from a first order differential equation to a second order differential equation is a huge deal. For one thing, there’s a simple general solution to all first order linear differential equations, and nothing close for second order linear differential equations.

Unfortunately… Schrodinger’s equation, just like Newton’s, is still immensely complicated, because of the presence of H. The equation is linear in Ψ, but H is a differential operator that couples the coordinates of every particle in the universe. If we can’t find a way to tame this immensely complex operator, then we’re probably stuck.

But quantum mechanics hands us exactly what we need: two magical facts about the universe that allow us to turn Schrodinger’s equation into a set of independent first-order linear differential equations in time, one for each allowed energy.

First: It guarantees us that there exist a set of functions φE(x1, x2, …, xN) such that:

H[φE] = E · φE

E is an ordinary real number, and its physical meaning is the energy of the entire universe. The set of values of E is the set of allowed energies for the universe. And the functions φE(x1, x2, …, xN) are the wave functions that correspond to each allowed energy.

Second: it tells us that no matter what complicated state our universe is in, we can express it as a weighted sum over these functions, with one complex weight aE for each allowed energy E:

Ψ = ∑ aE · φE

With these two facts, we’re basically omniscient.

Since Ψ is a sum of all the different functions φE, if we want to know how Ψ changes with time, we can just see how each φE changes with time.

How does each φE change with time? We just use the Schrodinger equation:

∂φE/∂t = -i · H[φE]
= -iE · φE

And we end up with a first order linear differential equation. We can write down the solution right away:

φE(x1, x2, …, xN, t) = φE(x1, x2, …, xN) · e^(-iEt)

And just like that, we can write down the wave function of the entire universe:

Ψ(x1, x2, …, xN, t) = ∑ aE · φE(x1, x2, …, xN, t)
= ∑ aE · φE(x1, x2, …, xN) · e^(-iEt)

Hand me the initial conditions of the universe, and I can hand you back its exact and complete future according to quantum mechanics.
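Here is the whole recipe on a universe small enough to fit in a few lines: a two-state system whose Hamiltonian, energies, and energy states I've simply written down by hand (that hand-off is an assumption, and it is doing a lot of work). Given those, the sketch expands an initial state in the energy states, attaches the phase e^(-iEt) to each term, and then checks by finite differences that the result really does satisfy the Schrodinger equation.

```python
import cmath
import math

# Assumed two-state Hamiltonian with its allowed energies and energy states.
H = [[1.0, 2.0],
     [2.0, 1.0]]
s = 1 / math.sqrt(2)
energies = [3.0, -1.0]
eigenstates = [[s, s], [s, -s]]

def expand(psi0):
    """Weights a_E = <phi_E|psi0> of the initial state in the energy states."""
    return [sum(complex(p).conjugate() * q for p, q in zip(phi, psi0))
            for phi in eigenstates]

def evolve(coeffs, t):
    """Psi(t) = sum over E of a_E * phi_E * exp(-i E t)."""
    psi = [0j, 0j]
    for a, phi, e in zip(coeffs, eigenstates, energies):
        phase = cmath.exp(-1j * e * t)
        psi = [c + a * p * phase for c, p in zip(psi, phi)]
    return psi

psi0 = [1.0, 0.0]
coeffs = expand(psi0)

# Check d(psi)/dt = -i H psi at some time t using a central difference.
t, h = 0.7, 1e-5
plus, minus, now = evolve(coeffs, t + h), evolve(coeffs, t - h), evolve(coeffs, t)
lhs = [(a - b) / (2 * h) for a, b in zip(plus, minus)]
rhs = [-1j * (H[i][0] * now[0] + H[i][1] * now[1]) for i in range(2)]
residual = max(abs(a - b) for a, b in zip(lhs, rhs))
print(coeffs, residual)
```

All of the work is in the first few lines, where the energies and energy states are supplied; the time evolution itself is a one-line loop.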

***

Okay, I cheated a little bit. You might have guessed that writing out the exact wave function of the entire universe is not actually doable in a short blog post. The problem can’t be that simple.

But at the same time, everything I said above is actually true, and the final equation I presented really is the correct wave function of the universe. So if the problem must be more complex, where is the complexity hidden away?

The answer is that the complexity is hidden away in the first “magical fact” about allowed energy states.

H[φE] = E · φE

This equation is in general a second-order partial differential equation in the coordinates of all N particles at once, and H can depend on those coordinates in a highly non-linear way. If we actually wanted to expand out Ψ in terms of the different functions φE, we’d have to solve this equation.

So there is no free lunch here. But what’s interesting is where the complexity moves when switching from classical mechanics to quantum mechanics.

In classical mechanics, virtually zero effort goes into formalizing the space of states, or talking about what configurations of the universe are allowable. All of the hardness of the problem of solving the laws of physics is packed into the dynamics. That is, it is easy to specify an initial condition of the universe. But describing how that initial condition evolves forward in time is virtually impossible.

By contrast, in quantum mechanics, solving the equation of motion is trivially easy. And all of the complexity has moved to defining the system. If somebody hands you the allowed energy levels and energy functions of the universe at a given moment of time, you can solve the future of the rest of the universe immediately. But actually finding the allowed energy levels and corresponding wave functions is virtually impossible.

***

Let’s get to the strangest (and my favorite) part of this.

If quantum mechanics is an accurate description of the world, then the following must be true:

Ψ(x1, x2, …, xN, 0) = ∑ aE · φE(x1, x2, …, xN)
implies
Ψ(x1, x2, …, xN, t) = ∑ aE · φE(x1, x2, …, xN) · e^(-iEt)

This equation has two especially interesting features. First, each term in the sum can be broken down separately into a function of position and a function of time.

And second, the temporal component of each term is an imaginary exponential: a phase factor e^(-iEt).

Let me take a second to explain the significance of this.

In quantum mechanics, physical quantities are invariably found by taking the absolute square of complex quantities. This is why you can have a complex wave function and an equation of motion with an i in it, and still end up with a universe quite free of imaginary numbers.

But when you take the absolute square of e^(-iEt), you end up with e^(-iEt) · e^(iEt) = 1. What’s important here is that the time dependence seems to fall away.

A way to see this is to notice that y = e^(-ix), when graphed, looks like a point on a unit circle in the complex plane.

[Figure: Phase]

So e^(-iEt), when graphed, is just a point repeatedly spinning around the unit circle. The larger E is, the faster it spins.

[Figure: 2-Interference]

Taking the absolute square of a complex number is the same as finding the square of its distance from the origin on the complex plane. And since e^(-iEt) always stays on the unit circle, its absolute square is always 1.
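A quick numerical check of this, with a couple of made-up weights: multiplying any complex weight by e^(-iEt) leaves its absolute square untouched, no matter what t is.

```python
import cmath

weights = {1.0: 0.6, 2.5: 0.8j}   # assumed complex weights a_E, keyed by energy E

def squared_magnitudes(t):
    """Absolute square of each term a_E * exp(-i E t) at time t."""
    return {e: abs(a * cmath.exp(-1j * e * t)) ** 2 for e, a in weights.items()}

print(squared_magnitudes(0.0))
print(squared_magnitudes(0.001))
print(squared_magnitudes(12.3))
```

All three printed dictionaries should match up to rounding: 0.36 for the first energy and 0.64 for the second.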

So what this all means is that quantum mechanics tells us that there’s a sense in which our universe is remarkably static. The universe starts off as a superposition of a bunch of possible energy states, each with a particular weight. And it ends up as a sum over the same energy states, with weights of the exact same magnitude, just pointing different directions in the complex plane.

Imagine drawing the universe by drawing out all possible energy states in boxes, and shading these boxes according to how much amplitude is distributed in them. Now we advance time forward by one millisecond. What happens?

Absolutely nothing, according to quantum mechanics. The distribution of shading across the boxes stays the exact same, because the phase factor multiplication does not change the magnitude of the amplitude in each box.

Given this, we are faced with a bizarre question: if quantum mechanics tells us that the universe is static in this particular way, then why do we see so much change and motion and excitement all around us?

I’ll stop here for you to puzzle over, but I’ve posted an answer here.