What do I find conceptually puzzling?

There are lots of things that I don’t know, like, say, what the birth rate in Sweden is or what the effect of poverty on IQ is. There are also lots of things that I find really confusing and hard to understand, like quantum field theory and monetary policy. There’s also a special category of things that I find conceptually puzzling. These things aren’t difficult to grasp because the facts about them are difficult to understand or require learning complicated jargon. Instead, they’re difficult to grasp because I suspect that I’m confused about the concepts in use.

This is a much deeper level of confusion. It can’t be adjudicated by just reading lots of facts about the subject matter. It requires philosophical reflection on the nature of these concepts, which can sometimes leave me totally confused about everything and grasping for the solid ground of mere factual ignorance.

As such, it feels like a big deal when something I’ve been conceptually puzzled about becomes clear. I want to compile a list for future reference of things that I’m currently conceptually puzzled about and things that I’ve become un-puzzled about. (This is not a complete list, but I believe it touches on the major themes.)

Things I’m conceptually puzzled about

What is the relationship between consciousness and physics?

I’ve written about this here.

Essentially, at this point every available viewpoint on consciousness seems wrong to me.

Eliminativism amounts to a denial of pretty much the only thing that we can be sure can’t be denied – that we are having conscious experiences. Physicalism entails the claim that facts about conscious experience can be derived from laws of physics, which is wrong as a matter of logic.

Dualism entails that the laws of physics by themselves cannot account for the behavior of the matter in our brains, which is wrong. And epiphenomenalism entails that our beliefs about our own conscious experience are almost certainly wrong, and are no better representations of our actual conscious experiences than random chance.

How do we make sense of decision theory if we deny libertarian free will?

Written about this here and here.

Decision theory is ultimately about finding the decision D that maximizes expected utility EU(D). But to do this calculation, we have to decide what the set of possible decisions we are searching is.

EU confusion

Make this set too large, and you end up getting fantastical and impossible results (like that the optimal decision is to snap your fingers and make the world into a utopia). Make it too small, and you end up getting underwhelming results (in the extreme case, you just get that the optimal decision is to do exactly what you are going to do, since this is the only thing you can do in a strictly deterministic world).

We want to find a nice middle ground between these two – a boundary where we can say “inside here the things that are actually possible for us to do, and outside are those that are not.” But any principled distinction between what’s in the set and what’s not must be based on some conception of some actions being “truly possible” to us, and others being truly impossible. I don’t know how to make this distinction in the absence of a robust conception of libertarian free will.

Are there objectively right choices of priors?

I’ve written about this here.

If you say no, then there are no objectively right answers to questions like “What should I believe given the evidence I have?” And if you say yes, then you have to deal with thought experiments like the cube problem, where any choice of priors looks arbitrary and unjustifiable.

(If you are going to be handed a cube, and all you know is that it has a volume less than 1 cm3, then setting maximum entropy priors over volumes gives different answers than setting maximum entropy priors over side areas or side lengths. This means that what qualifies as “maximally uncertain” depends on whether we frame our reasoning in terms of side length, areas, or cube volume. Other approaches besides MaxEnt have similar problems of concept dependence.)

How should we deal with infinities in decision theory?

I wrote about this here, here, here, and here.

The basic problem is that expected utility theory does great at delivering reasonable answers when the rewards are finite, but becomes wacky when the rewards become infinite. There are a huge amount of examples of this. For instance, in the St. Petersburg paradox, you are given the option to play a game with an infinite expected payout, suggesting that you should buy in to the game no matter how high the cost. You end up making obviously irrational choices, such as spending $1,000,000 on the hope that a fair coin will land heads 20 times in a row. Variants of this involve the inability of EU theory to distinguish between obviously better and worse bets that have infinite expected value.

And Pascal’s mugging is an even worse case. Roughly speaking, a person comes up to you and threatens you with infinite torture if you don’t submit to them and give them 20 dollars. Now, the probability that this threat is credible is surely tiny. But it is non-zero! (as long as you don’t think it is literally logically impossible for this threat to come true)

An infinite penalty times a finite probability is still an infinite expected penalty. So we stand to gain an infinite expected utility by just handing over the 20 dollars. This seems ridiculous, but I don’t know any reasonable formalization of decision theory that allows me to refute it.

Is causality fundamental?

Causality has been nicely formalized by Pearl’s probabilistic graphical models. This is a simple extension of probability theory, out of which naturally falls causality and counterfactuals.

One can use this framework to represent the states of fundamental particles and how they change over time and interact with one another. What I’m confused about is that in some ways of looking at it, the causal relations appear to be useful but un-fundamental constructs for the sake of easing calculations. In other ways of looking at it, causal relations are necessarily built into the structure of the world, and we can go out and empirically discover them. I don’t know which is right. (Sorry for the vagueness in this one – it’s confusing enough to me that I have trouble even precisely phrasing the dilemma).

How should we deal with the apparent dependence of inductive reasoning upon our choices of concepts?

I’ve written about this here. Beyond just the problem of concept-dependence in our choices of priors, there’s also the problem presented by the grue/bleen thought experiment.

This thought experiment proposes two new concepts: grue (= the set of things that are either green before 2100 or blue after 2100) and bleen (the inverse of grue). It then shows that if we reasoned in terms of grue and bleen, standard induction would have us concluding that all emeralds will suddenly turn blue after 2100. (We repeatedly observed them being grue before 2100, so we should conclude that they will be grue after 2100.)

In other words, choose the wrong concepts and induction breaks down. This is really disturbing – choices of concepts should be merely pragmatic matters! They shouldn’t function as fatal epistemic handicaps. And given that they appear to, we need to develop some criterion we can use to determine what concepts are good and what concepts are bad.

The trouble with this is that the only proposals I’ve seen for such a criterion reference the idea of concepts that “carve reality at its joints”; in other words, the world is composed of green and blue things, not grue and bleen things, so we should use the former rather than the latter. But this relies on the outcome of our inductive process to draw conclusions about the starting step on which this outcome depends!

I don’t know how to cash out “good choices of concepts” without ultimately reasoning circularly. I also don’t even know how to make sense of the idea of concepts being better or worse for more than merely pragmatic reasons.

How should we reason about self defeating beliefs?

The classic self-defeating belief is “This statement is a lie.” If you believe it, then you are compelled to disbelieve it, eliminating the need to believe it in the first place. Broadly speaking, self-defeating beliefs are those that undermine the justifications for belief in them.

Here’s an example that might actually apply in the real world: Black holes glow. The process of emission is known as Hawking radiation. In principle, any configuration of particles with a mass less than the black hole can be emitted from it. Larger configurations are less likely to be emitted, but even configurations such as a human brain have a non-zero probability of being emitted. Henceforth, we will call such configurations black hole brains.

Now, imagine discovering some cosmological evidence that the era in which life can naturally arise on planets circling stars is finite, and that after this era there will be an infinite stretch of time during which all that exists are black holes and their radiation. In such a universe, the expected number of black hole brains produced is infinite (a tiny finite probability multiplied by an infinite stretch of time), while the expected number of “ordinary” brains produced is finite (assuming a finite spatial extent as well).

What this means is that discovering this cosmological evidence should give you an extremely strong boost in credence that you are a black hole brain. (Simply because most brains in your exact situation are black hole brains.) But most black hole brains have completely unreliable beliefs about their environment! They are produced by a stochastic process which cares nothing for producing brains with reliable beliefs. So if you believe that you are a black hole brain, then you should suddenly doubt all of your experiences and beliefs. In particular, you have no reason to think that the cosmological evidence you received was veridical at all!

I don’t know how to deal with this. It seems perfectly possible to find evidence for a scenario that suggests that we are black hole brains (I’d say that we have already found such evidence, multiple times). But then it seems we have no way to rationally respond to this evidence! In fact, if we do a naive application of Bayes’ theorem here, we find that the probability of receiving any evidence in support of black hole brains to be 0!

So we have a few options. First, we could rule out any possible skeptical scenarios like black hole brains, as well as anything that could provide any amount of evidence for them (no matter how tiny). Or we could accept the possibility of such scenarios but face paralysis upon actually encountering evidence for them! Both of these seem clearly wrong, but I don’t know what else to do.

How should we reason about our own existence and indexical statements in general?

This is called anthropic reasoning. I haven’t written about it on this blog, but expect future posts on it.

A thought experiment: imagine a murderous psychopath who has decided to go on an unusual rampage. He will start by abducting one random person. He rolls a pair of dice, and kills the person if they land snake eyes (1, 1). If not, he lets them free and hunts down ten new people. Once again, he rolls his pair of die. If he gets snake eyes he kills all ten. Otherwise he frees them and kidnaps 100 new people. On and on until he eventually gets snake eyes, at which point his murder spree ends.

Now, you wake up and find that you have been abducted. You don’t know how many others have been abducted alongside you. The murderer is about to roll the dice. What is your chance of survival?

Your first thought might be that your chance of death is just the chance of both dice landing 1: 1/36. But think instead about the proportion of all people that are ever abducted by him that end up dying. This value ends up being roughly 90%! So once you condition upon the information that you have been captured, you end up being much more worried about your survival chance.

But at the same time, it seems really wrong to be watching the two dice tumble and internally thinking that there is a 90% chance that they land snake eyes. It’s as if you’re imagining that there’s some weird anthropic “force” pushing the dice towards snake eyes. There’s way more to say about this, but I’ll leave it for future posts.

Things I’ve become un-puzzled about

Newcomb’s problem – one box or two box?

To almost everyone, it is perfectly clear and obvious what should be done. The difficulty is that these people seem to divide almost evenly on the problem, with large numbers thinking that the opposing half is just being silly.

– Nozick, 1969

I’ve spent months and months being hopelessly puzzled about Newcomb’s problem. I now am convinced that there’s an unambiguous right answer, which is to take the one box. I wrote up a dialogue here explaining the justification for this choice.

In a few words, you should one-box because one-boxing makes it nearly certain that the simulation of you run by the predictor also one-boxed, thus making it nearly certain that you will get 1 million dollars. The dependence between your action and the simulation is not an ordinary causal dependence, nor even a spurious correlation – it is a logical dependence arising from the shared input-output structure. It is the same type of dependence that exists in the clone prisoner dilemma, where you can defect or cooperate with an individual you are assured is identical to you in every single way. When you take into account this logical dependence (also called subjunctive dependence), the answer is unambiguous: one-boxing is the way to go.

Summing up:

Things I remain conceptually confused about:

  • Consciousness
  • Decision theory & free will
  • Objective priors
  • Infinities in decision theory
  • Fundamentality of causality
  • Dependence of induction on concept choice
  • Self-defeating beliefs
  • Anthropic reasoning

More on quantum entanglement and irreducibility

A few posts ago, I talked about how quantum mechanics entails the existence of irreducible states – states of particles that in principle cannot be described as the product of their individual components. The classic example of such an entangled state is the two qubit state

Screen Shot 2018-07-17 at 8.03.53 PM

This state describes a system which is in an equal-probability superposition of both particles being |0 and both particles being |1. As it turns out, this state cannot be expressed as the product of two single-qubit states.

A friend of mine asked me a question about this that was good enough to deserve its own post in response. Start by imagining that Alice and Bob each have a coin. They each put their quarter inside a small box with heads facing up. Now they close their respective boxes, and shake them up in the exact same way. This is important! (as well as unrealistic) We suppose that whatever happens to the coin in Alice’s box, also happens to the coin in Bob’s box.

Now we have two boxes, each of which contains a coin, and these coins are guaranteed to be facing the same way. We just don’t know what way they are facing.

Alice and Bob pick up their boxes, being very careful to not disturb the states of their respective coins, and travel to opposite ends of the galaxy. The Milky Way is 100,000 light years across, so any communication between the two now would take a minimum of 100,000 years. But if Alice now opens her box, she instantly knows the state of Bob’s coin!

So while Alice and Bob cannot send messages about the state of their boxes any faster than 100,000 years, they can instantly receive information about each others’ boxes by just observing their own! Is this a contradiction?

No, of course not. While Alice does learn something about Bob’s box, this is not because of any message passed between the two. It is the result of the fact that in the past the configurations of their coins were carefully designed to be identical. So what seemed on its face to be special and interesting turns out to be no paradox at all.

Finally, we get to the question my friend asked. How is this any different from the case of entangled particles in quantum mechanics??

Both systems would be found to be in the states |00 and |11⟩ with equal probability (where |0⟩ is heads and |1⟩ is tails). And both have the property that learning the state of one instantly tells you the state of the other. Indeed, the coins-in-boxes system also has the property of irreducibility that we talked about before! Try as we might, we cannot coherently treat the system of both coins as the product of two independent coins, as doing so will ignore the statistical dependence between the two coins.

(Which, by the way, is exactly the sort of statistical dependence that justifies timeless decision theory and makes it a necessary update to decision theory.)

I love this question. The premise of the question is that we can construct a classical system that behaves in just the same supposedly weird ways that quantum systems behave, and thus make sense of all this mystery. And answering it requires that we get to the root of why quantum mechanics is a fundamentally different description of reality than anything classical.

So! I’ll describe the two primary disanalogies between entangled particles and “entangled” coins.

Epistemic Uncertainty vs Fundamental Indeterminacy

First disanalogy. With the coins, either they are both heads or they are both tails. There is an actual fact in the world about which of these two is true, and the probabilities we reference when we talk about the chance of HH or TT represent epistemic uncertainty. There is a true determinate state of the coins, and probability only arises as a way to deal with our imperfect knowledge.

On the other hand, according to the mainstream interpretation of quantum mechanics, the state of the two particles is fundamentally indeterminate. There isn’t a true fact out there waiting to be discovered about whether the state is |00⟩ or |11⟩. The actual state of the system is this unusual thing called a superposition of |00⟩ and |11⟩. When we observe it to be |00⟩, the state has now actually changed from the superposition to the determinate state.

We can phrase this in terms of counterfactuals: If when we look at the coins, we see that they are HH, then we know that they were HH all along. In particular, we know that if we had observed them a moment later or earlier, we would have gotten H with 100% certainty. Give that we actually observed HH, the probability that we would have observed HH is 100%.

But if we observe the state of the particles to be |00⟩, this does not mean that had we observed it a moment before, we would be guaranteed to get the same answer. Given that we actually observed |00⟩, the probability that we would have observed |00⟩ is still 50%.

(A project for some enterprising reader: see what the truths of these counterfactuals imply for an interpretation of quantum mechanics in terms of Pearl-style causal diagrams. Is it even possible to do?)

Predictive differences

The second difference between the two cases is a straightforward experimental difference. Suppose that Alice and Bob identically prepare thousands of coins as we described before, and also identically prepare thousands of entangled particles. They ensure that the coins are treated exactly the same way, so that they are guaranteed to all be in the same state, and similarly for the entangled pairs.

If they now just observe all of their entangled pairs and coins, they will get similar results – roughly half of the coins will be HH and roughly half of the entangled pairs will be |00⟩. But there are other experiments they could run on the entangled pairs that would give different answers depending on whether we treat the particles as being in superposition or not. I described what these experiments could be in this earlier post – essentially they involve applying an operation that takes qubits in and out of superposition. The conclusion of this is that even if you tried to model the entangled pair as a simple probability distribution similar to the coins, you will get the wrong answer in some experiments.

So we have both a theoretical argument and a practical argument for the difference between these two cases. They key take-away is the following:

According to quantum mechanics an entangled pair is in a state that is fundamentally indeterminate. When we describe it with probabilities, we are not saying “This probabilistic description is an account of my imperfect knowledge of the state of the system”. We’re saying that nature herself is undecided on what we will observe when we look at the state. (Side note: there is actually a way to describe epistemic uncertainty in quantum mechanics. It is called the density matrix, and is distinct from the description of superpositions.)

In addition, the most fundamental and accurate probability description for the state of the two particles is one that cannot be described as the product of two independent particles. This is not the case with the coins! The most fundamental and accurate probability description for the state of the two coins is either 100% HH or 100% TT (whichever turns out to be the case). What this means is that in the quantum case, not only is the state indeterminate, but the two particles are fundamentally interdependent – entangled. There is no independent description of the individual components of the system, there is only the system as a whole.

Ramanujan’s infinite radicals

Here’s one of my favorite puzzles.

The Problem

Solve the following equation:

Screen Shot 2018-07-28 at 7.00.13 AM

Background

This problem was initially posed by the most brilliant mathematician of the past century, an eccentric prodigy named Srinivasa Ramanujan. He submitted it to the Journal of Indian Mathematical Society without a solution, challenging his fellow mathematicians to solve it. Eventually, when nobody could do it, he gave in and revealed the answer along with a simple proof that probably put them all to shame.

Solution

Ok, so seriously, try this for yourself before reading on. Most people haven’t ever done anything with infinite nested square roots, so this is a good opportunity to be creative and come up with weird ways of solving the problem. Play around with it, give random things names, and see what you can do!

You’ll notice (as I did when I first tried working it out) that solving nested radicals can be really really hard. Things quickly get really ugly. Luckily, the answer is beautifully simple. It’s just 3!

Let’s see why. We start with a very simple identity:

Screen Shot 2018-07-28 at 7.20.54 AM

Taking the square root, we get

Screen Shot 2018-07-28 at 7.21.02 AM

If we just plug in a few integer values, we see the following:

Screen Shot 2018-07-28 at 7.16.28 AM.png

What happens if we plug in the value of 4 from the second line to the 4 in the first? We get

Screen Shot 2018-07-28 at 7.18.10 AM.png

Hm, already looking familiar! We’re not done yet – we now plug in the value of 5 from the third line to this new equation, obtaining:

Screen Shot 2018-07-28 at 7.19.37 AM.png

And once more:

Screen Shot 2018-07-28 at 7.22.31 AM.png

And etc off to infinity.

Screen Shot 2018-07-28 at 7.42.25 AM.png

And that’s the proof!

Generalizing

Now, Ramanujan actually proved a much more general result than this. I’ll prove a less general result than his, but still more general than what we just saw.

The generalization comes from noticing that our initial identity didn’t have to involve adding 1 in (x+1)2. We could have added any number to x, squared it, and then gotten a brand new general theorem about a whole class of continued root problems.

Thus we start with the identity

Screen Shot 2018-07-28 at 7.30.03 AM

or…

Screen Shot 2018-07-28 at 7.30.23 AM.png

Now we perform substitutions exactly as we did above:

screen-shot-2018-07-28-at-7-50-31-am.png

In the end, what we get is a solution for a whole class of initially very difficult seeming problems:

Screen Shot 2018-07-28 at 7.39.58 AM.png

And if we plug in m = 1 and n = 2, we get our initial problem!

Communication through entanglement

Is it possible to use quantum entanglement to communicate faster than light?

Here’s a suggestion for how we might achieve just such a thing. Two people each possess one qubit of an entangled pair in the state ⟩:

Screen Shot 2018-07-17 at 8.03.53 PM

The owner of the second qubit then decides whether or not to apply some quantum gate U to their qubit. Immediately following this, the owner of the first qubit measures their qubit.

If the application of U to the second qubit changes the amplitude distribution over the first qubit, then the measurement can be used to communicate a message between the two people instantaneously! Why? Well, initially 0 and 1 are expected with equal probability. But if applying U makes these probabilities unequal, then the observation of a 0 or 1 carries evidence as to whether or not U was applied. (In an extreme case, applying U could make it guaranteed that the qubit would be observed as |0⟩, in which case observation of |1⟩ ensures that U was not applied.)

In this way, somebody could send information across by encoding it in a string of decisions about whether or not to apply U to a shared entangled pair.

It might seem a little strange that doing something to your qubit over here could affect the state of their qubit over there. But this is quantum mechanics, and quantum mechanics is very, very strange. Remember that the two qubits are entangled with one another. We are already guaranteed that what happens to one can affect the state of the other instantaneously – after all, if we measure the second qubit and find it in the state |0⟩, the first qubit’s state is instantaneously “collapsed” into the state |0⟩ as well. (This fact alone cannot be used to communicate messages, because before measuring the second qubit, its owner has no control over which of the two states it will end up in.)

So whether or not our scheme will work cannot be ruled out a priori. We must work out the math for ourselves to see if applying U to qubit 2 can successfully warp the probability distribution of qubit 1, thus sending information between the two instantaneously.

First, we’ll describe our single qubit gate U as a matrix.

Screen Shot 2018-07-23 at 1.04.00 AM.png

a, b, c, and d are complex numbers. Can they have any possible values?

No. U must preserve the normalization of states it acts on. In other words, for U to represent a physically possible transformation of a qubit, it cannot transform physically possible states into physically impossible states.

What precise constraints does this entail? It turns out that the following two suffice to ensure the normalization condition:

Screen Shot 2018-07-23 at 1.06.35 AM.png

Alright. Now, we have our general description of a single-qubit gate. Of course, the qubit that U is operating on is a part of an entangled pair. It is an irreducible component of a two-qubit system. So we can’t actually describe it as just a single qubit.

Instead, we need to describe the state of both qubits, which, I’ll remind you, looks like:

Screen Shot 2018-07-23 at 1.09.48 AM.png

A 2×2 matrix can’t operate on a vector with four components. What we need is a two-qubit quantum gate that corresponds to applying U to qubit 2 while leaving qubit 1 alone. “Leaving a qubit alone” is equivalent to applying the identity gate I to it, which just leaves the state unchanged.

Screen Shot 2018-07-23 at 1.13.42 AM.png

So what we really want is the 4×4 matrix that corresponds to applying U to qubit 2 and I to qubit 1. It turns out that we can generate this matrix by simply taking the tensor product of U with I:

Screen Shot 2018-07-23 at 1.17.56 AM.png

Alright, now we’re ready to see what happens when we apply this gate to ⟩!

Screen Shot 2018-07-23 at 1.21.19 AM.png

This state sure looks different than the state we started with! But is it different enough to have carried some information between the qubits? Let’s now look at the probabilities for measuring qubit 1 in the states 0 and 1:

Screen Shot 2018-07-23 at 1.26.11 AM.png

Screen Shot 2018-07-23 at 1.29.32 AM.png

Screen Shot 2018-07-23 at 1.31.24 AM.png

Remember the constraints on the possible values of a, b, c, and d we started with?

Screen Shot 2018-07-23 at 1.32.56 AM.png

Which leads us to the final answer!

Screen Shot 2018-07-23 at 1.34.53 AM.png

Sadly, it looks like our method won’t work to produce faster-than-light communication. No matter what gate we apply to the second qubit, it has no effect on the observed probabilities of the first. And therefore, no information can be sent by applying U.

Of course, this does not rule out all possible ways to attempt to utilize entanglement to communicate faster-than-light. But it does provide a powerful demonstration of the way in which such attempts are defeated by the laws of quantum mechanics.

Quantum mechanics, reductionism, and irreducibility

Take a look at the following two qubit state:

Screen Shot 2018-07-17 at 8.03.53 PM

Is it possible to describe this two-qubit system in terms of the states of the two individual particles that compose it? The principle of reductionism suggests to us that yes it should be possible; after all, of course we can always take any larger system and describe it perfectly fine in terms of its components.

But this turns out to not be the case! The above state is a perfectly allowable physical configuration of two particles, but there is no accurate description of the state of the individual particles composing the system!

Multi-particle systems cannot in general be reduced to their parts. This is one of the shocking features of quantum mechanics that is extremely easy to prove, but is rarely emphasized in proportion to its importance. We’ll prove it now.

Suppose we have a system composed of two qubits in states |Ψ1⟩ and |Ψ2⟩. In general, we may write:

Screen Shot 2018-07-17 at 6.24.32 PM

Now, as we’ve seen in previous posts, we can describe the state of the two qubits as a whole by simply smushing them together as follows:

Screen Shot 2018-07-17 at 6.24.37 PM

So the set of all two-qubit states that can be split in their component parts is the set of all states arising from all possible values of α1, α2, β1, and β2 such that all states are normalized. I.e.

Screen Shot 2018-07-17 at 11.15.18 PM

However, there’s also a theorem that says that if any two states are physical possible, then all normalized linear combinations are physically possible as well. Because the states |00⟩, |01⟩, |10⟩, and |11⟩ are all physically possible, and because they form a basis for the set of two-qubit states, we can write out the set of all possible states:

Screen Shot 2018-07-17 at 6.50.09 PM.png

Now the philosophical question of whether or not there exist states that are irreducible can be formulated as a precise mathematical question: Does A = R?

And the answer is no! It turns out that A is much much larger than R.

The proof of this is very simple. R and A are both sets defined by a set of four complex numbers, and they share a constraint. But R also has two other constraints, independent of the shared constraint. That is, the two additional constraints cannot be derived from the first (try to derive it yourself! Or better, show that it cannot be derived). So the set of states that satisfy the conditions necessary to be in R must be smaller than the set of states that satisfy the conditions necessary to be in A. This is basically just the statement that when you take a set, and then impose a constraint on it, you get a smaller set.

An even simpler proof of the irreducibility of some states is to just give an example. Let’s return to our earlier example of a two-qubit state that cannot be decomposed into its parts:

Screen Shot 2018-07-17 at 8.03.53 PM

Suppose that ⟩ is reducible. Then for both |00⟩ and |11⟩ to have a nonzero amplitude, there must be a nonzero amplitude for the first qubit to be in the state |0⟩ and for the second to be in the state |1⟩. But then there can’t be zero amplitude for the state |01⟩. Q.E.D.!

More precisely:

Screen Shot 2018-07-20 at 12.16.01 AM.png

Screen Shot 2018-07-17 at 8.22.17 PM

So here we have a two-qubit state that is fundamentally irreducible. There is literally no possible description of the individual qubits on their own. We can go through all possible states that each qubit might be in, and rule them out one.

Let’s pause for a minute to reflect on how totally insane this is. It is a definitive proof that according to quantum mechanics, reality cannot necessarily be described in terms of its smallest components. This is a serious challenge to the idea of reductionism, and I’m still trying to figure out how to adjust my worldview in response. While the notion of reductionism as “higher-level laws can be derived as approximations of the laws of physics” isn’t challenged by this, the notion that “the whole is always reducible to its parts” has to go.

In fact, I’ll show in the next section that if you try to make predictions about an system but analyze it in terms of its smallest components, you will not in general get the right answer.

Predictive accuracy requires holism

So suppose that we have two qubits in the state we already introduced:

Screen Shot 2018-07-17 at 8.03.53 PM

You might think the following: “Look, the two qubits are either both in the state |0, or both in the state |1⟩. There’s a 50% chance of either one happening. Let’s suppose that we are only interested in the first qubit, and don’t care what happens with the second one. Can’t we just say that the first qubit is in a state with amplitudes 1/√2 in both states |0⟩ and |1⟩? After all, this will match the experimental results when we measure the qubit (50% of the time it is |0⟩ and 50% of the time it is |1⟩.”

Okay, but there are two big problems with this. First of all, while it’s true that each particle has a 50% chance of being observed in the state |0⟩, if you model these probabilities as independent of one another, then you will end up concluding that there is a 25% chance of the first particle being in the state |0⟩ and the second being in the state |1⟩. Whereas in fact, this will never happen!

You may reply that this is only a problem if you’re interested in making predictions about the state of the second qubit. If you are solely looking at your single qubit, you can still succeed at predicting what will happen when you measure it.

Well, fine. But the second, more important point is that even if you are able to accurately describe what happens when you measure your single qubit, you can always construct a different experiment you could perform that this same description will give the wrong answer for.

What this comes down to is the observation that quantum gates don’t operate the same way on 1/√2 (|00⟩ + |11⟩) as on 1/√2 (|0⟩ + |1⟩).

Suppose you take your qubit and pretend that the other one doesn’t exist. Then you apply a Hadamard gate to just your qubit and measure it. If you thought that the state was initially 1/√2 (|0⟩ + |1⟩), you will now think that your qubit is in the state |0⟩. You will predict with 100% confidence that if you measure it now, you will observe |0⟩.

But in fact when you measure it, you will find that 50% of the time it is |0⟩ and 50% of the time it is |1⟩! Where did you go wrong? You went wrong by trying to describe the particle as an individual entity.

Let’s prove this. First we’ll figure out what it looks like when we apply a Hadamard gate to only the first qubit, in the two-qubit representation:

Screen Shot 2018-07-17 at 10.02.27 PMScreen Shot 2018-07-17 at 10.13.22 PM

So we have a ​25% chance of observing each of |00⟩|10⟩|01⟩, and|11⟩. Looking at just your own qubit, then, you have a 50% chance of observing |0⟩ and a 50% chance of observing |1⟩.

While your single-qubit description told you to predict a 100% chance of observing |0⟩, you actually would get a 50% chance of |0⟩ and a 50% chance of |1⟩.

Okay, but maybe the problem was that we were just using the wrong amplitude distribution for our single qubit. There are many choices we could have made for the amplitudes besides 1/√2 that would have kept the probabilities 50/50. Maybe one of these correctly simulates the behavior of the qubit in response to a quantum gate?

But no. It turns out that even though it is correct that there is a 50/50 chance of observing the qubit to be |0⟩ or |1⟩, there is no amplitude distribution matching this probability distribution that will correctly predict the results of all possible experiments.

Quick proof: We can describe a general two-qubit state with a 50/50 probability of being observed in |0⟩ and |1⟩ as follows:

Screen Shot 2018-07-19 at 2.36.10 AM

For any ⟩, we can construct a specially designed quantum gate U that transforms ⟩ into |0⟩:

Screen Shot 2018-07-19 at 2.40.01 AMScreen Shot 2018-07-19 at 2.40.10 AM

Applying U to our single qubit, we now expect to observe |0⟩ with 100% probability. But now let’s look at what happens if we consider the state of the combined system. The operation of applying U to only the first qubit is represented by taking the tensor product of U with the identity matrix I: U ⊗ I.

Screen Shot 2018-07-19 at 2.48.42 AM

Screen Shot 2018-07-19 at 2.48.56 AM

Screen Shot 2018-07-19 at 7.41.44 AM

What we see is that the two-qubit state ends up with a 25% chance of being observed as each of |00⟩, |01⟩, |10⟩, and |11⟩. This means that there is still a 50% chance of the first qubit being observed as |0⟩ and |1⟩.

This means that for every possible single qubit description of the first qubit, we can construct an experiment that will give different results than the model predicts. And the only model that always gives the right experimental predictions is a model that considers the two qubits as a single unit, irreducible and impossible to describe independently. 

To recap: The lesson here is that for some quantum systems, if you describe them in terms of their parts instead of as a whole, you will necessarily make the wrong predictions about experimental results. And if you describe them as a whole, you will get the predictions spot on.

So how many states are irreducible?

Said another way, how much larger is A (the set of all states) than R (the set of reducible states)? Well, they’re both infinite sets with the same cardinality (they each have the cardinality of the continuum, |ℝ|). So in this sense, they’re the same size of infinity. But we can think about this by considering the dimensionality of these various spaces.

Let’s take another look at the definitions of A and R:

Screen Shot 2018-07-17 at 11.15.18 PM

Screen Shot 2018-07-17 at 6.50.09 PM.png

Each set is defined by four complex numbers, or 8 real numbers. If we ignored all constraints, then, our sets would be isomorphic to ℝ8.

Now, each share the same first constraint, which says that the overall state must be normalized. This constraint cuts one dimension off of the space of solutions, making it isomorphic to ℝ7.

That’s the only constraint for A, so we can say that A ~ ℝ7. But R involves two further constraints (the normalization conditions for each individual qubit). So we have three total constraints. However, it turns out that one of them can be derived from the others – two normalized qubits, when smushed together, always produce a normalized state. This gives us a net two constraints, meaning that the space of reducible states is isomorphic to ℝ6.

The space of irreducible states is what’s left when we subtract all elements of R from A. The dimensionality of this is just the same as the dimensionality of A. (A 3D volume minus a plane is still a 3D volume, a plane minus a curve is still two dimensional, a curve minus a point is still one dimensional, and so on.)

So both the space of total states and the space of irreducible states are 7-real-dimensional, while the space of reducible states is 6-real dimensional.

Screen Shot 2018-07-19 at 3.39.07 AM

You can visualize this as the space of all states being a volume, through which cuts a plane that composes all reducible states. The entire rest of the volume is the set of irreducible states. Clearly there are a lot more irreducible states than reducible states.

What about if we consider totally reducible three-qubit states? Now things are slightly different.

The set of all possible three qubit states (which we’ll denote A3) is a set of 8 complex numbers (16 real numbers) with one normalization constraint. So A3 ~ ℝ15.

The set of all totally reducible three qubit states (which we’ll denote R3) is a set of only six complex numbers. Why? Because we only need to specify two complex numbers for each of the three individual qubits that will be smushed together. So we start off with only 12 real numbers. Then we have three constraints, one for the normalization of each individual qubit. And the final normalization constraint (of the entire system) follows from the previous three constraints. In the end, we see that R3 ~ ℝ9.

Screen Shot 2018-07-19 at 3.40.17 AM

Now the space of reducible states is six-dimensions less than the space of all states.

How does this scale for larger quantum systems? Let’s look in general at a system of N qubits.

AN is a set of 2N complex amplitudes (2N+1 real numbers), one for each N qubit state. There is just one normalization constraint. Thus we have a space with 2N+1 – 1 real dimensions.

On the other hand, RN is a set of only 2N complex amplitudes (4N real numbers), two for each of the N individual qubits. And there are N independent constraints ensuring that all states are normalized. So we have:

screen-shot-2018-07-19-at-4-04-32-am-e1532019398268.png

The point of all of this is that as you consider larger and larger quantum systems, the dimensionality of the space of irreducible states grows exponentially, while the dimensionality of the space of reducible states only grows linearly. If we were to imagine randomly selecting a 20-qubit state from the space of all possibilities, we would be exponentially more likely to ending up with a space that cannot be described as a product of each of its parts.

What this means is that irreducibility is not a strange exotic phenomenon that we shouldn’t expect to see in the real world. Instead, we should expect that basically all systems we’re surrounded by are irreducible. And therefore, we should expect that the world as a whole is almost certainly not describable as the sum of individual parts.

Building the diffusion operator

Part of the quantum computing series
Part 1: Quantum Computing in a Nutshell
Part 2: More on quantum gates
Part 3: Deutch-Josza Algorithm
Part 4: Grover’s algorithm
Part 5: Building the quantum oracle

Let’s more precisely define the Grover diffusion operator D we used for Grover’s algorithm, and see why it functions to flip amplitudes over the average amplitude.

First off, here’s a useful bit of shorthand we’ll use throughout the post. We define the uniform superposition over states as |s⟩:

Screen Shot 2018-07-17 at 1.32.40 PM

We previously wrote that flipping an amplitude ax over the average of all amplitudes ā involved the transformation ax → 2ā – ax. This can be understood by a simple geometric argument:

37311753_10216943203002375_1653742952205254656_n.jpg

Now, the primary challenge is to figure out how to build a quantum gate that returns the average amplitude of a state. In other words, we want to find an operator A such that acting on a state ⟩ gives:

Screen Shot 2018-07-17 at 2.25.12 PM

If we can find this operator, then we can just define D as follows:

Screen Shot 2018-07-17 at 1.47.26 PM

It turns out that we can define A solely in terms of the uniform superposition.

Screen Shot 2018-07-17 at 2.00.08 PM

As a matrix, A would look like:

Screen Shot 2018-07-17 at 3.03.23 PM.png

Proof that this satisfies the definition:

Screen Shot 2018-07-17 at 2.41.44 PM.pngScreen Shot 2018-07-17 at 2.38.38 PM

Thus we have our full definition of D!

Screen Shot 2018-07-17 at 2.44.58 PM.png

As a matrix, D looks like:

Screen Shot 2018-07-17 at 3.05.26 PM

Building the quantum oracle

Part 1: Quantum Computing in a Nutshell
Part 2: More on quantum gates
Part 3: Deutch-Josza Algorithm
Part 4: Grover’s algorithm

The quantum gate Uf was featured centrally in both of the previous algorithms I presented. Remember what it does to a qubit in state |x⟩, where x ∈ {0, 1}N:

Screen Shot 2018-07-16 at 11.48.24 PM

I want to show here that this gate can be constructed from a simpler more intuitive version of a quantum oracle. This will also be good practice for getting a deeper intuition about how quantum gates work.

This will take three steps.

1. Addition Modulo 2

First we need to be able implement addition modulo 2 of two single qubits. This operation is defined as follows:

Screen Shot 2018-07-17 at 11.39.33 AM

An implementation of this operation as a quantum gate needs to return two qubits instead of just one. A simple choice might be:

Screen Shot 2018-07-17 at 11.56.51 AM

Screen Shot 2018-07-17 at 11.54.16 AM

37303465_10216943141120828_307666649354338304_n

2. Oracle

Next we’ll need a straight-forward implementation of the oracle for our function f as a quantum gate. Remember that f is a function from {0, 1}N → {0, 1}. Quantum gates must have the same number of inputs and outputs, and f takes in N bits and returns only a single bit, so we have to improvise a little. A simple implementation is the following:

37276624_10216943141000825_5593061577134702592_n
Screen Shot 2018-07-17 at 12.12.26 PM

In other words, we start with N qubits encoding the input to x, as well as a “blank” qubit that starts as |0⟩. Then we leave the first N qubits unchanged, and encode the value of f(x) in the initially blank qubit.

3. Flipping signs

Finally, we’ll use a clever trick. Let’s take a second look at the ⊕ gate.

37303465_10216943141120828_307666649354338304_n

Suppose we start with

Screen Shot 2018-07-17 at 12.36.10 PM

Then we get:

screen-shot-2018-07-17-at-12-36-10-pm.png

Let’s consider both cases, f(x) = 0 and f(x) = 1.

Screen Shot 2018-07-17 at 12.44.59 PM

Also we can notice that we can get the state |y⟩ by applying a Hadamard gate to a qubit in the state |1⟩. Thus we can draw:

37293845_10216943141440836_1386368915768082432_n

Putting it all together

We combine everything we learned so far in the following way:

37239768_10216943140920823_1403664512845873152_n

Screen Shot 2018-07-17 at 12.56.21 PM.png

If we now ignore the last two qubits, as they were only really of interest to us for the purposes of building U, we get:

Screen Shot 2018-07-17 at 12.57.30 PM

And there we have it! We have built the quantum gate Uf that we used in the last two posts.

37321311_10216943141520838_8488741241201098752_n

 

Grover’s algorithm

This is part 4 of my series on quantum computing. The earlier posts provide necessary background material.
Part 1: Quantum Computing in a Nutshell
Part 2: More on quantum gates
Part 3: Deutch-Josza Algorithm

Grover’s algorithm

This algorithm involves searching through an unsorted list for a particular item. Let’s first look at how this can be optimally solved classically.

Suppose you have private access to the following list:

  1. apple
  2. banana
  3. grapefruit
  4. kiwi
  5. guava
  6. mango
  7. lemon
  8. papaya

A friend of yours wants to know where on the list they could find guava. They are allowed to ask you questions like “Is the first item on the list guava?” and “Is the nth item on the list gauva?”, for any n they choose. Importantly, they have no information about the ordering of your list.

How many queries do they need to perform in order to answer this question?

Well, since they have no information about its placement, they can do no better than querying random items in the list and checking them off one by one. In the best case, they find it on their first query. And in the worst case, they find it on their last query. Thus, if the list is of length N, the number of queries in the average case is N/2.

Grover’s algorithm solves the same problem with roughly √N queries. This is only a quadratic speedup, but still should seem totally impossible. What this means is that in a list of 1,000,000 items, you can find any item of your choice with only about 1,000 queries.

Let’s see how.

Firstly, we can think about the search problem as a function-analysis problem. We’re not really interested in what the items in the list are, just whether or not the item is guava. So we can transform our list of items into a simple binary function: 0 if the input is not the index of the item ‘guava’ and 1 if the input is the index of the item ‘guava’. Now our list looks like:

Screen Shot 2018-07-17 at 12.40.53 AM

Our challenge is now to find out for which value of x f(x) returns 1.

Now, this algorithm uses three quantum gates. Two of them we’ve already seen: HN and Uf. I’ll remind you what these two do:

Screen Shot 2018-07-16 at 11.48.15 PM
Screen Shot 2018-07-16 at 11.48.24 PM

The third is called the Grover diffusion operator, D. In words, what D does to a state Ψ is reflect it over the average amplitude in Ψ. Visually, this looks like:

37323292_10216937657223734_1090394773711224832_n

Mathematically, this transformation can be defined as follows:

Screen Shot 2018-07-17 at 12.24.52 AM

Check for yourself that 2ā – ax flips the amplitude ax over ā. We are guaranteed that D is a valid quantum gate because it keeps the state normalized (the average amplitude remains the same after the flip).

Now, with HNU, and D in hand, we are ready to present the algorithm:

37282180_10216937717785248_5535871703881613312_n.jpg

We start with all N qubits in the state |00…0⟩, and apply the Hadamard gate to put them in a uniform superposition. Then we apply U to flip the amplitude of the desired input, and D to flip the whole superposition over the average. We repeat this step square root of the length of the list times, and then measure the qubits.

Here’s a visual intuition for why this works. Let’s suppose that N=3 (so our list contains 8 items), and the item we’re looking for is the 5th in the list (100 in binary). Here is each step in the algorithm:

37279849_10216937747705996_3262919850573430784_n

You can see that what the combination of U and D does to the state is that it magnifies the amplitude of the desired value of f. As you do this again and again, the amplitude is repeatedly magnified, reaching a maximum after sqrt(2N) repeats.

And that’s Grover’s algorithm! Once again we see that the key to it is the unusual phenomenon of superposition. By putting the state into superposition in our first step, we manage to make each individual query give us information about all possible inputs. And through a little cleverness, we are able to massage the amplitudes of our qubits in order that our desired output becomes overwhelmingly likely.

A difference between Grover’s algorithm and the Deutsch-Josza algorithm is that Grover’s algorithm is non-deterministic. It isn’t guaranteed with 100% probability of giving the right answer. But each run of the algorithm only delivers an incorrect answer with probability 1/2N, which is an exponentially small error. Repeating the algorithm n times gives you the wrong answer with only (1/2N)2 probability. And so on.

While it is a more technically complicated algorithm than the Deutsch-Josza algorithm, I find Grover’s algorithm easier to intuitively understand. Let me know in the comments if anything remains unclear!

Deutch-Josza Algorithm

Part 3 of the series on quantum computing.
Part 1: Quantum Computing in a Nutshell
Part 2: More on quantum gates

Okay, now that we’re familiar with qubits and some basic quantum gates, we’re ready to jump into quantum algorithms. First will be the Deutch-Josza algorithm, famous for its exponential speedup.

The problem to solve: You are handed a function f from {0,1}N to {0,1}, and guaranteed that it is either balanced or constant. A balanced function outputs an equal number of 0s and 1s across all inputs, and a constant function outputs all 0s or all 1s. For instance, if N = 3:

Screen Shot 2018-07-16 at 8.23.43 PM

f1 is balanced, and f2 is constant.

You are given an oracle that allows you to query f on an input of your choice. Your goal is to find out if f is balanced or constant with as few queries to the oracle as possible.

Let’s first figure out the number of queries classically required to solve this problem.

If f is balanced, then the minimum amount of queries required to determine that it is not constant is two – a 0 and a 1. And the worst case is that f is constant. In this case, we can only be sure it is not balanced by searching through half of all possible inputs, plus one. (For instance, we might get the series of outputs 0, 0, 0, 0, 0.)

So, since the number of possible inputs is 2N, the number of queries is between 2 and 2N-1 + 1. The average case thus involves roughly 2N-2 queries.

How about if we try to go about solving this problem with a quantum computers? How many queries to the oracle need we perform in order to determine if the function is balanced or constant?

Just one!

It turns out that there exists an algorithm that uses the oracle a single time, and always determines with 100% confidence whether f is balanced or constant.

Let’s prove it.

The algorithm uses only two different gates: the Hadamard gate H, and the quantum version of the oracle gate.

We’ll call the oracle gate Uf. Acting on the qubit state |x⟩, it returns (-1)f(x)|x⟩. That is, it flips the sign of the state if and only if evaluating f on the input x outputs 1. Otherwise it leaves the state of the qubits unchanged. So we have:

Screen Shot 2018-07-16 at 9.44.14 PM

In addition, we will use the general N-qubit Hadamard gate HN. We discussed how to get the matrix for the 2-qubit Hadamard gate last time. The general form of the Hadamard gate is:

Screen Shot 2018-07-16 at 9.59.39 PM.png

(You can prove this by showing it’s true for N=1, and then showing that if it’s true for N, then it’s true for N+1.)

Without further ado, here is the Deutch-Josza algorithm:

37229535_10216937150091056_3490694902721806336_n

In words, you start with all N qubits in the state |00…0⟩. Then you apply HN to all of them. Then you apply Uf, and finally HN again.

Upon measurement, if you find the qubits in the state |00…0⟩, then f is a constant function. If you find them in any other state, then f is guaranteed to be a balanced function.

Why is this so? Let’s prove it by tracking the states through each step.

At first, the state is |00…0⟩. After applying the Hadamard gate, we get:

Screen Shot 2018-07-16 at 10.36.43 PM

In words, the qubit enters a uniform superposition over all possible states.

Now, applying Uf to the new state, we get:

Screen Shot 2018-07-16 at 10.34.49 PM

And finally, applying the Hadamard gate to this state, we get:

Screen Shot 2018-07-16 at 10.35.18 PM

For the purposes of this algorithm, we’re really only interested in the amplitude of the |00…0⟩ state. This amplitude is simply:

Screen Shot 2018-07-16 at 10.46.04 PM

What we want to show is that the square of this amplitude is 0 if f is balanced and 1 if f is constant. (I.e. the probability of observing |00…0⟩ is 0% for a balanced function and 100% for a constant function). Let’s look at both cases in turn.

Case 1: f is constant

If f is constant, than all the terms in the sum are the same: either 1/2N or -1/2N. Since there are 2N values for y to be summed over, the amplitude is either +1 or -1. And since the probability is the amplitude squared, this probability is guaranteed to be 100%!

Case 2: f is balanced

If f is balanced, then there are an equal number of 1/2N terms and -1/2N terms. I.e. every 1/2N term cancels out a -1/2N term. Thus, the amplitude is zero, and so is the probability!

In conclusion, we’ve now seen how a quantum computer can take a problem that can only be classically solved with an average of 2N-2 queries, and solves it with exactly 1 query. I.e. quantum algorithms can provide exponential speedups over classical algorithms!

Looking carefully at the inner workings of this algorithm, you can get a clue as to how it achieves this speedup. The key is in the ability to superpose a single qubit into multiple observable states. This allowed us to apply our oracle operator to all possible inputs at once, in exactly one step.

Somebody that believed in the many-worlds interpretation of quantum mechanics would say that when we put our qubit into superposition, we were creating 2N worlds, each of which contained a different possible input value. Then we leveraged this exponential number of worlds for our computational benefit, applying the operation in all the worlds at once before collapsing the superposition back to one single world with the final Hadamard gate.

It’s worth making a historical note here. David Deutsch was one of the two people that discovered this algorithm, which was the first example where a quantum computer would be able to surpass the ordinary limits on classical computers. He is also a hardcore proponent of the many-worlds interpretation, and has stated that the fact that quantum computers can do more than classical computers is strong evidence for this interpretation. His argument is that the only possible explanation for quantum computers surpassing the classical limits is that some of the computation is being offloaded into other universes.

Anyway, next we’ll look at a more famous algorithm, Grover’s algorithm. This one only provides a quadratic (not exponential) speed up. However, it solves a more useful problem: how to search through an unsorted list to find a particular item. See you next time!

More on quantum gates

Context for this post.

In the last post, we described what qubits are and how quantum computing involves the manipulation of these qubits to perform useful calculations.

In this post, we’ll abstract away from the details of the physics of qubits and just call the two observable states |0⟩ and |1⟩, rather than |ON⟩ and |OFF⟩. This will be useful for ultimately describing quantum algorithms. But before we get there, we need to take a few more steps into the details of quantum gates.

Recap: the general description of a qubit is |Ψ⟩ = α|0⟩ + β|1⟩, where α and β are called amplitudes, and |α|² and |β|² are the probabilities of observing the system in the state |0⟩ and |1⟩, respectively.

We can also express the states of qubits as vectors, like so:

Screen Shot 2018-07-15 at 12.03.18 AM.png

Quantum gates are transformations from quantum states to other quantum states. We can express these transformations as matrices, which when applied to state vectors yield new state vectors. Here’s a simple example of a quantum gate called the X gate:

Screen Shot 2018-07-14 at 11.50.10 PM

Applied to the states |0⟩ and |1⟩, this gate yields

Screen Shot 2018-07-15 at 12.05.21 AM.png

Applied to any general state, this gate yields:

Screen Shot 2018-07-15 at 12.05.12 AM.png

Another gate that is used all the time is the Hadamard gate, or H gate.

Screen Shot 2018-07-15 at 12.08.14 AM

Let’s see what it does to the |0⟩ and |1⟩ states:

Screen Shot 2018-07-15 at 12.12.08 AM.png

In words, H puts ordinary states into superposition. Superposition is the key to quantum computing. Without it, all we have is a fancier way of talking about classical computing. So it should make sense that H is a very useful gate.

One more note on H: When you apply it to a state twice, you get back the state you started with. A simple proof of this comes by just multiplying the H matrix by itself:

Screen Shot 2018-07-15 at 12.48.13 AM.png

Okay, enough with single qubits. While they’re pretty cool as far as they go, any non-trivial quantum algorithm is going to involve multiple qubits.

It turns out that everything we’ve said so far generalizes quite nicely. If we have two qubits, we describe the combined system by smushing them together with what’s called a tensor product (denoted ⊗). What this ends up looking like is the following:

Screen Shot 2018-07-15 at 12.53.42 AM.png

The first number refers to the state of the first qubit, and the second refers to the state of the second.

Let’s smush together two arbitrary qubits:

Screen Shot 2018-07-15 at 1.00.08 AM

This is pretty much exactly what we should have expected combining qubit states would look like.

The amplitude for the combined state to be |00⟩ is just the product of the amplitude for the first qubit to be |0⟩ and the second to be |0⟩. The amplitude for the combined state to be |01⟩ is just the product of the amplitude for the first qubit to be |0⟩ and second to be |1⟩. And so on.

We can write a general two qubit state as a vector with four components.

Screen Shot 2018-07-15 at 1.05.34 AM

And as you might expect by now, two-qubit gates are simply 4 by 4 matrices that act on such vectors to produce new vectors. For instance, we can calculate the 4×4 matrix corresponding to the action of a Hadamard gate on both qubits:

Screen Shot 2018-07-15 at 1.21.28 AM.png

Why the two-qubit Hadamard gate has this exact form is a little beyond the scope of this post. Suffice it to say that this is the 4×4 matrix that successfully transforms two qubits as if they had each been put through a single-qubit Hadamard gate. (You can verify this for yourself by simply applying H to each qubit individually and then smushing them together in the way we described above.)

Here’s what the two-qubit Hadamard gate does to the four basic two-qubit states.

Screen Shot 2018-07-15 at 1.25.51 AM.png

Here’s a visual representation of this transformation using bar graphs:

37160831_10216929417257740_3127265576272003072_n-1.jpg

We can easily extend this further to three, four, or more qubits. The state vector describing a N-qubit system must consider the amplitude for all possible combinations of 0s and 1s for each qubit. There are 2ᴺ such combinations (starting at 00…0 and ending at 11…1). So the vector describing an N-qubit system is composed of 2ᴺ complex numbers.

If you’ve followed everything so far, then we are now ready to move on to some actual quantum algorithms! In the next post, we’ll see first how qubits can be used to solve problems that classical bits cannot, and then why quantum computers have this enhanced problem-solving ability.