A Cognitive Instability Puzzle, Part 2

This is a follow-up to this previous post, in which I present three unusual cases of belief updating. Read that one before you read this.

I find these cases very puzzling, and I don’t have a definite conclusion for any of them. They share some deep similarities. Let’s break all of them down into their basic logical structure:

Joe
Joe initially believes in classical logic and is certain of some other stuff, call it X.
An argument A exists that concludes that X can’t be true if classical logic is true.
If Joe believes classical logic, then he believes A.
If Joe believes intuitionist logic, then he doesn’t believe A.

Karl
Karl initially believes in God and is certain of some other stuff about evil, call it E.
An argument A exists that concludes that God can’t exist if E is true.
If Karl believes in God, then he believes A.
If Karl doesn’t believe in God, then he doesn’t believe A.

Tommy
Tommy initially believes in her brain’s reliability and is certain of some other stuff about her experiences, call it Q.
An argument A exists that concludes that her brain can’t be reliable if Q is true.
If Tommy believes in her brain’s reliability, then she believes A.
If Tommy doesn’t believe in her brain’s reliability, then she doesn’t believe A.

First of all, note that all three of these cases are ones in which Bayesian reasoning won’t work. Joe is uncertain about the law of the excluded middle, without which you don’t have probability theory. Karl is uncertain about the meaning of the term ‘evil’, such that the same proposition switches from being truth-apt to being meaningless when he updates his beliefs. Probability theory doesn’t accommodate such variability in its language. And Tommy is entertaining a hypothesis according to which she no longer accepts any deductive or inductive logic, which is inconsistent with Bayesianism in an even more extreme way than Joe’s.

The more important general theme is that in all three cases, the following two things are true: 1) If an agent believes A, then they also believe an argument that concludes -A. 2) If that agent believes -A, then they don’t believe the argument that concludes -A.

Notice that if an agent initially doesn’t believe A, then they have no problem. They believe -A, and also happen to not believe that specific argument concluding -A, and that’s fine! There’s no instability or self-contradiction there whatsoever. So that’s really not where the issue lies.

The mystery is the following: If the only reason that an agent changed their mind from A to -A is the argument that they no longer buy, then what should they do? Once they’ve adopted the stance that A is false, should they stay there, reasoning that if they accept A they will be led to a contradiction? Or should they jump back to A, reasoning that the initial argument that led them there was flawed?

Said another way, should they evaluate the argument against A from their own standards, or from A’s standards? If they use their own standards, then they are in an unstable position, where they jump back and forth between A and -A. And if they always use A’s standards… well, then we get the conclusion that Tommy should believe herself to be a Boltzmann brain. In addition, if they are asked why they don’t believe A, then they find themselves in the weird position of giving an explanation in terms of an argument that they believe to be false!

I find myself believing that either Joe should be an intuitionist, Karl an atheist, and Tommy a radical skeptic, OR Joe a classical logician, Karl a theist, and Tommy a believer in her brain’s reliability. That is, there don’t seem to be any disanalogies between these three cases significant enough to warrant concluding one thing in one case and then going the other direction in another.

Logic, Theism, and Boltzmann Brains: On Cognitively Unstable Beliefs

First case

Propositional logic accepts that the proposition A ∨ -A is necessarily true. This is called the law of the excluded middle. Intuitionist logic differs in that it denies this axiom.

Suppose that Joe is a believer in propositional logic (but also reserves some credence for intuitionist logic). Joe also believes a set of other propositions, whose conjunction we’ll call X, and has total certainty in X.

One day Joe discovers that a contradiction can be derived from X, in a proof that uses the law of the excluded middle. Since Joe is certain that X is true, he knows that X isn’t the problem, and instead it must be the law of the excluded middle. So Joe rejects the law of the excluded middle and becomes an intuitionist.

The problem is, as an intuitionist, Joe now no longer accepts the validity of the argument that derives a contradiction from X! Why? Because it uses the law of the excluded middle, which he doesn’t accept.

Should Joe believe in propositional logic or intuitionism?

Second case

Karl is a theist. He isn’t absolutely certain that theism is correct, but holds a majority of his credence in theism (and the rest in atheism). Karl is also 100% certain in the following claim: “If atheism is true, then the concept of ‘evil’ is meaningless”, and believes that logically valid arguments cannot be made using meaningless concepts.

One day somebody presents the problem of evil to Karl, and he sees it as a crushing objection to theism. He realizes that theism, plus some other beliefs about evil that he’s 100% confident in, leads to a contradiction. So since he can’t deny these other beliefs, he is led to atheism.

The problem is, as an atheist, Karl no longer accepts the validity of the argument that starts at theism and concludes atheism! Why? Because the argument relies on the concept of ‘evil’, and he is now certain that this concept is meaningless, and thus cannot be used in logically valid arguments.

Should Karl be a theist or an atheist?

Third case

Tommy is a scientist, and she believes that her brain is reliable. By this, I mean that she trusts her ability to reason both deductively and inductively. However, she isn’t totally certain about this, and holds out a little credence for radical skepticism. She is also totally certain about the content of her experiences, though not their interpretation (i.e. if she sees red, she is 100% confident that she is experiencing red, although she isn’t necessarily certain about what in the external world is causing the experience).

One day Tommy discovers that reasoning deductively and inductively from her experiences leads her to a model of the world that entails that her brain is actually a quantum fluctuation blipping into existence outside the event horizon of a black hole. She realizes that this means that with overwhelmingly high probability, her brain is not reliable and is just producing random noise uncorrelated with reality.

The problem is, if Tommy believes that her brain is not reliable, then she can no longer accept the validity of the argument that led her to this position! Why? Well, she no longer trusts her ability to reason deductively or inductively. So she can’t accept any argument, let alone this particular one.

What should Tommy believe?

— — —

How are these three cases similar and different? If you think that Joe should be an intuitionist, or Karl an atheist, then should Tommy believe herself to be a Boltzmann brain? Because it turns out that many cosmologists have found themselves to be in a situation analogous to Case 3! (Link.) I have my own thoughts on this, but I won’t share them for now.

Philosophers of religion are religious. Why?

In 2009, David Chalmers organized a massive survey of over 3000 professional philosophers, grad students, and undergrads, asking them questions about all things philosophical and compiling the results. The results are broken down by area of specialization, age, race, gender, and everything else you might be interested in.

Here’s a link to the paper, and here to a listing of all survey results.

This is basically my favorite philosophy paper to read, and I find myself going back to look at the results all the time. I’d love to see an updated version of this survey, done ten years later, to see how things have changed (if at all).

There’s a whole lot I could talk about regarding this paper, but today I’ll just focus on one really striking result. Take a look at the following table from the paper:

[Table from the paper: the survey questions answered most differently by specialists and non-specialists]

What’s shown are the questions which were answered most differently by specialists and non-specialists. At the very top of the list, with a discrepancy more than double the second highest, is the question of God’s existence: 86.78% of non-specialists said yes to atheism, while only 20.87% of philosophers of religion did. This is fascinating to me.

Here are two narratives one could construct to make sense of these results.

Narrative One

Philosophers who specialize in philosophy of religion probably select that specialization because they have a religious bias. A philosophically-minded devout Catholic is much more likely to go into philosophy of religion than, say, philosophy of language. And an atheistic philosopher would have less interest than a religious philosopher in studying philosophy of religion, given that they don’t even believe in the existence of its primary object of study. So the result of the survey is exactly what you’d expect from the selection bias inherent in the specialization.

Narrative Two

Philosophers, like everybody else, are vulnerable to a presumption in favor of the beliefs of their society. Academics in general are quite secular, and in many quarters religion is treated as a product of a bygone age. So it’s only natural that philosophers that haven’t looked too deeply into the issue come out believing basically what the high-status individuals in their social class believe. But philosophers of religion, on the other hand, are those that have actually looked most closely and carefully at the arguments for and against atheism, and this gives them the ability to transcend their cultural bias and recognize the truth of religion.

As an atheist, it’s perhaps not surprising that my immediate reaction to seeing this result was something like Narrative One. And upon reflection, that still seems like the more likely explanation to me. But to a religious person, I’m sure that Narrative Two would seem like the obvious explanation. This, by the way, is what should happen from a Bayesian perspective. If two theories equally well explain some data, then the one with a higher prior should receive a larger credence bump than the one with a lower prior (although their odds ratio should stay fixed).
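That Bayesian point is easy to see with a toy calculation. All the numbers below are invented purely for illustration (nothing here comes from the survey):

```python
# Two hypotheses that explain the data equally well (same likelihood),
# plus a catch-all hypothesis that explains it poorly.
priors = {"narrative_one": 0.5, "narrative_two": 0.1, "other": 0.4}
likelihoods = {"narrative_one": 0.9, "narrative_two": 0.9, "other": 0.1}

evidence = sum(priors[h] * likelihoods[h] for h in priors)
posteriors = {h: priors[h] * likelihoods[h] / evidence for h in priors}

# The odds ratio between the two equally-explanatory hypotheses is unchanged:
print(round(priors["narrative_one"] / priors["narrative_two"], 6))          # 5.0
print(round(posteriors["narrative_one"] / posteriors["narrative_two"], 6))  # 5.0

# ...but the higher-prior hypothesis gets the bigger absolute credence bump:
print(round(posteriors["narrative_one"] - priors["narrative_one"], 3))  # 0.276
print(round(posteriors["narrative_two"] - priors["narrative_two"], 3))  # 0.055
```

Both hypotheses get multiplied by the same factor, so their ratio is preserved while the one that started higher gains more in absolute terms.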

Ultimately, which of these stories is right? I don’t know. Perhaps both are right to some degree. But I think that it illustrates the difficulty of adjudicating expertise questions. Accusations of bias are quite easy to make, and can be hard to actually get to the bottom of. That said, it’s definitely possible to evaluate the first narrative, just by empirically looking at the reasons that philosophers of religion entered the field. If somebody knows of such a study, comment it or send me a message please! The results of a study like this could end up having a huge effect on my attitude towards questions of religion’s rationality.

Imagine that it turned out that most philosophers of religion were atheists when they entered the field, and only became religious after diving deep into the arguments. This is not what I’d expect to find, but if it was the case, it would serve as a super powerful argument against atheism for me.

Why do prediction markets work?

Is there a paradox in the continued existence of prediction markets? Recently I’ve been wondering this. Let me start with a little background for those that are unfamiliar with the concept of prediction markets.

Prediction markets are markets that allow you to bet on the outcomes of real-life events. This gives financial incentives to predict accurately, and as such the market price of a given bet reflects a kind of aggregate credence for that event occurring. There’s a whole bunch of results, theoretical and applied, that indicate that prediction markets serve to give robustly accurate probability estimates for real-world events.

Here’s a great paper by Robin Hanson about a political system based on prediction markets, named futarchy. Essentially, the idea is that voters determine a nation’s values, so as to generate some average national welfare metric, and then betting markets are used to decide policy. Some quotes:

On info-failures as a primary problem for democracy

According to many experts in economics and development, governments often choose policies that are “inefficient” in the sense that most everyone could expect to gain from other feasible policies. Many other kinds of experts also see existing policies as often clearly inferior to known alternatives.

If inferior policies would not have been adopted had most everyone known they are inferior, and if someone somewhere knew or could have learned that they are inferior, then we can blame inferior policies on a failure of our “info” institutions. By “info” here I just mean clues and analysis that should change our beliefs. Our info institutions are those within which we induce, express, and evaluate the acquiring and sharing of info. They include public relations teams, organized interest groups, news media, conversation forums, think tanks, universities, journals, elite committees, and state agencies. Inferior policies happen because our info institutions fail to induce people to acquire and share relevant info with properly-motivated decision makers.

[…]

Where might we find better info institutions? According to most experts in economics and finance, speculative markets are exemplary info institutions. That is, active speculative markets do very well at inducing people to acquire info, share it via trades, and collect that info into consensus prices that persuade wider audiences. This great success suggests that we should consider augmenting our political info institutions with speculative market institutions. That is, perhaps we should encourage people to create, trade in, and heed policy-relevant speculative markets, instead of discouraging such markets as we do today via anti-gambling laws.

Laying out the proposal

In futarchy, democracy would continue to say what we want, but betting markets would now say how to get it. That is, elected representatives would formally define and manage an after-the-fact measurement of national welfare, while market speculators would say which policies they expect to raise national welfare. The basic rule of government would be:

    • When a betting market clearly estimates that a proposed policy would increase expected national welfare, that proposal becomes law.

Futarchy is intended to be ideologically neutral; it could result in anything from an extreme socialism to an extreme minarchy, depending on what voters say they want, and on what speculators think would get it for them.

Futarchy seems promising if we accept the following three assumptions:

  • Democracies fail largely by not aggregating available information.
  • It is not that hard to tell rich happy nations from poor miserable ones.
  • Betting markets are our best known institution for aggregating information.

On the success of prediction markets

Betting markets, and speculative markets more generally, seem to do very well at aggregating information. To have a say in a speculative market, you have to “put your money where your mouth is.” Those who know they are not relevant experts shut up, and those who do not know this eventually lose their money, and then shut up. Speculative markets in essence offer to pay anyone who sees a bias in current market prices to come and correct that bias.

Speculative market estimates are not perfect. There seems to be a long-shot bias when there are high transaction costs, and perhaps also excess volatility in long term aggregate price movements. But such markets seem to do very well when compared to other institutions. For example, racetrack market odds improve on the predictions of racetrack experts, Florida orange juice commodity futures improve on government weather forecasts, betting markets beat opinion polls at predicting U.S. election results, and betting markets consistently beat Hewlett Packard official forecasts at predicting Hewlett Packard printer sales. In general, it is hard to find information that is not embodied in market prices.

On the possibility of manipulation of prediction markets

We want policy-related info institutions to resist manipulation, that is, to resist attempts to influence policy via distorted participation. Speculative markets do well here because they deal well with “noise trading,” that is, trading for reasons other than info about common asset values. When other traders can’t predict noise trading exactly, they compensate for its expected average by an opposite average trade, and compensate for its expected variation by trading more, and by working harder to find relevant info. Theory says that if trader risk-aversion is mild, and if more effort gives more info, then increased noise trading increases price accuracy. And in fact, the most accurate real speculative markets tend to be those with the most noise trading.

What do noise traders have to do with manipulators? Manipulators, who trade hoping to distort prices, are noise traders, since they trade for reasons other than asset value info. Thus adding manipulators to speculative markets doesn’t reduce average price accuracy. This has been verified in theory, in laboratory experiments, and in the field.

Futarchy remains for me one of the coolest and most exciting ideas I’ve heard in political philosophy, and prediction markets fascinate me. But for today, I have the following question about their feasibility:

If the only individuals that are able to consistently profit off the prediction market are the best predictors, then why wouldn’t the bottom 50% of predictors continuously drop out as they lose money on the market? If so, then as the population of market participants dwindles you would end up with a small fraction of really good predictors, each of whom sometimes gets lucky and makes money and sometimes is unlucky and loses some. On average, these people won’t be able to make money any more (as the ability to make money relies on the participation of inferior predictors in the market), so they’ll drop out as well.

If this line of reasoning is right, then it seems like prediction markets should inevitably collapse as their user base drops out. Why, then, do sites like PredictIt keep functioning?
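Here’s a toy simulation of that attrition argument. It’s entirely my own construction, and the parameters (skill distribution, bankrolls, and the mapping from relative skill to betting edge) are all assumptions made up for illustration:

```python
import random

random.seed(0)

# 100 traders with random skill levels and a starting bankroll of 20 units.
traders = [{"skill": random.random(), "bankroll": 20.0} for _ in range(100)]

def active(traders):
    return [t for t in traders if t["bankroll"] > 0]

for _ in range(500):
    pool = active(traders)
    if len(pool) < 2:
        break
    mean_skill = sum(t["skill"] for t in pool) / len(pool)
    for t in pool:
        # A trader's chance of winning a unit bet grows with their skill
        # relative to the remaining pool. As weak traders go broke and
        # exit, the bar for having an edge rises.
        p_win = 0.5 + 0.25 * (t["skill"] - mean_skill)
        t["bankroll"] += 1.0 if random.random() < p_win else -1.0

print(len(active(traders)))  # far fewer than the original 100
```

The mechanism the argument points at shows up directly: the pool’s mean skill ratchets upward as losers drop out, so traders who started out average find themselves below average, and the attrition continues up the skill ladder.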

One possibility is that there’s something wrong with the argument. This is honestly where most of my credence lies; tons of smart people endorse prediction markets, and this seems like too central and obvious a flaw for them all to have missed. If the argument isn’t wrong, though, then we have an interesting phenomenon to explain.

One explanation that came to my mind is that the continued survival of prediction markets is only possible because of a bug in human psychology, namely, a lack of epistemic humility. People are on average overly confident in their beliefs, and so uninformed people will continue confidently betting on propositions, even when they are generally betting against individuals with greater expertise.

Is this really what’s going on? I’m not sure. I would be surprised if humans were actually overconfident enough to continue betting on a market that they are consistently losing money on. Maybe they’d find some way to rationalize dropping out of the market that doesn’t amount to admitting “My opinion is not worth as much as I thought it was”, but surely they would eventually stop betting after enough losses (putting aside whatever impulses drive people to gamble on guaranteed negative-expected-value games until they lose all their money). On the other hand, it could be that the traffic of less-informed individuals does not consist of the same individuals betting over and over, but rather of a constant crowd of new sheep coming in to be exploited by the more knowledgeable. What do you think? How do you explain this?

A Talmudic Probability Puzzle

Today we’ll take a little break from the more intense abstract math stuff I’ve been doing, and do a quick dive into a fun probabilistic puzzle I found on the internet.

Background for the puzzle: In Ancient Israel, there was a court of 23 wise men that tried important cases, known as the Sanhedrin. If you were being tried for a crime by the Sanhedrin and a majority of them found you guilty, you were convicted. But there was an interesting twist on this! According to the Talmud (Tractate Sanhedrin: Folio 17a), if the Sanhedrin unanimously found you guilty, you were to be acquitted.

If the Sanhedrin unanimously find [the accused] guilty, he is acquitted. Why? — Because we have learned by tradition that sentence must be postponed till the morrow in hope of finding new points in favour of the defence. But this cannot be anticipated in this case.

Putting aside the dubious logic of this rule, it gives rise to an interesting probability puzzle with a counterintuitive answer. Imagine that an accused murderer has been brought before the Sanhedrin, and that the evidence is strong enough that no judge has any doubt in their mind about his guilt. Each judge obviously wants the murderer to be convicted, and would ordinarily vote to convict. But under this Talmudic rule, they need to worry about the prospect of all of them voting guilty and thereby letting him off scot-free!

So: if each judge independently votes to convict with probability p and to acquit with probability 1 – p, which p will give them the highest probability of ultimately convicting the guilty man?

Furthermore, imagine that the number of judges is not 23, but some arbitrarily high number. As the number of judges goes to infinity, what does p converge to?

I want you to think about this for a minute and test your intuitions before moving on.

 

(…)

 

 

(…)

 

 

(…)

 

So, it turns out that the optimal p for 23 judges is actually ≈ 77.3%. And as the number of judges goes to infinity? The optimal value of p converges to…

80%!

This was a big shock to me. I think the natural first thought is that when you have thousands and thousands of judges, you only need a minuscule chance for any one judge to vote ‘acquit’ in order to ensure a majority and prevent him from getting off free. So I initially guessed that p would be something like 99%, and would converge to 100% in the limit of infinite judges.

But this is wrong! And the small sample of mathematically gifted friends I posed this question to mostly guessed the same as me.

There’s clearly a balance going on between the risk of a minority voting to convict and the risk of a unanimous vote to convict. For small p, the first of these is ~1 and the second is ~0, and for p ~ 1, the first is ~0 and the second ~1.  It seems that we are naturally underemphasizing the danger of a minority vote to convict, and overemphasizing the danger of the unanimous vote.
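This tradeoff is easy to compute exactly, under the reading that the accused is convicted iff a strict majority, but not all, of the judges vote guilty. A quick numerical sketch:

```python
from math import comb

def conviction_prob(p, n=23):
    # Convicted iff the number of guilty votes k satisfies n/2 < k < n
    # (a strict majority votes guilty, but the vote is not unanimous).
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n))

# Grid search over p for 23 judges.
best = max((i / 10000 for i in range(5000, 10000)), key=conviction_prob)
print(round(best, 3))  # ≈ 0.773
```

Increasing n in `conviction_prob` pushes the optimum up toward the 80% limit.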

Here are some plots of the various relevant values, for different numbers of judges:

[Plots of the conviction probability as a function of p, for 10, 23, 100, and 150 judges]

One thing to notice is that as the number of judges gets larger, the graph’s peak becomes more and more of a plateau. And in the limit of infinite judges, you can show that the graph is actually just a simple step function: Pr(conviction) = 0 if p < .5, and 1 if p > .5. This means that while yes, technically, 80% is the optimal value, you can do pretty much equally well by choosing any value of p greater than 50%.

My challenge to you is to come up with some justification for the value 80%. Good luck!

Galois’ Unsolvable Polynomials

Galois’ process for finding out if a polynomial is solvable by radicals (i.e., if you can write its roots using a finite number of rational numbers and the symbols +, -, ×, /, and √) is actually surprisingly simple to describe. Here’s a quick 4-step summary of the process:

  1. Take a polynomial p(x).
  2. Define E  to be the smallest field that contains all rational numbers, as well as all the roots of p(x). This field is called the splitting field of p(x).
  3. Define Gal(E/Q) to be the set of all isomorphisms from E to itself that hold fixed all rational numbers. This set is a group, and is called the Galois group of p(x).
  4. p(x) is solvable by radicals if and only if Gal(E/Q) is a solvable group.

The proof of (4) is pretty complicated (much more involved than the proof I gave in the last post). Galois’ method is also a little weaker in that it only allows you to conclude unsolvability using the symbols +, -, ×, /, and √, whereas the last post also concluded unsolvability with √ as well as any univalent function (exp, log, sin, or whatever). However, it does allow you to not just generally state that some degree-five polynomials are not solvable by radicals, but to determine for any given polynomial whether it is solvable. The process also gives you a good deal of insight into the nature of the roots of the polynomial you’re studying.

Now, the crucial step in this 4-step process is (4), which equates solvable polynomials with solvable groups. There are a few different but equivalent definitions of solvable groups:

Composition Series Definition

  • A composition series of G is a series of subgroups of G, such that each subgroup is a maximal proper normal subgroup of the next (maximal meaning that no other proper normal subgroup of that next group strictly contains it).
    • {1} = G₀ ◁ G₁ ◁ ⋯ ◁ Gₙ = G.
  • G is solvable if and only if the factors of its composition series are all cyclic of prime order (i.e. if for all k, Gₖ₊₁/Gₖ ≅ ℤ/pℤ for some prime p).

Derived Series Definition

  • The derived series of G is the series of subgroups where each subgroup is the commutator subgroup of the previous subgroup.
    • ⋯ ⊴ [[G,G], [G,G]] ⊴ [G,G] ⊴ G.
  • G is solvable if and only if this series eventually reaches the trivial group.

Subnormal Series Definition

  • A subnormal series of G is a series of subgroups of G in which each subgroup is a normal subgroup of the next, starting from the trivial group.
    • {1} = G₀ ◁ G₁ ◁ ⋯ ◁ Gₙ = G.
  • G is solvable if and only if it has a subnormal series all of whose factors are abelian.
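These definitions translate directly into code. Here’s a brute-force sketch using the derived series criterion (all of these are throwaway helpers of my own, not a standard library API), workable for permutation groups of up to a few hundred elements:

```python
def compose(a, b):
    # (a ∘ b)(i) = a[b[i]]; permutations are tuples mapping index i -> image.
    return tuple(a[b[i]] for i in range(len(a)))

def inverse(a):
    inv = [0] * len(a)
    for i, image in enumerate(a):
        inv[image] = i
    return tuple(inv)

def closure(gens):
    # Brute-force the subgroup generated by gens (fine for small groups;
    # in a finite group, closing under products also yields all inverses).
    n = len(next(iter(gens)))
    group = {tuple(range(n))} | set(gens)
    frontier = list(group)
    while frontier:
        new = []
        for g in frontier:
            for h in list(group):
                for prod in (compose(g, h), compose(h, g)):
                    if prod not in group:
                        group.add(prod)
                        new.append(prod)
        frontier = new
    return group

def commutator_subgroup(group):
    # Subgroup generated by all commutators g h g^-1 h^-1.
    gens = {compose(compose(g, h), compose(inverse(g), inverse(h)))
            for g in group for h in group}
    return closure(gens)

def is_solvable(group):
    # Derived series: G, [G,G], [[G,G],[G,G]], ... must reach the trivial group.
    while len(group) > 1:
        derived = commutator_subgroup(group)
        if len(derived) == len(group):
            return False  # the series has stalled at a nontrivial group
        group = derived
    return True

# S4: generated by the transposition (0 1) and the 4-cycle (0 1 2 3).
S4 = closure({(1, 0, 2, 3), (1, 2, 3, 0)})
# S5: generated by the transposition (0 1) and the 5-cycle (0 1 2 3 4).
S5 = closure({(1, 0, 2, 3, 4), (1, 2, 3, 4, 0)})
print(len(S4), is_solvable(S4))  # 24 True
print(len(S5), is_solvable(S5))  # 120 False
```

For S4 the derived series runs S4 ⊵ A4 ⊵ V4 ⊵ {1}, while for S5 it stalls at A5 (whose commutator subgroup is itself), matching the Sₙ fact in the list below.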

Solvability is an interesting group-theoretic property aside from its application in analyzing polynomial roots. Here are some things we know about solvable groups, in order of generally decreasing power:

  • Every group of odd order is solvable.
  • Every abelian group is solvable.
  • Every group of prime power order is solvable (i.e. if |G| = pⁿ for prime p and any n).
  • Every group whose order is a product of two prime powers is solvable (i.e. if |G| = pⁿqᵐ for primes p, q and any n, m).
  • Every group whose Sylow subgroups are cyclic is solvable.
  • No finite non-abelian simple groups are solvable.
  • If H and G/H are solvable, then G is solvable.
  • Sₙ is not solvable for n ≥ 5.
  • Dihedral groups are all solvable.

A fun exercise is to see how many of these you can prove. (Some are much easier than others, and in fact the first one took 255 pages to prove!)

It turns out that most groups are solvable. In fact, the smallest non-solvable group, A₅, has 60 elements. Here are the first few numbers in the sequence of sizes of non-solvable groups:

  • 60, 120, 168, 180, 240, 300, 336, 360, 420, 480, 504, 540, 600, 660, 672, 720, 780, 840, 900, 960, …

Determining the elements of Gal(E/ℚ)

Practically, the hardest part of the above 4-step process is determining what the Galois group is for a given polynomial. Usually one knows very little about the roots of the polynomial in question, and therefore also knows little about its splitting field. So how to determine the Galois group (the set of automorphisms of the splitting field that fix ℚ) without even knowing what the splitting field is? Well, it turns out that there are some really useful general tricks.

For one: Sometimes you can look closely at the polynomial and discover some simple algebraic relations that must hold among the roots. For instance, say p(x) = x⁴ + 2x² – 1. Since there are only even powers of x in p(x), for any root r of p(x), -r must also be a root. This means that the roots of p(x) can be written {A, -A, B, -B} for some (possibly complex) numbers A and B. And by the properties of homomorphisms, if a member of the Galois group maps A to some root X, it must map -A to -X. Another way to say this is that the Galois group of any even polynomial is not the full group of permutations of the roots Sₙ (as the evenness imposes the above restriction on the allowed automorphisms).
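To make that example concrete, here’s a quick check that the roots of x⁴ + 2x² – 1 really do come in ± pairs, solving it as a quadratic in x²:

```python
import cmath

# p(x) = x^4 + 2x^2 - 1 is quadratic in y = x^2: y^2 + 2y - 1 = 0,
# so y = -1 ± √2, and each value of y contributes a ± pair of roots.
roots = []
for y in (-1 + 2**0.5, -1 - 2**0.5):
    r = cmath.sqrt(y)
    roots += [r, -r]

def p(x):
    return x**4 + 2 * x**2 - 1

print(all(abs(p(r)) < 1e-9 for r in roots))  # True
print(roots)  # {A, -A, B, -B}: one real pair, one purely imaginary pair
```

Note that the ± pairing holds even though only one of the pairs is real: y = -1 + √2 gives a real pair, and y = -1 - √2 gives a purely imaginary one.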

A superb trick related to this is to look at the modular algebraic relations between roots. In general, you can take a complicated irreducible polynomial, and break it into smaller irreducible factors modulo a prime p. Dedekind’s theorem tells us that if there are no repeated factors, then the Galois group contains permutations of the roots with cycle type corresponding to the degrees of the factors.

Example 1

  • Let p(x) = x² – 2.
  • The splitting field E of p(x) is clearly ℚ(√2) = {a + b√2 for all a, b in ℚ}.
  • Gal(E/ℚ) = the set of automorphisms on E that hold fixed all rationals.
  • If f is in Gal(E/ℚ), then f(√2)² = f(√2 × √2) = f(2) = 2. So f(√2) = ±√2. So there are two automorphisms in Gal(E/ℚ): f(a + b√2) = a + b√2 and f(a + b√2) = a – b√2.
  • So Gal(E/ℚ) ≅ ℤ/2ℤ.
  • Since ℤ/2ℤ is solvable, p(x) is solvable by radicals (as was obvious from the outset).

Example 2

Let p(x) = x⁶ + x⁴ + x + 3. Then we can write:

  1. p(x) = (x + 1)(x² + x + 1)(x³ + x + 1) mod 2
  2. p(x) = x(x + 2)(x⁴ + x³ + 2x² + 2x + 2) mod 3
  3. p(x) = (x + 3)²(x⁴ + 4x³ + 3x² + x + 2) mod 5
  4. p(x) = (x² + 5x + 2)(x⁴ + 2x³ + 3x² + 2x + 5) mod 7
  5. p(x) = (x + 6)(x⁵ + 5x⁴ + 4x³ + 9x² + x + 6) mod 11
  6. p(x) = (x² + 8x + 1)(x² + 9x + 10)(x² + 9x + 12) mod 13

From this and Dedekind’s theorem we can conclude:

  1. Gal(E/ℚ) contains a permutation of cycle type (1,2,3)
  2. Gal(E/ℚ) contains a permutation of cycle type (1,1,4): a 4-cycle
  3. Repeated factor of x+3, so we don’t learn anything
  4. Gal(E/ℚ) contains a permutation of cycle type (2,4)
  5. Gal(E/ℚ) contains a permutation of cycle type (1,5): a 5-cycle
  6. Gal(E/ℚ) contains a permutation of cycle type (2,2,2)

This automatically gives us a lot of insight into Gal(E/ℚ)! We have permutations of the form:

  1. (a b)(c d e)
  2. (a b c d)
  3. N/A
  4. (a b)(c d e f)
  5. (a b c d e)
  6. (a b)(c d)(e f)

We can go further and show that the only group of automorphisms that contains all of these types of elements is S6. (We have a 5-cycle from #5, and cubing #1 gives a transposition; together with the transitivity of the Galois group, these generate the full symmetric group.) And since S6 is not solvable (its commutator subgroup is A6, whose commutator subgroup is itself, and thus the derived series never terminates), the polynomial x⁶ + x⁴ + x + 3 is not solvable by radicals.
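Modular factorizations like the ones above are easy to sanity-check by multiplying the factors back together mod p. A minimal sketch for the mod 2 case (`polymul_mod` is a throwaway helper, with polynomials as coefficient lists, lowest degree first):

```python
def polymul_mod(a, b, p):
    # Multiply two polynomials given as coefficient lists (lowest degree
    # first), reducing each coefficient mod p.
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

# (x + 1)(x^2 + x + 1)(x^3 + x + 1) mod 2
product = polymul_mod(polymul_mod([1, 1], [1, 1, 1], 2), [1, 1, 0, 1], 2)
print(product)  # [1, 1, 0, 0, 1, 0, 1], i.e. x^6 + x^4 + x + 1 = p(x) mod 2
```

The product comes out to x⁶ + x⁴ + x + 1, which is exactly p(x) with its coefficients reduced mod 2.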

One final note: There is a fantastic computational algebra system designed to solve problems in high-level mathematics known as Magma. There’s an online Magma calculator here which is free to use. To use it to find the Galois group of the above polynomial (for example), you type:

P<x>:=PolynomialRing(Rationals());
GaloisGroup(x^6+x^4+x+3);

The Inverse Galois Problem

Now we get to the most tantalizing part!

Instead of starting with a polynomial p(x) and finding its Galois group, one could equally well start with a group G and ask whether G is the Galois group of any polynomial with rational coefficients! If so, we say that G is realizable, and is realized by p(x).

Whether every finite group is realizable turns out to be one of the big unsolved questions in mathematics. We’d like to prove that for every finite group G you can find a polynomial with coefficients from ℚ whose Galois group is G, but in general it’s not known if this is possible! This paper lists all the non-abelian simple groups of cardinality less than 100 million that are currently not known to be realizable.

We do know the answer for some general categories of groups. Here are some things that are known, in order of decreasing strength.

  • All solvable groups are realizable.
  • All finite abelian groups are realizable.
  • All the symmetric and alternating groups are realizable.
  • All cyclic groups are realizable (special case of finite abelian groups being realizable)
  • 25 of the 26 sporadic groups are known to be realizable (the missing sporadic group is the Mathieu group M23, whose realizability remains an open question).
    • Amazingly, this implies the existence of a polynomial with rational coefficients whose Galois group is the Monster group!

x^5 – x – 1

One of the strange and wonderful facts of mathematics is the general unsolvability of quintic polynomials. If you haven’t heard about this, prepare to have your mind pleasantly blown. Most people memorize the quadratic formula, the general expression for the roots of any quadratic polynomial, in high school:

ax^2 + bx + c = 0
x = (–b ± √(b^2 – 4ac)) / 2a

And even though very few people learn it in school, it turns out that we know a general solution to any cubic polynomial as well. It’s a little cumbersome to write, but just like the quadratic formula, it expresses the zeroes of any cubic polynomial as a function of its coefficients.

ax^3 + bx^2 + cx + d = 0
x = (some big complicated function of a, b, c, and d)

And in fact we have a general solution for any quartic polynomial as well, even more cumbersome.

ax^4 + bx^3 + cx^2 + dx + e = 0
x = (some even bigger and more complicated function of a, b, c, d, and e)

Mathematicians struggled for a long time to find a general solution for the roots of quintic polynomials (ax^5 + bx^4 + cx^3 + dx^2 + ex + f = 0). Eventually, some mathematicians began to suspect that no such general solution existed. And then in the first few decades of the 1800s, a few mathematicians put out proofs to that effect. The most complete of these proofs was provided by Évariste Galois before he died in a duel at the age of 20 (we really need a big Hollywood movie about Galois’ life).

I recently found a fantastic 15-minute video sketching the proof of the unsolvability of the quintic. The approach taken in this proof is different from any of the original proofs, and it’s much simpler in a bunch of ways.

Here’s the video:

And briefly, here’s a summary of the proof:

  1. The roots of any quintic polynomial obviously depend on the coefficients of the polynomial.
  2. If you imagine taking a particular quintic polynomial (ax^5 + bx^4 + cx^3 + dx^2 + ex + f), and making small continuous changes in the coefficients (a, b, c, d, e, f), then the set of zeroes of this polynomial should make similar continuous changes.
  3. If you move each coefficient in a loop in the complex plane, ending at the same value it started at, then you should end up with the same set of zeroes as you started with (the same set, by the way, which doesn’t guarantee that each solution ends at the same place it started).
  4. However, moving each coefficient in a loop in the complex plane sometimes ends up with the solutions switching places. (Illustration below)
  5. For any “ordinary” function of the coefficients (a function that can be written using a finite number of rational numbers and the symbols +, -, ×, /, exp(), log(), √, and so on), you can find loops that are guaranteed to return the value of that function to its starting point.
  6. But for some quintic polynomials, you can always find one of these loops that does cause the roots to switch places.
  7. So there are quintic polynomials whose roots cannot be written as any ordinary function of the coefficients (since you can find a loop that permutes the solutions of the quintic but doesn’t change the value of any ordinary function).

There are obviously holes in this proof that need to be filled. (1) through (3) should be pretty self-explanatory. And (4) can be illustrated by thinking about functions that involve square roots… on the complex plane, taking a square root of an expression means halving the angle to the expression. So rotating an expression around the complex plane by 2π only ends up rotating its square root by π (and thus bringing it to a different point). Observe:

Square Roots of 2
Red dots are the square roots of the white dot, which starts and ends at 2.

And in general, a function involving an nth root of some ordinary expression will take n rotations around the complex plane to return each solution to its starting point. Take a look for n = 3:

Cube Roots of 2
Red dots are the cube roots of the white dot, which starts and ends at 2.

For (5), you can prove this by using commutator loops (which are formed from two smaller loops L1 and L2 by putting them together as follows: L1 L2 L1^-1 L2^-1). Ultimately, this works for expressions with roots in them because solutions switch places exactly when you circle the origin of the complex plane, and commutators circle the origin a net zero times. For expressions with one nested root, you actually need to use commutators of commutators (first to return the inner root to its starting value, and then to return the outer root to its starting value). And with two nested roots, you use commutators of commutators of commutators. And so on. In this way, you can construct a loop in the complex plane for any expression you construct from roots and ordinary functions that will bring all values back to their starting points.

And finally, (6) is a result of the structure of the permutation group on five elements (S5). If you look at all commutators of permutations of five elements, you are looking at the commutator subgroup of S5, which is A5 (the set of even permutations of five elements). And the commutator subgroup of A5 is just A5 itself. This means that you can always find some commutator, or some commutator of commutators, or some commutator of commutators of commutators (and so on) that is not equal to the identity permutation! (After all, if all commutators of commutators of commutators ended up just being the identity permutation, then the group of commutators of commutators of commutators would just be the trivial group {e}. But we already know that the group of commutators of commutators of commutators is just the commutator subgroup of the commutator subgroup of the commutator subgroup of S5, i.e. the commutator subgroup of A5, which is A5.)
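This claim about the derived series of S5 is easy to verify by brute force. A minimal sketch, representing permutations as tuples:

```python
from itertools import permutations

def compose(p, q):
    # apply q, then p
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

def derived_subgroup(G):
    # the subgroup generated by all commutators a b a^-1 b^-1
    comms = {compose(compose(a, b), compose(inverse(a), inverse(b)))
             for a in G for b in G}
    H, frontier = set(comms), set(comms)
    while frontier:  # close under composition
        new = {compose(x, y) for x in frontier for y in H} - H
        H |= new
        frontier = new
    return H

S5 = set(permutations(range(5)))
D1 = derived_subgroup(S5)   # the commutator subgroup of S5
D2 = derived_subgroup(D1)   # the commutator subgroup of that
print(len(S5), len(D1), len(D2))   # 120 60 60: the derived series stalls at A5
print(D1 == D2)                    # True
```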

And that’s the proof! You can see the whole thing sketched out in much more detail here.

Now, notice that the conclusion was that there are quintic polynomials whose roots cannot be written as any ordinary function of the coefficients. Not that all quintic polynomials’ roots can’t be written as such. Obviously, there are some quintics whose solutions can be written easily. For example…

x^5 – 1 = 0
Solutions: x = e^(2πik/5) for k = 0, 1, 2, 3, 4

But there also exist many quintic polynomials whose roots just have no way of being finitely expressed through ordinary functions! An example of this is the real root of x^5 – x – 1.

x^5 – x – 1 = 0
at x ≈ 1.167304…

Plot of x^5 – x – 1

There is no way to write this number using a finite number of the symbols +, -, ×, /, exp(), log(), √, sin(), cos(), and rational numbers. And this is the case even though this number is an algebraic number! I don’t know if there’s a name for this category of “algebraic numbers that cannot be explicitly expressed using ordinary functions” but there should be.

Of course, you can define a function into existence that just returns the solutions of the quintic (which is what the Bring radical allows one to do). And you can also write the solution with ordinary functions if you’re allowed to use an infinity of them. For example, you can use a repeated fifth root:

x^5 – x – 1 = 0
x^5 = x + 1
x = (1 + x)^(1/5)
x = (1 + (1 + x)^(1/5))^(1/5)
x = (1 + (1 + (1 + x)^(1/5))^(1/5))^(1/5)
x = (1 + (1 + (1 + (…))^(1/5))^(1/5))^(1/5)

But while an infinity of symbols suffices to express 1.167304…, no finite number of them cuts it!
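That infinite nested-root expression is just the unrolling of the fixed-point iteration x ↦ (1 + x)^(1/5), which converges quickly in floating point. A quick numerical check:

```python
# iterate x -> (1 + x)^(1/5); the nested fifth-root expression above is
# exactly what you get by unrolling this iteration
x = 1.0
for _ in range(60):
    x = (1 + x) ** 0.2
print(x)   # ≈ 1.1673039782614187, the real root of x^5 - x - 1
```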

Group Theory: The Mathematics of Symmetry?

Often you’ll hear people describe group theory as being “the mathematics of symmetry.” If you’ve been exposed to a little bit of group theory, you might find this a little mystifying. While there are some subsets of group theory that explicitly pertain to symmetry (in particular, the dihedral groups, representing the symmetries of regular polygons), there are many more groups that seem to have nothing to do with symmetry (the rational numbers, for example, or the dicyclic groups). Much of group theory involves proving theorems about the structure of groups in the abstract, and many of these theorems again seem to tell us little to nothing about symmetry (what does “for every prime p that divides the size of a group G, there is a subgroup of G with order p” mean as a statement about the nature of symmetry?). So what’s up? Why this identification between groups and symmetry?

Well, I think we can start with asking what we even mean by ‘symmetry’. The first thing I think of when I hear the word ‘symmetry’ is geometric symmetry – the rotations, translations, and reflections that hold fixed a given shape. This is indeed one big part of symmetry, but not all of it. Symmetry is huge in physics as well, and the meaning in this context is rarely geometric symmetry. Instead, ‘symmetry’ finds its deepest application in the notion of symmetries of nature, meaning transformations that hold fixed the form of the laws of nature. For instance, the behavior of physical systems would be identical if we were to uniformly “slide” the universe in one direction by some fixed amount. So the laws of nature must be invariant with respect to this type of uniform translation, and we say that uniform spatial translation is a symmetry of nature. But the behavior of physical systems would be affected if we, say, began uniformly accelerating all of space in some direction. So uniform acceleration is not a symmetry of nature, and the laws of nature will be sensitive to these types of transformations. In this context, symmetry is more or less synonymous with invariance.

What’s the similarity between these two senses of symmetry? Invariance really is the key. Geometric symmetries are transformations that hold fixed certain geometric properties; physical symmetries are transformations that hold fixed certain physical properties. Fundamentally, symmetries are transformations that hold fixed some quantity. When you choose strange abstract qualities, your symmetries can get pretty weird (like the invariance of magnetic fields with respect to addition of curl-less vector fields to the magnetic vector potential, one of the two gauge symmetries of Maxwell’s equations). But they are symmetries nonetheless!

Now, how does this expansive concept of symmetries relate to groups? Well, looking at the group axioms, it becomes fairly obvious that they do actually fit pretty well with the concept. Let’s think of the elements of a group as transformations one could perform upon a system, and the group operation as being ‘transformation composition’ for successive transformations. And finally, we think of the group as being a complete set of symmetries. Now we’ll look at what the group axioms say, one at a time.

  1. Closure
    For any two elements a, b in G, a•b is in G.
    – Translation: If two transformations each individually hold fixed some quantity, then doing the two transformations successively will also hold fixed the quantity.
  2. Associativity
    For any a, b, c in G, (a•b)•c = a•(b•c)
    – Translation: The order in which transformations are applied to an object is all that determines the nature of the net transformation. The notion of parentheses doesn’t make sense in this context, there is only the left-to-right order of the application of transformations.
  3. Identity
    There is an element e in G such that for any a in G, a•e = e•a = a.
    – Translation: Doing nothing at all holds fixed whatever quantity you choose.
  4. Inverses
    For any a in G, there is an element a’ in G such that a•a’ = e
    – Translation: Any transformation that holds fixed some quantity will also hold fixed that quantity if done in reverse.

Put together, we see that it makes sense to think about a group as the complete set of reversible transformations that hold fixed a certain quantity. (To be clear, this shows that any such set of symmetries can be treated as a group, not necessarily that any group can be regarded as a set of symmetries. However, that converse does turn out to be true! In fact, any group whatsoever can be regarded as the set of geometric symmetries of some abstract object!)

“Complete” needs to be fleshed out a bit more. When talking about a set of transformations, we need to be picking from a well-defined starting set of ‘allowed transformations’. For example, sets of geometric symmetries of objects are usually drawn from the set of all isometries (transformations that keep all distances the same, i.e. rotations, reflections, and translations). Instead of this, we could also consider the allowed transformations to be restricted to the set of all rotations, in which case we’d just be looking at the set of all rotations that don’t change the object. But in general, we are not allowed to talk about the set of possible transformations as just being an arbitrary subset of the set of all rotations. Why? Because we require that the set of allowed transformations be closed. If a certain rotation is in consideration as an allowed transformation of an object, then so must be twice that rotation, thrice it, and so on. And so must be the reverse of that rotation. Not all sets of rotations satisfy this constraint!

The general statement is that the set of allowed transformations from which we draw our set of symmetries must be closed. And if it is, then the resulting set of symmetries is well described as some group!
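This “closed set of transformations” picture is easy to play with in code. Here’s a minimal sketch (the corner labeling and the choice of generating rotation and reflection are mine, just for illustration) that generates the symmetry group of a square from one rotation and one flip, then checks the group axioms directly:

```python
from itertools import product

# label the square's corners 0..3 counterclockwise; a symmetry is a
# permutation of the corners, written as a tuple p with p[i] = image of i
def compose(p, q):
    return tuple(p[q[i]] for i in range(4))

identity = (0, 1, 2, 3)
rot = (1, 2, 3, 0)    # rotate 90 degrees counterclockwise
flip = (3, 2, 1, 0)   # reflect across the square's horizontal midline

def generate(gens):
    # close {identity} plus the generators under composition
    G, frontier = {identity}, {identity}
    while frontier:
        new = {compose(g, s) for g in frontier for s in gens} - G
        G |= new
        frontier = new
    return G

D4 = generate({rot, flip})
print(len(D4))   # 8: the full symmetry group of the square (Dih4)

# check the group axioms directly
assert all(compose(a, b) in D4 for a, b in product(D4, repeat=2))    # closure
assert all(any(compose(a, b) == identity for b in D4) for a in D4)   # inverses
```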

So there is a fairly simple sense in which group theory is the study of symmetries. There are the obvious geometric symmetries that I already mentioned (the dihedral groups). But now we can think further about what other familiar groups are actually symmetries of. (ℤ, +) is a group, so what are the integers the symmetries of? There are actually a few answers that work here. One is that ℤ is the set of translational symmetries of an infinite line of evenly spaced dots. Translating to the right corresponds to positive numbers, translating to the left corresponds to negatives, and no translation at all is 0. What about (ℤn, +)? It is the set of rotational symmetries of a circle with n dots placed evenly along its perimeter. I encourage you to try to think about other examples of groups (symmetric and alternating groups, ℚ, and so on), and how to think about them as symmetry groups.

This way of thinking about groups is more than just visually appealing, it actually helps clarify some otherwise opaque results in group theory. For example, why is the stabilizer of a group action always a subgroup? Well, the group action defines a set of allowed transformations, and the stabilizer of x is exactly the set of transformations that hold x fixed! So we have an invariant quantity x, and thus it is perfectly reasonable that the stabilizer should be a group.

Thinking of groups as sets of symmetries also allows us a nice physical interpretation of the idea of conjugacy classes, a hugely important tool in group theory. Remember, the conjugacy class of an element x in G is just the set of all conjugates of x: Cl(x) = {gxg^-1 for all g in G}. This ABA^-1 pattern might be familiar to you if you’ve studied a little bit of matrix representations of coordinate systems in physics. In this context, ABA^-1 just represents a coordinate transformation of B! First we apply some transformation A that changes our reference frame, then we apply B, and then we undo A to come back to the original reference frame. The result is a transformation that is just like B, but in a different reference frame! So it makes perfect sense to think of the conjugacy class of an element x as just the set of all elements of the same “type” as x.

For the dihedral group of the triangle, we get one conjugacy class for the identity, another for the two rotations, and another for the three flips. Why? Well, every flip can be represented as any other flip in a rotated reference frame! And the two rotations are each other in a flipped reference frame (flipping the triangle turns a 120° rotation into a 240° rotation). And of course, the do-nothing transformation will look the same in all reference frames, which is why the identity has its own conjugacy class. (For larger dihedral groups the breakdown is a bit finer: a flip can only turn a rotation into its inverse, so the rotations r^k and r^-k pair up into classes, and for polygons with an even number of sides the flips split into two classes as well.) And for another example, you can see some great visualizations of the conjugacy class breakdown of the cube’s symmetry group here, under Examples at the top. In general, looking at conjugacy classes is a good way to get a sense of the natural breakdown of categories in your set of symmetries.
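For the triangle specifically, this conjugacy class breakdown can be verified by brute force. A minimal sketch, with permutations of the three corners standing in for the symmetries:

```python
from itertools import permutations

def compose(p, q):
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

# the six symmetries of the triangle, as permutations of its 3 corners
G = set(permutations(range(3)))

def conjugacy_class(x):
    # all elements g x g^-1: "x as seen from every reference frame"
    return frozenset(compose(compose(g, x), inverse(g)) for g in G)

classes = {conjugacy_class(x) for x in G}
print(sorted(len(c) for c in classes))  # [1, 2, 3]: identity, rotations, flips
```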

Now, the famous theorems about group structure (Lagrange, Cauchy, Sylow) all have intriguing interpretations as applied to symmetries. Why must it be that any object with a prime number of symmetries cannot have any non-trivial smaller complete sets of symmetries? And doesn’t it seem totally non-obvious that an object that has a complete set of symmetries whose size is divisible by 3, say, must also have a smaller complete set of symmetries (one drawn from a smaller closed set of allowed transformations) whose size is 3? But it’s true! Think about the symmetries of a regular triangle. There are six of them, and correspondingly there’s a closed set of symmetries of size 3, namely, the do-nothing transformation and the two rotations! And the same thing applies for any symmetry group of size 3N.

Furthermore, Sylow’s theorems give us really strong constraints on the number of subgroups of various sizes, none of which seem at all intuitive. Why must it be that if an object has 9N symmetries for some N that’s not divisible by 3, then N must be divisible by the number of smaller complete sets of symmetries?? Thinking about and trying to make intuitive sense of these connections is quite mind-boggling to me.

Symmetries of Platonic Solids

Definition: A polyhedron is regular if all its faces are the same regular polygon, and all angles are equal.

It turns out that there are only five Platonic solids (convex regular polyhedra).

Here’s a quick proof:

  • For a regular n-gon, the sum of its interior angles is (n – 2)π.
  • So the angle at any one of its corners must be (n – 2) π/n.
  • Now, suppose we have r faces (each a regular n-gon) meeting at every vertex.
  • Since the meeting point for each face is a corner with angle (n – 2)π/n, we have that the sum of the angles is r (n – 2) π/n.
  • And since the polyhedron is convex we must have r (n – 2) π/n < 2π.
  • This can be rewritten as (r – 2)(n – 2) < 4.
  • The only integer solutions with n, r ≥ 3 are (n, r) = (3, 3), (4, 3), (3, 4), (5, 3), (3, 5).
  • These are our five Platonic solids!
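The last bullet is a two-line brute-force check:

```python
# enumerate integer solutions of (r - 2)(n - 2) < 4 with n, r >= 3;
# any n >= 6 forces (r - 2)(n - 2) >= 4, so a small search range suffices
sols = [(n, r) for n in range(3, 10) for r in range(3, 10)
        if (r - 2) * (n - 2) < 4]
print(sols)  # [(3, 3), (3, 4), (3, 5), (4, 3), (5, 3)]
```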

(3,3): Tetrahedron


Symmetry group: S4

(4,3): Cube


Symmetry group: S4 × ℤ2

(3,4): Octahedron


Symmetry group: S4 × ℤ2

(5,3): Dodecahedron


Symmetry group: A5 × ℤ2

(3,5): Icosahedron


Symmetry group: A5 × ℤ2

— — —

Here’s one more thing to notice; when looking at small dihedral groups, we saw that Dih3 (the symmetry group for a regular triangle) was isomorphic to S3 (the set of permutations of three items). We also saw that this isomorphism did not hold for higher order dihedral groups. In other words, the only regular polygon whose symmetry group was some symmetric group was the triangle.

Now we see that the tetrahedron (which we might consider to be the closest thing to a three-dimensional version of a regular triangle) has as its symmetry group S4. In other words, we get to S4 not by moving to polygons with more sides, but by moving to a higher dimension!

This may bring a conjecture to our minds: perhaps in general, moving up in dimension is the way to get regular geometric shapes whose symmetry groups are the higher order symmetric groups. That is, perhaps the symmetry group of the regular “triangle-like” object in n dimensions is isomorphic to Sn+1. This would predict that there is some 4D version of the 2D triangle and 3D tetrahedron whose symmetry group is S5. And guess what; this turns out to be right!

The higher dimensional generalization of the triangle and tetrahedron is called the n-simplex (which lives in n dimensions and has n + 1 vertices). And the symmetry group of the n-simplex is in fact the symmetric group Sn+1! So all symmetric groups can be considered to be descriptions of the set of symmetries of some possibly very high-dimensional tetrahedron! This is a fun little piece of trivia that I don’t know quite what to make of.

Some Curious Group Isomorphisms

Curiosity Number One

The additive group of polynomials with integer coefficients is isomorphic to the multiplicative group of positive rational numbers: (ℤ[x], +) ≅ (ℚ+, ×).

Any element of ℤ[x] can be written like a0 + a1x + a2x^2 + … + anx^n, where all coefficients a0, a1, …, an are integers. Consider the following mapping: φ: (a0 + a1x + a2x^2 + … + anx^n) ↦ p0^a0 · p1^a1 · … · pn^an, where pk refers to the kth prime number. Now, this is a homomorphism from ℤ[x] to ℚ+ because φ(p(x) + q(x)) = φ(p(x))φ(q(x)). You can also easily show that it is onto and one-to-one, using the uniqueness of prime factorization. Therefore φ is an isomorphism!
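The homomorphism property of φ is easy to check numerically. A sketch using Python’s Fraction type (the example polynomials are arbitrary):

```python
from fractions import Fraction

# the first few primes; enough for polynomials up to degree 7 in this sketch
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19]

def phi(coeffs):
    """Map a0 + a1*x + a2*x^2 + ... (integer coefficients, lowest degree
    first) to p0^a0 * p1^a1 * p2^a2 * ... as a positive rational."""
    result = Fraction(1)
    for p, a in zip(PRIMES, coeffs):
        result *= Fraction(p) ** a
    return result

p = [2, -1, 3]   # 2 - x + 3x^2
q = [0, 4, -2]   # 4x - 2x^2
s = [a + b for a, b in zip(p, q)]   # the sum polynomial

# the homomorphism property: adding polynomials becomes multiplying rationals
print(phi(s) == phi(p) * phi(q))   # True
print(phi(p))                      # 500/3  (= 2^2 * 3^-1 * 5^3)
```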

I think this is a good example to test the intuitive notion that once one “knows” a group, one also knows all groups that it is isomorphic to. It’s often useful to think of isomorphic groups as being like one book written in two different languages, with the isomorphism being the translation dictionary. But the non-obviousness of the isomorphism here makes it feel like the two groups in fact have a considerably different internal structure.

Curiosity Number Two

Your friend comes up to you and tells you, “I’m thinking of some finite group. All I’ll tell you is that it has at least two order-2 elements. I’ll give you $20 if you can tell me what the group that’s generated by these two elements is (up to isomorphism)!”

What do you think your chances are of getting this $20 reward? It seems intuitively like your chances are probably pretty dismal… there are probably hundreds or maybe even an infinity of groups that match her description.

But it turns out that you can say exactly what group your friend is thinking of, just from this information!

Call the group G. G is finite and has two order-2 elements, which we’ll call a and b. So we want to find out what <a,b> is. Define r = ab and n = |r|. Then ara = a(ab)a = (aa)ba = ba = (ab)^-1 = r^-1 = r^(n-1). So ara = r^(n-1), or in other words ar = r^(n-1)a. Also, since b = ar, we can write the group we’re looking for either as <a,b> or as <a,r> (these two are equal).

Thus the group we’re looking for can be described as follows:

<a, r | a^2 = r^n = e, ar = r^(n-1)a>.

Look familiar? This is just the dihedral group of size 2n; it is the group of symmetries of a regular n-gon, where a is a flip and r is a rotation!

So if G is a finite group with at least two order-2 elements a and b, then <a, b> ≅ Dih|ab|.

This serves to explain the ubiquity and importance of the dihedral groups! Any time a group contains more than one order-two element, it must have a dihedral group of some order as a subgroup.
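Here’s a brute-force sketch of this fact for one concrete choice of a and b (two transpositions in S5, chosen arbitrarily): the subgroup they generate has exactly 2·|ab| elements.

```python
def compose(p, q):
    return tuple(p[q[i]] for i in range(len(p)))

def order(p):
    # smallest n with p^n = identity
    e = tuple(range(len(p)))
    x, n = p, 1
    while x != e:
        x, n = compose(x, p), n + 1
    return n

def generate(gens):
    # close {identity} plus the generators under composition
    e = tuple(range(5))
    G, frontier = {e}, {e}
    while frontier:
        new = {compose(g, s) for g in frontier for s in gens} - G
        G |= new
        frontier = new
    return G

a = (1, 0, 2, 3, 4)   # the transposition (0 1), an order-2 element
b = (0, 2, 1, 3, 4)   # the transposition (1 2), another order-2 element

r = compose(a, b)
n = order(r)              # n = |ab| = 3 here
G = generate({a, b})
print(len(G), 2 * n)      # 6 6: <a, b> is the dihedral group Dih3
```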

Curiosity Number Three

If K and H are both normal subgroups of G with K ⊆ H, then H/K ⊴ G/K and (G/K) / (H/K) ≅ G/H.

(G/K) / (H/K) is a pretty wild object; it is a group of cosets, whose representatives are themselves cosets. But this theorem allows us to treat it as a much less exotic object, an ordinary quotient group with representatives as elements from G.

I don’t have much else to say about this one, besides that I love how the operation of forming a quotient group behaves so much like ordinary division!