Strong Induction Proofs of Cauchy’s Theorem and Sylow’s First Theorem

Here are three really fun little proofs of some of the strongest results in group theory, all of which use very similar methods (strong induction and the class equation) in subtly different ways. They require a little bit of background (only some of which I’ve provided), and you should not expect to grasp it with a quick read-through. I encourage you to read through slowly and carefully, making sure you understand each step of the proofs. I promise it will feel rewarding once you’re done!

Background

These proofs mostly rely on the Class Equation. I’ll spend a little bit of time showing where it comes from. (You can skip this section if you already know or don’t care where it comes from.)

Partition Equation: Suppose G acts on X. Then the orbits of this action partition X. That is, the orbits of any two elements either share all their elements or none of them, and every element in X is in some orbit. So |X| = ∑|orb(x)|, where the sum is taken over representatives from each orbit.

Orbit Stabilizer Theorem: For any x in X, |orb(x)||stab(x)| = |G|. Here stab(x) refers to the subset of G that fixes x (“stabilizes” it) under the group action. So stab(x) = {g in G : g•x = x}.

Stabilizers are Subgroups: The stabilizer of any x in X is a subgroup of G.

Now we look at the special case of G acting on itself by conjugation. I.e. we define g • x = gxg-1. The orbit of any element x is its conjugacy class Cl(x), and stab(x) is x’s centralizer C(x). So the partition equation for the conjugacy action is |G| = ∑|Cl(x)|, where the sum is taken over representatives of conjugacy classes. And the Orbit Stabilizer Theorem becomes |Cl(x)|⋅|C(x)| = |G|.

Cl(x) = {gxg-1 for all g ∈ G}
C(x) = {g in G such that gx = xg}

 You can easily show that for any central element in G (that is, for any element that commutes with every other element in G), its conjugacy class just contains itself. This means that |Z(G)| terms in this sum are equal to 1. So we can rewrite our equation, taking this into account. This gives us the Class Equation.

Class Equation: |G| = |Z(G)| + ∑|Cl(x)|, where the sum is taken over non-central representatives of conjugacy classes.

Some notes about this equation:

  • Z(G) is a subgroup of G, so |Z(G)| ≥ 1.
  • Since the terms in the sum are by definition sizes of conjugacy classes of non-central elements, none of them have size 1. That is, |Cl(x)| > 1 for every non-central x.
  • All terms divide |G|. This follows for Z(G) by Lagrange’s theorem (because it’s a subgroup). And it must be true for each |Cl(x)| by the Orbit-Stabilizer Theorem.

That’s all we need! We’re ready to prove Cauchy’s Theorem and Sylow’s First Theorem!

Proof of Cauchy’s Theorem

Cauchy’s theorem: G has a subgroup of size p for every prime p that divides |G|.

Part 1: First we show that Cauchy’s theorem holds for all abelian groups. This can be proven by strong induction. The base case is |G| = p, which trivially has a subgroup of size p (itself). Now we take a look at any larger group G such that p divides |G|. The strong inductive hypothesis is that Cauchy’s theorem holds for all abelian groups of size less than |G|.

Now, take any non-identity element g in G. Let H = <g>.

  • Case 1: Suppose p divides |H|. Then g|H|/p is an element of order p.
  • Case 2: Suppose p doesn’t divide |H|. Then p must divide |G/H| (since |G| = |H|⋅|G/H|). Since G is an abelian group, G/H is also an abelian group. And since we chose g to be non-identity, |H| > 1, so |G/H| < |G|. Thus by the inductive hypothesis, G/H contains an element xH of order p. But this implies that |x| is a multiple of p. After all, xH is order p, and (xH)|x| = x|x|H = eH = H. So x|x|/p is an element of order p. Done!

Now that we’ve proven Cauchy’s theorem for all abelian groups, we go on to prove it for all groups.

Part 2: We use strong induction again. The base case is |G| = p, which is trivial. Now, suppose G is a group such that |G| is divisible by prime p. The strong induction hypothesis is that Cauchy’s theorem holds for all groups of size less than |G|.

Let’s examine the Class Equation: |G| = |Z(G)| + ∑|Cl(x)|.

  • Case 1: Suppose that for some x, |Cl(x)| is not divisible by p. Then for this x, |C(x)| must be divisible by p (since |Cl(x)||C(x)| = |G|). Since centralizers are subgroups, C(x) is a subgroup with size divisible by p. Furthermore, since |Cl(x)| > 1, |C(x)| < |G|. So by the induction hypothesis, C(x) has a subgroup of size p.
  • Case 2: For all x, |Cl(x)| is divisible by p. But so is |G|. So |Z(G)| must be divisible by p as well. But Z(G) is abelian, so it must have a subgroup of size p (by Part 1).

And we’re done!!

With this result in hand, we can go even deeper and prove Sylow’s First Theorem. This one is slightly more involved, so take a breather if you need one…

… And now let’s dive into it!

Proof of Sylow’s First Theorem

Here’s Sylow’s First Theorem: For every prime power in the factorization of |G|, there is a subgroup of that size. In other words, if G is a finite group such that pn divides |G| for some prime p, then G has a subgroup of order pn.

Again, we proceed by strong induction. Our base case is |G| = p, which is trivial. Now we look at a larger group G, and assume that the theorem is true for all groups of size less than |G|.

Class Equation: |G| = |Z(G)| + ∑|Cl(x)|.

  • Case 1: Suppose at least one of the terms in the sum ∑|Cl(x)| isn’t divisible by p. Then for this x, |C(x)| must be divisible by pn (since |Cl(x)||C(x)| = |G|). Since centralizers are subgroups, we have a subgroup with size divisible by pn. Furthermore, since each |Cl(x)| > 1, |C(x)| < |G|. So by the induction hypothesis, C(x) has a subgroup of size pn.
  • Case 2: Each |Cl(x)| is divisible by p, as is |G|. So |Z(G)| must be divisible by p. So by Cauchy, Z(G) has an element of order p, say g. Define N = <g>. N is normal, since it’s in the center. So G/N is a group. |G/N| = |G|/|N| = |G|/p, so by induction, G/N has a subgroup H of size pn-1. Now, consider the canonical homomorphism φ: G → G/N defined by φ(g) = gN. It turns out that φ-1(H), the inverse image of H, is a subgroup of G with size pn, just what we were looking for!
    • Why is |φ-1(H)| = pn? Well, each element of G/N is a coset, and all cosets have the same size, namely |N|. This means that |N| elements are mapped to any given coset by φ. H is a subset of G/N, so it contains |H| cosets, each of which is mapped to by |N| elements. Since cosets are disjoint, there is no overlap. So φ-1(H) = |N|⋅|H| = p⋅pn-1 = pn.

(Proofs can be found in Judson’s Abstract Algebra)

Table of Small Finite Groups

Order 1: 1 Group

Abelian Cyclic Simple
GroupDiagramMiniC1.svg 1 = S1 = A2 Y Y Y

Order 2: 1 Group

Abelian Cyclic Simple
GroupDiagramMiniC2.svg ℤ2 = S2 Y Y Y

Order 3: 1 Group

Abelian Cyclic Simple
GroupDiagramMiniC3.svg ℤ3 = A3 Y Y Y

Order 4: 2 Groups

Abelian Cyclic Simple
GroupDiagramMiniC4.svg ℤ4 Y Y
GroupDiagramMiniD4.svg ℤ22 = K4 Y

Order 5: 1 Group

Abelian Cyclic Simple
GroupDiagramMiniC5.svg ℤ5 Y Y Y

Order 6: 2 Groups

Abelian Cyclic Simple
GroupDiagramMiniC6.svg ℤ6 = 3 × 2 Y Y
GroupDiagramMiniD6.svg Dih3 = S3 

Order 7: 1 Group

Abelian Cyclic Simple
GroupDiagramMiniC7.svg ℤ7 Y Y Y

Order 8: 5 Groups

Abelian Cyclic Simple
GroupDiagramMiniC8.svg ℤ8 Y Y
GroupDiagramMiniC2C4.svg ℤ4 × 2 Y
GroupDiagramMiniC2x3.svg ℤ23 Y
GroupDiagramMiniD8.svg Dih4
GroupDiagramMiniQ8.svg Dic2 = Q8

Order 9: 2 Groups

Abelian Cyclic Simple
GroupDiagramMiniC9.svg ℤ9 Y Y
GroupDiagramMiniC3x2.svg ℤ3 × 3 Y

Order 10: 2 Groups

Abelian Cyclic Simple
GroupDiagramMiniC10.svg ℤ10 = ℤ5 × 2 Y Y
GroupDiagramMiniD10.svg Dih5

Order 11: 1 Group

Abelian Cyclic Simple
GroupDiagramMiniC11.svg ℤ11 Y Y Y

Order 12: 5 Groups

Abelian Cyclic Simple
GroupDiagramMiniC12.svg ℤ12 = ℤ4 × 3 Y Y
GroupDiagramMiniC2C6.svg ℤ6 × 2 = ℤ3 × 22 Y
GroupDiagramMiniX12.svg Dic3
GroupDiagramMiniA4.svg A4
GroupDiagramMiniD12.svg Dih6 = Dih3 × 2

Order 13: 1 Group

Abelian Cyclic Simple
GroupDiagramMiniC13.svg ℤ13 Y Y Y

Order 14: 2 Groups

Abelian Cyclic Simple
GroupDiagramMiniC14.svg ℤ14 = ℤ7 × 2 Y Y
GroupDiagramMiniD14.svg Dih7

Order 15: 1 Group

Abelian Cyclic Simple
GroupDiagramMiniC15.svg ℤ15 = ℤ5 × 3 Y Y

Order 16: 14 Groups

Abelian Cyclic Simple
GroupDiagramMiniC16.svg ℤ16 Y Y
GroupDiagramC2C8.svg ℤ8 × 2 Y
GroupDiagramMiniC4x2.svg ℤ42 Y
GroupDiagramMiniC2x2C4.svg ℤ4 × 22 Y
GroupDiagramMiniC2x4.svg ℤ24 Y
GroupDiagramMiniD16.svg Dih8
GroupDiagramMiniQ16.svg Dic4
GroupDiagramMiniC2D8.svg Dih4 × 2
GroupDiagramMiniC2Q8.svg Dic2 × 2
GroupDiagramMiniG44.svg K4 ⋊ 4
GroupDiagramMinix3.svg ℤ4 ⋊ 4
GroupDiagramMOD16.svg ℤ8 ⋊ 2
GroupDiagramMiniC2x2C4.svg (4 × 2) ⋊ 2
GroupDiagramMiniQH16.svg QD16

Order 17: 1 Group

Abelian Cyclic Simple
GroupDiagramMiniC17.svg ℤ17 Y Y Y

Order 18: 5 Groups

Abelian Cyclic Simple
GroupDiagramMiniC18.svg ℤ18 = 9 × 2 Y Y
GroupDiagramMiniC3C6.png ℤ6 × 3 = 32 × 2 Y
GroupDiagramMiniD18.png Dih9
GroupDiagramMiniC3D6.png S3 × 3
GroupDiagramMiniG18-4.png (3 × 3) ⋊ 2

Order 19: 1 Group

Abelian Cyclic Simple
GroupDiagramMiniC19.svg ℤ19 Y Y Y

Order 20: 5 Groups

Abelian Cyclic Simple
GroupDiagramMiniC20.svg ℤ20 = 5 × 4 Y Y
GroupDiagramMiniC2C10.png ℤ10 × 2 = 5 × 3 × 2 Y
GroupDiagramMiniD20.png Dih10 = Dih5 × 2
GroupDiagramMiniQ20.png Dic5
GroupDiagramMiniC5semiprodC4.png ℤ5 ⋊ 4

Order 21: 2 Groups

Abelian Cyclic Simple
GroupDiagramMiniC21.svg ℤ20 = 7 × 3 Y Y
Frob21 cycle graph.svg ℤ7 ⋊ 3

Order 22: 2 Groups

Abelian Cyclic Simple
GroupDiagramMiniC22.svg ℤ22 = 11 × 2 Y Y
 Dih11

Order 23: 1 Group

Abelian Cyclic Simple
GroupDiagramMiniC23.svg ℤ23 Y Y Y

Order 24: 14 Groups

Abelian Cyclic Simple
GroupDiagramMiniC24.svg ℤ24 = 8 × 3 Y Y
12 × 2 = 6 × 4 Y
6 × 22 = 3 × 23 Y
Dih12
Dih6 × 2
Dih4 × 3
Symmetric group 4; cycle graph.svg S4
S3 × 4
GroupDiagramMiniA4xC2.png A4 × 2
GroupDiagramMiniQ24.png Dic6
Dic3 × 2
Dic2 × 3
3 ⋊ 8
3 ⋊ Dih4
SL(2,3); Cycle graph.svg SL(2,3) = 2T

Order 25: 2 Groups

Abelian Cyclic Simple
25 Y Y
52 Y

Order 26: 2 Groups

Abelian Cyclic Simple
26 = 13 × 2 Y Y
Dih13

Order 27: 5 Groups

Abelian Cyclic Simple
27 Y Y
9 × 3 Y
33 Y
32 ⋊ 3
9 ⋊ 3

Order 28: 3 Groups

Abelian Cyclic Simple
28 = 7 × 4 Y Y
14 × 2 = 7 × 22 Y
Dih14
Z7 ⋊ Z4

Order 29: 1 Group

Abelian Cyclic Simple
29 Y Y Y

Order 30: 4 Groups

Abelian Cyclic Simple
30 = 15 × 2 = 10 × 3 = 6 × 5 Y Y
Dih15
Dih5 × 3
S3 × 5

Order 31: 1 Group

(Order p)

Abelian Cyclic Simple
31 Y Y Y

Order 32: 51 Groups

I’m not going to list these all here, but seven of them are abelian.

(All cycle graphs stolen from Wikipedia)

Notable Groups

A5: Smallest simple group that isn’t cyclic. Order 60.

22: Smallest non-cyclic group.

Dih3: Smallest non-abelian group.

A4: First example where n divides |G| but there is no element of order n. (n = 6)

Z7 ⋊ Z3: Smallest non-abelian group of odd size.

Dih3: Smallest group with a normal subgroup that isn’t isomorphic to one of its subgroups.

General Patterns

1 group of order p
2 groups of order p2
5 groups of order p3
15 groups of order p4, for p > 2
2 groups of order pq for q-1 divisible by p
1 group of order pq for q-1 not divisible by p

Some Group Factoids

Groups of Size 2n

Most finite groups are size 2n. For example, over 99% of all groups of size 2000 or less are size 1024.

How to Decompose a Group Into a Direct Product

If H and K are subgroups of G with trivial intersection, HK = G, and all elements of H commute with all elements of K, then G is isomorphic to H × K.

Kernel of Homomorphism Is Normal Subgroup

If φ: G → G’ is a homomorphism, then ker(φ) = φ-1(eG’) is a normal subgroup of G.

A homomorphism is a function from one group G to another group G’ that maintains the group structure: φ(ab) = φ(a)•φ(b). It’s similar to an isomorphism, but is not required to be one-to-one or onto (so multiple elements of G can be mapped to the same element of G’). The kernel of φ is the set of elements in G that are sent by φ to the identity in G’.

Cayley’s Theorem

Every group of size n is a subgroup of Sn, the the set of all permutations of n items. In addition, Sn contains Sm for all m < n.

This means that if you want to all groups of size 2000 or less, you just need to look at the structure of S2000 (which has 2000! elements). In some sense, then, the study of finite groups is just the study of the structure of large permutation groups.

Group Presentation Undecidability

A group presentation looks like S|R, where S is a set of generators for the group, and R is a set of relations among the generators.

For example: a | an = e is the cyclic group of size n (Zn). Another example: r,f | rn = e, f2 = e, fr = rn-1f is the dihedral group of size 2n (the set of symmetries of a regular n-gon).

A finite presentation for a group is just a presentation where S and R are both finite. All finite groups have a finite presentation (easy proof: let S contain all the members of the group, and R be its Cayley table), as do some infinite groups (for example, a | ∅⟩, the group generated by a with no constraints).

Now, you can ask whether there exists an algorithm that takes as input a finite presentation, and figures out whether the group it represents has a certain property (like, is it the trivial group? Is it a finite group? Is it abelian? Is it simple? And so on.) It turns out that for each of these, the answer is no! In addition, the question of if two arbitrary words (e.g. for Dihn, frfrfr and rfrrf) represent the same element is undecidable. So is the question of if two finite presentations represent isomorphic groups.

In general, for pretty much any non-trivial property of a group (a property that isn’t shared by all groups), there is no general algorithm that decides whether a finite presentation has that property!

Group Theory Toolkit

The Toolkit

  1. The Group Axioms
    1. Closure: a,b ∈ G, a•b ∈ G
    2. Associativity: a,b,c ∈ G, (a•b)•c = a•(b•c)
    3. Identity: ∈ G s.t. ∈ G, e•a = a•e = a
    4. Inverses: ∈ G, a’ ∈ G s.t. a’•a = e
  2. Cancellation Rule
    (a•b = a•c) (b = c)
  3. Order Doesn’t Matter
    You can write a Cayley table with the elements in any order that’s convenient.
  4. Sudoku Principle
    Each row and column in the Cayley table for a group G contains every element in G exactly once.
  5. Lagrange’s Theorem
    H ≤ G |G| is divisible by |H|
  6. Elements Generate Subgroups
    ∈ G, <g> = {gk for all integer k} ≤ G.
    (|<g>| = n) (gn = e).
  7. Cauchy’s Theorem
    (|G| = pk for prime p) (∈ G s.t. gp = e)
  8. Classification of Finite Abelian Groups
    Every finite abelian group is isomorphic to Zn or direct products of Zn
  9. Orbit-Stabilizer Theorem
    |orb(x)||stab(x)| = |G|
  10. Partition Equation
    ∑|orb(x)| = |X|. The sum is taken over one representative for each orbit.
  11. Center is a Normal Subgroup
    Z(G), the center of G, is a normal subgroup. So G/Z is a group.
  12. Class Equation
    |G| = |Z| + ∑|Cl(x)|, where all terms divide |G| and each |Cl(x)| is neither 1 nor |G|. The sum is taken over non-central representatives of conjugacy classes.
  13. G/Z is Cyclic Implies G is Abelian
    G/Z is cyclic  G is abelian
  14. Size of Product of Subgroups
    |AB| = |A| |B| / |A B|
  15. Sylow’s First Theorem
    If |G| = pnk for prime p and k not divisible by p, then G has a subgroup of size pn.
  16. Sylow’s Third Theorem
    If |G| = pnk for prime p and k not divisible by p, then the number of subgroups of size pn must be 1 mod p, and must divide k.

Exercises

Example 1: Solving the group of size 3

Suppose |G| = 3. By the Identity Axiom (1.3), G has an identity element e. Call the other two elements a and b. So G = {e, a, b}. Let’s write out the Cayley table for the group:

e a b
e e a b
a a
b b

By the Cancellation Rule (2), a•a ≠ a. By Lagrange’s Theorem (5), a•a ≠ e, as then the set {e, a} would be a subgroup of size 2, which doesn’t divide 3. Therefore by Closure (1.1), a•a = b.

e a b
e e a b
a a b
b b

We can fill out the rest of the Cayley table by the Sudoku principle (4).

e a b
e e a b
a a b e
b b e a

Example 2: Solving the group of size 5

e a b c d
e e a b c d
a a
b b
c c
d d

If a•a = e, then {e,a} would be a subgroup of size 2. By Lagrange’s theorem, then, a•a ≠ e. Also, by the Sudoku principle, a•a ≠ a. Thus we can choose the order of the rows/columns such that a•a = b (3).

e a b c d
e e a b c d
a a b
b b
c c
d d

Now look at a•b = a•(a•a) = a3. If a3 = e, then <a> would be a subgroup of size 3 (by 6). 3 doesn’t divide 5, so a•b ≠ e (Lagrange’s theorem again). By the Sudoku principle, a•b also isn’t a or b. So we can arrange the Cayley table so that a•b = c. Since a•b = a3 =  b•a, b•a = c as well.

e a b c d
e e a b c d
a a b c
b b c
c c
d d

Now we can use the Sudoku principle to fill out the rest!

e a b c d
e e a b c d
a a b c d e
b b c d e a
c c d e a b
d d e a b c

Example 3: Solving all groups of prime order

Suppose |G| = p, where p is some prime number. By Lagrange’s theorem every subgroup of G has either size 1 (the trivial subgroup {e}) or size p (G itself). Since the set generated by any element is a subgroup, any element g in G must have the property that |<g>| = 1 or p.

Choose g to be a non-identity element. <g> includes e and g, so |<g>| ≥ 2. So |<g>| = p. So <g> = G. So G is a cyclic group (generated by one of its elements). This fully specifies every entry in the Cayley table (gn•gm = gn+m), so there is only one group of size p up to isomorphism.

Example 4: Solving size 6 groups

Suppose |G| = 6.

By Cauchy’s theorem (7), there exists at least one element of order 2 and at least one element of order 3. Name the order-2 element a and the order-3 b. Fill in the Cayley table accordingly:

e a b ? ? ?
e e a b
a a e
b b
?
?
?

Suppose b2 = a. Then (b2)2 = a2 = e, so b is order 4. But b is order 3. So b2 ≠ a. Also, b2 ≠ e for the same reason, and ≠ b by the Cancellation Rule . So b2 is something new, which we’ll put in our next column:

e a b b2 ? ?
e e a b b2
a a e
b b b2
b2 b2
?
?

b3 = e, so we can fill out b • b2 and b2 • b, as well as b2 • b2 = b3 • b = b

e a b b2 ? ?
e e a b b2
a a e
b b b2 e
b2 b2 e b
?
?

By the Sudoku principle, we can clearly see that a • b is none of e, a, b, or b2. So whatever it is, we’ll put it in the next column as “ab”:

e a b b2 ab ?
e e a b b2 ab
a a e ab
b b b2 e
b2 b2 e b
ab ab
?

Looking at a•b2, we can again see that it is none of the elements we’ve identified so far. So whatever it is, we’ll put it in the next column as “ab2”.

e a b b2 ab ab2
e e a b b2 ab ab2
a a e ab ab2
b b b2 e
b2 b2 e b
ab ab
ab2 ab2

Associativity (1.2) allows us to use the algebraic rules a2  = e and b3 = e to fill in a few more spots (like a•ab = a2b = b).

e a b b2 ab ab2
e e a b b2 ab ab2
a a e ab ab2 b b2
b b b2 e
b2 b2 e b
ab ab ab2 a
ab2 ab2 a ab

Now everything else in the table is determined by a single choice: should we assign b•a to ab, or to ab2? Each leads to a Cayley table consistent with the axioms, so we write them both out, using the Sudoku principle to finish each one entirely.

Group 1: (ba = ab)

e a b b2 ab ab2
e e a b b2 ab ab2
a a e ab ab2 b b2
b b ab b2 e ab2 a
b2 b2 ab2 e b a ab
ab ab b ab2 a b2 e
ab2 ab2 b2 a ab e b

Group 2: (ba = ab2)

e a b b2 ab ab2
e e a b b2 ab ab2
a a e ab ab2 b b2
b b ab2 b2 e a ab
b2 b2 ab e b ab2 a
ab ab b2 ab2 a e b
ab2 ab2 b a ab b2 e

These two groups are clearly not isomorphic, as one is commutative and the other not. This proves that there are two groups of size 6 up to isomorphism!

Example 5: Solving groups of prime order, again

Suppose |G| = p, where p is some prime number. The center of G, Z(G), is a subgroup of G (by 11). So by Lagrange’s theorem, |Z| = 1 or p.

By the Class Equation (12), |G| = |Z| + ∑|Cl(x)|. All terms divide p and each |Cl(x)| ≠ 1, so we have p = |Z| + ∑p. This is only possible if |Z| = 0 or there are no non-central elements. Z contains e, so |Z| ≠ 0. So there are no non-central elements. Thus G = Z, which is to say G is abelian.

By the Classification of Finite Abelian Groups (8), G is isomorphic to Zp.

Example 6: Prime power groups have non-trivial centers

Suppose |G| = pn, where p is some prime number and n is some positive integer. The Class Equation tells us that |G| = |Z| + ∑|Cl(x)|, where |Z| = 1, p, p2, …, or pn and each |Cl(x)| = p, p2, …, or pn-1. Taking the Class Equation modulo p we find that |Z| = 0 (mod p), so |Z| cannot be 1. So G has a non-trivial center!

Example 7: Solving groups of prime-squared order

Suppose |G| = p2, where p is some prime number. The Class Equation tells us that |G| = |Z| + ∑|Cl(x)|, where |Z| = 1, p, or p2 and each |Cl(x)| = p. So we have p2 = |Z| + ∑p. Taking this equation mod p we find that |Z| = 0 (mod p), so |Z| is non-trivial. (This is just a special case of Example 6 above.) So |Z| = p or p2.

Suppose |Z| = p. Then |G/Z| = p2/p = p. So G/Z is cyclic. But then G is abelian (by 13). So G = Z, which implies that |Z| = p2. Contradiction. So |Z| = p2, i.e. G is abelian.

By the classification of finite abelian groups we see that there are two possibilities: G is isomorphic to integers mod p2 or G is isomorphic to Zp x Zp. These are the only two groups (up to isomorphism) of size p2.

Example 8: Proving Cauchy’s Theorem

Suppose |G| = n, where n is divisible by some prime p.

Let X be the subset of p-tuples of elements from G whose product is e. That is, X = {(g1,g2,…,gp) Gp s.t. g1g2…gp = e}. Notice that the number of elements in X is |G|p-1 (there are |G| choices for each of the first p-1 elements, and then the final one is fully determined by the constraint). Let the integers mod p (Zp) act on X by the following rule: the action of 1 on an element of X is 1 • (g1,g2,…,gp) = (gp,g1,g2,…,gp-1). (From this we can deduce the action of all the other integers mod p.)

By the Orbit Stabilizer Theorem (9), the possible orbit sizes for this action must divide the size of the cyclic group. That is, for every x X, |orb(x)| = 1 or p. Let r be the number of orbits of size 1 and s be the number of orbits of size p. Then by the Partition Equation (10), we have r + sp = |X| = |G|p-1. By our starting assumption, we have that the right hand side is divisible by p. And since sp is divisible by p, r must also be divisible by p. That is, r = 0, or p, or 2p, and so on.

Notice that the p-tuple (e,e,…,e) is in X, and that its orbit size is 1. So r is at least 1. So r = p, or 2p, and so on. That means that there is at least one more element of X with orbit size 1. This element must look like (a,a,…,a) for some a G. And since it’s in X, it must satisfy our constraint, namely: ap = e! So there is at least one non-identity element of order p.

Example 9: Subgroups of Group of Size 20

Suppose |G| = 10 = 22*5

By Cauchy’s Theorem and Sylow’s First Theorem (15), we know there must be subgroups of size 2, 4, and 5. We can learn more about the number of subgroups of sizes 4 and 5 using Sylow’s Third Theorem (16).

If N is the number of subgroups of size 4, then N divides 5 (1 or 5), and N = 1 (mod 2). So N = 1 or 5.

If N is the number of subgroups of size 5, then N divides 4 (1, 2, 4), and N = 1 (mod 5). So N = 1.

Example 10: Subgroups of Group of Size p2q

Suppose |G| = p2q, for primes p and q where 2 < p < q and q-1 is not a multiple of p.

Then by (15) we have subgroups of size p2 and q. Again, we use (16) to learn more about these subgroups.

If N is the number of subgroups of size p2, then N divides q (1 or q), and N = 1 (mod p). q ≠ 1 (mod p), N cannot be q. So N = 1.

If N is the number of subgroups of size q, then N divides p2 (1, p, or p2), and N = 1 (mod q) (so N is 1, or q+1, or 2q+1, and so on). But p is smaller than q, so N can’t be p. And if N = p2, then p2 – 1 must be a multiple of q. But p2 – 1 = (p+1)(p-1). Both of these values are too small to contain factors of q. So their product cannot be a multiple of q. So N ≠ p2 either. So N = 1!

Example 11: Subgroups of Group of Size pqn

Suppose |G| = pqn, where p and q are primes and 2 < p < q.

Using Sylow III (16): If N is the number of subgroups of size qn, then N divides p (1 or p) and N = 1 (mod q). p is too small to be a multiple of q plus 1, So N = 1.

Example 12: Classifying Groups of Size 2p

Suppose |G| = 2p for some prime p > 2. Let’s use everything in our toolkit to fully classify the possibilities!

By Cauchy, we have elements of of order 2 and p. Let a be an element of order 2 and b an element of order p. Let A = <a> = {e, a} and B = <b> = {e, b, b2, …, bp-1}. A and B are both cyclic of different orders, so their intersection is trivial. So by (14), |AB| = |A| |B| / |A B| = 2p. So AB = G. This means we can write all the elements of G as follows:

e b b2 bp-1
e e b  b2  bp-1
a a ab ab2 abp-1

So G = {e, b, b2, …, bp-1, a, ab, ab2, …, abp-1}. Call N2 the number of subgroups of size 2 and Np the number of subgroups of size p. Applying Sylow III, it is easy to see that there are two possibilities: (N2 = 1 and Np = 1), or (N2 = p and Np = 1).

Case 1: N2 = p and Np = 1.

Since N2 = p and Np = 1, there are p elements of order 2 (one for each subgroup) and p-1 elements of order p. Adding in e, this accounts for all of G.

Order 1: e
Order p: b, b2, …, bp-1
Order 2: a, ab, ab2, …, abp-1 (the remaining members of G)

Now, since a and b are in G, so must be ba. And since G = AB, ba AB. It’s not e or a power of b (apply the Cancellation Rule), so ba must be abn for some n. This means that ba is order 2, so (ba)2 = e. Expanding and substituting, we get (ba)2 = (ba)(ba) = (ba)(abn) = ba2bn = bn+1 = e = bp.

So n = p – 1, or in other words, ba = abp-1. This actually allows us to fully fill out the rest of the Cayley table, as we can take any expression and move the ‘b’s all the way to the right, ending up with something that looks like anbm.  This is the dihedral group of degree p (the group of symmetries of a regular p-gon)!

Case 2: N2 = 1 and Np = 1.

Order 1: e
Order p: b, b2, …, bp-1
Order 2: a

No other elements can be order 1, 2, or p, so all other elements must be order 2p (think Lagrange’s theorem). For instance, (ab)2p = e and (ab)p ≠ e . You can show that this only works if ab = ba, which allows you to freely move ‘a’s and ‘b’s around in any expression. ((ab)2p = a2p b2p = (a2)p (bp)2 = e, and (ab)p = ap bp = ap = a ≠ e).

Again, we can use this to fill out the entire Cayley table. The group we get is isomorphic to Z2p (the integers mod 2p)!

So if |G| = 2p for some prime p > 2, then G is either the group of symmetries of a regular p-gon, or it’s the integers mod p. Fantastic! If you follow this whole line of reasoning, then I think that you have a good grasp of how to use the toolkit above.

The sweet spot on the structure spectrum

If you’ve looked into abstract algebra at all, you might be intimidated by the variety of strange-sounding names; things like like monoids, magmas, semigroups, loops, fields, and so on. What are all these names about?

In general, the procedure is as follows: Choose some number of sets, equipped with some number of binary operations. Then endow these operations with some general structural constraints (things like commutativity, associativity, and so on). The result is some abstract mathematical object that might be very interesting or just medium interesting, depending on the constraints that you’ve applied. The different subfields within abstract algebra are just the studies of what you get when you apply particular <sets, operations, constraints> triples! Let’s take a little tour of these subfields.

We start with the most basic. The simplest choice you could make for the sets and operations would be one set S with one binary operation • on that set, denoted as (S, •). The most primitive constraint that you can apply to this operation is closure, which is the condition that the output of the binary operation must always be an element of S. Common constraints added on top of this include:

  • Associativity (A): For all a, b, and c in S, (a • b) • c = a • (b • c)
  • Identity (E): there is an element e in S such that for all a in S, e • a = a
  • Invertibility (V): for all a in S, there is an element a’ in S such that a’ • a = e
  • Commutativity (C): for all a and b in S, a • b = b • a

Now we can consider the different types of mathematical objects that arise from different combinations of these constraints!

AEVC = Abelian group
AEV = Group
AEC = Commutative Monoid
AE = Monoid
A = Semigroup
Only closure = Magma

These are just a few examples, and this is all for the case of a single set with a single operation. Two of the biggest objects of study in abstract algebra are rings and fields, both of which involve one set and two binary operations, usually denoted as + and •. The constraints for rings are as follows:

(R, +, •)
(R,+) is an abelian group (AEVC)
(R,•) is a monoid (AE)
Distributivity (of multiplication with respect to addition): a • (b + c) = a•b + a•c

(Rings also have a slightly more bare-bones cousin called rngs, which are basically rings without a multiplicative identity.)

Fields involve even more structure than rings:

(F, +, •)
(F, +) is an abelian group (AEVC) (call its identity 0)
(F \ {0}, •) is an abelian group (AEVC)
Distributivity (of multiplication w.r.t. addition)

There are so many other mathematical objects besides these, each of which has its own field of study. When you survey these fields at a high level of abstraction, an interesting pattern becomes apparent.

Observation 1: More structured objects are simpler and easier to fully characterize.

Think about vector spaces. A vector space has two sets (call them F and V) and four binary operations (two on F x F, one on V x V, and one on F x V). F forms a field with its two operations, and V forms an abelian group with its operation. And there are a few more constraints on the final operation, ensuring that it is well-behaved in a bunch of ways. Overall, vector spaces are highly constrained in many ways. And as such, linear algebra (the study of vector spaces) is a completely solved field. There are pretty much no interesting new discoveries in linear algebra, because there are only a few interesting things you can say about such well-structured objects, and all such things have been quickly discovered. (A quick caveat: linear algebra finds all sorts of applications in more complicated domains of math, many of which have unsolved problems. I’m not considering these problems as part of understanding vector spaces. There are also more applied open questions about, say, how to most efficiently calculate the product of two n x n matrices, which I again do not consider to reflect a lack of understanding of the structure of vector spaces.) And by the way, this is not a knock on linear algebra! I love vector spaces. It’s just that I think there are even more interesting objects out there!

Take another example, fields. Fields are less structured than vectors in a bunch of ways, but all the same, we can fully characterize all finite fields with a few simple equations. The work required to do this is fairly trivial; you can prove the classification theorems in a couple of minutes with barely any background required. For instance, you can quickly prove that all finite fields have sizes that are powers of primes. And furthermore, all finite fields of the same size are isomorphic! So if you were to write out a function f(n) for the number of non-isomorphic fields of size n, it would just be 1 for every prime power and 0 for everything else.

Similarly, compare abelian groups to groups. Abelian groups are strictly more structured objects that groups, being that they are just groups that satisfy the additional constraint of commutativity. They are also significantly less structured than fields, being that they only have to satisfy the constraints for one operation and thus don’t have to worry about distributivity between two operations. It turns out that just like with fields, you can actually classify all finite abelian groups, and write out a (relatively simple) equation for the number of finite abelian groups of size n. This number is the product of the partition numbers of the exponents of each prime in the prime factorization of n. Proving this takes a lot more work than the classification of fields, and the function is clearly much more complicated than the corresponding function for fields. But it’s enormously simpler than classifying all finite groups!

As far as I’m aware, there is no known formula that takes in a number N and returns the number of non-isomorphic groups of size N. Although there are lots of little facts known about the values of this function for particular values of N (for example, for any prime p, f(p) = 1 and f(p^2) = 2), classification in general is enormously difficult. Here are the numbers of finite groups of order N, for small values of N starting at 1.

[1, 1, 1, 2, 1, 2, 1, 5, 2, 2, 1, 5, 1, 2, 1, 14, 1, 5, 1, 5, 2, 2, 1, 15, 2, 2, 5, 4, 1, 4, 1, 51, 1, 2, 1, 14, 1, 2, 2, 14, 1, 6, 1, 4, 2, 2, 1, 52, 2, 5, 1, 5, 1, 15, 2, 13, 2, 2, 1, 13, 1, 2, 4, 267]

(Fun exercise, try seeing how many of these you can prove for yourself!)

Here’s a plot of this sequence, to give you a sense of how complicated and strange it is:

Finite Groups.png

Furthermore, there are bizarre things like the sporadic groups with their stubborn resistance to categorization, and the king of the sporadic groups, the enormous Monster group. All this structure immediately vanishes as soon as you introduce commutativity, leaving you with a much less interesting object!

Observation 2: Highly unstructured objects also become easy to characterize. Think about magmas (M, •), whose only constraint is closure. How many magmas are there of size N? Well, for any two elements in M, there are N choices for the value of their product. So it’s just the number of ways of placing elements from M in N^2 spots, i.e. N^(N^2). Simple! The lack of structure here actually makes classification of the finite magmas extremely quick and easy.

Another example: Take a slightly more structured object, magmas with identity. This basically just fills in 2N – 1 of our slots (for all m, we get that e•m = m and m•e = m), so that our new classification is just N^(N^2 – 2N + 1). What if we add commutativity? Well, that just cuts our degrees of freedom roughly in half, since every time you choose the value of a•b, you have also chosen b•a! (It’s not exactly half, because of the elements that look like a•a).

But if you now add associativity, you end up with a commutative monoid. These objects are vastly more complicated and difficult to characterize. In general, associativity is probably the most interesting single property you can give to an operation (at least, of the commonly used properties we’ve discussed). By constraining an operation to be associative, you automatically endow your mathematical object with a great degree of interesting structure.

The conclusion I’ve drawn from these observations is that there is a sweet spot of subjective “interestingness” that exists between the highly structured and the highly unstructured. I think that objects like groups and rings occupy this sweet spot, and as you go further in either direction of the structure spectrum the objects you find become less interesting and harder to find surprising results about.

Proving Cauchy’s Theorem

In the last post, I linked to a video of me explaining and proving Lagrange’s theorem. This theorem does a lot to empower us to get a deeper understanding of the structure of groups (for instance, allowing us to prove that there is only one single group of size p, where p is any prime number).

This time, we’ll be going a level deeper and proving a much much more powerful theorem: Cauchy’s theorem! While Lagrange’s theorem gives us a necessary condition for the existence of subgroups (that the subgroup’s size divides the group’s size), Cauchy’s theorem gives us a sufficient condition (that the subgroup’s size is a prime factor of the group’s size)!

This is a really powerful tool to add to our toolkit for analyzing groups. It also really nicely complements Lagrange’s theorem, in that Lagrange’s theorem is most powerful when applied to group sizes that are prime or have few prime factors, while Cauchy’s theorem is most powerful when applied to group sizes that are highly composite.

Anyway, here are the links to the two videos in which I go in depth into all of this!

Even though these two videos together are about 35 minutes, the proof itself is actually quite short! Most of the work done in the videos is just building up some powerful concepts (actions, orbits, stabilizers). These concepts allow us to prove the theorem in just a few lines! (In fact, if you look closely at the proof, you might notice a a way to get an even stronger result with just a few more lines!)

Group Theory: Lagrange’s Theorem and the Sudoku Principle

Voltaire said to not let the perfect be the enemy of the good. In that spirit, I’ve decided to post a few (highly imperfect) video lectures I recently on group theory. There’s a lot of things I would do to go back and improve these lectures, but I hope that the content makes up for any flaws!

There will be a third one coming soon, and will be a working-through of an example that shows the power of Lagrange’s theorem. This format is a bit of an experiment for me, and I’ll keep working to make it better. I hope you enjoy!

The Most Irrational of Numbers

Today I have a new interactive demonstration for you!

Controls
Zoom in: W
Zoom out: S
Reset angle: 0
Switch mode (auto or user): Space
On “User Mode”:
– Increase angle: Up
– Decrease angle: Down

[pjs4wp]
void setup() {
size(600,600);
background(0);
stroke(255);
fill(255);
}
float r = 2;
float da = 0;
float dda = .0001;
boolean autoPlay = true;
void draw() {
background(0);
text(“Angle = ” + (float)((int)(da*10000))/100. + “%”, width/2, 20);
text(“Zoom = ” + (float)((int)(r*100))/100., width/2, 35);
if (autoPlay)
text(“Mode: Auto”, width/2, 50);
else
text(“Mode: User”, width/2, 50);
translate(width/2,height/2);
int i = 0;
float angle = 0;
while (width/r > i) {
float x = r*i*cos(angle);
float y = r*i*sin(angle);
ellipse(x,y,5,5);
angle += 2*PI*da;
i++;
}
if (keyPressed) {
if (key == ‘w’)
r *= 1.01;
if (key == ‘s’)
r *= .99;
if (key == ‘0’)
da = 0;
if (!autoPlay) {
if (keyCode == UP)
da += dda;
if (keyCode == DOWN)
da -= dda;
}
}
if (autoPlay)
da += dda;
}
void keyReleased() {
if (key == ‘ ‘)
autoPlay = 1 – autoPlay;
}
[/pjs4wp]

(As always, zooming out too much will make things begin to lag. Also, fair warning, things start getting a little seizure-inducing around a zoom of .5 or less.)

Okay! Pretty, right? But what’s actually going on?

Each frame we are drawing a sequence of dots. We start at the center of the screen, and take constant radial steps outwards. And for each step, we apply some fixed angular rotation. This angular rotation is given as a percentage of 2π on the top of the screen, and is the thing being adjusted frame by frame (either automatically or based on your input, depending on which mode you choose).

What’s cool is what happens when you approach simple rational numbers. Take a look for yourself by going on “User” mode and bringing the angle to 10% (for 1/10), 14.28% (for 1/7), 25% (for 1/4), and so on. Here are two examples:

Screen Shot 2019-07-09 at 11.27.02 PMScreen Shot 2019-07-09 at 11.27.28 PM

Cool, right? At every rational number, we end up with a number of “spokes” corresponding to the denominator of our rational number. People sometimes describe the golden ratio Ф as the “most irrational number”, meaning that in some sense it is the most difficult to approximate by rational numbers. How does this look on our visualization? Check it out!

Screen Shot 2019-07-09 at 11.32.34 PM

And zooming out further, we see…

Screen Shot 2019-07-09 at 11.33.17 PM

We can see that using Ф as our angular rotation gives us the “least spokey” shape possible. As we zoom out further and further, we will always see our points looking basically evenly distributed across the angles at any given radius. This is a wonderfully visual way to understand the irrationality of Ф!

Collatz Tree

For some reason, I have a fascination with all the different sequences of numbers that arise out of thinking about the Collatz conjecture. In this post, we’ll look at a few more of these.

The Collatz chain for a number N is the sequence of numbers that arise from repeatedly iterating the function
Screen Shot 2019-06-16 at 11.09.39 PM
until you get to 1. The Collatz conjecture is the statement that this sequence always terminates, no matter what n you start at.

In other words, f(N) is the function that tells you what N becomes after a single iteration. Let’s look at the inverse of this: what numbers become N after a single iteration?

One such number is 2N: 2N is always even, so after one step it falls down to n. Another possibility is m = (N-1)/3. Why? Well, if m is an odd number, then it will become 3m+1 = 3 (N-1)/3 + 1 = N – 1 + 1 = N. So for each N, we get either one or two new values.

The Collatz tree is what we get if we start at 1 and repeatedly iterate the inverse function. After n iterations, we get to the set of all numbers that take at most n steps to get to 1. Here’s what it looks like!

Screen Shot 2019-06-17 at 6.18.05 PM

I’ve never actually seen it plotted this way, and I think it shows something potentially important. Look at the order in the image! Why do we get this grid of apparently identical rhombuses? I’d love an explanation for this.

I also like the idea of “surfing” the jagged lower end of this graph, staying as small as possible while iterating f. In general, the lower end of the graph is the sequence of smallest numbers that take N steps to get to 1.

Here are the first thirty numbers in this sequence:

1, 2, 4, 8, 16, 5, 10, 3, 6, 12, 24, 48, 17, 34, 11, 22, 7, 14, 28, 9, 18, 36, 72, 25, 49, 98, 33, 65, 130, 43

Plotted, this sequence looks like:

Collatz Minimums

The first twelve of these numbers themselves form a Collatz chain. We leave this particular chain on the thirteenth step, at which point we enter a new chain containing 17. We stay in this new chain until seven steps later, at which point we enter a new chain containing 25. But now we only stay in this new chain for one step before departing for a new one! This is another sequence of numbers that fascinate me: the chain lengths at which our minimum values switch to entirely new chains. Here are the values for up to 100 steps:

12, 23, 24, 26, 27, 34, 36, 37, 38, 39, 44, 45, 46, 48, 49, 50, 51, 52, 53, 54, 55, 56, 60, 61, 63, 64, 65, 72, 73, 74, 75, 76, 97, 98

The differences between these values are plotted here:

Differences

Whenever this plot spikes, this points to an existence of a Collatz chain that surfs the bottom of the Collatz tree for some time before jumping up to a higher value!

Producing all numbers using just four fours

How many of the natural numbers do you think you can produce just by combining four 4s with standard mathematical operations?

The answer might blow your mind a little bit. It’s all of them!

In particular, you can get any natural number by using the symbols ‘4’, ‘√’, ‘log’, and ‘/’. And in particular, you can do it with just four 4s!

Try it for yourself before moving on!

 

 

Continue reading “Producing all numbers using just four fours”