A Talmudic Probability Puzzle

Today we’ll take a little break from the more intense abstract math stuff I’ve been doing, and do a quick dive into a fun probabilistic puzzle I found on the internet.

Background for the puzzle: In Ancient Israel, there was a court of 23 wise men that tried important cases, known as the Sanhedrin. If you were being tried for a crime by the Sanhedrin and a majority of them found you guilty, you were convicted. But there was an interesting twist on this! According to the Talmud (Tractate Sanhedrin: Folio 17a), if the Sanhedrin unanimously found you guilty, you were to be acquitted.

If the Sanhedrin unanimously find [the accused] guilty, he is acquitted. Why? — Because we have learned by tradition that sentence must be postponed till the morrow in hope of finding new points in favour of the defence. But this cannot be anticipated in this case.

Putting aside the dubious logic of this rule, it gives rise to an interesting probability puzzle with a counterintuitive answer. Imagine that an accused murderer has been brought before the Sanhedrin, and that the evidence is strong enough that no judge has any doubt in their mind about his guilt. Each judge obviously wants for the murderer to be convicted, and would ordinarily vote to convict. But under this Talmudic rule, they need to be worried about the prospect of them all voting guilty and therefore letting him off scot-free!

So: If a probability p can be chosen such that each and every judge votes to convict with probability p, and to acquit with probability 1 – p, which p will give them the highest probability of ultimately convicting the guilty man?

Furthermore, imagine that the number of judges is not 23, but some arbitrarily high number. As the number of judges goes to infinity, what does p converge to?

I want you to think about this for a minute and test your intuitions before moving on.

 

(…)

 

 

(…)

 

 

(…)

 

So, it turns out that the optimal p for 23 judges is actually ≈ 75.3%. And as the number of judges goes to infinity? The optimal value of p converges to…

80%!

This was a big shock to me. I think the natural first thought is that when you have thousands and thousands of judges, you only need a minuscule chance for any one judge to vote ‘acquit’ in order to ensure a majority and prevent him from getting off free. So I initially guessed that p would be something like 99%, and would converge to 100% in the limit of infinite judges.

But this is wrong! And of the small sample of mathematically gifted friends I asked this question to, they mostly guessed the same as me.

There’s clearly a balance going on between the risk of a minority voting to convict and the risk of a unanimous vote to convict. For small p, the first of these is ~1 and the second is ~0, and for p ~ 1, the first is ~0 and the second ~1.  It seems that we are naturally underemphasizing the danger of a minority vote to convict, and overemphasizing the danger of the unanimous vote.

Here are some plots of the various relevant values, for different numbers of judges:

10 Judges23 Judges100 Judges150 Judges

One thing to notice is that as the number of judges gets larger, the graph’s peak becomes more and more of a plateau. And in the limit of infinite judges, you can show that the graph is actually just a simple step function: Pr(conviction) = 0 if p < .5, and 1 if p > .5. This means that while yes, technically, 80% is the optimal value, you can do pretty much equally well by choosing any value of p greater than 50%.

My challenge to you is to come up with some justification for the value 80%. Good luck!

Some Curious Group Isomorphisms

Curiosity Number One

The additive group of polynomials with integer coefficients is isomorphic to the multiplicative group of positive rational numbers: ([x], +) (ℚ+, ).

Any element of [x] can be written like a0 + a1x + a2x2 + … + anxn, where all coefficients a0, a1, …, an are integers. Consider the following mapping: φ: (a0 + a1x + a2x2 + … + anxn) (p0a0p1a1pnan), where pk refers to the kth prime number. Now, this is a homomorphism from [x] to ℚ+ because φ(p(x) + q(x)) = φ(p(x))φ(q(x)). You can also easily show that it is onto and one-to-one, using the uniqueness of prime factorization. Therefore φ is an isomorphism!

I think this is a good example to test the intuitive notion that once one “knows” a group, one also knows all groups that it is isomorphic to. It’s often useful to think of isomorphic groups as being like one book written in two different languages, with the isomorphism being the translation dictionary. But the non-obviousness of the isomorphism here makes it feel like the two groups in fact have a considerably different internal structure.

Curiosity Number Two

Your friend comes up to you and tells you, “I’m thinking of some finite group. All I’ll tell you is that it has at least two order-2 elements. I’ll give you $20 if you can tell me what the group that’s generated by these two elements is (up to isomorphism)!“

What do you think your chances are of getting this $20 reward? It seems intuitively like your chances are probably pretty dismal… there are probably hundreds or maybe even an infinity of groups that match her description.

But it turns out that you can say exactly what group your friend is thinking of, just from this information!

Call the group G. G is finite and has two order-2 elements, which we’ll call a and b. So we want to find out what <a,b> is. Define r = ab and n = |r|.  Then ara = a(ab)a = (aa)ba = ba = (ab)-1 = r-1 = rn-1. So ara = rn-1, or in other words ar = rn-1a. Also, since b = ar, we can write the group we’re looking for either as <a,b> or as <a,r> (these two are equal).

Thus the group we’re looking for can be described as follows:

<a, r | a2 = rn = e, ar = rn-1a>.

Look familiar? This is just the dihedral group of size 2n; it is the group of symmetries of a regular n-gon, where a is a flip and r is a rotation!

So if G is a finite group with at least two order-2 elements a and b, then <a, b> Dih|ab|.

This serves to explain the ubiquity and importance of the dihedral groups! Any time a group contains more than one order-two element, it must have a dihedral group of some order as a subgroup.

Curiosity Number Three

If K  H G, then H/K G/K and (G/K) / (H/K) ≅ G/H.

(G/K) / (H/K) is a pretty wild object; it is a group of cosets, whose representatives are themselves cosets. But this theorem allows us to treat it as a much less exotic object, an ordinary quotient group with representatives as elements from G.

I don’t have much else to say about this one, besides that I love how the operation of forming a quotient group behaves so much like ordinary division!

Group Theory Toolkit

The Toolkit

  1. The Group Axioms
    1. Closure: a,b ∈ G, a•b ∈ G
    2. Associativity: a,b,c ∈ G, (a•b)•c = a•(b•c)
    3. Identity: ∈ G s.t. ∈ G, e•a = a•e = a
    4. Inverses: ∈ G, a’ ∈ G s.t. a’•a = e
  2. Cancellation Rule
    (a•b = a•c) (b = c)
  3. Order Doesn’t Matter
    You can write a Cayley table with the elements in any order that’s convenient.
  4. Sudoku Principle
    Each row and column in the Cayley table for a group G contains every element in G exactly once.
  5. Lagrange’s Theorem
    H ≤ G |G| is divisible by |H|
  6. Elements Generate Subgroups
    ∈ G, <g> = {gk for all integer k} ≤ G.
    (|<g>| = n) (gn = e).
  7. Cauchy’s Theorem
    (|G| = pk for prime p) (∈ G s.t. gp = e)
  8. Classification of Finite Abelian Groups
    Every finite abelian group is isomorphic to Zn or direct products of Zn
  9. Orbit-Stabilizer Theorem
    |orb(x)||stab(x)| = |G|
  10. Partition Equation
    ∑|orb(x)| = |X|. The sum is taken over one representative for each orbit.
  11. Center is a Normal Subgroup
    Z(G), the center of G, is a normal subgroup. So G/Z is a group.
  12. Class Equation
    |G| = |Z| + ∑|Cl(x)|, where all terms divide |G| and each |Cl(x)| is neither 1 nor |G|. The sum is taken over non-central representatives of conjugacy classes.
  13. G/Z is Cyclic Implies G is Abelian
    G/Z is cyclic  G is abelian
  14. Size of Product of Subgroups
    |AB| = |A| |B| / |A B|
  15. Sylow’s First Theorem
    If |G| = pnk for prime p and k not divisible by p, then G has a subgroup of size pn.
  16. Sylow’s Third Theorem
    If |G| = pnk for prime p and k not divisible by p, then the number of subgroups of size pn must be 1 mod p, and must divide k.

Exercises

Example 1: Solving the group of size 3

Suppose |G| = 3. By the Identity Axiom (1.3), G has an identity element e. Call the other two elements a and b. So G = {e, a, b}. Let’s write out the Cayley table for the group:

e a b
e e a b
a a
b b

By the Cancellation Rule (2), a•a ≠ a. By Lagrange’s Theorem (5), a•a ≠ e, as then the set {e, a} would be a subgroup of size 2, which doesn’t divide 3. Therefore by Closure (1.1), a•a = b.

e a b
e e a b
a a b
b b

We can fill out the rest of the Cayley table by the Sudoku principle (4).

e a b
e e a b
a a b e
b b e a

Example 2: Solving the group of size 5

e a b c d
e e a b c d
a a
b b
c c
d d

If a•a = e, then {e,a} would be a subgroup of size 2. By Lagrange’s theorem, then, a•a ≠ e. Also, by the Sudoku principle, a•a ≠ a. Thus we can choose the order of the rows/columns such that a•a = b (3).

e a b c d
e e a b c d
a a b
b b
c c
d d

Now look at a•b = a•(a•a) = a3. If a3 = e, then <a> would be a subgroup of size 3 (by 6). 3 doesn’t divide 5, so a•b ≠ e (Lagrange’s theorem again). By the Sudoku principle, a•b also isn’t a or b. So we can arrange the Cayley table so that a•b = c. Since a•b = a3 =  b•a, b•a = c as well.

e a b c d
e e a b c d
a a b c
b b c
c c
d d

Now we can use the Sudoku principle to fill out the rest!

e a b c d
e e a b c d
a a b c d e
b b c d e a
c c d e a b
d d e a b c

Example 3: Solving all groups of prime order

Suppose |G| = p, where p is some prime number. By Lagrange’s theorem every subgroup of G has either size 1 (the trivial subgroup {e}) or size p (G itself). Since the set generated by any element is a subgroup, any element g in G must have the property that |<g>| = 1 or p.

Choose g to be a non-identity element. <g> includes e and g, so |<g>| ≥ 2. So |<g>| = p. So <g> = G. So G is a cyclic group (generated by one of its elements). This fully specifies every entry in the Cayley table (gn•gm = gn+m), so there is only one group of size p up to isomorphism.

Example 4: Solving size 6 groups

Suppose |G| = 6.

By Cauchy’s theorem (7), there exists at least one element of order 2 and at least one element of order 3. Name the order-2 element a and the order-3 b. Fill in the Cayley table accordingly:

e a b ? ? ?
e e a b
a a e
b b
?
?
?

Suppose b2 = a. Then (b2)2 = a2 = e, so b is order 4. But b is order 3. So b2 ≠ a. Also, b2 ≠ e for the same reason, and ≠ b by the Cancellation Rule . So b2 is something new, which we’ll put in our next column:

e a b b2 ? ?
e e a b b2
a a e
b b b2
b2 b2
?
?

b3 = e, so we can fill out b • b2 and b2 • b, as well as b2 • b2 = b3 • b = b

e a b b2 ? ?
e e a b b2
a a e
b b b2 e
b2 b2 e b
?
?

By the Sudoku principle, we can clearly see that a • b is none of e, a, b, or b2. So whatever it is, we’ll put it in the next column as “ab”:

e a b b2 ab ?
e e a b b2 ab
a a e ab
b b b2 e
b2 b2 e b
ab ab
?

Looking at a•b2, we can again see that it is none of the elements we’ve identified so far. So whatever it is, we’ll put it in the next column as “ab2”.

e a b b2 ab ab2
e e a b b2 ab ab2
a a e ab ab2
b b b2 e
b2 b2 e b
ab ab
ab2 ab2

Associativity (1.2) allows us to use the algebraic rules a2  = e and b3 = e to fill in a few more spots (like a•ab = a2b = b).

e a b b2 ab ab2
e e a b b2 ab ab2
a a e ab ab2 b b2
b b b2 e
b2 b2 e b
ab ab ab2 a
ab2 ab2 a ab

Now everything else in the table is determined by a single choice: should we assign b•a to ab, or to ab2? Each leads to a Cayley table consistent with the axioms, so we write them both out, using the Sudoku principle to finish each one entirely.

Group 1: (ba = ab)

e a b b2 ab ab2
e e a b b2 ab ab2
a a e ab ab2 b b2
b b ab b2 e ab2 a
b2 b2 ab2 e b a ab
ab ab b ab2 a b2 e
ab2 ab2 b2 a ab e b

Group 2: (ba = ab2)

e a b b2 ab ab2
e e a b b2 ab ab2
a a e ab ab2 b b2
b b ab2 b2 e a ab
b2 b2 ab e b ab2 a
ab ab b2 ab2 a e b
ab2 ab2 b a ab b2 e

These two groups are clearly not isomorphic, as one is commutative and the other not. This proves that there are two groups of size 6 up to isomorphism!

Example 5: Solving groups of prime order, again

Suppose |G| = p, where p is some prime number. The center of G, Z(G), is a subgroup of G (by 11). So by Lagrange’s theorem, |Z| = 1 or p.

By the Class Equation (12), |G| = |Z| + ∑|Cl(x)|. All terms divide p and each |Cl(x)| ≠ 1, so we have p = |Z| + ∑p. This is only possible if |Z| = 0 or there are no non-central elements. Z contains e, so |Z| ≠ 0. So there are no non-central elements. Thus G = Z, which is to say G is abelian.

By the Classification of Finite Abelian Groups (8), G is isomorphic to Zp.

Example 6: Prime power groups have non-trivial centers

Suppose |G| = pn, where p is some prime number and n is some positive integer. The Class Equation tells us that |G| = |Z| + ∑|Cl(x)|, where |Z| = 1, p, p2, …, or pn and each |Cl(x)| = p, p2, …, or pn-1. Taking the Class Equation modulo p we find that |Z| = 0 (mod p), so |Z| cannot be 1. So G has a non-trivial center!

Example 7: Solving groups of prime-squared order

Suppose |G| = p2, where p is some prime number. The Class Equation tells us that |G| = |Z| + ∑|Cl(x)|, where |Z| = 1, p, or p2 and each |Cl(x)| = p. So we have p2 = |Z| + ∑p. Taking this equation mod p we find that |Z| = 0 (mod p), so |Z| is non-trivial. (This is just a special case of Example 6 above.) So |Z| = p or p2.

Suppose |Z| = p. Then |G/Z| = p2/p = p. So G/Z is cyclic. But then G is abelian (by 13). So G = Z, which implies that |Z| = p2. Contradiction. So |Z| = p2, i.e. G is abelian.

By the classification of finite abelian groups we see that there are two possibilities: G is isomorphic to integers mod p2 or G is isomorphic to Zp x Zp. These are the only two groups (up to isomorphism) of size p2.

Example 8: Proving Cauchy’s Theorem

Suppose |G| = n, where n is divisible by some prime p.

Let X be the subset of p-tuples of elements from G whose product is e. That is, X = {(g1,g2,…,gp) Gp s.t. g1g2…gp = e}. Notice that the number of elements in X is |G|p-1 (there are |G| choices for each of the first p-1 elements, and then the final one is fully determined by the constraint). Let the integers mod p (Zp) act on X by the following rule: the action of 1 on an element of X is 1 • (g1,g2,…,gp) = (gp,g1,g2,…,gp-1). (From this we can deduce the action of all the other integers mod p.)

By the Orbit Stabilizer Theorem (9), the possible orbit sizes for this action must divide the size of the cyclic group. That is, for every x X, |orb(x)| = 1 or p. Let r be the number of orbits of size 1 and s be the number of orbits of size p. Then by the Partition Equation (10), we have r + sp = |X| = |G|p-1. By our starting assumption, we have that the right hand side is divisible by p. And since sp is divisible by p, r must also be divisible by p. That is, r = 0, or p, or 2p, and so on.

Notice that the p-tuple (e,e,…,e) is in X, and that its orbit size is 1. So r is at least 1. So r = p, or 2p, and so on. That means that there is at least one more element of X with orbit size 1. This element must look like (a,a,…,a) for some a G. And since it’s in X, it must satisfy our constraint, namely: ap = e! So there is at least one non-identity element of order p.

Example 9: Subgroups of Group of Size 20

Suppose |G| = 10 = 22*5

By Cauchy’s Theorem and Sylow’s First Theorem (15), we know there must be subgroups of size 2, 4, and 5. We can learn more about the number of subgroups of sizes 4 and 5 using Sylow’s Third Theorem (16).

If N is the number of subgroups of size 4, then N divides 5 (1 or 5), and N = 1 (mod 2). So N = 1 or 5.

If N is the number of subgroups of size 5, then N divides 4 (1, 2, 4), and N = 1 (mod 5). So N = 1.

Example 10: Subgroups of Group of Size p2q

Suppose |G| = p2q, for primes p and q where 2 < p < q and q-1 is not a multiple of p.

Then by (15) we have subgroups of size p2 and q. Again, we use (16) to learn more about these subgroups.

If N is the number of subgroups of size p2, then N divides q (1 or q), and N = 1 (mod p). q ≠ 1 (mod p), N cannot be q. So N = 1.

If N is the number of subgroups of size q, then N divides p2 (1, p, or p2), and N = 1 (mod q) (so N is 1, or q+1, or 2q+1, and so on). But p is smaller than q, so N can’t be p. And if N = p2, then p2 – 1 must be a multiple of q. But p2 – 1 = (p+1)(p-1). Both of these values are too small to contain factors of q. So their product cannot be a multiple of q. So N ≠ p2 either. So N = 1!

Example 11: Subgroups of Group of Size pqn

Suppose |G| = pqn, where p and q are primes and 2 < p < q.

Using Sylow III (16): If N is the number of subgroups of size qn, then N divides p (1 or p) and N = 1 (mod q). p is too small to be a multiple of q plus 1, So N = 1.

Example 12: Classifying Groups of Size 2p

Suppose |G| = 2p for some prime p > 2. Let’s use everything in our toolkit to fully classify the possibilities!

By Cauchy, we have elements of of order 2 and p. Let a be an element of order 2 and b an element of order p. Let A = <a> = {e, a} and B = <b> = {e, b, b2, …, bp-1}. A and B are both cyclic of different orders, so their intersection is trivial. So by (14), |AB| = |A| |B| / |A B| = 2p. So AB = G. This means we can write all the elements of G as follows:

e b b2 bp-1
e e b  b2  bp-1
a a ab ab2 abp-1

So G = {e, b, b2, …, bp-1, a, ab, ab2, …, abp-1}. Call N2 the number of subgroups of size 2 and Np the number of subgroups of size p. Applying Sylow III, it is easy to see that there are two possibilities: (N2 = 1 and Np = 1), or (N2 = p and Np = 1).

Case 1: N2 = p and Np = 1.

Since N2 = p and Np = 1, there are p elements of order 2 (one for each subgroup) and p-1 elements of order p. Adding in e, this accounts for all of G.

Order 1: e
Order p: b, b2, …, bp-1
Order 2: a, ab, ab2, …, abp-1 (the remaining members of G)

Now, since a and b are in G, so must be ba. And since G = AB, ba AB. It’s not e or a power of b (apply the Cancellation Rule), so ba must be abn for some n. This means that ba is order 2, so (ba)2 = e. Expanding and substituting, we get (ba)2 = (ba)(ba) = (ba)(abn) = ba2bn = bn+1 = e = bp.

So n = p – 1, or in other words, ba = abp-1. This actually allows us to fully fill out the rest of the Cayley table, as we can take any expression and move the ‘b’s all the way to the right, ending up with something that looks like anbm.  This is the dihedral group of degree p (the group of symmetries of a regular p-gon)!

Case 2: N2 = 1 and Np = 1.

Order 1: e
Order p: b, b2, …, bp-1
Order 2: a

No other elements can be order 1, 2, or p, so all other elements must be order 2p (think Lagrange’s theorem). For instance, (ab)2p = e and (ab)p ≠ e . You can show that this only works if ab = ba, which allows you to freely move ‘a’s and ‘b’s around in any expression. ((ab)2p = a2p b2p = (a2)p (bp)2 = e, and (ab)p = ap bp = ap = a ≠ e).

Again, we can use this to fill out the entire Cayley table. The group we get is isomorphic to Z2p (the integers mod 2p)!

So if |G| = 2p for some prime p > 2, then G is either the group of symmetries of a regular p-gon, or it’s the integers mod p. Fantastic! If you follow this whole line of reasoning, then I think that you have a good grasp of how to use the toolkit above.

Producing all numbers using just four fours

How many of the natural numbers do you think you can produce just by combining four 4s with standard mathematical operations?

The answer might blow your mind a little bit. It’s all of them!

In particular, you can get any natural number by using the symbols ‘4’, ‘√’, ‘log’, and ‘/’. And in particular, you can do it with just four 4s!

Try it for yourself before moving on!

 

 

Continue reading “Producing all numbers using just four fours”

Hopping Midpoints

Put down three points on a piece of paper. Choose one of them as your “starting point”. Now, randomly choose one of the three points and hop from your starting point, halfway over to the chosen point. Mark down where you’ve landed. Then repeat: randomly choose one of the three starting points, and move halfway from your newly marked point to this new chosen point. Mark where you land. And on, and on, to infinity.

What pattern will arise? Watch and see!

Controls:
E to increase points/second.
Q to decrease points/second.
Click and drag the red points to move them around.
Pressing a number key will make a polygon with that number of sides.
[pjs4wp]
float N = 3;
float[] X = new float(N);
float[] Y = new float(N);
float radius = 600/2 – 20;
float i = 0;
while (N > i)
{
X[i] = radius * cos(2*PI*i/N – PI/2 + 2*PI/N);
Y[i] = radius * sin(2*PI*i/N – PI/2 + 2*PI/N);
i += 1;
}
float xNow = X[0];
float yNow = Y[0];
float speed = 1;
int selected = -1;
void setup()
{
size(600,600);
frameRate(10);
background(0);
}
void draw()
{
fill(255);
stroke(255);
text((str)(speed*10) + ” points / second\nE to speed up\nQ to slow down\nClick and drag red points!”,15,25);
translate(width/2, height/2);
float i = 0;
while(i j)
{
ellipse(X[j],Y[j],10,10);
j += 1;
}
}
void keyReleased()
{
bool reset = 0;
if (key == ‘e’) speed *= 10;
else if (key == ‘q’ && speed > 1) speed /= 10;
else if (key == ‘2’) N = 2;
else if (key == ‘3’) N = 3;
else if (key == ‘4’) N = 4;
else if (key == ‘5’) N = 5;
else if (key == ‘6’) N = 6;
else if (key == ‘7’) N = 7;
else if (key == ‘8’) N = 8;
else if (key == ‘9’) N = 9;
else reset = 1;
if (reset == 0)
{
background(0);
float i = 0;
while (N > i)
{
X[i] = radius * cos(2*PI*i/N – PI/2);
Y[i] = radius * sin(2*PI*i/N – PI/2);
i += 1;
}
xNow = X[0];
yNow = Y[0];
}
}
void mousePressed()
{
if (selected == -1)
{
float i = 0;
while (N > i)
{
if ((X[i] + 5 >= mouseX – width/2) && (mouseX – width/2 >= X[i] – 5))
if ((Y[i] + 5 >= mouseY – height/2) && (mouseY – height/2 >= Y[i] – 5))
selected = i;
i += 1;
}
}
}
void mouseReleased()
{
if (selected != -1)
{
X[selected] = mouseX – width/2;
Y[selected] = mouseY – height/2;
selected = -1;
xNow = X[0];
yNow = Y[0];
background(0);
}
}
[/pjs4wp]

Exploring Julia Sets

Here’s a natural follow-up to my last post on the Mandelbrot set – an interactive Julia set explorer!

The Julia set corresponding to a particular point c = x + iy in the complex plane is defined as the set of complex numbers z that stay finite upon arbitrary iterations of the following function: fc(z) = z2 + c. The Mandelbrot set, by comparison, is defined as the set of complex numbers c such that the value obtained by starting with 0 and iterating the function fc arbitrarily many times converges.

What’s remarkable is now beautiful and complex the patterns that arise from this simple equation are. Take a look for yourself: just hover over a point to see its corresponding Julia set!

[pjs4wp]
float xmin = -1.8;
float xmax = 1.8;
float ymin = -1.2;
float ymax = 1.2;
float resolution = 30;
void setup()
{
size(600,400);
background(0);
stroke(255);
}
void draw()
{
background(0);
float cx = xmin + (xmax-xmin)*(float)(mouseX)/(float)(width);
float cy = ymin + (ymax-ymin)*(float)(mouseY)/(float)(height);
drawJuliaSet(cx, cy);
stroke(255);
fill(255);
line(-xmin*width/(xmax-xmin),0,-xmin*width/(xmax-xmin),height);
line(0,-ymin*height/(ymax-ymin),width,-ymin*height/(ymax-ymin));
}
void keyPressed()
{
if (key == ‘e’) resolution += 10;
if (key == ‘q’) resolution -= 10;
if (key == ‘ ‘) resolution = 30;
}
void drawJuliaSet(float cx, float cy)
{
text(“(” + (str)((int)(cx*100)/100.) + “,” + (str)(-(int)(cy*100)/100.) + “)”, 20, 20);
float i = 0;
float j = 0;
while (height > j)
{
x = xmin + (xmax-xmin)*(float)i/width;
y = ymin + (ymax-ymin)*(float)j/height;
float step = 0;
while (step 4) break;
step += 1;
}
stroke(color(0,255,0));
fill(color(0,255,0));
if (4 > x*x + y*y) point(i,j);
i += 1;
if (i == width)
{
i = 0;
j += 1;
}
}
stroke(0);
fill(color(255,0,0));
ellipse((cx-xmin)*width/(xmax-xmin),(cy-ymin)*height/(ymax-ymin),10,10);
}
[/pjs4wp]
Resolution is preset at a value good for seeing lots of details and loading at a reasonable speed, but should you want to change it, controls are ‘E’ to increase it and ‘Q’ to decrease it. To reset to default, press ‘SPACE’.

Hopping Midpoints and Mathematical Snowflakes

Let me start this off by saying that if you’re reading this blog and haven’t ever checked out the Youtube channel Numberphile, you need to go there right away and start bingeing their videos. That’s what I’ve been doing for the last few days, and it’s given me tons of cool new puzzles to consider.

Here’s one:

Naturally, after watching this video I wanted to try this out for myself. Here you see the pattern arising beautifully from the randomness:

Serpinski

I urge you to think hard about why the Sierpinski triangle would arise from something as simple as randomly hopping between midpoints. It’s very non-obvious, and although I have a few ideas, I’m still missing a clear intuition.

I also made some visualizations for other shapes. I’ll show some of them, but encourage you to make predictions about what pattern you’d expect to see before scrolling down to see the actual result.

First:

Square

Instead of three points arranged as above, we will start out with four points arranged in a perfect square. Then, as before, we’ll jump from our starting point halfway to one of these four, and will continue this procedure ad infinitum.

What pattern will arise? Do you think that we’ll have “missing regions” where no points can land, like with the triangle?

Scroll down to see the answer…

(…)

(…)

(…)

Square

Okay! So it looks like the whole square gets filled out, with no missing regions. This was pretty surprising to me; given that three points gave rise to a intricate fractal pattern, why wouldn’t four points do the same? What’s special about “3” .

Well, perhaps things will be different if we tweak the positions of the corners slightly? Will any quadrilateral have the same behavior of filling out all the points, or will the blank regions re-arise? Again, make a prediction!

Quadrilaterals

Let’s see:

4 TemplesCross4 Double Temples4 Rolos

Okay, now we see that apparently the square was actually a very special case! Pretty much any quadrilateral we can construct will give us a nested infinity of blank regions, as long as at least one angle is not equal to 90º. Again, this is fascinating and puzzling to me. Why do 90º angles invariably cause the whole region to fill out? I’m not sure.

Let’s move on to a pentagon! Do you think that a regular pentagon will behave more like a triangle or a square?

Pentagon

Take a look…

Pentagon

And naturally, the next question is what about a hexagon?

Hexagon

Hexagon

Notice the difference between the hexagon and all the previous ones! Rather than having small areas of points that are never reached, it appears that suddenly we get lines! Again, I encourage you to try to think about why this might be (what’s so special about 6?) and leave a comment if you have any ideas.

Now, I because curious about what other types of patterns we can generate with simple rules like these. I wondered what would happen if instead of simply jumping to the average of the current point and a randomly chosen point, we built a pattern with some “memory”. For instance, what if we didn’t just look at the current point and the randomly chosen point, but also at the last chosen point? We could then take the middle of the triangle formed by these three points as our new point.

It turns out that the patterns that arise from this are even more beautiful than the previous ones! (In my opinion, of course)

Take a look:

Triangle

Threes Triangle

Square

Threes Square

Pentagon

Threes Pentagon

Hexagon

Threes Hexagon

Heptagon

Threes Heptagon

I’ll stop here, but this is a great example of how beautiful and surprising math can be. I would have never guessed that such intricate fractal patterns would arise from such simple random rules.

Backwards induction and rationality

A fun problem I recently came across:

Consider two players: Alice and Bob. Alice moves first. At the start of the game, Alice has two piles of coins in front of her: one pile contains 4 coins and the other pile contains 1 coin. Each player has two moves available: either “take” the larger pile of coins and give the smaller pile to the other player or “push” both piles across the table to the other player. Each time the piles of coins pass across the table, the quantity of coins in each pile doubles. For example, assume that Alice chooses to “push” the piles on her first move, handing the piles of 1 and 4 coins over to Bob, doubling them to 2 and 8. Bob could now use his first move to either “take” the pile of 8 coins and give 2 coins to Alice, or he can “push” the two piles back across the table again to Alice, again increasing the size of the piles to 4 and 16 coins. The game continues for a fixed number of rounds or until a player decides to end the game by pocketing a pile of coins.

(from the wiki)

(Assume that if the game gets to the final round and the last player decides to “push”, the pot is doubled and they get the smaller pile.)

Assuming that they are self-interested, what do you think is the rational strategy for each of Alice and Bob to adopt? What is the rational strategy if they each know that the other reasons about decision-making in the same way that they themselves do? And what happens if two updateless decision theorists are pitted against each other?


If you have some prior familiarity with game theory, you might have seen the backwards induction proof right away. It turns out that standard game theory teaches us that the Nash equilibrium is to defect as soon as you can, thus never exploiting the “doubling” feature of the setup.

Why? Supposing that you have made it to the final round of the game, you stand to get a larger payout by “defecting” and taking the larger pile rather than the doubled smaller pile. But your opponent knows that you’ll reason this way, so they reason that they are better off defecting the round before… and so on all the way to the first round.

This sucks. The game ends right away, and none of that exponential goodness gets taken advantage of. If only Alice and Bob weren’t so rational! 

We can show that this conclusion follows as long as the three things are true of Alice and Bob:

  1. Given a choice between a definite value A and a smaller value B, both Alice and Bob will choose the larger value (A).
  2. Both Alice and Bob can accurately perform deductive reasoning.
  3. Both (1.) and (2.) are common knowledge to Alice and Bob.

It’s pretty hard to deny the reasonableness of any of these three assumptions!


Here’s a related problem:

An airline loses two suitcases belonging to two different travelers. Both suitcases happen to be identical and contain identical antiques. An airline manager tasked to settle the claims of both travelers explains that the airline is liable for a maximum of $100 per suitcase—he is unable to find out directly the price of the antiques.

To determine an honest appraised value of the antiques, the manager separates both travelers so they can’t confer, and asks them to write down the amount of their value at no less than $2 and no larger than $100. He also tells them that if both write down the same number, he will treat that number as the true dollar value of both suitcases and reimburse both travelers that amount. However, if one writes down a smaller number than the other, this smaller number will be taken as the true dollar value, and both travelers will receive that amount along with a bonus/malus: $2 extra will be paid to the traveler who wrote down the lower value and a $2 deduction will be taken from the person who wrote down the higher amount. The challenge is: what strategy should both travelers follow to decide the value they should write down?

(again, from the wiki)

Suppose you put no value on honesty, and only care about getting the most money possible. Further, suppose that both travelers reason the same way about decision problems, and that they both know this fact (and that they both know that they both know this fact, and so on).

The first intuition you might have is that both should just write down $100. But if you know that your partner is going to write down $100, then you stand to gain one whole dollar by defecting and writing $99 (thus collecting the $2 bonus for a total of $101). But if they know that you’re going to write $99, then they stand to gain one whole dollar by defecting and writing $98 (thus netting $100). And so on.

In the end both of these unfortunate “rational” individuals end up writing down $2. Once again, we see the tragedy of being a rational individual.


Of course, we could take these thought experiments to be an indication not of the inherent tragedy of rationality, but instead of the need for a better theory of rationality.

For instance, you might have noticed that the arguments we used in both cases relied on a type of reasoning where each agent assumes that they can change their decision, holding fixed the decision of the other agent. This is not a valid move in general, as it assumes independence! It might very well be that the information about what decision you make is relevant to your knowledge about what the other agent’s decision will be. In fact, when we stipulated that you reason similarly to the other agent, we are in essence stipulating an evidential relationship between your decision and theirs! So the arguments we gave above need to be looked at more closely.

If the agents do end up taking into account their similarity, then their behavior is radically different. For example, we can look at the behavior of updateless decision theory: two UDTs playing each other in the Centipede game “push” every single round (including the final one!), thus ending up with exponentially higher rewards (on the order of $2N, where N is the number of rounds). And two UDTs in the Traveller’s Dilemma would write down $100, thus both ending up roughly $98 better off than otherwise. So perhaps we aren’t doomed to a gloomy view of rationality as a burden eternally holding us back!


 

One final problem.

Two players, this time with just one pile of coins in front of them. Initially this pile contains just 1 coin. The players take turns, and each turn they can either take the whole pile or push it to the other side, in which case the size of the pile will double. This will continue for a fixed number of rounds or until a player ends the game by taking the pile.

On the final round, the last player has a choice of either taking all the coins or pushing them over, thus giving the entire doubled pile to their opponent. Both players are perfectly self-interested, and this fact is common knowledge. And finally, suppose that who goes first is determined by a coin flip.

Standard decision theory obviously says that the first person should just take the 1 coin and the game ends there. What would UDT do here? What do you think is the rational policy for each player?

A probability puzzle

probpuzzle.jpg

To be totally clear: the question is not assuming that there is ONLY one student whose neighbors both flipped heads, just that there is AT LEAST one such student. You can imagine that the teacher first asks for all students whose neighbors both flipped heads to step forward, then randomly selected one of the students that had stepped forward.

Now, take a minute to think about this before reading on…

It seemed initially obvious to me that the teacher was correct. There are exactly as many possible worlds in which the three students are HTH as there worlds in which they are HHH, right? Knowing how your neighbors’ coins landed shouldn’t give you any information about how your own coin landed, and to think otherwise seems akin to the Gambler’s fallacy.

But in fact, the teacher is wrong! It is in fact more likely that the student flipped tails than heads! Why? Let’s simplify the problem.

Suppose there are just three students standing in a circle (/triangle). There are eight possible ways that their coins might have landed, namely:

HHH
HHT
HTH
HTT
THH
THT
TTH
TTT

Now, the teacher asks all those students whose neighbors both have “H” to step forward, and AT LEAST ONE steps forward. What does this tell us about the possible world we’re in? Well, it rules out all of the worlds in which no student could be surrounded by both ‘H’, namely… TTT, TTH, THT, and HTT. We’re left with the following…

HHH
HHT
HTH
THH

One thing to notice is that we’re left with mostly worlds with lots of heads. The expected total of heads is 2.25, while the expected total of tails is just 0.75. So maybe we should expect that the student is actually more likely to have heads than tails!

But this is wrong. What we want to see is what proportion of those surrounded by heads are heads in each possible world.

HHH: 3/3 have H (100%)
HHT: 0/1 have H (0%)
HTH: 0/1 have H (0%)
THH: 0/1 have H (0%)

Since each of these worlds is equally likely, what we end up with is a 25% chance of 100% heads, and a 75% chance of 0% heads. In other words, our credence in the student having heads should be just 25%!

Now, what about for N students? I wrote a program that does a brute-force calculation of the final answer for any N, and here’s what you get:

N

cr(heads)

~

3

1/4

0.25

4

3/7

0.4286

5

4/9

0.4444

6

13/32

0.4063

7

1213/2970

0.4084

8

6479/15260

0.4209

9

10763/25284

0.4246

10

998993/2329740

0.4257

11

24461/56580

0.4323

12

11567641/26580015

0.4352

13

1122812/2564595

0.4378

14

20767139/47153106

0.4404

15

114861079/259324065

0.4430

16

2557308958/5743282545

0.4453

17

70667521/157922688

0.4475

These numbers are not very pretty, though they appear to be gradually converging (I’d guess to 50%).

Can anybody see any patterns here? Or some simple intuitive way to arrive at these numbers?