(Most of these are taken from Ian Hacking’s *Introduction to Probability and Inductive Logic*.)

- About as many boys as girls are born in hospitals. Many babies are born every week at City General. In Cornwall, a country town, there is a small hospital where only a few babies are born every week.
Define a normal week as one where between 45% and 55% of babies are female. An unusual week is one where more than 55% or less than 45% are girls.

Which of the following is true:

(a) Unusual weeks occur equally often at City General and at Cornwall.

(b) Unusual weeks are more common at City General than at Cornwall.

(c) Unusual weeks are more common at Cornwall than at City General.

- Pia is 31 years old, single, outspoken, and smart. She was a philosophy major. When a student, she was an ardent supporter of Native American rights, and she picketed a department store that had no facilities for nursing mothers.
Which of the following statements are most probable? Which are least probable?

(a) Pia is an active feminist.

(b) Pia is a bank teller.

(c) Pia works in a small bookstore.

(d) Pia is a bank teller and an active feminist.

(e) Pia is a bank teller and an active feminist who takes yoga classes.

(f) Pia works in a small bookstore and is an active feminist who takes yoga classes.

- You have been called to jury duty in a town with only green and blue taxis. Green taxis dominate the market, with 85% of the taxis on the road.
On a misty winter night a taxi sideswiped another car and drove off. A witness said it was a blue cab. This witness is tested under similar conditions, and gets the color right 80% of the time.

You conclude about the sideswiping taxi:

(a) The probability that it is blue is 80%.

(b) It is probably blue, but with a lower probability than 80%.

(c) It is equally likely to be blue or green.

(d) It is more likely than not to be green.

- You are a physician. You think that it’s quite likely that a patient of yours has strep throat. You take five swabs from the throat of this patient and send them to a lab for testing.
If the patient has strep throat, the lab results are right 70% of the time. If not, then the lab is right 90% of the time.

The test results come back: YES, NO, NO, YES, NO

You conclude:

(a) The results are worthless.

(b) It is likely that the patient does not have strep throat.

(c) It is slightly more likely than not that the patient does have strep throat.

(d) It is very much more likely than not that the patient does have strep throat.

- In a country, every family wants a boy. Each family keeps having babies until a boy is born. What is the expected ratio of boys to girls in the country?
- Answer the following series of questions:
If you flip a fair coin twice, do you have the same chance of getting HH as you have of getting HT?

If you flip the coin repeatedly until you get HH, does this result in the same average number of flips as if you repeat until you get HT?

If you flip it repeatedly until either HH emerges or HT emerges, is either outcome equally likely?

You play a game with a friend in which you each choose a sequence of three possible flips (e.g., HHT and TTH). You then flip the coin repeatedly until one of the two patterns emerges, and whoever’s pattern it is wins the game. You get to see your friend’s choice of pattern before deciding yours. Are you ever able to bias the game in your favor?

Are you *always* able to bias the game in your favor?

## Solutions (and lessons)

- The correct answer is (c): unusual weeks occur more often at Cornwall than at City General. Even though the chance of a boy is the same at Cornwall as it is at City General, the week-to-week variation in the percentage of boys is larger at the smaller hospital (for N births a week, the standard deviation of the percentage of boys goes like 1/sqrt(N)). Indeed, if you think about an extreme case where Cornwall has only one birth a week, then every week will be an unusual week (100% boys or 0% boys).
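
To put rough numbers on this, here is a minimal sketch that computes the exact binomial probability of an unusual week. The weekly birth counts (10 and 500) are made-up illustrations, since the puzzle only says "a few" and "many", and `prob_unusual_week` is a name invented for this example.

```python
from math import comb


def prob_unusual_week(n_births):
    """Exact probability that the share of girls falls outside 45%..55%,
    assuming each birth is independently a girl with probability 1/2."""
    # Count the outcomes k (number of girls) that make a *normal* week.
    normal_outcomes = sum(
        comb(n_births, k)
        for k in range(n_births + 1)
        if 45 * n_births <= 100 * k <= 55 * n_births
    )
    return 1 - normal_outcomes / 2 ** n_births


# Hypothetical hospital sizes, chosen only for illustration.
for n in (1, 10, 500):
    print(n, prob_unusual_week(n))
```

With a single birth per week the function returns 1.0, matching the extreme case described above, and the probability shrinks as the number of births grows.
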
- There is room to debate the exact answer, but whatever it is, it has to obey some constraints. Namely, the most probable statement cannot be (d), (e), or (f), and the least probable statement cannot be (a), (b), or (c). Why? Because of the conjunction rule of probability: each of (d), (e), and (f) is a conjunction that includes (a), (b), or (c), so it cannot be more likely than its components. P(A & B) ≤ P(A).
It turns out that most people violate this constraint. Many people answer that (f) is the most probable description, and (b) is the least probable. This result is commonly interpreted to reveal a cognitive bias known as the representativeness heuristic – essentially, that our judgements of likelihood are done by considering which descriptions most closely resemble the known facts. In this case,

Another factor to consider is that prior to considering the evidence, your odds on a given person being a bank teller as opposed to working in a small bookstore should be heavily weighted towards her being a bank teller. There are just far more bank tellers than small bookstore workers (maybe a factor of around 20:1). This does not necessarily mean that (b) is more likely than (c), but it does mean that the evidence must discriminate strongly enough against her being a bank teller so as to overcome the prior odds.

This leads us to another lesson, which is to *not neglect the base rate*. It is easy to ignore the prior odds when it feels like we have strong evidence (Pia’s age, her personality, her major, etc.). But the base rates for small bookstore workers and bank tellers are very relevant to the final judgement.

- The correct answer is (d) – it is more likely than not that the sideswiper was green. This is a basic case of base rate neglect – many people would see that the witness is right 80% of the time and conclude that the witness’s testimony has an 80% chance of being correct. But this is ignoring the prior odds on the *content* of the witness’s testimony.
In this case, there were prior odds of 17:3 (85%:15%) in favor of the taxi being green. The testimony shifts those odds by a factor of 1:4 (20%:80%), that is, 4:1 towards blue, resulting in the final odds being 17:12 in favor of the taxi being green. Translating from odds to probabilities, we get a roughly 59% chance of the taxi having been green.

We could have concluded (d) very simply by just comparing the prior probability (85% for green) with the evidence (80% for blue), and noticing that the evidence would not be strong enough to make blue more likely than green (since 85% > 80%). Being able to very quickly translate between statistics and conclusions is a valuable skill to foster.
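
The odds arithmetic for the taxi can be checked in a few lines. This is a minimal sketch that uses only the numbers given in the puzzle; the variable names are illustrative.

```python
from fractions import Fraction

# Numbers from the puzzle: 85% of taxis are green, the witness is right 80% of the time.
prior_green = Fraction(85, 100)
witness_accuracy = Fraction(80, 100)

# Odds of green versus blue before hearing the witness: 17:3.
prior_odds_green = prior_green / (1 - prior_green)

# Likelihood ratio of the testimony "blue":
# P(says blue | green) / P(says blue | blue) = 20% / 80% = 1:4.
likelihood_ratio = (1 - witness_accuracy) / witness_accuracy

posterior_odds_green = prior_odds_green * likelihood_ratio
posterior_prob_green = posterior_odds_green / (1 + posterior_odds_green)

print(posterior_odds_green)         # 17/12
print(float(posterior_prob_green))  # about 0.586
```
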

- The right answer is (d). We calculate this just like we did the last time:
The results were YES, NO, NO, YES, NO.

Each YES provides evidence with strength 7:1 (70%/10%) in favor of strep, and each NO provides evidence with strength 1:3 (30%/90%).

So our strength of evidence is 7:1 ⋅ 1:3 ⋅ 1:3 ⋅ 7:1 ⋅ 1:3 = 49:27, or roughly 1.81:1 in favor of strep. This might be a little surprising… we got more NOs than YESs and the NO was correct 90% of the time for people without strep, compared to the YES being correct only 70% of the time in people with strep.

Since the evidence is in favor of strep, and we started out already thinking that strep was quite likely, in the end we should be very convinced that they have strep. If our prior on the patient having strep was 75% (3:1 odds), then our probability after getting evidence will be 84% (49:9 odds).

Again, surprising! The patient who sees these results and hears the doctor declaring that the test strengthens their belief that the patient has strep might feel that this is irrational and object to the conclusion. But the doctor would be right!
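
For anyone who wants to check the strep numbers, here is a small sketch of the same odds calculation. It assumes the five swabs are independent given the patient's true condition, and it uses the 75% prior from the example above; the function and constant names are made up for illustration.

```python
from fractions import Fraction

P_YES_GIVEN_STREP = Fraction(7, 10)     # lab is right 70% of the time if strep
P_NO_GIVEN_NO_STREP = Fraction(9, 10)   # lab is right 90% of the time if no strep


def likelihood_ratio(results):
    """Combined evidence in favor of strep, assuming independent swabs."""
    ratio = Fraction(1)
    for result in results:
        if result == "YES":
            ratio *= P_YES_GIVEN_STREP / (1 - P_NO_GIVEN_NO_STREP)   # 7:1
        else:
            ratio *= (1 - P_YES_GIVEN_STREP) / P_NO_GIVEN_NO_STREP   # 1:3
    return ratio


def posterior(prior, results):
    """Posterior probability of strep from a prior probability and test results."""
    odds = prior / (1 - prior) * likelihood_ratio(results)
    return odds / (1 + odds)


results = ["YES", "NO", "NO", "YES", "NO"]
print(likelihood_ratio(results))                   # 49/27
print(float(posterior(Fraction(3, 4), results)))   # about 0.845
```
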

- Supposing as before that the chance of any given birth being a boy is equal to the chance of it being a girl, we end up concluding…
The expected ratio of boys and girls in the country is 1! That is, this strategy doesn’t allow you to “cheat” – it has no impact at all on the ratio. Why? I’ll leave this one for you to figure out. Here’s a diagram for a hint:

This is important because it applies to the problem of p-hacking. Imagine that all researchers just repeatedly do studies until they get the results they like, and only publish these results. Now suppose that all the researchers in the world are required to publish every study that they do. Now, can they still get a bias in favor of results they like? No! Even though they always stop when getting the result they like, the aggregate of their studies is unbiased evidence. They can’t game the system!
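
If you would rather check the stopping rule empirically before working out why it has no effect, a quick simulation will do. This is a sketch under the same assumption as above (each birth is independently a boy with probability 1/2); `simulate_country` and its parameters are invented for this example.

```python
import random


def simulate_country(n_families, seed=0):
    """Every family keeps having children until its first boy.
    Returns the total number of boys and girls born."""
    rng = random.Random(seed)
    boys = girls = 0
    for _ in range(n_families):
        # Each birth is a girl with probability 1/2; stop at the first boy.
        while rng.random() < 0.5:
            girls += 1
        boys += 1
    return boys, girls


boys, girls = simulate_country(100_000)
print(boys, girls, girls / boys)
```

Every family ends up with exactly one boy, and the number of girls per family averages out to one as well, so the printed ratio lands close to 1.
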

- Answers, in order:
If you flip a fair coin twice, do you have the same chance of getting HH as you have of getting HT? (Yes)

If you flip it repeatedly until you get HH, does this result in the same average number of flips as if you repeat until you get HT? (No)

If you flip it repeatedly until either HH emerges or HT emerges, is either outcome equally likely? (Yes)

You play a game with a friend in which you each choose a sequence of three coin flips (e.g., HHT and TTH). You then flip a coin repeatedly until one of the two patterns emerges, and whoever’s pattern it is wins the game. You get to see your friend’s choice of pattern before deciding yours. Are you ever able to bias the game in your favor? (Yes)

Are you *always* able to bias the game in your favor? (Yes!)

Here’s a wiki page with a good explanation of this: LINK. A table from that page illustrating a winning strategy for any choice your friend makes:

| 1st player’s choice | 2nd player’s choice | Odds in favour of 2nd player |
| --- | --- | --- |
| HHH | THH | 7 to 1 |
| HHT | THH | 3 to 1 |
| HTH | HHT | 2 to 1 |
| HTT | HHT | 2 to 1 |
| THH | TTH | 2 to 1 |
| THT | TTH | 2 to 1 |
| TTH | HTT | 3 to 1 |
| TTT | HTT | 7 to 1 |
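
The waiting times and the odds in the table can be reproduced with Conway's leading-number algorithm. This technique is not taken from the post itself; it is a standard shortcut for fair-coin pattern races, sketched here for patterns of equal length. All function names are made up for this example.

```python
from fractions import Fraction
from itertools import product


def leading_number(x, y):
    """Sum of 2**(k-1) over every k where the last k flips of x equal the first k flips of y."""
    return sum(
        2 ** (k - 1)
        for k in range(1, min(len(x), len(y)) + 1)
        if x[-k:] == y[:k]
    )


def expected_wait(pattern):
    """Average number of fair-coin flips until the pattern first appears."""
    return 2 * leading_number(pattern, pattern)


def odds_for_second(first, second):
    """Odds that `second` appears before `first` (distinct patterns, same length)."""
    return Fraction(
        leading_number(first, first) - leading_number(first, second),
        leading_number(second, second) - leading_number(second, first),
    )


def best_reply(first):
    """The pattern of the same length with the best odds against `first`."""
    candidates = ("".join(p) for p in product("HT", repeat=len(first)))
    return max(
        (c for c in candidates if c != first),
        key=lambda c: odds_for_second(first, c),
    )


print(expected_wait("HH"), expected_wait("HT"))  # 6 4
print(odds_for_second("HH", "HT"))               # 1, i.e. an even race
for first in ("".join(p) for p in product("HT", repeat=3)):
    reply = best_reply(first)
    print(first, reply, odds_for_second(first, reply))
```

The first line of output shows why the second question is a "No" (HH takes 6 flips on average, HT only 4), while the second line shows why the third is a "Yes".
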

#4 is wrong. I’m sorry, you can’t figure it that way. If the patient has strep throat then chance of YYNNN is A = (.7)^2 times (.1)^3. If patient does not have strep throat, then chance of YYNNN is B = (.1)^2 times (.9)^3. Then, chance that patient has strep throat, given the results is A/(A+B)

Nope.

Pr(test Y | actual Y) = .7

Pr(test Y | actual N) = .1

Pr(test N | actual Y) = .3

Pr(test N | actual N) = .9

So assuming independence of tests,

Pr(test YNNYN | actual Y) = (.7)^2 (.3)^3 = .01323

Pr(test YNNYN | actual N) = (.1)^2 (.9)^3 = .00729

This gives a likelihood ratio of .01323/.00729 = 1.81481, exactly as I said in the post.