comment by lil

lil · 3753 days ago · link · · parent · post: For everyone in the book exchange: The Necessity of Marginalia

What is the probability? If the witness is right 80% of the time, is the probability 80%?

wasoxygen · 3753 days ago · link ·

Based on the evidence given, we can calculate a probability of 41% that the cab was blue. This witness is pretty reliable, but blue cabs are rare and it is more likely that the witness is mistaken than that the cab was really blue.

My approach was to imagine 100 hit-and-run accidents. In 85 of them the cab will be green, and in 15 it will be blue.

Of the 85 green accidents, the witness will correctly see 68 green cabs (eighty percent accuracy) and mistakenly see 17 blue cabs.

Of the 15 blue accidents, the witness will correctly see 12 blue cabs and mistakenly see 3 green cabs.

We are told that the witness saw blue. So we are considering one of the 17 mistaken identifications or 12 correct identifications, a total of 29 cases. The witness is correct (and the cab is really blue) in 12 out of 29 cases, about 41%.
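The counting argument above can be sketched in a few lines of Python. This is only an illustration; the variable names are made up for the example, and the inputs (15% blue cabs, 80% witness accuracy, the same accuracy for both colors) are the figures assumed in this thread.

```python
# Assumed setup from the thread: 15% of cabs are blue, the witness is
# right 80% of the time for either color. Names are illustrative only.
TOTAL_ACCIDENTS = 100
BLUE_SHARE_PCT = 15
ACCURACY_PCT = 80

blue_cabs = TOTAL_ACCIDENTS * BLUE_SHARE_PCT // 100      # 15
green_cabs = TOTAL_ACCIDENTS - blue_cabs                 # 85

blue_seen_as_blue = blue_cabs * ACCURACY_PCT // 100      # 12 correct
blue_seen_as_green = blue_cabs - blue_seen_as_blue       # 3 mistaken
green_seen_as_green = green_cabs * ACCURACY_PCT // 100   # 68 correct
green_seen_as_blue = green_cabs - green_seen_as_green    # 17 mistaken

# Only the cases where the witness says "blue" matter here.
reported_blue = blue_seen_as_blue + green_seen_as_blue   # 29
p_blue_given_reported_blue = blue_seen_as_blue / reported_blue

print(f"{blue_seen_as_blue} of {reported_blue} = {p_blue_given_reported_blue:.1%}")
```

Integer arithmetic keeps the counts exact, so the only division is the final 12 out of 29.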


b_b · 3753 days ago · link ·

I'm glad we arrived at the same answer. I suppose lil will have a bit easier time following your logic. I had only read the bullet points from your post, and not the text of the book page, so I think I may have rehashed some unnecessary stuff.

Anyway, lil, the main point is that there are a lot of counterintuitive things in statistics.


wasoxygen · 3753 days ago · link ·

Yes, and our number agrees with the footnote in the book, but I still don't follow the explanation there. Where did the number 1.706 come from?

Simpson's Paradox is my favorite statistical anomaly.


b_b · 3752 days ago · link ·

"Where did the number 1.706 come from?"

I'm not sure either. I had tried to manipulate the equation to read something of the form P(A)/(1 + P(A)), but I couldn't find an obvious way to do that. Also, it doesn't work the other direction. That is, the probability of the car being green when it's reported green does not equal the probability of the car being reported blue divided by 1 plus that probability, so it's certainly not a general solution to the problem, but perhaps a weird coincidence of the way he did the arithmetic.
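One possible reading, offered as an assumption because the footnote itself is not quoted in this thread: 1.706 could be 1 plus the posterior odds of blue. In the odds form of Bayes' rule, the prior odds are multiplied by the likelihood ratio, and odds o convert to a probability as o/(1 + o). The function name below is hypothetical, and the sketch assumes two colors and one accuracy figure for both.

```python
def posterior_from_odds(prior, accuracy):
    """Odds-form Bayes update; returns (posterior odds, posterior probability).

    Assumes only two colors and the same witness accuracy for each.
    """
    prior_odds = prior / (1 - prior)                 # e.g. 0.15 / 0.85
    likelihood_ratio = accuracy / (1 - accuracy)     # e.g. 0.8 / 0.2
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds, posterior_odds / (1 + posterior_odds)

blue_odds, blue_prob = posterior_from_odds(0.15, 0.8)
green_odds, green_prob = posterior_from_odds(0.85, 0.8)
print(f"blue:  odds {blue_odds:.3f}, probability {blue_prob:.1%}")
print(f"green: odds {green_odds:.3f}, probability {green_prob:.1%}")
```

Under this reading the blue odds come out near 0.706, so 0.706/1.706 is about 41%. The quantity in the ratio is an odds value rather than a probability, which may be why a form like P(A)/(1 + P(A)) does not reproduce it.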


b_b · 3753 days ago · link ·

It's complicated, actually.

Conditional probabilities are calculated as such, via Bayes' theorem: P(A given B) = P(B given A)P(A)/P(B)

The way this is read is, "Probability of A given B," which refers to the probability of A happening, given that we already know that B happened, where A and B are some kind of event in the world.

If we define the events as follows:

A = car is blue

B = car reported blue

then,

P(A) = 0.15 (given)

P(B) = 0.15 x 0.8 + 0.85 x 0.2 (this is the complicated part: the overall chance that the car is reported blue is the proportion of blue cars times the rate at which blue is correctly identified, plus the proportion of green cars times the rate at which green is mistakenly reported as blue)

= 0.12 plus 0.17

= 0.29

P(B given A) = 0.8 (this is the proportion of correct identifications)

Therefore,

P(A given B) = 0.8 x 0.15/0.29

= 41.4%
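For readers who prefer typeset notation, the blue-cab calculation can be written as a single expression. This is a sketch of the same arithmetic; the symbol ¬A ("car is not blue", meaning green) is notation introduced here, not used elsewhere in the thread.

```latex
P(A \mid B)
  = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)}
  = \frac{0.8 \times 0.15}{0.8 \times 0.15 + 0.2 \times 0.85}
  = \frac{0.12}{0.29}
  \approx 0.414
```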

Alternatively, we can easily do the same calculation for the green car:

A = car is green

B = car reported green

P(A) = 0.85

P(B) = 0.85 x 0.8 plus 0.15 x 0.2

= 0.68 plus 0.03

= 0.71

P(B given A) = 0.8

Therefore,

P(A given B) = 0.8 x 0.85/0.71

= 95.8%

As you can see, when the witness reports the predominant color, there are very few errors (about 4%), but when the witness reports the less common color, the error rate is remarkably high, at greater than 50% (about 59%)! That is, don't trust your memory :)
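Both calculations follow the same pattern, so they can be checked with one small function. This is a minimal sketch, assuming only two colors and the same 80% accuracy for each; the function and variable names are made up for the example.

```python
def posterior(prior, accuracy):
    """P(car is color X given the witness reports X).

    Assumes two colors and one accuracy figure that applies to both.
    """
    true_reports = prior * accuracy                # P(B given A) * P(A)
    false_reports = (1 - prior) * (1 - accuracy)   # other color misreported as X
    p_report = true_reports + false_reports        # P(B)
    return true_reports / p_report

p_blue = posterior(0.15, 0.8)
p_green = posterior(0.85, 0.8)
print(f"P(blue given reported blue)   = {p_blue:.1%}")
print(f"P(green given reported green) = {p_green:.1%}")
print(f"Error rate when blue is reported: {1 - p_blue:.1%}")
```

Changing the prior or the accuracy shows how quickly the answer moves: the rarer the color, the less a single report of it can be trusted.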

Sorry it's so hard to read! When you try to put math symbols in, you just end up with a bunch of bold, italics and quotes.