comment by wasoxygen
wasoxygen  ·  3655 days ago  ·  link  ·    ·  parent  ·  post: For everyone in the book exchange: The Necessity of Marginalia

    flagamuffin and wasoxygen refused to write in books
After that conversation I made an effort to write in the next book I read, Thinking, Fast and Slow, in addition to my usual habit of taking notes on a bookmark (a Titanic postcard, in this case).

Here I take issue with the author's seemingly overcomplicated solution to the taxi problem below (though his purpose may have been to demonstrate Bayes' Theorem).

    A cab was involved in a hit and run accident at night. Two cab companies, the Green and the Blue, operate in the city. You are given the following data:

    • 85% of the cabs in the city are Green and 15% are Blue.

    • A witness identified the cab as Blue. The court tested the reliability of the witness under the same circumstances that existed on the night of the accident and concluded that the witness correctly identified each one of the two colors 80% of the time and failed 20% of the time.

    What is the probability that the cab involved in the accident was Blue rather than Green?





lil  ·  3655 days ago  ·  link  ·  

What is the probability? If the witness is right 80% of the time, is the probability 80%?

wasoxygen  ·  3655 days ago  ·  link  ·  

Based on the evidence given, we can calculate a probability of 41% that the cab was blue. This witness is pretty reliable, but blue cabs are rare and it is more likely that the witness is mistaken than that the cab was really blue.

My approach was to imagine 100 hit-and-run accidents. In 85 of them, the cab will be green, in 15 blue.

Of the 85 green accidents, the witness will correctly see 68 green cabs (eighty percent accuracy) and mistakenly see 17 blue cabs.

Of the 15 blue accidents, the witness will correctly see 12 blue cabs and mistakenly see 3 green cabs.

We are told that the witness saw blue. So we are considering one of the 17 mistaken identifications or 12 correct identifications, a total of 29 cases. The witness is correct (and the cab is really blue) in 12 out of 29 cases, about 41%.
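The counting argument above can be sketched in a few lines of Python. This is only an illustration: the function name and its parameters are made up for the example, and exact fractions are used so the counts come out as whole cabs.

```python
from fractions import Fraction

def reported_blue_counts(total=100,
                         blue_share=Fraction(15, 100),
                         accuracy=Fraction(80, 100)):
    """Split `total` imagined accidents into the two ways a 'blue' report can arise.

    Hypothetical helper for illustration; the defaults are the numbers
    given in the taxi problem.
    """
    blue = total * blue_share                  # accidents with a blue cab
    green = total - blue                       # accidents with a green cab
    blue_seen_blue = blue * accuracy           # correct 'blue' reports
    green_seen_blue = green * (1 - accuracy)   # mistaken 'blue' reports
    return blue_seen_blue, green_seen_blue

correct, mistaken = reported_blue_counts()
share = correct / (correct + mistaken)   # 12 out of 29
print(correct, mistaken, round(float(share), 3))
```

Changing `blue_share` or `accuracy` shows how quickly the answer moves when blue cabs are less rare or the witness is more reliable.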

b_b  ·  3655 days ago  ·  link  ·  

I'm glad we arrived at the same answer. I suppose lil will have a bit easier time following your logic. I had only read the bullet points from your post, and not the text of the book page, so I think I may have rehashed some unnecessary stuff.

Anyway, lil, the main point is that there are a lot of counterintuitive things in statistics.

wasoxygen  ·  3655 days ago  ·  link  ·  

Yes, and our number agrees with the footnote in the book, but I still don't follow the explanation there. Where did the number 1.706 come from?

Simpson's Paradox is my favorite statistical anomaly.

b_b  ·  3654 days ago  ·  link  ·  

    Where did the number 1.706 come from?

I'm not sure either. I had tried to manipulate the equation to read something of the form P(A)/(1 + P(A)), but I couldn't find an obvious way to do that. Also, it doesn't work the other direction. That is, the probability of the car being green when it's reported green does not equal the probability of the car being reported blue divided by 1 plus that probability, so it's certainly not a general solution to the problem, but perhaps a weird coincidence of the way he did the arithmetic.
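One possible reading, which nothing quoted in this thread confirms: if the footnote works in odds rather than probabilities, 1.706 would be 1 plus the posterior odds of blue. The sketch below assumes that reading; the function name is made up for illustration. Under that assumption the form odds/(1 + odds) does work in both directions, because the quantity being converted is the odds and not P(A).

```python
def posterior_from_odds(prior, accuracy):
    """Odds form of Bayes' rule (assumed reading of the footnote, not confirmed).

    Returns (posterior odds, posterior probability) that the cab really is
    the reported colour.
    """
    prior_odds = prior / (1 - prior)                 # e.g. 0.15 / 0.85
    likelihood_ratio = accuracy / (1 - accuracy)     # e.g. 0.8 / 0.2
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds, posterior_odds / (1 + posterior_odds)

blue_odds, blue_prob = posterior_from_odds(0.15, 0.8)
green_odds, green_prob = posterior_from_odds(0.85, 0.8)

print(round(blue_odds, 3), round(1 + blue_odds, 3), round(blue_prob, 3))
print(round(green_odds, 3), round(green_prob, 3))
```

With the problem's numbers the blue odds come out near 0.706, so the denominator 1 + odds is near 1.706, and the resulting probabilities match the 41% and 96% figures computed elsewhere in the thread.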

b_b  ·  3655 days ago  ·  link  ·  

It's complicated, actually.

Conditional probabilities are calculated as such, via Bayes' theorem: P(A given B) = P(B given A)P(A)/P(B)

The way this is read is, "Probability of A given B," which refers to the probability of A happening, given that we already know that B happened, where A and B are some kind of event in the world.

If we assign the probabilities thusly:

A = car is blue

B = car reported blue

then,

P(A) = 0.15 (given)

P(B) = 0.15 x 0.8 plus 0.85 x 0.2 (this is the complicated part: the share of cars reported blue is the proportion that are blue times the rate at which blue is correctly identified, plus the proportion that are green times the rate at which green is misidentified as blue)

= 0.12 plus 0.17

= 0.29

P(B given A) = 0.8 (this is the proportion of correct identifications)

Therefore,

P(A given B) = 0.8 x 0.15/0.29

= 41.4%
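For anyone who finds code easier to read than the notation, here is a minimal sketch of the same Bayes' theorem calculation. The function name is made up for illustration, and it assumes, as the problem states, that the witness is equally accurate for both colors.

```python
def posterior(prior, accuracy):
    """P(cab is colour X given that it was reported as X).

    prior    -- P(A), the share of cabs that are colour X
    accuracy -- P(B given A), how often the witness names a colour correctly
    Hypothetical helper; assumes the same accuracy for both colours.
    """
    # P(B): reported X either correctly (cab is X) or by mistake (cab is not X)
    p_reported = prior * accuracy + (1 - prior) * (1 - accuracy)
    return accuracy * prior / p_reported

p_blue = posterior(prior=0.15, accuracy=0.8)
print(f"{p_blue:.1%}")
```

Calling `posterior(0.85, 0.8)` gives the corresponding figure for a cab reported as green.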

Alternatively, we can easily do the same calculation with the green car:

A = car is green

B = car reported green

P(A) = 0.85

P(B) = 0.85 x 0.8 plus 0.15 x 0.2

= 0.68 plus 0.03

= 0.71

P(B given A) = 0.8

Therefore,

P(A given B) = 0.8 x 0.85/0.71

= 95.8%

As you can see, for the car that is predominant, there are very few errors, but for the car that is less common, the error rate is remarkably high, at greater than 50%!!! That is, don't trust your memory :)
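The error rates can also be checked by brute force. The following is a rough simulation sketch, not anything from the book: the function name, trial count, and seed are arbitrary choices for illustration.

```python
import random

def simulate(trials=200_000, blue_share=0.15, accuracy=0.8, seed=0):
    """Estimate how often a report of each colour is wrong.

    Hypothetical helper; a fixed seed makes the run repeatable.
    """
    rng = random.Random(seed)
    counts = {"blue": [0, 0], "green": [0, 0]}   # [reports, wrong reports]
    for _ in range(trials):
        actual = "blue" if rng.random() < blue_share else "green"
        correct = rng.random() < accuracy        # witness right 80% of the time
        if correct:
            reported = actual
        else:
            reported = "green" if actual == "blue" else "blue"
        counts[reported][0] += 1
        if not correct:
            counts[reported][1] += 1
    # fraction of reports of each colour that were mistaken
    return {colour: wrong / reports for colour, (reports, wrong) in counts.items()}

error_rates = simulate()
print({colour: round(rate, 3) for colour, rate in error_rates.items()})
```

The estimates should land close to the exact values from the calculation above: roughly 59% of "blue" reports are wrong, against roughly 4% of "green" reports.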

Sorry it's so hard to read! When you try to put math symbols in, you just end up with a bunch of bold, italics and quotes.