Fall 2011 Strategic Practice 3: Section 2 (Simpson's Paradox) - Question 1
Consider the following:
(a) Is it possible to have events A, B, E such that $P(A|E) < P(B|E)$ and $P(A|E^{c}) < P(B|E^{c})$, yet $P(A) > P(B)$? That is, A is less likely than B given that E is true, and also given that E is false, yet A is more likely than B if given no information about E. Show this is impossible (with a short proof) or find a counterexample (with a "story" interpreting A, B, E).
(b) Is it possible to have events A, B, E such that $P(A|B, E) < P(A|B^{c}, E)$ and $P(A|B, E^{c}) < P(A|B^{c}, E^{c})$, yet $P(A|B) > P(A|B^{c})$? That is, given that E is true, learning B is evidence against A, and similarly given that E is false; but given no information about E, learning that B is true is evidence in favor of A. Show this is impossible (with a short proof) or find a counterexample (with a "story" interpreting A, B, E).
Solution:
(a) It is not possible, as seen using the law of total probability: $P(A) = P(A|E)P(E) + P(A|E^{c})P(E^{c}) < P(B|E)P(E) + P(B|E^{c})P(E^{c}) = P(B)$. The inequality is strict because $P(E)$ and $P(E^{c})$ must both be positive for the conditional probabilities to be defined.
(b) Yes, this is possible: this is the structure of Simpson's Paradox! For example, consider the Stampy problem above. Or, consider the two doctors example discussed in class, with Dr. Hibbert and Dr. Nick and heart surgeries versus band-aid removal. There are many real-life examples of Simpson's Paradox. For example, it is possible for one baseball player to have a higher batting average than another in each of two seasons, yet a lower batting average when the two seasons are aggregated. Simpson's Paradox illustrates the importance of controlling for additional variables that are associated with both the grouping and the outcome (known as confounders).
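To make part (b) concrete, here is a minimal numerical sketch of the two doctors example. The success counts are hypothetical, chosen only for illustration and not taken from this handout; the names `counts` and `success_rate` are likewise made up for this sketch. Take A = the surgery succeeds, B = Dr. Nick is the surgeon, and E = the surgery is a heart surgery.

```python
from fractions import Fraction

# Hypothetical (successes, total) counts, for illustration only.
counts = {
    ("Hibbert", "heart"): (70, 90),
    ("Hibbert", "bandaid"): (10, 10),
    ("Nick", "heart"): (2, 10),
    ("Nick", "bandaid"): (81, 90),
}

def success_rate(doctor, surgery=None):
    """Empirical P(success | doctor) or P(success | doctor, surgery) as an exact fraction."""
    rows = [
        v for (d, s), v in counts.items()
        if d == doctor and (surgery is None or s == surgery)
    ]
    successes = sum(k for k, _ in rows)
    total = sum(n for _, n in rows)
    return Fraction(successes, total)

# Within each surgery type, Dr. Nick has the lower success rate ...
for surgery in ("heart", "bandaid"):
    print(surgery, success_rate("Nick", surgery) < success_rate("Hibbert", surgery))

# ... yet aggregated over surgery types, Dr. Nick has the higher success rate.
print("overall", success_rate("Nick") > success_rate("Hibbert"))
```

With these counts, Dr. Nick looks better overall only because most of his operations are the easy band-aid removals, while Dr. Hibbert mostly performs the harder heart surgeries. The type of surgery plays the role of the confounder.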
"Mathematics is the logic of certainty, but statistics is the logic of uncertainty."