Simpson's Paradox

What is Simpson's Paradox?

Put simply, Simpson's Paradox is a phenomenon in probability and statistics in which a trend appears in several different groups, but vanishes or reverses when the groups are combined. The paradox is often encountered in the social and medical sciences, where causal relationships must be properly accounted for to avoid the inference errors it can produce. It is named after Edward H. Simpson, who described it in a 1951 paper, although the phenomenon itself had been observed and mentioned in works as early as 1899.


How does Simpson's Paradox work?

Let's imagine two baseball players comparing their batting averages. Under Simpson's Paradox, it is possible for Player A to have a better batting average than Player B in every individual year, and yet for Player B to have the better batting average when the years are combined. The reversal can happen because a combined average weights each year by its number of at-bats, and the two players may have very different numbers of at-bats in each year. This exact scenario played out in the 1995 and 1996 seasons, as mathematician Ken Ross showed when he compared the batting averages of Derek Jeter and David Justice. In each individual year, David Justice had the better batting average; however, his combined average over the two years was lower than Jeter's.
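
The reversal is easier to see with numbers. The sketch below uses the hits and at-bats commonly cited for this comparison; those figures do not appear in the text above, so treat them as an assumption to be checked against a primary source. The names seasons, average and combined_average are illustrative only.

```python
from fractions import Fraction

# (hits, at_bats) per season. Commonly cited figures for this example,
# included here as an assumption; they are not given in the article text.
seasons = {
    "Jeter":   {1995: (12, 48),   1996: (183, 582)},
    "Justice": {1995: (104, 411), 1996: (45, 140)},
}

def average(hits, at_bats):
    """Batting average for a single season, as an exact fraction."""
    return Fraction(hits, at_bats)

def combined_average(records):
    """Pool hits and at-bats across all seasons, then divide once."""
    total_hits = sum(hits for hits, _ in records.values())
    total_at_bats = sum(at_bats for _, at_bats in records.values())
    return Fraction(total_hits, total_at_bats)

# Per-season averages: Justice is ahead in both years.
for year in (1995, 1996):
    for player, records in seasons.items():
        print(f"{year} {player}: {float(average(*records[year])):.3f}")

# Combined averages: Jeter is ahead once the seasons are pooled.
for player, records in seasons.items():
    print(f"combined {player}: {float(combined_average(records)):.3f}")
```

With these figures, most of Jeter's at-bats fall in his stronger season and most of Justice's fall in his weaker one, so pooling the seasons pulls the two combined averages in opposite directions.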

Applications of Simpson's Paradox

Simpson's Paradox is less a phenomenon to be applied than one to be uncovered and guarded against. The effect can distort aggregated data in such a way that an observer incorrectly attributes the results to one factor or another. In 1973, UC Berkeley's overall admissions figures indicated that men were more likely to be admitted than women. However, a breakdown of each individual department's acceptance rate showed that, in fact, there was a small bias in favor of women. This example of Simpson's Paradox showed that the aggregate gap did not reflect a gender bias within departments; rather, women were more likely to apply to competitive departments with low acceptance rates, and men were more likely to apply to less competitive ones.
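
The same mechanism can be sketched with a two-department example. The counts below are invented purely for illustration and are not the actual 1973 Berkeley figures; the names applications, rate and overall_rate are likewise hypothetical.

```python
# Hypothetical (applicants, admitted) counts per department and group.
# These numbers are made up for illustration; they are NOT the real 1973 data.
applications = {
    "less competitive dept": {"men": (800, 480), "women": (200, 130)},
    "more competitive dept": {"men": (200, 40),  "women": (800, 200)},
}

def rate(applicants, admitted):
    """Acceptance rate within a single department."""
    return admitted / applicants

def overall_rate(group):
    """Acceptance rate for one group, pooled across all departments."""
    applicants = sum(dept[group][0] for dept in applications.values())
    admitted = sum(dept[group][1] for dept in applications.values())
    return admitted / applicants

# Within each department, women are admitted at the higher rate.
for name, dept in applications.items():
    print(f"{name}: men {rate(*dept['men']):.0%}, women {rate(*dept['women']):.0%}")

# Pooled across departments, men appear to be admitted at the higher rate.
print(f"overall: men {overall_rate('men'):.0%}, women {overall_rate('women'):.0%}")
```

In this made-up data, most women apply to the department that rejects most applicants of either gender, so the pooled rate for women is dragged down even though they do better than men inside each department.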