Computational Social Scientist Beware: Simpson's Paradox in Behavioral Data

10/24/2017
by Kristina Lerman, et al.

Observational data about human behavior is often heterogeneous, i.e., generated by subgroups within the population under study that vary in size and behavior. Heterogeneity predisposes analysis to Simpson's paradox, whereby the trends observed in data that has been aggregated over the entire population may be substantially different from those of the underlying subgroups. I illustrate Simpson's paradox with several examples coming from studies of online behavior and show that aggregate response leads to wrong conclusions about the underlying individual behavior. I then present a simple method to test whether Simpson's paradox is affecting results of analysis. The presence of Simpson's paradox in social data suggests that important behavioral differences exist within the population, and failure to take these differences into account can distort the studies' findings.
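The reversal described above can be made concrete with a small numerical sketch. The code below is only an illustration, not the test proposed in the paper: the data are synthetic and invented for this example, and the names `ols_slope`, `groups`, and `trend_reverses` are hypothetical. It fits a least-squares slope to the pooled data and to each subgroup, then checks whether every subgroup trend has the opposite sign to the aggregate trend.

```python
def ols_slope(xs, ys):
    """Least-squares slope of y regressed on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


# Synthetic data (an assumption for illustration only): within each
# subgroup y falls as x rises, but subgroup B has both larger x values
# and a higher baseline y than subgroup A.
groups = {
    "A": ([1, 2, 3, 4], [9, 8, 7, 6]),
    "B": ([6, 7, 8, 9], [14, 13, 12, 11]),
}


def trend_reverses(groups):
    """True if every subgroup slope has the opposite sign to the pooled slope."""
    all_x = [x for xs, _ in groups.values() for x in xs]
    all_y = [y for _, ys in groups.values() for y in ys]
    pooled = sign(ols_slope(all_x, all_y))
    if pooled == 0:
        return False
    return all(sign(ols_slope(xs, ys)) == -pooled for xs, ys in groups.values())


subgroup_slopes = {name: ols_slope(xs, ys) for name, (xs, ys) in groups.items()}
aggregate_slope = ols_slope(
    [x for xs, _ in groups.values() for x in xs],
    [y for _, ys in groups.values() for y in ys],
)

print("subgroup slopes:", subgroup_slopes)
print("aggregate slope:", aggregate_slope)
print("trend reverses:", trend_reverses(groups))
```

In this toy data set each subgroup shows a decreasing trend while the pooled data show an increasing one, because the subgroups differ in both their range of x and their baseline level of y. A sign comparison like this is a crude check; real behavioral data would also need an assessment of statistical significance and a principled choice of how to split the population.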


