ProPublica's COMPAS Data Revisited

06/11/2019
by Matias Barenstein, et al.

In this paper I re-examine the COMPAS recidivism score and criminal history data collected by ProPublica in 2016, which have fueled intense debate and research in the nascent field of 'algorithmic fairness' or 'fair machine learning' over the past three years. ProPublica's COMPAS data are used in an ever-increasing number of studies to test various definitions and methodologies of algorithmic fairness. This paper takes a closer look at the actual datasets ProPublica put together. In doing so, I find that ProPublica made an important data processing mistake when creating some of the key datasets most often used by other researchers, namely the datasets built to study the likelihood of recidivism within two years of the original COMPAS screening date. As I show in this paper, ProPublica implemented the two-year sample cutoff rule incorrectly for recidivists in these datasets (while implementing an appropriate two-year cutoff rule for non-recidivists). As a result, ProPublica incorrectly kept a disproportionate share of recidivists, producing biased two-year recidivism datasets with artificially high recidivism rates. This also affects the positive and negative predictive values. On the other hand, this data processing mistake does not impact some of the key statistical measures highlighted by ProPublica and other researchers, such as the false positive and false negative rates, or the overall accuracy.
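The cutoff mistake described above can be sketched with toy data. Everything in this example is a hypothetical illustration, not ProPublica's actual schema or dates: the point is only that a valid two-year sample requires a full two years of follow-up for *everyone*, and that dropping short-follow-up non-recidivists while keeping short-follow-up recidivists mechanically inflates the measured recidivism rate.

```python
from datetime import date, timedelta

TWO_YEARS = timedelta(days=730)
DATA_END = date(2016, 4, 1)  # hypothetical end of the data-collection window

def in_two_year_sample(screening_date, recid_date):
    """Apply the two-year cutoff rule symmetrically.

    A person belongs in the two-year sample only if their screening date
    allows a full two years of follow-up by DATA_END. Recidivism is coded 1
    only if it occurred within two years of screening; returns None when the
    person must be dropped for insufficient follow-up.
    """
    if screening_date + TWO_YEARS > DATA_END:
        return None  # insufficient follow-up: drop regardless of outcome
    recidivated = recid_date is not None and recid_date - screening_date <= TWO_YEARS
    return 1 if recidivated else 0

# Toy records: (screening_date, recidivism_date or None)
records = [
    (date(2013, 1, 1), date(2013, 6, 1)),  # recidivated; full follow-up
    (date(2013, 1, 1), None),              # no recidivism; full follow-up
    (date(2015, 3, 1), date(2015, 9, 1)),  # recidivated; < 2 yrs follow-up
    (date(2015, 3, 1), None),              # no recidivism; < 2 yrs follow-up
]

# Symmetric rule: both late-screened people are dropped -> rate 1/2.
correct = [y for s, r in records if (y := in_two_year_sample(s, r)) is not None]

# Asymmetric rule (the mistake as described): short-follow-up non-recidivists
# are dropped, but short-follow-up recidivists are kept -> rate 2/3.
biased = [1 if r is not None and r - s <= TWO_YEARS else 0
          for s, r in records
          if r is not None or s + TWO_YEARS <= DATA_END]
```

With these toy records the symmetric rule yields a 50% recidivism rate, while the asymmetric rule yields 67%: the base rate is biased upward, which also shifts the positive and negative predictive values, even though error rates computed within predicted-risk groups (the false positive and false negative rates) are not driven by this sample-selection step.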


