Detecting Bias in Black-Box Models Using Transparent Model Distillation

10/17/2017
by Sarah Tan, et al.

Black-box risk scoring models permeate our lives, yet are typically proprietary and opaque. We propose a transparent model distillation approach to detect bias in such models. Model distillation was originally designed to distill knowledge from a large, complex teacher model into a faster, simpler student model without significant loss in prediction accuracy. We add a third restriction: transparency. In this paper, we use data sets that contain two labels to train on: the risk score predicted by a black-box model and the actual outcome the risk score was intended to predict. This allows us to compare models that predict each label. For a particular class of student models, interpretable tree additive models with pairwise interactions (GA2Ms), we provide confidence intervals for the difference between the risk score model and the actual outcome model. This yields a new method for detecting bias in black-box risk scores: assessing whether the contributions of protected features to the risk score are statistically different from their contributions to the actual outcome.
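
The comparison described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes synthetic data, a simple backfitted additive model over binned features as a stand-in for a GA2M (no pairwise interactions, no trees), and a paired bootstrap as one possible way to get a confidence interval for the difference. All names (`fit_additive`, `protected_gap`, the simulated features) are hypothetical.

```python
import random


def fit_additive(X, y, n_iter=5):
    """Backfit an additive model on binned features.

    Stand-in for a GA2M: one centered lookup table per feature,
    no pairwise interaction terms. X is a list of tuples of bin ids.
    """
    n, d = len(X), len(X[0])
    intercept = sum(y) / n
    terms = [{} for _ in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            sums, counts = {}, {}
            for xi, yi in zip(X, y):
                # Partial residual: remove intercept and all other terms.
                others = sum(terms[k].get(xi[k], 0.0) for k in range(d) if k != j)
                resid = yi - intercept - others
                sums[xi[j]] = sums.get(xi[j], 0.0) + resid
                counts[xi[j]] = counts.get(xi[j], 0) + 1
            term = {v: sums[v] / counts[v] for v in sums}
            # Center the term so the intercept stays identifiable.
            mean = sum(term[xi[j]] for xi in X) / n
            terms[j] = {v: t - mean for v, t in term.items()}
    return intercept, terms


def protected_gap(X, y, feature=0):
    """Contribution of protected group 1 minus group 0 in the fitted model."""
    _, terms = fit_additive(X, y)
    return terms[feature][1] - terms[feature][0]


# Synthetic data (assumption): the black-box score depends on the protected
# feature g, while the actual outcome depends only on the other feature p.
rng = random.Random(0)
X, score, outcome = [], [], []
for _ in range(600):
    g = rng.randint(0, 1)          # protected feature
    p = rng.randint(0, 2)          # binned non-protected feature
    base = 0.2 + 0.15 * p
    X.append((g, p))
    outcome.append(1.0 if rng.random() < base else 0.0)
    score.append(base + 0.3 * g + rng.gauss(0, 0.02))

# Paired bootstrap: refit both student models on each resample and record
# the difference in the protected feature's contribution.
n = len(X)
diffs = []
for _ in range(100):
    idx = [rng.randrange(n) for _ in range(n)]
    Xb = [X[i] for i in idx]
    gap_score = protected_gap(Xb, [score[i] for i in idx])
    gap_outcome = protected_gap(Xb, [outcome[i] for i in idx])
    diffs.append(gap_score - gap_outcome)

diffs.sort()
lo, hi = diffs[2], diffs[97]       # approximate 95% percentile interval
print(f"95% CI for contribution difference: [{lo:.3f}, {hi:.3f}]")
```

If the interval excludes zero, the protected feature contributes differently to the risk score than to the actual outcome in this toy setup, which is the kind of signal the abstract describes. The sketch places both labels on a comparable 0 to 1 scale by construction; real risk scores would likely need rescaling before such a comparison.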

