Fairwashing: the risk of rationalization

01/28/2019
by Ulrich Aïvodji, et al.

Black-box explanation is the problem of explaining how a machine learning model -- whose internal logic is hidden from the auditor and generally complex -- produces its outcomes. Current approaches to this problem include model explanation, outcome explanation, and model inspection. While these techniques can be beneficial by providing interpretability, they can also be misused to perform fairwashing, which we define as promoting the perception that a machine learning model respects some ethical values when this might not be the case. In particular, we demonstrate that it is possible to systematically rationalize decisions taken by an unfair black-box model using both the model explanation and the outcome explanation approaches with a given fairness metric. Our solution, LaundryML, is based on a regularized rule list enumeration algorithm whose objective is to search for fair rule lists approximating an unfair black-box model. We empirically evaluate our rationalization technique on black-box models trained on real-world datasets and show that one can obtain rule lists with high fidelity to the black-box model that are at the same time considerably less unfair.
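
To make the trade-off between fidelity and unfairness concrete, the following is a minimal sketch of how a candidate surrogate could be scored against a black-box model. It is illustrative only and is not LaundryML itself: the abstract does not state the exact objective, so the weighted-sum form, the parameter names `beta` and `lam`, the function names, the choice of demographic parity as the fairness metric, and the toy data are all assumptions.

```python
from typing import Sequence


def fidelity(bb_preds: Sequence[int], surrogate_preds: Sequence[int]) -> float:
    """Fraction of inputs on which the surrogate agrees with the black box."""
    agree = sum(b == s for b, s in zip(bb_preds, surrogate_preds))
    return agree / len(bb_preds)


def demographic_parity_gap(preds: Sequence[int], protected: Sequence[int]) -> float:
    """Absolute difference in positive-prediction rate between the two groups.

    Assumes binary predictions and that both groups are non-empty.
    """
    prot = [p for p, g in zip(preds, protected) if g == 1]
    unprot = [p for p, g in zip(preds, protected) if g == 0]
    return abs(sum(prot) / len(prot) - sum(unprot) / len(unprot))


def regularized_score(
    bb_preds: Sequence[int],
    surrogate_preds: Sequence[int],
    protected: Sequence[int],
    n_rules: int,
    beta: float = 0.5,   # hypothetical weight on unfairness
    lam: float = 0.01,   # hypothetical penalty per rule (favors short lists)
) -> float:
    """Assumed objective (lower is better): infidelity, unfairness, and length."""
    infidelity = 1.0 - fidelity(bb_preds, surrogate_preds)
    unfairness = demographic_parity_gap(surrogate_preds, protected)
    return (1.0 - beta) * infidelity + beta * unfairness + lam * n_rules


# Toy data: 1 marks membership in the protected group.
protected = [1, 1, 1, 1, 0, 0, 0, 0]
black_box = [0, 0, 0, 1, 1, 1, 1, 0]   # favors the unprotected group
surrogate = [0, 0, 1, 1, 1, 1, 0, 0]   # hypothetical 2-rule surrogate

bb_gap = demographic_parity_gap(black_box, protected)
sur_gap = demographic_parity_gap(surrogate, protected)
sur_fid = fidelity(black_box, surrogate)
score = regularized_score(black_box, surrogate, protected, n_rules=2)
```

In this toy setting the surrogate disagrees with the black box on two of eight inputs, yet its demographic parity gap is smaller than that of the model it claims to explain. An enumeration procedure that keeps the lowest-scoring candidates would favor such surrogates, which is the kind of rationalization the abstract warns about.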


