The Road to Explainability is Paved with Bias: Measuring the Fairness of Explanations

05/06/2022
by Aparna Balagopalan, et al.

Machine learning models in safety-critical settings like healthcare are often black boxes: they contain a large number of parameters that are not transparent to users. Post-hoc explainability methods, in which a simple, human-interpretable model imitates the behavior of these black-box models, are often proposed to help users trust model predictions. In this work, we audit the quality of such explanations for different protected subgroups using real data from four settings in finance, healthcare, college admissions, and the US justice system. Across two different black-box model architectures and four popular explainability methods, we find that the approximation quality of explanation models, also known as the fidelity, differs significantly between subgroups. We also demonstrate that pairing explainability methods with recent advances in robust machine learning can improve explanation fairness in some settings. However, we highlight the importance of communicating details of non-zero fidelity gaps to users, since a single solution might not exist across all settings. Finally, we discuss the implications of unfair explanation models as a challenging and understudied problem facing the machine learning community.
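The fidelity gap the abstract describes can be sketched in a few lines: fidelity is the rate at which a simple surrogate explanation model agrees with the black-box model's predictions, computed separately per protected subgroup. The sketch below uses toy stand-in functions and hand-made data (the black box, surrogate, and groups A/B are all hypothetical assumptions, not the paper's models or datasets):

```python
# Hedged sketch of per-subgroup explanation fidelity, assuming:
# - `blackbox` and `surrogate` are stand-in classifiers (not the paper's models)
# - groups A and B are toy samples split by a hypothetical protected attribute

def fidelity(blackbox, surrogate, samples):
    """Fraction of samples on which the surrogate matches the black-box label."""
    agree = sum(blackbox(x) == surrogate(x) for x in samples)
    return agree / len(samples)

# Toy black box: predicts 1 when the two features sum past a threshold.
blackbox = lambda x: int(x[0] + x[1] > 1.0)
# Toy surrogate: a simpler, human-readable rule using only the first feature.
surrogate = lambda x: int(x[0] > 0.5)

# Hypothetical samples for two protected subgroups.
group_a = [(0.2, 0.1), (0.8, 0.9), (0.6, 0.7), (0.3, 0.2)]
group_b = [(0.6, 0.1), (0.4, 0.9), (0.7, 0.2), (0.8, 0.8)]

fid_a = fidelity(blackbox, surrogate, group_a)  # 1.0: surrogate tracks group A
fid_b = fidelity(blackbox, surrogate, group_b)  # 0.25: surrogate fails group B
fidelity_gap = abs(fid_a - fid_b)               # 0.75: the subgroup fidelity gap
print(fid_a, fid_b, fidelity_gap)
```

The point of the toy data is that a surrogate can be an excellent approximation on one subgroup and a poor one on another, so reporting a single aggregate fidelity number hides exactly the disparity the paper audits.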


Related research

09/12/2022
Explaining Predictions from Machine Learning Models: Algorithms, Users, and Pedagogy
Model explainability has become an important problem in machine learning...

09/30/2021
On the Trustworthiness of Tree Ensemble Explainability Methods
The recent increase in the deployment of machine learning models in crit...

05/14/2021
Agree to Disagree: When Deep Learning Models With Identical Architectures Produce Distinct Explanations
Deep Learning of neural networks has progressively become more prominent...

10/14/2021
Brittle interpretations: The Vulnerability of TCAV and Other Concept-based Explainability Tools to Adversarial Attack
Methods for model explainability have become increasingly critical for t...

12/07/2017
Network Analysis for Explanation
Safety critical systems strongly require the quality aspects of artifici...

08/05/2022
Parameter Averaging for Robust Explainability
Neural Networks are known to be sensitive to initialisation. The explana...

09/15/2023
Can Users Correctly Interpret Machine Learning Explanations and Simultaneously Identify Their Limitations?
Automated decision-making systems are becoming increasingly ubiquitous, ...
