DeepAI AI Chat
Log In Sign Up

The Road to Explainability is Paved with Bias: Measuring the Fairness of Explanations

by   Aparna Balagopalan, et al.

Machine learning models in safety-critical settings like healthcare are often blackboxes: they contain a large number of parameters which are not transparent to users. Post-hoc explainability methods where a simple, human-interpretable model imitates the behavior of these blackbox models are often proposed to help users trust model predictions. In this work, we audit the quality of such explanations for different protected subgroups using real data from four settings in finance, healthcare, college admissions, and the US justice system. Across two different blackbox model architectures and four popular explainability methods, we find that the approximation quality of explanation models, also known as the fidelity, differs significantly between subgroups. We also demonstrate that pairing explainability methods with recent advances in robust machine learning can improve explanation fairness in some settings. However, we highlight the importance of communicating details of non-zero fidelity gaps to users, since a single solution might not exist across all settings. Finally, we discuss the implications of unfair explanation models as a challenging and understudied problem facing the machine learning community.


page 1

page 2

page 3

page 4


Explaining Predictions from Machine Learning Models: Algorithms, Users, and Pedagogy

Model explainability has become an important problem in machine learning...

On the Trustworthiness of Tree Ensemble Explainability Methods

The recent increase in the deployment of machine learning models in crit...

Agree to Disagree: When Deep Learning Models With Identical Architectures Produce Distinct Explanations

Deep Learning of neural networks has progressively become more prominent...

Network Analysis for Explanation

Safety critical systems strongly require the quality aspects of artifici...

Parameter Averaging for Robust Explainability

Neural Networks are known to be sensitive to initialisation. The explana...

Can Users Correctly Interpret Machine Learning Explanations and Simultaneously Identify Their Limitations?

Automated decision-making systems are becoming increasingly ubiquitous, ...