Don't be fooled: label leakage in explanation methods and the importance of their quantitative evaluation

02/24/2023
by Neil Jethani, et al.

Feature attribution methods identify which features of an input most influence a model's output. Most widely used feature attribution methods (such as SHAP, LIME, and Grad-CAM) are "class-dependent" methods in that they generate a feature attribution vector as a function of class. In this work, we demonstrate that class-dependent methods can "leak" information about the selected class, making that class appear more likely than it is. Thus, an end user runs the risk of drawing false conclusions when interpreting an explanation generated by a class-dependent method. In contrast, we introduce "distribution-aware" methods, which favor explanations that keep the label's distribution, given the selected subset of features, close to its distribution given all features of the input. We introduce SHAP-KL and FastSHAP-KL, two baseline distribution-aware methods that compute Shapley values. Finally, we perform a comprehensive evaluation of seven class-dependent and three distribution-aware methods on three clinical datasets of different high-dimensional data types: images, biosignals, and text.
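The distinction the abstract draws can be made concrete with a small sketch. The snippet below scores a candidate feature subset by the KL divergence between the model's predicted label distribution given all features and its distribution given only the selected features; a subset with a low score preserves the label's distribution, which is the property distribution-aware methods favor. The `predict_proba` callable, the `baseline` values, and the replacement-masking scheme are illustrative assumptions, not the paper's SHAP-KL or FastSHAP-KL implementations.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two categorical distributions given as probability vectors."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def distribution_aware_score(predict_proba, x, subset, baseline):
    """Score a candidate feature subset by how well it preserves the label distribution.

    predict_proba : callable mapping a 1-D feature vector to class probabilities
    x             : the full input (1-D numpy array)
    subset        : indices of the features kept by the explanation
    baseline      : values used to replace the remaining, masked-out features
                    (a simple replacement-masking scheme, assumed for illustration)

    Lower scores mean the selected features better preserve p(y | x).
    """
    x_masked = np.array(baseline, dtype=float).copy()
    x_masked[subset] = x[subset]
    p_full = predict_proba(x)           # label distribution given all features
    p_subset = predict_proba(x_masked)  # label distribution given only the subset
    return kl_divergence(p_full, p_subset)
```

A class-dependent method, by contrast, rates a subset only by how strongly it supports one chosen class, which is where the leakage described in the abstract can arise: the act of selecting features for that class can itself make the class look more probable than it is.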

Related research

03/14/2022 · Rethinking Stability for Attribution-based Explanations
As attribution-based explanation methods are increasingly used to establ...

01/23/2021 · Show or Suppress? Managing Input Uncertainty in Machine Learning Model Explanations
Feature attribution is widely used in interpretable machine learning to ...

07/10/2019 · Explaining an increase in predicted risk for clinical alerts
Much work aims to explain a model's prediction on a static input. We con...

03/23/2022 · On Understanding the Influence of Controllable Factors with a Feature Attribution Algorithm: a Medical Case Study
Feature attribution XAI algorithms enable their users to gain insight in...

11/11/2020 · GANMEX: One-vs-One Attributions using GAN-based Model Explainability
Attribution methods have been shown as promising approaches for identify...

09/23/2020 · Information-Theoretic Visual Explanation for Black-Box Classifiers
In this work, we attempt to explain the prediction of any black-box clas...

11/14/2021 · "Will You Find These Shortcuts?" A Protocol for Evaluating the Faithfulness of Input Salience Methods for Text Classification
Feature attribution a.k.a. input salience methods which assign an import...
