Debugging Tests for Model Explanations

by Julius Adebayo, et al.

We investigate whether post-hoc model explanations are effective for diagnosing model errors, i.e., model debugging. In response to the challenge of explaining a model's prediction, a vast array of explanation methods has been proposed. Despite their increasing use, it is unclear whether they are effective. To start, we categorize bugs by their source into data, model, and test-time contamination bugs. For several explanation methods, we assess their ability to: detect spurious correlation artifacts (data contamination), diagnose mislabeled training examples (data contamination), differentiate between a (partially) re-initialized model and a trained one (model contamination), and detect out-of-distribution inputs (test-time contamination). We find that the methods tested are able to diagnose a spurious background bug, but cannot conclusively identify mislabeled training examples. In addition, a class of methods that modify the back-propagation algorithm is invariant to the higher-layer parameters of a deep network, and hence is ineffective for diagnosing model contamination. We complement our analysis with a human subject study, and find that subjects fail to identify defective models using attributions, relying instead primarily on model predictions. Taken together, our results provide guidance for practitioners and researchers turning to explanations as tools for model debugging.
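The model-contamination test described above — checking whether an attribution method reacts when part of a trained model is re-initialized — can be illustrated with a minimal NumPy sketch. This is a hypothetical toy setup, not the paper's experimental code: a two-layer network with an analytically computed input-gradient attribution, where the top-layer weights are re-randomized and the two attribution maps are compared. A method useful for debugging should produce noticeably different attributions for the contaminated model; a correlation near 1 would signal the invariance problem the paper reports for modified back-propagation methods.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network: f(x) = w2 . relu(W1 @ x)
# Input-gradient attribution: df/dx = W1^T (w2 * 1[W1 @ x > 0])
def input_gradient(x, W1, w2):
    pre = W1 @ x
    mask = (pre > 0).astype(float)  # ReLU derivative
    return W1.T @ (w2 * mask)

d, h = 20, 50                      # input and hidden dimensions (arbitrary)
x = rng.normal(size=d)
W1 = rng.normal(size=(h, d))
w2 = rng.normal(size=h)

attr_trained = input_gradient(x, W1, w2)

# "Model contamination": re-initialize the top-layer weights.
w2_random = rng.normal(size=h)
attr_random = input_gradient(x, W1, w2_random)

# Compare the two attribution vectors; a debugging-relevant method
# should not be invariant to this change.
def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

print(f"attribution correlation after top-layer re-init: "
      f"{corr(attr_trained, attr_random):.3f}")
```

For input gradients the attribution depends directly on `w2`, so the correlation drops well below 1 after re-initialization; the paper's finding is that some modified back-propagation methods do not show this sensitivity.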




