Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations

03/10/2017
by Andrew Slavin Ross, et al.

Neural networks are among the most accurate supervised learning methods in use today, but their opacity makes them difficult to trust in critical applications, especially when conditions in training differ from those in test. Recent work on explanations for black-box models has produced tools (e.g. LIME) to show the implicit rules behind predictions, which can help us identify when models are right for the wrong reasons. However, these methods do not scale to explaining entire datasets and cannot correct the problems they reveal. We introduce a method for efficiently explaining and regularizing differentiable models by examining and selectively penalizing their input gradients, which provide a normal to the decision boundary. We apply these penalties both based on expert annotation and in an unsupervised fashion that encourages diverse models with qualitatively different decision boundaries for the same classification problem. On multiple datasets, we show our approach generates faithful explanations and models that generalize much better when conditions differ between training and test.
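The abstract summarizes the core mechanism: differentiate the model's log-probabilities with respect to its inputs and penalize that input gradient wherever an annotation marks features as irrelevant. As a rough illustration of that idea (not the authors' released code), a minimal PyTorch-style sketch follows; `model`, `annotation_mask`, and the penalty weight `lam` are assumed names for this example.

```python
import torch
import torch.nn.functional as F

def rrr_style_loss(model, x, y, annotation_mask, lam=1000.0):
    """Sketch of a "right for the right reasons"-style loss:
    cross-entropy plus a penalty on input gradients of the
    log-probabilities over features marked irrelevant by an annotator.
    annotation_mask is 1 where the model should NOT rely on the input.
    """
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    ce = F.cross_entropy(logits, y)

    # Sum of log-probabilities over classes; its gradient with respect to
    # the inputs serves as a (local) normal to the decision boundary.
    log_prob_sum = F.log_softmax(logits, dim=1).sum()
    input_grads, = torch.autograd.grad(log_prob_sum, x, create_graph=True)

    # Penalize gradient mass that falls on annotated-irrelevant features.
    explanation_penalty = ((annotation_mask * input_grads) ** 2).sum()

    return ce + lam * explanation_penalty
```

In this sketch the penalty term is itself differentiable (via `create_graph=True`), so the combined loss can be minimized with an ordinary optimizer; setting the mask to all-zeros recovers plain cross-entropy training.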


Related research

- 09/29/2018 · Training Machine Learning Models by Regularizing their Explanations
- 01/15/2020 · Right for the Wrong Scientific Reasons: Revising Deep Networks by Interacting with their Explanations
- 10/07/2020 · Why do you think that? Exploring Faithful Sentence-Level Rationales Without Supervision
- 05/29/2023 · Faithfulness Tests for Natural Language Explanations
- 10/13/2022 · Self-explaining deep models with logic rule reasoning
- 03/22/2021 · Explaining Black-Box Algorithms Using Probabilistic Contrastive Counterfactuals
- 03/21/2023 · Using Explanations to Guide Models
