Contrastive Explanations for Model Interpretability

03/02/2021
by Alon Jacovi, et al.

Contrastive explanations clarify why an event occurred in contrast to another. They are inherently more intuitive for humans to both produce and comprehend. We propose a methodology to produce contrastive explanations for classification models by modifying the representation to disregard non-contrastive information, and modifying model behavior to be based only on contrastive reasoning. Our method is based on projecting the model representation to a latent space that captures only the features that are useful (to the model) for differentiating two potential decisions. We demonstrate the value of contrastive explanations by analyzing two different scenarios, using both high-level abstract concept attribution and low-level input token/span attribution, on two widely used text classification tasks. Specifically, we produce explanations that answer: for which label, and against which alternative label, is some aspect of the input useful? And which aspects of the input are useful for and against particular decisions? Overall, our findings shed light on the ability of label-contrastive explanations to provide more accurate and finer-grained interpretability of a model's decisions.
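The projection to a contrastive latent space can be sketched in simplified form. The snippet below is a minimal illustration, not the paper's exact method: it assumes a linear classifier head `W` and takes the contrastive direction for labels `y` and `y_alt` to be the difference of their weight rows, keeping only the component of a representation along that direction. The function name and this linear simplification are assumptions for illustration.

```python
import numpy as np

def contrastive_projection(h: np.ndarray, W: np.ndarray, y: int, y_alt: int) -> np.ndarray:
    """Keep only the part of representation h that distinguishes label y from y_alt.

    h: hidden representation, shape (d,)
    W: linear classifier weights, shape (num_labels, d)
    Returns the component of h along the direction W[y] - W[y_alt],
    discarding all non-contrastive information.
    """
    u = W[y] - W[y_alt]              # direction separating the two labels
    u = u / np.linalg.norm(u)        # unit vector
    return (h @ u) * u               # orthogonal projection of h onto span(u)
```

A representation processed this way can only influence the model's choice between the two labels in question, which is what makes attributions computed on it label-contrastive. Note that the projection is idempotent: applying it twice gives the same result as applying it once.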

Related research:

- Explaining NLP Models via Minimal Contrastive Editing (MiCE) (12/27/2020)
- Contrastive Explanations in Neural Networks (08/01/2020)
- Contrast Is All You Need (07/06/2023)
- Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals (04/09/2021)
- Generative Counterfactuals for Neural Networks via Attribute-Informed Perturbation (01/18/2021)
- VAE-CE: Visual Contrastive Explanation using Disentangled VAEs (08/20/2021)
- Generating Contrastive Explanations with Monotonic Attribute Functions (05/29/2019)
