Meaningfully Explaining a Model's Mistakes

06/24/2021
by Abubakar Abid, et al.

Understanding and explaining the mistakes made by trained models is critical to many machine learning objectives, such as improving robustness, addressing concept drift, and mitigating biases. However, this is often an ad hoc process that involves manually examining the model's mistakes on many test samples and guessing at the underlying reasons for those incorrect predictions. In this paper, we propose a systematic approach, conceptual explanation scores (CES), that explains why a classifier makes a mistake on particular test samples in terms of human-understandable concepts (e.g. this zebra is misclassified as a dog because of faint stripes). We base CES on two prior ideas: counterfactual explanations and concept activation vectors, and validate our approach on well-known pretrained models, showing that it explains the models' mistakes meaningfully. We also train new models with intentional and known spurious correlations, which CES successfully identifies from a single misclassified test sample. The code for CES is publicly available and can easily be applied to new models.
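The two ingredients the abstract names can be combined in a simple way: learn a concept activation vector (CAV) from example activations, then test counterfactually whether shifting a misclassified sample's activation along that concept direction recovers the true-class score. The sketch below illustrates this idea with a mean-difference CAV; all function names are hypothetical and this is not the paper's actual implementation, which uses a learned linear separator and the authors' released code.

```python
import numpy as np

def concept_activation_vector(concept_acts, random_acts):
    """Unit vector pointing from 'random' activations toward 'concept'
    activations. A mean-difference stand-in for the linear-classifier
    CAV of Kim et al. (TCAV)."""
    v = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
    return v / np.linalg.norm(v)

def conceptual_explanation_score(act, cav, score_fn, true_label, eps_grid):
    """Counterfactual test: shift the sample's layer activation along the
    CAV and measure the largest gain in the true-class score. A large gain
    suggests the mistake is explained by that concept (e.g. faint stripes)."""
    base = score_fn(act)[true_label]
    best_gain = 0.0
    for eps in eps_grid:
        gain = score_fn(act + eps * cav)[true_label] - base
        best_gain = max(best_gain, gain)
    return best_gain
```

In this toy form, `score_fn` maps an intermediate activation to class scores; ranking concepts by their score gain yields a per-sample explanation of the misclassification.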
