Unifying Model Explainability and Robustness via Machine-Checkable Concepts

07/01/2020
by Vedant Nanda, et al.

As deep neural networks (DNNs) get adopted in an ever-increasing number of applications, explainability has emerged as a crucial desideratum for these models. In many real-world tasks, one of the principal reasons for requiring explainability is, in turn, to assess prediction robustness: predictions (i.e., class labels) that do not conform to their respective explanations (e.g., the presence or absence of a concept in the input) are deemed unreliable. However, most, if not all, prior methods for checking explanation conformity (e.g., LIME, TCAV, saliency maps) require significant manual intervention, which hinders their large-scale deployability. In this paper, we propose a robustness-assessment framework, at the core of which is the idea of using machine-checkable concepts. Our framework defines a large number of concepts that DNN explanations could be based on and performs the explanation-conformity check at test time to assess prediction robustness. Both steps are executed automatically, without any human intervention, and scale easily to datasets with a very large number of classes. Experiments on real-world datasets and human surveys show that our framework enhances prediction robustness significantly: the predictions marked as robust by our framework have significantly higher accuracy and are more robust to adversarial perturbations.
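
To make the explanation-conformity idea concrete, here is a minimal sketch in Python of what such an automated check could look like: a prediction is flagged as robust only if enough of the concepts expected for the predicted class are actually detected in the input. All names below (CLASS_TO_CONCEPTS, conformity_check, the stub detector, the 0.5 threshold) are illustrative assumptions for this sketch, not the paper's actual interface or method.

```python
from typing import Callable, Dict, Set

# Hypothetical mapping from a class label to the set of machine-checkable
# concepts that explanations for that class are expected to rely on.
CLASS_TO_CONCEPTS: Dict[str, Set[str]] = {
    "zebra": {"stripes", "four_legs"},
    "fire_truck": {"wheels", "red_color"},
}

def conformity_check(
    predicted_class: str,
    detect_concept: Callable[[str], bool],
    min_fraction: float = 0.5,  # assumed threshold, not from the paper
) -> bool:
    """Return True if the prediction conforms to its concept-based explanation.

    detect_concept(concept) should return True when the concept is detected
    in the current input (e.g., by an auxiliary concept classifier).
    """
    expected = CLASS_TO_CONCEPTS.get(predicted_class, set())
    if not expected:
        return False  # no machine-checkable concepts available for this class
    detected = sum(detect_concept(c) for c in expected)
    return detected / len(expected) >= min_fraction


# Example usage with a stub detector that pretends only "stripes" is present.
if __name__ == "__main__":
    stub_detector = lambda concept: concept == "stripes"
    print(conformity_check("zebra", stub_detector))       # 1/2 >= 0.5 -> robust
    print(conformity_check("fire_truck", stub_detector))  # 0/2 <  0.5 -> not robust
```

In this sketch the concept detectors stand in for whatever automated mechanism supplies concept presence/absence signals; the point is only that the check itself runs at test time without human intervention.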
