Analyzing the Interpretability Robustness of Self-Explaining Models

05/27/2019
by Haizhong Zheng, et al.

Self-explaining models (SEMs) are interpretable models recently proposed with the goal of providing robust interpretability. We evaluate the interpretability robustness of SEMs and show that the explanations they provide, as currently proposed, are not robust to adversarial inputs. Specifically, we craft adversarial inputs that leave the model outputs unchanged but cause significant changes in the explanations. We find that although current SEMs use stable coefficients to map explanations to output labels, they do not consider the robustness of the first stage of the model, which creates interpretable basis concepts from the input; this leads to non-robust explanations. Our work makes a case for future research on generating interpretable basis concepts in a robust way.

research
06/21/2018

On the Robustness of Interpretability Methods

We argue that robustness of explanations---i.e., that similar inputs sho...
research
06/20/2018

Towards Robust Interpretability with Self-Explaining Neural Networks

Most recent work on interpretability of complex machine learning models ...
research
12/07/2022

Learning to Select Prototypical Parts for Interpretable Sequential Data Modeling

Prototype-based interpretability methods provide intuitive explanations ...
research
03/11/2023

Robust Learning from Explanations

Machine learning from explanations (MLX) is an approach to learning that...
research
05/25/2023

Rectifying Group Irregularities in Explanations for Distribution Shift

It is well-known that real-world changes constituting distribution shift...
research
12/03/2020

Interpretable Graph Capsule Networks for Object Recognition

Capsule Networks, as alternatives to Convolutional Neural Networks, have...
research
05/03/2023

Explaining Language Models' Predictions with High-Impact Concepts

The emergence of large-scale pretrained language models has posed unprec...