Towards Robust Interpretability with Self-Explaining Neural Networks

06/20/2018
by David Alvarez-Melis, et al.

Most recent work on interpretability of complex machine learning models has focused on estimating a posteriori explanations for previously trained models around specific predictions. Self-explaining models, in which interpretability already plays a key role during learning, have received much less attention. We propose three desiderata for explanations in general -- explicitness, faithfulness, and stability -- and show that existing methods do not satisfy them. In response, we design self-explaining models in stages, progressively generalizing linear classifiers to complex yet architecturally explicit models. Faithfulness and stability are enforced via regularization specifically tailored to such models. Experimental results across various benchmark datasets show that our framework offers a promising direction for reconciling model complexity and interpretability.
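The abstract does not spell out the model form, so the following is only a minimal sketch of the general idea of "generalizing linear classifiers" with a stability-oriented regularizer. It assumes a model of the form f(x) = theta(x) · x, where the coefficients theta(x) depend on the input, and a penalty that measures how far the true gradient of f is from theta(x). The function names, the tanh parameterization, and the finite-difference gradient are all illustrative assumptions, not the authors' implementation.

```python
import math

def theta(x, W, b):
    """Hypothetical input-dependent coefficients: theta(x) = tanh(W x + b)."""
    return [
        math.tanh(sum(w * xj for w, xj in zip(row, x)) + bi)
        for row, bi in zip(W, b)
    ]

def predict(x, W, b):
    """Generalized linear model f(x) = theta(x) . x (raw features as concepts)."""
    return sum(t * xi for t, xi in zip(theta(x, W, b), x))

def gradient(x, W, b, eps=1e-5):
    """Central finite-difference estimate of the gradient of f with respect to x."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((predict(x_plus, W, b) - predict(x_minus, W, b)) / (2 * eps))
    return grad

def stability_penalty(x, W, b):
    """Euclidean norm of grad f(x) - theta(x).

    The penalty is (numerically) zero when f behaves like a linear model
    whose coefficients are exactly theta(x) in a neighborhood of x.
    """
    g = gradient(x, W, b)
    t = theta(x, W, b)
    return math.sqrt(sum((gi - ti) ** 2 for gi, ti in zip(g, t)))

if __name__ == "__main__":
    x = [1.0, 2.0]
    # Constant coefficients (W = 0): the model is truly linear.
    print(stability_penalty(x, [[0.0, 0.0], [0.0, 0.0]], [0.5, -0.3]))
    # Input-dependent coefficients: theta(x) no longer matches the gradient.
    print(stability_penalty(x, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```

In this sketch, adding such a penalty to a training loss would push theta(x) to act as a locally faithful set of linear coefficients; how the paper actually weights or computes its regularizer is not stated in the abstract.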

research
05/27/2019

Analyzing the Interpretability Robustness of Self-Explaining Models

Recently, interpretable models called self-explaining models (SEMs) have...
research
06/18/2019

Model Explanations under Calibration

Explaining and interpreting the decisions of recommender systems are bec...
research
09/18/2023

On Model Explanations with Transferable Neural Pathways

Neural pathways as model explanations consist of a sparse set of neurons...
research
12/07/2022

Learning to Select Prototypical Parts for Interpretable Sequential Data Modeling

Prototype-based interpretability methods provide intuitive explanations ...
research
12/03/2020

Self-Explaining Structures Improve NLP Models

Existing approaches to explaining deep learning models in NLP usually su...
research
07/01/2020

In-Distribution Interpretability for Challenging Modalities

It is widely recognized that the predictions of deep neural networks are...
research
01/27/2022

LAP: An Attention-Based Module for Faithful Interpretation and Knowledge Injection in Convolutional Neural Networks

Despite the state-of-the-art performance of deep convolutional neural ne...
