Have We Learned to Explain?: How Interpretability Methods Can Learn to Encode Predictions in their Interpretations

03/02/2021
by Neil Jethani, et al.

While the need for interpretable machine learning has been established, many common approaches are slow, lack fidelity, or are hard to evaluate. Amortized explanation methods reduce the cost of providing interpretations by learning a global selector model that returns feature importances for a single instance of data. The selector model is trained to optimize the fidelity of the interpretations, as evaluated by a predictor model of the target. Popular methods learn the selector and predictor models in concert, which we show allows predictions to be encoded within interpretations. We introduce EVAL-X, a method to quantitatively evaluate interpretations, and REAL-X, an amortized explanation method, both of which learn a predictor model that approximates the true data-generating distribution given any subset of the input. We show that EVAL-X can detect when predictions are encoded in interpretations and demonstrate the advantages of REAL-X through quantitative and radiologist evaluation.
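To make the selector/predictor setup concrete, the sketch below illustrates the two-stage idea described in the abstract: first train a predictor on random feature subsets so it approximates the conditional distribution of the target given any subset of the input, then train an amortized selector against that fixed predictor, so the selector cannot smuggle predictions into its masks. This is not the authors' implementation; the architectures, relaxed-Bernoulli mask sampling, sparsity penalty, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

D_IN, D_HID, N_CLASSES = 20, 64, 2

# Predictor sees masked features plus the mask itself (hence 2 * D_IN inputs).
predictor = nn.Sequential(nn.Linear(2 * D_IN, D_HID), nn.ReLU(), nn.Linear(D_HID, N_CLASSES))
# Selector outputs one logit per feature (its selection probabilities are the interpretation).
selector = nn.Sequential(nn.Linear(D_IN, D_HID), nn.ReLU(), nn.Linear(D_HID, D_IN))

def mask_input(x, mask):
    # Concatenate masked features with the mask so the predictor can tell
    # "feature equals zero" apart from "feature is hidden".
    return torch.cat([x * mask, mask], dim=-1)

def train_predictor(x, y, steps=1000, lr=1e-3):
    # Stage 1: fit the predictor on uniformly random feature subsets so it
    # approximates the target distribution given any subset of the input.
    opt = torch.optim.Adam(predictor.parameters(), lr=lr)
    for _ in range(steps):
        mask = torch.bernoulli(torch.full_like(x, 0.5))
        loss = nn.functional.cross_entropy(predictor(mask_input(x, mask)), y)
        opt.zero_grad(); loss.backward(); opt.step()

def train_selector(x, y, steps=1000, lr=1e-3, sparsity=1e-2):
    # Stage 2: train the selector against the *frozen* predictor. A relaxed
    # Bernoulli (Concrete) sample keeps the mask differentiable.
    for p in predictor.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(selector.parameters(), lr=lr)
    temperature = torch.tensor(0.5)
    for _ in range(steps):
        logits = selector(x)
        mask = torch.distributions.RelaxedBernoulli(temperature, logits=logits).rsample()
        loss = nn.functional.cross_entropy(predictor(mask_input(x, mask)), y)
        loss = loss + sparsity * mask.mean()  # encourage sparse explanations
        opt.zero_grad(); loss.backward(); opt.step()

# Example usage on synthetic data (assumption: binary label driven by feature 0).
x = torch.randn(256, D_IN)
y = (x[:, 0] > 0).long()
train_predictor(x, y)
train_selector(x, y)
importances = torch.sigmoid(selector(x))  # per-feature selection probabilities
```

Because the predictor is trained on random subsets and then held fixed, the selector is rewarded only for picking features that are genuinely informative under that predictor, which is the property the paper's evaluation (EVAL-X) is designed to check.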
