Interpreting Black Box Models with Statistical Guarantees

03/29/2019
by Collin Burns, et al.

While many methods for interpreting machine learning models have been proposed, they are frequently ad hoc, difficult to evaluate, and come with no statistical guarantees on the error rate. This is especially problematic in scientific domains, where interpretations must be accurate and reliable. In this paper, we cast black box model interpretation as a hypothesis testing problem. The task is to discover "important" features by testing whether the model prediction is significantly different from what would be expected if the features were replaced with randomly sampled counterfactuals. We derive a multiple hypothesis testing framework for finding important features that enables control over the false discovery rate. We propose two testing methods, as well as analogs of one-sided and two-sided tests. In simulation, the methods have high power and compare favorably against existing interpretability methods. When applied to vision and language models, the framework selects features that intuitively explain model predictions.
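To make the recipe concrete, here is a minimal Python sketch of the general idea: for each feature, compare the model's prediction on an input against predictions where that feature is replaced by randomly sampled counterfactual values, and then control the false discovery rate across features with a standard Benjamini-Hochberg procedure. The `model.predict` interface and the `counterfactual_samples` array are assumptions for illustration only; the paper's actual test statistics, counterfactual distributions, and testing methods differ, so this should be read as a sketch of the framework rather than the authors' method.

```python
import numpy as np


def feature_pvalue(model, x, j, counterfactual_samples, n_draws=1000):
    """One-sided Monte Carlo p-value for feature j of input x.

    Compares the model's prediction on x against predictions where
    feature j is replaced by randomly sampled counterfactuals.
    `counterfactual_samples` is a 1-D array of plausible replacement
    values for feature j (e.g. drawn from the training marginal).
    """
    base = model.predict(x[None, :])[0]
    draws = np.random.choice(counterfactual_samples, size=n_draws)
    x_rep = np.tile(x, (n_draws, 1))
    x_rep[:, j] = draws
    null_preds = model.predict(x_rep)
    # Fraction of counterfactual predictions at least as large as the
    # observed one; the +1 terms give a valid Monte Carlo p-value.
    # A two-sided analog would instead test |null_preds - base|.
    return (1 + np.sum(null_preds >= base)) / (1 + n_draws)


def benjamini_hochberg(pvals, alpha=0.1):
    """Boolean mask of discoveries, controlling the FDR at level alpha."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)
    # Largest k with sorted p-value p_(k) <= alpha * k / m.
    below = pvals[order] <= alpha * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True
    return mask
```

In use, one would compute `feature_pvalue` for every feature of an input and pass the resulting p-values to `benjamini_hochberg`; the returned mask marks the features declared "important", with the guarantee that the expected fraction of false discoveries among them is at most `alpha`.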


Related research

07/21/2020
An Interpretable Probabilistic Approach for Demystifying Black-box Predictive Models
The use of sophisticated machine learning models for critical decision m...

02/02/2023
Hypothesis Testing and Machine Learning: Interpreting Variable Effects in Deep Artificial Neural Networks using Cohen's f2
Deep artificial neural networks show high predictive performance in many...

06/23/2022
Backward baselines: Is your model predicting the past?
When does a machine learning model predict the future of individuals and...

07/31/2020
Deep Direct Likelihood Knockoffs
Predictive modeling often uses black box machine learning methods, such ...

11/18/2018
Understanding Learned Models by Identifying Important Features at the Right Resolution
In many application domains, it is important to characterize how complex...

10/03/2021
Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control
We introduce Learn then Test, a framework for calibrating machine learni...

10/07/2021
Influence Tuning: Demoting Spurious Correlations via Instance Attribution and Instance-Driven Updates
Among the most critical limitations of deep learning NLP models are thei...
