Code for the 2018 EMNLP Interpretability Workshop Paper "Interpreting Neural Networks with Nearest Neighbors"
Local model interpretation methods explain individual predictions by assigning an importance value to each input feature. This value is often determined by measuring the change in confidence when a feature is removed. However, the confidence of neural networks is not a robust measure of model uncertainty. This issue makes reliably judging the importance of the input features difficult. We address this by changing the test-time behavior of neural networks using Deep k-Nearest Neighbors. Without harming text classification accuracy, this algorithm provides a more robust uncertainty metric which we use to generate feature importance values. The resulting interpretations better align with human perception than baseline methods. Finally, we use our interpretation method to analyze model predictions on dataset annotation artifacts.
The growing use of neural networks in sensitive domains such as medicine, finance, and security raises concerns about human trust in these machine learning systems. A central question is test-time interpretability: how can humans understand the reasoning behind model predictions?
A common way to interpret neural network predictions is to identify the most important input features, for instance, with a saliency map that highlights important pixels in an image Sundararajan et al. (2017) or words in a sentence Li et al. (2016). Given a test prediction, the importance of each input feature is the change in model confidence when that feature is removed.
However, neural network confidence is not a proper measure of model uncertainty Guo et al. (2017). This issue is emphasized when models make highly confident predictions on inputs that are completely void of information, for example, images of pure noise Goodfellow et al. (2015) or meaningless text snippets Feng et al. (2018). Consequently, a model’s confidence may not properly reflect whether discriminative input features are present. This issue makes it difficult to reliably judge the importance of each input feature using common confidence-based interpretation methods Feng et al. (2018).
To address this, we apply Deep k-Nearest Neighbors (DkNN) Papernot and McDaniel (2018)
to neural models for text classification. Concretely, predictions are no longer made with a softmax classifier, but using the labels of the training examples whose representations are most similar to the test example (Section 3). This provides an alternative metric for model uncertainty, conformity, which measures how much support a test prediction has by comparing its hidden representations to the training data. This representation-based uncertainty measurement can be used in combination with existing interpretation methods, such as leave-one-out Li et al. (2016), to better identify important input features.
We combine DkNN with CNN and LSTM models on six nlp
text classification tasks, including sentiment analysis and textual entailment, with no loss in classification accuracy (Section 4). We compare interpretations generated using DkNN conformity to baseline interpretation methods, finding DkNN interpretations rarely assign importance to extraneous words that do not align with human perception (Section 5). Finally, we generate interpretations using DkNN conformity for a dataset with known artifacts (snli), helping to indicate whether a model has learned superficial patterns. We open source the code for DkNN and our results at https://github.com/Eric-Wallace/deep-knn.
Feature attribution methods explain a test prediction by assigning an importance value to each input feature (typically pixels or words).
In the case of text classification, we have an input sequence of n words, x = ⟨w_1, w_2, …, w_n⟩, represented as one-hot vectors. The word sequence is then converted to a sequence of word embeddings. A classifier f predicts the class y, with its probability f_y(x) serving as the model confidence. To create an interpretation, each input word w_i is assigned an importance value g(w_i | x), which indicates the word's contribution to the prediction. A saliency map (or heat map) visually highlights these importance values over the words in a sentence.
A simple way to define the importance is via leave-one-out Li et al. (2016): individually remove a word from the input and see how the confidence changes. The importance of word w_i is the decrease in confidence (equivalently the change in class score or cross-entropy loss) when word w_i is removed:

g(w_i | x) = f_y(x) − f_y(x_{−i}),   (1)

where x_{−i} is the input sequence with the i-th word removed and f_y is the model confidence for class y. This can be repeated for all words in the input. Under this definition, the sign of the importance value is opposite the sign of the confidence change: if a word's removal causes a decrease in the confidence, it gets a positive importance value. We refer to this interpretation method as Confidence leave-one-out in our experiments.
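This procedure can be sketched in a few lines; the following is an illustrative implementation, not the authors' released code, and the `confidence` callable is a stand-in for any model that maps a word list and class to a probability:

```python
def leave_one_out_importance(words, cls, confidence):
    """Importance of each word: the drop in model confidence for class `cls`
    when that word is removed from the input."""
    base = confidence(words, cls)
    scores = []
    for i in range(len(words)):
        reduced = words[:i] + words[i + 1:]        # input with the i-th word removed
        scores.append(base - confidence(reduced, cls))
    return scores
```

A word whose removal lowers the confidence receives a positive importance value, matching the sign convention above.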
In the case of neural networks, the model as a function of word w_i is a highly non-linear, differentiable function. Rather than leaving one word out at a time, we can simulate a word's removal by approximating f with a function that is linear in w_i through a first-order Taylor expansion. The importance of w_i is computed as the derivative of f_y with respect to the word's one-hot vector, which reduces to

g(w_i | x) = ∇_{e_i} f_y(x) · e_i,

where e_i is the embedding of word w_i.
Thus, a word’s importance is the dot product between the gradient of the class prediction with respect to the embedding and the word embedding itself. This gradient approximation simulates the change in confidence when an input word is removed and has been used in various interpretation methods for nlp Arras et al. (2016); Ebrahimi et al. (2017). We refer to this interpretation approach as Gradient in our experiments.
Ghorbani et al. (2017) show how a lack of model robustness and stability can cause egregious interpretation failures in computer vision settings. Feng et al. (2018) extend this to nlp and draw connections between interpretation failures and adversarial examples Szegedy et al. (2014). To counteract this, new interpretation methods alone are not enough: models themselves must be improved. For instance, Feng et al. (2018) argue that interpretation methods should not rely on prediction confidence, as it does not reflect a model's uncertainty.
Following this, we improve interpretations by replacing the softmax confidence with a more robust uncertainty estimate using DkNN Papernot and McDaniel (2018). This algorithm maintains the accuracy of standard image classification models while providing a better uncertainty metric capable of defending against adversarial examples.
This section describes Deep k-Nearest Neighbors, its application to sequential inputs, and how we use it to determine word importance values.
Papernot and McDaniel (2018) propose Deep k-Nearest Neighbors (DkNN), a modification to the test-time behavior of neural networks.
After training completes, the DkNN algorithm passes every training example through the model and saves each of the layer’s representations. This creates a new dataset, whose features are the representations and whose labels are the model predictions. Test-time predictions are made by passing an example through the model and performing k-nearest neighbors classification on the resulting representations. This modification does not degrade the accuracy of image classifiers on several standard datasets Papernot and McDaniel (2018).
For our purposes, the benefit of DkNN is the algorithm’s uncertainty metric, the conformity score. This score is the percentage of nearest neighbors belonging to the predicted class. Conformity follows from the framework of conformal prediction Shafer and Vovk (2008) and estimates how much the training data supports a classification decision.
The conformity score uses the representations at each neural network layer, and therefore, a prediction only receives high conformity if it largely agrees with the training data at all representation levels. This mechanism defends against adversarial examples Szegedy et al. (2014), as it is difficult to construct a perturbation which changes the neighbors at every layer. Consequently, conformity is a better uncertainty metric for both regular examples and out-of-domain examples such as noisy or adversarial inputs, making it suitable for interpreting models.
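A minimal sketch of the lookup, assuming layer representations have already been extracted from a trained network; the class and method names are our own, and Euclidean distance stands in for whatever metric (or locality-sensitive hashing scheme) a real implementation would use:

```python
import numpy as np

class DkNNSketch:
    """k-nearest-neighbor prediction over saved per-layer representations."""

    def __init__(self, train_reps, train_labels, k=5):
        # train_reps: one [n_train, d_layer] array per layer
        self.train_reps = train_reps
        self.train_labels = np.asarray(train_labels)
        self.k = k

    def neighbor_labels(self, test_reps):
        """Labels of the k nearest training points at every layer."""
        labels = []
        for layer_reps, rep in zip(self.train_reps, test_reps):
            dists = np.linalg.norm(layer_reps - rep, axis=1)
            labels.extend(self.train_labels[np.argsort(dists)[: self.k]])
        return np.array(labels)

    def predict(self, test_reps):
        """Predicted class and its conformity: the fraction of neighbors,
        across all layers, that share the predicted label."""
        labels = self.neighbor_labels(test_reps)
        classes, counts = np.unique(labels, return_counts=True)
        pred = classes[np.argmax(counts)]
        conformity = counts.max() / len(labels)
        return pred, conformity
```

A prediction only attains conformity near 1.0 when the neighbors agree at every layer, which is the property that makes the score hard to fool with small perturbations.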
The DkNN algorithm requires fixed-size vector representations. To reach a fixed-size representation for text classification, we take either the final hidden state of a recurrent neural network or use max pooling across time Collobert and Weston (2008). We consider deep architectures of these two forms, using each layer's representation as the features for DkNN.
Using conformity, we generate interpretations through a modified version of leave-one-out Li et al. (2016). After removing a word, rather than observing the drop in confidence, we instead measure the drop in conformity. Formally, we modify the classifier in Equation 1 to output probabilities based on conformity. We refer to this method as conformity leave-one-out.
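The swap from confidence to conformity is mechanical; a minimal sketch, where `conformity` is any callable (e.g., backed by a trained DkNN model) returning the conformity of the predicted class:

```python
def conformity_leave_one_out(words, conformity):
    """Word importance as the drop in conformity when that word is removed."""
    base = conformity(words)
    return [base - conformity(words[:i] + words[i + 1:])
            for i in range(len(words))]
```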
Interpretability should not come at the cost of performance—before investigating how interpretable DkNN is, we first evaluate its accuracy. We experiment with six text classification tasks and two models, verifying that DkNN achieves accuracy comparable to regular classifiers.
We consider six common text classification tasks: binary sentiment analysis using Stanford Sentiment Treebank (Socher et al., 2013, sst) and Customer Reviews (Hu and Liu, 2004, cr), topic classification using TREC Li and Roth (2002), opinion polarity (Wiebe et al., 2005, mpqa), and subjectivity/objectivity (Pang and Lee, 2004, subj). Additionally, we consider natural language inference with snli Bowman et al. (2015). We experiment with BiLSTM and CNN models.
Our CNN architecture resembles Kim (2014). We use convolutional filters of size three, four, and five, with max-pooling over time Collobert and Weston (2008). The filters are followed by three fully-connected layers. We fine-tune GloVe embeddings Pennington et al. (2014) of each word. For DkNN, we use the activations from the convolution layer and the three fully-connected layers.
Our architecture uses a bidirectional LSTM Graves and Schmidhuber (2005), with the final hidden state forming the fixed-size representation. We use three LSTM layers, followed by two fully-connected layers. We fine-tune GloVe embeddings of each word. For DkNN, we use the final activations of the three recurrent layers and the two fully-connected layers.
Unlike the other tasks, which consist of a single input sentence, snli has two inputs, a premise and a hypothesis. Following Conneau et al. (2017), we use the same model to encode the two inputs, generating a representation u for the premise and v for the hypothesis. We concatenate these two representations along with their element-wise product and element-wise absolute difference, arriving at a final representation [u; v; u ∗ v; |u − v|]. This vector passes through two fully-connected layers for classification. For DkNN, we use the activations of the two fully-connected layers.
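The combination step can be written directly from the description above (a transcription of the Conneau et al. (2017) feature vector, not the authors' code):

```python
import numpy as np

def nli_representation(u, v):
    """Joint premise/hypothesis representation: concatenation of the two
    encodings, their element-wise product, and their absolute difference."""
    return np.concatenate([u, v, u * v, np.abs(u - v)])
```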
DkNN achieves comparable accuracy on the five classification tasks (Table 1). On snli, the BiLSTM achieves an accuracy of 81.2% with a softmax classifier and 81.0% with DkNN.
We focus on the sst dataset for generating interpretations. Due to the lack of standard interpretation evaluation metrics Doshi-Velez and Kim (2017), we use qualitative evaluations Smilkov et al. (2017); Sundararajan et al. (2017); Li et al. (2016), performing quantitative experiments where possible to examine the distinction between the interpretation methods.
We compare our method (Conformity leave-one-out) against two baselines: leave-one-out using regular confidence (Confidence leave-one-out, see Section 2.1) and the gradient with respect to the input (Gradient, see Section 2.2). To create saliency maps, we normalize each word's importance by dividing it by the total importance of the words in the sentence. We display unknown words in angle brackets. Table 2 shows sst interpretation examples for the BiLSTM model, and further examples are shown on a supplementary website: https://sites.google.com/view/language-dknn/
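The normalization step can be sketched as follows; dividing by the sum of absolute importances is our assumption for handling negative scores:

```python
def normalize_importance(importances):
    """Scale word importances so their absolute values sum to one."""
    total = sum(abs(v) for v in importances)
    return [v / total for v in importances]
```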
Conformity leave-one-out assigns concentrated importance values to a small number of input words. In contrast, the baseline methods assign non-zero importance values to numerous words, many of which are irrelevant. For instance, in all three examples of Table 2, both baselines highlight almost half of the input, including words such as “fiction” and “clash”. We suspect model confidence is oversensitive to these unimportant input changes, causing the baseline interpretations to highlight unimportant words. On the other hand, the conformity score better separates word importance, generating clearer interpretations.
The tendency for confidence-based approaches to assign importance to many words holds for the entire test set. We compute the average number of words whose normalized importance value exceeds a fixed threshold (i.e., words that receive a visible highlight). Of the 20.23 words in an average sst test example, gradient highlights 5.32 words, confidence leave-one-out highlights 5.79 words, and conformity leave-one-out highlights 3.65 words.
The second, and related, observation for confidence-based approaches is a bias towards selecting word importance based on a word’s inherent sentiment, rather than its meaning in context. For example, see “clash”, “terribly”, and “unfaithful” in Table 2. The removal of these words causes a small change in the model confidence. When using DkNN, the conformity score indicates that the model’s uncertainty has not risen without these input words and leave-one-out does not assign them any importance.
We characterize our interpretation method as higher precision, but slightly lower recall, than confidence-based methods. Conformity leave-one-out rarely assigns high importance to words that do not align with human perception of sentiment. However, there are cases when our method does not assign significant importance to any word. This occurs when the input is highly redundant, for example, a positive movie review that describes the sentiment in four distinct ways. In these cases, leaving out a single sentiment word has little effect on the conformity, as the model's representation remains supported by the other redundant features. Confidence-based interpretations, which interpret models using the linear units that produce class scores, achieve higher recall by responding to every change in the input in a certain direction, but may have lower precision.
In the second example of Table 2, the word "terribly" is assigned a negative importance value, disregarding its positive meaning in context. To examine whether this is a stand-alone example or a more general pattern of uninterpretable behavior, we calculate the importance value of the word "terribly" in other positive examples. For each occurrence of the word "great" in positive validation examples, we paraphrase it to "awesome", "wonderful", or "impressive", and add the word "terribly" in front of it. This process yields a collection of positive examples. For each of these examples, we compute the importance value of each input word and rank the words from most negative to most positive (the most negative word has a rank of 1). We compare the average rank of "terribly" across the three methods: 7.9 for conformity leave-one-out, 1.68 for confidence leave-one-out, and 1.1 for gradient. The baseline methods consistently rank "terribly" as the most negative word, ignoring its meaning in context. This echoes our suspicion: DkNN generates interpretations with higher precision because conformity is robust to irrelevant input changes.
We use conformity leave-one-out to interpret a model trained on snli, a dataset known to contain annotation artifacts. We demonstrate that our interpretation method can help identify when models exploit dataset biases.
Recent studies Gururangan et al. (2018); Poliak et al. (2018) identify annotation artifacts in snli. Superficial patterns exist in the input which strongly correlate with certain labels, making it possible for models to “game” the task: obtain high accuracy without true understanding. For instance, the hypothesis of an entailment example is often a general paraphrase of the premise, using words such as “outside” instead of “playing in a park”. Contradiction examples often contain negation words or non-action verbs like “sleeping”. Models trained solely on the hypothesis can learn these patterns and reach accuracies considerably higher than the majority baseline.
These studies indicate that the snli task can be gamed. We look to confirm that some artifacts are indeed exploited by normally trained models that use full input pairs. We create saliency maps for examples in the validation set using conformity leave-one-out. Table 3 shows samples, and more can be found on the supplementary website. We use blue highlights to indicate words which positively support the model's predicted class, and red to indicate words that support a different class. The first example is a randomly sampled baseline, showing how the words "swims" and "pool" support the model's prediction of contradiction. The other examples are selected because they contain terms identified as artifacts. In the second example, conformity leave-one-out assigns extremely high word importance to "sleeping", disregarding the other words necessary to predict contradiction (i.e., the neutral class is still possible if "pets" is replaced with "people"). In the final two hypotheses, the interpretation method diagnoses the model failure, assigning high importance to "wearing", rather than focusing positively on the shirt color.
To explore this further, we analyze the hypotheses in each snli class which contain one of the top five artifacts identified by Gururangan et al. (2018). For each of these examples, we compute the importance value for each input word using both confidence and conformity leave-one-out. We then rank the words from most important for the prediction to least important (a score of 1 indicates highest importance) and report the average rank for the artifacts in Table 4. We sort the words by their Pointwise Mutual Information with the correct label as provided by Gururangan et al. (2018). The word "nobody" particularly stands out: it is the most important input word every time it appears in a contradiction example.
For most of the artifacts, conformity leave-one-out assigns them a high importance, often ranking the artifacts as the most important input word. Confidence leave-one-out correlates less strongly with the known artifacts, frequently ranking them as low as the fifth or sixth most important word. Given the high correlation between conformity leave-one-out and the manually identified artifacts, this interpretation method may serve as a technique to identify undesirable biases a model has learned.
Table 3 (excerpt; premises only): (1) Contradiction: "a young boy reaches for and touches the propeller of a vintage aircraft." (2) Entailment: "a brown dog and a black dog at the edge of the ocean with a wave under them; boats are on the water in the background." (3) "man in a blue shirt standing in front of a structure painted with geometric designs."
We connect the improvements made by conformity leave-one-out to model confidence issues, compare alternative interpretation improvements, and discuss further features of DkNN.
Many existing feature attribution methods rely on estimates of model uncertainty: both the input gradient and confidence leave-one-out rely on prediction confidence, while our method relies on DkNN conformity. Interpretation quality is thus determined by the reliability of the uncertainty estimate. For instance, past work shows that relying on neural network confidence can lead to unreasonable interpretations Kindermans et al. (2017); Ghorbani et al. (2017); Feng et al. (2018). Independent of interpretability, Guo et al. (2017) show that neural network confidence is unreasonably high: on held-out examples, it far exceeds empirical accuracy. This is further exemplified by the high confidence predictions produced on inputs that are adversarial Szegedy et al. (2014) or contain solely noise Goodfellow et al. (2015).
We attribute one interpretation failure to neural network confidence issues. Guo et al. (2017) study overconfidence and propose a calibration procedure using Platt scaling, which adjusts the temperature parameter of the softmax function to align confidence with accuracy on a held-out dataset. However, this calibration is not input dependent: the same temperature rescales the confidence of full-length examples and of examples with words left out. Hence, selecting influential words will remain difficult.
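A minimal illustration for the binary case: temperature scaling divides every logit gap by one global constant, a monotone transformation, so the ranking of confidences across candidate word removals is unchanged (the logit gaps below are hypothetical):

```python
import numpy as np

def calibrated_confidence(logit_gaps, temperature):
    """Binary confidence after temperature scaling: sigmoid(gap / T)."""
    gaps = np.asarray(logit_gaps, dtype=float)
    return 1.0 / (1.0 + np.exp(-gaps / temperature))
```

Because the ordering is preserved, the word whose removal most reduces the calibrated confidence is the same word as before calibration.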
To verify this, we create an interpretation baseline using temperature scaling. The results corroborate the intuition: calibrating the confidence of leave-one-out does not improve interpretations. Qualitatively, the calibrated interpretation results remain comparable to confidence leave-one-out. Furthermore, calibrating the DkNN conformity score as in Papernot and McDaniel (2018) did not improve interpretability compared to the uncalibrated conformity score.
SmoothGrad Smilkov et al. (2017) and Integrated Gradients Sundararajan et al. (2017) both aggregate gradient values over multiple backpropagation passes to eliminate local noise or satisfy interpretation axioms. This line of work does not address model confidence and is orthogonal to our DkNN approach.
Retrieval-Augmented Convolutional Neural Networks Zhao and Cho (2018) are similar to DkNN: they augment model predictions with an information retrieval system that searches over network activations from the training data.
Retrieval-Augmented models and DkNN can both select influential training examples for a test prediction. In particular, the training data activations which are closest to the test point’s activations are influential according to the model. These training examples can provide interpretations as a form of analogy Caruana et al. (1999), an intuitive explanation for both machine learning experts and non-experts Klein (1989); Kim et al. (2014); Koh and Liang (2017); Wallace and Boyd-Graber (2018). However, unlike in computer vision where training data selection using DkNN yielded interpretable examples Papernot and McDaniel (2018), our experiments did not find human interpretable data points for sst or snli.
Model confidence is important for real-world applications: it signals how much one should trust a neural network's predictions. Unfortunately, users may be misled when a model outputs highly confident predictions on rubbish examples Goodfellow et al. (2015); Nguyen et al. (2015) or adversarial examples Szegedy et al. (2014). Recent work decides when to trust a neural network model Ribeiro et al. (2016); Doshi-Velez and Kim (2017); Jiang et al. (2018), for instance, by analyzing local linear model approximations Ribeiro et al. (2016) or by flagging rare network activations using kernel density estimation Jiang et al. (2018). The DkNN conformity score is a trust metric that helps defend against image adversarial examples Papernot and McDaniel (2018). Future work should study whether this robustness extends to interpretations.
A robust estimate of model uncertainty is critical for determining feature importance. The DkNN conformity score is one such uncertainty metric, and it leads to higher precision interpretations. DkNN is only a test-time improvement, however: the model is still trained using maximum likelihood. Combining nearest neighbor and maximum likelihood objectives during training may further improve model accuracy and interpretability. Moreover, other uncertainty estimators do not require test-time modifications, for example, Bayesian neural networks Gal et al. (2016).
Similar to other nlp interpretation methods Sundararajan et al. (2017); Li et al. (2016), conformity leave-one-out works when a model’s representation has a fixed size. For other nlp tasks, such as structured prediction (e.g., translation and parsing) or span prediction (e.g., extractive summarization and reading comprehension), models output a variable number of predictions and our interpretation approach will not suffice. Developing interpretation techniques for these types of models is a necessary area for future work.
We apply DkNN to neural models for text classification. This provides a better estimate of model uncertainty—conformity—which we combine with leave-one-out. This overcomes issues stemming from neural network confidence, leading to higher precision interpretations. Most interestingly, our interpretations are supported by the training data, providing insights into the representations learned by a model.
Feng was supported under subcontract to Raytheon BBN Technologies by DARPA award HR0011-15-C-0113. JBG is supported by NSF Grant IIS1652666. Any opinions, findings, conclusions, or recommendations expressed here are those of the authors and do not necessarily reflect the view of the sponsor. The authors would like to thank the members of the CLIP lab at the University of Maryland and the anonymous reviewers for their feedback.
Ronan Collobert and Jason Weston. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. In ICML.
Yarin Gal. 2016. Uncertainty in Deep Learning. Ph.D. thesis, University of Oxford.