On the Faithfulness Measurements for Model Interpretations

04/18/2021
by Fan Yin, et al.

Recent years have witnessed the emergence of a variety of post-hoc interpretations that aim to uncover how natural language processing (NLP) models make predictions. Despite the surge of new interpretation methods, it remains an open problem how to define and quantitatively measure the faithfulness of interpretations, i.e., the extent to which they reflect the reasoning process behind the model. To tackle this problem, we start with three criteria that quantify different notions of faithfulness: the removal-based criterion, the sensitivity of interpretations, and the stability of interpretations, and propose novel paradigms to systematically evaluate interpretations in NLP. Our results show that the performance of interpretations under different faithfulness criteria can vary substantially. Motivated by the desiderata behind these faithfulness notions, we introduce a new class of interpretation methods that adopt techniques from the adversarial robustness domain. Empirical results show that our proposed methods achieve top performance under all three criteria. Together with experiments and analysis on both text classification and dependency parsing tasks, we arrive at a more comprehensive understanding of this diverse set of interpretations.
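To make the removal-based criterion concrete, here is a minimal sketch of one common instantiation (a comprehensiveness-style score): remove the top-k tokens that an interpretation marks as most important and measure how much the model's prediction drops. The toy linear bag-of-words model, the `WEIGHTS` table, and all function names below are hypothetical illustrations, not the paper's actual models or metrics.

```python
import math

# Hypothetical toy sentiment model: a linear bag-of-words scorer.
WEIGHTS = {"great": 2.0, "good": 1.0, "boring": -1.5, "awful": -2.0}

def predict_proba(tokens):
    """Positive-class probability under the toy linear model."""
    score = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))

def attribute(tokens):
    """Toy 'interpretation': each token's model weight is its importance."""
    return [WEIGHTS.get(t, 0.0) for t in tokens]

def comprehensiveness(tokens, k):
    """Removal-based faithfulness score: drop in predicted probability
    after deleting the k tokens the interpretation ranks most important.
    A larger drop suggests the interpretation is more faithful."""
    scores = attribute(tokens)
    top_k = set(sorted(range(len(tokens)),
                       key=lambda i: abs(scores[i]), reverse=True)[:k])
    kept = [t for i, t in enumerate(tokens) if i not in top_k]
    return predict_proba(tokens) - predict_proba(kept)

sentence = "the plot was great but the pacing was boring".split()
print(round(comprehensiveness(sentence, 1), 3))
```

In this toy example the interpretation is faithful by construction (it reads the model's own weights), so removing the top-ranked token produces a large probability drop; an unfaithful interpretation would rank unimportant tokens highly and yield a drop near zero.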

