More Than Words: Towards Better Quality Interpretations of Text Classifiers

12/23/2021
by Muhammad Bilal Zafar, et al.

The large size and complex decision mechanisms of state-of-the-art text classifiers make it difficult for humans to understand their predictions, leading to a potential lack of trust from users. These issues have led to the adoption of methods like SHAP and Integrated Gradients that explain classification decisions by assigning importance scores to input tokens. However, prior work, using different randomization tests, has shown that the interpretations generated by these methods may not be robust. For instance, models that make the same predictions on the test set can still yield different feature importance rankings. To address this lack of robustness in token-based interpretability, we explore explanations at higher semantic levels, such as sentences. We use computational metrics and human subject studies to compare the quality of sentence-based interpretations against token-based ones. Our experiments show that higher-level feature attributions offer several advantages: 1) they are more robust as measured by the randomization tests, 2) they exhibit lower variability when using approximation-based methods like SHAP, and 3) they are more intelligible to humans in situations where the linguistic coherence resides at a higher granularity level. Based on these findings, we show that token-based interpretability, while a convenient first choice given the input interfaces of ML models, is not the most effective choice in all situations.
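To make the token- versus sentence-level comparison concrete, the following is a minimal, illustrative Python sketch of perturbation-based attribution at both granularities. It uses simple leave-one-out occlusion rather than the paper's actual SHAP or Integrated Gradients setup, and predict_proba is a hypothetical stand-in for any classifier's predicted probability of the class of interest; both are assumptions made purely for illustration.

# Sketch: token-level vs. sentence-level attribution via leave-one-out
# occlusion (a crude simplification of perturbation-based methods like SHAP).
# `predict_proba` is a hypothetical toy classifier, not the paper's model.

from typing import Callable, List


def predict_proba(text: str) -> float:
    # Hypothetical classifier: replace with a real model's predicted
    # probability for the class of interest.
    return min(1.0, 0.1 + 0.2 * text.lower().count("great"))


def occlusion_scores(units: List[str], predict: Callable[[str], float]) -> List[float]:
    # Importance of each unit = drop in predicted probability when that
    # unit is removed from the input while all other units are kept.
    full_score = predict(" ".join(units))
    scores = []
    for i in range(len(units)):
        reduced = " ".join(units[:i] + units[i + 1:])
        scores.append(full_score - predict(reduced))
    return scores


text_sentences = [
    "The plot was predictable.",
    "Still, the acting was great and the soundtrack was great.",
]

# Token-level attributions: each whitespace token is a separate feature.
tokens = " ".join(text_sentences).split()
token_scores = occlusion_scores(tokens, predict_proba)

# Sentence-level attributions: each sentence is a single feature, i.e. the
# higher-granularity interpretation studied in the paper.
sentence_scores = occlusion_scores(text_sentences, predict_proba)

for tok, s in zip(tokens, token_scores):
    print(f"{s:+.3f}  {tok}")
print()
for sent, s in zip(text_sentences, sentence_scores):
    print(f"{s:+.3f}  {sent}")

In the paper's setting, the same grouping idea applies with SHAP or Integrated Gradients computed over sentence spans instead of individual tokens; the occlusion scheme above is only a simplified stand-in for those attribution methods.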

Related research:

06/08/2021 · On the Lack of Robust Interpretability of Neural Text Classifiers
With the ever-increasing complexity of neural language models, practitio...

08/11/2021 · Perturbing Inputs for Fragile Interpretations in Deep Natural Language Processing
Interpretability methods like Integrated Gradient and LIME are popular c...

07/04/2022 · Comparing Feature Importance and Rule Extraction for Interpretability on Text Data
Complex machine learning algorithms are used more and more often in crit...

11/16/2021 · Will We Trust What We Don't Understand? Impact of Model Interpretability and Outcome Feedback on Trust in AI
Despite AI's superhuman performance in a variety of domains, humans are ...

04/24/2019 · Generating Token-Level Explanations for Natural Language Inference
The task of Natural Language Inference (NLI) is widely modeled as superv...

04/25/2022 · Can Rationalization Improve Robustness?
A growing line of work has investigated the development of neural NLP mo...

04/12/2022 · A Comparative Study of Faithfulness Metrics for Model Interpretability Methods
Interpretation methods to reveal the internal reasoning processes behind...
