Post-hoc Interpretability for Neural NLP: A Survey

08/10/2021
by Andreas Madsen, et al.

Natural Language Processing (NLP) models have become increasingly complex and widespread. With recent developments in neural networks, a growing concern is whether it is responsible to use these models. Concerns such as safety and ethics can be partially addressed by providing explanations. Furthermore, when models do fail, providing explanations is paramount for accountability purposes. To this end, interpretability serves to provide these explanations in terms that are understandable to humans. Central to what is understandable is how explanations are communicated. Therefore, this survey provides a categorization of how recent interpretability methods communicate explanations and discusses the methods in depth. Furthermore, the survey focuses on post-hoc methods, which provide explanations after a model is learned and are generally model-agnostic. A common concern for this class of methods is whether they accurately reflect the model. Hence, how these post-hoc methods are evaluated is discussed throughout the paper.

research
11/25/2022

Testing the effectiveness of saliency-based explainability in NLP using randomized survey-based experiments

As the applications of Natural Language Processing (NLP) in sensitive ar...
research
12/10/2022

Identifying the Source of Vulnerability in Explanation Discrepancy: A Case Study in Neural Text Classification

Some recent works observed the instability of post-hoc explanations when...
research
07/22/2019

The Dangers of Post-hoc Interpretability: Unjustified Counterfactual Explanations

Post-hoc interpretability approaches have been proven to be powerful too...
research
03/01/2021

ToxCCIn: Toxic Content Classification with Interpretability

Despite the recent successes of transformer-based models in terms of eff...
research
12/24/2020

Sentence-Based Model Agnostic NLP Interpretability

Today, interpretability of Black-Box Natural Language Processing (NLP) m...
research
05/24/2022

Interpretation Quality Score for Measuring the Quality of interpretability methods

Machine learning (ML) models have been applied to a wide range of natura...
research
04/04/2019

A Categorisation of Post-hoc Explanations for Predictive Models

The ubiquity of machine learning based predictive models in modern socie...
