COCKATIEL: COntinuous Concept ranKed ATtribution with Interpretable ELements for explaining neural net classifiers on NLP tasks

05/11/2023
by Fanny Jourdan, et al.

Transformer architectures are complex, and while their use in NLP has produced many successes, it also makes their interpretability and explainability challenging. Recent debates have shown that attention maps and attribution methods are unreliable (Pruthi et al., 2019; Brunner et al., 2019). In this paper, we present some of their limitations and introduce COCKATIEL, which successfully addresses several of them. COCKATIEL is a novel, post-hoc, concept-based, model-agnostic XAI technique that generates meaningful explanations from the last layer of a neural network trained on an NLP classification task. It uses Non-Negative Matrix Factorization (NMF) to discover the concepts the model leverages to make predictions, and it applies Sensitivity Analysis to accurately estimate the importance of each of these concepts for the model. It does so without compromising the accuracy of the underlying model or requiring a new model to be trained. We conduct experiments on single- and multi-aspect sentiment analysis tasks and show that COCKATIEL discovers, without any supervision, concepts in Transformer models that align with human concepts; we objectively verify the faithfulness of its explanations through fidelity metrics; and we showcase its ability to provide meaningful explanations on two different datasets.
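To make the two steps concrete, below is a minimal Python sketch of the pipeline as the abstract describes it: NMF applied to non-negative last-layer activations to extract concepts, followed by a sensitivity estimate of each concept's importance. The function and parameter names (extract_concepts, concept_importance, classification_head, n_concepts, n_draws) are hypothetical placeholders, and the Monte-Carlo masking loop is a crude stand-in for the paper's Sobol-style Sensitivity Analysis, not the authors' implementation.

# Minimal sketch of a COCKATIEL-like pipeline (illustrative, not the authors' code).
import numpy as np
from sklearn.decomposition import NMF

def extract_concepts(activations, n_concepts=10, seed=0):
    """Factorize non-negative last-layer activations A ~ U @ W.

    activations: array of shape (n_samples, d), assumed non-negative
                 (e.g. post-ReLU features feeding the classifier head).
    Returns U (per-sample concept coefficients) and W (concept bases).
    """
    nmf = NMF(n_components=n_concepts, init="nndsvda", max_iter=500,
              random_state=seed)
    U = nmf.fit_transform(activations)   # (n_samples, n_concepts)
    W = nmf.components_                  # (n_concepts, d)
    return U, W

def concept_importance(U, W, classification_head, n_draws=256, seed=0):
    """Estimate each concept's importance for the model's predictions.

    classification_head: assumed callable mapping reconstructed last-layer
    features (n_samples, d) to class logits. For each concept, random binary
    masks switch the other concepts on or off, and the mean squared change in
    the logits when toggling the concept itself gives a first-order,
    Sobol-style sensitivity estimate.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_concepts = U.shape
    importances = np.zeros(n_concepts)
    for k in range(n_concepts):
        outputs_on, outputs_off = [], []
        for _ in range(n_draws):
            mask = rng.integers(0, 2, size=n_concepts).astype(float)
            mask_on, mask_off = mask.copy(), mask.copy()
            mask_on[k], mask_off[k] = 1.0, 0.0
            # Reconstruct activations with concept k switched on vs. off.
            outputs_on.append(classification_head((U * mask_on) @ W))
            outputs_off.append(classification_head((U * mask_off) @ W))
        diffs = np.stack(outputs_on) - np.stack(outputs_off)
        importances[k] = np.mean(diffs ** 2)
    return importances / importances.sum()

In this sketch, switching a concept off corresponds to removing its contribution from the factorized activations before the final classifier, so the importance score reflects how much the model's output depends on that concept.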

Related research

06/27/2020 · Improving Interpretability of CNN Models Using Non-Negative Concept Activation Vectors
Convolutional neural network (CNN) models for computer vision are powerf...

05/06/2022 · The Unreliability of Explanations in Few-Shot In-Context Learning
How can prompting a large language model like GPT-3 with explanations im...

11/17/2022 · CRAFT: Concept Recursive Activation FacTorization for Explainability
Attribution methods are a popular class of explainability methods that u...

05/03/2023 · Explaining Language Models' Predictions with High-Impact Concepts
The emergence of large-scale pretrained language models has posed unprec...

12/24/2022 · Rank-LIME: Local Model-Agnostic Feature Attribution for Learning to Rank
Understanding why a model makes certain predictions is crucial when adap...

08/25/2021 · Inducing Semantic Grouping of Latent Concepts for Explanations: An Ante-Hoc Approach
Self-explainable deep models are devised to represent the hidden concept...

08/31/2021 · Thermostat: A Large Collection of NLP Model Explanations and Analysis Tools
In the language domain, as in other domains, neural explainability takes...
