Explaining Classes through Word Attribution

08/31/2021
by Samuel Rönnqvist, et al.

In recent years, several methods have been proposed for explaining individual predictions of deep learning models, yet there has been little study of how to aggregate such explanations to show how these models view classes as a whole in text classification tasks. In this work, we propose a method for explaining classes using deep learning models and the Integrated Gradients feature attribution technique, aggregating explanations of individual examples in text classification into general descriptions of the classes. We demonstrate the approach on Web register (genre) classification using the XLM-R model and the Corpus of Online Registers of English (CORE), finding that the method identifies plausible and discriminative keywords characterizing all but the smallest class.
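The aggregation idea can be illustrated with a minimal sketch. It assumes word-level attribution scores (for instance from Integrated Gradients) have already been computed for each example; the function name `class_keywords`, the per-document maximum, the mean across documents, and the `min_docs` threshold are illustrative assumptions, not the aggregation procedure defined in the paper.

```python
from collections import defaultdict


def class_keywords(examples, top_k=3, min_docs=2):
    """Rank words per class by mean attribution across documents.

    examples: iterable of (label, [(word, score), ...]) pairs, where each
    score is a word-level attribution toward `label`.
    All names and thresholds here are hypothetical, chosen for illustration.
    """
    # label -> word -> list of per-document scores
    per_class = defaultdict(lambda: defaultdict(list))
    for label, attributions in examples:
        # Keep one score per word per document (the maximum), so a word
        # repeated many times in one document counts only once.
        doc_scores = {}
        for word, score in attributions:
            key = word.lower()
            doc_scores[key] = max(score, doc_scores.get(key, float("-inf")))
        for word, score in doc_scores.items():
            per_class[label][word].append(score)

    keywords = {}
    for label, words in per_class.items():
        # Drop words seen in fewer than `min_docs` documents of the class,
        # then sort the rest by mean attribution, highest first.
        ranked = sorted(
            ((sum(s) / len(s), w) for w, s in words.items() if len(s) >= min_docs),
            reverse=True,
        )
        keywords[label] = [w for _, w in ranked[:top_k]]
    return keywords


# Toy attribution scores, invented purely for this example.
examples = [
    ("news", [("reported", 0.9), ("the", 0.1), ("officials", 0.6)]),
    ("news", [("reported", 0.7), ("the", 0.0), ("yesterday", 0.4)]),
    ("recipe", [("stir", 0.8), ("the", 0.1), ("oven", 0.5)]),
    ("recipe", [("stir", 0.6), ("oven", 0.7), ("the", 0.05)]),
]
print(class_keywords(examples, top_k=2, min_docs=2))
```

Requiring a word to recur across several documents is one simple way to favor keywords that characterize a class rather than a single example. A very small class would leave few words above such a threshold.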


research
01/20/2019

Towards Aggregating Weighted Feature Attributions

Current approaches for explaining machine learning models fall into two ...
research
01/01/2021

On Explaining Your Explanations of BERT: An Empirical Study with Sequence Classification

BERT, as one of the pretrained language models, attracts the most attent...
research
11/08/2018

Looking Deeper into Deep Learning Model: Attribution-based Explanations of TextCNN

Layer-wise Relevance Propagation (LRP) and saliency maps have been recen...
research
10/22/2021

Double Trouble: How to not explain a text classifier's decisions using counterfactuals synthesized by masked language models?

Explaining how important each input feature is to a classifier's decisio...
research
06/19/2019

Incorporating Priors with Feature Attribution on Text Classification

Feature attribution methods, proposed recently, help users interpret the...
research
09/27/2017

Case Study: Explaining Diabetic Retinopathy Detection Deep CNNs via Integrated Gradients

In this report, we applied integrated gradients to explaining a neural n...
research
05/06/2022

Necessity and Sufficiency for Explaining Text Classifiers: A Case Study in Hate Speech Detection

We present a novel feature attribution method for explaining text classi...