Log In Sign Up

A Fine-grained Interpretability Evaluation Benchmark for Neural NLP

by   Lijie Wang, et al.

While there is increasing concern about the interpretability of neural models, the evaluation of interpretability remains an open problem, due to the lack of proper evaluation datasets and metrics. In this paper, we present a novel benchmark to evaluate the interpretability of both neural models and saliency methods. This benchmark covers three representative NLP tasks: sentiment analysis, textual similarity and reading comprehension, each provided with both English and Chinese annotated data. In order to precisely evaluate the interpretability, we provide token-level rationales that are carefully annotated to be sufficient, compact and comprehensive. We also design a new metric, i.e., the consistency between the rationales before and after perturbations, to uniformly evaluate the interpretability of models and saliency methods on different tasks. Based on this benchmark, we conduct experiments on three typical models with three saliency methods, and unveil their strengths and weakness in terms of interpretability. We will release this benchmark at <https://xyz> and hope it can facilitate the research in building trustworthy systems.


page 1

page 2

page 3

page 4


DuTrust: A Sentiment Analysis Dataset for Trustworthiness Evaluation

While deep learning models have greatly improved the performance of most...

An Interpretability Evaluation Benchmark for Pre-trained Language Models

While pre-trained language models (LMs) have brought great improvements ...

What do different evaluation metrics tell us about saliency models?

How best to evaluate a saliency model's ability to predict where humans ...

Comparing radiologists' gaze and saliency maps generated by interpretability methods for chest x-rays

The interpretability of medical image analysis models is considered a ke...

Understanding Neural Networks through Representation Erasure

While neural networks have been successfully applied to many natural lan...

Russian SuperGLUE 1.1: Revising the Lessons not Learned by Russian NLP models

In the last year, new neural architectures and multilingual pre-trained ...

ERASER: A Benchmark to Evaluate Rationalized NLP Models

State-of-the-art models in NLP are now predominantly based on deep neura...