
QUACKIE: A NLP Classification Task With Ground Truth Explanations

by Yves Rychener, et al.

NLP interpretability aims to increase trust in model predictions, which makes evaluating interpretability approaches a pressing issue. Multiple datasets exist for evaluating NLP interpretability, but their dependence on human-provided ground truths raises questions about bias. In this work, we take a different approach and formulate a specific classification task by diverting question-answering datasets. For this custom classification task, the interpretability ground truth arises directly from the definition of the classification problem. We use this method to propose a benchmark and lay the groundwork for future research in NLP interpretability by evaluating a wide range of current state-of-the-art methods.
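The core idea of deriving interpretability ground truth from the task definition can be sketched as follows. This is a hypothetical illustration, not the paper's exact pipeline: the dataset fields, the `[SEP]` pairing format, and the `build_example` helper are assumptions made for the sketch. The key property it shows is that for a positive (question, context) pair, the sentence containing the answer span is by construction the evidence the classifier must rely on, so it serves as a ground-truth explanation without any human annotation.

```python
# Hypothetical sketch: turn a QA example into a classification example
# whose explanation ground truth follows from the task construction.
# Field names and the pairing scheme are illustrative assumptions.

def build_example(question, context_sentences, answer_sentence_idx):
    """Pair a question with a context.

    label = 1 if the context contains the answer, else 0.
    For positive pairs, the index of the sentence containing the answer
    span is the ground-truth explanation by construction.
    """
    is_positive = answer_sentence_idx is not None
    return {
        "text": question + " [SEP] " + " ".join(context_sentences),
        "label": 1 if is_positive else 0,
        "ground_truth_sentences": [answer_sentence_idx] if is_positive else [],
    }

# Toy positive example: sentence 1 contains the answer.
pos = build_example(
    "Where is the Eiffel Tower?",
    ["The Louvre is a museum.", "The Eiffel Tower stands in Paris."],
    answer_sentence_idx=1,
)

# Toy negative example: the context does not answer the question.
neg = build_example(
    "Where is the Eiffel Tower?",
    ["The Louvre is a museum."],
    answer_sentence_idx=None,
)
```

An interpretability method applied to a classifier trained on such pairs can then be scored directly: its highlighted sentences are compared against `ground_truth_sentences`, with no human rationale annotation in the loop.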
