Measuring Association Between Labels and Free-Text Rationales

10/24/2020
by Sarah Wiegreffe, et al.

Interpretable NLP has taken an increasing interest in ensuring that explanations are faithful to the model's decision-making process. This property is crucial for machine learning researchers and practitioners using explanations to better understand models. While prior work focuses primarily on extractive rationales (a subset of the input elements), we investigate their less-studied counterpart: free-text natural language rationales. We demonstrate that existing models for faithful interpretability do not extend cleanly to tasks where free-text rationales are needed. We turn to models that jointly predict and rationalize, a common class of models for free-text rationalization whose faithfulness is not yet established. We propose measurements of label-rationale association, a necessary property of faithful rationales, for these models. Using our measurements, we show that a state-of-the-art joint model based on T5 has strengths and weaknesses for producing faithful rationales.

research
11/16/2021

Few-Shot Self-Rationalization with Natural Language Prompts

Self-rationalization models that predict task labels and generate free-t...
research
04/24/2023

Understanding and Predicting Human Label Variation in Natural Language Inference through Explanation

Human label variation (Plank 2022), or annotation disagreement, exists i...
research
06/25/2021

Rationale-Inspired Natural Language Explanations with Commonsense

Explainable machine learning models primarily justify predicted labels u...
research
07/02/2022

FRAME: Evaluating Simulatability Metrics for Free-Text Rationales

Free-text rationales aim to explain neural language model (LM) behavior ...
research
04/21/2020

Considering Likelihood in NLP Classification Explanations with Occlusion and Language Modeling

Recently, state-of-the-art NLP models gained an increasing syntactic and...
research
05/24/2022

Interpretation Quality Score for Measuring the Quality of interpretability methods

Machine learning (ML) models have been applied to a wide range of natura...
research
11/09/2021

DataWords: Getting Contrarian with Text, Structured Data and Explanations

Our goal is to build classification models using a combination of free-t...
