Measuring Association Between Labels and Free-Text Rationales

by Sarah Wiegreffe, et al.
Georgia Institute of Technology
Allen Institute for Artificial Intelligence

Interpretable NLP has taken increasing interest in ensuring that explanations are faithful to the model's decision-making process. This property is crucial for machine learning researchers and practitioners using explanations to better understand models. While prior work focuses primarily on extractive rationales (a subset of the input elements), we investigate their less-studied counterpart: free-text natural language rationales. We demonstrate that existing models for faithful interpretability do not extend cleanly to tasks where free-text rationales are needed. We turn to models that jointly predict and rationalize, a common class of models for free-text rationalization whose faithfulness is not yet established. We propose measurements of label-rationale association, a necessary property of faithful rationales, for these models. Using our measurements, we show that a state-of-the-art joint model based on T5 has strengths and weaknesses for producing faithful rationales.




