Evaluating and Characterizing Human Rationales

10/09/2020
by   Samuel Carton, et al.

There are two main approaches to evaluating the quality of machine-generated rationales: 1) using human rationales as a gold standard; and 2) using automated metrics based on how rationales affect model behavior. An open question, however, is how human rationales themselves fare on these automated metrics. Analyzing a variety of datasets and models, we find that human rationales do not necessarily perform well on these metrics. To unpack this finding, we propose improved metrics that account for model-dependent baseline performance. We then propose two methods to further characterize rationale quality: one based on model retraining, and one that uses "fidelity curves" to reveal properties such as irrelevance and redundancy. Our work leads to actionable suggestions for evaluating and characterizing rationales.
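The abstract does not spell out the metrics, so the following is only a minimal sketch of the kind of automated, behavior-based metric it refers to. It assumes the commonly used sufficiency and comprehensiveness notions of fidelity, a hypothetical `predict` callable that returns the model's probability for the predicted class, and a `[MASK]` placeholder for removed tokens. The normalization against an empty-rationale baseline is one plausible reading of "model-dependent baseline performance", not necessarily the authors' exact formulation.

```python
from typing import Callable, List, Sequence

# Hypothetical model interface (an assumption, not from the paper):
# maps a token sequence to the probability of the predicted class.
Predict = Callable[[Sequence[str]], float]

MASK = "[MASK]"  # assumed placeholder for removed tokens


def apply_mask(tokens: Sequence[str], rationale: Sequence[int], keep: bool) -> List[str]:
    """keep=True keeps only rationale tokens; keep=False removes them."""
    return [t if (bool(r) == keep) else MASK for t, r in zip(tokens, rationale)]


def sufficiency(predict: Predict, tokens: Sequence[str], rationale: Sequence[int]) -> float:
    """1 minus the probability drop when the model sees only the rationale."""
    full = predict(tokens)
    only = predict(apply_mask(tokens, rationale, keep=True))
    return 1.0 - max(0.0, full - only)


def comprehensiveness(predict: Predict, tokens: Sequence[str], rationale: Sequence[int]) -> float:
    """Probability drop when the rationale is removed from the input."""
    full = predict(tokens)
    without = predict(apply_mask(tokens, rationale, keep=False))
    return max(0.0, full - without)


def normalized_sufficiency(predict: Predict, tokens: Sequence[str], rationale: Sequence[int]) -> float:
    """Rescale sufficiency so an empty rationale scores 0 (assumed normalization)."""
    baseline = sufficiency(predict, tokens, [0] * len(tokens))
    if baseline >= 1.0:  # the model is unaffected by masking; nothing to explain
        return 0.0
    return (sufficiency(predict, tokens, rationale) - baseline) / (1.0 - baseline)


def toy_predict(tokens: Sequence[str]) -> float:
    """Toy stand-in for a sentiment model: confidence rises with the word 'great'."""
    return min(1.0, 0.5 + 0.4 * list(tokens).count("great"))


tokens = ["the", "movie", "was", "great"]
good = [0, 0, 0, 1]  # rationale marks the informative token
bad = [1, 0, 0, 0]   # rationale marks an uninformative token
good_score = normalized_sufficiency(toy_predict, tokens, good)
bad_score = normalized_sufficiency(toy_predict, tokens, bad)
```

With this toy model, the raw sufficiency of the uninformative rationale is already well above zero, because the model keeps a fairly high confidence even on a fully masked input. Subtracting that baseline is what separates the two rationales cleanly.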


