Measuring Attribution in Natural Language Generation Models

12/23/2021
by Hannah Rashkin et al.

With recent improvements in natural language generation (NLG) models for various applications, it has become imperative to have the means to identify and evaluate whether NLG output shares only verifiable information about the external world. In this work, we present a new evaluation framework entitled Attributable to Identified Sources (AIS) for assessing the output of natural language generation models when such output pertains to the external world. We first define AIS and introduce a two-stage annotation pipeline that allows annotators to evaluate model output according to the AIS guidelines. We empirically validate this approach on three generation datasets (two in the conversational QA domain and one in summarization) via human evaluation studies, which suggest that AIS could serve as a common framework for measuring whether model-generated statements are supported by underlying sources. We also release the guidelines used in the human evaluation studies.
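The abstract does not spell out how annotator judgments are aggregated into a score, so the following is a minimal sketch of one plausible aggregation, not the paper's own implementation. It assumes the two annotation stages are (1) a check that the output is interpretable on its own and (2) a judgment of whether the output's information about the external world is fully attributable to the identified source, with a corpus-level AIS score reported as the fraction of outputs that pass both stages; the AISJudgment class and ais_score function are illustrative names, not from the paper.

from dataclasses import dataclass
from typing import Iterable

@dataclass
class AISJudgment:
    """One annotator judgment for a single model output (illustrative schema).

    Stage 1: is the output interpretable on its own?
    Stage 2: is all of its information about the external world attributable
             to the identified source(s)? Only meaningful if stage 1 passes.
    """
    interpretable: bool
    attributable: bool  # ignored when interpretable is False

def ais_score(judgments: Iterable[AISJudgment]) -> float:
    """Fraction of judged outputs that pass both stages (one possible aggregate)."""
    judgments = list(judgments)
    if not judgments:
        raise ValueError("no judgments to aggregate")
    passed = sum(1 for j in judgments if j.interpretable and j.attributable)
    return passed / len(judgments)

# Example: three outputs; one fails interpretability, one fails attribution.
print(ais_score([
    AISJudgment(interpretable=True, attributable=True),
    AISJudgment(interpretable=False, attributable=False),
    AISJudgment(interpretable=True, attributable=False),
]))  # 0.333...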


Related research

06/12/2019 · Keeping Notes: Conditional Natural Language Generation with a Scratchpad Mechanism
We introduce the Scratchpad Mechanism, a novel addition to the sequence-...

10/25/2019 · Measuring Conversational Fluidity in Automated Dialogue Agents
We present an automated evaluation method to measure fluidity in convers...

04/11/2022 · A Multilingual Perspective Towards the Evaluation of Attribution Methods in Natural Language Inference
Most evaluations of attribution methods focus on the English language. I...

02/12/2020 · Learning to Compare for Better Training and Evaluation of Open Domain Natural Language Generation Models
Automated evaluation of open domain natural language generation (NLG) mo...

08/18/2017 · Assessing the Stylistic Properties of Neurally Generated Text in Authorship Attribution
Recent applications of neural language models have led to an increased i...

02/02/2021 · The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
We introduce GEM, a living benchmark for natural language Generation (NL...

12/02/2021 · InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation
Assessing the quality of natural language generation systems through hum...
