Tracing and Removing Data Errors in Natural Language Generation Datasets

12/21/2022
by Faisal Ladhak, et al.

Recent work has identified noisy and misannotated data as a core cause of hallucinations and unfaithful outputs in Natural Language Generation (NLG) tasks. Consequently, identifying and removing these examples is a key open challenge in creating reliable NLG systems. In this work, we introduce a framework to identify and remove low-quality training instances that lead to undesirable outputs, such as faithfulness errors in text summarization. We show that existing approaches for error tracing, such as gradient-based influence measures, do not perform reliably for detecting faithfulness errors in summarization. We overcome the drawbacks of existing error tracing methods through a new, contrast-based estimate that compares undesired generations to human-corrected outputs. Our proposed method can achieve a mean average precision of 0.91 across synthetic tasks with known ground truth and can achieve a two-fold reduction in hallucinations on a real entity hallucination evaluation on the NYT dataset.
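The abstract reports mean average precision (MAP) on synthetic tasks where the erroneous training examples are known. As a minimal sketch of how such a score is typically computed, the snippet below assumes each error-tracing method produces a ranking of training examples (most suspicious first) and that ground-truth error labels are available. The function names and the toy rankings are hypothetical illustrations, not taken from the paper.

```python
from typing import Sequence


def average_precision(ranked_is_error: Sequence[bool]) -> float:
    """Average precision for one ranking of training examples.

    ranked_is_error[i] is True when the example at rank i+1
    (most suspicious first) is a known erroneous example.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_error in enumerate(ranked_is_error, start=1):
        if is_error:
            hits += 1
            # Precision at this rank, counted only at true-error positions.
            precision_sum += hits / rank
    # Convention assumed here: a ranking with no true errors scores 0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-task average precision values."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Toy data (hypothetical): two tasks, four training examples each.
task_a = [True, True, False, False]   # both errors ranked first
task_b = [False, True, False, True]   # errors at ranks 2 and 4

print(average_precision(task_a))
print(average_precision(task_b))
print(mean_average_precision([task_a, task_b]))
```

A score near 1.0 means the known erroneous examples sit at the top of the ranking, so removing the highest-ranked examples would remove mostly true errors.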


05/23/2022

Tracing Knowledge in Language Models Back to the Training Data

Neural language models (LMs) have been shown to memorize a great deal of...
12/20/2022

Toward Human-Like Evaluation for Natural Language Generation with Error Analysis

The state-of-the-art language model-based automatic metrics, e.g. BARTSc...
05/08/2021

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks

Recently, an increasing number of works have introduced models capable o...
05/25/2022

Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors

The propensity of abstractive summarization systems to make factual erro...
04/04/2019

Unifying Human and Statistical Evaluation for Natural Language Generation

How can we measure whether a natural language generation system produces...
11/11/2022

Improving Factual Consistency in Summarization with Compression-Based Post-Editing

State-of-the-art summarization models still struggle to be factually con...
10/10/2019

Automatic Quality Estimation for Natural Language Generation: Ranting (Jointly Rating and Ranking)

We present a recurrent neural network based system for automatic quality...