MAUVE: Human-Machine Divergence Curves for Evaluating Open-Ended Text Generation

02/02/2021
by Krishna Pillutla, et al.

Despite major advances in open-ended text generation, there has been limited progress in designing evaluation metrics for this task. We propose MAUVE, a metric for open-ended text generation that directly compares the distribution of machine-generated text to that of human language. MAUVE measures the mean area under the divergence curve for the two distributions, exploring the trade-off between two types of errors: those arising from parts of the human distribution that the model distribution does not approximate well, and those arising from parts of the model distribution that are unlikely under human language. We present experiments on two open-ended generation tasks, one in the web text domain and one in the story domain, across a variety of decoding algorithms and model sizes. Our results show that MAUVE tracks the expected behavior with respect to model size more closely than prior metrics do. MAUVE's ordering of the decoding algorithms also agrees with that of generation perplexity, the most widely used metric in open-ended text generation; however, MAUVE is a more principled evaluation metric for the task because it considers both model and human text.
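The abstract does not spell out how the divergence curve is built. The following is a minimal sketch, assuming the mixture-based construction described in the full MAUVE paper: for a human distribution p and a model distribution q, sweep mixtures r = λp + (1 − λ)q and plot (exp(−c·KL(q‖r)), exp(−c·KL(p‖r))), then take the area under the resulting curve. The function names `kl_divergence`, `divergence_curve`, and `mauve_score`, the scaling constant `scale = 5.0`, and the 101-point sweep are illustrative assumptions, not the authors' implementation. The sketch also assumes p and q are already discrete histograms over a shared set of bins; how text is mapped to such histograms is outside the scope of this abstract.

```python
import numpy as np


def kl_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """KL(p || q) for discrete distributions; terms where p_i = 0 contribute 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def divergence_curve(p, q, num_points=101, scale=5.0):
    """Trace the curve by sweeping mixtures r = lam * p + (1 - lam) * q.

    Each interior point is (exp(-scale * KL(q || r)), exp(-scale * KL(p || r))).
    The first coordinate drops when the model puts mass where human text is rare;
    the second drops when the model fails to cover human text.
    Assumption: `scale` and `num_points` are illustrative choices.
    """
    p = np.asarray(p, dtype=float)  # human distribution (histogram)
    q = np.asarray(q, dtype=float)  # model distribution (histogram)
    xs, ys = [1.0], [0.0]  # endpoint (1, 0)
    # An open interval for lam keeps r > 0 wherever p or q has mass.
    for lam in np.linspace(1e-6, 1 - 1e-6, num_points):
        r = lam * p + (1 - lam) * q
        xs.append(np.exp(-scale * kl_divergence(q, r)))
        ys.append(np.exp(-scale * kl_divergence(p, r)))
    xs.append(0.0)  # endpoint (0, 1)
    ys.append(1.0)
    return np.array(xs), np.array(ys)


def mauve_score(p, q, num_points=101, scale=5.0):
    """Area under the divergence curve, computed with the trapezoid rule."""
    x, y = divergence_curve(p, q, num_points, scale)
    # x is non-increasing along the sweep, so negate the signed area.
    return float(-np.sum(np.diff(x) * (y[:-1] + y[1:]) / 2))


if __name__ == "__main__":
    human = [0.5, 0.3, 0.2]
    print(mauve_score(human, human))            # identical distributions
    print(mauve_score(human, [0.4, 0.4, 0.2]))  # close model
    print(mauve_score(human, [0.1, 0.1, 0.8]))  # distant model
```

Identical distributions give a curve that hugs the corner (1, 1) and an area of 1, while a model that diverges from human text in either direction pulls the curve toward the axes and lowers the score. This is the sense in which the metric weighs both error types at once rather than looking at model text alone.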

research
10/07/2022

Visualize Before You Write: Imagination-Guided Open-Ended Text Generation

Recent advances in text-to-image synthesis make it possible to visualize...
research
06/20/2023

Open-Domain Text Evaluation via Meta Distribution Modeling

Recent advances in open-domain text generation models powered by large p...
research
02/03/2020

CoTK: An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation

In text generation evaluation, many practical issues, such as inconsiste...
research
09/14/2021

The Perils of Using Mechanical Turk to Evaluate Open-Ended Text Generation

Recent text generation research has increasingly focused on open-ended d...
research
05/22/2020

Investigating Label Bias in Beam Search for Open-ended Text Generation

Beam search is an effective and widely used decoding algorithm in many s...
research
12/04/2022

Grounded Keys-to-Text Generation: Towards Factual Open-Ended Generation

Large pre-trained language models have recently enabled open-ended gener...
research
05/22/2023

Look-back Decoding for Open-Ended Text Generation

Given a prefix (context), open-ended generation aims to decode texts tha...
