Perception Score: A Learned Metric for Open-ended Text Generation Evaluation

08/07/2020
by Jing Gu, et al.

Automatic evaluation for open-ended natural language generation remains a challenge: existing metrics such as BLEU correlate poorly with human judgment. We propose a novel and powerful learning-based evaluation metric, Perception Score. It measures the overall quality of a generation and scores it holistically, rather than focusing on a single criterion such as word overlap. Moreover, it reports the uncertainty of its own evaluation; by incorporating this uncertainty, Perception Score evaluates generation systems more accurately. Perception Score achieves state-of-the-art results on two conditional generation tasks and two unconditional generation tasks.
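The abstract does not specify the model architecture or how the uncertainty is obtained. As a minimal sketch only, assuming the learned metric is a small regression head over pooled text embeddings and that uncertainty is estimated with Monte Carlo dropout (both are illustrative assumptions, not the paper's stated design), the score/uncertainty pairing could look like this:

```python
# Illustrative sketch, NOT the paper's actual Perception Score implementation.
# Assumptions: quality is predicted by a regression head over fixed-size text
# embeddings, and uncertainty comes from Monte Carlo dropout.
import torch
import torch.nn as nn


class QualityScorer(nn.Module):
    """Maps a text embedding to a holistic quality score in [0, 1]."""

    def __init__(self, dim: int = 768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, 256),
            nn.ReLU(),
            nn.Dropout(p=0.1),  # kept active at inference for MC dropout
            nn.Linear(256, 1),
            nn.Sigmoid(),
        )

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.net(emb).squeeze(-1)


@torch.no_grad()
def score_with_uncertainty(model: nn.Module, emb: torch.Tensor, n_samples: int = 20):
    """Return (mean score, std across dropout samples) per embedding."""
    model.train()  # keep dropout on so repeated forward passes differ
    samples = torch.stack([model(emb) for _ in range(n_samples)])  # (n, batch)
    return samples.mean(dim=0), samples.std(dim=0)


if __name__ == "__main__":
    scorer = QualityScorer()
    fake_embeddings = torch.randn(4, 768)  # stand-in for encoder outputs
    mean, std = score_with_uncertainty(scorer, fake_embeddings)
    # A high std flags generations the metric is unsure about; such scores
    # could be down-weighted when comparing systems, in the spirit of the
    # abstract's claim that incorporating uncertainty improves evaluation.
    print(mean, std)
```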

