Perception Score, A Learned Metric for Open-ended Text Generation Evaluation

by Jing Gu, et al.

Automatic evaluation of open-ended natural language generation remains a challenge: existing metrics such as BLEU correlate poorly with human judgment. We propose Perception Score, a novel and powerful learning-based evaluation metric. Rather than focusing on a single criterion such as word overlap, it scores the overall quality of a generation holistically. Moreover, it reports the uncertainty of its own evaluation; by incorporating this uncertainty, Perception Score evaluates generation systems more accurately. Perception Score achieves state-of-the-art results on two conditional generation tasks and two unconditional generation tasks.
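As a rough illustration of the idea of pairing a holistic quality score with an uncertainty estimate, the sketch below aggregates several learned scorers and uses their disagreement as an uncertainty signal. This is a hypothetical minimal example, not the paper's actual model: the scorer functions, the disagreement-as-uncertainty stand-in, and the inverse-uncertainty weighting are all assumptions made for illustration.

```python
import statistics

def perception_score(text, scorers):
    """Hypothetical sketch: score `text` with several learned scorers.

    The mean plays the role of a holistic quality score; the standard
    deviation across scorers stands in for the metric's uncertainty
    (the paper's actual architecture is not reproduced here).
    """
    scores = [s(text) for s in scorers]
    mean = statistics.mean(scores)
    uncertainty = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, uncertainty

def system_score(texts, scorers):
    """Aggregate per-sample scores into a system-level score, weighting
    each sample by inverse uncertainty so confident judgments count more.
    """
    pairs = [perception_score(t, scorers) for t in texts]
    weights = [1.0 / (1.0 + u) for _, u in pairs]
    return sum(m * w for (m, _), w in zip(pairs, weights)) / sum(weights)
```

In practice the scorers would be neural models fine-tuned on human quality judgments; here any callables returning a float will do, which keeps the aggregation logic easy to inspect.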


