MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance

09/05/2019
by Wei Zhao, et al.

A robust evaluation metric has a profound impact on the development of text generation systems. A desirable metric compares system output against references based on their semantics rather than their surface forms. In this paper, we investigate strategies for encoding system and reference texts in order to devise a metric that correlates highly with human judgment of text quality. We validate our new metric, MoverScore, on a number of text generation tasks, including summarization, machine translation, image captioning, and data-to-text generation, where the outputs are produced by a variety of neural and non-neural systems. Our findings suggest that metrics combining contextualized representations with a distance measure perform best. Such metrics also demonstrate strong generalization capability across tasks. For ease of use, we make our metrics available as a web service.
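To make the idea of combining token representations with a distance measure concrete, the following is a minimal sketch of an Earth Mover Distance between two small sets of token vectors. It is not the MoverScore implementation. It assumes equally sized sets with uniform weights, in which case an optimal transport plan is a one-to-one matching that can be found by brute force. The function name `toy_mover_distance` and the 2-d vectors are hypothetical; a real metric would use contextualized embeddings and a proper optimal transport solver.

```python
from itertools import permutations
from math import sqrt


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def toy_mover_distance(hyp_vecs, ref_vecs):
    """Earth mover distance between two equally sized sets of token
    vectors, each token carrying uniform mass 1/n.

    In this special case an optimal transport plan is a one-to-one
    matching, so all permutations are tried (tiny inputs only).
    """
    n = len(hyp_vecs)
    if n == 0 or n != len(ref_vecs):
        raise ValueError("this sketch needs two non-empty sets of equal size")
    best = min(
        sum(euclidean(hyp_vecs[i], ref_vecs[p[i]]) for i in range(n))
        for p in permutations(range(n))
    )
    return best / n  # each matched pair moves mass 1/n


# Hypothetical 2-d "embeddings", chosen only for illustration.
reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
reordered = [(0.0, 1.0), (0.0, 0.0), (1.0, 0.0)]  # same tokens, new order
shifted = [(3.0, 4.0), (4.0, 4.0), (3.0, 5.0)]    # every token moved by (3, 4)

print(toy_mover_distance(reordered, reference))  # 0.0
print(toy_mover_distance(shifted, reference))    # 5.0
```

The reordered text scores a distance of zero because the measure compares sets of vectors rather than surface order, while the shifted text is penalized in proportion to how far its vectors lie from the reference.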


