Automatic Text Evaluation through the Lens of Wasserstein Barycenters

08/27/2021
by Pierre Colombo, et al.

A new metric to evaluate text generation based on deep contextualized embeddings (e.g., BERT, RoBERTa, ELMo) is introduced. This metric is motivated by a new framework relying on optimal transport tools, i.e., the Wasserstein distance and the Wasserstein barycenter. By modelling the layer outputs of deep contextualized embeddings as a probability distribution rather than as a vector embedding, this framework provides a natural way to aggregate the different outputs through the Wasserstein space topology. In addition, it provides theoretical grounds for our metric and offers an alternative to available solutions (e.g., MoverScore and BERTScore). Numerical evaluation is performed on four different tasks: machine translation, summarization, data2text generation and image captioning. Our results show that the proposed metric outperforms other BERT-based metrics and exhibits more consistent behaviour, in particular for text summarization.
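The two optimal transport tools named in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. It assumes that every layer output is a uniform distribution over the same number of token vectors, so that optimal transport reduces to an assignment problem, and it uses random arrays as stand-ins for real BERT layer outputs. The function names `wasserstein2` and `barycenter` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def wasserstein2(x, y):
    """Squared 2-Wasserstein distance between two uniform point clouds.

    Assumes x and y have the same number of points, so an optimal
    transport plan is a permutation (an assignment problem).
    """
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()


def barycenter(layers, n_iter=20):
    """Free-support Wasserstein barycenter of equally weighted layers.

    Fixed-point iteration: match the current support to every layer,
    then move each support point to the mean of its matched points.
    """
    support = layers[0].copy()
    for _ in range(n_iter):
        matched = []
        for layer in layers:
            cost = ((support[:, None, :] - layer[None, :, :]) ** 2).sum(axis=-1)
            _, cols = linear_sum_assignment(cost)
            matched.append(layer[cols])
        support = np.mean(matched, axis=0)
    return support


# Stand-in data (assumption): 4 layers, 5 tokens, 8-dimensional embeddings.
rng = np.random.default_rng(0)
n_layers, n_tokens, dim = 4, 5, 8
reference_layers = [rng.normal(size=(n_tokens, dim)) for _ in range(n_layers)]
candidate_layers = [rng.normal(size=(n_tokens, dim)) for _ in range(n_layers)]

# Aggregate the layers of each text into one distribution, then compare them.
reference_bary = barycenter(reference_layers)
candidate_bary = barycenter(candidate_layers)
score = wasserstein2(reference_bary, candidate_bary)
print(f"toy distance between barycenters: {score:.4f}")
```

A lower value means the two aggregated distributions are closer. Real sentences differ in length and may carry non-uniform token weights, which requires a general optimal transport solver rather than the assignment shortcut used here.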

