BLEURT: Learning Robust Metrics for Text Generation

04/09/2020
by Thibault Sellam, et al.

Text generation has made significant advances in the last few years. Yet, evaluation metrics have lagged behind, as the most popular choices (e.g., BLEU and ROUGE) may correlate poorly with human judgments. We propose BLEURT, a learned evaluation metric based on BERT that can model human judgments with a few thousand possibly biased training examples. A key aspect of our approach is a novel pre-training scheme that uses millions of synthetic examples to help the model generalize. BLEURT provides state-of-the-art results on the last three years of the WMT Metrics shared task and the WebNLG Competition dataset. In contrast to a vanilla BERT-based approach, it yields superior results even when the training data is scarce and out-of-distribution.
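To make the abstract's claim concrete, here is a small stdlib-only toy (not BLEURT itself, and not from the paper) illustrating why surface n-gram overlap, as used by BLEU-style metrics, can correlate poorly with human judgment: a good paraphrase with little word overlap scores low, while a degenerate string that reuses reference words scores high. A learned metric like BLEURT, which compares sentences through BERT representations, is designed to avoid this failure mode.

```python
# Toy illustration (not BLEURT): clipped unigram precision,
# i.e. BLEU-1 without the brevity penalty.
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Fraction of candidate tokens also found in the reference,
    with counts clipped by the reference's counts."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    matched = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    total = sum(cand_counts.values())
    return matched / total

reference = "the cat sat on the mat"
paraphrase = "a feline rested upon the rug"   # good meaning, low overlap
degenerate = "the the the on on mat"          # poor meaning, high overlap

print(unigram_precision(paraphrase, reference))  # ~0.17
print(unigram_precision(degenerate, reference))  # ~0.67
```

The overlap metric prefers the degenerate output by a wide margin, which is exactly the kind of misranking a metric trained on human judgments is meant to correct.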
