SEScore2: Retrieval Augmented Pretraining for Text Generation Evaluation

12/19/2022
by Wenda Xu, et al.
ByteDance Inc.
The Regents of the University of California

Is it possible to leverage large-scale raw corpora and raw parallel corpora to build a general learned metric? Existing learned metrics correlate imperfectly with human judgements, are model-dependent, or are limited to the domains or tasks where human ratings are available. In this paper, we propose SEScore2, a model-based metric pretrained on a million-scale synthetic dataset constructed by our novel retrieval-augmented data synthesis pipeline. SEScore2 achieves high correlation with human judgements without any human rating supervision. Notably, the unsupervised SEScore2 can outperform supervised metrics trained on News-domain human ratings when evaluated on the TED domain. We evaluate SEScore2 on four text generation tasks across three languages. SEScore2 outperforms all prior unsupervised evaluation metrics in machine translation, speech translation, data-to-text and dialogue generation, with an average Kendall correlation improvement of 0.158. SEScore2 even outperforms the SOTA supervised metric BLEURT on data-to-text, dialogue generation and overall correlation.
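
The abstract reports results as Kendall correlation between metric scores and human judgements. As a rough illustration of what such a number measures, the sketch below computes a plain Kendall tau-a over a handful of segments. It is not the paper's evaluation code: the function name `kendall_tau`, the toy scores, and the choice of the tau-a variant (ties count as neither concordant nor discordant) are all assumptions made for this example.

```python
from itertools import combinations


def kendall_tau(metric_scores, human_ratings):
    """Kendall tau-a: (concordant - discordant) / number of pairs.

    Hypothetical helper for illustration only. Tied pairs contribute
    to the denominator but to neither count.
    """
    if len(metric_scores) != len(human_ratings):
        raise ValueError("both score lists must have the same length")
    n = len(metric_scores)
    if n < 2:
        raise ValueError("need at least two segments to form a pair")

    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # The sign of the product tells whether the metric and the human
        # raters order segments i and j the same way.
        product = (metric_scores[i] - metric_scores[j]) * (
            human_ratings[i] - human_ratings[j]
        )
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1

    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Made-up toy data: four system outputs, a metric score for each
# (higher is better) and a human quality rating for each.
toy_metric_scores = [-1.0, -5.0, -0.5, -12.0]
toy_human_ratings = [5, 2, 4, 1]

tau = kendall_tau(toy_metric_scores, toy_human_ratings)
print(f"Kendall tau-a on toy data: {tau:.3f}")
```

On this toy data the metric and the human ratings disagree on one of the six segment pairs (the first and third outputs), so five pairs are concordant and one is discordant. A metric that ranked every pair the same way as the raters would score 1.0.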

10/10/2022

Not All Errors are Equal: Learning Text Generation Metrics using Stratified Error Synthesis

Is it possible to build a general and automatic natural language generat...
06/06/2023

Correction of Errors in Preference Ratings from Automated Metrics for Text Generation

A major challenge in the field of Text Generation is evaluation: Human e...
09/06/2022

Layer or Representation Space: What makes BERT-based Evaluation Metrics Robust?

The evaluation of recent embedding-based evaluation metrics for text gen...
02/02/2022

A Survey on Retrieval-Augmented Text Generation

Recently, retrieval-augmented text generation attracted increasing atten...
04/30/2020

NUBIA: NeUral Based Interchangeability Assessor for Text Generation

We present NUBIA, a methodology to build automatic evaluation metrics fo...
10/16/2021

FrugalScore: Learning Cheaper, Lighter and Faster Evaluation Metrics for Automatic Text Generation

Fast and reliable evaluation metrics are key to R&D progress. While tr...
10/13/2020

Improving Text Generation Evaluation with Batch Centering and Tempered Word Mover Distance

Recent advances in automatic evaluation metrics for text have shown that...