T5Score: Discriminative Fine-tuning of Generative Evaluation Metrics

12/12/2022
by Yiwei Qin, et al.

Modern embedding-based metrics for evaluating generated text generally fall into one of two paradigms: discriminative metrics, which are trained to directly predict which outputs are of higher quality according to supervised human annotations, and generative metrics, which evaluate text based on the probabilities assigned by a generative model. Both have their advantages: discriminative metrics can directly optimize for the problem of distinguishing between good and bad outputs, while generative metrics can be trained on abundant raw text. In this paper, we present a framework that combines the best of both worlds, using both supervised and unsupervised signals from whatever data is available. We operationalize this idea by training T5Score, a metric that uses these training signals with mT5 as the backbone. We perform an extensive empirical comparison with existing metrics on 5 datasets, 19 languages, and 280 systems, demonstrating the utility of our method. Experimental results show that T5Score achieves the best performance against existing top-scoring metrics on all datasets at the segment level. We release our code and models at https://github.com/qinyiwei/T5Score.
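The two training signals described above can be sketched in miniature. This is not the authors' implementation: the length-normalized log-likelihood score and the margin value are illustrative assumptions, and the per-token log-probabilities would in practice come from a seq2seq model such as mT5.

```python
import math

def generative_score(token_logprobs):
    """Unsupervised signal: score a hypothesis by the length-normalized
    log-probability its tokens receive under a generative model.
    token_logprobs: list of per-token log-probabilities (assumed given)."""
    return sum(token_logprobs) / len(token_logprobs)

def pairwise_ranking_loss(score_better, score_worse, margin=0.1):
    """Supervised signal: a margin ranking loss that pushes the score of the
    human-preferred hypothesis above the score of the dispreferred one."""
    return max(0.0, margin - (score_better - score_worse))

# Toy example: two hypotheses for the same source, with made-up log-probs.
s_good = generative_score([-0.2, -0.4, -0.3])   # fluent output, high likelihood
s_bad = generative_score([-1.5, -2.0, -1.0])    # degraded output, low likelihood

# If human annotations say the first is better, the ranking loss is already
# zero here, since the generative scores agree with the preference.
loss = pairwise_ranking_loss(s_good, s_bad)
```

During fine-tuning, gradients from the ranking loss would flow back into the model that produced the token probabilities, so the same backbone serves both the generative and discriminative objectives.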


