FrugalScore: Learning Cheaper, Lighter and Faster Evaluation Metricsfor Automatic Text Generation

10/16/2021
by   Moussa Kamal Eddine, et al.
0

Fast and reliable evaluation metrics are key to R D progress. While traditional natural language generation metrics are fast, they are not very reliable. Conversely, new metrics based on large pretrained language models are much more reliable, but require significant computational resources. In this paper, we propose FrugalScore, an approach to learn a fixed, low cost version of any expensive NLG metric, while retaining most of its original performance. Experiments with BERTScore and MoverScore on summarization and translation show that FrugalScore is on par with the original metrics (and sometimes better), while having several orders of magnitude less parameters and running several times faster. On average over all learned metrics, tasks, and variants, FrugalScore retains 96.8 times less parameters than the original metrics. We make our trained metrics publicly available, to benefit the entire NLP community and in particular researchers and practitioners with limited resources.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/26/2020

Evaluation of Text Generation: A Survey

The paper surveys evaluation methods of natural language generation (NLG...
research
01/17/2021

GENIE: A Leaderboard for Human-in-the-Loop Evaluation of Text Generation

Leaderboards have eased model development for many NLP datasets by stand...
research
06/28/2017

Data-driven Natural Language Generation: Paving the Road to Success

We argue that there are currently two major bottlenecks to the commercia...
research
08/01/2019

Simple and Effective Text Matching with Richer Alignment Features

In this paper, we present a fast and strong neural approach for general ...
research
12/19/2022

SEScore2: Retrieval Augmented Pretraining for Text Generation Evaluation

Is it possible to leverage large scale raw and raw parallel corpora to b...
research
05/26/2023

AlignScore: Evaluating Factual Consistency with a Unified Alignment Function

Many text generation applications require the generated text to be factu...
research
05/03/2023

Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs

Large language models (LLMs) power many state-of-the-art systems in natu...

Please sign up or login with your details

Forgot password? Click here to reset