The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

02/02/2021
by   Sebastian Gehrmann, et al.
5

We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. However, due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides an environment in which models can easily be applied to a wide set of corpora and evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models. This paper serves as the description of the initial release for which we are organizing a shared task at our ACL 2021 Workshop and to which we invite the entire NLG community to participate.

READ FULL TEXT

Authors

page 3

page 9

page 10

page 15

page 20

page 21

page 23

page 24

06/28/2017

Data-driven Natural Language Generation: Paving the Road to Success

We argue that there are currently two major bottlenecks to the commercia...
06/26/2020

Evaluation of Text Generation: A Survey

The paper surveys evaluation methods of natural language generation (NLG...
04/16/2021

IndoNLG: Benchmark and Resources for Evaluating Indonesian Natural Language Generation

A benchmark provides an ecosystem to measure the advancement of models w...
10/26/2020

Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale

Automatic evaluation of language generation systems is a well-studied pr...
08/27/2020

A Survey of Evaluation Metrics Used for NLG Systems

The success of Deep Learning has created a surge in interest in a wide a...
04/11/2022

Data Splits and Metrics for Method Benchmarking on Surgical Action Triplet Datasets

In addition to generating data and annotations, devising sensible data s...
12/23/2021

Measuring Attribution in Natural Language Generation Models

With recent improvements in natural language generation (NLG) models for...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.