GENIE: A Leaderboard for Human-in-the-Loop Evaluation of Text Generation

01/17/2021
by Daniel Khashabi, et al.

Leaderboards have eased model development for many NLP datasets by standardizing their evaluation and delegating it to an independent external repository. Their adoption, however, has so far been limited to tasks that can be reliably evaluated automatically. This work introduces GENIE, an extensible human-evaluation leaderboard that brings the convenience of leaderboards to text generation tasks. GENIE automatically posts leaderboard submissions to crowdsourcing platforms, asks human annotators to evaluate them along several axes (e.g., correctness, conciseness, fluency), and compares their judgments with various automatic metrics. We introduce several English datasets to GENIE, representing four core challenges in text generation: machine translation, summarization, commonsense reasoning, and machine comprehension. We provide formal, granular evaluation metrics and identify areas for future research. We make GENIE publicly available and hope that it will spur progress in language generation models as well as in their automatic and manual evaluation.
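To make the idea of comparing human judgments with automatic metrics concrete, here is a minimal sketch. It assumes each system's output receives several annotator ratings on a numeric scale, averages them per system, and measures agreement with an automatic metric using Pearson correlation. Pearson correlation is only one common choice and is an assumption here; GENIE's actual aggregation and comparison procedure may differ. The helper names (`mean_human_score`, `pearson`) and all scores below are hypothetical toy values, not GENIE data.

```python
from math import sqrt


def mean_human_score(ratings: list[int]) -> float:
    """Average the ratings several annotators gave one system (hypothetical helper)."""
    return sum(ratings) / len(ratings)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sx * sy)


# Toy, made-up inputs (not GENIE results): three annotator ratings per system
human_ratings = {
    "sys_a": [4, 5, 4],
    "sys_b": [2, 3, 2],
    "sys_c": [3, 3, 4],
}
# Toy automatic-metric scores for the same systems
metric_scores = {"sys_a": 0.61, "sys_b": 0.35, "sys_c": 0.50}

systems = sorted(human_ratings)
human = [mean_human_score(human_ratings[s]) for s in systems]
auto = [metric_scores[s] for s in systems]
r = pearson(human, auto)
print(f"Pearson r between human and metric scores: {r:.3f}")
```

A high correlation suggests the metric ranks systems much as annotators do; a low one signals that the metric may be a poor stand-in for human evaluation on that task.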
