1 Introduction
A plethora of natural language processing (NLP) applications perform text-to-text transformation
mellish1998evaluation; belz2006comparing; specia2018quality. Given an input, these systems are required to produce an output text that is coherent, readable and informative. Due to high annotation costs and time, researchers tend to rely on automatic evaluation to compare the outputs of such systems. Reference-based automatic evaluation relies on comparing a candidate text produced by the NLG system with one or several reference texts (‘gold standard’) created by a human annotator. Generic automatic evaluation of NLG is a major challenge, as it requires building a metric that evaluates the similarity between a candidate and one or several gold-standard reference texts. However, the definition of success criteria is task-specific: for example, evaluation of text summarization focuses on content, coherence, grammaticality, conciseness, and readability
mani2001automatic, whereas machine translation focuses on fidelity, fluency and adequacy of the translation hovy1999toward, and data2text generation gardent2017creating considers criteria such as data coverage, correctness and text structure.
Automatic text evaluation metrics fall into two categories: metrics that are trained to maximise their correlation with human annotations (e.g., RUSE shimanaka2018ruse, BEER stanojevicsimaan2014beer, BLEND ma2017blend) and untrained metrics (e.g., BLEU bleu, ROUGE lin2004rouge, BERTSCORE zhang2019bertscore, DepthScore staerman2021depth, BaryScore colombo2021automatic, MOVERSCORE zhao2019moverscore and Word Mover Distance kusner2015wmd). In this work, we focus on untrained metrics, as trained metrics may not generalize well to new data (existing labelled corpora are of small size). Two categories of untrained metrics can be distinguished: word- or character-based metrics that compute a score based on the string representation, and embedding-based metrics that rely on a continuous representation. String-based metrics (e.g., BLEU) often fail to robustly match paraphrases reiter2009investigation as they mainly focus on the surface form, as opposed to embedding-based metrics relying on continuous representations.
In this paper, we introduce InfoLM, a family of new untrained metrics to evaluate text summarization and data2text generation. At the highest level, InfoLM's key components include: (1) a pre-trained masked language model (PMLM
) that is used to compute two discrete probability distributions over the vocabulary. They represent the probability of observing each token of the vocabulary given the candidate and the reference sentence, respectively. (2) A contrast function
that is used to measure the dissimilarity between the aforementioned probability distributions. InfoLM differs from existing BERT-based metrics (e.g., BERTSCORE, MOVERSCORE) as it directly relies on the PMLM, which outputs discrete probability distributions. Thus, InfoLM neither requires arbitrarily selecting one or several specific layers (e.g., BERTSCORE relies on the 9th layer of bert-base-uncased), nor involves selecting an arbitrary aggregation technique (e.g., the power mean ruckle2018concatenated for MOVERSCORE). As InfoLM relies on statistics over tokens, it can also be seen as a string-based metric. However, it does not suffer from the common pitfalls of string-based metrics (e.g., synonyms, need for an exact string match), as the PMLM also allows one to assign a high score to paraphrases and to capture distant dependencies.
Contributions. Our contributions are summarized below:
(1) Novel metrics. We introduce InfoLM, a set of metrics to automatically evaluate summarization and data2text generation. InfoLM overcomes the common pitfalls of string-matching metrics and requires neither selecting a layer nor relying on an arbitrary aggregation function. It combines a pre-trained model and a contrast function between two discrete probability distributions. We explore different choices of contrast functions such as divergences, Lp distances and the Fisher-Rao distance.
(2) Tasks. First, we demonstrate on both summarization and data2text generation that InfoLM is better suited than competing metrics, through a comparison using multiple measures of correlation with human judgment at both the text and system level. Second, we dissect InfoLM to better understand the relative importance of each component (e.g., calibration, sensitivity to the choice of information measure).
2 Problem statement and related work
In this section, we first introduce notation and formulate the problem of evaluating both text generation systems and evaluation metrics. We then present the most relevant related work and existing approaches for the studied tasks.
2.1 Problem statement
NLG evaluation. We are given a dataset $\mathcal{D} = \{(x^i, y^{i,j})\}$, where $x^i$ is the $i$-th reference text, $y^{i,j}$ is the candidate text produced for the $i$-th input by the $j$-th NLG system, $N$ is the number of texts in the dataset and $S$ is the number of systems available. A reference text $x$ is composed of $M$ tokens (e.g., words or subwords) and a candidate text $y$ is composed of $L$ tokens (both the reference and the candidate can be composed of several sentences, as is the case in summarization). The set of tokens (vocabulary) is denoted by $\Omega$ and the set of possible texts by $\mathcal{X}$. We denote by $h(x, y)$ the score assigned by a human annotator to the candidate text $y$ when comparing it with the reference text $x$. We aim at building an evaluation metric $f$ such that $f(x, y)$ is close to $h(x, y)$.
Evaluating evaluation metrics. To assess the relevance of an evaluation metric $f$, correlation with human judgment is considered one of the most important criteria koehn2009statistical; specia2010machine; chatzikoumi2020evaluate. The debate on the relative merits of different correlations for the evaluation of automatic metrics is ongoing, but classical correlation measures are the Pearson leusch2003novel, Spearman melamed2003precision and Kendall kendall1938new coefficients. Two meta-evaluation strategies are commonly used: (1) text-level correlation and (2) system-level correlation. Formally, the text-level correlation is computed as follows:
$$\mathcal{C}_{\text{text}} = K\big(F_{\text{text}}, H_{\text{text}}\big), \qquad (1)$$
where $F_{\text{text}} = [f(x^i, y^{i,j})]_{i,j}$ and $H_{\text{text}} = [h(x^i, y^{i,j})]_{i,j}$ are the vectors composed of the scores assigned to each candidate text by the automatic metric and by the human annotator, respectively, and $K$ is the chosen correlation measure (e.g., Pearson, Kendall or Spearman). Similarly, the system-level correlation is obtained by
$$\mathcal{C}_{\text{sys}} = K\big(F_{\text{sys}}, H_{\text{sys}}\big), \qquad (2)$$
where $F_{\text{sys}}$ and $H_{\text{sys}}$ are the vectors composed of the per-system averaged scores assigned by $f$ and by the human annotator, respectively. For the significance analysis, we follow graham2014testing and use the Williams test to validate a significant improvement for dependent correlations steiger1980tests.
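As an illustration, the sketch below computes both meta-evaluation strategies with SciPy; the data layout (nested lists indexed by system and then text) and the exact pooling choices are assumptions made for the sake of the example, not the implementation used in our experiments.

# Sketch of the meta-evaluation protocol (assumed layout: metric_scores[j][i] and
# human_scores[j][i] hold the scores assigned to system j on text i).
import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr


def text_level_correlation(metric_scores, human_scores, corr=pearsonr):
    # Correlation between the scores assigned to every (system, text) pair (Eq. 1).
    f_text = np.concatenate(metric_scores)
    h_text = np.concatenate(human_scores)
    return corr(f_text, h_text)[0]


def system_level_correlation(metric_scores, human_scores, corr=pearsonr):
    # Correlation between the per-system averaged scores (Eq. 2).
    f_sys = [np.mean(scores) for scores in metric_scores]
    h_sys = [np.mean(scores) for scores in human_scores]
    return corr(f_sys, h_sys)[0]

The same functions can be called with spearmanr or kendalltau to obtain the other coefficients.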
2.2 Existing metrics
String-based metrics
Two types of string-based metrics exist: n-gram matching and edit distance-based metrics. N-gram matching metrics count the number of n-grams in common between the candidate text and the reference text. The three most-used metrics are BLEU, ROUGE and METEOR banerjee2005meteor. If no n-gram is in common between the candidate and the reference, these metrics fail to produce meaningful scores. The second category gathers edit distance-based metrics, which count the number of basic operations such as edits, deletions and insertions to measure semantic equivalence. Variants include TER snover2006ter, CDER leuschetal2006cder, EED stanchev2019eed and CHARACTER wang2016character. Edit distance-based metrics do not handle synonyms and focus on the surface form. InfoLM can be seen as string-based but does not suffer from the aforementioned matching problem and can handle synonyms, as it relies on a PMLM.
Embedding-based metrics
Another class of metrics relies on word embeddings. These metrics either use static word embeddings such as word2vec word2vec or contextualized embeddings such as ELMO elmo, BERT bert and its variants sanh2019distilbert; liu2019roberta. Among the most popular metrics, we can mention MOVERSCORE, BERTSCORE, WMD kusner2015wmd and WMDO chowetal2019wmdo. Differently from these approaches, InfoLM relies on a language model and works with discrete probability distributions instead of continuous representations.
Learning-based metrics
Various trained metrics have been proposed, such as BEER, BLEND, RUSE and CIDER vedantam2015cider. These methods rely on train/dev/test sets composed of human evaluations. InfoLM does not require any training step and relies on a frozen PMLM.
PMLM as a metric.
To the best of our knowledge, using a PMLM (i.e., without further training) as a reference-based automatic metric remains overlooked. The closest use we found relies on autoregressive models, such as GPT-2 radford2019language, to compute the perplexity of the generated sentence and assess its fluency. Researchers have mainly focused on using the learnt embeddings of the PMLM. However, finding a reliable layer aggregation mechanism remains an open question (e.g., BERTSCORE arbitrarily selects a layer based on a chosen dataset, MOVERSCORE uses the last 5 layers). InfoLM addresses this aggregation issue by relying directly on the output distributions of the PMLM.
2.3 Masked language modelling
Language models based on masked language pretraining objectives bert; liu2019roberta aim at reconstructing a corrupted version of an input text by minimizing a cross-entropy loss. This corrupted context corresponds to a “local view” of the sentence. To ensure a fair comparison, we do not use existing alternatives (e.g., GPT-2-based models radford2019language), as concurrent works zhang2019bertscore; zhao2019moverscore rely on a PMLM.
3 InfoLM
In this section, we first introduce a novel family of metrics called InfoLM and then detail its different components.
Notations. We denote by $\theta$ the parameters of the PMLM, by $T$ its temperature and by $\mathcal{I}$ an information measure (see Section 3.3) which quantifies the similarity between two discrete distributions.
3.1 Motivations & Definitions
A PMLM has learnt the empirical distribution of a large text corpus. Given a text $x$, the corrupted context obtained by masking the token at position $j$ is denoted $x^{\setminus j}$; the LM predicts a distribution $p(\cdot \mid x^{\setminus j})$ over the vocabulary given the masked context. As an example, for a suitable masked input sentence, a pretrained model could place high probabilities on the tokens “food” and “meal” and a low probability on “the”. It is worth noting that $p(\cdot \mid x^{\setminus j})$ represents the probability of observing each token of the vocabulary given the masked input $x^{\setminus j}$.
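To make this concrete, the snippet below queries a PMLM at a masked position using the HuggingFace fill-mask pipeline; the example sentence is ours, and the actual top predictions and scores depend on the chosen model.

# Illustrative query of a PMLM at a masked position (hypothetical example sentence).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("I really enjoyed the [MASK] at that restaurant.", top_k=5):
    print(prediction["token_str"], round(prediction["score"], 3))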
Definition 3.1 (Equivalence for masked contexts).
Given $\epsilon > 0$, two masked contexts $x^{\setminus j}$ and $y^{\setminus k}$, from input texts $x$ and $y$, with masks at positions $j$ and $k$ respectively, are equivalent (denoted $x^{\setminus j} \sim y^{\setminus k}$) if the two predicted discrete distributions given by the PMLM, namely $p(\cdot \mid x^{\setminus j})$ and $p(\cdot \mid y^{\setminus k})$, are similar. Formally, $x^{\setminus j} \sim y^{\setminus k}$ if $\mathcal{I}\big(p(\cdot \mid x^{\setminus j}), p(\cdot \mid y^{\setminus k})\big) \leq \epsilon$.
Remark.
We have the intuition that two similar sentences will share several pairs of equivalent masked contexts. At this point, we make no claim on the relationship between equivalence and the masked context similarity.
In this work, we make the hypothesis that two similar sentences will share multiple equivalent masked contexts. However, pairwise comparison of all the individual masked contexts is prohibitively expensive ($M \times L$ comparisons) when considering long texts. Motivated by efficiency, we instead propose to work with two “global views” of the sentences that are well-formed probability distributions and are obtained through the aggregation of the individual PMLM predictions. The aggregated probabilities for $x$ and $y$ are denoted $\bar{p}_x$ and $\bar{p}_y$, respectively.
Definition 3.2 (Similarity for texts).
Given $\epsilon > 0$, two texts $x$ and $y$ are said to be similar (denoted $x \sim y$) if $\mathcal{I}(\bar{p}_x, \bar{p}_y) \leq \epsilon$, where $\bar{p}_x$ and $\bar{p}_y$ denote the aggregated individual masked context predictions.
3.2 InfoLM
Overview
InfoLM uses the notion of similarity given in Definition 3.2. Given a reference text $x$ together with a candidate text $y$, InfoLM successively masks each token position of both $x$ and $y$ to obtain the individual masked contexts. Relying on a PMLM, InfoLM predicts one distribution for each individual masked context. The resulting distributions are then averaged (we refer to this operation as a “bag of distributions”) to obtain $\bar{p}_x$ and $\bar{p}_y$. The final step involves comparing the two well-formed discrete probability distributions $\bar{p}_x$ and $\bar{p}_y$ through a measure of information $\mathcal{I}$. InfoLM writes as:
$$\text{InfoLM}(x, y) = \mathcal{I}\big(\bar{p}_x, \bar{p}_y\big). \qquad (3)$$
Remark.
It is worth emphasizing that $\bar{p}_y$ and $\bar{p}_x$ are two well-formed discrete probability distributions. They represent the probability of observing each token of the vocabulary given the candidate and the reference sentence, respectively.
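A minimal sketch of this computation is given below. It assumes a HuggingFace bert-base-uncased model and uniform aggregation weights (the IDF-weighted variant is described next), and it omits batching and other efficiency considerations, so it should be read as an illustration rather than our exact implementation.

# Minimal sketch of InfoLM's "bag of distributions" (uniform weights; illustrative only).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()


def bag_of_distributions(text, temperature=1.0):
    ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    distributions = []
    for j in range(1, len(ids) - 1):               # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[j] = tokenizer.mask_token_id        # mask position j
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, j]
        distributions.append(torch.softmax(logits / temperature, dim=-1))
    return torch.stack(distributions).mean(dim=0)  # averaged distribution over the vocabulary


def infolm(candidate, reference, contrast_fn):
    # contrast_fn is any information measure between two discrete distributions.
    return contrast_fn(bag_of_distributions(candidate), bag_of_distributions(reference))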
Aggregation Procedure
Rare tokens can be more indicative of text similarity than common tokens banerjee2005meteor. Thus, for the aggregation of the individual masked contexts, we propose to compute a weighted “bag of distributions” where the weights are normalized measures of the importance of each token. In practice, $\bar{p}_x$ and $\bar{p}_y$ write as:
$$\bar{p}_x = \sum_{k=1}^{M} \omega_k \, p(\cdot \mid x^{\setminus k}), \qquad \bar{p}_y = \sum_{k=1}^{L} \nu_k \, p(\cdot \mid y^{\setminus k}),$$
where $\nu_k$ and $\omega_k$ are measures of the importance of the $k$-th token in the candidate and the reference text, respectively, satisfying $\sum_{k=1}^{L} \nu_k = \sum_{k=1}^{M} \omega_k = 1$. These weights are computed using the inverse document frequency (IDF) scores determined at the corpus level zhao2019moverscore; kusner2015wmd.
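A possible implementation of this weighting is sketched below; the helper names are hypothetical, the IDF formula is one standard smoothed variant, and dists is the list of per-position distributions produced as in the previous sketch.

# Sketch of the IDF-weighted aggregation (hypothetical helpers, illustrative only).
import torch


def idf_weights(corpus_token_ids, vocab_size):
    # Smoothed inverse document frequency computed at the corpus level.
    n_docs = len(corpus_token_ids)
    doc_freq = torch.zeros(vocab_size)
    for doc in corpus_token_ids:
        for token_id in set(doc):
            doc_freq[token_id] += 1
    return torch.log((1.0 + n_docs) / (1.0 + doc_freq))


def weighted_bag(dists, token_ids, idf):
    # dists: per-position distributions of a text; token_ids: its token indices.
    weights = idf[torch.tensor(token_ids)]
    weights = weights / weights.sum()              # normalised importance weights
    return (weights.unsqueeze(1) * torch.stack(dists)).sum(dim=0)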
LM Calibration.
Modern deep neural networks are overconfident guo2017calibration. To recalibrate language models, several techniques have been proposed (e.g., entropy rates pmlrv119braverman20a, temperature scaling platt1999probabilistic, contextual calibration zhao2021calibrate). Here, we choose to study how calibration affects InfoLM by relying on temperature scaling (when $T \to 0$, one token receives all the probability mass; when $T \to \infty$, the distribution becomes uniform), motivated by its simplicity and speed.
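The illustrative snippet below shows the effect of the temperature on a softmax output: small temperatures concentrate the probability mass on the most likely token, while large temperatures flatten the distribution towards uniform (the logits are made up for the example).

# Effect of temperature scaling on a softmax distribution (made-up logits).
import torch

logits = torch.tensor([4.0, 2.0, 0.5, -1.0])
for temperature in (0.1, 1.0, 10.0, 100.0):
    probs = torch.softmax(logits / temperature, dim=-1)
    print(temperature, [round(float(p), 3) for p in probs])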
3.3 Information measures
In this work, we focus on comparing a pair of discrete probability distributions through information measures (see basseville2013divergence for an exhaustive study). We rely on two types of information measures: divergences and distances. A divergence is a measure of dissimilarity that is always non-negative and equal to zero if (and only if) the two considered distributions are identical. We call a distance a function that is symmetric, non-negative, satisfies the triangle inequality and is equal to zero if (and only if) the two considered distributions are identical. We use information measures that either belong to the Csiszár divergences or are distances.
Table 1: Name, notation, domain and closed-form expression of the information measures considered in this work: the divergence families discussed below and the $L_1$, $L_2$, $L_\infty$ and Fisher-Rao distances (the detailed expressions are not reproduced here).
Divergence measures
Various divergence measures have been proposed for a large variety of applications basseville2013divergence. The full expressions of the studied divergences can be found in Table 1. We focus here on three families of divergences: $\alpha$-divergences, $\gamma$-divergences and AB-divergences. Note that there exist other families of divergences, such as the Bregman divergences bregman1967relaxation, the $\beta$-divergences basu1998robust and the Chernoff divergence chernoff1952measure.
$\alpha$-Divergences.
These divergences were introduced by renyi1961measures and are a special case of the Csiszár divergences csiszar1967information. They are widely used in variational inference li2016r and are closely related to Rényi divergences but are not a special case thereof.
From Table 1 we note special cases of $\alpha$-divergences: (i) the Kullback-Leibler (KL) divergence is recovered by letting $\alpha \to 1$; (ii) the Hellinger distance hellinger1909neue follows by choosing $\alpha = 1/2$.
For this family, $\alpha$ weights the influence of the ratio between the two distributions.
$\gamma$-Divergences. This divergence was introduced by fujisawa2008robust as a scale-invariant modification of the robust $\beta$-divergences. (In our setting we work with normalised distributions, so scale invariance is not a mandatory property; it is nevertheless worth mentioning, as its absence could cause practical issues when optimising our metric.) For the $\gamma$-divergences, the parameter $\gamma$ controls the importance given to elements of small probability (e.g., outliers in some scenarios, tokens with low probability in our case): depending on its value, the importance of large probabilities is reduced, which gives more weight to the outliers. Special cases include the $L_2$ distance and the KL divergence (recovered in the limit $\gamma \to 0$).
AB-Divergences. The family of AB-divergences is flexible and allows one to control both the mass coverage and the robustness. cichocki2011generalized propose these divergences; as can be seen in Table 1, they have two parameters $\alpha$ and $\beta$, which allow tuning the mass coverage and the robustness independently. Particular choices of $(\alpha, \beta)$ recover, e.g., the $\alpha$- and $\beta$-divergences.
From information divergences to discriminations. For our application, we would like to produce a metric between two texts regardless of the source (system or human); thus we are interested in symmetric divergences, which are called discriminations. To obtain a discrimination, two tricks are commonly applied: Jeffrey's symmetrization, which averages $\mathcal{I}(p, q)$ and $\mathcal{I}(q, p)$, or Jensen's symmetrization, which averages $\mathcal{I}\big(p, \tfrac{p+q}{2}\big)$ and $\mathcal{I}\big(q, \tfrac{p+q}{2}\big)$. We choose Jeffrey's symmetrization as it does not require computing the mixture $\tfrac{p+q}{2}$; the divergences used in the remainder are symmetrized in this way.
Distances
$L_p$ distances. The $L_p$ distances can be used to measure the similarity between two distributions; we restrict ourselves to $p \in \{1, 2, \infty\}$.
Fisher-Rao distance. The Fisher-Rao distance is the geodesic distance rao1987differential between two distributions. Interestingly, this distance remains overlooked in the ML community but has recently been used to achieve robustness against adversarial attacks marine_rao.
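The sketch below gives possible implementations of a few of these measures for two discrete distributions p and q (1-D arrays summing to one); the formulas follow standard textbook definitions and may differ from the exact parametrisations of Table 1.

# Possible implementations of a few contrast measures between discrete distributions
# p and q (1-D numpy arrays that sum to one); standard definitions, not necessarily
# the exact parametrisations used in Table 1.
import numpy as np

EPS = 1e-12


def kl(p, q):
    return float(np.sum(p * np.log((p + EPS) / (q + EPS))))


def jeffrey_symmetrization(divergence, p, q):
    # Jeffrey-style symmetrization: average of div(p, q) and div(q, p).
    return 0.5 * (divergence(p, q) + divergence(q, p))


def lp_distance(p, q, order=2):
    # L1, L2 or L-infinity distance (pass order=np.inf for the latter).
    return float(np.linalg.norm(p - q, ord=order))


def fisher_rao(p, q):
    # Geodesic (Rao) distance between two categorical distributions:
    # 2 * arccos of the Bhattacharyya coefficient.
    bc = np.clip(np.sum(np.sqrt(p * q)), 0.0, 1.0)
    return float(2.0 * np.arccos(bc))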
Connection with string matching metrics.
First, let us consider two texts $x$ and $y$ such that $x \sim y$. InfoLM is then close to $0$, since $\bar{p}_x \approx \bar{p}_y$: all tokens that are likely (according to the PMLM) when considering $x$ are also likely when considering $y$. For string matching metrics, this corresponds to a perfect match between $x$ and $y$. Second, let us consider $x$ and $y$ such that $x \not\sim y$ (dissimilar texts) and a measure of information that relies on the product of $\bar{p}_x$ and $\bar{p}_y$ (e.g., Fisher-Rao). In this case, the product is close to zero for every token: all tokens that are likely when considering $x$ are unlikely when considering $y$ (and conversely). For string matching metrics, this corresponds to no match among the substrings of $x$ and $y$.
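A toy numerical illustration of this argument, with a made-up three-token vocabulary: identical aggregated distributions give a Fisher-Rao distance of zero (the analogue of a perfect string match), while distributions with disjoint supports give the maximal value.

# Toy illustration of the string-matching analogy (made-up three-token vocabulary).
import numpy as np


def fisher_rao(p, q):
    return 2.0 * np.arccos(np.clip(np.sum(np.sqrt(p * q)), 0.0, 1.0))

p_same = np.array([0.7, 0.2, 0.1])
p_disjoint = np.array([0.9, 0.1, 0.0])
q_disjoint = np.array([0.0, 0.0, 1.0])

print(fisher_rao(p_same, p_same))          # 0.0 -> analogue of a perfect match
print(fisher_rao(p_disjoint, q_disjoint))  # pi  -> no token is likely under both texts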
4 Experimental Frameworks
In this section, we describe our experimental setting: we present the tasks and the baseline metrics used for each task.
4.1 Text summarization
Text summarization aims at compressing long texts into fluent, short sentences that preserve the salient information.
Datasets. To compare the different metrics, previous work bhandari2020re; fabbri2020summeval relies either on the TAC datasets dang2008overview; mcnamee2009overview or on new summarization datasets extracted from CNN/DailyMail nallapati2016abstractive. As pointed out by peyrard2019studying; bhandari2020re, the TAC datasets are old and contain flaws (e.g., the systems used to generate the summaries were of poor quality); we therefore choose to work with the newly assembled CNN/DailyMail dataset proposed in bhandari2020re. This dataset gathers 11,490 summaries, and annotations are carried out using the pyramid method nenkova2004evaluating.
Metrics. For text summarization, perhaps the best-known metrics are ROUGE and its extensions ng2015better, or METEOR and its variants denkowski2014meteor. Recently, a new set of metrics (e.g., BERTSCORE, MOVERSCORE) has been applied to text summarization.
4.2 Data2Text generation
Prior works mainly rely on two task-oriented dialogue datasets (i.e., BAGEL mairesse2010phrase and SFHOTEL wen2015semantically). As the sentences generated in these datasets are unlikely to be representative of the progress of recent NLG systems, we instead rely on a different dataset coming from the WebNLG 2020 challenge gardent2017creating. Given the following example of triples: (John_Blaha birthDate 1942_08_26) (John_Blaha birthPlace San_Antonio) (John_E_Blaha job Pilot), the goal is to generate “John Blaha, born in San Antonio on 1942-08-26, worked as a pilot.”
Annotations. The WebNLG task is evaluated by human annotators along five different axes: (1) Data coverage: are all the descriptions presented in the data included in the text? (2) Relevance: does the text contain only predicates found in the data? (3) Correctness: are predicates found in the data correctly mentioned and adequately introduced? (4) Text structure: is the produced text well-structured, grammatically correct and written in acceptable English? (5) Fluency: does the text progress naturally? Is it easy to understand? Is it a coherent whole?
Metrics. For this task, the organisers rely on untrained metrics (e.g., BLEU, METEOR, TER, BERTSCORE) to compare the performance of the candidate systems. Thus, we focus on system-level correlation.
5 Numerical Results
In this section, we study the performance of InfoLM on both text summarization and data2text generation.
5.1 Results on Text Summarization
General Analysis. Figure 35 gathers the results of the correlation study between the scores produced by the different metrics and human judgement (i.e., the pyramid score). We are able to reproduce the results from bhandari2020re. We observe a different behaviour depending on the type of systems to be evaluated (e.g., abstractive or extractive) and the chosen correlation coefficient. InfoLM with its best-performing information measures outperforms other BERT-based metrics such as MOVERSCORE or BERTSCORE (it is worth noting that both of these metrics perform poorly at the text and system level when considering outputs from extractive systems). It also largely outperforms n-gram matching metrics (e.g., the ROUGE metrics) on all datasets when measuring correlation with the Kendall coefficient, and in almost all configurations (except abstractive outputs at the system level) when using the Pearson coefficient. It is worth noting the overall good performance of the parameter-free Fisher-Rao distance.
Choice of information geometric measure for InfoLM. In Figure 35, we can observe two groups of measures with different global behaviours. The first group leads to poor performance in many configurations; within this group, the good performance of the $L_\infty$ distance in some configurations is surprising, as it is extremely selective (it only retains the maximum over the vocabulary). Since the output produced by the PMLM is sparse, this maximum corresponds to a single word that is likely in one sentence and not likely at all in the other. The second group gathers the remaining measures and achieves the best performance overall. Within this group, two measures achieve similar performance, suggesting that the flexibility (e.g., robustness to outliers) introduced by the additional parameter is not useful in our task; this observation is strengthened by the lower performance of the more robust variant. The remaining difference in results is due to the flexibility introduced by the parameter controlling the relative importance of the ratio between the two distributions, which can be interpreted in our case as the ability to control the importance attributed to less likely words jalalzai2020heavy.
Takeaways. The best-performing metric is obtained with one of the parametric divergence families. The Fisher-Rao distance achieves good performance in many scenarios and has the advantage of being parameter-free.
5.2 Results on Data2Text
Global Analysis. Table 3 gathers the results of the correlation analysis of the metrics with human judgements along the five different axes. We observe that the five annotation criteria are not independent: text structure and fluency achieve a strong correlation with each other. Additionally, all metrics achieve similar results when the correlation is computed on these two criteria. We observe that the best-performing group of metrics is based on InfoLM, followed by metrics based on continuous BERT representations (i.e., MOVERSCORE and BERTSCORE), followed by n-gram matching metrics.
Regarding correctness, data coverage and relevance, we observe that the best-performing InfoLM measures achieve the best results on almost all correlation coefficients. On data coverage, InfoLM achieves a substantial improvement in correlation compared to both BERT-based and n-gram matching metrics. Regarding fluency and text structure, the Fisher-Rao distance works best and slightly outperforms the second-best performing metric, namely BERTSCORE.
Takeaways. Similarly to summarisation, we observe very low correlations for some measures. We also observe that the most robust divergence family achieves lower results than the other two, suggesting that, as noticed for summarisation, robustness to unlikely words (i.e., outliers) is less relevant for our task.
Correctness  Data Coverage  Fluency  Relevance  Text Structure  
Metric  
Correct  100.0  100.0  100.0  97.6  85.2  73.3  80.0  81.1  61.6  99.1  89.7  75.0  80.1  80.8  60.0 
DataC  85.2  97.6  73.3  100.0  100.0  100.0  71.8  51.7  38.3  96.0  93.8  81.6  71.6  51.4  36.6 
Fluency  81.1  80.0  61.6  71.8  51.7  38.3  100.0  100.0  100.0  77.0  61.4  46.6  99.5  99.7  98.3 
Relev  89.7  99.1  75.0  96.0  93.8  81.6  77.0  61.4  46.6  100.0  100.0  100.0  77.2  61.1  45.0 
TextS  80.8  80.1  60.0  71.6  51.4  36.6  99.5  99.7  98.3  77.2  61.1  45.0  100.0  100.0  100.0 
88.8  89.3  76.6  81.8  82.6  70.0  86.6  92.0  76.6  89.8  87.9  73.3  86.6  91.4  75.0  
88.8  89.3  76.6  81.8  82.6  70.0  86.6  92.0  76.6  89.8  87.9  73.3  86.6  91.4  75.0  
81.4  50.0  71.6  48.4  79.7  65.0  44.8  84.7  76.6  49.3  72.3  60.0  48.0  83.8  75.0  
75.2  33.8  61.6  32.4  53.8  40.0  22.7  83.5  73.3  32.2  57.9  45.0  25.6  83.2  71.6  
89.7  86.0  75.0  78.7  70.5  51.6  93.3  95.7  85.3  87.6  84.4  70.0  92.4  93.8  81.6  
JS  79.4  81.1  70.0  69.3  75.5  60.0  89.4  91.4  75.0  81.7  70.5  60.0  91.9  91.1  73.3 
BertS  85.5  83.4  73.3  74.7  68.2  53.3  92.3  95.5  85.0  83.3  79.4  65.0  91.9  95.0  83.3 
MoverS  84.1  84.1  73.3  78.7  66.2  53.3  91.2  92.1  78.3  82.1  77.4  65.0  90.1  91.4  76.3 
BLEU  77.6  66.3  60.0  55.7  50.2  36.6  89.4  90.5  78.3  63.0  65.2  51.6  88.5  89.1  76.6 
R1  80.6  65.0  65.0  61.1  59.6  48.3  76.5  76.3  60.3  64.3  69.2  56.7  75.9  77.5  58.3 
METEOR  86.5  66.3  70.0  77.3  50.2  46.6  86.7  90.5  78.3  82.1  65.2  58.6  86.2  89.1  76.6 
TER  79.6  78.3  58.0  69.7  58.2  38.0  89.1  93.5  80.0  75.0  70.2  77.6  89.5  91.1  78.6 
5.3 Further Analysis
Correlation between metrics
In this experiment, we complete our global analysis by comparing the scores obtained by the different metrics with each other. We want to understand how different our metric is from the other metrics and how the choice of information geometric measure affects the predictions. Figure 10 gathers the results of the experiment. We observe a high correlation between the InfoLM variants that consider the product of the two aggregated distributions. Interestingly, we observe a lower correlation between these variants and BERTSCORE or n-gram matching metrics (e.g., ROUGE), whereas BERTSCORE achieves a stronger correlation with ROUGE.
Takeaways. Through the correlation analysis in Figure 10, we observe the impact of the different geometries on InfoLM predictions. The analysis shows that the predictions of InfoLM under these measures are highly correlated with each other and, as illustrated by the previous experiments, achieve high correlation with human judgement, which we believe validates our approach. It is worth noting that the Fisher-Rao distance requires no tuning, as it is parameter-free.
Score Distributions
In Figure 11, we study the text-level score distributions of the different metrics on abstractive summaries. The ideal metric would mimic the human score distribution and be able to distinguish between good- and bad-quality summaries. The results show that ROUGE and BLEU struggle to distinguish between good-quality and low-quality summaries, which had already been reported in peyrard2019studying. We observe that several InfoLM measures are able to make the distinction. Interestingly, as $p$ increases and the $L_p$ distances become more selective (i.e., they focus on one word only), the distances struggle to distinguish low- from high-scoring summaries.
Takeaways. InfoLM, when combined with suitable information measures, is able to distinguish high-scoring from low-scoring summaries.
Temperature Calibration
To study the impact of calibration, we choose to work at the system level and report in Figure 12 the correlations achieved, as measured by the different coefficients. We limit our study to the Fisher-Rao distance as it is a parameter-free metric and is among the best-performing measures of InfoLM. Due to space constraints, we report the results on extractive systems only (complete results for both types of systems can be found in Figure 15).
Takeaways. Fisher-Rao only considers the product of the two distributions; thus, as the temperature $T$ increases and the predicted probabilities of the PMLM become more uniform, more words are considered and the aggregated distributions become richer in terms of covered words. It is worth noting that when changing the temperature we observe a smooth change in correlation, with an optimum reached at an intermediate temperature. This suggests that InfoLM benefits from a PMLM that is not too selective (the low-temperature case). For a specific application, the temperature of InfoLM can be tuned to improve correlation, and InfoLM will likely benefit from a well-calibrated PMLM.
6 Summary and Concluding Remarks
In this work, we presented InfoLM, which does not require training and is among the first metrics to compute the similarity between two discrete probability distributions over the vocabulary (in the spirit of string-based metrics) while also leveraging recent advances in language modelling through a PMLM. Our experiments on both summarization and data2text generation demonstrate the validity of our approach. Among the available contrast measures, the Fisher-Rao distance is parameter-free and thus easy to use in practice, while the parametric divergences achieve better results but require selecting their parameters. Future work includes extending our metrics to new tasks such as translation, image captioning and paraphrase detection, as well as investigating other information geometric measures (e.g., the Wasserstein distance staerman2021ot, other types of divergences, and mutual information colombo2021novel) to obtain better correlation with different attributes.
7 Acknowledgments
This work was also granted access to the HPC resources of IDRIS under the allocation 2021AP010611665 as well as under the project 2021101838 made by GENCI.
References
Appendix A Additional Experimental Results
In this section, we report the additional experiments we conducted. Because of space constraints, we did not report the sensitivity analysis (see Section A.2) in the main paper.
A.1 Role of calibration
The complete results on the role of calibration can be found in Figure 15. We notice a similar behaviour for InfoLM on both extractive and abstractive systems: there is a smooth variation in performance, with a single maximum reached at an intermediate temperature.
A.2 Choice of the divergence parameters
In this experiment, we aim at quantifying the sensitivity of InfoLM to the choice of the two divergence parameters. Figure 16 gathers the results of the analysis. We observe that a change in one of the two parameters induces a stronger change in the metric than a change in the other, and that lower values of that parameter give better results. We also note that the variation with respect to both parameters is smooth. Interestingly, for abstractive systems a low value of the other parameter should be chosen, whereas for extractive systems a higher value is better. This suggests that for evaluating abstractive systems the metric should focus on words that are probable for both the candidate and the reference text, whereas for extractive systems the attention should be focused on words that are likely in only one of the two texts.
Takeaways. Low values of one parameter lead to better results, and the optimal value of the other differs between abstractive and extractive systems. Good parameter combinations achieve consistently high performance across the different correlation coefficients.
A.3 Statistical analysis
Automatic metrics are used in the WebNLG challenge to compare the different systems. To evaluate whether the observed improvement in correlation is significant, we report the results of William’s Significance test in Figure 22.
Takeaways: (i) regarding correctness and relevance, InfoLM is a suitable choice that is significantly better than the other metrics; (ii) regarding text structure, InfoLM is significantly better and compares favourably against all metrics except MOVERSCORE for automatic fluency evaluation; (iii) regarding data coverage, METEOR achieves good results; however, a significant difference is only observed with respect to BERTSCORE.
A.4 Complete results on summarization
We gather in Figure 35 the complete results on summarization. Due to space constraints, we did not report all the Spearman correlation coefficients in the main paper. It is worth noting that our observations hold: the best-performing metric is obtained with one of the parametric divergence families, and the Fisher-Rao distance achieves good performance in many scenarios while having the advantage of being parameter-free.
A.5 Complete results on data2text generation
We gather in Table 3 the complete results on data2text generation. Due to space constraints, we did not report all the baselines in the main paper.
Correctness  Data Coverage  Fluency  Relevance  Text Structure  
Metric  
Correct  100.0  100.0  100.0  97.6  85.2  73.3  80.0  81.1  61.6  99.1  89.7  75.0  80.1  80.8  60.0 
DataC  85.2  97.6  73.3  100.0  100.0  100.0  71.8  51.7  38.3  96.0  93.8  81.6  71.6  51.4  36.6 
Fluency  81.1  80.0  61.6  71.8  51.7  38.3  100.0  100.0  100.0  77.0  61.4  46.6  99.5  99.7  98.3 
Relev  89.7  99.1  75.0  96.0  93.8  81.6  77.0  61.4  46.6  100.0  100.0  100.0  77.2  61.1  45.0 
TextS  80.8  80.1  60.0  71.6  51.4  36.6  99.5  99.7  98.3  77.2  61.1  45.0  100.0  100.0  100.0 
88.8  89.3  76.6  81.8  82.6  70.0  86.6  92.0  76.6  89.8  87.9  73.3  86.6  91.4  75.0  
88.8  89.3  76.6  81.8  82.6  70.0  86.6  92.0  76.6  89.8  87.9  73.3  86.6  91.4  75.0  
81.4  50.0  71.6  48.4  79.7  65.0  44.8  84.7  76.6  49.3  72.3  60.0  48.0  83.8  75.0  
75.2  33.8  61.6  32.4  53.8  40.0  22.7  83.5  73.3  32.2  57.9  45.0  25.6  83.2  71.6  
67.0  21.9  56.6  21.6  37.9  33.3  11.9  75.2  58.3  20.1  43.8  38.3  14.8  75.5  60.0  
63.2  33.0  46.6  30.4  36.4  26.6  67.6  65.0  46.6  29.1  49.1  35.0  67.2  65.2  46.6  
89.7  86.0  75.0  78.7  70.5  51.6  93.3  95.7  85.3  87.6  84.4  70.0  92.4  93.8  81.6  
JS  79.4  81.1  70.0  69.3  75.5  60.0  89.4  91.4  75.0  81.7  70.5  60.0  91.9  91.1  73.3 
BertS  85.5  83.4  73.3  74.7  68.2  53.3  92.3  95.5  85.0  83.3  79.4  65.0  91.9  95.0  83.3 
MoverS  84.1  84.1  73.3  78.7  66.2  53.3  91.2  92.1  78.3  82.1  77.4  65.0  90.1  91.4  76.3 
BLEU  77.6  66.3  60.0  55.7  50.2  36.6  89.4  90.5  78.3  63.0  65.2  51.6  88.5  89.1  76.6 
R1  80.6  65.0  65.0  61.1  59.6  48.3  76.5  76.3  60.3  64.3  69.2  56.7  75.9  77.5  58.3 
R2  73.6  63.3  58.3  54.7  43.1  35.0  86.4  81.9  63.4  62.0  60.8  46.7  86.5  80.5  61.7 
RWE  60.9  73.4  60.0  40.2  58.2  40.1  61.4  84.7  61.3  49.9  64.1  48.3  60.2  85.9  60.0 
METEOR  86.5  66.3  70.0  77.3  50.2  46.6  86.7  90.5  78.3  82.1  65.2  58.6  86.2  89.1  76.6 
TER  79.6  78.3  58.0  69.7  58.2  38.0  89.1  93.5  80.0  75.0  70.2  77.6  89.5  91.1  78.6 
A.6 Complete results on score distributions
We gather in Figure 38 the score distributions on the summarization dataset. It is worth noting that for these experiments we have scaled the divergences between 0 and 1 and considered 1 - InfoLM.
Appendix B Additional details on the datasets
B.1 Summarization
The CNN/DailyMail summarization dataset can be found at https://github.com/neulab/REALSumm and no preprocessing has been applied. Specifically, the summaries have been generated using 14 abstractive systems zhong2020extractive; wang2020heterogeneous; zhong2019searching; liu2019text; zhou2018neural; narayan2018ranking; dong2019unified; kedzie2018content; zhou2018neural and 11 extractive systems see2017get; chen2018fast; raffel2019exploring; gehrmannetal2018bottom; dong2019unified; liu2019text; lewis2019bart; yoon2020learning.
Pyramid Score. The pyramid score, which was inspired by work in reading comprehension (see beck1991revising), is a scoring procedure to assess semantic content in the scope of summarization. The manual method is based on the concept of Summary Content Units (SCUs), which group sentences from different summaries if they carry the same content.
B.2 Data2text
The goal of the WebNLG challenge is to develop efficient knowledge base verbalizers gardent2017creating; perez2016building (i.e., generation algorithms that can verbalise knowledge base fragments) and thus handle the complex interactions that can occur during the micro-planning phase when generating sentences ferreira2018enriching. Details on the WebNLG 2020 task are given in ferreira20202020. The dataset is composed of generated sentences coming from 15 different systems using various approaches, such as symbolic approaches or neural-based systems. All data are freely available at https://gitlab.com/shimorina/webnlg-dataset/-/tree/master/release_v3.0 and no preprocessing has been applied. The dataset uses the RDF format, which is widely used for many datasets such as LinkedGeoData http://linkedgeodata.org/About, FOAF http://www.foaf-project.org/ or MusicBrainz https://musicbrainz.org/, and the triples are extracted from DBpedia auer2007dbpedia.
B.3 Limitations
We have evaluated our metrics on text-only datasets. Future work will also investigate the robustness of the metric to the type of text (e.g., spoken text dinkar2020importance; chapuis2021code; chapuis2020hierarchical), the extension to multimodal settings colombo2021improving; garcia2019token, and other types of tasks (e.g., dialogue colombo2021beam, affect-driven text generation colombo2019affect; witon2018disney, and intent generation mehri2020example with dialogue acts colombo2020guiding).
B.4 Metric Choices
Several revised versions of BLEU doddington2002automatic; galley2015deltableu and METEOR denkowski2014meteor; guohu2019meteor have been proposed in recent years. For our implementation we choose to use SACREBLEU sacrebleu. Similarly, a plethora of ROUGE extensions have been proposed (ganesan2018rouge; shafieibavani2018graph), but in our work we choose to work with ROUGE-1, ROUGE-2 and ROUGE-WE. For the BERT-based metrics, we choose to rely on the most popular ones, although several alternatives exist (e.g., SENTENCEMOVER clarketal2019sentence).
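For reference, a minimal SacreBLEU call is sketched below; the candidate and reference strings simply reuse the WebNLG example from Section 4.2 for illustration.

# Minimal SacreBLEU usage (illustrative candidate and reference).
import sacrebleu

candidates = ["John Blaha, born in San Antonio, worked as a pilot."]
references = [["John Blaha, born in San Antonio on 1942-08-26, worked as a pilot."]]
print(sacrebleu.corpus_bleu(candidates, references).score)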
B.5 Parameter Choices
For summarization (see Figure 35) we choose to work with the following parameters:
- the temperature of the PMLM;
- the parameter of each single-parameter divergence, with separate values for extractive and abstractive systems where beneficial;
- the two parameters of the AB-divergence, with one pair of values for extractive systems and another for abstractive systems.
For data2text generation (see Section A.5) we choose to work with the following parameters:
- the temperature of the PMLM;
- the parameter of each single-parameter divergence;
- the two parameters of the AB-divergence.
Takeaways. Although the parameters can be tuned to achieve better results, it is worth noting that the parameter-free Fisher-Rao distance achieves good results when used within InfoLM.
Appendix C Implementation details
C.1 Algorithm details
A complete algorithm for InfoLM is given in Algorithm 1.
C.2 Computational resources
For all the experiments we use an NVIDIA Tesla P100 GPU to compute the BERT-based metrics; the running time is less than an hour. For metrics based on string matching, we use a single CPU and the running time is also less than an hour.
C.3 Libraries
Among the libraries used for this project, we can cite:

Transformers from hugging_face.

SummEval, which can be found at https://github.com/Yale-LILY/SummEval and was proposed in fabbri2020summeval, for the CNN/DailyMail dataset.

RealSumm which can be found here https://github.com/neulab/REALSumm and has been proposed in bhandari2020re for some metrics.

Pytorch pytorch for the GPU support.

The implementation of MOVERSCORE can be found at https://github.com/AIPHES/emnlp19-moverscore.

The implementation of BERTSCORE can be found at https://github.com/Tiiiger/bert_score.

SacreBLEU for BLEU implementation sacrebleu.

The Williams test implementation is taken from the author's code and is available at https://github.com/ygraham/nlp-williams.
We thank the contributors for open-sourcing their libraries.
C.4 Negative Results
We tried removing stop words and applying other preprocessing techniques. The little improvement observed when cleaning the candidate and gold reference texts might be attributed to BERT.