Language Model Augmented Relevance Score

08/19/2021
by Ruibo Liu, et al.

Although automated metrics are commonly used to evaluate NLG systems, they often correlate poorly with human judgements. Newer metrics such as BERTScore have addressed many weaknesses in prior metrics such as BLEU and ROUGE, which rely on n-gram matching. These newer methods, however, are still limited in that they do not consider the generation context, so they cannot properly reward generated text that is correct but deviates from the given reference. In this paper, we propose Language Model Augmented Relevance Score (MARS), a new context-aware metric for NLG evaluation. MARS leverages off-the-shelf language models, guided by reinforcement learning, to create augmented references that consider both the generation context and available human references, which are then used as additional references to score generated text. Compared with seven existing metrics in three common NLG tasks, MARS not only achieves higher correlation with human reference judgements, but also differentiates well-formed candidates from adversarial samples to a larger degree.
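The key idea above is that augmented references act as additional references when a candidate is scored. Below is a minimal sketch of that multi-reference step, assuming the augmented references have already been produced (the language-model and reinforcement-learning augmentation stage is not shown). The names unigram_f1 and multi_reference_score are hypothetical. The token-overlap F1 similarity is a crude stand-in for a stronger similarity function. Taking the best match over the union of human and augmented references is also an assumed aggregation rule, not the paper's scoring formula.

```python
from collections import Counter
from typing import Callable, Sequence


def unigram_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between two whitespace-tokenized strings.

    Assumption: a deliberately simple stand-in for the similarity function;
    it is NOT the similarity used by MARS.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped matches: min count per token
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def multi_reference_score(
    candidate: str,
    human_refs: Sequence[str],
    augmented_refs: Sequence[str],
    sim: Callable[[str, str], float] = unigram_f1,
) -> float:
    """Score a candidate against human references plus augmented references.

    Hypothetical aggregation: the best similarity over the union of both
    pools, so a correct candidate worded differently from the human
    reference can still earn credit from an augmented reference.
    """
    references = list(human_refs) + list(augmented_refs)
    if not references:
        raise ValueError("at least one reference is required")
    return max(sim(candidate, ref) for ref in references)


if __name__ == "__main__":
    human = ["the cat sat on the mat"]
    augmented = ["a cat was sitting on the rug"]  # hypothetical LM-generated reference
    candidate = "a cat was sitting on the rug"
    print(round(multi_reference_score(candidate, human, []), 3))         # 0.462
    print(round(multi_reference_score(candidate, human, augmented), 3))  # 1.0
```

In this toy case the candidate is a valid paraphrase, but it shares only three tokens with the human reference and scores about 0.46. Once the augmented reference is added to the pool, it scores 1.0. This illustrates, under the stated assumptions, how extra references can reward text that is correct yet deviates from the given reference.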
