Sentence-Level Fluency Evaluation: References Help, But Can Be Spared!

09/24/2018
by Katharina Kann, et al.

Motivated by recent findings on the probabilistic modeling of acceptability judgments, we propose syntactic log-odds ratio (SLOR), a normalized language model score, as a metric for referenceless fluency evaluation of natural language generation output at the sentence level. We further introduce WPSLOR, a novel WordPiece-based version, which harnesses a more compact language model. Even though word-overlap metrics like ROUGE are computed with the help of hand-written references, our referenceless methods obtain a significantly higher correlation with human fluency scores on a benchmark dataset of compressed sentences. Finally, we present ROUGE-LM, a reference-based metric which is a natural extension of WPSLOR to the case of available references. We show that ROUGE-LM yields a significantly higher correlation with human judgments than all baseline metrics, including WPSLOR on its own.
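For context, SLOR normalizes a sentence's language model log-probability by subtracting its unigram log-probability (so rare but well-formed words are not penalized) and dividing by sentence length. The following is a minimal sketch of that computation, assuming hypothetical lm_logprob and unigram_logprob scoring functions; the names are illustrative, not from the paper:

```python
def slor(tokens, lm_logprob, unigram_logprob):
    """Syntactic log-odds ratio (SLOR) for a tokenized sentence S:

        SLOR(S) = (log p_LM(S) - log p_unigram(S)) / |S|

    tokens:          list of tokens making up the sentence
    lm_logprob:      callable, token list -> total log-probability
                     under a trained language model (hypothetical)
    unigram_logprob: callable, single token -> log-probability under
                     a unigram model of the same training corpus
    """
    lp_lm = lm_logprob(tokens)                        # log p_LM(S)
    lp_uni = sum(unigram_logprob(t) for t in tokens)  # log p_unigram(S)
    return (lp_lm - lp_uni) / len(tokens)


# Toy usage with stand-in scoring functions (purely illustrative):
if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    fake_lm = lambda toks: -1.5 * len(toks)       # pretend LM score
    fake_unigram = lambda tok: -3.0               # pretend unigram score
    print(slor(sentence, fake_lm, fake_unigram))  # higher = more fluent
```

WPSLOR is the same score computed over WordPiece tokens rather than words, which allows a smaller vocabulary and hence the more compact language model mentioned in the abstract.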


