GRUEN for Evaluating Linguistic Quality of Generated Text

10/06/2020
by Wanzheng Zhu, et al.

Automatic evaluation metrics are indispensable for evaluating generated text. To date, these metrics have focused almost exclusively on the content-selection aspect of the system output, ignoring linguistic quality altogether. We bridge this gap by proposing GRUEN for evaluating Grammaticality, non-Redundancy, focUs, structure and coherENce of generated text. GRUEN utilizes a BERT-based model and a class of syntactic, semantic, and contextual features to examine the system output. Unlike most existing evaluation metrics, which require human references as input, GRUEN is reference-less and requires only the system output. In addition, it has the advantage of being unsupervised, deterministic, and adaptable to various tasks. Experiments on seven datasets over four language generation tasks show that the proposed metric correlates highly with human judgments.
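
To make the reference-less idea concrete, below is a minimal sketch of how one sub-score in the spirit of GRUEN's grammaticality check could be computed: score each sentence of the system output with a BERT classifier fine-tuned for acceptability and average the results, with no human reference involved. The checkpoint name, the period-based sentence splitting, and the label handling are illustrative assumptions for this sketch, not GRUEN's released implementation.

from transformers import pipeline

# Assumption: a publicly available BERT checkpoint fine-tuned on CoLA
# acceptability judgments; GRUEN's actual model and features differ.
classifier = pipeline(
    "text-classification",
    model="textattack/bert-base-uncased-CoLA",
)

def grammaticality_score(system_output: str) -> float:
    """Average per-sentence acceptability probability; needs only the system output."""
    # Naive period-based splitting for illustration; a real metric would use
    # proper sentence segmentation.
    sentences = [s.strip() for s in system_output.split(".") if s.strip()]
    if not sentences:
        return 0.0
    results = classifier(sentences)
    # Assumption: LABEL_1 corresponds to "acceptable" for this checkpoint.
    scores = [
        r["score"] if r["label"] == "LABEL_1" else 1.0 - r["score"]
        for r in results
    ]
    return sum(scores) / len(scores)

print(grammaticality_score("The cat sat on the mat. Dog the runned fastly."))

A full GRUEN score would combine such a grammaticality component with the non-redundancy, focus, and structure/coherence features described in the paper.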

Related research

08/24/2020 | How To Evaluate Your Dialogue System: Probe Tasks as an Alternative for Token-level Evaluation Metrics
Though generative dialogue modeling is widely seen as a language modelin...

10/09/2020 | Evaluating and Characterizing Human Rationales
Two main approaches for evaluating the quality of machine-generated rati...

03/12/2018 | Concept2vec: Metrics for Evaluating Quality of Embeddings for Ontological Concepts
Although there is an emerging trend towards generating embeddings for pr...

06/18/2023 | MISMATCH: Fine-grained Evaluation of Machine-generated Text with Mismatch Error Types
With the growing interest in large language models, the need for evaluat...

05/24/2022 | A Dynamic, Interpreted CheckList for Meaning-oriented NLG Metric Evaluation – through the Lens of Semantic Similarity Rating
Evaluating the quality of generated text is difficult, since traditional...

05/24/2023 | Don't Take This Out of Context! On the Need for Contextual Models and Evaluations for Stylistic Rewriting
Most existing stylistic text rewriting methods operate on a sentence lev...

10/11/2018 | Semantic Structural Evaluation for Text Simplification
Current measures for evaluating text simplification systems focus on eva...
