A Dynamic, Interpreted CheckList for Meaning-oriented NLG Metric Evaluation – through the Lens of Semantic Similarity Rating

05/24/2022
by Laura Zeidler, et al.

Evaluating the quality of generated text is difficult because traditional NLG evaluation metrics focus more on surface form than on meaning and therefore often fail to assign appropriate scores. This is especially problematic for AMR-to-text evaluation, given the abstract nature of AMR. Our work aims to support the development and improvement of meaning-focused NLG evaluation metrics by developing a dynamic CheckList for NLG metrics that is interpreted, in that it is organized around meaning-relevant linguistic phenomena. Each test instance consists of a pair of sentences with their AMR graphs and a human-produced textual semantic similarity or relatedness score. Our CheckList facilitates comparative evaluation of metrics and reveals strengths and weaknesses of both novel and traditional metrics. We demonstrate the usefulness of the CheckList by designing a new metric, GraCo, that computes lexical cohesion graphs over AMR concepts. Our analysis suggests that GraCo is an interesting NLG metric worth further investigation, and that meaning-oriented NLG metrics can profit from graph-based metric components using AMR.
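To make the structure of a test instance concrete, the following is a minimal sketch of how such a CheckList could be represented and how a metric might be compared against the human scores per phenomenon. All names here (TestInstance, token_jaccard, evaluate_by_phenomenon) are hypothetical, the surface-overlap metric is only a stand-in baseline, and the use of Pearson correlation is an assumption rather than the evaluation protocol of the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import sqrt
from typing import Callable, Dict, List


@dataclass
class TestInstance:
    """One CheckList item (hypothetical layout, not the authors' format)."""
    phenomenon: str      # meaning-relevant linguistic phenomenon of this test
    sentence_a: str
    sentence_b: str
    amr_a: str           # AMR graph of sentence_a, e.g. in PENMAN notation
    amr_b: str           # AMR graph of sentence_b
    human_score: float   # human similarity or relatedness rating


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; returns NaN if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def token_jaccard(inst: TestInstance) -> float:
    """Stand-in surface-form metric: Jaccard overlap of lowercased tokens."""
    a = set(inst.sentence_a.lower().split())
    b = set(inst.sentence_b.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def evaluate_by_phenomenon(
    instances: List[TestInstance],
    metric: Callable[[TestInstance], float],
) -> Dict[str, float]:
    """Correlate metric scores with human scores, separately per phenomenon."""
    groups: Dict[str, List[TestInstance]] = defaultdict(list)
    for inst in instances:
        groups[inst.phenomenon].append(inst)
    return {
        phenomenon: pearson(
            [metric(i) for i in group],
            [i.human_score for i in group],
        )
        for phenomenon, group in groups.items()
    }
```

Because every metric is passed in as a plain function over a test instance, a graph-based metric that reads amr_a and amr_b could be swapped in for token_jaccard and compared on the same per-phenomenon breakdown.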

