Towards a Decomposable Metric for Explainable Evaluation of Text Generation from AMR

08/20/2020
by Juri Opitz, et al.

Systems that generate sentences from (abstract) meaning representations (AMRs) are typically evaluated with automatic surface-matching metrics that compare the generated texts to the texts originally given to human annotators for constructing the AMRs. However, besides the well-known issues such metrics suffer from (Callison-Burch et al., 2006; Novikova et al., 2017), we show that an additional problem arises when they are applied to AMR-to-text evaluation: because the mapping from the more abstract domain of AMR to the more concrete domain of sentences is one-to-many, a single AMR admits manifold valid sentence realizations. In this work we aim to alleviate these issues and propose ℳℱ_β, an automatic metric that builds on two pillars. The first pillar is the principle of meaning preservation ℳ: it measures to what extent the original AMR graph can be reconstructed from the generated sentence. We implement this principle by i) automatically constructing an AMR from the generated sentence using state-of-the-art AMR parsers and ii) applying fine-grained, principled AMR metrics to measure the distance between the original and the reconstructed AMR. The second pillar is the principle of (grammatical) form ℱ: it measures the linguistic quality of the generated sentence, which we implement using state-of-the-art language models. We show, theoretically and experimentally, that fulfilling both principles offers several benefits for the evaluation of AMR-to-text systems, including the explainability of scores.
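As a rough illustration of how the two pillars could be combined into a single score, consider the sketch below. It is not the authors' implementation: parse_to_amr, amr_similarity, and lm_fluency are hypothetical placeholder callables standing in for an off-the-shelf AMR parser, a fine-grained AMR graph metric, and a language-model fluency scorer, and the F_β-style combination is only our reading of the β subscript.

```python
from typing import Callable

def mf_beta(
    original_amr: str,
    generated_sentence: str,
    parse_to_amr: Callable[[str], str],
    amr_similarity: Callable[[str, str], float],
    lm_fluency: Callable[[str], float],
    beta: float = 1.0,
) -> float:
    """Score a generated sentence against the AMR it was generated from.

    parse_to_amr, amr_similarity, and lm_fluency are caller-supplied
    stand-ins for an AMR parser, a principled AMR graph metric, and a
    language-model fluency scorer; all scores are assumed to lie in [0, 1].
    """
    # Pillar 1, meaning preservation (M): re-parse the generated sentence
    # and measure how well the original AMR can be reconstructed from it.
    reconstructed_amr = parse_to_amr(generated_sentence)
    m = amr_similarity(original_amr, reconstructed_amr)

    # Pillar 2, form (F): judge the linguistic quality of the sentence itself.
    f = lm_fluency(generated_sentence)

    # F_beta-style weighted harmonic mean of the two pillars. This reading
    # of the beta subscript is an assumption; the paper's exact combination
    # may differ. With this form, beta > 1 weighs meaning preservation higher.
    if m == 0.0 and f == 0.0:
        return 0.0
    return (1.0 + beta**2) * (m * f) / (beta**2 * f + m)
```

Because the two pillars are computed separately, a low overall score can be traced back to either a meaning deficit or a form deficit, which is what makes such a decomposable metric explainable.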
