Small but Mighty: New Benchmarks for Split and Rephrase

09/17/2020
by Li Zhang, et al.

Split and Rephrase is a text simplification task that rewrites a complex sentence into simpler ones. Because the task is relatively new, it is paramount to ensure the soundness of its evaluation benchmark and metric. We find that the widely used benchmark dataset universally contains easily exploitable syntactic cues caused by its automatic generation process. Taking advantage of such cues, we show that even a simple rule-based model can perform on par with the state-of-the-art model. To remedy these limitations, we collect and release two crowdsourced benchmark datasets. We not only make sure that they contain significantly more diverse syntax, but also carefully control their quality according to a well-defined set of criteria. Since no satisfactory automatic metric exists, we apply fine-grained manual evaluation based on these criteria using crowdsourcing, showing that our datasets better represent the task and are significantly more challenging for the models.

