Document-Level Text Simplification: Dataset, Criteria and Baseline

10/11/2021
by Renliang Sun, et al.

Text simplification is a valuable technique. However, current research is limited to sentence simplification. In this paper, we define and investigate a new task of document-level text simplification, which aims to simplify a document consisting of multiple sentences. Based on Wikipedia dumps, we first construct a large-scale dataset named D-Wikipedia and perform analysis and human evaluation on it to show that the dataset is reliable. Then, we propose a new automatic evaluation metric called D-SARI that is more suitable for the document-level simplification task. Finally, we select several representative models as baseline models for this task and perform automatic evaluation and human evaluation. We analyze the results and point out the shortcomings of the baseline models.
