Learning To Split and Rephrase From Wikipedia Edit History

08/28/2018
by Jan A. Botha, et al.

Split and rephrase is the task of breaking down a sentence into shorter ones that together convey the same meaning. We extract a rich new dataset for this task by mining Wikipedia's edit history: WikiSplit contains one million naturally occurring sentence rewrites, providing sixty times more distinct split examples and a ninety times larger vocabulary than the WebSplit corpus introduced by Narayan et al. (2017) as a benchmark for this task. Incorporating WikiSplit as training data produces a model with qualitatively better predictions that score 32 BLEU points above the prior best result on the WebSplit benchmark.

