BiSECT: Learning to Split and Rephrase Sentences with Bitexts

09/10/2021
by Joongwon Kim, et al.

An important task in NLP applications such as sentence simplification is to take a long, complex sentence and split it into shorter sentences, rephrasing as necessary. We introduce a novel dataset and a new model for this "split and rephrase" task. Our BiSECT training data consists of 1 million long English sentences paired with shorter, meaning-equivalent English sentences. We obtain these by extracting 1-2 sentence alignments (one sentence in one language aligned to two sentences in the other) from bilingual parallel corpora and then using machine translation to convert both sides of the corpus into the same language. BiSECT contains higher-quality training examples than previous Split and Rephrase corpora, with sentence splits that require more significant modifications. We categorize examples in our corpus and use these categories in a novel model that allows us to target specific regions of the input sentence to be split and edited. Moreover, we show that models trained on BiSECT can perform a wider variety of split operations and improve upon previous state-of-the-art approaches in both automatic and human evaluations.
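The data construction step described above can be illustrated with a minimal Python sketch. It assumes a hypothetical alignment format (each record holds the English and foreign-language sentences that a sentence aligner matched together) and a caller-supplied `translate` function standing in for the machine translation system. The names `Alignment`, `SplitExample`, and `build_split_examples` are illustrative only and do not come from the paper, and the sketch omits any quality filtering the authors may apply. It keeps only 1-2 alignments and translates the foreign side so that both sides end up in English.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Alignment:
    """One sentence alignment from a bilingual parallel corpus (hypothetical format)."""
    english: List[str]  # English sentence(s) on one side of the alignment
    foreign: List[str]  # foreign-language sentence(s) on the other side


@dataclass
class SplitExample:
    long_sentence: str      # the single, longer sentence
    split: Tuple[str, str]  # two shorter sentences with approximately the same meaning


def build_split_examples(
    alignments: List[Alignment], translate: Callable[[str], str]
) -> List[SplitExample]:
    """Keep 1-2 alignments and translate the foreign side into English (sketch)."""
    examples: List[SplitExample] = []
    for a in alignments:
        shape = (len(a.english), len(a.foreign))
        if shape not in {(1, 2), (2, 1)}:
            continue  # not a 1-2 alignment; skip it before paying for translation
        # Hypothetical MT call: convert the foreign side into English.
        foreign_en = [translate(s) for s in a.foreign]
        if shape == (1, 2):
            long_side, short_side = a.english, foreign_en
        else:
            long_side, short_side = foreign_en, a.english
        examples.append(SplitExample(long_side[0], (short_side[0], short_side[1])))
    return examples


# Toy usage: a dictionary lookup stands in for a real MT system.
toy_mt = {"Il pleut.": "It is raining.", "Le match est annulé.": "The match is cancelled."}
corpus = [
    Alignment(["It is raining, so the match is cancelled."], ["Il pleut.", "Le match est annulé."]),
    Alignment(["Hello."], ["Bonjour."]),  # 1-1 alignment, discarded
]
for ex in build_split_examples(corpus, toy_mt.__getitem__):
    print(ex.long_sentence, "->", ex.split)
```

Checking the alignment shape before translating means the machine translation step only runs on 1-2 alignments, not on the 1-1 alignments that usually make up most of a parallel corpus and would be discarded anyway.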
