Sentence Simplification Using Paraphrase Corpus for Initialization

05/31/2023
by Kang Liu, et al.

Neural sentence simplification methods based on the sequence-to-sequence framework have become the mainstream approach to the sentence simplification (SS) task. Unfortunately, these methods are currently limited by the scarcity of parallel SS corpora. In this paper, we focus on reducing the dependence on parallel corpora by carefully initializing neural SS methods with a paraphrase corpus. Our work is motivated by two findings: (1) a paraphrase corpus includes a large proportion of sentence pairs that belong in an SS corpus, and (2) we can construct large-scale pseudo-parallel SS data by keeping the sentence pairs with a higher complexity difference. We therefore propose two strategies for initializing neural SS methods using a paraphrase corpus. We train three different neural SS methods with our initialization, and each obtains substantial improvements on the available WikiLarge data compared with the same method without initialization.
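
The second finding, keeping paraphrase pairs with a higher complexity difference, can be illustrated with a minimal sketch. The abstract does not say how complexity is measured or what threshold is used, so the `complexity` proxy (mean word length plus a small sentence-length term), the `min_diff` threshold, and the function names below are all hypothetical choices made for illustration, not the authors' method.

```python
from typing import List, Tuple


def complexity(sentence: str) -> float:
    """Hypothetical complexity proxy (an assumption, not from the paper):
    mean word length plus a small term for sentence length."""
    words = sentence.split()
    if not words:
        return 0.0
    mean_word_len = sum(len(w) for w in words) / len(words)
    return mean_word_len + 0.1 * len(words)


def build_pseudo_ss_pairs(
    paraphrase_pairs: List[Tuple[str, str]],
    min_diff: float = 1.0,  # illustrative threshold, chosen arbitrarily
) -> List[Tuple[str, str]]:
    """Keep paraphrase pairs whose complexity scores differ by at least
    min_diff, ordered as (complex sentence, simple sentence)."""
    selected = []
    for a, b in paraphrase_pairs:
        score_a, score_b = complexity(a), complexity(b)
        if abs(score_a - score_b) < min_diff:
            continue  # the two sides are too similar in complexity
        # Put the more complex sentence on the source side.
        selected.append((a, b) if score_a > score_b else (b, a))
    return selected


paraphrases = [
    ("The doctor gave the drug", "The physician administered the medication"),
    ("He is happy", "He is glad"),
]
pseudo_ss = build_pseudo_ss_pairs(paraphrases)
```

In this toy input only the first pair survives the filter, and it is reordered so that the more complex sentence is the source. A real pipeline would likely replace the proxy with a stronger complexity estimate before using the filtered pairs to initialize a neural SS model.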


