ParaBank: Monolingual Bitext Generation and Sentential Paraphrasing via Lexically-constrained Neural Machine Translation

01/11/2019
by J. Edward Hu, et al.

We present ParaBank, a large-scale English paraphrase dataset that surpasses prior work in both quantity and quality. Following the approach of ParaNMT, we train a Czech-English neural machine translation (NMT) system to generate novel paraphrases of English reference sentences. By adding lexical constraints to the NMT decoding procedure, however, we are able to produce multiple high-quality sentential paraphrases per source sentence, yielding an English paraphrase resource with more than 4 billion generated tokens and exhibiting greater lexical diversity. Using human judgments, we also demonstrate that ParaBank's paraphrases improve over ParaNMT on both semantic similarity and fluency. Finally, we use ParaBank to train a monolingual NMT model with the same support for lexically-constrained decoding for sentence rewriting tasks.
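To make the idea of lexical constraints during decoding concrete, here is a minimal sketch, not the authors' system. It assumes the simplest kind of constraint, a set of banned output tokens, and swaps the NMT model for a hypothetical hand-written table of next-token probabilities (`TOY_MODEL`). The function name `constrained_greedy_decode` and its parameters are illustrative only, and greedy search stands in for whatever search procedure the real system uses.

```python
# Hypothetical toy "model": for each previous token, a probability
# distribution over next tokens. A real NMT decoder would also condition
# on the source sentence and the full output prefix.
TOY_MODEL = {
    "<s>": {"the": 0.9, "a": 0.1},
    "the": {"film": 0.6, "movie": 0.4},
    "a": {"film": 0.5, "movie": 0.5},
    "film": {"was": 1.0},
    "movie": {"was": 1.0},
    "was": {"great": 0.7, "excellent": 0.3},
    "great": {"</s>": 1.0},
    "excellent": {"</s>": 1.0},
}


def constrained_greedy_decode(model, banned=frozenset(), max_len=10):
    """Greedy decoding that never emits a token in `banned`."""
    tokens = []
    prev = "<s>"
    for _ in range(max_len):
        # Apply the lexical constraint: drop banned candidates before choosing.
        candidates = {t: p for t, p in model[prev].items() if t not in banned}
        if not candidates:
            break  # every continuation is banned; stop early
        prev = max(candidates, key=candidates.get)
        if prev == "</s>":
            break
        tokens.append(prev)
    return " ".join(tokens)


unconstrained = constrained_greedy_decode(TOY_MODEL)
paraphrase = constrained_greedy_decode(TOY_MODEL, banned={"film", "great"})
print(unconstrained)  # the film was great
print(paraphrase)     # the movie was excellent
```

Decoding the same input under different constraint sets gives different outputs, which is how one source sentence can yield several paraphrases in this toy setting.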


research
08/23/2019

Reference Network for Neural Machine Translation

Neural Machine Translation (NMT) has achieved notable success in recent ...
research
07/21/2021

Guided Generation of Cause and Effect

We present a conditional text generation framework that posits sententia...
research
03/20/2023

Towards Reliable Neural Machine Translation with Consistency-Aware Meta-Learning

Neural machine translation (NMT) has achieved remarkable success in prod...
research
08/11/2020

Paraphrase Generation as Zero-Shot Multilingual Translation: Disentangling Semantic Similarity from Lexical and Syntactic Diversity

Recent work has shown that a multilingual neural machine translation (NM...
research
09/01/2018

Simple Fusion: Return of the Language Model

Neural Machine Translation (NMT) typically leverages monolingual data in...
research
04/11/2017

Unfolding and Shrinking Neural Machine Translation Ensembles

Ensembling is a well-known technique in neural machine translation (NMT)...
research
09/30/2022

QUAK: A Synthetic Quality Estimation Dataset for Korean-English Neural Machine Translation

With the recent advance in neural machine translation demonstrating its ...
