Paraphrase Generation as Unsupervised Machine Translation

09/07/2021
by   Chun Fan, et al.

In this paper, we propose a new paradigm for paraphrase generation that treats the task as unsupervised machine translation (UMT), based on the assumption that a large-scale unlabeled monolingual corpus must contain pairs of sentences expressing the same meaning. The proposed paradigm first splits a large unlabeled corpus into multiple clusters and trains multiple UMT models using pairs of these clusters. Then, based on the paraphrase pairs produced by these UMT models, a unified surrogate model is trained to serve as the final Seq2Seq model for generating paraphrases; it can be used directly at test time in the unsupervised setup or be finetuned on labeled datasets in the supervised setup. The proposed method offers merits over machine-translation-based paraphrase generation methods, as it avoids reliance on bilingual sentence pairs. It also allows humans to intervene in the model so that more diverse paraphrases can be generated using different filtering criteria. Extensive experiments on existing paraphrase datasets for both the supervised and unsupervised setups demonstrate the effectiveness of the proposed paradigm.
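As a rough illustration of two steps mentioned in the abstract (pairing clusters for UMT training, and filtering the resulting paraphrase pairs), the following is a minimal sketch and not the authors' implementation. The function names, the use of ordered cluster pairs, the token-level Jaccard overlap criterion, and the 0.5 threshold are all assumptions made for illustration; the abstract does not specify the actual filtering criteria, and UMT training itself is omitted.

```python
from itertools import permutations


def cluster_pairs(num_clusters):
    """Enumerate ordered (source, target) cluster pairs.

    Assumption: one UMT model would be trained per ordered pair;
    the abstract only says "pairs of these clusters".
    """
    return list(permutations(range(num_clusters), 2))


def jaccard(a, b):
    """Token-level Jaccard overlap between two whitespace-tokenized sentences."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def filter_pairs(pairs, max_overlap=0.5):
    """Keep candidate pairs whose lexical overlap is at most max_overlap.

    Hypothetical diversity criterion: a lower threshold favors
    paraphrases that share fewer surface tokens with the source.
    """
    return [(src, tgt) for src, tgt in pairs if jaccard(src, tgt) <= max_overlap]


# Hand-written stand-ins for pairs that UMT models might produce.
candidates = [
    ("how can i learn python quickly", "what is the fastest way to learn python"),
    ("how can i learn python quickly", "how can i learn python fast"),
]
diverse = filter_pairs(candidates, max_overlap=0.5)
print(cluster_pairs(3))
print(diverse)
```

With this threshold, only the first candidate pair is kept, since the second differs from its source by a single token. Swapping in a different criterion (for example, an embedding-based similarity check) would change which pairs reach the surrogate model's training set.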


research
09/01/2021

ConRPG: Paraphrase Generation using Contexts as Regularizer

A long-standing issue with paraphrase generation is how to obtain reliab...
research
05/26/2023

ParaAMR: A Large-Scale Syntactically Diverse Paraphrase Dataset by AMR Back-Translation

Paraphrase generation is a long-standing task in natural language proces...
research
07/19/2021

Integrating Unsupervised Data Generation into Self-Supervised Neural Machine Translation for Low-Resource Languages

For most language combinations, parallel data is either scarce or simply...
research
05/29/2019

Unsupervised Paraphrasing without Translation

Paraphrasing exemplifies the ability to abstract semantic content from s...
research
05/17/2022

Consistent Human Evaluation of Machine Translation across Language Pairs

Obtaining meaningful quality scores for machine translation systems thro...
research
05/10/2022

ParaCotta: Synthetic Multilingual Paraphrase Corpora from the Most Diverse Translation Sample Pair

We release our synthetic parallel paraphrase corpus across 17 languages:...
research
12/11/2019

Unsupervised Neural Dialect Translation with Commonality and Diversity Modeling

As a special machine translation task, dialect translation has two main ...
