Paraphrase Generation as Zero-Shot Multilingual Translation: Disentangling Semantic Similarity from Lexical and Syntactic Diversity

08/11/2020
by Brian Thompson, et al.

Recent work has shown that a multilingual neural machine translation (NMT) model can be used to judge how well a sentence paraphrases another sentence in the same language; however, attempting to generate paraphrases from the model using beam search produces trivial copies or near copies. We introduce a simple paraphrase generation algorithm which discourages the production of n-grams that are present in the input. Our approach enables paraphrase generation in many languages from a single multilingual NMT model. Furthermore, the trade-off between semantic similarity and lexical/syntactic diversity between the input and output can be controlled at generation time. We conduct human evaluation to compare our method to a paraphraser trained on a large English synthetic paraphrase database and find that our model produces paraphrases that better preserve semantic meaning and grammaticality, for the same level of lexical/syntactic diversity. Additional smaller human assessments demonstrate our approach also works in non-English languages.
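
The idea of discouraging input n-grams at generation time can be illustrated with a small sketch. This is not the authors' implementation: the function names, the token-level (rather than subword-level) matching, and the linear penalty `alpha * n` are assumptions made for illustration only. The abstract states only that n-grams present in the input are discouraged and that the similarity/diversity trade-off is adjustable at generation time. In this sketch, `alpha` plays the role of that adjustable knob.

```python
from typing import Dict, Sequence, Set, Tuple


def input_ngrams(tokens: Sequence[str], max_n: int) -> Set[Tuple[str, ...]]:
    """Collect every n-gram of length 1..max_n that occurs in the input."""
    grams: Set[Tuple[str, ...]] = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(tuple(tokens[i:i + n]))
    return grams


def penalized_scores(
    scores: Dict[str, float],
    prefix: Sequence[str],
    banned: Set[Tuple[str, ...]],
    max_n: int,
    alpha: float,
) -> Dict[str, float]:
    """Lower the score of each candidate token that would complete an input n-gram.

    Assumed (hypothetical) penalty: alpha * n, where n is the length of the
    longest input n-gram the candidate would complete given the output prefix.
    """
    out: Dict[str, float] = {}
    for tok, score in scores.items():
        longest = 0
        for n in range(1, max_n + 1):
            if n - 1 > len(prefix):
                break  # not enough output history to form an n-gram this long
            gram = tuple(prefix[len(prefix) - (n - 1):]) + (tok,)
            if gram in banned:
                longest = n
        out[tok] = score - alpha * longest
    return out


# Toy decoding step: the model slightly prefers copying "sat" from the input.
source = "the cat sat".split()
banned = input_ngrams(source, max_n=2)
step_scores = {"sat": -0.1, "rested": -0.5}  # made-up log-probabilities
adjusted = penalized_scores(step_scores, ["the", "cat"], banned, max_n=2, alpha=1.0)
best = max(adjusted, key=adjusted.get)
```

With `alpha=0.0` the adjusted scores equal the model scores and the copy is chosen. Raising `alpha` pushes the decoder toward tokens that do not reproduce input n-grams, trading semantic closeness for lexical diversity.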

research
07/28/2023

Multilingual Lexical Simplification via Paraphrase Generation

Lexical simplification (LS) methods based on pretrained language models ...
research
08/21/2018

Translational Grounding: Using Paraphrase Recognition and Generation to Demonstrate Semantic Abstraction Abilities of MultiLingual NMT

In this paper, we investigate whether multilingual neural translation mo...
research
01/11/2019

ParaBank: Monolingual Bitext Generation and Sentential Paraphrasing via Lexically-constrained Neural Machine Translation

We present ParaBank, a large-scale English paraphrase dataset that surpa...
research
10/02/2021

Improving Zero-shot Multilingual Neural Machine Translation for Low-Resource Languages

Although the multilingual Neural Machine Translation (NMT), which extends...
research
06/01/2022

Exploring Diversity in Back Translation for Low-Resource Machine Translation

Back translation is one of the most widely used methods for improving th...
research
08/25/2018

Paraphrases as Foreign Languages in Multilingual Neural Machine Translation

Using paraphrases, the expression of the same semantic meaning in differ...
research
04/28/2022

NMTScore: A Multilingual Analysis of Translation-based Text Similarity Measures

Being able to rank the similarity of short text segments is an interesti...