Paraphrase Generation as Zero-Shot Multilingual Translation: Disentangling Semantic Similarity from Lexical and Syntactic Diversity

08/11/2020 ∙ by Brian Thompson, et al. ∙ 0

Recent work has shown that a multilingual neural machine translation (NMT) model can be used to judge how well a sentence paraphrases another sentence in the same language; however, attempting to generate paraphrases from the model using beam search produces trivial copies or near copies. We introduce a simple paraphrase generation algorithm which discourages the production of n-grams that are present in the input. Our approach enables paraphrase generation in many languages from a single multilingual NMT model. Furthermore, the trade-off between semantic similarity and lexical/syntactic diversity between the input and output can be controlled at generation time. We conduct human evaluation to compare our method to a paraphraser trained on a large English synthetic paraphrase database and find that our model produces paraphrases that better preserve semantic meaning and grammatically, for the same level of lexical/syntactic diversity. Additional smaller human assessments demonstrate our approach also works in non-English languages.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.