Principled Paraphrase Generation with Parallel Corpora

05/24/2022
by Aitor Ormazabal, et al.

Round-trip Machine Translation (MT) is a popular choice for paraphrase generation, which leverages readily available parallel corpora for supervision. In this paper, we formalize the implicit similarity function induced by this approach, and show that it is susceptible to non-paraphrase pairs sharing a single ambiguous translation. Based on these insights, we design an alternative similarity metric that mitigates this issue by requiring the entire translation distribution to match, and implement a relaxation of it through the Information Bottleneck method. Our approach incorporates an adversarial term into MT training in order to learn representations that encode as much information about the reference translation as possible, while keeping as little information about the input as possible. Paraphrases can be generated by decoding back to the source from this representation, without having to generate pivot translations. In addition to being more principled and efficient than round-trip MT, our approach offers an adjustable parameter to control the fidelity-diversity trade-off, and obtains better results in our experiments.
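The weakness described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the sentence labels (`s_a`, `s_a2`, `s_b`), the pivot labels, the probability tables, and the use of total variation distance as a stand-in for "the entire translation distribution must match". It is not the similarity metric or the Information Bottleneck relaxation from the paper, only a small numeric picture of why a single shared, ambiguous translation is enough for round-trip MT to link two sentences that are not paraphrases.

```python
from typing import Dict

Dist = Dict[str, float]

# Hypothetical forward model P(t | x): s_a and s_a2 are true paraphrases
# (identical translation distributions); s_b is NOT a paraphrase of s_a,
# but it shares one ambiguous translation, "t_shared".
FORWARD: Dict[str, Dist] = {
    "s_a": {"t_shared": 0.5, "t_a_only": 0.5},
    "s_a2": {"t_shared": 0.5, "t_a_only": 0.5},
    "s_b": {"t_shared": 0.5, "t_b_only": 0.5},
}

# Hypothetical backward model P(x | t).
BACKWARD: Dict[str, Dist] = {
    "t_shared": {"s_a": 0.25, "s_a2": 0.25, "s_b": 0.5},
    "t_a_only": {"s_a": 0.5, "s_a2": 0.5},
    "t_b_only": {"s_b": 1.0},
}


def round_trip_prob(source: str, candidate: str) -> float:
    """Probability of reaching `candidate` from `source` via any pivot:
    sum over t of P(t | source) * P(candidate | t)."""
    return sum(
        p_t * BACKWARD[t].get(candidate, 0.0)
        for t, p_t in FORWARD[source].items()
    )


def total_variation(x1: str, x2: str) -> float:
    """Total variation distance between the two translation distributions.
    0.0 means the distributions match exactly (illustrative proxy only)."""
    p, q = FORWARD[x1], FORWARD[x2]
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in support)


if __name__ == "__main__":
    # Round-trip MT gives the non-paraphrase s_b substantial probability,
    # purely because of the single shared pivot.
    print("round trip s_a -> s_b :", round_trip_prob("s_a", "s_b"))
    print("round trip s_a -> s_a2:", round_trip_prob("s_a", "s_a2"))
    # Comparing whole distributions separates the two cases.
    print("TV(s_a, s_b) :", total_variation("s_a", "s_b"))
    print("TV(s_a, s_a2):", total_variation("s_a", "s_a2"))
```

In this toy setup, the round-trip score for the non-paraphrase pair is nonzero because all of its mass flows through `t_shared`, whereas comparing the full distributions distinguishes the true paraphrase (distance zero) from the pair that merely overlaps on one translation.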

