Unsupervised Paraphrase Generation via Dynamic Blocking

10/24/2020
by Tong Niu et al.

We propose Dynamic Blocking, a decoding algorithm that enables large-scale pretrained autoregressive models (such as BART, T5, GPT-2, and XLNet) to generate high-quality paraphrases in an unsupervised setting. To obtain an alternative surface form, whenever the language model emits a token that is present in the source sequence, we prevent it from generating the subsequent source token at the next generation step. We show that our approach achieves state-of-the-art results on benchmark datasets compared to previous unsupervised approaches, and is even comparable with strong supervised, in-domain models. We also propose a new automatic metric based on self-BLEU and BERTScore, which not only discourages the model from simply copying the input but also evaluates semantic similarity with distributed representations, avoiding reliance on exact keyword matching. In addition, we demonstrate that our model generalizes across languages without any additional training.
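The blocking rule described in the abstract can be made concrete with a short sketch. The snippet below is a minimal, illustrative implementation of the stated idea only: whenever the decoder has just emitted a token that also occurs in the source, the token that immediately follows it in the source is blocked at the next step. The function name `blocked_tokens`, the use of raw token ids, and the example values are assumptions for illustration; the authors' actual decoder integration (e.g., with BART or T5 generation) is not shown here.

```python
from typing import Dict, List, Set


def blocked_tokens(source_ids: List[int], last_generated_id: int) -> Set[int]:
    """Return the token ids to block at the next decoding step.

    Core rule from the abstract: if the decoder has just emitted a token
    that also appears in the source sequence, block the token(s) that
    immediately follow it in the source, forcing an alternative surface form.
    """
    # Map each source token to the set of tokens that follow it in the source.
    successors: Dict[int, Set[int]] = {}
    for prev, nxt in zip(source_ids, source_ids[1:]):
        successors.setdefault(prev, set()).add(nxt)
    # If the last generated token occurs in the source, block its successors;
    # otherwise nothing is blocked.
    return successors.get(last_generated_id, set())


if __name__ == "__main__":
    source = [12, 7, 99, 7, 31]        # hypothetical token ids for the input
    print(blocked_tokens(source, 7))   # {99, 31}: both tokens that follow 7
    print(blocked_tokens(source, 42))  # set(): 42 does not appear in the source
```

In practice the returned set would be used to mask the corresponding logits before sampling or beam search at the next time step.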
