On the Learning of Non-Autoregressive Transformers

06/13/2022
by Fei Huang, et al.

The Non-autoregressive Transformer (NAT) is a family of text generation models that aims to reduce decoding latency by predicting whole sentences in parallel. However, this latency reduction sacrifices the ability to capture left-to-right dependencies, which makes NAT learning very challenging. In this paper, we present theoretical and empirical analyses that reveal the challenges of NAT learning and propose a unified perspective for understanding existing successes. First, we show that simply training NAT by maximizing the likelihood can lead to an approximation of the marginal distributions but drops all dependencies between tokens, where the dropped information can be measured by the dataset's conditional total correlation. Second, we formalize many previous objectives in a unified framework and show that their success can be summarized as maximizing the likelihood on a proxy distribution, which leads to reduced information loss. Empirical studies show that our perspective can explain the phenomena in NAT learning and guide the design of new training methods.
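The first claim can be illustrated on a toy case. The sketch below is not taken from the paper: it assumes a single fixed source sentence (so the conditional total correlation reduces to a plain total correlation), a hypothetical two-token target with two equally likely references, and an idealized model that predicts each position independently. Under those assumptions, the best likelihood such a model can reach is given by the product of the per-position marginals, and its KL gap to the data distribution equals the total correlation. The function names are illustrative only.

```python
import math
from collections import defaultdict


def marginals(joint):
    """Per-position marginal distributions of a joint over token tuples."""
    length = len(next(iter(joint)))
    margs = [defaultdict(float) for _ in range(length)]
    for seq, p in joint.items():
        for i, tok in enumerate(seq):
            margs[i][tok] += p
    return margs


def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcome -> probability."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def total_correlation(joint):
    """Sum of per-position entropies minus the joint entropy, in bits."""
    return sum(entropy(m) for m in marginals(joint)) - entropy(joint)


def kl_to_product_of_marginals(joint):
    """KL(P || Q) in bits, where Q is the product of P's marginals.

    Q is the likelihood-optimal model among those that predict every
    position independently (the idealized stand-in for NAT used here).
    """
    margs = marginals(joint)
    kl = 0.0
    for seq, p in joint.items():
        if p > 0:
            q = math.prod(margs[i][tok] for i, tok in enumerate(seq))
            kl += p * math.log2(p / q)
    return kl


# Hypothetical data: two references for the same source, each with prob 0.5.
joint = {("A", "B"): 0.5, ("B", "A"): 0.5}

tc = total_correlation(joint)
kl = kl_to_product_of_marginals(joint)
print(f"total correlation: {tc:.3f} bits")
print(f"KL to factorized model: {kl:.3f} bits")
```

In this toy case the factorized model spreads probability evenly over all four token pairs, including the invalid outputs ("A", "A") and ("B", "B"), so exactly the dependency between the two positions is lost.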

