A Study of Non-autoregressive Model for Sequence Generation

04/22/2020
by Yi Ren, et al.

Non-autoregressive (NAR) models generate all the tokens of a sequence in parallel, resulting in faster generation speed compared to their autoregressive (AR) counterparts but at the cost of lower accuracy. Different techniques, including knowledge distillation and source-target alignment, have been proposed to bridge the gap between AR and NAR models in various tasks, such as neural machine translation (NMT), automatic speech recognition (ASR), and text to speech (TTS). With the help of those techniques, NAR models can catch up with the accuracy of AR models in some tasks but not in others. In this work, we conduct a study to understand the difficulty of NAR sequence generation and try to answer: (1) Why can NAR models catch up with AR models in some tasks but not all? (2) Why do techniques like knowledge distillation and source-target alignment help NAR models? Since the main difference between AR and NAR models is that NAR models do not use dependency among target tokens while AR models do, intuitively the difficulty of NAR sequence generation heavily depends on the strength of the dependency among target tokens. To quantify such dependency, we propose an analysis model called CoMMA to characterize the difficulty of different NAR sequence generation tasks. We have several interesting findings: (1) among the NMT, ASR, and TTS tasks, ASR has the most target-token dependency while TTS has the least; (2) knowledge distillation reduces the target-token dependency in the target sequence and thus improves the accuracy of NAR models; (3) the source-target alignment constraint encourages dependency of a target token on source tokens and thus eases the training of NAR models.
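To make the AR/NAR contrast concrete, here is a minimal, hypothetical Python sketch (not from the paper): `toy_scores`, `ar_decode`, and `nar_decode` are invented names, and the scoring rule is a toy stand-in for a trained decoder. The only point it illustrates is that AR decoding conditions each position on the already generated prefix, while NAR decoding scores every position independently of the other target tokens.

```python
from typing import List

VOCAB = ["<pad>", "a", "b", "c", "<eos>"]

def toy_scores(src: List[int], tgt_prefix: List[int], position: int) -> List[float]:
    """Hypothetical stand-in for a trained decoder's per-position token scores.

    An AR model conditions on tgt_prefix (previously generated tokens);
    a NAR model passes an empty prefix and conditions on src alone.
    The arithmetic below is arbitrary; it only makes the sketch runnable.
    """
    base = (sum(src) + position) % len(VOCAB)
    bonus = sum(tgt_prefix) % len(VOCAB) if tgt_prefix else 0
    return [1.0 if i == (base + bonus) % len(VOCAB) else 0.0
            for i in range(len(VOCAB))]

def ar_decode(src: List[int], max_len: int = 5) -> List[int]:
    """Autoregressive: max_len sequential steps; position t sees tokens < t."""
    out: List[int] = []
    for pos in range(max_len):
        scores = toy_scores(src, out, pos)  # conditions on the prefix
        out.append(max(range(len(scores)), key=scores.__getitem__))
        if VOCAB[out[-1]] == "<eos>":
            break
    return out

def nar_decode(src: List[int], length: int = 5) -> List[int]:
    """Non-autoregressive: every position is scored with no target prefix.

    A real NAR model computes all positions in one parallel pass; the
    independent per-position calls here simulate that independence.
    """
    return [max(range(len(VOCAB)), key=toy_scores(src, [], pos).__getitem__)
            for pos in range(length)]

src = [1, 2, 3]
print("AR :", [VOCAB[i] for i in ar_decode(src)])
print("NAR:", [VOCAB[i] for i in nar_decode(src)])
```

The prefix conditioning that `nar_decode` drops is exactly the target-token dependency whose strength the paper's CoMMA model is designed to quantify; the stronger that dependency is in a task, the harder parallel generation becomes.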


Related research

04/13/2021 · Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation
A conventional approach to improving the performance of end-to-end speec...

05/09/2021 · FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition
Error correction techniques have been used to refine the output sentence...

05/23/2022 · A Self-Paced Mixed Distillation Method for Non-Autoregressive Generation
Non-Autoregressive generation is a sequence generation paradigm, which r...

10/11/2021 · A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation
Non-autoregressive (NAR) models simultaneously generate multiple outputs...

04/20/2022 · A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond
Non-autoregressive (NAR) generation, which is first proposed in neural m...

08/06/2020 · FastLR: Non-Autoregressive Lipreading Model with Integrate-and-Fire
Lipreading is an impressive technique and there has been a definite impr...

05/27/2021 · How Does Distilled Data Complexity Impact the Quality and Confidence of Non-Autoregressive Machine Translation?
While non-autoregressive (NAR) models are showing great promise for mach...
