LAVA NAT: A Non-Autoregressive Translation Model with Look-Around Decoding and Vocabulary Attention

02/08/2020
by Xiaoya Li, et al.

Non-autoregressive translation (NAT) models generate multiple tokens in one forward pass and are therefore highly efficient at the inference stage compared with autoregressive translation (AT) methods. However, NAT models often suffer from the multimodality problem, i.e., generating duplicated tokens or missing tokens. In this paper, we propose two novel methods to address this issue: the Look-Around (LA) strategy and the Vocabulary Attention (VA) mechanism. The Look-Around strategy predicts the neighboring tokens in order to predict the current token, and the Vocabulary Attention mechanism models long-term token dependencies inside the decoder by attending to the whole vocabulary at each position to acquire knowledge of which token is about to be generated. We also propose a dynamic bidirectional decoding approach to accelerate the inference process of the LAVA model while preserving the high quality of the generated output. Our proposed model uses significantly less time during inference than autoregressive models and most other NAT models. Experiments on four benchmarks (WMT14 En→De, WMT14 De→En, WMT16 Ro→En, and IWSLT14 De→En) show that the proposed model achieves competitive performance compared with state-of-the-art non-autoregressive and autoregressive models while significantly reducing the time cost in the inference phase.
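To make the Vocabulary Attention idea from the abstract concrete, below is a minimal PyTorch sketch of one plausible reading of it: each decoder position scores the entire vocabulary embedding table, and the resulting soft mixture embedding is fused back into the hidden state. All names here (`VocabularyAttention`, `out_proj`, etc.) are illustrative assumptions, not the authors' released implementation.

```python
# A minimal, hypothetical sketch of Vocabulary Attention (VA), assuming the
# decoder hidden states attend over the full vocabulary embedding table.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VocabularyAttention(nn.Module):
    def __init__(self, hidden_dim: int, vocab_size: int):
        super().__init__()
        # Vocabulary embedding table that every position attends over.
        self.vocab_emb = nn.Embedding(vocab_size, hidden_dim)
        # Fuses the original hidden state with the attended vocab mixture.
        self.out_proj = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_dim)
        # Scores of every position against every vocabulary entry:
        # (batch, seq_len, vocab_size)
        scores = hidden @ self.vocab_emb.weight.t()
        probs = F.softmax(scores, dim=-1)
        # Soft mixture over vocabulary embeddings: (batch, seq_len, hidden_dim)
        mixture = probs @ self.vocab_emb.weight
        # Inject the "which token is likely here" signal into each position.
        return self.out_proj(torch.cat([hidden, mixture], dim=-1))

# Usage: va = VocabularyAttention(512, 32000); h = va(decoder_hidden_states)
```

In the same spirit, a Look-Around head would add auxiliary classifiers at each position that predict the left and right neighbor tokens alongside the current one; that component is omitted from this sketch for brevity.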


