Incorporating BERT into Parallel Sequence Decoding with Adapters

10/13/2020
by Junliang Guo, et al.

While large-scale pre-trained language models such as BERT have achieved great success on various natural language understanding tasks, how to efficiently and effectively incorporate them into sequence-to-sequence models and the corresponding text generation tasks remains a non-trivial problem. In this paper, we propose to address this problem by taking two different BERT models as the encoder and decoder, respectively, and fine-tuning them by introducing simple and lightweight adapter modules, which are inserted between BERT layers and tuned on the task-specific dataset. In this way, we obtain a flexible and efficient model which is able to jointly leverage the information contained in the source-side and target-side BERT models, while bypassing the catastrophic forgetting problem. Each component in the framework can be considered as a plug-in unit, making the framework flexible and task-agnostic. Our framework is based on a parallel sequence decoding algorithm named Mask-Predict, considering the bi-directional and conditionally independent nature of BERT, and can easily be adapted to traditional autoregressive decoding. We conduct extensive experiments on neural machine translation tasks, where the proposed method consistently outperforms autoregressive baselines while reducing the inference latency by half, and achieves 36.49/33.57 BLEU scores on IWSLT14 German-English/WMT14 German-English translation. When adapted to autoregressive decoding, the proposed method achieves 30.60/43.56 BLEU scores on WMT14 English-German/English-French translation, on par with the state-of-the-art baseline models.
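To make the adapter idea concrete, here is a minimal PyTorch sketch of a bottleneck adapter of the kind inserted between frozen BERT layers. The module name, hidden size, and bottleneck size are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of a bottleneck adapter inserted between BERT layers;
# illustrative names and sizes, not the authors' actual implementation.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Down-project, apply a non-linearity, up-project, and add a residual
    connection. Only these small matrices are trained; BERT stays frozen."""
    def __init__(self, hidden_dim: int = 768, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```

The parallel decoding the abstract refers to follows the Mask-Predict recipe: start from a fully masked target, predict all positions at once, then iteratively re-mask and re-predict the lowest-confidence tokens. The toy loop below illustrates that schedule; `model` and `predict_all` are hypothetical stand-ins, not the paper's code.

```python
# Toy sketch of Mask-Predict style parallel decoding. `model` is assumed
# to return per-position logits over the vocabulary for the whole target.
import torch

def predict_all(model, src, tokens):
    logits = model(src, tokens)                 # (tgt_len, vocab_size)
    probs, ids = logits.softmax(dim=-1).max(dim=-1)
    return probs, ids

def mask_predict(model, src, tgt_len, T=10, mask_id=0):
    tokens = torch.full((tgt_len,), mask_id, dtype=torch.long)
    probs, tokens = predict_all(model, src, tokens)   # first pass: all masked
    for t in range(1, T):
        n = int(tgt_len * (T - t) / T)          # number of tokens to re-mask
        remask = probs.topk(n, largest=False).indices
        tokens[remask] = mask_id
        new_probs, new_tokens = predict_all(model, src, tokens)
        tokens[remask] = new_tokens[remask]
        probs[remask] = new_probs[remask]
    return tokens
```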

Related research

Distilling the Knowledge of BERT for Text Generation (11/10/2019)
Large-scale pre-trained language model, such as BERT, has recently achie...

A Generalized Framework of Sequence Generation with Application to Undirected Sequence Models (05/29/2019)
Undirected neural sequence models such as BERT have received renewed int...

Towards Making the Most of BERT in Neural Machine Translation (08/15/2019)
GPT-2 and BERT demonstrate the effectiveness of using pre-trained langua...

Semi-Autoregressive Neural Machine Translation (08/26/2018)
Existing approaches to neural machine translation are typically autoregr...

BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation (09/09/2021)
The success of bidirectional encoders using masked language models, such...

Unsupervised Pretraining for Sequence to Sequence Learning (11/08/2016)
Sequence to sequence models are successful tools for supervised sequence...

E-LANG: Energy-Based Joint Inferencing of Super and Swift Language Models (03/01/2022)
Building huge and highly capable language models has been a trend in the...
