Multi-Task Learning with Shared Encoder for Non-Autoregressive Machine Translation

10/24/2020
by Yongchang Hao et al.

Non-autoregressive machine translation (NAT) models deliver significant inference speedup but suffer from inferior translation accuracy. The common practice for tackling this problem is to transfer autoregressive machine translation (AT) knowledge to NAT models, e.g., via knowledge distillation. In this work, we hypothesize and empirically verify that AT and NAT encoders capture different linguistic properties and representations of source sentences. We therefore propose multi-task learning to transfer AT knowledge to NAT models through encoder sharing. Specifically, we treat the AT model as an auxiliary task to enhance NAT model performance. Experimental results on the WMT14 English->German and WMT16 English->Romanian datasets show that the proposed multi-task NAT achieves significant improvements over baseline NAT models. In addition, the results demonstrate that our multi-task NAT is complementary to knowledge distillation, the standard knowledge transfer method. Code is publicly available at https://github.com/yongchanghao/multi-task-nat
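The shared-encoder setup described in the abstract can be sketched as follows. This is a minimal, framework-free illustration of the training objective only: the encoder, the two decoder losses, and the weight `lam` are toy placeholders of my own, not the paper's actual Transformer implementation.

```python
# Minimal sketch of multi-task NAT training with a shared encoder.
# All components are stand-ins (assumptions), not the paper's model:
# a real system would use a Transformer encoder, AT/NAT decoders, and
# cross-entropy losses over vocabulary distributions.

def shared_encoder(src_tokens):
    # Stand-in for a Transformer encoder: one "hidden state" per token.
    return [float(len(tok)) for tok in src_tokens]

def nat_decoder_loss(enc_states, tgt_tokens):
    # Placeholder NAT loss: the NAT decoder predicts all targets in parallel.
    return abs(len(enc_states) - len(tgt_tokens)) + 1.0

def at_decoder_loss(enc_states, tgt_tokens):
    # Placeholder auxiliary AT loss: left-to-right decoding on the same states.
    return sum(enc_states) / (len(tgt_tokens) + 1)

def multi_task_loss(src_tokens, tgt_tokens, lam=0.5):
    """Joint objective: NAT loss plus a weighted auxiliary AT loss.

    Both losses consume the SAME encoder states, so in a differentiable
    framework the gradients from the auxiliary AT task would also update
    the shared encoder. `lam` is a hypothetical interpolation weight.
    """
    enc = shared_encoder(src_tokens)  # computed once, shared by both tasks
    return nat_decoder_loss(enc, tgt_tokens) + lam * at_decoder_loss(enc, tgt_tokens)
```

The key design point is that the encoder is run once per batch and its output feeds both decoders, so the auxiliary AT task shapes the source representations that the NAT decoder consumes.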


Related research:

Using Perturbed Length-aware Positional Encoding for Non-autoregressive Neural Machine Translation (07/29/2021)
Non-autoregressive neural machine translation (NAT) usually employs sequ...

Understanding Knowledge Distillation in Non-autoregressive Machine Translation (11/07/2019)
Non-autoregressive machine translation (NAT) systems predict a sequence ...

Graph Representation Learning via Multi-task Knowledge Distillation (11/11/2019)
Machine learning on graph structured data has attracted much research in...

Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation (06/02/2021)
Knowledge distillation (KD) is commonly used to construct synthetic data...

Multi-Task Neural Models for Translating Between Styles Within and Across Languages (06/12/2018)
Generating natural language requires conveying content in an appropriate...

: Structured Dataset Preprocessing Annotations for Frictionless Extreme Multi-Task Learning and Evaluation (01/14/2023)
The HuggingFace Datasets Hub hosts thousands of datasets. This provides ...

A baseline revisited: Pushing the limits of multi-segment models for context-aware translation (10/19/2022)
This paper addresses the task of contextual translation using multi-segm...
