Joint-training on Symbiosis Networks for Deep Neural Machine Translation models

12/22/2021
by   Zhengzhe Yu, et al.

Deep encoders have proven effective in improving neural machine translation (NMT) systems, but translation quality reaches an upper bound once the number of encoder layers exceeds 18. Worse still, deeper networks consume large amounts of memory, making them impossible to train efficiently. In this paper, we present Symbiosis Networks, which comprise a full network, the Symbiosis Main Network (M-Net), and a shared sub-network with the same structure but fewer layers, the Symbiotic Sub Network (S-Net). We adopt Symbiosis Networks on the Transformer-deep (m-n) architecture and define a particular regularization loss ℒ_τ between the M-Net and S-Net in NMT. We jointly train the Symbiosis Networks with the aim of improving the M-Net's performance. Our proposed training strategy improves Transformer-deep (12-6) by 0.61, 0.49 and 0.69 BLEU over classically trained baselines on the WMT'14 EN->DE, DE->EN and EN->FR tasks. Furthermore, our Transformer-deep (12-6) even outperforms the classically trained Transformer-deep (18-6).
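To make the joint-training scheme concrete, here is a minimal PyTorch sketch. It assumes the S-Net reuses the first s_layers encoder layers of the M-Net, stands in a single linear head for the shared decoder, and uses a temperature-scaled KL divergence as the regularization loss ℒ_τ; the abstract does not specify the exact form of ℒ_τ, so that choice, along with all names and hyperparameters here, is an assumption for illustration only.

```python
# Sketch of joint-training on a Symbiosis Network (Transformer-deep 12-6).
# Assumptions (not from the paper): the S-Net shares the first `s_layers`
# encoder layers of the M-Net, a linear head stands in for the shared
# decoder, and L_tau is a temperature-scaled KL divergence.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SymbiosisNet(nn.Module):
    def __init__(self, vocab=32000, d_model=512, nhead=8,
                 m_layers=12, s_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        # M-Net: the full stack of m_layers encoder layers.
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(m_layers)
        ])
        self.s_layers = s_layers      # S-Net shares the first s_layers layers
        self.head = nn.Linear(d_model, vocab)  # stand-in shared decoder

    def forward(self, src, depth=None):
        x = self.embed(src)
        for layer in self.layers[: depth or len(self.layers)]:
            x = layer(x)
        return self.head(x)

def joint_loss(model, src, gold, tau=1.0, alpha=0.5):
    m_logits = model(src)                        # M-Net (full depth)
    s_logits = model(src, depth=model.s_layers)  # S-Net (shared sub-network)
    # Translation (cross-entropy) losses for both networks.
    ce_m = F.cross_entropy(m_logits.flatten(0, 1), gold.flatten())
    ce_s = F.cross_entropy(s_logits.flatten(0, 1), gold.flatten())
    # L_tau: agreement regularizer between M-Net and S-Net predictions
    # (KL with temperature tau is assumed here, not the paper's definition).
    l_tau = tau ** 2 * F.kl_div(
        F.log_softmax(s_logits / tau, dim=-1),
        F.softmax(m_logits / tau, dim=-1),
        reduction="batchmean",
    )
    return ce_m + ce_s + alpha * l_tau

# Usage: one joint update; gradients reach the shared layers from both nets.
model = SymbiosisNet()
src = torch.randint(0, 32000, (2, 10))
gold = torch.randint(0, 32000, (2, 10))
joint_loss(model, src, gold).backward()
```

Because the S-Net's layers are a subset of the M-Net's, every joint update trains the shared lower layers under both a shallow and a deep objective, which is the mechanism the paper credits for improving the M-Net.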


Related research

- Shallow-to-Deep Training for Neural Machine Translation (10/08/2020)
- Depth Growing for Neural Machine Translation (07/03/2019)
- Learning Light-Weight Translation Models from Deep Transformer (12/27/2020)
- Arch-Net: Model Distillation for Architecture Agnostic Model Deployment (11/01/2021)
- Multiscale Collaborative Deep Models for Neural Machine Translation (04/29/2020)
- Recurrent multiple shared layers in Depth for Neural Machine Translation (08/23/2021)
- Data Scaling Laws in NMT: The Effect of Noise and Architecture (02/04/2022)
