Recurrent multiple shared layers in Depth for Neural Machine Translation

08/23/2021
by Guoliang Li, et al.

Learning deeper models is usually a simple and effective way to improve performance, but deeper models have more parameters and are harder to train. Simply stacking more layers seems like it should work, but previous work has claimed that stacking alone does not benefit the model. We propose to train a deeper model with a recurrent mechanism that loops the encoder and decoder blocks of the Transformer in the depth direction. To address the growth in model parameters, we share parameters across the different recursive steps. On the WMT16 English-to-German and WMT14 English-to-French translation tasks, our model outperforms the shallow Transformer-Base/Big baselines by 0.35 and 1.45 BLEU points respectively, with only 27.23% of the model parameters of Transformer-Big. Compared to a deep Transformer (20-layer encoder, 6-layer decoder), our model has similar performance and inference speed, but its parameter count is 54.72% of the former.
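The core idea is that one shared block of layers is applied several times, so effective depth is block_layers × recurrences while the parameter count stays that of a single block. Below is a minimal PyTorch sketch of such a depth-recurrent encoder (the paper loops the decoder blocks the same way); the class name, layer counts, and the use of nn.TransformerEncoderLayer are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class RecurrentDepthEncoder(nn.Module):
    """Sketch of a depth-recurrent Transformer encoder: one block of
    layers is reused at every recursive step, so parameters are shared
    across depth (hyperparameters below are illustrative)."""

    def __init__(self, d_model=512, nhead=8, block_layers=6, recurrences=3):
        super().__init__()
        # A single block of encoder layers; these are the only encoder
        # parameters, no matter how many times the block is looped.
        self.block = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(block_layers)
        )
        self.recurrences = recurrences

    def forward(self, x):
        # Effective depth = block_layers * recurrences.
        for _ in range(self.recurrences):
            for layer in self.block:
                x = layer(x)
        return x

# Usage: 6 shared layers looped 3 times behave like an 18-layer encoder
# with the parameter count of a 6-layer one.
enc = RecurrentDepthEncoder()
out = enc(torch.randn(2, 10, 512))  # (batch, seq_len, d_model)
print(out.shape)                    # torch.Size([2, 10, 512])
```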


Related research

06/05/2019 · Learning Deep Transformer Models for Machine Translation
Transformer is the state-of-the-art model in recent machine translation ...

05/10/2023 · Multi-Path Transformer is Better: A Case Study on Neural Machine Translation
For years the model performance in machine learning obeyed a power-law r...

10/28/2022 · LegoNet: A Fast and Exact Unlearning Architecture
Machine unlearning aims to erase the impact of specific training samples...

11/10/2019 · Two-Headed Monster And Crossed Co-Attention Networks
This paper presents some preliminary investigations of a new co-attentio...

09/23/2020 · Multi-Pass Transformer for Machine Translation
In contrast with previous approaches where information flows only toward...

03/21/2020 · Analyzing Word Translation of Transformer Layers
The Transformer translation model is popular for its effective paralleli...

12/22/2021 · Joint-training on Symbiosis Networks for Deep Neural Machine Translation Models
Deep encoders have been proven to be effective in improving neural machi...
