Residual Tree Aggregation of Layers for Neural Machine Translation

07/19/2021
by Guoliang Li, et al.

Although attention-based Neural Machine Translation has achieved remarkable progress in recent years, it still suffers from the issue of making insufficient use of the output of each layer. The Transformer uses only the top layer of the encoder and decoder in the subsequent process, which makes it impossible to take advantage of the useful information in other layers. To address this issue, we propose a residual tree aggregation of layers for the Transformer (RTAL), which helps to fuse information across layers. Specifically, we fuse the information across layers by constructing a post-order binary tree. In addition to the last node, we also add residual connections to the process of generating child nodes. Our model is based on the Neural Machine Translation model Transformer, and we conduct our experiments on the WMT14 English-to-German and WMT17 English-to-French translation tasks. Experimental results across language pairs show that the proposed approach significantly outperforms the strong baseline model.
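
Based only on the description in the abstract, the sketch below illustrates how the outputs of all layers might be fused through a binary tree with residual connections. It is a minimal PyTorch sketch, not the paper's exact formulation: the fusion operator (a linear projection with ReLU and layer normalization), the pairing of adjacent layer outputs, the choice of the deeper child as the residual source, and the names FuseNode and ResidualTreeAggregation are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class FuseNode(nn.Module):
    """Fuses two child representations of shape [batch, seq_len, d_model]
    into one parent representation, with a residual connection."""

    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(2 * d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, left: torch.Tensor, right: torch.Tensor) -> torch.Tensor:
        fused = torch.relu(self.proj(torch.cat([left, right], dim=-1)))
        # Residual connection when generating the parent node; taking the
        # residual from the deeper (right) child is an assumption.
        return self.norm(fused + right)


class ResidualTreeAggregation(nn.Module):
    """Fuses the outputs of all layers by repeatedly combining adjacent
    pairs bottom-up until a single representation remains, i.e. evaluating
    a binary tree over the layer outputs in post-order."""

    def __init__(self, num_layers: int, d_model: int):
        super().__init__()
        levels, n = [], num_layers
        while n > 1:
            levels.append(nn.ModuleList(FuseNode(d_model) for _ in range(n // 2)))
            n = n // 2 + n % 2          # an unpaired output is carried upward
        self.levels = nn.ModuleList(levels)

    def forward(self, layer_outputs):
        current = list(layer_outputs)   # leaves: one tensor per layer
        for level in self.levels:
            nxt, i = [], 0
            for fuse in level:
                nxt.append(fuse(current[i], current[i + 1]))
                i += 2
            if i < len(current):        # carry an unpaired output upward
                nxt.append(current[i])
            current = nxt
        return current[0]               # root replaces the top-layer output
```

As a usage sketch, for a 6-layer encoder with d_model = 512, ResidualTreeAggregation(6, 512) would take the list of six layer outputs (each of shape [batch, seq_len, 512]) and return a single fused representation of the same shape, which could then stand in for the top-layer output in the subsequent process.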

Related research

02/15/2019
Dynamic Layer Aggregation for Neural Machine Translation with Routing-by-Agreement
With the promising progress of deep neural networks, layer aggregation h...

10/24/2018
Exploiting Deep Representations for Neural Machine Translation
Advanced neural machine translation (NMT) models generally implement enc...

10/16/2019
Using Whole Document Context in Neural Machine Translation
In Machine Translation, considering the document as a whole can help to ...

05/10/2023
Multi-Path Transformer is Better: A Case Study on Neural Machine Translation
For years the model performance in machine learning obeyed a power-law r...

06/28/2019
Widening the Representation Bottleneck in Neural Machine Translation with Lexical Shortcuts
The transformer is a state-of-the-art neural translation model that uses...

09/23/2020
Multi-Pass Transformer for Machine Translation
In contrast with previous approaches where information flows only toward...

01/03/2021
An Efficient Transformer Decoder with Compressed Sub-layers
The large attention-based encoder-decoder network (Transformer) has beco...
