Dynamic Layer Aggregation for Neural Machine Translation with Routing-by-Agreement

02/15/2019
by Zi-Yi Dou, et al.

With the promising progress of deep neural networks, layer aggregation has been used to fuse information across layers in various fields, such as computer vision and machine translation. However, most of the previous methods combine layers in a static fashion in that their aggregation strategy is independent of specific hidden states. Inspired by recent progress on capsule networks, in this paper we propose to use routing-by-agreement strategies to aggregate layers dynamically. Specifically, the algorithm learns the probability of a part (individual layer representations) assigned to a whole (aggregated representations) in an iterative way and combines parts accordingly. We implement our algorithm on top of the state-of-the-art neural machine translation model TRANSFORMER and conduct experiments on the widely-used WMT14 English-German and WMT17 Chinese-English translation datasets. Experimental results across language pairs show that the proposed approach consistently outperforms the strong baseline model and a representative static aggregation model.
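
The iterative part-to-whole assignment described in the abstract can be illustrated with a minimal sketch of dynamic routing in the style of capsule networks. This is not the paper's implementation. It assumes that each layer has already been projected into one "vote" vector per output capsule (in a real model this would be a learned transformation), and the names `route_layers`, `votes`, `squash`, and `num_iters` are hypothetical, chosen only for this example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def squash(s):
    """Capsule-style non-linearity: keeps direction, maps the norm into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0 for _ in s]
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def route_layers(votes, num_iters=3):
    """Aggregate layer representations by routing-by-agreement (sketch).

    votes[l][n] is the vote vector that layer l (a "part") casts for
    output capsule n (a "whole"). Assumption: votes are precomputed.
    Returns the output capsules and the coupling probabilities that
    were used to build them.
    """
    if num_iters < 1:
        raise ValueError("num_iters must be at least 1")
    num_layers, num_out, dim = len(votes), len(votes[0]), len(votes[0][0])
    # Routing logits start at zero, so every part is initially
    # assigned to every whole with equal probability.
    logits = [[0.0] * num_out for _ in range(num_layers)]
    for _ in range(num_iters):
        # Probability of each part being assigned to each whole.
        coupling = [softmax(row) for row in logits]
        outputs = []
        for n in range(num_out):
            weighted = [
                sum(coupling[l][n] * votes[l][n][k] for l in range(num_layers))
                for k in range(dim)
            ]
            outputs.append(squash(weighted))
        # Agreement step: a vote that points the same way as the current
        # output raises its logit; a disagreeing vote lowers it.
        for l in range(num_layers):
            for n in range(num_out):
                logits[l][n] += sum(a * b for a, b in zip(votes[l][n], outputs[n]))
    return outputs, coupling


# Toy input: three layers, two output capsules, two-dimensional votes.
example_votes = [
    [[1.0, 0.0], [0.0, 0.1]],
    [[1.0, 0.0], [0.0, 0.1]],
    [[-1.0, 0.0], [0.0, 1.0]],
]
example_outputs, example_coupling = route_layers(example_votes, num_iters=3)
```

In the toy input, the first two layers agree on the first output capsule while the third layer disagrees, so the routing shifts the third layer's assignment probability toward the second capsule. A static aggregation scheme would instead apply the same weights regardless of the hidden states.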

research
07/19/2021

Residual Tree Aggregation of Layers for Neural Machine Translation

Although attention-based Neural Machine Translation has achieved remarka...
research
04/30/2020

Capsule-Transformer for Neural Machine Translation

Transformer hugely benefits from its key design of the multi-head self-a...
research
10/24/2018

Exploiting Deep Representations for Neural Machine Translation

Advanced neural machine translation (NMT) models generally implement enc...
research
08/31/2019

Improving Multi-Head Attention with Capsule Networks

Multi-head attention advances neural machine translation by working out ...
research
11/01/2018

Towards Linear Time Neural Machine Translation with Capsule Networks

In this study, we first investigate a novel capsule network with dynamic...
research
04/05/2019

Information Aggregation for Multi-Head Attention with Routing-by-Agreement

Multi-head attention is appealing for its ability to jointly extract dif...
research
06/05/2018

Information Aggregation via Dynamic Routing for Sequence Encoding

While much progress has been made in how to encode a text sequence into ...
