Capsule-Transformer for Neural Machine Translation

04/30/2020 · by Sufeng Duan, et al.

The Transformer benefits greatly from its key design, the multi-head self-attention network (SAN), which extracts information from various perspectives by transforming the given input into different subspaces. However, its simple linear aggregation strategy may still fail to fully capture deeper contextualized information. In this paper, we therefore propose the capsule-Transformer, which extends the linear transformation into a more general capsule routing algorithm by taking the SAN as a special case of a capsule network, so that the resulting capsule-Transformer can obtain a better attention distribution representation of the input sequence via information aggregation among different heads and words. Specifically, we treat groups of attention weights in the SAN as low-layer capsules; applying the iterative capsule routing algorithm aggregates them into high-layer capsules that contain deeper contextualized information. Experimental results on widely used machine translation datasets show that our proposed capsule-Transformer significantly outperforms a strong Transformer baseline.
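The iterative routing the abstract refers to is, at its core, the routing-by-agreement algorithm of capsule networks (Sabour et al., 2017). Below is a minimal NumPy sketch of that generic algorithm applied to per-head attention vectors treated as low-layer capsules. The shapes, the random projection `W`, and the function names are illustrative assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Non-linear squashing: preserves direction, maps the norm into [0, 1).
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    """Generic routing-by-agreement (Sabour et al., 2017).

    u_hat: (num_in, num_out, dim) prediction vectors, e.g. per-head
           attention weight vectors treated as low-layer capsules.
    Returns: (num_out, dim) high-layer capsules.
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))  # routing logits
    for _ in range(num_iters):
        # Coupling coefficients: softmax over the output capsules.
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = (c[..., None] * u_hat).sum(axis=0)    # weighted sum of votes
        v = squash(s)                             # high-layer capsules
        b = b + np.einsum('iod,od->io', u_hat, v) # agreement update
    return v

# Toy usage (hypothetical sizes): 8 heads over a 10-token sentence,
# aggregated into 4 output capsules.
num_heads, seq_len, num_out = 8, 10, 4
rng = np.random.default_rng(0)
# Each head's attention row is one low-layer capsule of dimension seq_len.
low_caps = rng.random((num_heads, seq_len))
# The paper uses learned transforms from low to high capsules; here we
# substitute random projections purely for illustration.
W = rng.normal(size=(num_heads, num_out, seq_len, seq_len)) * 0.1
u_hat = np.einsum('iojd,id->ioj', W, low_caps)
high_caps = dynamic_routing(u_hat)
print(high_caps.shape)  # (4, 10)
```

The key difference from the plain SAN is that the coupling coefficients are refined iteratively by agreement between input and output capsules, rather than being fixed by a single linear transformation.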

