Layer-Wise Cross-View Decoding for Sequence-to-Sequence Learning

by Fenglin Liu et al.
Peking University

In sequence-to-sequence learning, the attention mechanism has been highly successful at bridging information between the encoder and the decoder. However, it is often overlooked that the decoder has only a single view of the source sequence: the representations generated by the last encoder layer, which are supposed to provide a global view of the source. Such an implementation prevents the decoder from accessing concrete, fine-grained, local source information. In this work, we explore reusing the representations from different encoder layers for layer-wise cross-view decoding, in which different views of the source sequence are presented to different decoder layers. We investigate multiple representative strategies for cross-view decoding, of which the granularity consistent attention (GCA) strategy proves the most efficient and effective in experiments on the neural machine translation task. In particular, GCA surpasses the previous state-of-the-art architecture on three machine translation datasets.
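The abstract contrasts standard decoding, where every decoder layer attends to the last encoder layer, with layer-wise cross-view decoding, where different decoder layers receive different encoder layers. The minimal pure-Python sketch below illustrates only that routing difference. It assumes a one-to-one mapping (decoder layer i attends to encoder layer i) as one plausible cross-view strategy; the abstract does not specify the exact mapping used by GCA, and the names `single_view`, `layer_wise_views`, and `decode` are hypothetical. Learned projections, multi-head attention, self-attention, feed-forward sublayers, and layer normalization are omitted.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, source: Matrix) -> Matrix:
    """Scaled dot-product attention from decoder states to one source view.

    Keys and values are the source rows themselves; the learned projections
    and multiple heads of a real Transformer are omitted for brevity.
    """
    dim = len(queries[0])
    context = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in source]
        weights = softmax(scores)
        context.append([sum(w * row[j] for w, row in zip(weights, source)) for j in range(dim)])
    return context


def single_view(num_enc: int, num_dec: int) -> List[int]:
    """Standard decoding: every decoder layer attends to the top encoder layer."""
    return [num_enc - 1] * num_dec


def layer_wise_views(num_enc: int, num_dec: int) -> List[int]:
    """Hypothetical cross-view mapping (an assumption, not the paper's stated GCA rule):
    decoder layer i attends to encoder layer i."""
    if num_enc != num_dec:
        raise ValueError("this sketch assumes equal encoder and decoder depth")
    return list(range(num_dec))


def decode(target: Matrix, encoder_layers: List[Matrix], views: List[int]) -> Matrix:
    """Apply each decoder layer's residual cross-attention step over its assigned view.

    Only the cross-attention sublayer is modeled; everything else is omitted.
    """
    hidden = target
    for enc_idx in views:
        ctx = cross_attention(hidden, encoder_layers[enc_idx])
        hidden = [[h + c for h, c in zip(h_row, c_row)] for h_row, c_row in zip(hidden, ctx)]
    return hidden


if __name__ == "__main__":
    # Toy 2-layer encoder: layer k outputs a single source position [k, k].
    enc = [[[0.0, 0.0]], [[1.0, 1.0]]]
    tgt = [[0.0, 0.0]]
    print(decode(tgt, enc, single_view(2, 2)))       # [[2.0, 2.0]]
    print(decode(tgt, enc, layer_wise_views(2, 2)))  # [[1.0, 1.0]]
```

Only the `views` list changes between the two runs, so the difference in output comes entirely from which encoder layer each decoder layer sees; the rest of the decoder is untouched, which mirrors the abstract's point that cross-view decoding reuses existing encoder representations rather than adding new ones.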

