Understanding How Encoder-Decoder Architectures Attend

10/28/2021
by Kyle Aitken et al.

Encoder-decoder networks with attention have proven to be a powerful way to solve many sequence-to-sequence tasks. In these networks, attention aligns encoder and decoder states and is often used for visualizing network behavior. However, the mechanisms used by networks to generate appropriate attention matrices are still mysterious. Moreover, how these mechanisms vary depending on the particular architecture used for the encoder and decoder (recurrent, feed-forward, etc.) is also not well understood. In this work, we investigate how encoder-decoder networks solve different sequence-to-sequence tasks. We introduce a way of decomposing hidden states over a sequence into temporal (independent of input) and input-driven (independent of sequence position) components. This reveals how attention matrices are formed: depending on the task requirements, networks rely more heavily on either the temporal or input-driven components. These findings hold across both recurrent and feed-forward architectures despite their differences in forming the temporal components. Overall, our results provide new insight into the inner workings of attention-based encoder-decoder networks.
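
The abstract's central idea, splitting hidden states into a temporal and an input-driven part, can be sketched in a few lines. The snippet below is a minimal illustration rather than the authors' code: the function name, the array layout (sequences x positions x hidden units), and the simple per-token averaging are assumptions made for clarity.

import numpy as np

def decompose_hidden_states(hidden, tokens, vocab_size):
    """Split hidden states into temporal and input-driven components.

    hidden: (n_seqs, seq_len, d) array of encoder (or decoder) states.
    tokens: (n_seqs, seq_len) integer array of the input token at each position.
    Returns (temporal, input_driven, residual) with
    hidden == temporal + input_driven + residual (up to floating point).
    """
    n_seqs, seq_len, d = hidden.shape

    # Temporal component: the state averaged over all sequences at each
    # position, so it depends only on where we are in the sequence.
    temporal = hidden.mean(axis=0, keepdims=True)             # (1, seq_len, d)

    # Input-driven component: the position-centered state averaged over every
    # occurrence of a token, so it depends only on the token's identity.
    centered = hidden - temporal                               # (n_seqs, seq_len, d)
    per_token = np.zeros((vocab_size, d))
    for v in range(vocab_size):
        mask = tokens == v
        if mask.any():
            per_token[v] = centered[mask].mean(axis=0)
    input_driven = per_token[tokens]                           # (n_seqs, seq_len, d)

    residual = centered - input_driven
    return np.broadcast_to(temporal, hidden.shape), input_driven, residual

A natural follow-up, in the spirit of the abstract, is to recompute the attention matrices from only the temporal or only the input-driven parts and see which better reproduces the network's behavior on a given task.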


Related research

11/12/2018 · Input Combination Strategies for Multi-Source Transformer Decoder
In multi-source sequence-to-sequence tasks, the attention mechanism can ...

02/04/2023 · Greedy Ordering of Layer Weight Matrices in Transformers Improves Translation
Prior work has attempted to understand the internal structures and funct...

11/09/2019 · Enforcing Encoder-Decoder Modularity in Sequence-to-Sequence Models
Inspired by modular software design principles of independence, intercha...

02/28/2020 · Temporal Convolutional Attention-based Network For Sequence Modeling
With the development of feed-forward models, the default model for seque...

05/29/2019 · Dimension Reduction Approach for Interpretability of Sequence to Sequence Recurrent Neural Networks
Encoder-decoder recurrent neural network models (Seq2Seq) have achieved ...

06/07/2019 · Assessing incrementality in sequence-to-sequence models
Since their inception, encoder-decoder models have successfully been app...

04/28/2018 · CRAM: Clued Recurrent Attention Model
To overcome the poor scalability of convolutional neural network, recurr...
