On the Dynamics of Training Attention Models

11/19/2020
by Haoye Lu, et al.

The attention mechanism has been widely used as a component of deep neural networks. By now, it has become a critical building block of many state-of-the-art natural language models. Despite its great empirical success, the working mechanism of attention has not been investigated at sufficient theoretical depth to date. In this paper, we set up a simple text classification task and study the dynamics of training a simple attention-based classification model using gradient descent. In this setting, we show that, for each discriminative word that the model should attend to, there exists a persisting identity relating its embedding and the inner product of its key and the query. This allows us to prove that training must converge to attending to the discriminative words when the attention output is classified by a linear classifier. Experiments are performed, which validate our theoretical analysis and provide further insights.
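To make the setup concrete, below is a minimal sketch (not the authors' code) of the kind of model the abstract describes: a single trainable query attends over the keys of the words in a sequence, the attention-weighted sum of word embeddings is fed to a linear classifier, and everything is trained by gradient descent. All names, dimensions, and the synthetic data generator (one planted token determines the label) are illustrative assumptions.

    # Minimal sketch of the setting in the abstract: single-query attention
    # over word embeddings, followed by a linear classifier, trained
    # end-to-end with gradient descent. All sizes and the synthetic data
    # are illustrative assumptions, not the authors' setup.
    import torch

    torch.manual_seed(0)
    vocab, dim, seq_len, n = 50, 16, 10, 256

    # Trainable parameters: embeddings (values), keys, one query, classifier.
    E = torch.nn.Parameter(torch.randn(vocab, dim) * 0.1)  # word embeddings
    K = torch.nn.Parameter(torch.randn(vocab, dim) * 0.1)  # word keys
    q = torch.nn.Parameter(torch.randn(dim) * 0.1)         # global query
    w = torch.nn.Parameter(torch.randn(dim) * 0.1)         # linear classifier

    # Synthetic task: token 0 or 1 is the discriminative word that decides
    # the label; all other positions hold non-discriminative noise tokens.
    labels = torch.randint(0, 2, (n,))
    X = torch.randint(2, vocab, (n, seq_len))
    X[torch.arange(n), torch.randint(0, seq_len, (n,))] = labels

    opt = torch.optim.SGD([E, K, q, w], lr=0.5)
    for step in range(2000):
        scores = K[X] @ q                         # (n, seq_len) key-query scores
        attn = torch.softmax(scores, dim=1)       # attention over positions
        ctx = (attn.unsqueeze(-1) * E[X]).sum(1)  # attention output, (n, dim)
        logits = ctx @ w                          # linear classification head
        loss = torch.nn.functional.binary_cross_entropy_with_logits(
            logits, labels.float())
        opt.zero_grad()
        loss.backward()
        opt.step()

    # The paper's claim predicts that attention mass concentrates on the
    # discriminative word in each sequence as training converges.
    with torch.no_grad():
        attn = torch.softmax(K[X] @ q, dim=1)
        disc = (X <= 1)  # positions holding the discriminative token
        print("mean attention on discriminative word:",
              (attn * disc).sum(1).mean().item())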


research
02/18/2020

Text Classification with Lexicon from PreAttention Mechanism

A comprehensive and high-quality lexicon plays a crucial role in traditi...
research
03/14/2023

Input-length-shortening and text generation via attention values

Identifying words that impact a task's performance more than others is a...
research
11/23/2018

Explicit Interaction Model towards Text Classification

Text classification is one of the fundamental tasks in natural language ...
research
07/05/2022

Betti numbers of attention graphs is all you really need

We apply methods of topological analysis to the attention graphs, calcul...
research
12/19/2022

Uncovering the Origins of Instability in Dynamical Systems: How Attention Mechanism Can Help?

The behavior of the network and its stability are governed by both dynam...
research
06/23/2023

Max-Margin Token Selection in Attention Mechanism

Attention mechanism is a central component of the transformer architectu...
research
12/15/2017

Pre-training Attention Mechanisms

Recurrent neural networks with differentiable attention mechanisms have ...
