A Regularized Framework for Sparse and Structured Neural Attention

05/22/2017
by Vlad Niculae, et al.

Modern neural networks are often augmented with an attention mechanism, which tells the network where to focus within the input. We propose in this paper a new framework for sparse and structured attention, building upon a smoothed max operator. We show that the gradient of this operator defines a mapping from real values to probabilities, suitable as an attention mechanism. Our framework includes softmax and a slight generalization of the recently-proposed sparsemax as special cases. However, we also show how our framework can incorporate modern structured penalties, resulting in more interpretable attention mechanisms that focus on entire segments or groups of an input. We derive efficient algorithms to compute the forward and backward passes of our attention mechanisms, enabling their use in a neural network trained with backpropagation. To showcase their potential as a drop-in replacement for existing ones, we evaluate our attention mechanisms on three large-scale tasks: textual entailment, machine translation, and sentence summarization. Our attention mechanisms improve interpretability without sacrificing performance; notably, on textual entailment and summarization, we outperform the standard attention mechanisms based on softmax and sparsemax.
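The abstract names softmax and (a slight generalization of) sparsemax as special cases of the proposed smoothed max framework. As a minimal illustration only, and not the paper's general regularized operator with structured penalties, the sketch below contrasts plain softmax with the sparsemax projection of Martins and Astudillo (2016); the NumPy helper names and the example scores are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    # Dense attention: every score receives some probability mass.
    e = np.exp(z - z.max())
    return e / e.sum()

def sparsemax(z):
    # Euclidean projection of the scores onto the probability simplex
    # (Martins & Astudillo, 2016): low scores are truncated to exactly zero.
    z_sorted = np.sort(z)[::-1]                 # scores in decreasing order
    cssv = np.cumsum(z_sorted)                  # cumulative sums of sorted scores
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cssv           # which sorted entries stay positive
    k_z = k[support][-1]                        # size of the support
    tau = (cssv[support][-1] - 1.0) / k_z       # threshold
    return np.maximum(z - tau, 0.0)

# Hypothetical attention scores over four input tokens.
scores = np.array([2.0, 1.5, 0.1, -1.0])
print(softmax(scores))    # ~[0.55, 0.34, 0.08, 0.03]  (every token gets some weight)
print(sparsemax(scores))  # [0.75, 0.25, 0.  , 0.  ]   (low-scoring tokens get exactly zero)
```

In the paper's framework, both operators arise as gradients of a regularized max; adding structured penalties then yields attention weights that behave coherently over entire segments or groups of the input, which the unstructured sketch above does not capture.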


Related research

09/19/2016 · A Cheap Linear Attention Mechanism with Fast Lookups and Fixed-Size Representations
The softmax content-based attention mechanism has proven to be very bene...

10/29/2019 · Contrastive Attention Mechanism for Abstractive Sentence Summarization
We propose a contrastive attention mechanism to extend the sequence-to-s...

10/29/2018 · On Controllable Sparse Alternatives to Softmax
Converting an n-dimensional vector to a probability distribution over n ...

04/07/2020 · Salience Estimation with Multi-Attention Learning for Abstractive Text Summarization
Attention mechanism plays a dominant role in the sequence generation mod...

03/09/2017 · A Structured Self-attentive Sentence Embedding
This paper proposes a new model for extracting an interpretable sentence...

11/21/2021 · Efficient Softmax Approximation for Deep Neural Networks with Attention Mechanism
There has been a rapid advance of custom hardware (HW) for accelerating ...

11/10/2019 · Understanding Multi-Head Attention in Abstractive Summarization
Attention mechanisms in deep learning architectures have often been used...
