Bayesian Attention Modules

10/20/2020
by Xinjie Fan, et al.

Attention modules, as simple and effective tools, have not only enabled deep neural networks to achieve state-of-the-art results in many domains, but also enhanced their interpretability. Most current models use deterministic attention modules due to their simplicity and ease of optimization. Stochastic counterparts, on the other hand, are less popular despite their potential benefits. The main reason is that stochastic attention often introduces optimization issues or requires significant model changes. In this paper, we propose a scalable stochastic version of attention that is easy to implement and optimize. We construct simplex-constrained attention distributions by normalizing reparameterizable distributions, making the training process differentiable. We learn their parameters in a Bayesian framework where a data-dependent prior is introduced for regularization. We apply the proposed stochastic attention modules to various attention-based models, with applications to graph node classification, visual question answering, image captioning, machine translation, and language understanding. Our experiments show the proposed method brings consistent improvements over the corresponding baselines.
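The abstract's core recipe, sampling reparameterizable unnormalized weights and renormalizing them onto the simplex, can be sketched in a few lines. The snippet below is an illustrative PyTorch sketch only, assuming a Lognormal parameterization with a fixed scale sigma; the function name bayesian_attention and all hyperparameter choices are placeholders rather than the authors' released implementation.

import torch

def bayesian_attention(query, key, value, sigma=0.5, training=True):
    """Sketch of simplex-constrained stochastic attention.

    Unnormalized weights are drawn from a reparameterizable Lognormal
    whose location is the usual scaled dot-product score; normalizing
    the samples over the key dimension keeps every attention vector on
    the simplex while the sampling stays differentiable.
    """
    d_k = query.size(-1)
    scores = query @ key.transpose(-2, -1) / d_k ** 0.5   # (..., L_q, L_k)

    if training:
        # Reparameterization trick: exp(score + sigma * eps), eps ~ N(0, I)
        eps = torch.randn_like(scores)
        unnormalized = torch.exp(scores + sigma * eps)
    else:
        # Fall back to the deterministic exponentiated scores at test time
        unnormalized = torch.exp(scores)

    # Normalize across keys so each attention vector lies on the simplex
    attention = unnormalized / unnormalized.sum(dim=-1, keepdim=True)
    return attention @ value, attention

In the Bayesian framework described above, such a sampler would additionally be trained with a KL term pulling the sampled weights toward a data-dependent prior constructed from the inputs, which provides the regularization the abstract mentions; the exact prior family and KL weighting are design choices detailed in the paper itself.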


Related research

06/09/2021 - Bayesian Attention Belief Networks
Attention-based neural networks have achieved state-of-the-art results o...

10/25/2021 - Alignment Attention by Matching Key and Query Distributions
The neural attention mechanism has been incorporated into deep neural ne...

03/19/2020 - Normalized and Geometry-Aware Self-Attention Network for Image Captioning
Self-attention (SA) network has shown profound value in image captioning...

03/06/2021 - Contextual Dropout: An Efficient Sample-Dependent Dropout Module
Dropout has been demonstrated as a simple and effective module to not on...

10/19/2022 - Prophet Attention: Predicting Attention with Future Attention for Improved Image Captioning
Recently, attention based models have been used extensively in many sequ...

10/30/2018 - Gated Hierarchical Attention for Image Captioning
Attention modules connecting encoder and decoders have been widely appli...

11/22/2019 - Optimizing Data Usage via Differentiable Rewards
To acquire a new skill, humans learn better and faster if a tutor, based...
