Bayesian Attention Belief Networks

06/09/2021
by Shujian Zhang et al.

Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks. Most such models use deterministic attention, while stochastic attention remains less explored due to optimization difficulties or complicated model design. This paper introduces Bayesian attention belief networks, which construct a decoder network by modeling unnormalized attention weights with a hierarchy of gamma distributions, and an encoder network by stacking Weibull distributions in a deterministic-upward-stochastic-downward structure to approximate the posterior. The resulting auto-encoding networks can be optimized in a differentiable way with a variational lower bound. It is simple to convert any model with deterministic attention, including pretrained ones, into the proposed Bayesian attention belief networks. On a variety of language understanding tasks, we show that our method outperforms deterministic attention and state-of-the-art stochastic attention in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks. We further demonstrate the general applicability of our method on neural machine translation and visual question answering, showing the great potential of incorporating our method into various attention-related tasks.
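The key idea above is that attention weights are drawn from a reparameterizable distribution rather than computed deterministically, so gradients can flow through the sampling step. The following is a minimal NumPy sketch of that idea, not the authors' implementation: it draws unnormalized weights from a Weibull distribution via inverse-CDF reparameterization and normalizes across keys. Mapping `exp(scores)` to the Weibull scale and the shape value `k=10.0` are illustrative assumptions, not details from the paper.

```python
import numpy as np

def sample_weibull(k, lam, rng):
    # Reparameterized Weibull draw: w = lam * (-log(1 - u))**(1/k), u ~ Uniform(0, 1).
    # The sample is a differentiable function of the scale lam, which is what
    # makes variational training of stochastic attention tractable.
    u = rng.uniform(size=lam.shape)
    return lam * (-np.log1p(-u)) ** (1.0 / k)

def stochastic_attention(scores, k=10.0, rng=None):
    # Treat shifted exp(scores) as Weibull scales (illustrative choice),
    # sample unnormalized weights, then normalize over the key axis.
    rng = np.random.default_rng(0) if rng is None else rng
    lam = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = sample_weibull(k, lam, rng)
    return w / w.sum(axis=-1, keepdims=True)
```

At large shape `k` the Weibull concentrates near its scale, so the sampled weights approach the deterministic softmax attention they replace; smaller `k` injects more stochasticity.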


Related research

- 10/25/2021: Alignment Attention by Matching Key and Query Distributions. The neural attention mechanism has been incorporated into deep neural ne...
- 10/20/2020: Bayesian Attention Modules. Attention modules, as simple and effective tools, have not only enabled ...
- 06/17/2021: Evaluating the Robustness of Bayesian Neural Networks Against Different Types of Attacks. To evaluate the robustness gain of Bayesian neural networks on image cla...
- 08/03/2019: Invariance-based Adversarial Attack on Neural Machine Translation Systems. Recently, NLP models have been shown to be susceptible to adversarial at...
- 02/03/2017: Structured Attention Networks. Attention networks have proven to be an effective approach for embedding...
- 02/10/2015: Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Inspired by recent work in machine translation and object detection, we ...
- 02/19/2023: Stochastic Generative Flow Networks. Generative Flow Networks (or GFlowNets for short) are a family of probab...
