Pathologies in priors and inference for Bayesian transformers

10/08/2021
by Tristan Cinquin, et al.

In recent years, the transformer has established itself as a workhorse in many applications ranging from natural language processing to reinforcement learning. Similarly, Bayesian deep learning has become the gold standard for uncertainty estimation in safety-critical applications, where robustness and calibration are crucial. Surprisingly, there have been no successful attempts to improve the predictive uncertainty of transformer models using Bayesian inference. In this work, we study this curiously underpopulated area of Bayesian transformers. We find that weight-space inference in transformers does not work well, regardless of the approximate posterior. We also find that the prior is at least partially at fault, but that it is very hard to find well-specified weight priors for these models. We hypothesize that these problems stem from the complexity of obtaining a meaningful mapping from weight-space to function-space distributions in the transformer. Therefore, moving closer to function-space, we propose a novel method based on the implicit reparameterization of the Dirichlet distribution to apply variational inference directly to the attention weights. We find that this proposed method performs competitively with our baselines.
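To make the proposed direction concrete, below is a minimal sketch of an attention layer with Dirichlet-distributed attention weights trained by variational inference. The specific parameterization here (softplus-transformed dot-product scores as the posterior concentrations, and a symmetric Dirichlet prior) is an illustrative assumption, not the paper's implementation; it relies on PyTorch's Dirichlet.rsample, which implements implicit reparameterization gradients (Figurnov et al., 2018), so the ELBO can be optimized end to end.

```python
# Sketch of variational attention with Dirichlet-distributed weights.
# Assumptions (not from the paper): softplus maps scores to concentrations,
# and the prior is a symmetric Dirichlet with fixed concentration.
import torch
import torch.nn as nn
from torch.distributions import Dirichlet, kl_divergence


class DirichletAttention(nn.Module):
    def __init__(self, d_model: int, prior_concentration: float = 1.0):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5
        self.prior_concentration = prior_concentration

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        scores = torch.einsum("bqd,bkd->bqk", q, k) * self.scale
        # Map scores to positive Dirichlet concentrations (softplus is one
        # arbitrary choice; the paper may parameterize this differently).
        concentration = nn.functional.softplus(scores) + 1e-4
        posterior = Dirichlet(concentration)
        # rsample() draws attention weights on the simplex using implicit
        # reparameterization, so gradients flow through the sample.
        attn = posterior.rsample()  # (batch, q, k), rows sum to 1
        prior = Dirichlet(
            torch.full_like(concentration, self.prior_concentration)
        )
        # KL summed over query positions, averaged over the batch.
        kl = kl_divergence(posterior, prior).sum(-1).mean()
        return torch.einsum("bqk,bkd->bqd", attn, v), kl


# Usage: add the returned KL term to the negative log-likelihood to form
# the (negative) ELBO for training.
layer = DirichletAttention(d_model=64)
out, kl = layer(torch.randn(2, 10, 64))
```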


Related research

03/04/2023 · Calibrating Transformers via Sparse Gaussian Processes
Transformer models have achieved profound success in prediction tasks in...

10/16/2018 · The Deep Weight Prior: Modeling a prior distribution for CNNs using generative models
Bayesian inference is known to provide a general framework for incorpora...

09/21/2023 · Bayesian sparsification for deep neural networks with Bayesian model reduction
Deep learning's immense capabilities are often constrained by the comple...

10/23/2020 · Stabilizing Transformer-Based Action Sequence Generation For Q-Learning
Since the publication of the original Transformer architecture (Vaswani ...

10/08/2022 · Unified Probabilistic Neural Architecture and Weight Ensembling Improves Model Robustness
Robust machine learning models with accurately calibrated uncertainties ...

05/22/2023 · Neural Functional Transformers
The recent success of neural networks as implicit representation of data...

11/23/2021 · Weight Pruning and Uncertainty in Radio Galaxy Classification
In this work we use variational inference to quantify the degree of epis...
