Calibrating Transformers via Sparse Gaussian Processes

03/04/2023
by Wenlong Chen, et al.

Transformer models have achieved profound success in prediction tasks across a wide range of applications in natural language processing, speech recognition, and computer vision. Extending this success to safety-critical domains requires calibrated uncertainty estimation, which remains under-explored. To address this, we propose Sparse Gaussian Process attention (SGPA), which performs Bayesian inference directly in the output space of the multi-head attention blocks (MHAs) in a Transformer to calibrate its uncertainty. It replaces the scaled dot-product operation with a valid symmetric kernel and uses sparse Gaussian process (SGP) techniques to approximate the posterior processes of the MHA outputs. Empirically, on a suite of prediction tasks on text, images, and graphs, SGPA-based Transformers achieve competitive predictive accuracy while noticeably improving both in-distribution calibration and out-of-distribution robustness and detection.
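As a rough illustration of the idea described above (a minimal sketch, not the authors' implementation), the snippet below replaces the scaled dot product of a single attention head with a symmetric RBF kernel and treats the keys as sparse-GP inducing inputs and the values as inducing outputs, so each attention output comes with a posterior mean and a per-query variance. The kernel choice, jitter value, and all function and variable names here are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def sgpa_attention(Q, K, V, jitter=1e-2, lengthscale=1.0):
    """Sketch of sparse-GP-style attention for one head.

    Keys play the role of inducing inputs and values the role of
    inducing outputs; a symmetric, positive-definite RBF kernel
    replaces the scaled dot product. Returns the GP posterior mean
    (the analogue of attn(Q, K, V)) and a per-query variance.
    """
    def rbf(A, B):
        # squared-exponential kernel k(a, b) = exp(-||a - b||^2 / (2 l^2))
        sq = np.sum(A**2, -1, keepdims=True) - 2 * A @ B.T + np.sum(B**2, -1)
        return np.exp(-0.5 * sq / lengthscale**2)

    Kqk = rbf(Q, K)                                  # (n_q, n_k) cross-covariance
    Kkk = rbf(K, K) + jitter * np.eye(K.shape[0])    # inducing covariance + jitter
    A = np.linalg.solve(Kkk, Kqk.T).T                # Kqk @ Kkk^{-1}
    mean = A @ V                                     # posterior mean over outputs
    var = 1.0 - np.sum(A * Kqk, axis=-1)             # posterior variance; k(q, q) = 1
    return mean, var

# toy usage: 5 queries, 7 key/value pairs, head dimension 8
rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 8))
K = rng.standard_normal((7, 8))
V = rng.standard_normal((7, 8))
mean, var = sgpa_attention(Q, K, V)
print(mean.shape, var.shape)  # (5, 8) (5,)
```

The returned variance is what a standard softmax attention layer has no counterpart for; it is this extra quantity that a calibrated, SGP-based attention block can propagate to downstream predictions.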


