Logit-Based Ensemble Distribution Distillation for Robust Autoregressive Sequence Uncertainties

05/17/2023
by Yassir Fathullah, et al.

Efficiently and reliably estimating uncertainty is an important objective in deep learning. It is especially pertinent to autoregressive sequence tasks, where training and inference costs are typically very high. However, existing research has predominantly focused on tasks with static data, such as image classification. In this work, we investigate Ensemble Distribution Distillation (EDD) applied to large-scale natural language sequence-to-sequence data. EDD aims to compress the superior uncertainty performance of an expensive (teacher) ensemble into a cheaper (student) single model. Importantly, the ability to separate knowledge (epistemic) and data (aleatoric) uncertainty is retained. Existing probability-space approaches to EDD, however, are difficult to scale to large vocabularies. We show, for modern transformer architectures on large-scale translation tasks, that modelling the ensemble logits, instead of softmax probabilities, leads to significantly better students. Moreover, the students surprisingly even outperform Deep Ensembles by up to ~10% at out-of-distribution detection, whilst matching them at in-distribution translation.
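The abstract names two ingredients that a short sketch can make concrete: the standard entropy-based decomposition of an ensemble's total uncertainty into data (aleatoric) and knowledge (epistemic) components, and a distillation loss defined in logit space rather than probability space. The sketch below is illustrative only; in particular, the choice of a per-logit Gaussian (student predicts a mean and log-variance for each vocabulary logit) is an assumed parameterisation, not the paper's exact objective.

```python
import numpy as np

def ensemble_uncertainty(member_probs):
    """Decompose an ensemble's predictive uncertainty at one token position.

    member_probs: array of shape (M, V) -- each row is one ensemble member's
    softmax distribution over a vocabulary of size V.
    Returns (total, aleatoric, epistemic) where
      total     = entropy of the averaged prediction,
      aleatoric = average entropy of the individual members (data uncertainty),
      epistemic = total - aleatoric, i.e. the members' mutual information
                  with the model posterior (knowledge uncertainty).
    """
    eps = 1e-12
    mean_p = member_probs.mean(axis=0)
    total = -np.sum(mean_p * np.log(mean_p + eps))
    aleatoric = -np.sum(member_probs * np.log(member_probs + eps), axis=1).mean()
    return total, aleatoric, total - aleatoric

def gaussian_logit_nll(student_mean, student_log_var, member_logits):
    """Toy logit-space distillation loss (assumed form, for illustration).

    The student predicts a diagonal Gaussian over the ensemble's logits;
    the loss is the Gaussian negative log-likelihood of each member's
    logits, averaged over members and vocabulary entries (constants dropped).
    member_logits: shape (M, V); student_mean, student_log_var: shape (V,).
    """
    var = np.exp(student_log_var)
    nll = 0.5 * (student_log_var + (member_logits - student_mean) ** 2 / var)
    return nll.mean()
```

For agreeing members the epistemic term is near zero; members that disagree inflate the entropy of the averaged prediction but not the average member entropy, so the epistemic term grows, which is what makes ensemble-derived uncertainties useful for out-of-distribution detection.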

Related research:

- Self-Distribution Distillation: Efficient Uncertainty Estimation (03/15/2022) — Deep learning is increasingly being applied in safety-critical domains. ...
- A general framework for ensemble distribution distillation (02/26/2020) — Ensembles of neural networks have been shown to give better performance ...
- Scaling Ensemble Distribution Distillation to Many Classes with Proxy Targets (05/14/2021) — Ensembles of machine learning models yield improved system performance a...
- Uncertainty in Structured Prediction (02/18/2020) — Uncertainty estimation is important for ensuring safety and robustness o...
- Improving Ensemble Distillation With Weight Averaging and Diversifying Perturbation (06/30/2022) — Ensembles of deep neural networks have demonstrated superior performance...
- DUDES: Deep Uncertainty Distillation using Ensembles for Semantic Segmentation (03/17/2023) — Deep neural networks lack interpretability and tend to be overconfident,...
- Functional Ensemble Distillation (06/05/2022) — Bayesian models have many desirable properties, most notable is their ab...
