Scaling Ensemble Distribution Distillation to Many Classes with Proxy Targets

05/14/2021
by Max Ryabinin, et al.

Ensembles of machine learning models yield improved system performance as well as robust and interpretable uncertainty estimates; however, their inference costs can be prohibitively high. Ensemble Distribution Distillation is an approach that allows a single model to efficiently capture both the predictive performance and uncertainty estimates of an ensemble. For classification, this is achieved by training a Dirichlet distribution over the ensemble members' output distributions via the maximum likelihood criterion. Although theoretically principled, this criterion exhibits poor convergence when applied to large-scale tasks where the number of classes is very high. In our work, we analyze this effect and show that under the Dirichlet log-likelihood criterion, classes with low probability induce larger gradients than high-probability classes. This forces the model to focus on the distribution of the ensemble's tail-class probabilities. We propose a new training objective that minimizes the reverse KL-divergence to a Proxy-Dirichlet target derived from the ensemble. This loss resolves the gradient issues of Ensemble Distribution Distillation, as we demonstrate both theoretically and empirically on the ImageNet and WMT17 En-De datasets, containing 1,000 and 40,000 classes, respectively.
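As a rough illustration of the proposed objective (not the authors' released code), the sketch below computes a reverse KL-divergence between the student's Dirichlet and a Proxy-Dirichlet target built from the mean ensemble prediction. The `target_precision` hyperparameter is a stand-in assumption: the paper derives the target precision from the ensemble itself, and the `+1` smoothing of the target concentrations follows the Proxy-Dirichlet construction described in the abstract's setting.

```python
import torch
from torch.distributions import Dirichlet, kl_divergence

def proxy_dirichlet_rkl_loss(student_logits, ensemble_probs, target_precision=100.0):
    """Reverse KL to a Proxy-Dirichlet target (illustrative sketch).

    student_logits:  (batch, num_classes) student outputs; exp() gives Dirichlet concentrations
    ensemble_probs:  (batch, num_members, num_classes) softmax outputs of the ensemble members
    target_precision: fixed scalar used here in place of the paper's estimated precision
    """
    # Mean ensemble prediction defines the mean of the proxy target.
    pi_bar = ensemble_probs.mean(dim=1)                       # (batch, num_classes)
    # Proxy concentrations; adding 1 keeps every alpha_k >= 1 so the target stays well-behaved.
    alpha_target = 1.0 + target_precision * pi_bar
    # Student concentrations (exp keeps them strictly positive).
    alpha_student = student_logits.exp()
    # Reverse KL: the student's Dirichlet is the first argument, the proxy target the second.
    return kl_divergence(Dirichlet(alpha_student), Dirichlet(alpha_target)).mean()
```

Placing the student distribution first (reverse KL) is what avoids the per-class gradient imbalance of the maximum-likelihood criterion, since the expectation is taken under the student's own Dirichlet rather than under the ensemble's tail-heavy targets.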
