Functional Ensemble Distillation

06/05/2022
by Coby Penso, et al.

Bayesian models have many desirable properties, most notably their ability to generalize from limited data and to properly estimate the uncertainty in their predictions. However, these benefits come at a steep computational cost, as Bayesian inference is in most cases computationally intractable. One popular way to alleviate this problem is to use Monte-Carlo estimation with an ensemble of models sampled from the posterior. However, this approach still carries a significant computational cost, as one needs to store and run multiple models at test time. In this work, we investigate how to best distill an ensemble's predictions using an efficient model. First, we argue that current approaches that simply return a distribution over predictions cannot compute important properties, such as the covariance between predictions, which can be valuable for further processing. Second, in many limited-data settings, all ensemble members achieve nearly zero training loss; that is, they produce near-identical predictions on the training set, which results in sub-optimal distilled models. To address both problems, we propose a novel and general distillation approach, named Functional Ensemble Distillation (FED), and we investigate how to best distill an ensemble in this setting. We find that training the distilled model with a simple augmentation scheme, in the form of mixup augmentation, significantly boosts performance. We evaluate our method on several tasks and show that it achieves superior results in both accuracy and uncertainty estimation compared to current approaches.
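To make the mixup-based distillation idea concrete, below is a minimal sketch, assuming a PyTorch setup, of one distillation step in which the ensemble's averaged predictive distribution on mixup-augmented inputs serves as the soft target for a single student network. This is only an illustration of the augmentation scheme, not the paper's full FED procedure (which distills into a model that produces functional samples so that properties like prediction covariance are preserved); the names `student`, `ensemble`, `distill_step`, and the `alpha` parameter are hypothetical.

```python
import torch
import torch.nn.functional as F

def mixup(x, alpha=1.0):
    """Standard mixup: blend each batch element with a randomly permuted partner."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    return lam * x + (1.0 - lam) * x[perm]

def distill_step(student, ensemble, x, optimizer):
    """One distillation step on mixup-augmented inputs (illustrative setup only).

    The ensemble's mean softmax on the mixed inputs is used as the soft target,
    and the student is trained to match it with a KL-divergence objective.
    """
    x_mix = mixup(x)
    with torch.no_grad():
        # Average the ensemble members' predictive distributions on the mixed inputs.
        target_probs = torch.stack(
            [F.softmax(member(x_mix), dim=-1) for member in ensemble]
        ).mean(dim=0)
    student_log_probs = F.log_softmax(student(x_mix), dim=-1)
    loss = F.kl_div(student_log_probs, target_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the mixed inputs lie off the training manifold, the ensemble members disagree on them even when they agree (with near-zero loss) on the original training points, which is what gives the student a richer distillation signal.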


