Deep neural networks have achieved impressive performance; however, they tend to make over-confident predictions and poorly quantify uncertainty (Lakshminarayanan et al., 2017). It has been demonstrated that ensembles of models improve predictive performance and offer higher-quality uncertainty quantification (Dietterich, 2000; Lakshminarayanan et al., 2017; Ovadia et al., 2019). A fundamental limitation of ensembles is the cost of computation and memory at evaluation time. A popular solution is to distill an ensemble of models into a single compact network by attempting to match the average predictions of the original ensemble. This idea goes back to the foundational work of Hinton et al. (2015), itself inspired by earlier ideas developed by Buciluǎ et al. (2006). While this process has led to simple and well-performing algorithms, it fails to take into account the intrinsic diversity of the predictions of the ensemble, as represented by the individual predictions of each of its members. This diversity is all the more important in tasks that hinge on the uncertainty output of the ensemble, e.g., in out-of-distribution scenarios (Lakshminarayanan et al., 2017; Ovadia et al., 2019). Similarly, by losing the diversity of the ensemble, this simple form of distillation makes it impossible to estimate measures of uncertainty such as model uncertainty (Depeweg et al., 2017; Malinin et al., 2019). Proper uncertainty quantification is especially crucial for safety-related tasks and applications. To overcome this limitation, Malinin et al. (2019) proposed to model the entire distribution of an ensemble using a Dirichlet distribution parametrized by a neural network, referred to as a prior network (Malinin and Gales, 2018). However, this imposes a strong parametric assumption on the distillation process.
Inspired by multi-headed architectures already widely applied in various applications (Szegedy et al., 2015; Sercu et al., 2016; Osband et al., 2016; Song and Chai, 2018), we propose a multi-headed model to distill ensembles. Our multi-headed approach, which we name Hydra, can be seen as an interpolation between the full ensemble of models and the knowledge distillation proposed by Hinton et al. (2015). Our distillation model is composed of (1) a single body and (2) as many heads as there are members in the original ensemble. Each head is assigned to an ensemble member and tries to mimic the individual predictions of that ensemble member, as illustrated in Figure 1. The heads share the same body network, whose role is to provide a common feature representation. The design of the body and the heads makes it possible to trade off computational and memory efficiency against the fidelity with which the diversity of the ensemble is retained. An illustration of common knowledge distillation and ensemble distillation as well as Hydra is shown in Figure 1, and a detailed description of the methodology is found in Section 2. While the choice of body and head architectures may appear to introduce complex new hyperparameters, we will see in the experiments that we get good results by simply taking all but the last layer of the original ensemble members for the body and duplicating the last layer for the heads.
Summary of contributions. Firstly, we present a multi-headed approach for ensemble knowledge distillation. The shared component keeps the model computationally and memory efficient, while diversity is captured through the heads matching the individual ensemble members. Secondly, we show through experimental evaluation that Hydra outperforms existing distillation methods for both classification and regression tasks with respect to predictive test performance. Lastly, we investigate Hydra's behaviour on in-domain and out-of-distribution data and demonstrate that Hydra comes closest to the ensemble behaviour in comparison to existing distillation methods.
Novelty and significance. Ensembles of models have successfully improved predictive performance and yielded robust measures of uncertainty. However, existing distillation methods either do not retain the diversity of the ensemble (beyond its average predictive behavior) or make strong parametric assumptions that are not applicable in regression settings. To the best of our knowledge, our approach is the first to employ a multi-headed architecture in the context of ensemble distillation. It is simple to implement, does not make strong parametric assumptions, requires few modifications to the distilled ensemble model and works well in practice, making it attractive for a wide range of ensemble models and tasks.
2 Hydra: A Multi-Headed Approach
With a focus on offline distillation, our goal is to train a student network to match the predictive distribution of the teacher model, which is an ensemble of (deep) neural networks. Formally, given a dataset $\mathcal{D} = \{(x_n, y_n)\}_{n=1}^{N}$, we consider an ensemble of $M$ models with parameters $\{\theta_m\}_{m=1}^{M}$ and prediction outputs $\{p(y \mid x, \theta_m)\}_{m=1}^{M}$. For simplicity, a single data instance pair will be referred to as $(x, y)$.
In (Hinton et al., 2015; Balan et al., 2015), distilling an ensemble of models into a single neural network is achieved by minimizing the Kullback-Leibler (KL) divergence between the student's predictive distribution and the expected predictive distribution of the ensemble:

$$\min_{\theta}\; \mathbb{E}_{(x,y) \sim \mathcal{D}}\left[\mathrm{KL}\left(\frac{1}{M} \sum_{m=1}^{M} p(y \mid x, \theta_m)\;\Big\Vert\; p(y \mid x, \theta)\right)\right]. \tag{1}$$
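This averaging-then-matching step can be sketched numerically. The following minimal Python sketch (the function names `kl` and `kd_loss` are ours, for illustration only) averages the ensemble members' categorical predictions and penalizes the student by the KL divergence to that average:

```python
import math

def kl(p, q):
    # KL(p || q) for two categorical distributions given as probability lists.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def kd_loss(ensemble_probs, student_probs):
    # Knowledge-distillation objective: average the M members' predictive
    # distributions, then measure KL(ensemble mean || student).
    M = len(ensemble_probs)
    K = len(student_probs)
    mean = [sum(p[k] for p in ensemble_probs) / M for k in range(K)]
    return kl(mean, student_probs)
```

A student that reproduces the ensemble mean exactly attains zero loss, regardless of how much the individual members disagree, which is precisely the diversity loss discussed above.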
Hydra builds upon the approach of knowledge distillation and extends it to a multi-headed student model. Hydra is defined as a (deep) neural network with a single body and $M$ heads. For distillation, Hydra has as many heads as there are ensemble members. The distillation model is parametrized by $\theta = \{\theta_{\text{body}}, \theta_{\text{head}_1}, \ldots, \theta_{\text{head}_M}\}$, i.e., the body $\theta_{\text{body}}$ is shared among all heads $\theta_{\text{head}_m}$.
In terms of the number of parameters, we assume the heads to be much lighter than the shared part, so that the distillation is still meaningful. In practice, we use all but the final layer(s) of the original ensemble member architecture as the body and the original final layer(s) as each head. The objective is to minimize the average KL divergence between each head $m$ and the corresponding ensemble member $m$. We differentiate between two tasks, classification and regression.
For classification tasks, the ensemble of models has access to $\mathcal{D}$ during training, with each $x$ belonging to one of $K$ classes, i.e., $y \in \{1, \ldots, K\}$. Given logits $z_k(x)$, the categorical distribution over class labels for a sample $x$ and a class $k$ is computed as:

$$p_T(y = k \mid x, \theta) = \frac{\exp(z_k(x)/T)}{\sum_{j=1}^{K} \exp(z_j(x)/T)}, \tag{2}$$
where $T$ is a temperature re-scaling the logits. As discussed in (Hinton et al., 2015; Malinin et al., 2019), the distribution of the teacher network is often "sharp", which can limit the common support between the output distribution of the model and the target empirical distribution. Minimizing the KL divergence between distributions with limited non-zero common support is known to be particularly difficult. To alleviate this issue, we follow the common practice (Hinton et al., 2015; Song and Chai, 2018; Lan et al., 2018) of using the temperature to "heat up" both distributions and increase common support during training. At evaluation, $T$ is set to $1$. The soft probability distributions at a temperature of $T$ are used to match the teacher ensemble of models by minimizing the average KL divergence between each head $m$ and ensemble member $m$:

$$\min_{\theta}\; \frac{1}{M} \sum_{m=1}^{M} \mathrm{KL}\big(p_T(y \mid x, \theta_m)\;\Vert\; p_T(y \mid x, \theta_{\text{body}}, \theta_{\text{head}_m})\big). \tag{3}$$
Compared to the objective of knowledge distillation (1), we can observe that the average over the ensemble members is pulled out of the KL divergence. Ignoring the constant entropy terms, this objective reduces to a standard cross-entropy loss:

$$\min_{\theta}\; -\frac{1}{M} \sum_{m=1}^{M} \sum_{k=1}^{K} p_T(y = k \mid x, \theta_m) \log p_T(y = k \mid x, \theta_{\text{body}}, \theta_{\text{head}_m}).$$
We scale our objective by $T^2$, as the gradient magnitudes produced by the soft targets are scaled by $1/T^2$. By multiplying the loss term by a factor of $T^2$, we ensure that the relative contributions of additional regularization losses remain roughly unchanged (Song and Chai, 2018; Lan et al., 2018).
We focus on heteroscedastic regression tasks where each ensemble member $m$ outputs a mean $\mu_m(x)$ and a variance $\sigma_m^2(x)$ given an input $x$. (In our concrete implementation, our neural network outputs the mean and the log standard deviation, which we thereafter exponentiate.) The output is modeled as $\mathcal{N}(y \mid \mu_m(x), \sigma_m^2(x))$ for a given head, and the ensemble of models is trained by minimizing the negative log-likelihood. Traditional knowledge distillation matches a single Gaussian ("student") outputting $\mu(x)$ and $\sigma^2(x)$ to a mixture of Gaussians (a "teacher" ensemble):

$$\min_{\theta}\; \mathrm{KL}\left(\frac{1}{M} \sum_{m=1}^{M} \mathcal{N}\big(y \mid \mu_m(x), \sigma_m^2(x)\big)\;\Big\Vert\; \mathcal{N}\big(y \mid \mu(x), \sigma^2(x)\big)\right).$$
With Hydra, each head $m$ outputs a mean $\mu_{\text{head}_m}(x)$ and a variance $\sigma_{\text{head}_m}^2(x)$, and we optimize the KL divergence between each head output and the corresponding ensemble member output:

$$\min_{\theta}\; \frac{1}{M} \sum_{m=1}^{M} \mathrm{KL}\big(\mathcal{N}(y \mid \mu_m(x), \sigma_m^2(x))\;\Vert\; \mathcal{N}(y \mid \mu_{\text{head}_m}(x), \sigma_{\text{head}_m}^2(x))\big)$$
$$=\; \min_{\theta}\; \frac{1}{M} \sum_{m=1}^{M} \left[\log \frac{\sigma_{\text{head}_m}(x)}{\sigma_m(x)} + \frac{\sigma_m^2(x) + \big(\mu_m(x) - \mu_{\text{head}_m}(x)\big)^2}{2\,\sigma_{\text{head}_m}^2(x)} - \frac{1}{2}\right],$$

where the final line uses the fact that each KL term between two Gaussians has an analytical solution.
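The analytical Gaussian KL term and the resulting regression objective can be written compactly (a self-contained sketch; function names are ours, and distributions are passed as plain `(mean, variance)` pairs):

```python
import math

def kl_gauss(mu1, var1, mu2, var2):
    # Closed-form KL( N(mu1, var1) || N(mu2, var2) ):
    # log(s2/s1) + (var1 + (mu1 - mu2)^2) / (2 var2) - 1/2.
    return 0.5 * (math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def hydra_regression_loss(teacher, heads):
    # teacher, heads: lists of (mean, variance) pairs, one per
    # ensemble member and its assigned head.
    return sum(kl_gauss(mt, vt, mh, vh)
               for (mt, vt), (mh, vh) in zip(teacher, heads)) / len(teacher)
```

The loss is zero exactly when every head reproduces its member's mean and variance, so the individual predictive distributions of the ensemble are preserved rather than averaged away.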
Training with multi-head growth.
Hydra is trained in two phases. In the first phase, Hydra mimics knowledge distillation in that it is trained until convergence with a single head, the "Hinton head", to match the average predictions of the ensemble. Hydra is then extended to $M$ heads, all of which are initialized with the parameter values of the "Hinton head". The resulting heads are finally trained further to match the individual predictions of the ensemble members (according to objective (3)). In practice, we sometimes experienced difficulties getting Hydra to converge in the absence of this initialization scheme, and in the cases where different initializations worked, this two-phase training scheme typically led to overall quicker convergence.
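The head-growth step of the second phase amounts to replicating the converged single head once per ensemble member. A schematic sketch (names are illustrative; real heads would be network layers, represented here as plain parameter dicts):

```python
import copy

def grow_heads(shared_body, hinton_head, num_members):
    # Phase 2: clone the converged "Hinton head" once per ensemble member.
    # Deep copies ensure each head can then be fine-tuned independently
    # against its assigned ensemble member, while the body stays shared.
    return {"body": shared_body,
            "heads": [copy.deepcopy(hinton_head) for _ in range(num_members)]}
```

Because every head starts from the same average-matching solution, subsequent per-member fine-tuning only has to learn each member's deviation from the ensemble mean.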
3 Experiments

In this section, we demonstrate that Hydra not only best matches the behavior of the teacher ensemble in terms of uncertainty quantification but also improves predictive performance compared to existing distillation approaches, over both classification and regression tasks.
Datasets. For visualizing and explaining model uncertainty in Subsection 3.1, we used a spiral toy dataset. For classification, we used two datasets: MNIST and CIFAR-10. For evaluating on MNIST, we use its test set as well as increasingly shifted data (increasingly rotated or horizontally translated images) and Fashion-MNIST. For CIFAR-10, we report performance on the test set, the cyclically translated test set, and 80 different corrupted test sets, as well as SVHN. The pre-processing for MNIST and CIFAR-10, as well as the generation schemes of the corrupted images, are taken from Ovadia et al. (2019). For regression, we conducted experiments on the standard regression datasets from the UCI repository (Asuncion and Newman, 2007), following the protocol of Bui et al. (2016).
For the toy dataset, we trained an ensemble of 10 models, each of which is a multi-layer perceptron (MLP) with two hidden layers of 100 units each. For MNIST and CIFAR-10, we trained ensembles of 50 models each, using an MLP architecture for MNIST and the ResNet-20 V1 architecture for CIFAR-10. For additional details on the MNIST and CIFAR-10 models, we refer to (Ovadia et al., 2019) and their open-source code (https://github.com/google-research/google-research/tree/master/uq_benchmark_2019). For all regression tasks, the same model is optimized: an MLP with a single hidden layer of 50 units with softplus activation for each dataset, except for the larger protein structure dataset (prot), where 100 units were used (following Bui et al. (2016)).
Distillation setup. We compare our work with two core distillation approaches, knowledge distillation (Hinton et al., 2015) and prior networks (Malinin et al., 2019; Malinin and Gales, 2018). All baseline models have the same architecture as the ensemble members. For MNIST, Hydra uses the original ensemble member architecture as the body and adds an MLP with two hidden layers of 100 units each as each head. For CIFAR-10, the original ResNet-20 V1 model without the last residual block was used as the body. The final distillation model reported here has one residual block per head.
We report all evaluation metrics on the test set based on the best validation loss from training. For both classification and regression, we evaluate the negative log-likelihood (NLL), which depends on the predictive uncertainty, as well as model uncertainty (MU). NLL is a proper scoring rule and a popular metric for evaluating predictive uncertainty (Ovadia et al., 2019). Model uncertainty, introduced by Depeweg et al. (2017); Malinin et al. (2019), is a measure of the spread or disagreement of an ensemble based on mutual information. A detailed description of MU and an exemplary visualization can be found in Subsection 3.1. For classification, we additionally measure classification accuracy and the Brier score. NLL has the disadvantage of over-emphasizing tail probabilities (Quinonero-Candela et al., 2005). In contrast, the Brier score, a proper scoring rule that takes into account both calibration and accuracy, is not as strongly skewed as NLL (Gneiting and Raftery, 2007). For a given input-output pair $(x, y)$, the Brier score is defined as

$$\mathrm{BS}(x, y) = \sum_{k=1}^{K} p(y = k \mid x)^2 - 2\, p(y \mid x),$$

which equals the classical squared-error form $\sum_{k} \big(p(y = k \mid x) - \mathbb{1}[y = k]\big)^2$ up to an additive constant of $-1$, so that a perfect prediction attains the minimum value of $-1$.
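The Brier score is straightforward to compute from the predicted class probabilities. A minimal sketch (the function name is ours; we assume the constant-shifted convention under which a perfect prediction scores $-1$, consistent with the negative values reported in the tables):

```python
def brier_score(probs, label):
    # Shifted multi-class Brier score: sum_k p_k^2 - 2 * p_label.
    # Equals the classical sum_k (p_k - 1[y = k])^2 minus 1, so lower is
    # better and a perfect one-hot prediction scores exactly -1.
    return sum(p * p for p in probs) - 2.0 * probs[label]
```

The constant shift does not affect model comparisons, since it is the same for every prediction.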
3.1 Uncertainty Quantification
We assess Hydra's ability to distill uncertainty metrics from an ensemble on classification tasks with MNIST and CIFAR-10. One way to quantify uncertainty is through model uncertainty (Depeweg et al., 2017; Malinin et al., 2019), which measures the spread or disagreement of an ensemble. Model uncertainty estimates the mutual information between the categorical label $y$ and the model parameters $\theta$. It can be expressed as the difference of the total uncertainty and the expected data uncertainty, where total uncertainty is the entropy of the expected predictive distribution and expected data uncertainty is the expected entropy of the individual predictive distributions:

$$\mathrm{MU} = \underbrace{\mathcal{H}\big[\mathbb{E}_{\theta}\, p(y \mid x, \theta)\big]}_{\text{total uncertainty}} \;-\; \underbrace{\mathbb{E}_{\theta}\,\mathcal{H}\big[p(y \mid x, \theta)\big]}_{\text{expected data uncertainty}}.$$
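This decomposition can be computed directly from the per-member predictive distributions, whether they come from the original ensemble or from Hydra's heads. A minimal Python sketch (function names ours):

```python
import math

def entropy(p):
    # Shannon entropy of a categorical distribution, in nats.
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def model_uncertainty(member_probs):
    # MU = H[ mean_m p_m ]  -  mean_m H[ p_m ]
    #    = total uncertainty - expected data uncertainty.
    M = len(member_probs)
    K = len(member_probs[0])
    mean = [sum(p[k] for p in member_probs) / M for k in range(K)]
    total = entropy(mean)
    expected_data = sum(entropy(p) for p in member_probs) / M
    return total - expected_data
```

When all members agree, MU is zero even if each member is individually uncertain; MU is largest when confident members disagree, e.g., two members predicting opposite classes with certainty yield MU = log 2.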
Table 1 (excerpt): average absolute difference in model uncertainty between each distillation method and the ensemble.

| Dataset | Knowledge distillation (Hinton et al., 2015) | Prior Networks (Malinin et al., 2019) | Hydra |
| --- | --- | --- | --- |
| CIFAR-10 (cyclic translation) | N/A | 0.1330 | 0.1646 |
The total uncertainty will be high whenever the model is uncertain, both in regions of severe class overlap and out-of-domain. However, for out-of-distribution data the estimates of expected data uncertainty are poor, resulting in high model uncertainty. Figure 2 visualizes the model uncertainty and its decomposition for the spiral toy dataset, with Subfigures 2(a) and 2(b) showing results for the ensemble and Hydra, respectively. The results show that Hydra successfully distills model uncertainty and its decomposition, though with a slight decrease in scale. As expected, we observe low model uncertainty where classes overlap, due to both high total uncertainty and high expected data uncertainty, and high model uncertainty at the border between in-domain and out-of-distribution data.
Further, we evaluate Hydra with respect to in-domain and out-of-distribution behaviour for models trained on MNIST and CIFAR-10. We report the average absolute difference in model uncertainty as a measure of how closely each distillation method matches the ensemble in Table 1. Hydra outperforms the baselines on the MNIST test set and shifted versions of MNIST; however, Hydra performs worse than Prior Networks on several CIFAR-10 datasets. Looking closer at one of the worse results, CIFAR-10 (cyclic translation), we plotted all evaluation metrics against the intensity of the skew. In order to rule out that the worse performance is related to model capacity, we added results from Hydra trained with a larger head of three residual blocks. As visualized in Figure 3, Hydra matches the behaviour of the ensemble best in terms of accuracy, Brier score and NLL. For model uncertainty, Hydra with the larger head configuration seems to improve overall performance compared to Prior Networks. Inspecting the decomposition of model uncertainty into total uncertainty and expected data uncertainty, Hydra appears more capable of mimicking the behaviour of the ensemble. However, the uncertainty scales of Prior Networks are larger than those of both the ensemble and Hydra, leading to an overall better model uncertainty match.
3.2 Test Performance on Common Classification and Regression Benchmarks
Classification Performance on MNIST and CIFAR-10.
We investigate Hydra on two real image datasets, MNIST and CIFAR-10, using the setup described in Section 3. We report all metrics for MNIST in Table 2 and for CIFAR-10 in Table 3, as well as model capacity and efficiency in Table 5. For MNIST, we can see that all distillation methods match the accuracy of the target ensemble, but Hydra outperforms both knowledge distillation and prior networks in terms of capturing the ensemble uncertainty, almost matching the ensemble predictive NLL (Hydra NLL 0.0465 versus ensemble NLL 0.0439) and Brier score (Hydra -0.9776 versus ensemble -0.9780). For the more challenging CIFAR-10 dataset, all distillation methods approach but do not quite match the high accuracy of the target ensemble (0.9226 ensemble accuracy). Among the distillation methods, Hydra has the smallest gap in terms of accuracy. All distillation methods retain a gap in NLL performance compared to the ensemble, but Hydra again has a significantly smaller NLL (0.3179) compared to Prior Networks (0.4392). In-distribution model uncertainty (MU) is comparable for Prior Networks (0.0280) and Hydra (0.0074) but quite a bit smaller than the target ensemble MU of 0.1055, meaning there remains room to improve uncertainty quantification in all distillation methods tested.
Regression Performance on UCI Regression Datasets. We trained both knowledge distillation (Hinton et al., 2015) and Hydra on the standard UCI regression datasets shown in Table 4. Here, Prior Networks are not applicable because, in the case of probabilistic regression, we cannot take averages of distributions. (Formally, the average of two Gaussian densities is, in general, no longer Gaussian.)
For regression, Hydra outperforms knowledge distillation in terms of predictive performance (NLL) because Hydra produces a more flexible output in the form of a Gaussian mixture model with one Gaussian component per head, whereas knowledge distillation can produce only a single Gaussian component.
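At evaluation time, the heads' Gaussian outputs form an equally weighted mixture, and the predictive NLL is the negative log of the mixture density. A minimal sketch of this evaluation (the function name is ours):

```python
import math

def mixture_nll(y, heads):
    # Negative log-likelihood of y under the equally weighted Gaussian
    # mixture formed by the heads' (mean, variance) outputs.
    M = len(heads)
    dens = sum(math.exp(-0.5 * (y - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)
               for mu, var in heads) / M
    return -math.log(dens)
```

With heads that disagree, the mixture can be multi-modal or heavy-tailed, which a single Gaussian student cannot represent.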
4 Related Work
We now review related work in both distillation and multi-headed neural architectures.
Distillation. In (Hinton et al., 2015), a "student" network, i.e., the outcome of the distillation process, is trained to match the average predictions of the "teacher" network(s). This methodology was later successfully applied to the distillation of Bayesian ensembles (Balan et al., 2015), where ensemble members correspond to a Monte Carlo approximation of the posterior distribution. (Note that, in this paper, we focus primarily on instances of distillation where the supervision occurs at the level of the predictive probabilities, although other strategies, e.g., based on the feature representations in the networks (Ba and Caruana, 2014; Romero et al., 2014), have been proposed.) A parallel line of research has focused on co-distillation, also sometimes referred to as online distillation, to further reduce the overall training cost (Zhang et al., 2018; Anil et al., 2018; Lan et al., 2018). In this setting, both the student and teacher networks are learned simultaneously, and this form of mutual training acts as a regularization mechanism. Distillation has also recently been the topic of theoretical analyses that aim to better explain its empirical success (Lopez-Paz et al., 2015; Phuong and Lampert, 2019).
Closest to our approach is the work of Lan et al. (2018). Their method consists of training multiple student models whose combined predictions induce an ensemble teacher model. While we share conceptual similarities with their work, we depart from their formulation in several ways. First, we focus on the offline ensemble setting (Hinton et al., 2015), where we start from a pre-defined ensemble whose training may be difficult to replicate inside a co-distillation process. Second, our approach follows a different goal: we consider multiple branches, or heads in our terminology, to individually match the behavior of each teacher model. We do so to preserve the diversity of the teacher ensemble, which is, for instance, essential in out-of-distribution tasks (Lakshminarayanan et al., 2017; Ovadia et al., 2019). Third, our methodology has a conceivably simpler design, as reflected by our single-component objective function (the average KL divergence between each head and the corresponding teacher model) and the absence of a learned gating mechanism to linearly combine the logits of the student models (Lan et al., 2018).
Multi-headed architectures. This type of architecture can be motivated by several benefits, e.g., a reduction in memory footprint due to the sharing of parameters, a speed-up of the training process, a regularization effect due to the introduction of auxiliary training objectives, or the transfer of information across different tasks. As discussed in detail by Song and Chai (2018), there exist multiple strategies to define a body of shared parameters together with heads, e.g., hierarchical sharing patterns. In this work, we concentrate on simple multi-headed architectures, where the heads correspond to either the last layer of the original network or small extensions thereof. While multi-headed architectures were used for online distillation in (Lan et al., 2018), we reiterate that our goal differs from this previous work in that we exploit the multiple heads to match and mirror the individual members of a pre-defined teacher ensemble.
We presented Hydra, a simple and effective approach to distillation for ensemble models. Hydra preserves the diversity of ensemble member predictions, and we have demonstrated on standard models that capturing this information translates into improved performance and better uncertainty quantification. While Hydra improves on previous approaches, we believe that distillation performance can be further improved by leveraging techniques from fields that study sets of related learning tasks, such as meta-learning and domain adaptation.
References

- Anil, R., Pereyra, G., Passos, A., Ormandi, R., Dahl, G. E., and Hinton, G. E. (2018). Large scale distributed neural network training through online distillation. arXiv preprint arXiv:1804.03235.
- Asuncion, A. and Newman, D. (2007). UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences.
- Ba, J. and Caruana, R. (2014). Do deep nets really need to be deep? In Advances in Neural Information Processing Systems.
- Balan, A. K., Rathod, V., Murphy, K. P., and Welling, M. (2015). Bayesian dark knowledge. In Advances in Neural Information Processing Systems, pp. 3438-3446.
- Buciluǎ, C., Caruana, R., and Niculescu-Mizil, A. (2006). Model compression. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 535-541.
- Bui, T., Hernández-Lobato, D., Hernández-Lobato, J. M., Li, Y., and Turner, R. (2016). Deep Gaussian processes for regression using approximate expectation propagation. In International Conference on Machine Learning, pp. 1472-1481.
- Depeweg, S., Hernández-Lobato, J. M., Doshi-Velez, F., and Udluft, S. (2017). Uncertainty decomposition in Bayesian neural networks with latent variables. stat, 1050, p. 11.
- Dietterich, T. G. (2000). Ensemble methods in machine learning. In International Workshop on Multiple Classifier Systems, pp. 1-15.
- Gneiting, T. and Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), pp. 359-378.
- Hinton, G., Vinyals, O., and Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.
- Lakshminarayanan, B., Pritzel, A., and Blundell, C. (2017). Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems, pp. 6402-6413.
- Lan, X., Zhu, X., and Gong, S. (2018). Knowledge distillation by on-the-fly native ensemble. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 7528-7538.
- Lopez-Paz, D., Bottou, L., Schölkopf, B., and Vapnik, V. (2015). Unifying distillation and privileged information. arXiv preprint arXiv:1511.03643.
- Malinin, A. and Gales, M. (2018). Predictive uncertainty estimation via prior networks. In Advances in Neural Information Processing Systems, pp. 7047-7058.
- Malinin, A., Mlodozeniec, B., and Gales, M. (2019). Ensemble distribution distillation. arXiv preprint arXiv:1905.00076.
- Osband, I., Blundell, C., Pritzel, A., and Van Roy, B. (2016). Deep exploration via bootstrapped DQN. In Advances in Neural Information Processing Systems, pp. 4026-4034.
- Ovadia, Y., Fertig, E., Ren, J., Nado, Z., Sculley, D., Nowozin, S., Dillon, J. V., Lakshminarayanan, B., and Snoek, J. (2019). Can you trust your model's uncertainty? Evaluating predictive uncertainty under dataset shift. arXiv preprint arXiv:1906.02530.
- Phuong, M. and Lampert, C. (2019). Towards understanding knowledge distillation. In International Conference on Machine Learning, pp. 5142-5151.
- Quinonero-Candela, J., Rasmussen, C. E., Sinz, F., Bousquet, O., and Schölkopf, B. (2005). Evaluating predictive uncertainty challenge. In Machine Learning Challenges Workshop, pp. 1-27.
- Romero, A., Ballas, N., Kahou, S. E., Chassang, A., Gatta, C., and Bengio, Y. (2014). FitNets: Hints for thin deep nets. arXiv preprint arXiv:1412.6550.
- Sercu, T., Puhrsch, C., Kingsbury, B., and LeCun, Y. (2016). Very deep multilingual convolutional neural networks for LVCSR. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4955-4959.
- Song, G. and Chai, W. (2018). Collaborative learning for deep neural networks. In Advances in Neural Information Processing Systems, pp. 1832-1841.
- Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-9.
- Zhang, Y., Xiang, T., Hospedales, T. M., and Lu, H. (2018). Deep mutual learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4320-4328.