Improving Ensemble Distillation With Weight Averaging and Diversifying Perturbation

06/30/2022
by Giung Nam, et al.

Ensembles of deep neural networks have demonstrated superior performance, but their heavy computational cost hinders their deployment in resource-limited environments. This motivates distilling knowledge from the ensemble teacher into a smaller student network, and there are two important design choices for this ensemble distillation: 1) how to construct the student network, and 2) what data should be shown during training. In this paper, we propose a weight averaging technique in which a student with multiple subnetworks is trained to absorb the functional diversity of the ensemble teacher, and those subnetworks are then properly averaged for inference, yielding a single student network with no additional inference cost. We also propose a perturbation strategy that seeks inputs from which the diversity of the teachers can be better transferred to the student. Combining these two, our method significantly outperforms previous approaches on various image classification tasks.
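As a rough illustration of the weight-averaging step described in the abstract, the sketch below (not the authors' code; the function and variable names are hypothetical) collapses several identically structured student subnetworks into a single network by averaging their parameters element-wise, so that inference uses only one set of weights.

```python
# Illustrative sketch only: element-wise averaging of identically structured
# student subnetworks into a single network for inference.
import copy
import torch
import torch.nn as nn


def average_subnetworks(subnetworks):
    """Return a module whose parameters are the element-wise mean of the
    parameters of the given identically structured subnetworks."""
    averaged = copy.deepcopy(subnetworks[0])
    avg_state = averaged.state_dict()
    with torch.no_grad():
        for key in avg_state:
            stacked = torch.stack(
                [sub.state_dict()[key].float() for sub in subnetworks]
            )
            avg_state[key] = stacked.mean(dim=0)
    averaged.load_state_dict(avg_state)
    return averaged


if __name__ == "__main__":
    # Toy example: three small MLP "subnetworks" collapsed into one student.
    subnets = [
        nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
        for _ in range(3)
    ]
    student = average_subnetworks(subnets)
    print(student(torch.randn(2, 8)).shape)  # torch.Size([2, 4])
```

Note that averaging arbitrary, independently initialized networks (as in the toy example) is not expected to preserve accuracy; the abstract's point is that the subnetworks are trained so that such averaging is well-behaved, which this sketch does not capture.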

Related research

Regularizing Neural Networks by Stochastically Training Layer Ensembles (11/21/2019)
Dropout and similar stochastic neural network regularization methods are...

Fast and Accurate Inference with Adaptive Ensemble Prediction in Image Classification with Deep Neural Networks (02/27/2017)
Ensembling multiple predictions is a widely used technique to improve th...

Progressive Knowledge Distillation: Building Ensembles for Efficient Inference (02/20/2023)
We study the problem of progressive distillation: Given a large, pre-tra...

MEAL: Multi-Model Ensemble via Adversarial Learning (12/06/2018)
Often the best performing deep neural models are ensembles of multiple b...

Distilled Hierarchical Neural Ensembles with Adaptive Inference Cost (03/03/2020)
Deep neural networks form the basis of state-of-the-art models across a ...

Functional Ensemble Distillation (06/05/2022)
Bayesian models have many desirable properties, most notable is their ab...

Logit-Based Ensemble Distribution Distillation for Robust Autoregressive Sequence Uncertainties (05/17/2023)
Efficiently and reliably estimating uncertainty is an important objectiv...