Progressive Knowledge Distillation: Building Ensembles for Efficient Inference

02/20/2023
by Don Kurian Dennis, et al.

We study the problem of progressive distillation: given a large, pre-trained teacher model g, we seek to decompose it into an ensemble of smaller, low-inference-cost student models f_i. The resulting ensemble allows accuracy to be flexibly traded off against inference cost, which is useful for a number of on-device inference applications. The method we propose, B-DISTIL, relies on an algorithmic procedure that uses function composition over intermediate activations to construct expressive ensembles with performance comparable to g, but with much smaller student models. We demonstrate the effectiveness of B-DISTIL by decomposing pre-trained models on standard image, speech, and sensor datasets. We also provide theoretical guarantees for our method in terms of convergence and generalization.
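To make the ensemble idea concrete, below is a minimal sketch (not the authors' B-DISTIL code) of progressive ensemble inference: a list of small student models is evaluated one at a time up to a compute budget, with their logits summed into the running prediction, so accuracy improves as the budget grows. The names `StudentBlock` and `progressive_predict` are illustrative, and passing the previous student's hidden activation to the next student is only a stand-in for the paper's function composition over intermediate activations.

```python
# Minimal sketch (illustrative, not the authors' implementation) of a
# progressive student ensemble with an adjustable inference budget.
import torch
import torch.nn as nn


class StudentBlock(nn.Module):
    """A small student model. Students after the first may also consume the
    previous student's hidden activation (a rough stand-in for composition
    over intermediate activations)."""

    def __init__(self, in_dim, hidden_dim, num_classes, use_prev_hidden=False):
        super().__init__()
        extra = hidden_dim if use_prev_hidden else 0
        self.body = nn.Sequential(nn.Linear(in_dim + extra, hidden_dim), nn.ReLU())
        self.head = nn.Linear(hidden_dim, num_classes)
        self.use_prev_hidden = use_prev_hidden

    def forward(self, x, prev_hidden=None):
        if self.use_prev_hidden and prev_hidden is not None:
            x = torch.cat([x, prev_hidden], dim=-1)
        h = self.body(x)
        return self.head(h), h


def progressive_predict(students, x, budget):
    """Evaluate at most `budget` students; the running sum of their logits is
    the ensemble prediction, so cost and accuracy grow with the budget."""
    logits, hidden = None, None
    for student in students[:budget]:
        out, hidden = student(x, hidden)
        logits = out if logits is None else logits + out
    return logits


if __name__ == "__main__":
    torch.manual_seed(0)
    students = nn.ModuleList(
        [StudentBlock(16, 32, 10, use_prev_hidden=(i > 0)) for i in range(4)]
    )
    x = torch.randn(8, 16)
    for budget in range(1, 5):
        preds = progressive_predict(students, x, budget).argmax(dim=-1)
        print(f"budget={budget} predictions={preds.tolist()}")
```

In a distillation setting, each partial sum of student logits would be trained to match the teacher's outputs, so that truncating the ensemble at any budget still yields a usable predictor.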
