Neighbourhood Distillation: On the benefits of non end-to-end distillation

10/02/2020
by Laëtitia Shao, et al.

End-to-end training with back-propagation is the standard method for training deep neural networks. However, as networks become deeper and larger, end-to-end training becomes more challenging: highly non-convex models easily get stuck in local optima, gradient signals are prone to vanishing or exploding during back-propagation, and training demands substantial computational resources and time. In this work, we propose to break away from the end-to-end paradigm in the context of Knowledge Distillation. Instead of distilling a model end-to-end, we propose to split it into smaller sub-networks, also called neighbourhoods, that are then trained independently. We empirically show that distilling networks in a non end-to-end fashion can be beneficial in a diverse range of use cases. First, we show that it speeds up Knowledge Distillation by exploiting parallelism and training on smaller networks. Second, we show that independently distilled neighbourhoods can be efficiently re-used for Neural Architecture Search. Finally, because smaller networks model simpler functions, we show that they are easier to train with synthetic data than their deeper counterparts.
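The abstract gives no implementation details, so the sketch below is only a minimal illustration of the idea: the teacher is partitioned into contiguous neighbourhoods, and each student sub-network is trained on its own to match the corresponding teacher slice. It assumes PyTorch, a teacher expressed as an nn.Sequential of blocks, an MSE matching loss, and a loader of intermediate teacher activations; the helper names (`split_into_neighbourhoods`, `distill_neighbourhood`) are hypothetical, not from the paper.

```python
# Minimal sketch of neighbourhood distillation (assumptions: PyTorch,
# nn.Sequential teacher, MSE matching loss). Not the authors' code.
import torch
import torch.nn as nn

def split_into_neighbourhoods(teacher: nn.Sequential, num_neighbourhoods: int):
    """Partition the teacher's blocks into contiguous neighbourhoods."""
    blocks = list(teacher.children())
    size = -(-len(blocks) // num_neighbourhoods)  # ceiling division
    return [nn.Sequential(*blocks[i:i + size]) for i in range(0, len(blocks), size)]

def distill_neighbourhood(teacher_part, student_part, activation_loader,
                          epochs=1, lr=1e-3):
    """Train one student sub-network to mimic its teacher neighbourhood.

    `activation_loader` yields the inputs this neighbourhood sees, i.e. the
    teacher's intermediate activations (or raw inputs for the first one).
    """
    opt = torch.optim.Adam(student_part.parameters(), lr=lr)
    mse = nn.MSELoss()
    teacher_part.eval()
    for _ in range(epochs):
        for x in activation_loader:
            with torch.no_grad():
                target = teacher_part(x)   # local teacher output to match
            loss = mse(student_part(x), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student_part
```

Because each call to `distill_neighbourhood` depends only on its own teacher slice and its own activations, the per-neighbourhood trainings are independent and can run in parallel, which is the source of the speed-up the abstract claims; the distilled students are then reassembled into a full network.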


research
11/04/2020

Channel Planting for Deep Neural Networks using Knowledge Distillation

In recent years, deeper and wider neural networks have shown excellent p...
research
12/21/2020

Diverse Knowledge Distillation for End-to-End Person Search

Person search aims to localize and identify a specific person from a gal...
research
09/03/2019

Knowledge Distillation for End-to-End Person Search

We introduce knowledge distillation for end-to-end person search. End-to...
research
11/27/2022

EPIK: Eliminating multi-model Pipelines with Knowledge-distillation

Real-world tasks are largely composed of multiple models, each performin...
research
10/25/2020

Empowering Knowledge Distillation via Open Set Recognition for Robust 3D Point Cloud Classification

Real-world scenarios pose several challenges to deep learning based comp...
research
06/04/2020

End-to-End Speech-Translation with Knowledge Distillation: FBK@IWSLT2020

This paper describes FBK's participation in the IWSLT 2020 offline speec...
