Knowledge distillation: A good teacher is patient and consistent

06/09/2021
by Lucas Beyer, et al.

There is a growing discrepancy in computer vision between large-scale models that achieve state-of-the-art performance and models that are affordable in practical applications. In this paper we address this issue and significantly bridge the gap between these two types of models. Throughout our empirical investigation we do not aim to propose a new method, but rather strive to identify a robust and effective recipe for making state-of-the-art large-scale models affordable in practice. We demonstrate that, when performed correctly, knowledge distillation can be a powerful tool for reducing the size of large models without compromising their performance. In particular, we uncover that there are certain implicit design choices that may drastically affect the effectiveness of distillation. Our key contribution is the explicit identification of these design choices, which were not previously articulated in the literature. We back up our findings with a comprehensive empirical study, demonstrate compelling results on a wide range of vision datasets, and, in particular, obtain a state-of-the-art ResNet-50 model for ImageNet that achieves 82.8% top-1 accuracy.
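To make the recipe concrete, the sketch below shows a standard soft-label distillation loss of the kind the paper builds on: the student is trained to match the teacher's softened output distribution, ideally on the same augmented view of each image ("consistent") and over a long training schedule ("patient"). The function name, the temperature parameter, and the augment/teacher/student placeholders are illustrative assumptions, not the paper's exact implementation.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=1.0):
        # Soften both distributions with the same temperature, then take the
        # KL divergence from the student to the teacher (the usual soft-label
        # distillation objective). The temperature**2 factor keeps gradient
        # magnitudes comparable across temperatures.
        log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
        p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

    # Usage sketch ("consistent" teaching): teacher and student score the
    # *same* augmented crop of each image, and the student is trained on this
    # loss for a long ("patient") schedule. The names below are hypothetical.
    # images = augment(batch)                 # identical views for both models
    # with torch.no_grad():
    #     teacher_logits = teacher(images)
    # loss = distillation_loss(student(images), teacher_logits)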
