Search to Distill: Pearls are Everywhere but not the Eyes

11/20/2019
by Yu Liu, et al.

Standard Knowledge Distillation (KD) approaches distill the knowledge of a cumbersome teacher model into the parameters of a student model with a pre-defined architecture. However, the knowledge of a neural network, which is represented by the network's output distribution conditioned on its input, depends not only on its parameters but also on its architecture. Hence, a more generalized approach for KD is to distill the teacher's knowledge into both the parameters and the architecture of the student. To achieve this, we present a new Architecture-aware Knowledge Distillation (AKD) approach that finds student models (pearls for the teacher) that are best suited for distilling the given teacher model. In particular, we leverage Neural Architecture Search (NAS), equipped with our KD-guided reward, to search for the best student architectures for a given teacher. Experimental results show that our proposed AKD consistently outperforms the conventional NAS plus KD approach, and achieves state-of-the-art results on the ImageNet classification task under various latency settings. Furthermore, the best AKD student architecture for the ImageNet classification task also transfers well to other tasks such as million-scale face recognition and ensemble learning.
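A minimal sketch of the KD-guided reward idea is given below. It assumes a standard soft-target KD loss with temperature T and a latency-constrained, MnasNet-style reward; the function names, the hyperparameters (T, alpha, the latency exponent w), and the exact reward form are illustrative assumptions, not the paper's reported settings.

import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soft-target term: KL divergence between temperature-softened
    # teacher and student distributions, scaled by T^2 (Hinton et al.).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label term: ordinary cross-entropy against the ground truth.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def kd_guided_reward(distilled_val_acc, latency_ms, target_ms, w=-0.07):
    # Reward for the NAS controller: validation accuracy of the student
    # after a short round of KD training, scaled by a soft latency
    # penalty so architectures near the latency target are preferred.
    return distilled_val_acc * (latency_ms / target_ms) ** w

In such a setup, each architecture sampled by the controller would be briefly trained with kd_loss under the fixed teacher, and its validation accuracy and measured latency would be fed to kd_guided_reward to update the controller.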

Related research

11/29/2019
Towards Oracle Knowledge Distillation with Neural Architecture Search
We present a novel framework of knowledge distillation that is capable o...

03/16/2023
Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models
Large pre-trained language models have achieved state-of-the-art results...

06/27/2022
Revisiting Architecture-aware Knowledge Distillation: Smaller Models and Faster Search
Knowledge Distillation (KD) has recently emerged as a popular method for...

03/28/2023
DisWOT: Student Architecture Search for Distillation WithOut Training
Knowledge distillation (KD) is an effective training strategy to improve...

11/05/2021
AUTOKD: Automatic Knowledge Distillation Into A Student Architecture Family
State-of-the-art results in deep learning have been improving steadily, ...

05/26/2023
Meta-prediction Model for Distillation-Aware NAS on Unseen Datasets
Distillation-aware Neural Architecture Search (DaNAS) aims to search for...

06/12/2021
LE-NAS: Learning-based Ensemble with NAS for Dose Prediction
Radiation therapy treatment planning is a complex process, as the target...
