Revisiting Architecture-aware Knowledge Distillation: Smaller Models and Faster Search

06/27/2022
by Taehyeon Kim, et al.

Knowledge Distillation (KD) has recently emerged as a popular method for compressing neural networks. In recent studies, generalized distillation methods that find the parameters and architectures of student models at the same time have been proposed. However, this search method requires heavy computation to explore architectures and has the disadvantage of considering only convolutional blocks in its search space. This paper introduces a new algorithm, coined Trust Region Aware architecture search to Distill knowledge Effectively (TRADE), that rapidly finds effective student architectures from several state-of-the-art architectures using a trust-region Bayesian optimization approach. Experimental results show that our proposed TRADE algorithm consistently outperforms both the conventional NAS approach and pre-defined architectures under KD training.
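The abstract describes TRADE as a trust-region Bayesian optimization over student architectures that are evaluated by training them with KD. The paper's exact search space, surrogate model, and KD pipeline are not given here, so the following is only a minimal sketch of the general idea: a TuRBO-style loop that samples candidate architecture encodings inside a shrinking/expanding trust region, scores them with a Gaussian-process surrogate, and evaluates the best candidate. The function `train_student_with_kd` is a hypothetical placeholder for a short KD training run, and all hyperparameters are illustrative assumptions, not the authors' settings.

```python
# Minimal sketch of a trust-region Bayesian optimization loop for
# student-architecture search under KD. Not the authors' implementation.
import numpy as np

def train_student_with_kd(arch_params):
    # Placeholder objective (assumption): in practice this would decode
    # `arch_params` into a student network, train it briefly with a KD loss
    # against the teacher, and return validation accuracy.
    return -np.sum((arch_params - 0.6) ** 2)

def gp_posterior(X, y, Xq, length_scale=0.2, noise=1e-6):
    # Vanilla RBF-kernel Gaussian-process posterior mean and variance.
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length_scale ** 2)
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(Xq, X)
    Kinv = np.linalg.inv(K)
    mu = Ks @ Kinv @ y
    var = 1.0 - np.einsum("ij,jk,ik->i", Ks, Kinv, Ks)
    return mu, np.maximum(var, 1e-12)

def trade_like_search(dim=4, iters=30, seed=0):
    rng = np.random.default_rng(seed)
    # Architecture knobs (e.g. depth/width multipliers) encoded in [0, 1]^dim.
    X = rng.uniform(size=(5, dim))                     # initial random designs
    y = np.array([train_student_with_kd(x) for x in X])
    tr_len, best = 0.4, X[np.argmax(y)]                # trust-region size, incumbent
    for _ in range(iters):
        # Sample candidates only inside the trust region around the incumbent.
        lo = np.clip(best - tr_len / 2, 0.0, 1.0)
        hi = np.clip(best + tr_len / 2, 0.0, 1.0)
        cand = rng.uniform(lo, hi, size=(256, dim))
        mu, var = gp_posterior(X, y, cand)
        ucb = mu + 1.96 * np.sqrt(var)                 # upper-confidence-bound acquisition
        x_new = cand[np.argmax(ucb)]
        y_new = train_student_with_kd(x_new)
        # Expand the trust region on improvement, shrink it otherwise.
        if y_new > y.max():
            best, tr_len = x_new, min(tr_len * 1.5, 1.0)
        else:
            tr_len = max(tr_len * 0.75, 0.05)
        X, y = np.vstack([X, x_new]), np.append(y, y_new)
    return X[np.argmax(y)], y.max()

if __name__ == "__main__":
    arch, score = trade_like_search()
    print("best architecture encoding:", np.round(arch, 3), "score:", round(score, 4))
```

In a real run, the placeholder objective would be replaced by a few-epoch KD training job, and the continuous encoding would map to blocks drawn from the state-of-the-art architectures mentioned in the abstract; restricting each proposal to the trust region is what keeps the number of expensive student trainings small.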



Related research

- Search to Distill: Pearls are Everywhere but not the Eyes (11/20/2019)
- AUTOKD: Automatic Knowledge Distillation Into A Student Architecture Family (11/05/2021)
- DisWOT: Student Architecture Search for Distillation WithOut Training (03/28/2023)
- Multi-fidelity Neural Architecture Search with Knowledge Distillation (06/15/2020)
- CONetV2: Efficient Auto-Channel Size Optimization for CNNs (10/13/2021)
- Generalized Bayesian Posterior Expectation Distillation for Deep Neural Networks (05/16/2020)
- AutoDistil: Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models (01/29/2022)
