Meta-prediction Model for Distillation-Aware NAS on Unseen Datasets

05/26/2023
by Hayeon Lee, et al.

Distillation-aware Neural Architecture Search (DaNAS) aims to find an optimal student architecture that achieves the best performance and/or efficiency when distilling knowledge from a given teacher model. Previous DaNAS methods have mostly searched for architectures under a fixed dataset and teacher; they do not generalize well to a new task consisting of an unseen dataset and an unseen teacher, and thus must repeat a costly search for every new combination of dataset and teacher. For standard NAS tasks without KD, computationally efficient meta-learning-based NAS methods have been proposed that learn a generalized search process over multiple tasks (datasets) and transfer the knowledge obtained over those tasks to a new task. However, since they assume learning from scratch without KD from a teacher, they may be suboptimal for DaNAS scenarios. To eliminate both the excessive computational cost of DaNAS methods and the sub-optimality of rapid NAS methods, we propose DaSS (Distillation-aware Student Search), a distillation-aware meta accuracy prediction model that predicts a given architecture's final performance on a dataset when performing KD with a given teacher, without actually training it on the target task. The experimental results demonstrate that our proposed meta-prediction model successfully generalizes to multiple unseen datasets for DaNAS tasks, largely outperforming existing meta-NAS methods and rapid NAS baselines. Code is available at https://github.com/CownowAn/DaSS
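To make the core idea concrete, below is a minimal sketch, not the authors' implementation, of what a distillation-aware meta accuracy predictor could look like: it fuses a dataset embedding, a teacher embedding, and a student-architecture embedding, and regresses the accuracy the student would reach after KD, so that candidate students can be ranked on an unseen (dataset, teacher) pair without training any of them. All module names, embedding sizes, and the MLP fusion scheme here are illustrative assumptions; the actual DaSS encoders and predictor are described in the paper and repository.

```python
# Hypothetical sketch of a distillation-aware meta accuracy predictor.
# The upstream encoders that produce the dataset, teacher, and architecture
# embeddings are assumed to exist; only the fusion/regression head is shown.
import torch
import torch.nn as nn

class MetaAccuracyPredictor(nn.Module):
    def __init__(self, data_dim=128, teacher_dim=128, arch_dim=128, hidden=256):
        super().__init__()
        # Concatenate the three embeddings and regress a scalar accuracy.
        self.fuse = nn.Sequential(
            nn.Linear(data_dim + teacher_dim + arch_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # predicted post-KD accuracy (scalar)
        )

    def forward(self, data_emb, teacher_emb, arch_emb):
        z = torch.cat([data_emb, teacher_emb, arch_emb], dim=-1)
        return self.fuse(z).squeeze(-1)

# Usage: rank candidate student architectures for an unseen
# (dataset, teacher) pair without training any of them.
predictor = MetaAccuracyPredictor()
data_emb = torch.randn(10, 128)     # unseen-dataset embedding, tiled per candidate
teacher_emb = torch.randn(10, 128)  # unseen-teacher embedding, tiled per candidate
arch_embs = torch.randn(10, 128)    # 10 candidate student-architecture encodings
scores = predictor(data_emb, teacher_emb, arch_embs)
best = scores.argmax().item()       # index of the predicted-best student
```

In this setup, the predictor is meta-trained over many (dataset, teacher, student, accuracy) tuples so that at search time a single forward pass replaces a full KD training run for each candidate.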


