CES-KD: Curriculum-based Expert Selection for Guided Knowledge Distillation

09/15/2022
by Ibtihel Amara, et al.

Knowledge distillation (KD) is an effective tool for compressing deep classification models for edge devices. However, the performance of KD suffers when there is a large capacity gap between the teacher and student networks. Recent methods have resorted to a multiple teacher assistant (TA) setting for KD, in which the size of the teacher model is decreased sequentially to progressively bridge the capacity gap between teacher and student. This paper proposes a new technique, Curriculum Expert Selection for Knowledge Distillation (CES-KD), to efficiently enhance the learning of a compact student under the capacity gap problem. The technique is built on the hypothesis that a student network should be guided gradually through a stratified teaching curriculum, since it learns easy (hard) data samples better and faster from a lower (higher) capacity teacher network. Specifically, our method is a gradual TA-based KD technique that selects a single teacher per input image according to a curriculum driven by the difficulty of classifying that image. We empirically verify this hypothesis through rigorous experiments on the CIFAR-10, CIFAR-100, CINIC-10, and ImageNet datasets, showing improved accuracy on VGG-like, ResNet, and WideResNet architectures.
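As a rough illustration only (not the authors' released implementation), the sketch below shows one way such per-image expert selection could be wired up in PyTorch: each sample's difficulty is approximated here by the cross-entropy loss of the largest teacher, and that score is bucketed into thresholds to pick a single teacher assistant whose temperature-scaled soft labels drive the distillation term for that sample. The function names, the difficulty proxy, and the threshold scheme are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def sample_difficulty(largest_teacher, x, y):
    """Per-sample difficulty proxy: cross-entropy of the largest teacher.
    (Assumption for illustration; the paper defines its own curriculum criterion.)"""
    with torch.no_grad():
        return F.cross_entropy(largest_teacher(x), y, reduction="none")

def select_expert(num_experts, difficulty, thresholds):
    """Map each sample's difficulty score to a single expert index.
    Experts are assumed ordered from smallest to largest capacity, so easy
    samples are routed to small experts and hard samples to large ones."""
    idx = torch.bucketize(difficulty, thresholds)
    return idx.clamp(max=num_experts - 1)

def ces_kd_style_loss(student, experts, x, y, thresholds, T=4.0, alpha=0.7):
    """KD loss in which every input image is taught by one selected expert."""
    difficulty = sample_difficulty(experts[-1], x, y)   # largest expert as difficulty oracle
    expert_idx = select_expert(len(experts), difficulty, thresholds)

    s_logits = student(x)
    kd = s_logits.new_zeros(())
    for k, expert in enumerate(experts):
        mask = expert_idx == k
        if not mask.any():
            continue
        with torch.no_grad():
            t_logits = expert(x[mask])
        # Temperature-scaled KL between the student and the selected expert,
        # weighted by the fraction of the batch routed to this expert.
        kd = kd + F.kl_div(
            F.log_softmax(s_logits[mask] / T, dim=1),
            F.softmax(t_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T) * mask.float().mean()

    ce = F.cross_entropy(s_logits, y)                   # standard supervised term
    return alpha * kd + (1.0 - alpha) * ce
```

A call might look like `ces_kd_style_loss(student, [ta_small, ta_mid, teacher], x, y, thresholds=torch.tensor([0.5, 2.0]))`, with the thresholds chosen (or annealed over epochs) to realize the curriculum; these values are placeholders, not settings from the paper.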

Related research

Knowledge Distillation via Instance-level Sequence Learning (06/21/2021)
Densely Guided Knowledge Distillation using Multiple Teacher Assistants (09/18/2020)
Knowledge Distillation via Route Constrained Optimization (04/19/2019)
Paced-Curriculum Distillation with Prediction and Label Uncertainty for Image Segmentation (02/02/2023)
How to Teach: Learning Data-Free Knowledge Distillation from Curriculum (08/29/2022)
Knowledge Concentration: Learning 100K Object Classifiers in a Single CNN (11/21/2017)
Visual Progression Analysis of Student Records Data (10/18/2017)
