On effects of Knowledge Distillation on Transfer Learning

10/18/2022
by Sushil Thapa, et al.

Knowledge distillation is a popular machine learning technique that aims to transfer knowledge from a large 'teacher' network to a smaller 'student' network, improving the student's performance by training it to emulate the teacher. In recent years, there has been significant progress in novel distillation techniques that push performance frontiers across multiple problems and benchmarks. However, most of the reported work focuses on achieving state-of-the-art results on specific problems, and a significant gap remains in understanding how the process behaves under different training scenarios. Similarly, transfer learning (TL) is an effective technique for training neural networks faster on limited datasets by reusing representations learned from a different but related problem. Despite the effectiveness and popularity of both techniques, knowledge distillation applied on top of transfer learning has received little exploration. In this thesis, we propose a machine learning architecture we call TL+KD that combines knowledge distillation with transfer learning; we then present a quantitative and qualitative comparison of TL+KD with TL in the domain of image classification. Through this work, we show that guidance and knowledge from a larger teacher network during fine-tuning improves the student network's validation performance, such as accuracy. We characterize this improvement using a variety of metrics beyond accuracy alone, and study the model's behavior under scenarios such as input degradation.
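
As a rough illustration of the TL+KD idea described above, the sketch below fine-tunes an ImageNet-pretrained student under a standard Hinton-style distillation loss from a larger teacher. The choice of PyTorch, the ResNet backbones, the loss weight alpha, and the temperature T are illustrative assumptions, not the exact configuration used in the thesis.

# Minimal TL+KD fine-tuning sketch (assumed PyTorch setup; alpha and T are
# illustrative placeholders, not values taken from the thesis).
import torch
import torch.nn.functional as F
from torchvision import models

# Transfer learning: the student starts from ImageNet-pretrained weights,
# with a fresh head for the target task (10 classes assumed here).
student = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
student.fc = torch.nn.Linear(student.fc.in_features, 10)

# Larger teacher network; in practice this would already be trained on the
# target task (ImageNet weights are used here only as a stand-in).
teacher = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
teacher.fc = torch.nn.Linear(teacher.fc.in_features, 10)
teacher.eval()

def tl_kd_loss(student_logits, teacher_logits, labels, alpha=0.5, T=4.0):
    """Cross-entropy on ground-truth labels plus KL divergence to the
    teacher's temperature-softened outputs (Hinton-style distillation)."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return (1 - alpha) * ce + alpha * kd

optimizer = torch.optim.SGD(student.parameters(), lr=1e-3, momentum=0.9)

def fine_tune_step(images, labels):
    # Teacher guidance is computed without gradients; only the student is updated.
    with torch.no_grad():
        teacher_logits = teacher(images)
    student_logits = student(images)
    loss = tl_kd_loss(student_logits, teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

In this sketch, setting alpha to 0 recovers plain TL fine-tuning, so the comparison between TL and TL+KD reduces to whether the teacher-matching term is included in the loss.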


research
05/31/2022

What Knowledge Gets Distilled in Knowledge Distillation?

Knowledge distillation aims to transfer useful information from a teache...
research
12/01/2020

Solvable Model for Inheriting the Regularization through Knowledge Distillation

In recent years the empirical success of transfer learning with neural n...
research
11/30/2022

Explicit Knowledge Transfer for Weakly-Supervised Code Generation

Large language models (LLMs) can acquire strong code-generation capabili...
research
02/26/2021

Knowledge Distillation Circumvents Nonlinearity for Optical Convolutional Neural Networks

In recent years, Convolutional Neural Networks (CNNs) have enabled ubiqu...
research
02/05/2021

Show, Attend and Distill: Knowledge Distillation via Attention-based Feature Matching

Knowledge distillation extracts general knowledge from a pre-trained tea...
research
12/07/2021

ADD: Frequency Attention and Multi-View based Knowledge Distillation to Detect Low-Quality Compressed Deepfake Images

Despite significant advancements of deep learning-based forgery detector...
research
10/23/2022

Respecting Transfer Gap in Knowledge Distillation

Knowledge distillation (KD) is essentially a process of transferring a t...
