Teacher-Student Architecture for Knowledge Distillation: A Survey

08/08/2023
by   Chengming Hu, et al.

Although deep neural networks (DNNs) have shown a strong capacity to solve large-scale problems in many areas, such DNNs are hard to deploy in real-world systems because of their voluminous parameters. To tackle this issue, Teacher-Student architectures were proposed, in which simple student networks with few parameters can achieve performance comparable to that of deep teacher networks with many parameters. Recently, Teacher-Student architectures have been effectively and widely adopted for various knowledge distillation (KD) objectives, including knowledge compression, knowledge expansion, knowledge adaptation, and knowledge enhancement. With the help of Teacher-Student architectures, current studies are able to achieve multiple distillation objectives through lightweight and generalized student networks. Unlike existing KD surveys that primarily focus on knowledge compression, this survey is the first to explore Teacher-Student architectures across multiple distillation objectives. It introduces various knowledge representations and their corresponding optimization objectives, and provides a systematic overview of Teacher-Student architectures with representative learning algorithms and effective distillation schemes. The survey also summarizes recent applications of Teacher-Student architectures across multiple purposes, including classification, recognition, generation, ranking, and regression. Lastly, potential research directions in KD are investigated, focusing on architecture design, knowledge quality, and theoretical studies of regression-based learning. Through this comprehensive survey, industry practitioners and the academic community can gain valuable insights and guidelines for effectively designing, learning, and applying Teacher-Student architectures to various distillation objectives.
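The Teacher-Student setup described above is easiest to see in its classic response-based form, where a compact student is trained against both the ground-truth labels and the teacher's temperature-softened output distribution. The following is a minimal, illustrative PyTorch sketch of that canonical distillation loss (Hinton et al., 2015); the function name, temperature T, and weight alpha are assumptions for illustration, not the specific formulation of any particular method covered in the survey.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Response-based KD: weighted sum of hard-label cross-entropy and the
    KL divergence between temperature-softened teacher and student outputs."""
    # Hard-label supervision on the student's raw logits.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label supervision: match the teacher's softened distribution.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so soft-target gradients match the CE scale
    return alpha * ce + (1.0 - alpha) * kd
```

In practice the teacher's logits are computed under torch.no_grad(), so only the student receives gradient updates; other distillation objectives surveyed here (feature-based, relation-based, knowledge expansion or adaptation) replace or augment this soft-label term rather than discard it.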

