Student-Teacher Learning from Clean Inputs to Noisy Inputs

03/13/2021
by Guanzhe Hong, et al.

Feature-based student-teacher learning, a training method that encourages the student's hidden features to mimic those of the teacher network, is empirically successful in transferring knowledge from a pre-trained teacher network to the student network. Furthermore, recent empirical results demonstrate that the teacher's features can boost the student network's generalization even when the student's input sample is corrupted by noise. However, there is a lack of theoretical insight into why and when this method of transferring knowledge can succeed between such heterogeneous tasks. We analyze this method theoretically using deep linear networks, and experimentally using nonlinear networks. We identify three factors vital to the success of the method: (1) whether the student is trained to zero training loss; (2) how knowledgeable the teacher is on the clean-input problem; (3) how the teacher decomposes its knowledge in its hidden features. Lack of proper control over any of the three factors leads to failure of the student-teacher learning method.
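To make the setup in the abstract concrete, here is a minimal sketch of feature matching between a teacher that sees clean inputs and a student that sees noisy ones. It is an illustration under stated assumptions, not the paper's experimental code. Each network is reduced to a single linear layer, the noise is additive Gaussian, the teacher weights are random stand-ins for a pre-trained model, and all names (`feature_matching_loss`, `W_teacher`, `W_student`) and hyperparameters are hypothetical.

```python
import numpy as np


def feature_matching_loss(student_hidden, teacher_hidden):
    """Mean over samples of the squared distance between hidden features."""
    return float(np.mean(np.sum((student_hidden - teacher_hidden) ** 2, axis=1)))


rng = np.random.default_rng(0)
n, d, h = 64, 8, 4  # samples, input dim, hidden dim (illustrative values)

# Assumption: the student's input is the clean input plus Gaussian noise.
x_clean = rng.normal(size=(n, d))
x_noisy = x_clean + 0.5 * rng.normal(size=(n, d))

# Assumption: a random matrix stands in for a pre-trained teacher layer.
W_teacher = rng.normal(size=(d, h))
W_student = np.zeros((d, h))

# The teacher computes its features on the clean inputs; they stay fixed.
target = x_clean @ W_teacher

initial_loss = feature_matching_loss(x_noisy @ W_student, target)

# Plain gradient descent on the feature-matching loss, student weights only.
lr = 0.01
for _ in range(500):
    residual = x_noisy @ W_student - target
    grad = 2.0 * x_noisy.T @ residual / n
    W_student -= lr * grad

final_loss = feature_matching_loss(x_noisy @ W_student, target)
print(f"feature loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

The student never sees the clean inputs directly. Its only access to them is through the teacher's hidden features, which is the heterogeneous clean-to-noisy transfer the abstract describes. Whether driving this training loss toward zero helps generalization is exactly what the first of the three factors concerns.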


