Student-Teacher Learning from Clean Inputs to Noisy Inputs

03/13/2021
by Guanzhe Hong et al.

Feature-based student-teacher learning, a training method that encourages the student's hidden features to mimic those of the teacher network, is empirically successful at transferring knowledge from a pre-trained teacher network to a student network. Furthermore, recent empirical results demonstrate that the teacher's features can boost the student network's generalization even when the student's input samples are corrupted by noise. However, there is little theoretical insight into why and when this method of transferring knowledge succeeds between such heterogeneous tasks. We analyze this method theoretically using deep linear networks and experimentally using nonlinear networks. We identify three factors vital to the success of the method: (1) whether the student is trained to zero training loss; (2) how knowledgeable the teacher is on the clean-input problem; and (3) how the teacher decomposes its knowledge in its hidden features. Lack of proper control over any of these three factors leads to failure of the student-teacher learning method.
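To make the clean-to-noisy setup concrete, here is a minimal NumPy sketch of a feature-matching objective on a two-layer linear network. It is an illustration under stated assumptions, not the authors' formulation. The teacher sees clean inputs. The student sees noisy copies of them and is penalized both for its output error and for the squared distance between its hidden features and the teacher's. The problem sizes, the Gaussian noise level `noise_std`, the feature-loss weight `lam`, the random (rather than trained) teacher, and the helper `loss_and_grads` are all hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes and hyperparameters (not taken from the paper).
d_in, d_hidden, d_out, n = 8, 4, 2, 64
noise_std = 0.5   # assumed std of additive Gaussian input noise
lam = 1.0         # assumed weight on the feature-matching term
lr = 0.05         # gradient-descent step size

# Assumed teacher: a two-layer linear net T2 @ T1, drawn at random here
# instead of being trained on the clean-input problem.
T1 = rng.normal(size=(d_hidden, d_in)) / np.sqrt(d_in)
T2 = rng.normal(size=(d_out, d_hidden)) / np.sqrt(d_hidden)

X_clean = rng.normal(size=(d_in, n))                  # columns are samples
Y = T2 @ T1 @ X_clean                                 # clean-input targets
X_noisy = X_clean + noise_std * rng.normal(size=X_clean.shape)
H_teacher = T1 @ X_clean                              # teacher features on clean inputs


def loss_and_grads(S1, S2):
    """Output loss plus lam * feature-matching loss, with exact gradients."""
    H = S1 @ X_noisy                                  # student features on noisy inputs
    out_err = S2 @ H - Y
    feat_err = H - H_teacher
    loss = (np.sum(out_err ** 2) + lam * np.sum(feat_err ** 2)) / (2 * n)
    grad_S2 = out_err @ H.T / n
    grad_H = (S2.T @ out_err + lam * feat_err) / n
    grad_S1 = grad_H @ X_noisy.T                      # chain rule through H = S1 @ X_noisy
    return loss, grad_S1, grad_S2


# Student: same two-layer linear architecture, small random initialization.
S1 = 0.1 * rng.normal(size=(d_hidden, d_in))
S2 = 0.1 * rng.normal(size=(d_out, d_hidden))

initial_loss = loss_and_grads(S1, S2)[0]
for step in range(2000):
    final_loss, g1, g2 = loss_and_grads(S1, S2)
    S1 -= lr * g1
    S2 -= lr * g2

print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

In this sketch, setting `lam = 0.0` recovers ordinary training on noisy inputs, and increasing `lam` pulls the student's hidden layer toward the teacher's clean-input features. The abstract's factors (1) to (3) concern when such a pull actually helps, which this toy example does not attempt to reproduce.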

