Learning from Matured Dumb Teacher for Fine Generalization

08/12/2021
by HeeSeung Jung, et al.

The flexibility of decision boundaries in neural networks in regions unguided by training data is a well-known problem, typically addressed with generalization methods. A surprising result from the recent knowledge distillation (KD) literature is that random, untrained teacher networks with the same structure as the student can also vastly improve generalization performance. This raises the possibility that there exist undiscovered assumptions useful for generalization in uncertain regions. In this paper, we shed light on these assumptions by analyzing the decision boundaries and confidence distributions of both simple and KD-based generalization methods. Assuming that a decision boundary exists that represents the most general tendency of distinction over the input sample space (i.e., the simplest hypothesis), we show the various limitations that arise when methods use this hypothesis. To resolve these limitations, we propose matured-dumb-teacher-based KD, which conservatively transfers the hypothesis to the student for generalization without massive destruction of the trained information. In practical experiments with feed-forward and convolutional neural networks on image classification tasks on the MNIST, CIFAR-10, and CIFAR-100 datasets, the proposed method shows stable improvement over the best test performance found by a grid search of hyperparameters. The analysis and results imply that the proposed method can provide finer generalization than existing methods.
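
The abstract builds on the KD observation that an untrained, equally structured teacher can improve a student's generalization. The paper's own matured-dumb-teacher procedure is not detailed in this abstract, so the following is only a minimal sketch of ordinary knowledge distillation from a randomly initialized teacher of the same architecture; the small distillation weight lambda_kd and temperature T are illustrative assumptions meant to suggest the "conservative transfer" described above, not the authors' actual settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch (not the paper's exact procedure): distill from a
# randomly initialized, equally structured "dumb" teacher, mixing the
# distillation term conservatively into the usual cross-entropy loss.

def make_mlp(in_dim=784, hidden=256, out_dim=10):
    return nn.Sequential(nn.Flatten(),
                         nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

student = make_mlp()
teacher = make_mlp()            # same architecture, kept at random initialization
teacher.eval()
for p in teacher.parameters():
    p.requires_grad_(False)     # the teacher is never updated

T = 4.0            # softmax temperature (assumed value)
lambda_kd = 0.1    # small weight: transfer the teacher's hypothesis conservatively

optimizer = torch.optim.SGD(student.parameters(), lr=0.01, momentum=0.9)

def training_step(x, y):
    """One optimization step on a batch (x, y) of images and labels."""
    optimizer.zero_grad()
    s_logits = student(x)
    with torch.no_grad():
        t_logits = teacher(x)
    ce = F.cross_entropy(s_logits, y)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    loss = ce + lambda_kd * kd
    loss.backward()
    optimizer.step()
    return loss.item()
```

A small lambda_kd keeps the cross-entropy signal dominant, so the trained information is largely preserved while the teacher's smooth, data-independent confidence distribution regularizes the student in regions the training data does not constrain.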


Related research

07/23/2019
Highlight Every Step: Knowledge Distillation via Collaborative Teaching
High storage and computational costs obstruct deep neural networks to be...

05/15/2018
Improving Knowledge Distillation with Supporting Adversarial Samples
Many recent works on knowledge distillation have provided ways to transf...

12/12/2022
Continuation KD: Improved Knowledge Distillation through the Lens of Continuation Optimization
Knowledge Distillation (KD) has been extensively used for natural langua...

05/11/2020
On the Transferability of Winning Tickets in Non-Natural Image Datasets
We study the generalization properties of pruned neural networks that ar...

10/19/2011
Readouts for Echo-state Networks Built using Locally Regularized Orthogonal Forward Regression
Echo state network (ESN) is viewed as a temporal non-orthogonal expansio...

05/02/2020
Heterogeneous Knowledge Distillation using Information Flow Modeling
Knowledge Distillation (KD) methods are capable of transferring the know...
