Generalization bounds via distillation

04/12/2021
by Daniel Hsu, et al.

This paper theoretically investigates the following empirical phenomenon: given a high-complexity network with poor generalization bounds, one can distill it into a network with nearly identical predictions but low complexity and vastly smaller generalization bounds. The main contribution is an analysis showing that the original network inherits this good generalization bound from its distillation, assuming the use of well-behaved data augmentation. The bound is presented in both an abstract and a concrete form, the latter complemented by a reduction technique for handling modern computation graphs featuring convolutional layers, fully-connected layers, and skip connections, among others. To round out the story, a (looser) classical uniform convergence analysis of compression is also presented, as well as a variety of experiments on CIFAR and MNIST demonstrating similar generalization performance between the original network and its distillation.
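As a concrete illustration of the setup described in the abstract, the sketch below distills a large teacher network into a much smaller student by matching the student's predictions to the teacher's on augmented inputs. This is a minimal, hypothetical example: the architectures, the Gaussian-noise augmentation, and all hyperparameters are illustrative assumptions, not the paper's actual procedure or the augmentation class covered by its bound.

```python
# Hypothetical sketch of distillation with data augmentation: a small
# "student" is trained to match a large "teacher" on augmented inputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative architectures (not from the paper): a wide teacher, a tiny student.
teacher = nn.Sequential(nn.Flatten(), nn.Linear(784, 2048), nn.ReLU(),
                        nn.Linear(2048, 2048), nn.ReLU(), nn.Linear(2048, 10))
student = nn.Sequential(nn.Flatten(), nn.Linear(784, 64), nn.ReLU(),
                        nn.Linear(64, 10))

def augment(x):
    # Stand-in for a "well-behaved" augmentation: small Gaussian perturbations.
    return x + 0.05 * torch.randn_like(x)

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
teacher.eval()

def distill_step(x):
    """One distillation step: match student log-probs to teacher probs (KL)."""
    x_aug = augment(x)
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x_aug), dim=-1)
    student_logp = F.log_softmax(student(x_aug), dim=-1)
    loss = F.kl_div(student_logp, teacher_probs, reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Example usage on random data shaped like MNIST images.
x = torch.randn(32, 1, 28, 28)
print(distill_step(x))
```

The point of the sketch is only that the distilled student can make nearly the same predictions with far fewer parameters; that is the regime in which the paper's analysis transfers a good generalization bound back to the original network.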


Related research

10/19/2020
New Properties of the Data Distillation Method When Working With Tabular Data
Data distillation is the problem of reducing the volume of training data ...

06/06/2020
An Empirical Analysis of the Impact of Data Augmentation on Knowledge Distillation
Generalization Performance of Deep Learning models trained using the Emp...

07/29/2020
Compressing Deep Neural Networks via Layer Fusion
This paper proposes layer fusion - a model compression technique that di...

05/17/2019
Dream Distillation: A Data-Independent Model Compression Framework
Model compression is eminently suited for deploying deep learning on IoT...

06/15/2021
Compression Implies Generalization
Explaining the surprising generalization performance of deep neural netw...

02/23/2018
Sensitivity and Generalization in Neural Networks: an Empirical Study
In practice it is often found that large over-parameterized neural netwo...

01/27/2023
Fine-tuning Neural-Operator architectures for training and generalization
In this work, we present an analysis of the generalization of Neural Ope...
