Good Students Play Big Lottery Better

01/08/2021
by Haoyu Ma, et al.

The lottery ticket hypothesis suggests that a dense neural network contains a sparse sub-network that can match the test accuracy of the original dense network when trained in isolation from the same random initialization. However, the hypothesis failed to generalize to larger dense networks such as ResNet-50. As a remedy, recent studies demonstrate that a sparse sub-network can still be obtained by using a rewinding technique, which re-trains it from the early-phase training weights or learning rates of the dense model rather than from random initialization. Is rewinding the only or the best way to scale up lottery tickets? This paper proposes a new, simpler, yet powerful technique for re-training the sub-network, called the "Knowledge Distillation ticket" (KD ticket). Rewinding exploits the value of inheriting knowledge from the early training phase to improve lottery tickets in large networks. In comparison, KD ticket addresses a complementary possibility: inheriting useful knowledge from the late training phase of the dense model. This is achieved by re-training the sub-network on the soft labels generated by the trained dense model instead of the hard labels. Extensive experiments are conducted using several large deep networks (e.g., ResNet-50 and ResNet-110) on the CIFAR-10 and ImageNet datasets. Without bells and whistles, when applied by itself, KD ticket performs on par with or better than rewinding, while being nearly free of hyperparameters or ad-hoc selection. KD ticket can also be applied together with rewinding, yielding state-of-the-art results for large-scale lottery tickets.
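
To make the contrast between soft and hard labels concrete, here is a minimal sketch of the two per-example training losses in plain Python. It is not the authors' implementation: the abstract does not give the exact loss, so the KL-divergence form, the `temperature` parameter, and the `temperature ** 2` scaling are assumptions borrowed from the commonly used knowledge-distillation formulation, and the function names `kd_loss` and `hard_label_loss` are hypothetical. Pruning and the sparse mask are outside the scope of the sketch.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def kd_loss(student_logits, teacher_logits, temperature=1.0):
    """Soft-label loss: KL(teacher || student) on softened distributions.

    The teacher is the trained dense model; the student is the sparse
    sub-network being re-trained. The temperature and the T^2 factor are
    assumptions taken from the usual distillation recipe.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    return kl * temperature ** 2


def hard_label_loss(student_logits, label):
    """Standard cross-entropy against a one-hot (hard) label."""
    return -log_softmax(student_logits)[label]


if __name__ == "__main__":
    teacher = [4.0, 1.5, 0.2]   # made-up logits from a dense model
    student = [2.0, 2.0, 0.1]   # made-up logits from a sparse sub-network
    print("soft-label loss:", kd_loss(student, teacher, temperature=2.0))
    print("hard-label loss:", hard_label_loss(student, label=0))
```

The soft-label loss is zero only when the student reproduces the teacher's full output distribution, so it carries information about how the trained dense model ranks the non-target classes, which a one-hot label does not.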

research
05/19/2019

Sparse Transfer Learning via Winning Lottery Tickets

The recently proposed Lottery Ticket Hypothesis of Frankle and Carbin (2...
research
06/16/2022

Not All Lotteries Are Made Equal

The Lottery Ticket Hypothesis (LTH) states that for a reasonably sized n...
research
06/06/2021

Efficient Lottery Ticket Finding: Less Data is More

The lottery ticket hypothesis (LTH) reveals the existence of winning tic...
research
06/12/2020

How many winning tickets are there in one DNN?

The recent lottery ticket hypothesis proposes that there is one sub-netw...
research
12/11/2019

Linear Mode Connectivity and the Lottery Ticket Hypothesis

We introduce "instability analysis," a framework for assessing whether t...
research
02/07/2018

ShakeDrop regularization

This paper proposes a powerful regularization method named ShakeDrop reg...
research
10/23/2020

ResNet or DenseNet? Introducing Dense Shortcuts to ResNet

ResNet or DenseNet? Nowadays, most deep learning based approaches are im...
