Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective

05/15/2022
by Keitaro Sakamoto, et al.

The lottery ticket hypothesis (LTH) has attracted attention because it can explain why over-parameterized models often show high generalization ability. It is known that when iterative magnitude pruning (IMP), an algorithm for finding sparse networks, called winning tickets, that can be trained in isolation from their initial weights to high generalization ability, is applied to deep neural networks such as ResNet, a large initial learning rate does not work well. However, since a large initial learning rate generally helps the optimizer converge to flatter minima, we hypothesize that winning tickets have relatively sharp minima, which is considered a disadvantage in terms of generalization ability. In this paper, we confirm this hypothesis and show that PAC-Bayesian theory can provide an explicit understanding of the relationship between LTH and generalization behavior. On the basis of our experimental findings that flatness is useful for improving accuracy and robustness to label noise, and that the distance from the initial weights is deeply involved in winning tickets, we offer a PAC-Bayes bound using a spike-and-slab distribution to analyze winning tickets. Finally, we revisit existing algorithms for finding winning tickets from a PAC-Bayesian perspective and provide new insights into these methods.
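To make the pruning procedure referenced in the abstract concrete, the following is a minimal sketch of iterative magnitude pruning with weight rewinding on a toy logistic-regression model. The synthetic data, the pruning rate, and the number of rounds are illustrative assumptions, not the paper's experimental setup.

# Minimal sketch of iterative magnitude pruning (IMP) with weight rewinding.
# Toy logistic regression on synthetic data; all hyperparameters are assumed
# for illustration and are not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification data (assumed).
X = rng.normal(size=(512, 20))
true_w = rng.normal(size=20)
y = (X @ true_w + 0.1 * rng.normal(size=512) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(w, mask, steps=300, lr=0.1):
    """Gradient descent on the logistic loss, keeping pruned weights at zero."""
    w = w * mask
    for _ in range(steps):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / len(y)
        w -= lr * grad * mask          # pruned coordinates receive no updates
        w *= mask
    return w

w_init = rng.normal(scale=0.1, size=20)   # the "ticket" initialization
mask = np.ones_like(w_init)

# Each IMP round: train, prune the smallest-magnitude surviving weights,
# then rewind the remaining weights to their initial values for the next round.
for round_ in range(3):
    w_trained = train(w_init.copy(), mask)
    surviving = np.flatnonzero(mask)
    k = int(0.2 * len(surviving))         # prune 20% of surviving weights per round (assumed rate)
    prune_idx = surviving[np.argsort(np.abs(w_trained[surviving]))[:k]]
    mask[prune_idx] = 0.0
    acc = np.mean((sigmoid(X @ w_trained) > 0.5) == y)
    print(f"round {round_}: sparsity={1 - mask.mean():.2f}, train acc={acc:.3f}")

# The (mask, w_init) pair is the candidate winning ticket: the sparse
# subnetwork trained once more from the original initialization.
ticket = train(w_init.copy(), mask)

The design choice that matters here is the rewinding step: after each pruning round, the surviving weights are reset to their initial values, which is what allows the resulting sparse subnetwork to be "trained from the initial weights" as the abstract describes.

The PAC-Bayes analysis mentioned in the abstract builds on generalization bounds of the following standard form. The display below is only a generic sketch (constants vary across statements in the literature, and the paper's exact bound with a spike-and-slab posterior is given in the full text); the mixing weights \gamma_i, means \mu_i, and variances \sigma_i^2 are assumed notation, not the paper's.

% One standard McAllester-style PAC-Bayes bound (not the paper's exact statement):
% for a prior P chosen before seeing the sample of size n, with probability at
% least 1 - \delta, simultaneously for all posteriors Q,
\[
  L(Q) \;\le\; \widehat{L}(Q)
  \;+\; \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln \frac{2\sqrt{n}}{\delta}}{2n}},
\]
% where L(Q) and \widehat{L}(Q) are the expected and empirical risks of the
% randomized predictor drawn from Q. A spike-and-slab posterior is a natural
% choice for pruned networks: each weight is either exactly zero (the "spike",
% a pruned coordinate) or drawn from a Gaussian "slab" around its learned
% value. Schematically, with assumed notation \gamma_i, \mu_i, \sigma_i,
\[
  Q(w_i) \;=\; (1 - \gamma_i)\,\delta_0(w_i)
  \;+\; \gamma_i\,\mathcal{N}\!\left(w_i \,\middle|\, \mu_i, \sigma_i^2\right).
\]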


