Probabilistic fine-tuning of pruning masks and PAC-Bayes self-bounded learning

by Soufiane Hayou, et al.

We study an approach to learning pruning masks by optimizing the expected loss of stochastic pruning masks, i.e., masks that zero out each weight independently with some weight-specific probability. We analyze the training dynamics of the induced stochastic predictor in the setting of linear regression, and observe a data-adaptive L1 regularization term, in contrast to the data-adaptive L2 regularization term known to underlie dropout in linear regression. We also observe a preference to prune weights that are less well aligned with the data labels. We evaluate probabilistic fine-tuning for optimizing stochastic pruning masks for neural networks, starting from masks produced by several baselines. In each case, we see improvements in test error over the baselines, even after thresholding the fine-tuned stochastic pruning masks. Finally, since a stochastic pruning mask induces a stochastic neural network, we consider training the weights and/or the pruning probabilities to minimize a PAC-Bayes bound on generalization error. Using data-dependent priors, we obtain a self-bounded learning algorithm with strong performance and numerically tight bounds. In the linear model, we show that a PAC-Bayes generalization error bound is controlled by the magnitude of the change in feature alignment between the 'prior' and 'posterior' data.
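
To make the linear-regression observation concrete, here is a minimal sketch (our own illustration, not the authors' code). It assumes a single example (x, y), fixed weights w, and independent keep-masks m_j ~ Bernoulli(p_j), so the masked prediction is sum_j m_j w_j x_j. The expected squared loss then splits into the squared error of the mean predictor plus a variance term sum_j p_j (1 - p_j) w_j^2 x_j^2. Rewriting that term in the mean effective weights beta_j = p_j w_j gives sum_j x_j^2 (|w_j| |beta_j| - beta_j^2), whose first part is a data-weighted absolute-value (L1-type) penalty on beta. For dropout, where p is fixed and w is learned, the same variance term is instead quadratic in w. This is only one way to see where an L1-like term can arise; the paper's own parameterization and derivation may differ, and the function names below are hypothetical.

```python
import random


def expected_sq_loss(x, y, w, p):
    """Closed-form E_m[(y - sum_j m_j w_j x_j)^2] for independent m_j ~ Bernoulli(p_j).

    p_j is the probability of keeping weight j (m_j = 1).
    """
    # Squared error of the mean predictor, which uses weights p_j * w_j.
    mean_pred = sum(pj * wj * xj for pj, wj, xj in zip(p, w, x))
    # Each masked term m_j * w_j * x_j is independent with variance p_j (1 - p_j) (w_j x_j)^2.
    var_pred = sum(pj * (1 - pj) * (wj * xj) ** 2 for pj, wj, xj in zip(p, w, x))
    return (y - mean_pred) ** 2 + var_pred


def variance_in_beta(x, w, p):
    """Same variance term written in beta_j = p_j * w_j: sum_j x_j^2 (|w_j||beta_j| - beta_j^2)."""
    total = 0.0
    for pj, wj, xj in zip(p, w, x):
        beta = pj * wj  # has the sign of w_j because 0 <= p_j <= 1
        total += xj ** 2 * (abs(wj) * abs(beta) - beta ** 2)
    return total


def monte_carlo_sq_loss(x, y, w, p, n_samples=100_000, seed=0):
    """Estimate the same expectation by sampling masks, as a sanity check."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Keep weight j with probability p_j, otherwise prune it (it contributes 0).
        pred = sum(wj * xj for pj, wj, xj in zip(p, w, x) if rng.random() < pj)
        total += (y - pred) ** 2
    return total / n_samples


if __name__ == "__main__":
    x, y = [1.0, 2.0, -0.5], 1.0  # hypothetical example
    w, p = [0.5, -1.0, 2.0], [0.9, 0.3, 0.6]
    print(expected_sq_loss(x, y, w, p))     # closed form
    print(monte_carlo_sq_loss(x, y, w, p))  # close to the closed form, up to Monte Carlo error
```

The Monte Carlo function only checks the closed form; actually fine-tuning the mask would mean minimizing this expectation with respect to p, which the sketch does not attempt.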

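The PAC-Bayes part can be sketched in the same spirit. Assume a stochastic pruning mask whose prior and posterior both keep each weight independently (with probabilities p_j and q_j, and the weights held fixed); the KL divergence between them then factorizes into a sum of Bernoulli KL terms. The sketch below plugs that KL into the standard PAC-Bayes-kl bound for losses in [0, 1], kl(empirical risk || risk) <= (KL(Q||P) + ln(2 sqrt(n) / delta)) / n, and inverts it by bisection. With a data-dependent prior, n should count only the examples not used to build the prior. The bound variant, the helper names, and the example numbers are assumptions for illustration; the paper may use a different bound, and a Monte Carlo estimate of the empirical risk under Q would need its own correction term.

```python
import math


def bernoulli_kl(q, p):
    """kl(q || p) between Bernoulli(q) and Bernoulli(p), using 0 * log 0 = 0."""
    total = 0.0
    if q > 0:
        total += q * math.log(q / p)
    if q < 1:
        total += (1 - q) * math.log((1 - q) / (1 - p))
    return total


def mask_kl(q_probs, p_probs):
    """KL(Q || P) for product-of-Bernoulli masks: it factorizes over weights."""
    return sum(bernoulli_kl(q, p) for q, p in zip(q_probs, p_probs))


def kl_inverse_upper(emp_risk, c, tol=1e-10):
    """Upper end of {r in [emp_risk, 1] : kl(emp_risk || r) <= c}, found by bisection."""
    lo, hi = emp_risk, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bernoulli_kl(emp_risk, mid) <= c:
            lo = mid
        else:
            hi = mid
    return hi  # err on the conservative (larger) side


def pac_bayes_kl_bound(emp_risk, kl_qp, n, delta=0.05):
    """Risk bound holding with probability >= 1 - delta, for losses in [0, 1]."""
    c = (kl_qp + math.log(2 * math.sqrt(n) / delta)) / n
    return kl_inverse_upper(emp_risk, c)


if __name__ == "__main__":
    prior = [0.5] * 1000                   # hypothetical prior keep-probabilities
    posterior = [0.9] * 200 + [0.1] * 800  # hypothetical fine-tuned probabilities
    kl = mask_kl(posterior, prior)
    print(f"KL = {kl:.1f}, bound = {pac_bayes_kl_bound(0.05, kl, n=10_000):.3f}")
```

Because the KL term grows with every weight whose keep-probability moves away from the prior, a prior that already sits close to the learned mask (for example, one fit on held-out data) keeps the bound small.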

Entropy-SGD optimizes the prior of a PAC-Bayes bound: Data-dependent PAC-Bayes priors via differential privacy

We show that Entropy-SGD (Chaudhari et al., 2016), when viewed as a lear...

Improved PAC-Bayesian Bounds for Linear Regression

In this paper, we improve the PAC-Bayesian error bound for linear regres...

Weight Reparametrization for Budget-Aware Network Pruning

Pruning seeks to design lightweight architectures by removing redundant ...

On the role of data in PAC-Bayes bounds

The dominant term in PAC-Bayes bounds is often the Kullback–Leibler dive...

Studying the Consistency and Composability of Lottery Ticket Pruning Masks

Magnitude pruning is a common, effective technique to identify sparse su...

Learning Stochastic Majority Votes by Minimizing a PAC-Bayes Generalization Bound

We investigate a stochastic counterpart of majority votes over finite en...

Learning Partially Known Stochastic Dynamics with Empirical PAC Bayes

We propose a novel scheme for fitting heavily parameterized non-linear s...