Learning Neural Networks with Adaptive Regularization

07/14/2019
by Han Zhao, et al.

Feed-forward neural networks can be understood as a combination of an intermediate representation and a linear hypothesis. While most previous works aim to diversify the representations, we explore the complementary direction by performing an adaptive and data-dependent regularization motivated by the empirical Bayes method. Specifically, we propose to construct a matrix-variate normal prior (on weights) whose covariance matrix has a Kronecker product structure. This structure is designed to capture the correlations among neurons through backpropagation. Under the assumption of this Kronecker factorization, the prior encourages neurons to borrow statistical strength from one another. Hence, it leads to an adaptive and data-dependent regularization when training networks on small datasets. To optimize the model, we present an efficient block coordinate descent algorithm with analytical solutions. Empirically, we demonstrate that the proposed method helps networks converge to local optima with smaller stable ranks and spectral norms. These properties suggest better generalization, and we present empirical results to support this expectation. We also verify the effectiveness of the approach on multiclass classification and multitask regression problems with various network structures.
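As a rough illustration of the prior described above (a sketch under standard matrix-variate normal assumptions, not the authors' implementation), the Python snippet below computes the negative log-density of a zero-mean matrix-variate normal prior with Kronecker-factored covariance, i.e. vec(W) ~ N(0, Sigma_c ⊗ Sigma_r). Adding this term to the training loss plays the role of the regularizer; the function and variable names (matrix_normal_neg_log_prior, Sigma_r, Sigma_c) are placeholders, not names from the paper.

# Minimal sketch (not the paper's code) of the penalty induced by a
# zero-mean matrix-variate normal prior with Kronecker-factored covariance.
import numpy as np

def matrix_normal_neg_log_prior(W, Sigma_r, Sigma_c):
    """-log p(W | 0, Sigma_r, Sigma_c), dropping the constant (2*pi) term.

    W       : (n, p) weight matrix of the linear (output) layer
    Sigma_r : (n, n) row covariance, coupling output neurons
    Sigma_c : (p, p) column covariance, coupling input dimensions
    """
    n, p = W.shape
    Sr_inv = np.linalg.inv(Sigma_r)
    Sc_inv = np.linalg.inv(Sigma_c)
    # Quadratic form tr(Sigma_r^{-1} W Sigma_c^{-1} W^T)
    quad = np.trace(Sr_inv @ W @ Sc_inv @ W.T)
    # Log-determinant terms from the normalizing constant
    _, logdet_r = np.linalg.slogdet(Sigma_r)
    _, logdet_c = np.linalg.slogdet(Sigma_c)
    return 0.5 * (quad + p * logdet_r + n * logdet_c)

# Example: with identity covariances the penalty reduces to
# 0.5 * ||W||_F^2, i.e. ordinary weight decay.
W = np.random.randn(4, 3)
print(matrix_normal_neg_log_prior(W, np.eye(4), np.eye(3)))

With fixed identity covariances this is plain weight decay; re-estimating Sigma_r and Sigma_c from the current weights, in the spirit of empirical Bayes, is what makes the regularization adaptive and data-dependent.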
