Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks

12/23/2015
by Chunyuan Li, et al.

Effective training of deep neural networks suffers from two main issues. The first is that the parameter spaces of these models exhibit pathological curvature. Recent methods address this problem by using adaptive preconditioning for Stochastic Gradient Descent (SGD); these methods improve convergence by adapting to the local geometry of the parameter space. The second issue is overfitting, which is typically addressed by early stopping, although recent work has demonstrated that Bayesian model averaging mitigates this problem. The posterior can be sampled using Stochastic Gradient Langevin Dynamics (SGLD), but the rapidly changing curvature renders default SGLD methods inefficient. Here, we propose combining adaptive preconditioners with SGLD. In support of this idea, we provide theoretical results on asymptotic convergence and predictive risk, along with empirical results for Logistic Regression, Feedforward Neural Nets, and Convolutional Neural Nets, demonstrating that our preconditioned SGLD method gives state-of-the-art performance on these models.
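To make the idea concrete, the sketch below shows one preconditioned Langevin update built from an RMSprop-style diagonal preconditioner: the gradient step is rescaled by the preconditioner and Gaussian noise with matching covariance is injected. This is a minimal illustration under stated assumptions, not the authors' implementation; the function name `psgld_step`, the hyperparameter defaults, and the omission of the curvature-correction term are all choices made here for brevity.

```python
import numpy as np

def psgld_step(theta, grad_logpost, v, eps=1e-3, alpha=0.99, lam=1e-5):
    """One preconditioned SGLD update (illustrative sketch).

    theta        : current parameter array
    grad_logpost : stochastic gradient of the log-posterior at theta
                   (minibatch log-likelihood rescaled by N/n, plus log-prior)
    v            : running average of squared gradients (RMSprop state)
    """
    # RMSprop-style estimate of per-parameter curvature
    v = alpha * v + (1.0 - alpha) * grad_logpost ** 2
    # Diagonal preconditioner G(theta)
    G = 1.0 / (lam + np.sqrt(v))
    # Langevin proposal: preconditioned gradient step plus Gaussian noise
    # with covariance eps * G (the correction term is omitted for simplicity)
    noise = np.sqrt(eps * G) * np.random.randn(*theta.shape)
    theta = theta + 0.5 * eps * G * grad_logpost + noise
    return theta, v
```

In this sketch the noise scale tracks the same per-parameter statistics as the gradient rescaling, which is what lets the sampler adapt to rapidly changing curvature instead of using a single global step size.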


