Better Training using Weight-Constrained Stochastic Dynamics

06/20/2021
by Benedict Leimkuhler et al.

We employ constraints to control the parameter space of deep neural networks throughout training. The use of customized, appropriately designed constraints can reduce the vanishing/exploding gradients problem, improve smoothness of classification boundaries, control weight magnitudes, and stabilize deep neural networks, thus enhancing the robustness of training algorithms and the generalization capabilities of neural networks. We provide a general approach to efficiently incorporate constraints into a stochastic gradient Langevin framework, allowing enhanced exploration of the loss landscape. We also present specific examples of constrained training methods motivated by orthogonality preservation for weight matrices and explicit weight normalizations. Discretization schemes are provided both for the overdamped formulation of Langevin dynamics and the underdamped form, in which momenta further improve sampling efficiency. These optimization schemes can be used directly, without needing to adapt the neural network architecture or to modify the objective with regularization terms, and they yield performance improvements in classification tasks.
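To make the constrained Langevin idea concrete, here is a minimal NumPy sketch of one overdamped Langevin step under a spherical weight-norm constraint ||w|| = r, with the drift and noise projected onto the tangent space of the constraint surface and the iterate re-projected afterward. This is an illustrative sketch only: the function name, step size h, inverse temperature beta, and the specific projection scheme are assumptions, not the paper's exact discretization.

```python
import numpy as np

def constrained_overdamped_langevin_step(w, grad, h, beta, radius, rng):
    """One overdamped Langevin step for a 1-D parameter vector w, with the
    norm constraint ||w|| = radius enforced by projection.

    Hypothetical illustration of a projected scheme, not the authors'
    algorithm: drift and noise are restricted to the tangent space of the
    sphere, then the update is re-normalized onto the constraint manifold.
    """
    n = w / np.linalg.norm(w)                      # unit normal to the sphere
    drift = grad - np.dot(grad, n) * n             # tangential part of the gradient
    noise = rng.standard_normal(w.shape)
    noise -= np.dot(noise, n) * n                  # tangential part of the noise
    w_new = w - h * drift + np.sqrt(2.0 * h / beta) * noise
    return radius * w_new / np.linalg.norm(w_new)  # re-project onto the sphere

# Toy usage: sample near the minimum of 0.5 * ||w - target||^2 restricted
# to the unit circle; every iterate stays exactly on the constraint manifold.
rng = np.random.default_rng(0)
w = np.array([1.0, 0.0])
target = np.array([0.0, 2.0])
for _ in range(1000):
    grad = w - target                              # gradient of the toy loss
    w = constrained_overdamped_langevin_step(w, grad, h=0.01, beta=50.0,
                                             radius=1.0, rng=rng)
print(w, np.linalg.norm(w))                        # concentrates near (0, 1); norm stays 1
```

An underdamped variant would additionally carry a momentum variable, kept in the tangent space by the same projection, which is how momenta can improve sampling efficiency in the constrained setting.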

Related research

Constraint-Based Regularization of Neural Networks (06/17/2020)
We propose a method for efficiently incorporating constraints into a sto...

Numerical Exploration of Training Loss Level-Sets in Deep Neural Networks (11/09/2020)
We present a computational method for empirically characterizing the tra...

Hamiltonian Deep Neural Networks Guaranteeing Non-vanishing Gradients by Design (05/27/2021)
Deep Neural Networks (DNNs) training can be difficult due to vanishing a...

Deep Neural Network Training with Frank-Wolfe (10/14/2020)
This paper studies the empirical efficacy and benefits of using projecti...

Gradient Centralization: A New Optimization Technique for Deep Neural Networks (04/03/2020)
Optimization techniques are of great importance to effectively and effic...

Langevin Dynamics with Continuous Tempering for Training Deep Neural Networks (03/13/2017)
Minimizing non-convex and high-dimensional objective functions is challe...

Improving Neural Network Training in Low Dimensional Random Bases (11/09/2020)
Stochastic Gradient Descent (SGD) has proven to be remarkably effective ...