Laplacian Smoothing Gradient Descent

06/17/2018
by   Stanley Osher, et al.
15

We propose a very simple modification of gradient descent and stochastic gradient descent. We show that when applied to a variety of machine learning models including softmax regression, convolutional neural nets, generative adversarial nets, and deep reinforcement learning, this very simple surrogate can dramatically reduce the variance and improve the accuracy of the generalization. The new algorithm, (which depends on one nonnegative parameter) when applied to non-convex minimization, tends to avoid sharp local minima. Instead it seeks somewhat flatter local (and often global) minima. The method only involves preconditioning the gradient by the inverse of a tri-diagonal matrix that is positive definite. The motivation comes from the theory of Hamilton-Jacobi partial differential equations. This theory demonstrates that the new algorithm is almost the same as doing gradient descent on a new function which (a) has the same global minima as the original function and (b) is "more convex". Again, the programming effort in doing this is minimal, in cost, complexity and effort. We implement our algorithm into both PyTorch and Tensorflow platforms, which will be made publicly available.

READ FULL TEXT
research
03/21/2022

A Local Convergence Theory for the Stochastic Gradient Descent Method in Non-Convex Optimization With Non-isolated Local Minima

Non-convex loss functions arise frequently in modern machine learning, a...
research
05/19/2014

On the saddle point problem for non-convex optimization

A central challenge to many fields of science and engineering involves m...
research
11/22/2011

Stochastic gradient descent on Riemannian manifolds

Stochastic gradient descent is a simple approach to find the local minim...
research
08/14/2018

Discrete gradient descent differs qualitatively from gradient flow

We consider gradient descent on functions of the form L_1 = |f| and L_2 ...
research
06/05/2019

Global Optimality Guarantees For Policy Gradient Methods

Policy gradients methods are perhaps the most widely used class of reinf...
research
12/20/2022

Using Witten Laplacians to locate index-1 saddle points

We introduce a new stochastic algorithm to locate the index-1 saddle poi...
research
05/24/2019

Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models

We consider distributed optimization under communication constraints for...

Please sign up or login with your details

Forgot password? Click here to reset