Combining resampling and reweighting for faithful stochastic optimization

05/31/2021
by Jing An, et al.

Many machine learning and data science tasks require solving non-convex optimization problems. When the loss function is a sum of multiple terms, a popular method is stochastic gradient descent. Viewed as a process for sampling the loss landscape, stochastic gradient descent is known to prefer flat local minima. Though this is desirable for certain optimization problems such as those in deep learning, it causes issues when the goal is to find the global minimum, especially if the global minimum resides in a sharp valley. Through a simple motivating example, we show that the fundamental reason is that differences in the Lipschitz constants of the terms in the loss function cause stochastic gradient descent to experience different variances at different minima. To mitigate this effect and perform faithful optimization, we propose a combined resampling-reweighting scheme that balances the variance at local minima, and we extend it to general loss functions. We also explain from the perspective of stochastic asymptotics why the proposed scheme is more likely to select the true global minimum than vanilla stochastic gradient descent. Experiments from robust statistics, computational chemistry, and neural network training are provided to demonstrate the theoretical findings.
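The abstract does not spell out the scheme's details, so the following is only a minimal, importance-sampling-style sketch of the resampling-reweighting idea on a toy two-term loss. The function names (grad_terms, resample_reweight_step), the constants, and the choice of sampling probabilities are illustrative assumptions, not taken from the paper; the sketch only shows how resampling terms and reweighting their gradients can keep the estimator unbiased while balancing its variance.

```python
import numpy as np

# Toy setup (assumed, not from the paper): two loss terms with very different
# Lipschitz constants, f_1(x) = (x - 1)^2 and f_2(x) = 25 (x + 1)^2, so a
# uniformly sampled stochastic gradient has much larger variance wherever the
# steep term dominates.

def grad_terms(x):
    # per-term gradients f_i'(x)
    return np.array([2.0 * (x - 1.0), 50.0 * (x + 1.0)])

def sgd_step(x, lr, rng):
    # vanilla SGD: pick one term uniformly at random
    return x - lr * rng.choice(grad_terms(x))

def resample_reweight_step(x, lr, rng):
    # importance-sampling-style resampling-reweighting (an assumption about
    # the general idea): sample a term with probability proportional to its
    # gradient magnitude, then divide by n * p_i so the estimator stays
    # unbiased for the averaged gradient while its magnitude is balanced
    g = grad_terms(x)
    p = np.abs(g) + 1e-12
    p /= p.sum()
    i = rng.choice(len(g), p=p)
    return x - lr * g[i] / (len(g) * p[i])

rng = np.random.default_rng(0)
x_plain = x_rr = 0.0
trace_plain, trace_rr = [], []
for _ in range(5000):
    x_plain = sgd_step(x_plain, 1e-3, rng)
    x_rr = resample_reweight_step(x_rr, 1e-3, rng)
    trace_plain.append(x_plain)
    trace_rr.append(x_rr)

# Both estimators target the same averaged loss, but the reweighted iterates
# fluctuate less around its minimizer.
print("vanilla SGD       : mean %.3f, std %.4f"
      % (np.mean(trace_plain[-1000:]), np.std(trace_plain[-1000:])))
print("resample-reweight : mean %.3f, std %.4f"
      % (np.mean(trace_rr[-1000:]), np.std(trace_rr[-1000:])))
```

In this sketch the reweighted step has the same expectation as the uniform one, so both methods settle near the same point, but the balanced variance illustrates why such a scheme would be less prone to being pushed out of a sharp minimum than vanilla SGD.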


