A General Family of Stochastic Proximal Gradient Methods for Deep Learning

07/15/2020
by Jihun Yun, et al.

We study the training of regularized neural networks where the regularizer can be non-smooth and non-convex. We propose a unified framework for stochastic proximal gradient descent, which we term ProxGen, that allows for arbitrary positive preconditioners and lower semi-continuous regularizers. Our framework encompasses standard stochastic proximal gradient methods without preconditioners as special cases, which have been extensively studied in various settings. Beyond these well-known methods, our approach yields two further update rules: (i) the first closed-form proximal mappings of ℓ_q regularization (0 ≤ q ≤ 1) for adaptive stochastic gradient methods, and (ii) a revised version of ProxQuant that fixes a caveat of the original approach for quantization-specific regularizers. We analyze the convergence of ProxGen and show that the entire ProxGen family enjoys the same convergence rate as stochastic proximal gradient descent without preconditioners. We also demonstrate empirically, via extensive experiments, that proximal methods outperform subgradient-based approaches. Interestingly, our results indicate that proximal methods with non-convex regularizers are more effective than those with convex regularizers.
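To make the idea concrete, here is a minimal sketch of a ProxGen-style update for ℓ_1 regularization with an Adam-like diagonal preconditioner. The function name and hyperparameter defaults are illustrative assumptions, not the paper's exact formulation: the key structure is a preconditioned gradient step followed by a proximal mapping whose threshold is scaled coordinate-wise by the preconditioner.

```python
import numpy as np

def proxgen_l1_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                    eps=1e-8, lam=1e-4):
    """One illustrative ProxGen-style update for l1 regularization
    with an Adam-like diagonal preconditioner (hypothetical sketch)."""
    # Adam-style first and second moment estimates
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    precond = np.sqrt(v_hat) + eps      # diagonal preconditioner D
    z = theta - lr * m_hat / precond    # preconditioned gradient step
    # Proximal mapping of lam * ||.||_1 in the preconditioned metric:
    # elementwise soft-thresholding with coordinate-wise threshold lr*lam/D
    thresh = lr * lam / precond
    theta_new = np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)
    return theta_new, m, v

# Usage: one step from theta = [1, -1, 0] with a constant gradient
theta0 = np.array([1.0, -1.0, 0.0])
g = np.ones(3)
theta1, m1, v1 = proxgen_l1_step(theta0, g, np.zeros(3), np.zeros(3), t=1)
```

Because the threshold `lr * lam / precond` shrinks where the preconditioner is large, coordinates with high curvature estimates are regularized less aggressively, which is the point of combining proximal mappings with adaptive preconditioners. The ℓ_q case (0 ≤ q < 1) replaces the soft-thresholding step with the paper's closed-form ℓ_q proximal mapping.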


Related research

- Asynchronous Stochastic Proximal Methods for Nonconvex Nonsmooth Optimization (02/24/2018). We study stochastic algorithms for solving non-convex optimization probl...
- Effective Proximal Methods for Non-convex Non-smooth Regularized Learning (09/14/2020). Sparse learning is a very important tool for mining useful information a...
- Orthant Based Proximal Stochastic Gradient Method for ℓ_1-Regularized Optimization (04/07/2020). Sparsity-inducing regularization problems are ubiquitous in machine lear...
- HMC and Langevin united in the unadjusted and convex case (02/02/2022). We consider a family of unadjusted HMC samplers, which includes standard...
- A Stochastic Proximal Method for Nonsmooth Regularized Finite Sum Optimization (06/14/2022). We consider the problem of training a deep neural network with nonsmooth...
- A Bregman Method for Structure Learning on Sparse Directed Acyclic Graphs (11/05/2020). We develop a Bregman proximal gradient method for structure learning on ...
- On the Complexity of Approximating Wasserstein Barycenter (01/24/2019). We study the complexity of approximating the Wasserstein barycenter of m disc...
