Structured Sparsity Inducing Adaptive Optimizers for Deep Learning

02/07/2021
by   Tristan Deleu, et al.
10

The parameters of a neural network are naturally organized in groups, some of which might not contribute to its overall performance. To prune out unimportant groups of parameters, we can include some non-differentiable penalty to the objective function, and minimize it using proximal gradient methods. In this paper, we derive the weighted proximal operator, which is a necessary component of these proximal methods, of two structured sparsity inducing penalties. Moreover, they can be approximated efficiently with a numerical solver, and despite this approximation, we prove that existing convergence guarantees are preserved when these operators are integrated as part of a generic adaptive proximal method. Finally, we show that this adaptive method, together with the weighted proximal operators derived here, is indeed capable of finding solutions with structure in their sparsity patterns, on representative examples from computer vision and natural language processing.

READ FULL TEXT
research
05/31/2016

Iterative Smoothing Proximal Gradient for Regression with Structured Sparsity

In the context high-dimensionnal predictive models, we consider the prob...
research
02/14/2012

Smoothing Proximal Gradient Method for General Structured Sparse Learning

We study the problem of learning high dimensional regression models regu...
research
04/11/2011

Convex and Network Flow Optimization for Structured Sparsity

We consider a class of learning problems regularized by a structured spa...
research
09/03/2012

Proximal methods for the latent group lasso penalty

We consider a regularized least squares problem, with regularization by ...
research
04/07/2020

Orthant Based Proximal Stochastic Gradient Method for ℓ_1-Regularized Optimization

Sparsity-inducing regularization problems are ubiquitous in machine lear...
research
01/10/2020

A first-order optimization algorithm for statistical learning with hierarchical sparsity structure

In many statistical learning problems, it is desired that the optimal so...
research
08/31/2010

Network Flow Algorithms for Structured Sparsity

We consider a class of learning problems that involve a structured spars...

Please sign up or login with your details

Forgot password? Click here to reset