Screening for Sparse Online Learning

01/18/2021
by Jingwei Liang et al.

Sparsity-promoting regularizers are widely used to impose low-complexity structure (e.g. the ℓ1-norm for sparsity) on the regression coefficients of supervised learning. In the realm of deterministic optimization, the sequence generated by iterative algorithms (such as proximal gradient descent) exhibits "finite activity identification": it identifies the low-complexity structure of the solution in a finite number of iterations. However, most online algorithms (such as proximal stochastic gradient descent) lack this property, owing to their vanishing step sizes and non-vanishing gradient variance. In this paper, we show how combining online algorithms with a screening rule eliminates useless features of the iterates and thereby enforces finite activity identification. One consequence is that the sparsity imposed by the regularizer can be exploited for computational gains with any convergent online algorithm. Numerically, this yields significant acceleration.
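To make the mechanism concrete, below is a minimal sketch, not the authors' exact algorithm: proximal stochastic gradient descent for the lasso problem min_w 0.5*||Xw - y||^2 + lam*||w||_1, periodically combined with a GAP-safe screening test (Fercoq, Gramfort and Salmon, 2015) that certifies coordinates as inactive and fixes them to zero. The function names, step-size schedule, and screening frequency are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (coordinatewise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_sgd_with_screening(X, y, lam, n_iters=50000, gamma0=0.1,
                            screen_every=1000, seed=0):
    """Hypothetical sketch: proximal SGD for the lasso plus GAP-safe screening."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    active = np.ones(d, dtype=bool)    # coordinates not yet screened out
    col_norms = np.linalg.norm(X, axis=0)
    for t in range(1, n_iters + 1):
        i = rng.integers(n)
        # Unbiased stochastic gradient of the smooth part 0.5*||Xw - y||^2
        # (a sum over samples, hence the factor n).
        g = n * (X[i] @ w - y[i]) * X[i]
        gamma = gamma0 / t**0.51       # vanishing step size, typical of online methods
        w = soft_threshold(w - gamma * g, gamma * lam)
        w[~active] = 0.0               # keep screened coordinates at zero
        if t % screen_every == 0:
            # GAP-safe test: build a dual-feasible point from the residual,
            # bound the distance to the dual optimum via the duality gap,
            # and discard features whose correlation cannot reach the boundary.
            residual = y - X @ w
            corr = X.T @ residual
            theta = residual / max(lam, np.abs(corr).max())  # dual-feasible point
            primal = 0.5 * residual @ residual + lam * np.abs(w).sum()
            dual = 0.5 * y @ y - 0.5 * lam**2 * ((theta - y / lam) @ (theta - y / lam))
            gap = max(primal - dual, 0.0)
            r = np.sqrt(2.0 * gap) / lam          # safe-sphere radius
            scores = np.abs(X.T @ theta) + r * col_norms
            active &= scores >= 1.0               # scores < 1 certify w*_j = 0
            w[~active] = 0.0
    return w, active
```

Fixing screened coordinates to zero is what restores finite activity identification in this sketch: once a feature fails the safe test, the vanishing-step-size iterates can no longer reactivate it, and a real implementation would restrict all subsequent gradient and proximal updates to the active set for the computational savings the abstract describes.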


Related research

08/11/2022  An Accelerated Doubly Stochastic Gradient Method with Faster Explicit Model Identification
06/29/2020  Fast OSCAR and OWL Regression via Safe Screening Rules
08/24/2020  Noise-induced degeneration in online learning
05/21/2016  Make Workers Work Harder: Decoupled Asynchronous Proximal Stochastic Gradient Descent
02/07/2020  On the Effectiveness of Richardson Extrapolation in Machine Learning
04/21/2016  Stabilized Sparse Online Learning for Sparse Data
07/02/2021  Screening for a Reweighted Penalized Conditional Gradient Method
