A generalization of regularized dual averaging and its dynamics

09/22/2019
by   Shih-Kang Chao, et al.

The excessive computational cost of learning from large-scale and streaming data can be alleviated by stochastic algorithms, such as stochastic gradient descent and its variants. Recent advances improve stochastic algorithms in convergence speed, adaptivity, and structural awareness. However, the distributional aspects of these new algorithms are poorly understood, especially for structured parameters. To develop statistical inference in this setting, we propose a class of generalized regularized dual averaging (gRDA) algorithms with constant step size, which improves on RDA (Xiao, 2010; Flammarion and Bach, 2017). Weak convergence of the gRDA trajectories is studied, and as a consequence, for the first time in the literature, the asymptotic distributions for online ℓ1-penalized problems become available. These general results apply to both convex and non-convex differentiable loss functions and, in particular, recover the existing regret bound for convex losses (Nemirovski et al., 2009). As important applications, statistical inferential theory for online sparse linear regression and online sparse principal component analysis is developed and supported by extensive numerical analysis. Interestingly, when gRDA is properly tuned, support recovery and a zero-mean central limit theorem hold simultaneously in the online setting, in contrast with the biased limiting distribution of the batch Lasso (Knight and Fu, 2000). Technical devices, including the weak convergence of stochastic mirror descent, are developed as by-products of independent interest. A preliminary empirical analysis of modern image data shows that learning very sparse deep neural networks with gRDA does not necessarily sacrifice testing accuracy.
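The abstract does not spell out the update rule, but regularized dual averaging with an ℓ1 penalty is commonly implemented by soft-thresholding a running sum of stochastic gradients. The sketch below (plain NumPy) illustrates that idea for online sparse linear regression with a constant step size; the threshold schedule c·√γ·(nγ)^μ and all function and variable names are assumptions for illustration, not necessarily the paper's exact gRDA tuning.

```python
import numpy as np

def soft_threshold(v, tau):
    # Componentwise soft-thresholding operator S_tau(v).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def rda_l1_linear_regression(stream, dim, gamma=0.01, c=1.0, mu=0.75, n_steps=20000):
    # RDA-style online l1-penalized linear regression with constant step size gamma.
    # The iterate is the soft-thresholded running sum of stochastic gradients;
    # the threshold schedule below is illustrative, not the paper's exact tuning.
    grad_sum = np.zeros(dim)
    w = np.zeros(dim)
    for n in range(1, n_steps + 1):
        x, y = next(stream)
        grad_sum += (x @ w - y) * x              # gradient of 0.5 * (x'w - y)^2
        tau = c * np.sqrt(gamma) * (n * gamma) ** mu
        w = soft_threshold(-gamma * grad_sum, tau)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_true = np.zeros(50)
    w_true[:3] = [2.0, -1.5, 1.0]                # sparse ground truth

    def stream_gen():
        # Synthetic stream of (feature, response) pairs for the demo.
        while True:
            x = rng.standard_normal(50)
            yield x, x @ w_true + 0.1 * rng.standard_normal()

    w_hat = rda_l1_linear_regression(stream_gen(), dim=50)
    print("nonzero coordinates:", np.flatnonzero(w_hat))
```

The step size stays constant while the threshold grows slowly with the iteration count, so the iterates become sparse over time; this mirrors, in simplified form, the constant-step-size, ℓ1-penalized setting the abstract describes.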


