Obtaining Adjustable Regularization for Free via Iterate Averaging

08/15/2020
by Jingfeng Wu, et al.

Regularization is a crucial technique for avoiding overfitting in machine learning. To obtain the best performance, we usually train a model while tuning its regularization parameters, which becomes costly when a single round of training takes a significant amount of time. Very recently, Neu and Rosasco showed that if we run stochastic gradient descent (SGD) on a linear regression problem, then by averaging the SGD iterates properly we obtain a regularized solution. It was left open whether the same phenomenon can be achieved for other optimization problems and algorithms. In this paper, we establish an averaging scheme that provably converts the iterates of SGD on an arbitrary strongly convex and smooth objective function into iterates for its regularized counterpart, with an adjustable regularization parameter. Our approach also applies to accelerated and preconditioned optimization methods. We further show empirically that the same methods work for more general optimization objectives, including neural networks. In sum, we obtain adjustable regularization for free for a large class of optimization problems, resolving an open question raised by Neu and Rosasco.
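As a rough illustration of the idea (not the paper's exact construction), the sketch below runs plain SGD on a least-squares objective and keeps a geometrically weighted average of the iterates; the weighting parameter beta plays the role of an adjustable regularization knob. The function name, constants, and the specific weighting scheme are assumptions made here for illustration, following only the general "average the iterates to obtain regularization" recipe described in the abstract.

```python
import numpy as np

def sgd_with_geometric_averaging(X, y, lr=0.01, beta=0.995, n_steps=5000, seed=0):
    """Hypothetical sketch: SGD on 0.5 * (x_i^T w - y_i)^2 with a geometrically
    weighted running average of the iterates. The precise relation between beta
    and a regularization parameter is an assumption, not the paper's result."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)        # current SGD iterate
    w_avg = np.zeros(d)    # weighted sum of iterates
    weight_sum = 0.0
    for k in range(n_steps):
        i = rng.integers(n)                      # sample one data point
        grad = (X[i] @ w - y[i]) * X[i]          # stochastic gradient at w
        w = w - lr * grad
        coef = beta ** k                         # geometric weight; smaller beta emphasizes
        w_avg = w_avg + coef * w                 # early iterates (assumed to act like
        weight_sum += coef                       # stronger regularization toward the init)
    return w_avg / weight_sum

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 10))
    w_true = rng.standard_normal(10)
    y = X @ w_true + 0.1 * rng.standard_normal(200)
    print(sgd_with_geometric_averaging(X, y, beta=0.99))
```

In this sketch, sweeping beta after a single SGD run replaces retraining with different explicit regularization strengths, which is the "regularization for free" effect the abstract describes.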

Related research

07/11/2018  Modified Regularized Dual Averaging Method for Training Sparse Convolutional Neural Networks
We proposed a modified regularized dual averaging method for training sp...

12/08/2012  Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes
Stochastic Gradient Descent (SGD) is one of the simplest and most popula...

02/22/2018  Iterate averaging as regularization for stochastic gradient descent
We propose and analyze a variant of the classic Polyak-Ruppert averaging...

03/09/2020  Revisiting SGD with Increasingly Weighted Averaging: Optimization and Generalization Perspectives
Stochastic gradient descent (SGD) has been widely studied in the literat...

06/08/2020  The Strength of Nesterov's Extrapolation in the Individual Convergence of Nonsmooth Optimization
The extrapolation strategy raised by Nesterov, which can accelerate the ...

10/20/2020  Dual Averaging is Surprisingly Effective for Deep Learning Optimization
First-order stochastic optimization methods are currently the most widel...

03/23/2021  Benign Overfitting of Constant-Stepsize SGD for Linear Regression
There is an increasing realization that algorithmic inductive biases are...
