Kalman-based Stochastic Gradient Method with Stop Condition and Insensitivity to Conditioning

12/03/2015
by Vivak Patel, et al.

Modern proximal and stochastic gradient descent (SGD) methods are believed to efficiently minimize large composite objective functions, but such methods have two algorithmic challenges: (1) a lack of fast or justified stop conditions, and (2) sensitivity to the objective function's conditioning. In response to the first challenge, modern proximal and SGD methods guarantee convergence only after multiple epochs, but such a guarantee renders proximal and SGD methods infeasible when the number of component functions is very large or infinite. In response to the second challenge, second order SGD methods have been developed, but they are marred by the complexity of their analysis. In this work, we address these challenges on the limited, but important, linear regression problem by introducing and analyzing a second order proximal/SGD method based on Kalman Filtering (kSGD). Through our analysis, we show kSGD is asymptotically optimal, develop a fast algorithm for very large, infinite or streaming data sources with a justified stop condition, prove that kSGD is insensitive to the problem's conditioning, and develop a unique approach for analyzing the complex second order dynamics. Our theoretical results are supported by numerical experiments on three regression problems (linear, nonparametric wavelet, and logistic) using three large publicly available datasets. Moreover, our analysis and experiments lay a foundation for embedding kSGD in multiple epoch algorithms, extending kSGD to other problem classes, and developing parallel and low memory kSGD implementations.
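The abstract does not reproduce the kSGD update itself. As an illustration only, the sketch below applies the standard Kalman-filter / recursive-least-squares recursion to streaming linear regression, the setting the paper analyzes; it shows how a covariance-scaled gain yields conditioning-insensitive steps and how the posterior covariance can drive a stop condition. The function name, the noise-variance parameter sigma2, the prior scale prior_var, and the trace-based tolerance are illustrative assumptions, not the paper's notation or exact algorithm.

```python
import numpy as np

def kalman_linear_regression(data_stream, dim, sigma2=1.0, prior_var=10.0, tol=1e-6):
    """Streaming linear regression via the standard Kalman-filter /
    recursive-least-squares recursion. Illustrative sketch only; the
    parameter names are assumptions, not the paper's kSGD notation."""
    beta = np.zeros(dim)             # current parameter estimate
    P = prior_var * np.eye(dim)      # posterior covariance of the estimate
    for x, y in data_stream:         # x: feature vector, y: response
        x = np.asarray(x, dtype=float)
        # Kalman gain: covariance-scaled step, insensitive to feature conditioning
        denom = float(x @ P @ x) + sigma2
        K = (P @ x) / denom
        # Second-order-style update of the estimate and its covariance
        beta = beta + K * (y - x @ beta)
        P = P - np.outer(K, x @ P)
        # Stop condition: terminate once the posterior covariance has shrunk
        if np.trace(P) < tol:
            break
    return beta, P

# Example usage on synthetic streaming data
rng = np.random.default_rng(0)
true_beta = np.array([1.0, -2.0, 0.5])
stream = ((x, x @ true_beta + 0.1 * rng.standard_normal())
          for x in (rng.standard_normal(3) for _ in range(5000)))
beta_hat, P_hat = kalman_linear_regression(stream, dim=3, sigma2=0.01)
```

In this sketch the covariance P plays the role of a second-order scaling, and its trace serves as a simple surrogate for a justified stop condition on a single pass over the data stream.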


