
Kalman-based Stochastic Gradient Method with Stop Condition and Insensitivity to Conditioning

by   Vivak Patel, et al.
The University of Chicago

Modern proximal and stochastic gradient descent (SGD) methods are believed to efficiently minimize large composite objective functions, but such methods have two algorithmic challenges: (1) a lack of fast or justified stop conditions, and (2) sensitivity to the objective function's conditioning. In response to the first challenge, modern proximal and SGD methods guarantee convergence only after multiple epochs, but such a guarantee renders proximal and SGD methods infeasible when the number of component functions is very large or infinite. In response to the second challenge, second-order SGD methods have been developed, but they are marred by the complexity of their analysis. In this work, we address these challenges on the limited, but important, linear regression problem by introducing and analyzing a second-order proximal/SGD method based on Kalman filtering (kSGD). Through our analysis, we show that kSGD is asymptotically optimal, develop a fast algorithm for very large, infinite, or streaming data sources with a justified stop condition, prove that kSGD is insensitive to the problem's conditioning, and develop a unique approach for analyzing the complex second-order dynamics. Our theoretical results are supported by numerical experiments on three regression problems (linear, nonparametric wavelet, and logistic) using three large publicly available datasets. Moreover, our analysis and experiments lay a foundation for embedding kSGD in multiple-epoch algorithms, extending kSGD to other problem classes, and developing parallel and low-memory kSGD implementations.
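To make the idea concrete, the following is a minimal sketch of a Kalman-filter-style update for linear regression, in the spirit of kSGD as described in the abstract. It is not the paper's exact algorithm: the function name, the noise parameter `sigma2`, and the trace-based stop rule are illustrative assumptions. Each observation updates both the parameter estimate and a posterior covariance matrix, and the shrinking covariance is what supplies a justified stop condition.

```python
import numpy as np

def ksgd_linear_regression(X, y, sigma2=1.0, tol=1e-8):
    """Illustrative Kalman-filter-style pass over (X, y) for linear
    regression. Names and the stop rule are hypothetical sketches,
    not the paper's exact kSGD algorithm."""
    n, p = X.shape
    beta = np.zeros(p)   # parameter estimate
    P = np.eye(p)        # posterior covariance estimate (prior: identity)
    for i in range(n):
        x = X[i]
        # Innovation: residual of the current prediction.
        innovation = y[i] - x @ beta
        # Kalman gain: weighs prior uncertainty against noise level sigma2.
        s = x @ P @ x + sigma2
        K = (P @ x) / s
        beta = beta + K * innovation
        # Covariance shrinks as information accumulates; its trace
        # provides a natural (illustrative) stop condition.
        P = P - np.outer(K, x) @ P
        if np.trace(P) < tol:
            break
    return beta, P
```

Because the gain `K` rescales each update by the current covariance, the step sizes adapt per coordinate, which is the mechanism behind the claimed insensitivity to the problem's conditioning.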

