Nys-Curve: Nyström-Approximated Curvature for Stochastic Optimization

10/16/2021
by Hardik Tankaria et al.

Quasi-Newton methods generally provide curvature information by approximating the Hessian through the secant equation. However, the secant equation yields only a weak approximation of the Newton step because it relies solely on first-order derivatives. In this study, we propose an approximate Newton-step stochastic optimization algorithm, with a linear convergence rate, for large-scale empirical risk minimization of convex functions. Specifically, we compute a partial column Hessian of size d × k, using k ≪ d randomly selected variables, and then use the Nyström method to approximate the full Hessian matrix. To further reduce the computational complexity per iteration, we compute the update step (Δw) directly, without forming or storing the full Hessian or its inverse. Furthermore, to handle large-scale settings in which even a partial Hessian is expensive to compute, we use distribution-preserving (DP) sub-sampling: the data are split into p sub-samples with similar first- and second-order statistics, and a single sub-sample is selected in a round-robin manner at each epoch to compute the partial Hessian. We integrate the approximated Hessian with stochastic gradient descent and stochastic variance-reduced gradients to solve logistic regression problems. The numerical experiments show that the proposed approach yields a better approximation of the Newton step and is competitive with state-of-the-art first-order and stochastic quasi-Newton methods.
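To make the core computation concrete, here is a minimal NumPy sketch of a Nyström-approximated Newton step for l2-regularized logistic regression. It is illustrative only, not the authors' implementation: the function names (logistic_hessian_columns, nystrom_newton_step), the ridge term rho used to keep the approximate Hessian invertible, and the eigenvalue cutoff for the pseudo-inverse are assumptions introduced here. Like the method described in the abstract, it builds only a d × k partial column Hessian and obtains Δw without ever forming the d × d matrix, using the Woodbury identity.

```python
import numpy as np

def logistic_hessian_columns(X, w, idx, lam=1e-4):
    """Partial column Hessian H[:, idx] of the l2-regularized logistic loss.

    H = (1/n) X^T D X + lam * I with D = diag(sigma * (1 - sigma)),
    but only the k columns indexed by `idx` are ever formed.
    """
    n, _ = X.shape
    s = 1.0 / (1.0 + np.exp(-(X @ w)))        # sigmoid(Xw)
    D = s * (1.0 - s)                          # per-sample curvature weights
    C = X.T @ (D[:, None] * X[:, idx]) / n     # (d, k) block of X^T D X / n
    C[idx, np.arange(len(idx))] += lam         # regularizer on the diagonal
    return C

def nystrom_newton_step(C, idx, grad, rho=1e-3):
    """Approximate Newton step dw solving (C W^+ C^T + rho I) dw = -grad.

    C W^+ C^T is the Nystrom approximation of the Hessian built from the
    partial column Hessian C; the Woodbury identity keeps the cost at
    O(d k^2 + k^3) and avoids forming any d x d matrix.
    """
    W = C[idx, :]                              # (k, k) principal sub-block
    vals, vecs = np.linalg.eigh((W + W.T) / 2.0)
    keep = vals > 1e-10                        # pseudo-inverse square root of W
    W_isqrt = (vecs[:, keep] * vals[keep] ** -0.5) @ vecs[:, keep].T
    Z = C @ W_isqrt                            # H ~ Z Z^T with Z of shape (d, k)
    small = Z.T @ Z + rho * np.eye(Z.shape[1])
    # Woodbury: (Z Z^T + rho I)^{-1} g = (g - Z (Z^T Z + rho I)^{-1} Z^T g) / rho
    return -(grad - Z @ np.linalg.solve(small, Z.T @ grad)) / rho

# Toy usage: one approximate Newton step on random data (hypothetical setup).
rng = np.random.default_rng(0)
n, d, k = 500, 100, 10
X = rng.standard_normal((n, d))
y = (rng.random(n) < 0.5).astype(float)
w = np.zeros(d)
grad = X.T @ (1.0 / (1.0 + np.exp(-(X @ w))) - y) / n + 1e-4 * w
idx = rng.choice(d, size=k, replace=False)     # k randomly selected variables
C = logistic_hessian_columns(X, w, idx)
w = w + nystrom_newton_step(C, idx, grad)
```

An SGD- or SVRG-style loop would then repeat this update with a suitable step size, re-drawing the column indices (and, in the DP variant, rotating through the p sub-samples used to form C) once per epoch.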

Related research

Exact and Inexact Subsampled Newton Methods for Optimization (09/27/2016)
The paper studies the solution of stochastic optimization problems in wh...

A Stochastic Quasi-Newton Method for Large-Scale Optimization (01/27/2014)
The question of how to incorporate curvature information in stochastic a...

Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization (09/28/2020)
In this paper, we introduce Apollo, a quasi-Newton method for nonconvex ...

A Random-Feature Based Newton Method for Empirical Risk Minimization in Reproducing Kernel Hilbert Space (02/12/2020)
In supervised learning using kernel methods, we encounter a large-scale ...

mL-BFGS: A Momentum-based L-BFGS for Distributed Large-Scale Neural Network Optimization (07/25/2023)
Quasi-Newton methods still face significant challenges in training large...

Hessian Initialization Strategies for L-BFGS Solving Non-linear Inverse Problems (03/18/2021)
L-BFGS is the state-of-the-art optimization method for many large scale ...

Estimation and Inference by Stochastic Optimization (05/06/2022)
In non-linear estimations, it is common to assess sampling uncertainty b...
