Asynchronous Parallel Stochastic Quasi-Newton Methods

11/02/2020
by   Qianqian Tong, et al.
0

Although first-order stochastic algorithms, such as stochastic gradient descent, have been the main force to scale up machine learning models, such as deep neural nets, the second-order quasi-Newton methods start to draw attention due to their effectiveness in dealing with ill-conditioned optimization problems. The L-BFGS method is one of the most widely used quasi-Newton methods. We propose an asynchronous parallel algorithm for stochastic quasi-Newton (AsySQN) method. Unlike prior attempts, which parallelize only the calculation for gradient or the two-loop recursion of L-BFGS, our algorithm is the first one that truly parallelizes L-BFGS with a convergence guarantee. Adopting the variance reduction technique, a prior stochastic L-BFGS, which has not been designed for parallel computing, reaches a linear convergence rate. We prove that our asynchronous parallel scheme maintains the same linear convergence rate but achieves significant speedup. Empirical evaluations in both simulations and benchmark datasets demonstrate the speedup in comparison with the non-parallel stochastic L-BFGS, as well as the better performance than first-order methods in solving ill-conditioned problems.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/24/2018

Asynchronous Stochastic Proximal Methods for Nonconvex Nonsmooth Optimization

We study stochastic algorithms for solving non-convex optimization probl...
research
04/26/2017

Stochastic Orthant-Wise Limited-Memory Quasi-Newton Methods

The ℓ_1-regularized sparse model has been popular in machine learning so...
research
02/26/2020

Fast Linear Convergence of Randomized BFGS

Since the late 1950's when quasi-Newton methods first appeared, they hav...
research
07/08/2021

Identification and Adaptation with Binary-Valued Observations under Non-Persistent Excitation Condition

Dynamical systems with binary-valued observations are widely used in inf...
research
06/27/2015

Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization

Asynchronous parallel implementations of stochastic gradient (SG) have b...
research
10/21/2020

Progressive Batching for Efficient Non-linear Least Squares

Non-linear least squares solvers are used across a broad range of offlin...
research
11/03/2020

SGB: Stochastic Gradient Bound Method for Optimizing Partition Functions

This paper addresses the problem of optimizing partition functions in a ...

Please sign up or login with your details

Forgot password? Click here to reset