Stochastic Orthant-Wise Limited-Memory Quasi-Newton Methods

04/26/2017
by   Jianqiao Wangni, et al.

The ℓ_1-regularized sparse model has been popular in the machine learning community. The orthant-wise limited-memory quasi-Newton (OWL-QN) method is a representative fast algorithm for training such models. However, multiple sources have pointed out that its published convergence proof is incorrect, and to date its convergence has not been established. In this paper, we propose a stochastic OWL-QN method for solving ℓ_1-regularized problems with both convex and non-convex loss functions, addressing technical difficulties that have stood for many years. We propose three alignment steps, generalized from the original OWL-QN algorithm, that encourage the parameter update to be orthant-wise. We also adopt several practical features from recent stochastic variants of L-BFGS, together with variance reduction for subsampled gradients. To the best of our knowledge, this is the first orthant-wise algorithm with a theoretical convergence rate comparable to that of stochastic first-order algorithms. We prove a linear convergence rate for our algorithm under strong convexity, and experimentally demonstrate that it achieves state-of-the-art performance on ℓ_1-regularized logistic regression and convolutional neural networks.
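To make the "orthant-wise" idea concrete, here is a minimal sketch of the two ingredients the abstract refers to: the pseudo-gradient of f(w) = loss(w) + λ‖w‖_1, and an alignment (projection) step that zeroes out coordinates whose update crosses the orthant boundary. This is a hedged illustration of the classic OWL-QN machinery, not the paper's stochastic algorithm; all function names are our own.

```python
import numpy as np

def pseudo_gradient(w, grad_loss, lam):
    """Pseudo-gradient of f(w) = loss(w) + lam * ||w||_1, as in OWL-QN.

    For nonzero coordinates the l1 term is differentiable; at zero we
    take the one-sided derivative of smaller magnitude (0 if the
    subdifferential contains 0).
    """
    pg = np.zeros_like(w)
    for i in range(len(w)):
        if w[i] > 0:
            pg[i] = grad_loss[i] + lam
        elif w[i] < 0:
            pg[i] = grad_loss[i] - lam
        else:
            right = grad_loss[i] + lam   # right-sided derivative at 0
            left = grad_loss[i] - lam    # left-sided derivative at 0
            if right < 0:
                pg[i] = right            # descent possible going positive
            elif left > 0:
                pg[i] = left             # descent possible going negative
            else:
                pg[i] = 0.0              # 0 is a stationary coordinate
    return pg

def orthant_project(w_new, orthant):
    """Alignment step: clip coordinates that left the chosen orthant to 0."""
    return np.where(np.sign(w_new) == orthant, w_new, 0.0)
```

A typical iteration picks the orthant from the current iterate (falling back to the negative pseudo-gradient sign at zero coordinates), takes a quasi-Newton or gradient step, then projects:

```python
w = np.array([1.0, -1.0, 0.0, 0.0])
g = np.array([0.5, 0.5, 2.0, -0.3])      # gradient of the smooth loss
pg = pseudo_gradient(w, g, lam=1.0)
orthant = np.where(w != 0, np.sign(w), -np.sign(pg))
w_next = orthant_project(w - 0.1 * pg, orthant)
```

The projection is what keeps iterates sparse: a coordinate that would change sign within a step is set exactly to zero rather than passed through.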


