PROMISE: Preconditioned Stochastic Optimization Methods by Incorporating Scalable Curvature Estimates

09/05/2023
by   Zachary Frangella, et al.

This paper introduces PROMISE (Preconditioned Stochastic Optimization Methods by Incorporating Scalable Curvature Estimates), a suite of sketching-based preconditioned stochastic gradient algorithms for solving large-scale convex optimization problems arising in machine learning. PROMISE includes preconditioned versions of SVRG, SAGA, and Katyusha; each algorithm comes with a strong theoretical analysis and effective default hyperparameter values. In contrast, traditional stochastic gradient methods require careful hyperparameter tuning to succeed and degrade in the presence of ill-conditioning, a ubiquitous phenomenon in machine learning. Empirically, we verify the superiority of the proposed algorithms by showing that, using default hyperparameter values, they outperform or match popular tuned stochastic gradient optimizers on a test bed of 51 ridge and logistic regression problems assembled from benchmark machine learning repositories. On the theoretical side, this paper introduces the notion of quadratic regularity in order to establish linear convergence of all proposed methods even when the preconditioner is updated infrequently. The speed of linear convergence is determined by the quadratic regularity ratio, which often provides a tighter bound on the convergence rate than the condition number, both in theory and in practice, and explains the fast global linear convergence of the proposed methods.
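To make the preconditioning idea concrete, the following is a minimal, hypothetical sketch of a preconditioned SVRG loop for ridge regression that uses a randomized Nyström approximation of the Hessian as its preconditioner. It is not the authors' implementation: the helper names, rank, step size, and the choice to compute a single fixed preconditioner are illustrative assumptions, intended only to show how a scalable curvature estimate can be combined with a variance-reduced gradient method.

```python
# Illustrative sketch only (not the PROMISE library): SVRG for ridge regression
# preconditioned by a randomized Nystrom approximation of the Hessian.
import numpy as np

def nystrom_preconditioner(A, reg, rank, rng):
    """Build an approximate inverse of H = A^T A / n + reg * I from a rank-`rank`
    randomized Nystrom approximation of the data term A^T A / n."""
    n, d = A.shape
    Omega = rng.standard_normal((d, rank))
    Omega, _ = np.linalg.qr(Omega)              # orthonormal test matrix
    Y = (A.T @ (A @ Omega)) / n                 # (A^T A / n) @ Omega
    nu = 1e-10 * np.linalg.norm(Y)              # small shift for stability
    Y = Y + nu * Omega
    C = np.linalg.cholesky(Omega.T @ Y)
    B = np.linalg.solve(C, Y.T).T               # B = Y C^{-T}, shape (d, rank)
    U, S, _ = np.linalg.svd(B, full_matrices=False)
    lam = np.maximum(S**2 - nu, 0.0)            # approximate eigenvalues

    def apply_inv(v):
        # Invert U diag(lam) U^T + reg * I by splitting v into range(U) and its complement.
        Utv = U.T @ v
        return U @ (Utv / (lam + reg)) + (v - U @ Utv) / reg

    return apply_inv

def preconditioned_svrg(A, b, reg=1e-3, rank=10, lr=0.5, epochs=20, seed=0):
    """SVRG with a fixed Nystrom preconditioner; all defaults are illustrative."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    P_inv = nystrom_preconditioner(A, reg, rank, rng)

    def full_grad(w):
        return A.T @ (A @ w - b) / n + reg * w

    for _ in range(epochs):
        w_ref = w.copy()
        g_ref = full_grad(w_ref)
        for _ in range(n):
            i = rng.integers(n)
            a_i = A[i]
            # variance-reduced stochastic gradient
            g_i = a_i * (a_i @ w - b[i]) + reg * w
            g_i_ref = a_i * (a_i @ w_ref - b[i]) + reg * w_ref
            v = g_i - g_i_ref + g_ref
            w -= lr * P_inv(v)                  # preconditioned step
    return w
```

In this sketch the preconditioner is computed once; the paper's analysis additionally covers the practically important case where the curvature estimate is refreshed only infrequently during the run.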


