Beneath the valley of the noncommutative arithmetic-geometric mean inequality: conjectures, case-studies, and consequences

02/19/2012
by Benjamin Recht, et al.

Randomized algorithms that base iteration-level decisions on samples from some pool are ubiquitous in machine learning and optimization. Examples include stochastic gradient descent and randomized coordinate descent. This paper makes progress toward theoretically evaluating the difference in performance between sampling with and without replacement in such algorithms. Focusing on least-mean-squares optimization, we formulate a noncommutative arithmetic-geometric mean inequality that would prove that the expected convergence rate of without-replacement sampling is faster than that of with-replacement sampling. We demonstrate that this inequality holds for many classes of random matrices, as well as for some pathological examples. We provide a deterministic worst-case bound on the discrepancy between the two sampling models, and explore some of the impediments to proving the inequality in full generality. Finally, we detail the consequences of this inequality for stochastic gradient descent and for the randomized Kaczmarz algorithm for solving linear systems.
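
For concreteness, the conjectured inequality can be sketched as follows. This is a plausible rendering of the standard statement, not a verbatim quote of the paper: for positive semidefinite matrices A_1, ..., A_n, an integer 1 <= m <= n, and the operator norm, one compares the expected product of m matrices sampled with replacement against the expectation under sampling without replacement:

\[
\left\| \frac{1}{n^m} \sum_{j_1,\dots,j_m = 1}^{n} A_{j_m} \cdots A_{j_1} \right\|
\;\ge\;
\left\| \frac{(n-m)!}{n!} \sum_{\substack{j_1,\dots,j_m \\ \text{pairwise distinct}}} A_{j_m} \cdots A_{j_1} \right\|
\]

The left-hand side is the norm of the expected product when the m indices are drawn i.i.d. (with replacement); the right-hand side averages over all ordered selections of m distinct indices (without replacement).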

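As a companion experiment, here is a minimal Python sketch comparing the two sampling models for the randomized Kaczmarz method on a consistent linear system. It is not from the paper: the Gaussian test matrix, the epoch and trial counts, and the helper function kaczmarz are illustrative assumptions. On runs like this, the without-replacement (random-reshuffling) variant typically ends each epoch with smaller error, which is the gap the conjectured inequality would formalize.

import numpy as np

def kaczmarz(A, b, x0, idx):
    """Apply Kaczmarz row projections over the given index sequence."""
    x = x0.copy()
    for i in idx:
        a = A[i]
        # Project the iterate onto the hyperplane {x : <a, x> = b_i}.
        x += (b[i] - a @ x) / (a @ a) * a
    return x

rng = np.random.default_rng(0)
n_rows, n_cols, n_epochs, n_trials = 50, 10, 20, 200
A = rng.standard_normal((n_rows, n_cols))
x_star = rng.standard_normal(n_cols)
b = A @ x_star  # consistent system, so Kaczmarz converges to x_star

err_with, err_without = 0.0, 0.0
for _ in range(n_trials):
    x_w = np.zeros(n_cols)   # with-replacement iterate
    x_wo = np.zeros(n_cols)  # without-replacement iterate
    for _ in range(n_epochs):
        # With replacement: i.i.d. row indices each epoch.
        x_w = kaczmarz(A, b, x_w, rng.integers(n_rows, size=n_rows))
        # Without replacement: a fresh random permutation each epoch.
        x_wo = kaczmarz(A, b, x_wo, rng.permutation(n_rows))
    err_with += np.linalg.norm(x_w - x_star) / n_trials
    err_without += np.linalg.norm(x_wo - x_star) / n_trials

print(f"mean error, with replacement:    {err_with:.3e}")
print(f"mean error, without replacement: {err_without:.3e}")
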
