Making SGD Parameter-Free

by Yair Carmon, et al.
Tel Aviv University
University of Pittsburgh

We develop an algorithm for parameter-free stochastic convex optimization (SCO) whose rate of convergence is only a double-logarithmic factor larger than the optimal rate for the corresponding known-parameter setting. In contrast, the best previously known rates for parameter-free SCO are based on online parameter-free regret bounds, which contain unavoidable excess logarithmic terms compared to their known-parameter counterparts. Our algorithm is conceptually simple, has high-probability guarantees, and is also partially adaptive to unknown gradient norms, smoothness, and strong convexity. At the heart of our results is a novel parameter-free certificate for SGD step size choice, and a time-uniform concentration result that assumes no a priori bounds on SGD iterates.
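The abstract does not spell out the step-size certificate. What follows is therefore only a minimal sketch of the general idea, under stated assumptions: a search over the SGD step size η that needs no knowledge of the distance to the optimum or of the gradient noise level. The acceptance test used here is a hypothetical stand-in, not the paper's certificate or constants. It is η ≤ r̄_η / √(Σ‖g_t‖²), where r̄_η is the largest distance of an iterate from the starting point and g_t are the observed stochastic gradients. The helper names (`run_sgd`, `passes_test`, `bisect_step_size`, `noisy_quadratic_grad`) and the toy 1-D quadratic with Gaussian gradient noise are likewise illustrative. Bisecting on log η rather than on η is one way an overhead that is only doubly logarithmic in the search range can arise. Each round halves log₂(η_hi/η_lo), so reaching a factor-of-2 bracket takes about log₂ log₂(η_hi/η_lo) rounds.

```python
import math
import random


def run_sgd(grad_oracle, x0, eta, steps, seed):
    """Fixed-step SGD on a 1-D problem.

    Returns (average iterate, largest distance of any iterate from x0,
    sum of squared stochastic-gradient norms).
    """
    rng = random.Random(seed)  # same seed => same noise sequence for every eta
    x, iterate_sum = x0, 0.0
    max_dist, sq_grad_sum = 0.0, 0.0
    for _ in range(steps):
        g = grad_oracle(x, rng)
        sq_grad_sum += g * g
        x -= eta * g
        max_dist = max(max_dist, abs(x - x0))
        iterate_sum += x
    return iterate_sum / steps, max_dist, sq_grad_sum


def passes_test(grad_oracle, x0, eta, steps, seed):
    """Hypothetical certificate: accept eta if eta <= r_bar / sqrt(sum of g_t^2)."""
    _, r_bar, sq_grad_sum = run_sgd(grad_oracle, x0, eta, steps, seed)
    if sq_grad_sum == 0.0:
        return True  # all observed gradients were zero: no evidence eta is too large
    return eta <= r_bar / math.sqrt(sq_grad_sum)


def bisect_step_size(grad_oracle, x0, eta_lo, eta_hi, steps, seed=0):
    """Bisect on log(eta) until eta_hi / eta_lo <= 2.

    Assumes eta_lo passes the test and eta_hi fails it. Each round halves
    log2(eta_hi / eta_lo), so roughly log2(log2(eta_hi / eta_lo)) rounds run.
    Returns (lower end of the final bracket, number of rounds).
    """
    rounds = 0
    while eta_hi > 2.0 * eta_lo:
        eta_mid = math.sqrt(eta_lo * eta_hi)  # midpoint in log space
        if passes_test(grad_oracle, x0, eta_mid, steps, seed):
            eta_lo = eta_mid
        else:
            eta_hi = eta_mid
        rounds += 1
    return eta_lo, rounds


def noisy_quadratic_grad(x, rng):
    """Stochastic gradient of f(x) = 0.5 * (x - 3)^2 with N(0, 1) noise."""
    return (x - 3.0) + rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    eta, rounds = bisect_step_size(noisy_quadratic_grad, 0.0, 1e-6, 1.0, steps=1000)
    x_avg, _, _ = run_sgd(noisy_quadratic_grad, 0.0, eta, steps=1000, seed=1)
    print(f"eta={eta:.4g} after {rounds} bisection rounds, averaged iterate={x_avg:.3f}")
```

Reusing one fixed seed for every candidate η keeps the comparisons between step sizes consistent. On this toy problem the distance to the minimizer is D = 3, the noise has unit scale, and T = 1000 steps are run. One would expect the accepted η to be on the order of the hand-tuned choice D/(σ√T) ≈ 0.09, which this search reaches without being told D or σ.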



