Generalization Error of First-Order Methods for Statistical Learning with Generic Oracles

07/10/2023
by Kevin Scaman, et al.

In this paper, we provide a novel framework for analyzing the generalization error of first-order optimization algorithms for statistical learning when the gradient can only be accessed through partial observations given by an oracle. Our analysis relies on the regularity of the gradient with respect to the data samples, and allows us to derive near-matching upper and lower bounds on the generalization error of multiple learning problems, including supervised learning, transfer learning, robust learning, distributed learning, and communication-efficient learning using gradient quantization. These results hold for smooth and strongly convex optimization problems, as well as smooth non-convex optimization problems satisfying a Polyak-Łojasiewicz assumption. In particular, our upper and lower bounds depend on a novel quantity that extends the notion of conditional standard deviation and measures the extent to which the gradient can be approximated through access to the oracle. As a consequence, our analysis gives a precise meaning to the intuition that optimizing the statistical learning objective is as hard as estimating its gradient. Finally, we show that, in the case of standard supervised learning, mini-batch gradient descent with increasing batch sizes and a warm start can reach a generalization error that is optimal up to a multiplicative factor, thus motivating the use of this optimization scheme in practical applications.
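The optimization scheme highlighted in the last sentence can be sketched in a few lines. The snippet below is a minimal illustration, assuming a stochastic gradient oracle interface; the `grad_oracle` signature, the geometric growth factor, the step size, and the toy objective are all illustrative assumptions, not the exact algorithm or constants analyzed in the paper.

```python
import numpy as np

def minibatch_gd_increasing_batches(grad_oracle, w0, n_rounds=20,
                                    base_batch=8, growth=2.0, lr=0.1):
    """Sketch of mini-batch gradient descent with geometrically increasing
    batch sizes, started from a warm-start point w0.

    grad_oracle(w, batch_size) is assumed to return an averaged stochastic
    gradient estimate at w computed from `batch_size` fresh samples.
    All hyperparameters here are illustrative choices.
    """
    w = np.array(w0, dtype=float)
    batch_size = float(base_batch)
    for _ in range(n_rounds):
        g = grad_oracle(w, int(batch_size))
        w -= lr * g                  # plain gradient step
        batch_size *= growth         # grow the batch to shrink gradient noise
    return w


# Toy usage: noisy gradients of 0.5 * ||w - w_star||^2.
rng = np.random.default_rng(0)
w_star = np.array([1.0, -2.0])

def grad_oracle(w, batch_size):
    # Averaging over a larger batch reduces the variance of the estimate.
    noise = rng.normal(scale=1.0, size=(batch_size, w.size)).mean(axis=0)
    return (w - w_star) + noise

w_hat = minibatch_gd_increasing_batches(grad_oracle, w0=np.zeros(2))
print(w_hat)  # approaches w_star as batches grow
```

Growing the batch size over iterations trades extra samples per step for lower gradient noise late in training, which is the mechanism behind the multiplicative-factor optimality result stated above.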

