Generalization Error of First-Order Methods for Statistical Learning with Generic Oracles

by   Kevin Scaman, et al.

In this paper, we provide a novel framework for the analysis of generalization error of first-order optimization algorithms for statistical learning when the gradient can only be accessed through partial observations given by an oracle. Our analysis relies on the regularity of the gradient w.r.t. the data samples, and allows to derive near matching upper and lower bounds for the generalization error of multiple learning problems, including supervised learning, transfer learning, robust learning, distributed learning and communication efficient learning using gradient quantization. These results hold for smooth and strongly-convex optimization problems, as well as smooth non-convex optimization problems verifying a Polyak-Lojasiewicz assumption. In particular, our upper and lower bounds depend on a novel quantity that extends the notion of conditional standard deviation, and is a measure of the extent to which the gradient can be approximated by having access to the oracle. As a consequence, our analysis provides a precise meaning to the intuition that optimization of the statistical learning objective is as hard as the estimation of its gradient. Finally, we show that, in the case of standard supervised learning, mini-batch gradient descent with increasing batch sizes and a warm start can reach a generalization error that is optimal up to a multiplicative factor, thus motivating the use of this optimization scheme in practical applications.


page 1

page 2

page 3

page 4


Select without Fear: Almost All Mini-Batch Schedules Generalize Optimally

We establish matching upper and lower generalization error bounds for mi...

Lower Generalization Bounds for GD and SGD in Smooth Stochastic Convex Optimization

Recent progress was made in characterizing the generalization error of g...

Distributed Non-Convex First-Order Optimization and Information Processing: Lower Complexity Bounds and Rate Optimal Algorithms

We consider a class of distributed non-convex optimization problems ofte...

Making Progress Based on False Discoveries

We consider the question of adaptive data analysis within the framework ...

Hogwild! over Distributed Local Data Sets with Linearly Increasing Mini-Batch Sizes

Hogwild! implements asynchronous Stochastic Gradient Descent (SGD) where...

Tight Complexity Bounds for Optimizing Composite Objectives

We provide tight upper and lower bounds on the complexity of minimizing ...

Generalization in Supervised Learning Through Riemannian Contraction

We prove that Riemannian contraction in a supervised learning setting im...

Please sign up or login with your details

Forgot password? Click here to reset