Select without Fear: Almost All Mini-Batch Schedules Generalize Optimally

We establish matching upper and lower generalization error bounds for mini-batch Gradient Descent (GD) training with either deterministic or stochastic, data-independent, but otherwise arbitrary batch selection rules. We consider smooth Lipschitz losses that are convex, nonconvex, or strongly convex, and show that classical upper bounds for Stochastic GD (SGD) also hold verbatim for such arbitrary nonadaptive batch schedules, including all deterministic ones. Further, for convex and strongly convex losses we prove matching lower bounds directly on the generalization error that hold uniformly over the aforementioned class of batch schedules, showing that all such batch schedules generalize optimally. Lastly, for smooth (non-Lipschitz) nonconvex losses, we show that full-batch (deterministic) GD is essentially optimal among all possible batch schedules within the considered class, including all stochastic ones.
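To make the setting concrete, here is a minimal sketch (illustrative only; the function names, signatures, and schedule constructions below are assumptions for exposition, not the paper's notation) of mini-batch GD whose index sets are fixed before the data are seen. Such a schedule may be deterministic (e.g., cyclic or full-batch) or randomized (e.g., sampled without replacement), but it never depends on the samples or the iterates, which is the "nonadaptive" property the abstract refers to.

```python
import numpy as np

def make_cyclic_schedule(n, batch_size, num_steps):
    """Deterministic schedule: cycle through the n indices in a fixed order."""
    return [(np.arange(batch_size) + t * batch_size) % n for t in range(num_steps)]

def make_sampled_schedule(n, batch_size, num_steps, rng):
    """Stochastic schedule: batches drawn independently of the data."""
    return [rng.choice(n, size=batch_size, replace=False) for _ in range(num_steps)]

def minibatch_gd(grad_fi, w0, schedule, step_sizes):
    """Run mini-batch GD: w_{t+1} = w_t - eta_t * mean_{i in B_t} grad f_i(w_t).

    grad_fi(w, i) returns the gradient of the i-th per-sample loss at w.
    The schedule (a list of index arrays) is fixed in advance, so the
    batch selection is data-independent (nonadaptive).
    """
    w = np.array(w0, dtype=float)
    for t, batch in enumerate(schedule):
        g = np.mean([grad_fi(w, i) for i in batch], axis=0)
        w = w - step_sizes[t] * g
    return w
```

In this sketch, either schedule (or the full-batch one, where every index set equals {1, ..., n}) can be passed to the same training loop; the paper's results say that, under the stated smoothness and convexity assumptions, all such choices enjoy the same generalization guarantees as classical SGD.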
