Generalization Error Bounds with Probabilistic Guarantee for SGD in Nonconvex Optimization

02/19/2018
by   Yi Zhou, et al.
0

The success of deep learning has led to a rising interest in the generalization property of the stochastic gradient descent (SGD) method, and stability is one popular approach to study it. Existing works based on stability have studied nonconvex loss functions, but only considered the generalization error of the SGD in expectation. In this paper, we establish various generalization error bounds with probabilistic guarantee for the SGD. Specifically, for both general nonconvex loss functions and gradient dominant loss functions, we characterize the on-average stability of the iterates generated by SGD in terms of the on-average variance of the stochastic gradients. Such characterization leads to improved bounds for the generalization error for SGD. We then study the regularized risk minimization problem with strongly convex regularizers, and obtain improved generalization error bounds for proximal SGD. With strongly convex regularizers, we further establish the generalization error bounds for nonconvex loss functions under proximal SGD with high-probability guarantee, i.e., exponential concentration in probability.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/15/2020

Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent

Recently there are a considerable amount of work devoted to the study of...
research
10/23/2017

Stability and Generalization of Learning Algorithms that Converge to Global Optima

We establish novel generalization bounds for learning algorithms that co...
research
09/18/2020

Hybrid Stochastic-Deterministic Minibatch Proximal Gradient: Less-Than-Single-Pass Optimization with Nearly Optimal Generalization

Stochastic variance-reduced gradient (SVRG) algorithms have been shown t...
research
02/20/2023

On the Stability and Generalization of Triplet Learning

Triplet learning, i.e. learning from triplet data, has attracted much at...
research
06/09/2021

From inexact optimization to learning via gradient concentration

Optimization was recently shown to control the inductive bias in a learn...
research
04/26/2022

Beyond Lipschitz: Sharp Generalization and Excess Risk Bounds for Full-Batch GD

We provide sharp path-dependent generalization and excess error guarante...
research
02/20/2023

Stability-based Generalization Analysis for Mixtures of Pointwise and Pairwise Learning

Recently, some mixture algorithms of pointwise and pairwise learning (PP...

Please sign up or login with your details

Forgot password? Click here to reset