Stability of SGD: Tightness Analysis and Improved Bounds

02/10/2021
by Yikai Zhang, et al.

Stochastic Gradient Descent (SGD) based methods have been widely used for training large-scale machine learning models that also generalize well in practice. Several explanations have been offered for this generalization performance, a prominent one being algorithmic stability [18]. However, there are no known examples of smooth loss functions for which the analysis can be shown to be tight. Furthermore, apart from the properties of the loss function, the data distribution has also been shown to be an important factor in generalization performance. This raises the question: is the stability analysis of [18] tight for smooth functions, and if not, for what kinds of loss functions and data distributions can the stability analysis be improved? In this paper we first settle open questions regarding the tightness of bounds in the data-independent setting: we show that for general datasets, the existing analysis for convex and strongly convex loss functions is tight, but it can be improved for non-convex loss functions. Next, we give novel and improved data-dependent bounds: we show stability upper bounds for a large class of convex regularized loss functions with negligible regularization parameters, and we improve existing data-dependent bounds in the non-convex setting. We hope that our results will initiate further efforts to better understand the data-dependent setting under non-convex loss functions, leading to an improved understanding of the generalization abilities of deep networks.
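To make the notion of algorithmic stability referred to above concrete, the sketch below (not taken from the paper) trains SGD on two datasets S and S' that differ in a single example and compares the resulting models on a fresh point. The model (logistic regression), the data, and all hyperparameters are illustrative assumptions, not the paper's construction; the loss gap it prints is an empirical proxy for the uniform-stability quantity that the cited analysis bounds.

```python
# Minimal sketch of empirical SGD stability: compare models trained on
# neighbouring datasets S and S' (differing in one example).
import numpy as np

rng = np.random.default_rng(0)

def logistic_loss(w, x, y):
    # y in {-1, +1}; the logistic loss is convex and smooth.
    return np.log1p(np.exp(-y * (x @ w)))

def sgd(X, y, lr=0.1, epochs=5, seed=1):
    # Plain SGD; the sampling order is generated from a fixed seed so
    # both runs visit indices in the same order, and the only
    # difference between runs is the perturbed example.
    n, d = X.shape
    w = np.zeros(d)
    order_rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in order_rng.permutation(n):
            margin = y[i] * (X[i] @ w)
            grad = -y[i] * X[i] / (1.0 + np.exp(margin))
            w -= lr * grad
    return w

# Dataset S and its neighbour S', which differ in exactly one example.
n, d = 200, 10
X = rng.normal(size=(n, d))
y = np.sign(X @ rng.normal(size=d) + 0.1 * rng.normal(size=n))
X_prime, y_prime = X.copy(), y.copy()
X_prime[0] = rng.normal(size=d)          # replace a single example
y_prime[0] = rng.choice([-1.0, 1.0])

w_S = sgd(X, y)
w_Sprime = sgd(X_prime, y_prime)

# Empirical stability proxy: loss difference on a fresh test point.
x_test = rng.normal(size=d)
y_test = 1.0
gap = abs(logistic_loss(w_S, x_test, y_test)
          - logistic_loss(w_Sprime, x_test, y_test))
print(f"parameter distance: {np.linalg.norm(w_S - w_Sprime):.4f}")
print(f"loss gap on a test point: {gap:.4f}")
```

Averaging this loss gap over many test points and many choices of the replaced example gives an empirical estimate of the stability quantity; the paper's results concern worst-case (data-independent) and distribution-aware (data-dependent) upper bounds on it for convex, strongly convex, and non-convex smooth losses.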
