Generalization Bounds for Stochastic Gradient Descent via Localized ε-Covers

by Sejun Park et al.

In this paper, we propose a new covering technique localized to the trajectories of SGD. This localization yields an algorithm-specific complexity measure, given by the covering number, whose cardinality can be dimension-independent, in contrast to standard uniform covering arguments that incur an exponential dependence on dimension. Based on this localized construction, we show that if the objective function is a finite perturbation of a piecewise strongly convex and smooth function with P pieces, i.e., non-convex and non-smooth in general, the generalization error can be upper bounded by O(√(log n · log(nP) / n)), where n is the number of data samples. In particular, this rate is dimension-independent and requires neither early stopping nor a decaying step size. Finally, we apply these results in various contexts and derive generalization bounds for multi-index linear models, multi-class support vector machines, and K-means clustering under both hard- and soft-label setups, improving on the known state-of-the-art rates.
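One concrete objective covered by the bound is the K-means loss, whose per-sample loss min_k (w_k − x)² is piecewise strongly convex with P = K pieces. The sketch below (not from the paper; the data distribution, step size, and initialization are illustrative assumptions) runs constant-step-size SGD on a 1-D K-means objective and evaluates the bound's rate, up to constants:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n samples from a two-component Gaussian mixture in 1-D.
n = 1000
x = np.concatenate([rng.normal(-2.0, 0.5, n // 2), rng.normal(2.0, 0.5, n // 2)])

# K-means loss per sample: f(w, x_i) = min_k (w_k - x_i)^2,
# a piecewise strongly convex objective with P = K pieces.
K = 2
w = np.array([-0.5, 0.5])  # illustrative initialization of the cluster centers

eta = 0.05  # constant step size: the bound needs no decaying schedule
for t in range(5 * n):
    xi = x[rng.integers(n)]
    k = np.argmin((w - xi) ** 2)     # active piece for this sample
    w[k] -= eta * 2.0 * (w[k] - xi)  # SGD step on the active quadratic

print(np.sort(w))  # centers should approach the component means [-2, 2]

# The paper's generalization rate, up to constants: sqrt(log n * log(nP) / n).
rate = np.sqrt(np.log(n) * np.log(n * K) / n)
print(rate)
```

Note that the dimension of w never enters `rate`; only n and the number of pieces P = K do.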

Related papers:

- Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes
- The Convergence Rate of SGD's Final Iterate: Analysis on Dimension Dependence
- Uniform Convergence of Gradients for Non-Convex Learning and Optimization
- Fine-grained Generalization Analysis of Vector-valued Learning
- Towards Optimal Problem Dependent Generalization Error Bounds in Statistical Learning Theory
- Error Bounds for Piecewise Smooth and Switching Regression