Generalization Error Bounds for Optimization Algorithms via Stability

09/27/2016
by   Qi Meng, et al.

Many machine learning tasks can be formulated as Regularized Empirical Risk Minimization (R-ERM) and solved by optimization algorithms such as gradient descent (GD), stochastic gradient descent (SGD), and stochastic variance reduced gradient (SVRG). Conventional analysis of these optimization algorithms focuses on their convergence rates during training; however, the machine learning community often cares more about the generalization performance of the learned model on unseen test data. In this paper, we investigate this issue using stability as a tool. In particular, we decompose the generalization error for R-ERM and derive its upper bound for both convex and non-convex cases. In the convex case, we prove that the generalization error can be bounded by the convergence rate of the optimization algorithm and the stability of the R-ERM process, both in expectation (of order O(1/n + E[ρ(T)]), where ρ(T) is the convergence error and T is the number of iterations) and in high probability (of order O(log(1/δ)/√n + ρ(T)) with probability 1-δ). For the non-convex case, we obtain a similar expected generalization error bound. Our theorems indicate that 1) as training proceeds, the generalization error decreases for all the optimization algorithms under our investigation; and 2) SVRG has better generalization ability than GD and SGD. We have conducted experiments on both convex and non-convex problems, and the experimental results verify our theoretical findings.
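For context, the bound quoted above typically comes from splitting the excess risk into a stability-controlled generalization gap and the optimization (convergence) error ρ(T). The following is a minimal sketch of that standard decomposition; the notation (R for the expected risk, R_S for the regularized empirical risk on sample S, w_T for the iterate after T steps, w_S^* for the R-ERM minimizer) is assumed here for illustration, not copied from the paper.

```latex
% Hedged sketch of the standard excess-risk decomposition; the paper's
% exact decomposition may differ in detail. The three terms telescope.
\begin{aligned}
R(w_T) - \min_w R(w)
  ={}& \underbrace{R(w_T) - R_S(w_T)}_{\text{generalization gap, controlled by stability}} \\
   +{}& \underbrace{R_S(w_T) - R_S(w_S^*)}_{\text{optimization error } \rho(T)} \\
   +{}& \underbrace{R_S(w_S^*) - \min_w R(w)}_{\text{estimation term, } O(1/n) \text{ in expectation}}
\end{aligned}
```

To make the algorithms named above concrete, here is a small, self-contained Python sketch of SVRG applied to a regularized least-squares R-ERM problem. The function name, hyperparameters, and quadratic objective are illustrative assumptions, not the paper's experimental setup.

```python
# Hypothetical minimal SVRG sketch for R-ERM with a least-squares loss;
# the objective and all hyperparameters are illustrative assumptions.
import numpy as np

def svrg(X, y, lam=0.1, lr=0.01, epochs=20, inner_steps=None, seed=0):
    """Minimize R_S(w) = (1/n) * sum_i 0.5*(x_i @ w - y_i)**2
    + 0.5*lam*||w||^2 using SVRG."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = inner_steps or n
    w = np.zeros(d)

    def grad(w, i):
        # Per-example gradient of the regularized loss.
        return (X[i] @ w - y[i]) * X[i] + lam * w

    for _ in range(epochs):
        # Take a snapshot and compute its full (batch) gradient.
        w_snap = w.copy()
        full_grad = (X.T @ (X @ w_snap - y)) / n + lam * w_snap
        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient step.
            w -= lr * (grad(w, i) - grad(w_snap, i) + full_grad)
    return w

# Tiny usage example on synthetic data.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 5))
    w_true = rng.standard_normal(5)
    y = X @ w_true + 0.1 * rng.standard_normal(200)
    w_hat = svrg(X, y)
    print("train risk:", 0.5 * np.mean((X @ w_hat - y) ** 2))
```

The snapshot full gradient is what distinguishes SVRG from plain SGD: the update grad(w, i) - grad(w_snap, i) + full_grad stays unbiased while its variance shrinks as w approaches w_snap. This yields a smaller convergence error ρ(T) for a given iteration budget, which, by the decomposition sketched above, translates into a tighter generalization error bound, consistent with the abstract's comparison of SVRG against GD and SGD.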


