Distribution-Dependent Analysis of Gibbs-ERM Principle

02/05/2019
by   Ilja Kuzborskij, et al.

Gibbs-ERM learning is a natural idealized model of learning with stochastic optimization algorithms (such as Stochastic Gradient Langevin Dynamics and, to some extent, Stochastic Gradient Descent); it also arises in other contexts, including PAC-Bayesian theory and sampling mechanisms. In this work we study the excess risk suffered by a Gibbs-ERM learner that uses a non-convex, regularized empirical risk, with the goal of understanding the interplay between the data-generating distribution and learning in large hypothesis spaces. Our main results are distribution-dependent upper bounds on several notions of excess risk. We show that, in all cases, the distribution-dependent excess risk is essentially controlled by the effective dimension tr(Ĥ(Ĥ + λI)^(-1)) of the problem, where Ĥ is the Hessian matrix of the risk at a local minimum. This is a well-established notion of effective dimension appearing in several previous works, including analyses of SGD and ridge regression, but ours is the first work to bring this dimension to the analysis of learning with Gibbs densities. The distribution-dependent view we advocate here improves upon earlier results of Raginsky et al. (2017) and can yield much tighter bounds, depending on the interplay between the data-generating distribution and the loss function. The first part of our analysis focuses on the localized excess risk in the vicinity of a fixed local minimizer. This result is then extended to bounds on the global excess risk by characterizing the probabilities of local minima (and of their complement) under Gibbs densities, a result which might be of independent interest.
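The effective dimension named in the abstract has a simple closed form in terms of the Hessian's eigenvalues: with spectrum μ_1, ..., μ_d, tr(Ĥ(Ĥ + λI)^(-1)) = Σ_i μ_i / (μ_i + λ), so directions with curvature well below λ contribute almost nothing. The sketch below (not from the paper; the decaying toy spectrum and variable names are illustrative assumptions) computes this quantity with NumPy.

```python
import numpy as np

def effective_dimension(H_hat: np.ndarray, lam: float) -> float:
    """tr(H (H + lam * I)^{-1}) for a symmetric PSD Hessian H and lam > 0."""
    eigvals = np.linalg.eigvalsh(H_hat)            # eigenvalues of the symmetric Hessian
    return float(np.sum(eigvals / (eigvals + lam)))

# Hypothetical example: a Hessian whose spectrum decays like 1/i^2.
d = 100
spectrum = 1.0 / (np.arange(1, d + 1) ** 2)
H_hat = np.diag(spectrum)                          # stand-in for the risk Hessian at a local minimum
print(effective_dimension(H_hat, lam=0.1))         # far smaller than the ambient dimension d = 100
```

Under such a fast-decaying spectrum the effective dimension is much smaller than d, which is the sense in which the distribution-dependent bounds can be much tighter than worst-case, dimension-dependent ones.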


