Uniform Convergence of Gradients for Non-Convex Learning and Optimization

10/25/2018
by Dylan J. Foster, et al.

We investigate (1) the rate at which refined properties of the empirical risk, in particular its gradients, converge to their population counterparts in standard non-convex learning tasks, and (2) the consequences of this convergence for optimization. Our analysis follows the tradition of norm-based capacity control. We propose vector-valued Rademacher complexities as a simple, composable, and user-friendly tool for deriving dimension-free uniform convergence bounds on gradients in non-convex learning problems. As an application of our techniques, we give a new analysis of batch gradient descent methods for non-convex generalized linear models and non-convex robust regression, showing how any algorithm that finds approximate stationary points can be used to obtain optimal sample complexity, even when the dimension is high or infinite and multiple passes over the dataset are allowed. Moving to non-smooth models, we show that, in contrast to the smooth case, even for a single ReLU it is not possible to obtain dimension-independent convergence rates for gradients in the worst case. On the positive side, dimension-independent rates are still achievable under a new type of distributional assumption.
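
For context, a vector-valued (normed) Rademacher complexity of the kind the abstract refers to is usually formalized as follows; the notation below (parameter set \mathcal{W}, loss \ell, empirical risk \hat{L}, population risk L) is a sketch of the standard setup, not a quotation from the paper.

\[
  \mathfrak{R}_n(\mathcal{F})
    \;=\; \mathbb{E}_{\epsilon}\, \sup_{f \in \mathcal{F}}
      \Bigl\| \frac{1}{n} \sum_{i=1}^{n} \epsilon_i\, f(z_i) \Bigr\|,
  \qquad \epsilon_1, \dots, \epsilon_n \ \text{i.i.d. Rademacher},
\]

where \mathcal{F} is a class of vector-valued maps and z_1, \dots, z_n are i.i.d. samples. Applying the standard symmetrization argument to the class of gradient maps then yields the generic bound

\[
  \mathbb{E}\, \sup_{w \in \mathcal{W}}
    \bigl\| \nabla \hat{L}(w) - \nabla L(w) \bigr\|
  \;\le\; 2\, \mathfrak{R}_n\bigl( \{\, z \mapsto \nabla_w \ell(w; z) : w \in \mathcal{W} \,\} \bigr),
\]

so a dimension-free bound on the right-hand side gives dimension-free uniform convergence of gradients.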
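
On the optimization side, the following minimal Python/NumPy sketch illustrates the kind of procedure such a guarantee applies to: batch gradient descent run until it reaches an eps-approximate stationary point of the empirical risk of a non-convex (sigmoidal) generalized linear model. The specific loss, step size, stopping tolerance, and synthetic data are illustrative assumptions, not details taken from the paper.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(w, X, y):
    # Non-convex GLM objective: L_hat(w) = mean_i (sigmoid(<w, x_i>) - y_i)^2.
    p = sigmoid(X @ w)
    r = p - y
    loss = np.mean(r ** 2)
    # Chain rule: grad = (2/n) * sum_i r_i * sigmoid'(<w, x_i>) * x_i,
    # using sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)).
    grad = (2.0 / len(y)) * (X.T @ (r * p * (1.0 - p)))
    return loss, grad

def approx_stationary_point(X, y, step=0.5, eps=1e-3, max_iter=100_000):
    # Batch gradient descent, stopping once ||grad L_hat(w)|| <= eps,
    # i.e., at an eps-approximate stationary point of the empirical risk.
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        _, g = loss_and_grad(w, X, y)
        if np.linalg.norm(g) <= eps:
            break
        w = w - step * g
    return w, np.linalg.norm(g)

# Synthetic usage (all numbers are placeholders):
rng = np.random.default_rng(0)
n, d = 500, 20
X = rng.normal(size=(n, d)) / np.sqrt(d)
w_star = rng.normal(size=d)
y = sigmoid(X @ w_star) + 0.05 * rng.normal(size=n)
w_hat, grad_norm = approx_stationary_point(X, y)
print(f"||grad L_hat(w_hat)|| = {grad_norm:.2e}")

Under uniform convergence of gradients, any w with a small empirical gradient norm also has a small population gradient norm, which is what lets any stationary-point finder, not just this one, inherit the sample-complexity guarantee independently of the dimension.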

Related research

06/02/2023  Convex and Non-Convex Optimization under Generalized Smoothness
02/29/2020  Dimension-free convergence rates for gradient Langevin dynamics in RKHS
09/14/2020  Effective Proximal Methods for Non-convex Non-smooth Regularized Learning
02/04/2021  Concentration of Non-Isotropic Random Tensors with Applications to Learning and Empirical Risk Minimization
09/19/2022  Generalization Bounds for Stochastic Gradient Descent via Localized ε-Covers
04/30/2021  Convergence Analysis of a Local Stationarity Scheme for Rate-Independent Systems and Application to Damage
05/29/2019  Vector-Valued Graph Trend Filtering with Non-Convex Penalties
