Minimizing Nonconvex Population Risk from Rough Empirical Risk

03/25/2018
by Chi Jin, et al.

Population risk (the expectation of the loss over the sampling mechanism) is always of primary interest in machine learning. However, learning algorithms only have access to empirical risk, which is the average loss over training examples. Although the two risks are typically guaranteed to be pointwise close, for applications with nonconvex nonsmooth losses (such as modern deep networks), the effects of sampling can transform a well-behaved population risk into an empirical risk with a landscape that is problematic for optimization. The empirical risk can be nonsmooth, and it may have many additional local minima. This paper considers a general optimization framework which aims to find approximate local minima of a smooth nonconvex function F (population risk) given only access to the function value of another function f (empirical risk), which is pointwise close to F (i.e., ‖F − f‖_∞ < ν). We propose a simple algorithm based on stochastic gradient descent (SGD) on a smoothed version of f which is guaranteed to find an ϵ-second-order stationary point if ν < O(ϵ^{1.5}/d), thus escaping all saddle points of F and all the additional local minima introduced by f. We also provide an almost matching lower bound showing that our SGD-based approach achieves the optimal trade-off between ν and ϵ, as well as the optimal dependence on problem dimension d, among all algorithms making a polynomial number of queries. As a concrete example, we show that our results can be directly used to give sample complexities for learning a ReLU unit, whose empirical risk is nonsmooth.
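The idea of running SGD on a smoothed version of f using only function values can be illustrated with a minimal sketch. It assumes Gaussian smoothing, f_σ(x) = E[f(x + z)] with z ~ N(0, σ²I), whose gradient satisfies the standard identity ∇f_σ(x) = E[z (f(x + z) − f(x))] / σ². The function names, step size, batch size, perturbation radius, and test function below are illustrative assumptions, not the paper's exact algorithm or parameter choices.

```python
import numpy as np


def smoothed_grad_estimate(f, x, sigma, batch, rng):
    """Monte Carlo estimate of the gradient of the Gaussian-smoothed f at x.

    Uses only function values: mean of z * (f(x + z) - f(x)) / sigma^2,
    with z drawn from N(0, sigma^2 I).
    """
    z = rng.normal(0.0, sigma, size=(batch, x.size))
    fx = f(x)
    diffs = np.array([f(x + zi) - fx for zi in z])
    return (z * diffs[:, None]).mean(axis=0) / sigma**2


def zeroth_order_sgd(f, x0, sigma=0.5, eta=0.1, steps=300, batch=50,
                     radius=0.0, seed=0):
    """SGD on the smoothed function, with an optional random perturbation.

    `radius` (an assumed knob) adds a point drawn uniformly from a ball of
    that radius to each gradient estimate; radius=0 disables it.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        g = smoothed_grad_estimate(f, x, sigma, batch, rng)
        if radius > 0.0:
            u = rng.normal(size=x.size)
            xi = radius * rng.uniform() ** (1.0 / x.size) * u / np.linalg.norm(u)
        else:
            xi = np.zeros_like(x)
        x = x - eta * (g + xi)
    return x


def rough_quadratic(x, nu=0.01):
    """Hypothetical empirical risk: smooth F(x) = |x|^2 / 2 plus a
    high-frequency ripple, so |f - F| <= nu * d everywhere."""
    return 0.5 * float(x @ x) + nu * float(np.sin(50.0 * x).sum())


x_final = zeroth_order_sgd(rough_quadratic, [2.0, -2.0])
print("distance to the minimizer of F:", np.linalg.norm(x_final))
```

In this toy setting the ripple creates many spurious local minima in f, but averaging over the Gaussian neighborhood washes them out, so the iterates track the smooth quadratic F instead. The smoothing radius σ trades off how much roughness is averaged away against how far the smoothed function drifts from F.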

research
02/18/2017

A Hitting Time Analysis of Stochastic Gradient Langevin Dynamics

We study the Stochastic Gradient Langevin Dynamics (SGLD) algorithm for ...
research
08/29/2017

Natasha 2: Faster Non-Convex Optimization Than SGD

We design a stochastic algorithm to train any smooth neural network to ε...
research
10/17/2018

Uniform Graphical Convergence of Subgradients in Nonconvex Optimization and Learning

We investigate the stochastic optimization problem of minimizing populat...
research
10/06/2019

Statistical Analysis of Stationary Solutions of Coupled Nonconvex Nonsmooth Empirical Risk Minimization

This paper has two main goals: (a) establish several statistical propert...
research
05/28/2018

Understanding Generalization and Optimization Performance of Deep CNNs

This work aims to provide understandings on the remarkable success of de...
research
03/03/2023

Learning High-Dimensional Single-Neuron ReLU Networks with Finite Samples

This paper considers the problem of learning a single ReLU neuron with s...
research
02/20/2023

Private (Stochastic) Non-Convex Optimization Revisited: Second-Order Stationary Points and Excess Risks

We consider the problem of minimizing a non-convex objective while prese...
