Local Optimality and Generalization Guarantees for the Langevin Algorithm via Empirical Metastability

02/18/2018
by Belinda Tzen, et al.

We study the detailed path-wise behavior of the discrete-time Langevin algorithm for non-convex Empirical Risk Minimization (ERM) through the lens of metastability, adapting techniques from Berglund and Gentz. For a particular local optimum of the empirical risk, and for an arbitrary initialization, we show that with high probability one of two mutually exclusive events occurs: either the Langevin trajectory ends up outside the ε-neighborhood of this optimum within a short recurrence time, or it enters this ε-neighborhood by the recurrence time and stays there until an exponentially long escape time. We call this phenomenon empirical metastability. This two-timescale characterization aligns with the existing literature in two senses. First, the recurrence time is dimension-independent and resembles the convergence time of deterministic Gradient Descent (GD); unlike GD, however, the Langevin algorithm does not require strong conditions on the initialization and can eventually visit all optima. Second, the scaling of the escape time is consistent with the Eyring-Kramers law, under which the Langevin scheme eventually visits all local minima but takes an exponentially long time to transit among them. We apply this path-wise concentration result in the context of statistical learning to examine local notions of generalization and optimality.
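To make the two-timescale picture concrete, the following is a minimal, self-contained sketch (not the paper's experiment) of the discrete-time Langevin update x_{t+1} = x_t − η ∇F(x_t) + √(2η/β) ξ_t with standard Gaussian noise ξ_t, run on a toy one-dimensional double-well risk F(x) = (x² − 1)²/4. The objective, step size η, inverse temperature β, neighborhood radius ε, and horizon are all illustrative assumptions. Started from an arbitrary point, the iterate typically enters the ε-neighborhood of a local minimum within a short recurrence time and then remains there for a much longer stretch before a noise-driven escape.

```python
# Illustrative sketch only: the double-well risk, step size, inverse
# temperature, neighborhood radius, and horizon are assumptions chosen
# for demonstration, not quantities from the paper.
import numpy as np

def grad_F(x):
    # Gradient of F(x) = (x^2 - 1)^2 / 4, a double well with
    # local minima at x = -1 and x = +1 and a barrier at x = 0.
    return x * (x ** 2 - 1)

def langevin_path(x0, eta=1e-3, beta=30.0, n_steps=200_000, seed=0):
    # Discrete-time Langevin update:
    #   x_{t+1} = x_t - eta * grad_F(x_t) + sqrt(2 * eta / beta) * xi_t
    rng = np.random.default_rng(seed)
    xs = np.empty(n_steps + 1)
    xs[0] = x0
    noise_scale = np.sqrt(2.0 * eta / beta)
    for t in range(n_steps):
        xs[t + 1] = xs[t] - eta * grad_F(xs[t]) + noise_scale * rng.standard_normal()
    return xs

def recurrence_and_escape(xs, x_star=1.0, eps=0.4):
    # First step entering the eps-neighborhood of x_star (recurrence)
    # and first subsequent step leaving it (escape); None if absent.
    inside = np.abs(xs - x_star) <= eps
    entries = np.flatnonzero(inside)
    if entries.size == 0:
        return None, None
    t_in = int(entries[0])
    exits = np.flatnonzero(~inside[t_in:])
    t_out = t_in + int(exits[0]) if exits.size > 0 else None
    return t_in, t_out

if __name__ == "__main__":
    xs = langevin_path(x0=0.3)
    t_in, t_out = recurrence_and_escape(xs)
    print(f"entered the eps-neighborhood of x* = 1 at step {t_in}")
    if t_out is None:
        print("did not escape within the horizon (metastability)")
    else:
        print(f"escaped at step {t_out}, much later than the entry step")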


Related research

- 12/19/2018 · Breaking Reversibility Accelerates Langevin Dynamics for Global Non-Convex Optimization: "Langevin dynamics (LD) has been proven to be a powerful technique for op..."
- 02/18/2017 · A Hitting Time Analysis of Stochastic Gradient Langevin Dynamics: "We study the Stochastic Gradient Langevin Dynamics (SGLD) algorithm for ..."
- 04/18/2018 · A Mean Field View of the Landscape of Two-Layers Neural Networks: "Multi-layer neural networks are among the most powerful models in machin..."
- 02/24/2021 · Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization: "Numerous empirical evidences have corroborated the importance of noise i..."
- 08/04/2023 · Frustratingly Easy Model Generalization by Dummy Risk Minimization: "Empirical risk minimization (ERM) is a fundamental machine learning para..."
- 05/28/2018 · Understanding Generalization and Optimization Performance of Deep CNNs: "This work aims to provide understandings on the remarkable success of de..."
- 05/29/2019 · Global Guarantees for Blind Demodulation with Generative Priors: "We study a deep learning inspired formulation for the blind demodulation..."
