A Deterministic Approach to Avoid Saddle Points

01/21/2019
by Lisa Maria Kreusser, et al.

Loss functions with a large number of saddle points are one of the main obstacles to training many modern machine learning models. Gradient descent (GD) is a fundamental algorithm in machine learning, but for certain initial data it converges to a saddle point. We call the set of such initial values the "attraction region." For quadratic functions, GD converges to a saddle point if the initial data lies in a subspace of dimension up to n-1, where n is the dimension of the parameter space. In this paper, we prove that a small modification of the recently proposed Laplacian smoothing gradient descent (LSGD) [Osher, et al., arXiv:1806.06317] helps avoid saddle points without sacrificing the convergence rate of GD. In particular, we show that for a class of quadratic functions the dimension of the modified LSGD's attraction region is at most ⌊(n-1)/2⌋, which is significantly smaller than GD's (n-1)-dimensional attraction region.
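To make the notion of an attraction region concrete, here is a minimal pure-Python sketch, assuming the simplest saddle: f(x) = 0.5 * sum(lam_i * x_i^2) with lam = (1, 1, -1), whose only critical point, the origin, is a saddle. Plain GD converges to the origin exactly when the starting point has no component along the unstable direction e3, so its attraction region is span{e1, e2}, of dimension n-1 = 2. For comparison, the sketch also runs the original LSGD update x <- x - eta * A_sigma^{-1} grad f(x) of Osher et al., with A_sigma = I - sigma * L and L the periodic one-dimensional discrete Laplacian. It is not the modified scheme analyzed in this paper. The helper names (grad, smoothing_matrix, solve, run, norm) and the values of eta, sigma and the step count are hypothetical illustrative choices, and the dense Gaussian-elimination solver stands in for an FFT-based solve, which is possible because A_sigma is circulant.

```python
# Minimal sketch (hypothetical helper names and parameters): plain GD vs. the
# original LSGD update of Osher et al. on f(x) = 0.5 * sum(lam[i] * x[i]**2).


def grad(lam, x):
    """Gradient of f, i.e. A x with A = diag(lam)."""
    return [l * xi for l, xi in zip(lam, x)]


def smoothing_matrix(n, sigma):
    """Dense A_sigma = I - sigma * L, where L is the periodic 1-D discrete Laplacian."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] += 1.0 + 2.0 * sigma
        a[i][(i - 1) % n] -= sigma  # periodic left neighbour
        a[i][(i + 1) % n] -= sigma  # periodic right neighbour
    return a


def solve(a, b):
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (m[r][n] - s) / m[r][r]
    return y


def run(lam, x0, eta, steps, sigma=0.0):
    """Iterate x <- x - eta * A_sigma^{-1} grad f(x); sigma = 0 gives plain GD."""
    a = smoothing_matrix(len(x0), sigma)
    x = list(x0)
    for _ in range(steps):
        d = solve(a, grad(lam, x))
        x = [xi - eta * di for xi, di in zip(x, d)]
    return x


def norm(x):
    return sum(xi * xi for xi in x) ** 0.5


lam = [1.0, 1.0, -1.0]  # two stable directions, one unstable direction
eta, steps = 0.1, 300

# GD started inside span{e1, e2}: close to 0, i.e. it converges to the saddle.
print(norm(run(lam, [1.0, 1.0, 0.0], eta, steps)))
# GD with a tiny e3 component: large, i.e. it escapes the saddle.
print(norm(run(lam, [1.0, 1.0, 1e-8], eta, steps)))
# Plain LSGD (sigma = 1) from the same point in GD's attraction region: large.
print(norm(run(lam, [1.0, 1.0, 0.0], eta, steps, sigma=1.0)))
```

The last run only shows that a point in GD's attraction region need not lie in plain LSGD's. It does not demonstrate the ⌊(n-1)/2⌋ bound, which the paper proves for its modified scheme.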
