Low Rank Saddle Free Newton: Algorithm and Analysis

02/07/2020
by   Thomas O'Leary-Roseberry, et al.

Many tasks in engineering and machine learning involve minimizing a high-dimensional non-convex function, and the prevalence of saddle points poses a central challenge in practice. The Saddle Free Newton (SFN) algorithm can rapidly escape high-dimensional saddle points by using the absolute value of the Hessian of the empirical risk function. In the original SFN, a Lanczos-type procedure is used to approximate this absolute-value Hessian. Motivated by recent empirical work showing that neural network training Hessians are typically low rank, we propose instead approximating the Hessian via scalable randomized low-rank methods. The resulting factorizations can be efficiently inverted via the Sherman-Morrison-Woodbury formula. We derive bounds on the expected convergence rate of a stochastic version of the algorithm, which quantify the errors incurred both by subsampling and by the low-rank Hessian approximation. We test the method on standard neural network training benchmarks (MNIST and CIFAR10). Numerical results demonstrate that, in addition to avoiding saddle points, the method can converge faster than first-order methods, and that the Hessian can be subsampled significantly more aggressively than the gradient while retaining superior performance.
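The core idea described in the abstract, a randomized low-rank eigendecomposition of the Hessian (accessed only through Hessian-vector products) followed by a closed-form inversion of the absolute-value surrogate, can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the damping parameter `gamma`, and the oversampling choice are assumptions for the sketch.

```python
import numpy as np

def randomized_eigs(hvp, n, rank, oversample=10, seed=None):
    """Approximate the dominant eigenpairs of a symmetric n x n operator
    accessed only through matrix-vector products hvp(x) ~ H @ x
    (a standard randomized range-finder sketch)."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((n, rank + oversample))
    Y = np.column_stack([hvp(Omega[:, j]) for j in range(Omega.shape[1])])
    Q, _ = np.linalg.qr(Y)                      # orthonormal basis for range(Y)
    T = Q.T @ np.column_stack([hvp(Q[:, j]) for j in range(Q.shape[1])])
    lam, S = np.linalg.eigh(T)                  # small dense eigenproblem
    idx = np.argsort(np.abs(lam))[::-1][:rank]  # keep largest-magnitude eigenvalues
    return lam[idx], Q @ S[:, idx]

def lrsfn_step(grad, lam, V, gamma=0.1):
    """Saddle-free Newton step p = -|H|^{-1} g under the low-rank surrogate
    |H| ~= V |Lam| V^T + gamma (I - V V^T).  Because V has orthonormal
    columns, the inverse is available in closed form (equivalently, via the
    Sherman-Morrison-Woodbury formula applied to the rank-k update of gamma*I)."""
    c = V.T @ grad
    return -(V @ (c / np.abs(lam)) + (grad - V @ c) / gamma)
```

Negating the small eigenvalues' signs (via the absolute value) turns directions of negative curvature into descent directions, which is what lets the method escape saddle points; when the rank equals the dimension the sketch recovers the exact saddle-free step, while the paper's premise is that a much smaller rank suffices because training Hessians are numerically low rank.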


Related research:

- On the saddle point problem for non-convex optimization (05/19/2014)
- Dissecting Hessian: Understanding Common Structure of Hessian in Neural Networks (10/08/2020)
- PNKH-B: A Projected Newton-Krylov Method for Large-Scale Bound-Constrained Optimization (05/27/2020)
- Do Subsampled Newton Methods Work for High-Dimensional Data? (02/13/2019)
- Analytic Insights into Structure and Rank of Neural Network Hessian Maps (06/30/2021)
- SKYNET: an efficient and robust neural network training tool for machine learning in astronomy (09/03/2013)
- Tracking the gradients using the Hessian: A new look at variance reducing stochastic methods (10/20/2017)
