Critical Point Finding with Newton-MR by Analogy to Computing Square Roots

06/12/2019
by Charles G. Frye et al.

Understanding the behavior of algorithms that solve the optimization problem (hereafter OP) of minimizing a differentiable loss function (OP1) is enhanced by knowledge of the critical points of that loss function, i.e. the points where the gradient is 0. Here, we describe a solution to the problem of finding critical points by posing and solving three auxiliary optimization problems: 1) minimizing the norm of the gradient (OP2), 2) minimizing the difference between the pre-conditioned update direction and the gradient (OP3), and 3) minimizing the norm of the gradient along the update direction (OP4). The result is a recently introduced algorithm for optimizing invex functions, Newton-MR, which turns out to be highly effective at finding the critical points of neural network loss surfaces. We precede this derivation with an analogous but simpler one: a nested-optimization algorithm for computing square roots that combines Heron's method with Newton-Raphson division.
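To make the square-root analogy concrete, the following is a minimal Python sketch (illustrative, not code from the paper) of the nested iteration: Heron's method, x <- (x + a/x)/2, with the inner division a/x itself computed by a Newton-Raphson iteration for the reciprocal. Function names, initial guesses, and iteration counts are arbitrary choices for the example.

    def nr_reciprocal(b, y0=1e-3, n_iter=40):
        # Newton-Raphson for 1/b, i.e. a root of f(y) = 1/y - b.
        # Division-free update: y <- y * (2 - b*y); converges
        # quadratically whenever 0 < y0 < 2/b (so the default y0
        # is safe for b up to roughly 2000).
        y = y0
        for _ in range(n_iter):
            y = y * (2.0 - b * y)
        return y

    def heron_sqrt(a, n_iter=10):
        # Heron's method, x <- (x + a/x) / 2, with the inner division
        # a/x replaced by the Newton-Raphson reciprocal of x: one
        # optimization nested inside another.
        x = 1.0  # arbitrary positive initial guess
        for _ in range(n_iter):
            x = 0.5 * (x + a * nr_reciprocal(x))
        return x

    print(heron_sqrt(2.0))  # ~1.4142135623730951
    print(heron_sqrt(9.0))  # ~3.0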
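In the same spirit, here is a minimal sketch of critical-point finding by gradient-norm minimization (OP2). It is not the authors' implementation: real Newton-MR obtains the update direction with MINRES plus a line search, whereas below numpy.linalg.lstsq stands in for the inner least-squares solve, and the two-dimensional loss surface is invented for the example. The pseudoinverse step d = -H^+ g decreases ||g||^2 whenever g has a component in the range of H, which is what lets this style of method tolerate the indefinite and singular Hessians of neural network losses.

    import numpy as np

    def find_critical_point(grad, hess, x0, n_iter=50, tol=1e-10):
        # Repeatedly take the least-squares Newton step d = -H(x)^+ g(x),
        # which decreases the squared gradient norm ||g||^2 (OP2) as long
        # as g has a component in the range of H. Newton-MR computes this
        # step with MINRES and adds a line search; np.linalg.lstsq is a
        # dense stand-in used here for brevity.
        x = np.asarray(x0, dtype=float)
        for _ in range(n_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:  # gradient ~ 0: critical point found
                break
            d, *_ = np.linalg.lstsq(hess(x), -g, rcond=None)
            x = x + d  # no line search in this sketch
        return x

    # Toy loss f(x, y) = x**4/4 - x**2/2 + y**2/2, with critical points
    # at (-1, 0) and (1, 0) (minima) and (0, 0) (a saddle).
    grad = lambda p: np.array([p[0]**3 - p[0], p[1]])
    hess = lambda p: np.array([[3.0 * p[0]**2 - 1.0, 0.0], [0.0, 1.0]])

    print(find_critical_point(grad, hess, [0.2, 0.5]))  # -> near (0, 0)

From this starting point the iterate converges to the saddle at the origin, a critical point that plain gradient descent on the loss would walk past; being attracted to all critical points, not just minima, is exactly the property that makes gradient-norm minimization useful for mapping loss surfaces.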



