Ellipsoidal Trust Region Methods and the Marginal Value of Hessian Information for Neural Network Training

05/22/2019
by   Leonard Adolphs, et al.

We investigate the use of ellipsoidal trust region constraints for second-order optimization of neural networks. This approach can be seen as a higher-order counterpart of adaptive gradient methods, which we here show to be interpretable as first-order trust region methods with ellipsoidal constraints. In particular, we show that the preconditioning matrix used in RMSProp and Adam satisfies the necessary conditions for convergence of (first- and) second-order trust region methods, and we report that this ellipsoidal constraint consistently outperforms its spherical counterpart in practice. We furthermore set out to clarify the long-standing question of the potential superiority of Newton methods in deep learning. To this end, we run extensive benchmarks across different datasets and architectures and find that performance comparable to gradient descent algorithms can be achieved, but that using Hessian information does not give rise to better limit points and comes at the cost of increased hyperparameter tuning.
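The interpretation of adaptive gradient methods as first-order ellipsoidal trust region methods can be illustrated with a minimal sketch: minimizing the linear model g^T s over the ellipsoid ||diag(sqrt(v)) s|| <= r, where v is the RMSProp-style second-moment estimate, yields a step parallel to RMSProp's preconditioned step -g / sqrt(v). The function names, the radius parameter, and the specific hyperparameter values below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def rmsprop_step(g, v, lr=1e-3, beta=0.9, eps=1e-8):
    """One RMSProp update: diagonally preconditioned gradient step.
    (Illustrative sketch; lr, beta, eps are standard default values.)"""
    v = beta * v + (1 - beta) * g**2
    return -lr * g / (np.sqrt(v) + eps), v

def ellipsoidal_tr_step(g, v, radius, eps=1e-8):
    """First-order trust-region step: argmin of g^T s subject to the
    ellipsoidal constraint ||A^{1/2} s|| <= radius, with the diagonal
    preconditioner A = diag(sqrt(v) + eps). The closed-form minimizer
    is -radius * A^{-1} g / ||g||_{A^{-1}}."""
    A = np.sqrt(v) + eps              # diagonal of the preconditioner
    direction = -g / A                # same direction as the RMSProp step
    norm_A = np.sqrt(np.sum(A * direction**2))  # ellipsoidal norm of step
    return radius * direction / norm_A
```

Both updates move along the same preconditioned direction -g / sqrt(v); they differ only in how the step length is chosen (a fixed learning rate versus the trust-region radius), which is what makes the ellipsoidal trust-region view a natural higher-order generalization.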

Related research

07/30/2022
DRSOM: A Dimension Reduced Second-Order Method and Preliminary Analyses
We introduce a Dimension-Reduced Second-Order Method (DRSOM) for convex ...

09/12/2023
Trust-Region Neural Moving Horizon Estimation for Robots
Accurate disturbance estimation is essential for safe robot operations. ...

05/23/2018
A Two-Stage Subspace Trust Region Approach for Deep Neural Network Training
In this paper, we develop a novel second-order method for training feed-...

09/30/2020
Where Does Trust Break Down? A Quantitative Trust Analysis of Deep Neural Networks via Trust Matrix and Conditional Trust Densities
The advances and successes in deep learning in recent years have led to ...

06/20/2018
A Distributed Second-Order Algorithm You Can Trust
Due to the rapid growth of data and computational resources, distributed...

11/08/2013
An Experimental Comparison of Trust Region and Level Sets
High-order (non-linear) functionals have become very popular in segmenta...

05/21/2018
Small steps and giant leaps: Minimal Newton solvers for Deep Learning
We propose a fast second-order method that can be used as a drop-in repl...
