Neural Nets with a Newton Conjugate Gradient Method on Multiple GPUs

08/03/2022
by Severin Reiz et al.

Training deep neural networks consumes an increasing share of computational resources in many compute centers. Often, hyperparameter values are obtained by brute force. Our goal is (1) to improve on this by enabling second-order optimization methods with fewer hyperparameters for large-scale neural networks and (2) to survey optimizer performance on specific tasks so that users can choose the best one for their problem. We introduce a novel second-order optimization method that requires only the action of the Hessian on a vector and thus avoids the prohibitive cost of explicitly assembling the Hessian for large-scale networks. We compare the proposed second-order method with two state-of-the-art optimizers on five representative neural network problems, including regression, very deep networks from computer vision, and variational autoencoders. For the largest setup, we efficiently parallelized the optimizers with Horovod and applied them to an 8-GPU NVIDIA P100 (DGX-1) machine.
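The computational primitive behind such a Newton conjugate gradient approach is a Hessian-vector product inside an inner CG solve, so the Hessian never needs to be formed explicitly. The following is a minimal, illustrative sketch of that idea, not the paper's implementation: it uses JAX forward-over-reverse differentiation (Pearlmutter's trick) on a toy quadratic loss, and the function names (loss, hvp, newton_cg_step) as well as the damping value and iteration count are assumptions chosen for readability.

import jax
import jax.numpy as jnp

def loss(params, x, y):
    # Toy least-squares loss; stands in for a neural-network loss.
    pred = x @ params
    return jnp.mean((pred - y) ** 2)

def hvp(params, vec, x, y):
    # Hessian-vector product via forward-over-reverse differentiation:
    # differentiate the gradient in direction `vec` without building H.
    grad_fn = lambda p: jax.grad(loss)(p, x, y)
    return jax.jvp(grad_fn, (params,), (vec,))[1]

def newton_cg_step(params, x, y, cg_iters=10, damping=1e-3):
    # Approximately solve (H + damping*I) d = -g with conjugate gradients,
    # using only Hessian-vector products, then take the Newton step.
    g = jax.grad(loss)(params, x, y)
    b = -g
    d = jnp.zeros_like(b)
    r = b - (hvp(params, d, x, y) + damping * d)
    p = r
    rs_old = r @ r
    for _ in range(cg_iters):
        Hp = hvp(params, p, x, y) + damping * p
        alpha = rs_old / (p @ Hp)
        d = d + alpha * p
        r = r - alpha * Hp
        rs_new = r @ r
        if jnp.sqrt(rs_new) < 1e-8:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return params + d

# Usage sketch: one Newton-CG step on synthetic regression data.
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (64, 5))
true_w = jnp.arange(1.0, 6.0)
y = x @ true_w
params = jnp.zeros(5)
params = newton_cg_step(params, x, y)

Because only hvp is needed, memory grows with the number of parameters rather than with the square of that number, which is what makes second-order steps feasible for large networks.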
