Dual Gauss-Newton Directions for Deep Learning

08/17/2023
by Vincent Roulet et al.

Inspired by Gauss-Newton-like methods, we study the benefit of leveraging the structure of deep learning objectives, namely the composition of a convex loss function with a nonlinear network, to derive direction oracles that improve on stochastic gradients, based on the idea of partial linearization. In a departure from previous work, we propose to compute such direction oracles via their dual formulation, leading to both computational benefits and new insights. We demonstrate that the resulting oracles define descent directions that can be used as a drop-in replacement for stochastic gradients in existing optimization algorithms. We empirically study the advantage of the dual formulation as well as the computational trade-offs involved in computing such oracles.
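To give intuition for the partial-linearization idea the abstract describes, here is a minimal sketch (not the paper's dual method, and all names are illustrative): for a squared loss composed with a nonlinear model, linearizing only the network in the subproblem yields the classical damped Gauss-Newton direction, which is a descent direction just like the oracles discussed above.

```python
import numpy as np

# Sketch under assumptions: loss L(z) = 0.5 * ||z - y||^2 composed with a
# nonlinear model N(w). Partially linearizing the network,
#   N(w_t + d) ~= N(w_t) + J d,
# while keeping the loss exact turns the proximal subproblem
#   min_d L(N(w_t) + J d) + (1 / (2 * eta)) * ||d||^2
# into a regularized least-squares problem whose solution is the damped
# Gauss-Newton direction below.

def gauss_newton_direction(residual, jacobian, eta=1.0):
    """Solve (J^T J + I / eta) d = -J^T r for the update direction d."""
    J, r = jacobian, residual
    A = J.T @ J + np.eye(J.shape[1]) / eta
    return np.linalg.solve(A, -J.T @ r)

# Toy model: N(w) = sin(X @ w) with residual r = N(w) - y.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
w = rng.normal(size=3)

r = np.sin(X @ w) - y
J = np.cos(X @ w)[:, None] * X  # Jacobian of N at w
g = J.T @ r                     # plain (stochastic-)gradient analogue
d = gauss_newton_direction(r, J)

# g @ d < 0: the oracle is a descent direction, so it can stand in for
# the negative gradient in an existing optimizer.
print(float(g @ d))
```

Since the system matrix `J.T @ J + I / eta` is positive definite, `g @ d = -g @ A_inv @ g` is negative whenever the gradient is nonzero, which is why such oracles can be dropped into gradient-based algorithms unchanged.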


