Gradient Descent on Neurons and its Link to Approximate Second-Order Optimization

01/28/2022
by Frederik Benzing, et al.

Second-order optimizers are thought to hold the potential to speed up neural network training, but due to the enormous size of the curvature matrix, they typically require approximations to be computationally tractable. The most successful family of approximations is Kronecker-factored, block-diagonal curvature estimates (KFAC). Here, we combine tools from prior work to evaluate exact second-order updates with careful ablations to establish a surprising result: due to its approximations, KFAC is not closely related to second-order updates, and in particular, it significantly outperforms true second-order updates. This challenges widely held beliefs and immediately raises the question of why KFAC performs so well. We answer this question by showing that KFAC approximates a first-order algorithm, which performs gradient descent on neurons rather than weights. Finally, we show that this optimizer often improves over KFAC in terms of computational cost and data-efficiency.
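To make the phrase "Kronecker-factored, block-diagonal curvature estimate" concrete, here is a minimal NumPy sketch of the commonly described K-FAC preconditioning step for a single fully connected layer. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `damping` value, and the choice to add damping to each factor separately are all hypothetical. The sketch checks that preconditioning with the two small factors gives the same result as inverting the full Kronecker-product block for that layer.

```python
import numpy as np


def kfac_precondition(grad_w, acts, backprops, damping=1e-2):
    """Precondition one layer's weight gradient with two Kronecker factors.

    Assumed shapes (hypothetical helper, not from the paper):
      acts:      (n, d_in)   layer inputs for a batch of n examples
      backprops: (n, d_out)  gradients w.r.t. the layer's pre-activations
      grad_w:    (d_out, d_in)
    """
    n = acts.shape[0]
    # Input-side factor: second moment of the layer inputs, plus damping.
    a_factor = acts.T @ acts / n + damping * np.eye(acts.shape[1])
    # Output-side factor: second moment of the backpropagated gradients.
    g_factor = backprops.T @ backprops / n + damping * np.eye(backprops.shape[1])
    # Compute G^{-1} @ grad_w @ A^{-1} with linear solves instead of inverses.
    left = np.linalg.solve(g_factor, grad_w)
    # a_factor is symmetric, so solving against left.T yields (left @ A^{-1}).T.
    update = np.linalg.solve(a_factor, left.T).T
    return update, a_factor, g_factor


def dense_precondition(grad_w, a_factor, g_factor):
    """Same step, but with the full (d_in*d_out) x (d_in*d_out) block."""
    block = np.kron(a_factor, g_factor)
    vec = grad_w.reshape(-1, order="F")  # column-major vec() convention
    return np.linalg.solve(block, vec).reshape(grad_w.shape, order="F")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.normal(size=(32, 5))
    backprops = rng.normal(size=(32, 3))
    grad_w = backprops.T @ acts / acts.shape[0]

    update, a_factor, g_factor = kfac_precondition(grad_w, acts, backprops)
    dense = dense_precondition(grad_w, a_factor, g_factor)
    print(np.allclose(update, dense))
```

The factored form only ever solves against a `d_in x d_in` and a `d_out x d_out` matrix, which is what makes this family of approximations tractable compared with the full curvature matrix.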

