Analytic Insights into Structure and Rank of Neural Network Hessian Maps

06/30/2021
by Sidak Pal Singh, et al.

The Hessian of a neural network captures parameter interactions through second-order derivatives of the loss. It is a fundamental object of study, closely tied to various problems in deep learning, including model design, optimization, and generalization. Most prior work has been empirical, typically focusing on low-rank approximations and heuristics that are blind to the network structure. In contrast, we develop theoretical tools to analyze the range of the Hessian map, providing us with a precise understanding of its rank deficiency as well as the structural reasons behind it. This yields exact formulas and tight upper bounds for the Hessian rank of deep linear networks, allowing for an elegant interpretation in terms of rank deficiency. Moreover, we demonstrate that our bounds remain faithful as an estimate of the numerical Hessian rank, for a larger class of models such as rectified and hyperbolic tangent networks. Further, we also investigate the implications of model architecture (e.g. width, depth, bias) on the rank deficiency. Overall, our work provides novel insights into the source and extent of redundancy in overparameterized networks.
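
To make the notion of Hessian rank deficiency concrete, here is a minimal NumPy sketch for a two-layer linear network with squared loss. Everything in it is an assumption chosen for illustration and is not taken from the paper: the sizes `d`, `M`, `K`, `N`, the random data, the helper name `hessian_two_layer_linear`, and the relative tolerance used to count nonzero singular values. The upper bound it prints is only the elementary one obtained by splitting the Hessian into a Gauss-Newton part (rank at most `K * d`) and a residual-dependent part (rank at most `2 * M * min(K, d)`) and adding the two ranks. It is not the exact formula derived in the paper.

```python
import numpy as np


def hessian_two_layer_linear(W1, W2, X, Y):
    """Exact Hessian of L = ||W2 @ W1 @ X - Y||_F^2 / (2N) w.r.t. (vec W1, vec W2).

    Columns are built from exact Hessian-vector products along unit vectors.
    """
    N = X.shape[1]
    R = W2 @ W1 @ X - Y            # residual, shape (K, N)
    omega = R @ X.T / N            # residual-input covariance, shape (K, d)
    n1, n2 = W1.size, W2.size
    H = np.zeros((n1 + n2, n1 + n2))
    for j in range(n1 + n2):
        e = np.zeros(n1 + n2)
        e[j] = 1.0
        V1 = e[:n1].reshape(W1.shape)   # perturbation direction for W1
        V2 = e[n1:].reshape(W2.shape)   # perturbation direction for W2
        dR = (V2 @ W1 + W2 @ V1) @ X    # first-order change of the residual
        # Directional derivatives of the two gradient blocks.
        dG1 = V2.T @ omega + W2.T @ dR @ X.T / N
        dG2 = dR @ X.T @ W1.T / N + omega @ V1.T
        H[:, j] = np.concatenate([dG1.ravel(), dG2.ravel()])
    return H


# Hypothetical toy sizes: input dim d, hidden width M, output dim K, N samples.
rng = np.random.default_rng(0)
d, M, K, N = 5, 4, 2, 50
X = rng.standard_normal((d, N))
Y = rng.standard_normal((K, N))
W1 = rng.standard_normal((M, d))
W2 = rng.standard_normal((K, M))

H = hessian_two_layer_linear(W1, W2, X, Y)
n_params = H.shape[0]

# Numerical rank: singular values above a relative threshold (arbitrary choice).
s = np.linalg.svd(H, compute_uv=False)
rank = int(np.sum(s > 1e-9 * s[0]))

# Elementary bound: rank(Gauss-Newton part) + rank(residual-dependent part).
bound = K * d + 2 * M * min(K, d)

print(f"parameters: {n_params}, numerical rank: {rank}, elementary bound: {bound}")
```

With these sizes the network has 28 parameters while the elementary bound is 26, so the Hessian cannot be full rank at any point in parameter space. Changing `d`, `M`, or `K` shows how width and output dimension affect the gap.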

research
10/08/2020

Dissecting Hessian: Understanding Common Structure of Hessian in Neural Networks

Hessian captures important properties of the deep neural network loss la...
research
05/16/2023

The Hessian perspective into the Nature of Convolutional Neural Networks

While Convolutional Neural Networks (CNNs) have long been investigated a...
research
02/01/2019

Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation

Current methods to interpret deep learning models by generating saliency...
research
02/07/2020

Low Rank Saddle Free Newton: Algorithm and Analysis

Many tasks in engineering fields and machine learning involve minimizing...
research
03/17/2021

Hessian Chain Bracketing

Second derivatives of mathematical models for real-world phenomena are f...
research
01/19/2023

On backpropagating Hessians through ODEs

We discuss the problem of numerically backpropagating Hessians through o...
research
06/04/2021

ViViT: Curvature access through the generalized Gauss-Newton's low-rank structure

Curvature in form of the Hessian or its generalized Gauss-Newton (GGN) a...
