Singular Value Perturbation and Deep Network Optimization

03/07/2022
by   Rudolf H. Riedi, et al.

We develop new theoretical results on matrix perturbation to shed light on the impact of architecture on the performance of a deep network. In particular, we explain analytically what deep learning practitioners have long observed empirically: the parameters of some deep architectures (e.g., residual networks (ResNets) and densely connected networks (DenseNets)) are easier to optimize than others (e.g., convolutional networks (ConvNets)). Building on our earlier work connecting deep networks with continuous piecewise-affine splines, we develop an exact local linear representation of a deep network layer for a family of modern deep networks that includes ConvNets at one end of a spectrum and ResNets and DenseNets at the other. For regression tasks that optimize the squared-error loss, we show that the optimization loss surface of a modern deep network is piecewise quadratic in the parameters, with local shape governed by the singular values of a matrix that is a function of the local linear representation. We develop new perturbation results for how the singular values of matrices of this sort behave as we add a fraction of the identity and multiply by certain diagonal matrices. A direct application of our perturbation results explains analytically why a ResNet is easier to optimize than a ConvNet: thanks to its more stable singular values and smaller condition number, the local loss surface of a ResNet or DenseNet is less erratic, less eccentric, and features local minima that are more accommodating to gradient-based optimization. Our results also shed new light on the impact of different nonlinear activation functions on a deep network's singular values, regardless of its architecture.
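The following is a minimal numerical sketch, not the paper's analysis, of the intuition behind the perturbation result: adding a fraction of the identity to a layer's local linear map (as a ResNet-style skip connection effectively does) tends to concentrate its singular values and shrink its condition number. The matrix W below is a hypothetical random stand-in for a layer's local linear representation; the scaling and dimensions are illustrative assumptions only.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 64
    # Hypothetical "ConvNet-like" local linear map of a layer
    W = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))

    def sv_summary(A, label):
        # Report the extreme singular values and the condition number
        s = np.linalg.svd(A, compute_uv=False)
        print(f"{label:>6}: sigma_max={s[0]:.3f}  sigma_min={s[-1]:.3f}  cond={s[0] / s[-1]:.1f}")

    sv_summary(W, "W")              # singular values spread widely; large condition number
    sv_summary(np.eye(n) + W, "I+W")  # skip connection: values cluster near 1; condition number shrinks

Under these assumptions, the plain map W typically has a tiny smallest singular value and hence a large condition number, while I + W keeps its singular values clustered near 1, which is the qualitative behavior the paper's perturbation bounds make precise.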
