Residual Tangent Kernels

01/28/2020
by Etai Littwin, et al.

A recent body of work has focused on the theoretical study of neural networks in the large-width regime. Specifically, it was shown that training an infinitely wide, properly scaled vanilla ReLU network with the L2 loss is equivalent to kernel regression using the Neural Tangent Kernel, which is independent of the initialization instance and remains constant during training. In this work, we derive the form of the limiting kernel for architectures incorporating bypass connections, namely residual networks (ResNets) and densely connected networks (DenseNets). In addition, we derive finite-width corrections for both cases. Our analysis reveals that deep practical residual architectures might operate much closer to the “kernel” regime than their vanilla counterparts: whereas in networks without skip connections convergence to the limiting kernel requires fixing the depth while increasing the layers' width, in both ResNets and DenseNets convergence to the limiting kernel may occur for networks that are simultaneously infinitely deep and wide, provided proper initialization.

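The central object in the abstract is the tangent kernel Theta_hat(x, x') = <grad_theta f(x), grad_theta f(x')>, whose infinite-width limit governs training under the L2 loss. As a concrete illustration, below is a minimal JAX sketch that computes this empirical kernel for a toy residual MLP at initialization. The architecture, branch scaling, and names (init_params, resnet, empirical_ntk) are illustrative assumptions, not the paper's exact parameterization.

    import jax
    import jax.numpy as jnp

    def init_params(key, d_in=8, width=512, depth=4):
        # NTK-style parameterization: weights are i.i.d. standard normals,
        # with the 1/sqrt(fan-in) scaling applied in the forward pass.
        sizes = [d_in] + [width] * depth + [1]
        keys = jax.random.split(key, len(sizes) - 1)
        return [jax.random.normal(k, (m, n))
                for k, m, n in zip(keys, sizes[:-1], sizes[1:])]

    def resnet(params, x):
        # Toy residual MLP: each hidden layer adds a scaled ReLU branch
        # to an identity skip connection.
        h = x @ params[0] / jnp.sqrt(x.shape[-1])
        for W in params[1:-1]:
            h = h + jax.nn.relu(h @ W) / jnp.sqrt(W.shape[0])
        return (h @ params[-1] / jnp.sqrt(params[-1].shape[0]))[:, 0]

    def empirical_ntk(params, x1, x2):
        # Theta_hat(x1, x2) = <df(x1)/dtheta, df(x2)/dtheta>, summed over
        # all parameters; returns the Gram matrix between the two batches.
        j1 = jax.jacobian(resnet)(params, x1)  # leaves: (batch1, *W.shape)
        j2 = jax.jacobian(resnet)(params, x2)  # leaves: (batch2, *W.shape)
        return sum(jnp.tensordot(a, b, axes=[[1, 2], [1, 2]])
                   for a, b in zip(jax.tree_util.tree_leaves(j1),
                                   jax.tree_util.tree_leaves(j2)))

    key = jax.random.PRNGKey(0)
    params = init_params(key)
    x = jax.random.normal(jax.random.PRNGKey(1), (4, 8))
    print(empirical_ntk(params, x, x))  # 4x4 kernel matrix at initialization

Re-running the sketch with a larger width and different random seeds should show the kernel entries fluctuating less, consistent with convergence to a limiting kernel; probing the joint depth-and-width behavior analyzed in the paper would additionally require the proper depth-dependent initialization of the residual branches.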