Why ResNet Works? Residuals Generalize

04/02/2019
by Fengxiang He, et al.

Residual connections significantly boost the performance of deep neural networks. However, few theoretical results address how residuals influence the hypothesis complexity and generalization ability of deep neural networks. This paper studies the influence of residual connections on the hypothesis complexity of a neural network in terms of the covering number of its hypothesis space. We prove that the upper bound on the covering number is the same as that of chain-like neural networks whenever the total numbers of weight matrices and nonlinearities are fixed, regardless of whether they appear inside residuals. This result demonstrates that residual connections may not increase the hypothesis complexity of a neural network compared with its chain-like counterpart. Based on this upper bound on the covering number, we then obtain an O(1/√N) margin-based multi-class generalization bound for ResNet, as an exemplary case of a deep neural network with residual connections. Generalization guarantees for similar state-of-the-art architectures, such as DenseNet and ResNeXt, follow straightforwardly. Our generalization bound yields a practical guideline: to achieve good generalization, regularization terms should be used to keep the norms of the weight matrices from growing too large, which justifies the standard technique of weight decay.
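To make the comparison concrete, here is a minimal sketch in PyTorch (which the paper does not prescribe; all module names are illustrative) of a chain-like block and its residual counterpart. Both contain the same two weight matrices and the same two nonlinearities, differing only in the identity shortcut, which is exactly the setting under which the paper's covering-number bounds coincide.

```python
import torch
import torch.nn as nn


class ChainBlock(nn.Module):
    """Chain-like block: two weight matrices, two nonlinearities."""

    def __init__(self, dim: int):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.fc2(torch.relu(self.fc1(x))))


class ResidualBlock(nn.Module):
    """Same two weight matrices and two nonlinearities as ChainBlock,
    plus an identity shortcut; the parameter count is unchanged."""

    def __init__(self, dim: int):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Identity shortcut: add x back before the final nonlinearity.
        return torch.relu(x + self.fc2(torch.relu(self.fc1(x))))


# Weight decay (an L2 penalty on the weights) keeps the weight-matrix
# norms small, in line with the abstract's practical guideline; the
# optimizer and coefficient here are illustrative, not from the paper.
model = ResidualBlock(dim=128)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=5e-4)
```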

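For orientation, margin-based multi-class bounds of the kind described above typically take the schematic form below. This is a generic sketch, not the paper's exact statement: the complexity term C stands in for the quantity the paper derives from the covering number, and constants and logarithmic factors are omitted.

```latex
% Schematic margin-based multi-class generalization bound: with probability
% at least 1 - \delta over an i.i.d. sample of size N, for every network f
% in the hypothesis space and every margin \gamma > 0,
\Pr\bigl[\text{test misclassification of } f\bigr]
  \le \hat{R}_{\gamma}(f)
    + O\!\left(\frac{\mathcal{C}}{\gamma \sqrt{N}}\right)
    + O\!\left(\sqrt{\frac{\log(1/\delta)}{N}}\right)
% where \hat{R}_{\gamma}(f) is the fraction of training examples whose
% prediction margin falls below \gamma.
```

Since C grows with the norms of the weight matrices, shrinking those norms, for instance via weight decay, tightens the bound; this is the practical guideline the abstract draws.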
Related Research

08/07/2020
Improve Generalization and Robustness of Neural Networks via Weight Scale Shifting Invariant Regularizations
Using weight decay to penalize the L2 norms of weights in neural network...

06/13/2018
On Tighter Generalization Bound for Deep Neural Networks: CNNs, ResNets, and Beyond
Our paper proposes a generalization error bound for a general family of ...

09/21/2020
Kernel-Based Smoothness Analysis of Residual Networks
A major factor in the success of deep neural networks is the use of soph...

12/08/2020
A General Computational Framework to Measure the Expressiveness of Complex Networks Using a Tighter Upper Bound of Linear Regions
The expressiveness of deep neural network (DNN) is a perspective to unde...

02/22/2023
Considering Layerwise Importance in the Lottery Ticket Hypothesis
The Lottery Ticket Hypothesis (LTH) showed that by iteratively training ...

02/25/2018
Functional Gradient Boosting based on Residual Network Perception
Residual Networks (ResNets) have become state-of-the-art models in deep ...

03/13/2023
Bayes Complexity of Learners vs Overfitting
We introduce a new notion of complexity of functions and we show that it...
