Are ResNets Provably Better than Linear Predictors?

04/18/2018
by Ohad Shamir, et al.

A residual network (or ResNet) is a standard deep neural net architecture, with state-of-the-art performance across numerous applications. The main premise of ResNets is that they allow the training of each layer to focus on fitting just the residual of the previous layer's output and the target output. Thus, we should expect that the trained network is no worse than what we can obtain if we remove the residual layers and train a shallower network instead. However, due to the non-convexity of the optimization problem, it is not at all clear that ResNets indeed achieve this behavior, rather than getting stuck at some arbitrarily poor local minimum. In this paper, we rigorously prove that arbitrarily deep, nonlinear ResNets indeed exhibit this behavior, in the sense that the optimization landscape contains no local minima with value above what can be obtained with a linear predictor (namely a 1-layer network). Notably, we show this under minimal or no assumptions on the precise network architecture, data distribution, or loss function used. We also provide a quantitative analysis of second-order stationary points for this problem, and show that with a certain tweak to the architecture, training the network with standard stochastic gradient descent achieves an objective value no worse than any fixed linear predictor.
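The premise above — that a ResNet should do at least as well as the shallower network obtained by dropping the residual layers — rests on a simple observation: a residual block computes `x + f(x)`, so setting its weights to zero makes it the identity, and the full network collapses to its final linear layer. The following is a minimal numpy sketch of this reduction (the block parameterization `x + W2·relu(W1·x)` and the names are illustrative, not the paper's exact architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    # A residual block adds a nonlinear correction to its input: x + W2 relu(W1 x).
    return x + W2 @ relu(W1 @ x)

d = 4
x = rng.normal(size=d)

# With the residual weights set to zero, each block is the identity map, so the
# network composed with a final linear layer computes exactly a linear predictor.
W1 = np.zeros((d, d))
W2 = np.zeros((d, d))
w_out = rng.normal(size=d)

h = residual_block(x, W1, W2)
assert np.allclose(h, x)                      # zero block passes input through unchanged
assert np.isclose(w_out @ h, w_out @ x)       # network output equals the linear predictor
```

Because every linear predictor is realizable this way, one would hope the trained network never ends up worse than the best linear predictor; the paper's contribution is proving that the nonconvex landscape in fact contains no local minima above that value.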

