Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks

01/24/2019
by   Sanjeev Arora, et al.

Recent works have shed some light on the mystery of why deep nets fit any data and generalize despite being heavily overparameterized. This paper analyzes training and generalization for a simple two-layer ReLU net with random initialization, and provides the following improvements over recent works: (i) using a tighter characterization of training speed than recent papers, an explanation for why training a neural net with random labels leads to slower training, as originally observed in [Zhang et al. ICLR'17]; (ii) a generalization bound independent of network size, based on a data-dependent complexity measure; experiments show that our measure distinguishes clearly between random labels and true labels on MNIST and CIFAR, and whereas recent papers require the sample complexity to grow (slowly) with network size, ours is completely independent of it; (iii) learnability of a broad class of smooth functions by two-layer ReLU nets trained via gradient descent. The key idea is to track the dynamics of training and generalization via properties of a related kernel.
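The related kernel is the Gram matrix of the infinite-width neural tangent kernel for a two-layer ReLU net. The sketch below is illustrative only and not the authors' released code: the helper names (ntk_gram, label_spectrum, complexity_measure) and the synthetic data are mine, and the closed-form kernel assumes unit-norm inputs. It shows how one could compute that Gram matrix H, the projections of the labels onto its eigenvectors (which govern the gradient-descent training speed discussed in (i)), and the data-dependent quantity sqrt(2 y^T H^{-1} y / n) of the kind used for the size-independent generalization bound in (ii).

import numpy as np

def ntk_gram(X):
    """Infinite-width NTK Gram matrix for a two-layer ReLU net; rows of X are unit-norm."""
    G = np.clip(X @ X.T, -1.0, 1.0)               # cosine similarities x_i^T x_j
    return G * (np.pi - np.arccos(G)) / (2 * np.pi)

def label_spectrum(H, y):
    """Squared projections of the label vector onto eigenvectors of H.
    Large mass on directions with small eigenvalues => slower gradient descent."""
    eigvals, eigvecs = np.linalg.eigh(H)
    return eigvals, (eigvecs.T @ y) ** 2

def complexity_measure(H, y):
    """Data-dependent measure sqrt(2 y^T H^{-1} y / n)."""
    n = len(y)
    return np.sqrt(2.0 * y @ np.linalg.solve(H, y) / n)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 200, 20
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # project inputs onto the unit sphere

    y_true = np.sign(X[:, 0])                      # labels correlated with the data
    y_rand = rng.choice([-1.0, 1.0], size=n)       # random labels

    H = ntk_gram(X) + 1e-8 * np.eye(n)             # small jitter for numerical stability
    for name, y in [("true labels", y_true), ("random labels", y_rand)]:
        eigvals, proj = label_spectrum(H, y)
        top = np.argsort(eigvals)[-10:]            # indices of the 10 largest eigenvalues
        print(name,
              "| complexity:", round(complexity_measure(H, y), 3),
              "| label mass on top-10 eigendirections:", round(proj[top].sum() / proj.sum(), 3))

On the paper's account, true labels put most of their mass on eigendirections of the kernel with large eigenvalues (fast convergence, small complexity), while random labels spread mass onto directions with small eigenvalues; the two printed quantities are meant to make that contrast visible on synthetic data.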


Related research:

06/03/2023  On Size-Independent Sample Complexity of ReLU Networks
We study the sample complexity of learning ReLU neural networks from the...

07/02/2020  A Revision of Neural Tangent Kernel-based Approaches for Neural Networks
Recent theoretical works based on the neural tangent kernel (NTK) have s...

02/14/2018  Stronger generalization bounds for deep nets via a compression approach
Deep nets generalize well despite having more parameters than the number...

06/08/2022  Identifying good directions to escape the NTK regime and efficiently learn low-degree plus sparse polynomials
A recent goal in the theory of deep learning is to identify how neural n...

05/13/2019  Spectral Analysis of Kernel and Neural Embeddings: Optimization and Generalization
We extend the recent results of (Arora et al., 2019) by a spectral analy...

06/14/2020  Global Convergence of Sobolev Training for Overparametrized Neural Networks
Sobolev loss is used when training a network to approximate the values a...

10/18/2019  Towards Quantifying Intrinsic Generalization of Deep ReLU Networks
Understanding the underlying mechanisms that enable the empirical succe...
