Deep Networks and the Multiple Manifold Problem

08/25/2020
by Sam Buchanan, et al.

We study the multiple manifold problem, a binary classification task modeled on applications in machine vision, in which a deep fully-connected neural network is trained to separate two low-dimensional submanifolds of the unit sphere. We provide an analysis of the one-dimensional case, proving for a simple manifold configuration that if the network depth L is large relative to certain geometric and statistical properties of the data, the network width n grows as a sufficiently large polynomial in L, and the number of i.i.d. samples from the manifolds is polynomial in L, then randomly-initialized gradient descent rapidly learns to classify the two manifolds perfectly with high probability. Our analysis demonstrates concrete benefits of depth and width in the context of a practically-motivated model problem: the depth acts as a fitting resource, with larger depths corresponding to smoother networks that can more readily separate the class manifolds, and the width acts as a statistical resource, enabling concentration of the randomly-initialized network and its gradients. The argument centers around the neural tangent kernel (NTK) and its role in the nonasymptotic analysis of training overparameterized neural networks. To this literature, we contribute essentially optimal rates of concentration for the neural tangent kernel of deep fully-connected networks, requiring width n ≳ L poly(d_0) to achieve uniform concentration of the initial kernel over a d_0-dimensional submanifold of the unit sphere 𝕊^{n_0 - 1}, and a nonasymptotic framework for establishing generalization of networks trained in the NTK regime with structured data. The proof makes heavy use of martingale concentration to optimally treat statistical dependencies across layers of the initial random network. This approach should be of use in establishing similar results for other network architectures.
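To make the setup concrete, here is a minimal sketch (not the authors' code) of the model problem in PyTorch: two disjoint one-dimensional curves (great circles) on the unit sphere serve as class manifolds, a deep fully-connected ReLU network is trained on i.i.d. samples from them with full-batch gradient descent, and the empirical neural tangent kernel at initialization is evaluated on a pair of points. The curve parameterization, depth, width, learning rate, and sample counts are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of the multiple manifold problem (hypothetical values throughout).
import torch
import torch.nn as nn

torch.manual_seed(0)

n0, width, depth = 32, 512, 8   # input dimension n_0, width n, depth L (assumed values)
n_samples = 200                 # i.i.d. samples drawn from each manifold (assumed)

def circle_on_sphere(n, i, j):
    """Sample n points from a great circle in coordinates (i, j) of the unit sphere
    S^{n_0 - 1}; each circle is a hypothetical stand-in for one class manifold."""
    t = 2 * torch.pi * torch.rand(n)
    X = torch.zeros(n, n0)
    X[:, i], X[:, j] = torch.cos(t), torch.sin(t)
    return X

X = torch.cat([circle_on_sphere(n_samples, 0, 1), circle_on_sphere(n_samples, 2, 3)])
y = torch.cat([torch.zeros(n_samples), torch.ones(n_samples)])

# Deep fully-connected ReLU network f_theta : R^{n_0} -> R with L hidden layers of width n.
layers = [nn.Linear(n0, width), nn.ReLU()]
for _ in range(depth - 1):
    layers += [nn.Linear(width, width), nn.ReLU()]
layers += [nn.Linear(width, 1)]
net = nn.Sequential(*layers)

def empirical_ntk(net, x1, x2):
    """Empirical NTK Theta(x1, x2) = <grad_theta f(x1), grad_theta f(x2)> at the
    current parameters (here, the random initialization)."""
    def grad_vec(x):
        net.zero_grad()
        net(x.unsqueeze(0)).sum().backward()
        return torch.cat([p.grad.flatten() for p in net.parameters()])
    return torch.dot(grad_vec(x1), grad_vec(x2)).item()

print("NTK at init, two points on the same circle: ", empirical_ntk(net, X[0], X[1]))
print("NTK at init, points on different circles:   ", empirical_ntk(net, X[0], X[-1]))

# Full-batch, randomly-initialized gradient descent on the logistic loss.
opt = torch.optim.SGD(net.parameters(), lr=0.05)
loss_fn = nn.BCEWithLogitsLoss()
for step in range(1000):
    opt.zero_grad()
    loss = loss_fn(net(X).squeeze(1), y)
    loss.backward()
    opt.step()

acc = ((net(X).squeeze(1) > 0).float() == y).float().mean().item()
print(f"training accuracy after gradient descent: {acc:.3f}")
```

In the paper's terms, the depth L would be chosen large enough (relative to the geometry of the curves) for the network to separate them, and the width n large enough for the initial network and kernel to concentrate; the constants above are placeholders chosen only so the sketch runs quickly.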

Related research

Deep Networks Provably Classify Data on Curves (07/29/2021)
Data with low-dimensional nonlinear structure are ubiquitous in engineer...

Approximation Results for Gradient Descent trained Neural Networks (09/09/2023)
The paper contains approximation guarantees for neural networks that are...

On the non-universality of deep learning: quantifying the cost of symmetry (08/05/2022)
We prove computational limitations for learning with neural networks tra...

Any Target Function Exists in a Neighborhood of Any Sufficiently Wide Random Network: A Geometrical Perspective (01/20/2020)
It is known that any target function is realized in a sufficiently small...

Connecting Sphere Manifolds Hierarchically for Regularization (06/25/2021)
This paper considers classification problems with hierarchically organiz...

Spectral Bias Outside the Training Set for Deep Networks in the Kernel Regime (06/06/2022)
We provide quantitative bounds measuring the L^2 difference in function ...

Correlation Functions in Random Fully Connected Neural Networks at Finite Width (04/03/2022)
This article considers fully connected neural networks with Gaussian ran...
