Poly-time universality and limitations of deep learning

01/07/2020
by Emmanuel Abbe et al.

The goal of this paper is to characterize the function distributions that deep learning can and cannot learn in poly-time. A universality result is proved for SGD-based deep learning and a non-universality result for GD-based deep learning; together these also give a separation between SGD-based deep learning and statistical query (SQ) algorithms:

(1) Deep learning with SGD is efficiently universal. Any function distribution that can be learned from samples in poly-time can also be learned by a poly-size neural net trained with SGD on a poly-time initialization, with poly-many steps, poly-rate, and possibly poly-noise. Deep learning therefore provides a universal learning paradigm: it was already known that the approximation and estimation errors can be controlled with poly-size neural nets via ERM, which is NP-hard in general; the new result shows that the optimization error can also be controlled, using SGD, in poly-time. The picture changes for GD with large enough batches:

(2) Result (1) does not hold for GD. Poly-size neural nets trained with GD (full gradients or large enough batches) on any initialization, with poly-many steps, poly-range, and at least poly-noise, cannot learn any function distribution whose cross-predictability is super-polynomially small. The cross-predictability measures the "average" correlation between functions drawn from the distribution; its relations to, and distinctions from, the statistical dimension are discussed. In particular, GD under these constraints can efficiently learn monomials of degree k if and only if k is constant.

Thus (1) and (2) point to an interesting contrast: SGD is universal even with some poly-noise, while full GD and SQ algorithms are not (parities being a canonical example).
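To make the hardness measure in (2) concrete, here is a minimal numpy sketch, assuming the formalization of cross-predictability from the authors' related work on GD limitations: Pred(P_F, P_X) = E_{F,F'~P_F}[(E_{X~P_X} F(X)F'(X))^2], where F and F' are two i.i.d. draws from the function distribution. For uniformly random degree-k monomials on {-1,1}^n, two draws are uncorrelated unless they pick the same subset of coordinates, so Pred = 1/C(n,k), which is super-polynomially small once k grows with n. The Monte Carlo estimator below is hypothetical illustrative code, not from the paper.

    # Monte Carlo estimate of cross-predictability for uniformly random
    # degree-k monomials on the hypercube {-1,1}^n. Illustrative sketch only;
    # the definition Pred = E_{F,F'}[(E_X F(X)F'(X))^2] is assumed from the
    # authors' related work, not stated in this abstract.
    import math
    import numpy as np

    rng = np.random.default_rng(0)

    def monomial(S, X):
        # f_S(x) = prod_{i in S} x_i, evaluated row-wise on a batch X of +/-1 inputs
        return np.prod(X[:, S], axis=1)

    def cross_predictability(n, k, n_pairs=5000, n_inputs=10000):
        # A shared input sample X is reused across pairs for speed; the squared
        # sample correlation carries an O(1/n_inputs) upward bias, so keep
        # n_inputs large relative to 1/Pred.
        X = rng.choice([-1.0, 1.0], size=(n_inputs, n))
        total = 0.0
        for _ in range(n_pairs):
            S = rng.choice(n, size=k, replace=False)  # F  ~ uniform degree-k monomial
            T = rng.choice(n, size=k, replace=False)  # F' ~ independent second draw
            corr = np.mean(monomial(S, X) * monomial(T, X))  # ~ E_X[F(X) F'(X)]
            total += corr ** 2
        return total / n_pairs

    for n, k in [(10, 1), (10, 2), (10, 5)]:
        print(f"n={n}, k={k}: estimate {cross_predictability(n, k):.5f}, "
              f"exact 1/C(n,k) = {1 / math.comb(n, k):.5f}")

For constant k the exact value 1/C(n,k) is only polynomially small, while for k growing with n it decays super-polynomially, matching the claim in (2) that GD under these constraints learns degree-k monomials efficiently exactly when k is constant.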


