Provable limitations of deep learning

12/16/2018
by Emmanuel Abbe, et al.

As the success of deep learning extends to ever more domains, it is natural to ask where its limits lie. This paper gives a first set of results proving that deep learning algorithms fail at learning certain efficiently learnable functions. Parity functions form the running example of our results, and the paper puts forward a notion of low cross-predictability that defines a more general class of function distributions for which such failures persist (with examples in community detection and arithmetic learning). Recall that neural networks (NNs) of polynomial size can express any function computable in polynomial time, and that their sample complexity scales polynomially with the network size; the difficulty lies in the optimization error (empirical risk minimization is NP-hard), and the practical success of deep learning rests on training deep NNs with descent algorithms. The failures shown in this paper apply to training poly-size NNs, on function distributions of low cross-predictability, with a descent algorithm that is either run with limited memory per sample or initialized and run with enough randomness (for gradient descent, even exponentially small noise suffices).

We further show that constraints of this type are necessary to obtain such failures, in that exact SGD with a carefully chosen, non-random initialization can learn parities. The cross-predictability notion bears some similarity to the statistical dimension used in the analysis of statistical query (SQ) algorithms, but the two definitions differ, for reasons explained in the paper. The proof techniques rest on exhibiting algorithmic constraints that imply statistical indistinguishability between the algorithm's output on the test model versus a null model, using information measures to bound the total variation distance.
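To give a flavor of the central notion, here is a rough paraphrase of cross-predictability (the notation is ours, not necessarily the paper's): for an input distribution \mu and a distribution P_F over functions, it measures how correlated two independent draws from P_F are on average,

\mathrm{Pred}(\mu, P_F) \;=\; \mathbb{E}_{F, F' \sim P_F^{\otimes 2}} \Big( \mathbb{E}_{X \sim \mu}\big[\, F(X)\, F'(X) \,\big] \Big)^{2}.

For parities over a uniformly random subset of the n coordinates, with \mu uniform on \{\pm 1\}^n, two independent draws are orthogonal unless their supports coincide, so this quantity is exponentially small in n; this is the sense in which parities have low cross-predictability.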
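The failure mode itself is easy to observe empirically. Below is a minimal PyTorch sketch (our illustration, not the paper's experiments; the dimension, width, learning rate, and step budget are all arbitrary choices): a small MLP trained with plain SGD on a randomly chosen parity typically stays near chance test accuracy within a modest training budget.

# Minimal sketch (not the paper's setup): plain SGD on a random parity.
# All hyperparameters here are illustrative choices.
import torch

torch.manual_seed(0)
n = 20                                  # input dimension
S = torch.randperm(n)[: n // 2]         # hidden parity support, unknown to the learner

def sample(batch):
    # Uniform inputs in {-1, +1}^n; label is the parity over coordinates in S.
    x = torch.randint(0, 2, (batch, n)).float() * 2 - 1
    y = x[:, S].prod(dim=1)
    return x, y

model = torch.nn.Sequential(
    torch.nn.Linear(n, 512), torch.nn.ReLU(), torch.nn.Linear(512, 1)
)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

for step in range(5000):
    x, y = sample(256)
    loss = torch.nn.functional.soft_margin_loss(model(x).squeeze(-1), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    x, y = sample(10000)
    acc = (model(x).squeeze(-1).sign() == y).float().mean().item()
print(f"test accuracy: {acc:.3f}")      # hovers around 0.5 for moderate n

A common intuition consistent with the paper's results: the gradients computed during training carry exponentially little information about which parity was drawn, whereas, as noted above, exact SGD from a carefully hand-crafted (non-random) initialization escapes this barrier.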


