The staircase property: How hierarchical structure can guide deep learning

08/24/2021
by Emmanuel Abbe, et al.

This paper identifies a structural property of data distributions that enables deep neural networks to learn hierarchically. We define the "staircase" property for functions over the Boolean hypercube, which requires that high-order Fourier coefficients be reachable from lower-order Fourier coefficients along increasing chains. We prove that functions satisfying this property can be learned in polynomial time using layerwise stochastic coordinate descent on regular neural networks, a class of network architectures and initializations with homogeneity properties. Our analysis shows that for such staircase functions and neural networks, the gradient-based algorithm learns high-level features by greedily combining lower-level features along the depth of the network. We further back our theoretical results with experiments showing that staircase functions are also learnable by more standard ResNet architectures trained with stochastic gradient descent. Both the theoretical and experimental results support the view that staircase properties play a role in understanding the capabilities of gradient-based learning on regular networks, in contrast to general polynomial-size networks, which can emulate any SQ or PAC algorithm, as recently shown.
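
For concreteness, below is a minimal NumPy sketch (illustrative, not the paper's code) of the canonical staircase function f(x) = x_1 + x_1 x_2 + ... + x_1 x_2 ... x_n over the Boolean hypercube {-1, +1}^n. Its nonzero Fourier coefficients sit exactly on the increasing chain {1} ⊂ {1,2} ⊂ ... ⊂ {1,...,n}, which the script verifies empirically; all function and variable names here are hypothetical.

import numpy as np

def staircase(X):
    """Evaluate f(x) = x_1 + x_1 x_2 + ... + x_1 ... x_n on rows of X."""
    # cumprod along each row yields the chain monomials x_1, x_1 x_2, ...
    return np.cumprod(X, axis=1).sum(axis=1)

rng = np.random.default_rng(0)
n, N = 5, 50_000
X = rng.choice([-1.0, 1.0], size=(N, n))  # uniform samples from {-1, +1}^n
y = staircase(X)

# Empirical Fourier coefficient: hat{f}(S) = E[f(x) * prod_{i in S} x_i].
# Chain sets {1,...,k} give ~1; a non-chain set such as {2,3} gives ~0.
for k in range(1, n + 1):
    chi = X[:, :k].prod(axis=1)
    print(f"hat f({{1..{k}}}) = {np.mean(y * chi):+.3f}")
chi = X[:, 1] * X[:, 2]
print(f"hat f({{2,3}})  = {np.mean(y * chi):+.3f}")

Each coefficient on {1,...,k} is reachable from the one on {1,...,k-1} by adding a single coordinate; this is the chain structure that, per the abstract, the layerwise algorithm exploits by greedily combining lower-level features.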

Related research

Neural ODEs as the Deep Limit of ResNets with constant weights (06/28/2019)
In this paper we prove that, in the deep limit, the stochastic gradient ...

Distribution-Specific Hardness of Learning Neural Networks (09/05/2016)
Although neural networks are routinely and successfully trained in pract...

When is a Convolutional Filter Easy To Learn? (09/18/2017)
We analyze the convergence of (stochastic) gradient descent algorithm fo...

SGD Learns the Conjugate Kernel Class of the Network (02/27/2017)
We show that the standard stochastic gradient descent (SGD) algorithm is ...

Learning Boolean Circuits with Neural Networks (10/25/2019)
Training neural networks is computationally hard. However, in practice t...

On Regularization Properties of Artificial Datasets for Deep Learning (08/19/2019)
The paper discusses regularization properties of artificial data for dee...

Are Efficient Deep Representations Learnable? (07/17/2018)
Many theories of deep learning have shown that a deep network can requir...
