Is Deeper Better only when Shallow is Good?

03/08/2019
by Eran Malach, et al.

Understanding the power of depth in feed-forward neural networks is an ongoing challenge in the field of deep learning theory. While existing works establish the importance of depth for the expressive power of neural networks, it remains an open question whether these benefits are exploited during a gradient-based optimization process. In this work we explore the relation between the expressivity properties of deep networks and the ability to train them efficiently using gradient-based algorithms. We give a depth-separation argument for distributions with a fractal structure, showing that they can be expressed efficiently by deep networks, but not by shallow ones. These distributions have a natural coarse-to-fine structure, and we show that the balance between the coarse and fine details has a crucial effect on whether the optimization process is likely to succeed. We prove that when the distribution is concentrated on the fine details, gradient-based algorithms are likely to fail. Using this result, we prove that, at least for some distributions, the success of learning deep networks depends on whether the distribution can be well approximated by shallower networks, and we conjecture that this property holds in general.
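The abstract contains no code, but the idea of a coarse-to-fine fractal distribution can be pictured with a rough, hypothetical sketch. In the sketch below, the Cantor-style middle-thirds construction, the level-weighting scheme, and the toy parity labeling rule are all our own illustrative assumptions, not the authors' construction: each example's label is decided at a randomly chosen level of detail, and the level weights control whether probability mass concentrates on the coarse or the fine details.

import numpy as np

# Hypothetical illustration only -- not the paper's construction. Samples a
# Cantor-style fractal distribution whose labels are decided at a randomly
# chosen level of detail, so the level weights control how much probability
# mass sits on coarse vs. fine details.

rng = np.random.default_rng(0)

def sample_point(depth):
    # Descend `depth` levels of the middle-thirds Cantor construction,
    # keeping the left or right third at each level, then sample
    # uniformly inside the surviving interval.
    lo, hi = 0.0, 1.0
    for _ in range(depth):
        third = (hi - lo) / 3.0
        if rng.random() < 0.5:
            hi = lo + third   # keep the left third
        else:
            lo = hi - third   # keep the right third
    return rng.uniform(lo, hi)

def sample_dataset(n, max_depth, level_weights):
    # level_weights[k] is the probability that an example's label is
    # decided at level k+1 (small k = coarse detail, large k = fine detail).
    xs, ys = [], []
    for _ in range(n):
        level = rng.choice(max_depth, p=level_weights) + 1
        xs.append(sample_point(level))
        ys.append(1 if level % 2 == 0 else -1)  # toy labeling rule
    return np.array(xs), np.array(ys)

# Mass concentrated on the coarse details vs. on the fine details:
coarse_weights = np.array([0.4, 0.3, 0.15, 0.1, 0.04, 0.01])
fine_weights = coarse_weights[::-1]
X_easy, y_easy = sample_dataset(1000, 6, coarse_weights)
X_hard, y_hard = sample_dataset(1000, 6, fine_weights)

Under this picture, shifting the weights from coarse_weights to fine_weights concentrates the distribution on the fine details, which is the regime where, per the abstract, gradient-based algorithms are likely to fail.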


Related Research

The Connection Between Approximation, Depth Separation and Learnability in Neural Networks (01/31/2021)
Several recent works have shown separation results between deep neural n...

Why does deep and cheap learning work so well? (08/29/2016)
We show how the success of deep learning could depend not only on mathem...

Highway Networks (05/03/2015)
There is plenty of theoretical and empirical evidence that depth of neur...

Failures of Gradient-Based Deep Learning (03/23/2017)
In recent years, Deep Learning has become the go-to solution for a broad...

Deep Equals Shallow for ReLU Networks in Kernel Regimes (09/30/2020)
Deep networks are often considered to be more expressive than shallow on...

Learning Boolean Circuits with Neural Networks (10/25/2019)
Training neural-networks is computationally hard. However, in practice t...

GradNets: Dynamic Interpolation Between Neural Architectures (11/21/2015)
In machine learning, there is a fundamental trade-off between ease of op...
