Deep Equals Shallow for ReLU Networks in Kernel Regimes

09/30/2020
by Alberto Bietti, et al.

Deep networks are often considered more expressive than shallow ones in terms of approximation: certain functions can provably be approximated more efficiently by deep networks than by shallow ones. However, no tractable algorithms are known for learning such deep models. Separately, a recent line of work has shown that deep networks trained with gradient descent may behave like (tractable) kernel methods in a certain over-parameterized regime, where the kernel is determined by the architecture and initialization; this paper focuses on the approximation properties of such kernels. We show that for ReLU activations, the kernels derived from deep fully-connected networks have essentially the same approximation properties as their shallow two-layer counterpart, namely the same eigenvalue decay for the corresponding integral operator. This highlights the limitations of the kernel framework for understanding the benefits of such deep architectures. Our main theoretical result relies on characterizing such eigenvalue decays through the differentiability properties of the kernel function, a technique that also applies readily to other kernels defined on the sphere.
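The decay claim can be probed numerically. The snippet below is a minimal sketch, not the paper's code: it implements one common convention for the NTK recursion of a bias-free, fully-connected ReLU network on the sphere, built from the degree-0 and degree-1 arc-cosine kernels, and estimates the eigenvalue decay on the circle S^1, where the eigenvalues of a dot-product kernel are simply its Fourier coefficients. The function names, grid size, and frequency choices are illustrative assumptions.

```python
import numpy as np

def kappa0(u):
    # Arc-cosine kernel of degree 0: covariance of step activations.
    u = np.clip(u, -1.0, 1.0)
    return (np.pi - np.arccos(u)) / np.pi

def kappa1(u):
    # Arc-cosine kernel of degree 1: covariance of ReLU activations.
    u = np.clip(u, -1.0, 1.0)
    return (u * (np.pi - np.arccos(u)) + np.sqrt(1.0 - u ** 2)) / np.pi

def relu_ntk(u, num_layers):
    # NTK of a fully-connected ReLU network with `num_layers` hidden layers,
    # evaluated at the cosine u = <x, y> of two unit-norm inputs
    # (one common convention; normalizations vary across papers).
    sigma, theta = u, u
    for _ in range(num_layers):
        theta = kappa1(sigma) + theta * kappa0(sigma)
        sigma = kappa1(sigma)
    return theta

# On S^1, the eigenvalues of a dot-product kernel k(<x, y>) = k(cos t) are its
# Fourier coefficients; estimate the exponent alpha in mu_k ~ k^{-alpha} from
# the ratio mu_k / mu_{2k} at even frequencies (odd ones can vanish by parity).
t = np.linspace(0.0, 2.0 * np.pi, 2 ** 14, endpoint=False)
for depth in (1, 5):
    coeffs = np.abs(np.fft.rfft(relu_ntk(np.cos(t), depth))) / t.size
    ks = np.array([8, 16, 32, 64])
    rates = np.log2(coeffs[ks] / coeffs[2 * ks])
    print(f"depth {depth}: estimated decay exponents {np.round(rates, 2)}")
```

If the paper's result holds in this convention, both depths should print exponents close to 2, the k^{-d} rate on S^{d-1} with d = 2: depth changes the constants, but not the decay.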


