Depth-Width Trade-offs for ReLU Networks via Sharkovsky's Theorem

by Vaggos Chatziafratis, et al.

Understanding the representational power of Deep Neural Networks (DNNs) and how their structural properties (e.g., depth, width, type of activation unit) affect the functions they can compute has been an important yet challenging question in deep learning and approximation theory. In a seminal paper, Telgarsky highlighted the benefits of depth by presenting a family of functions (based on simple triangular waves) for which DNNs achieve zero classification error, whereas shallow networks with fewer than exponentially many nodes incur constant error. Even though Telgarsky's work reveals the limitations of shallow neural networks, it does not explain why these functions are difficult to represent; in fact, he poses as a tantalizing open question the characterization of those functions that cannot be well-approximated by networks of smaller depth. In this work, we point to a new connection between DNN expressivity and Sharkovsky's Theorem from dynamical systems that enables us to characterize the depth-width trade-offs of ReLU networks for representing functions based on the presence of a generalized notion of fixed points, called periodic points (a fixed point is a point of period 1). Motivated by our observation that the triangle waves used in Telgarsky's work contain points of period 3, a period that is special in that it implies chaotic behavior by the celebrated result of Li and Yorke, we proceed to give general lower bounds for the width needed to represent periodic functions as a function of the depth. Technically, the crux of our approach is an eigenvalue analysis of the dynamical system associated with such functions.
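To make the abstract's ingredients concrete, here is a minimal Python sketch (standard library only). It assumes the triangle wave is the tent map t(x) = 2x on [0, 1/2] and 2(1 - x) on [1/2, 1], written as a one-hidden-layer ReLU network with two units. It checks that {2/7, 4/7, 6/7} is a period-3 orbit, counts the monotone pieces of the k-fold composition (they double with each extra composition), and builds a 2x2 "covering" matrix for the intervals between the orbit points, whose spectral radius is the golden ratio. All helper names (`tent`, `has_period`, `covering_matrix`, and so on) are hypothetical, and the covering-matrix step is an illustrative simplification of an eigenvalue argument, not the authors' exact construction.

```python
import math
from fractions import Fraction

HALF = Fraction(1, 2)


def relu(x):
    return max(Fraction(0), x)


def tent(x):
    """Tent map on [0, 1] as a width-2 ReLU layer: 2*relu(x) - 4*relu(x - 1/2)."""
    return 2 * relu(x) - 4 * relu(x - HALF)


def compose(f, k):
    """Return f composed with itself k times (a depth-k stack of the same layer)."""
    def g(x):
        for _ in range(k):
            x = f(x)
        return x
    return g


def has_period(f, x, p):
    """True if x has least period p under f."""
    y = x
    for i in range(1, p + 1):
        y = f(y)
        if y == x:
            return i == p
    return False


def monotone_pieces(f, resolution):
    """Count maximal monotone pieces of f on [0, 1] by exact sampling on the grid
    j / resolution. Exact only if every breakpoint of f lies on that grid."""
    xs = [Fraction(j, resolution) for j in range(resolution + 1)]
    ys = [f(x) for x in xs]
    # Direction (increasing or not) of each non-flat grid segment.
    signs = [b > a for a, b in zip(ys, ys[1:]) if b != a]
    # Each change of direction starts a new monotone piece.
    return 1 + sum(1 for prev, cur in zip(signs, signs[1:]) if prev != cur)


def tent_image(lo, hi):
    """Image [min, max] of the interval [lo, hi] under the tent map (only kink at 1/2)."""
    pts = [lo, hi] + ([HALF] if lo < HALF < hi else [])
    vals = [tent(x) for x in pts]
    return min(vals), max(vals)


def covering_matrix(cycle):
    """A[i][j] = 1 if tent(I_i) contains I_j, where the I_i are the intervals
    between consecutive points of the sorted periodic orbit."""
    pts = sorted(cycle)
    intervals = list(zip(pts, pts[1:]))
    matrix = []
    for lo, hi in intervals:
        img_lo, img_hi = tent_image(lo, hi)
        matrix.append([1 if img_lo <= a and b <= img_hi else 0 for a, b in intervals])
    return matrix


def spectral_radius_2x2(a):
    """Largest |eigenvalue| of a 2x2 matrix. For a nonnegative matrix the
    discriminant (p - s)^2 + 4qr is >= 0, so both eigenvalues are real."""
    (p, q), (r, s) = a
    tr, det = p + s, p * s - q * r
    disc = math.sqrt(tr * tr - 4 * det)
    return max(abs((tr + disc) / 2), abs((tr - disc) / 2))


if __name__ == "__main__":
    cycle = [Fraction(2, 7), Fraction(4, 7), Fraction(6, 7)]
    print(all(has_period(tent, x, 3) for x in cycle))  # True
    # Breakpoints of the k-fold tent map lie on the grid j / 2**k.
    print([monotone_pieces(compose(tent, k), 2 ** k) for k in range(1, 6)])  # [2, 4, 8, 16, 32]
    A = covering_matrix(cycle)
    print(A)  # [[0, 1], [1, 1]]
    print(spectral_radius_2x2(A))  # about 1.618, the golden ratio
```

Roughly speaking, the covering relations f(I_1) ⊇ I_2 and f(I_2) ⊇ I_1 ∪ I_2 force the number of oscillations of the k-fold composition to grow at least like ρ(A)^k ≈ 1.618^k for any continuous map with a 3-cycle; the tent map exceeds this with 2^k pieces. Pairing such a growth rate with an upper bound on how many linear pieces a ReLU network of a given depth and width can produce is what turns the presence of periodic points into a width lower bound.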

