On the Banach spaces associated with multi-layer ReLU networks: Function representation, approximation theory and gradient descent dynamics

07/30/2020
by Weinan E, et al.

We develop Banach spaces for ReLU neural networks of finite depth L and infinite width. The spaces contain all finite fully connected L-layer networks and their L^2-limiting objects under bounds on the natural path-norm. Under this norm, the unit ball of the space for L-layer networks has low Rademacher complexity and therefore favorable generalization properties. Functions in these spaces can be approximated by multi-layer neural networks with dimension-independent convergence rates. The key to this work is a new way of representing functions as certain expectations, motivated by multi-layer neural networks. This representation allows us to define a new class of continuous models for machine learning. We show that the gradient flow defined in this way is the natural continuous analog of the gradient descent dynamics for the associated multi-layer neural networks, and that the path-norm increases at most polynomially under this continuous gradient flow.
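To make the path-norm concrete, the sketch below computes a standard ℓ1 path-norm for a finite fully connected L-layer ReLU network: the sum over all input-output paths of the products of absolute weight entries, which factorizes as 1^T |W_L| ... |W_1| 1. This is an illustrative assumption about the quantity involved; the paper's "natural path-norm" may use a different normalization or include bias terms, and the function `path_norm` and the weight shapes are hypothetical choices made here for the example.

```python
# Minimal sketch (not from the paper): an l1 path-norm for a finite
# fully connected ReLU network with weight matrices W1, ..., WL.
# Assumption: path-norm = sum over paths of products of |weights|,
# i.e. 1^T |WL| ... |W1| 1; the paper's normalization may differ.
import numpy as np

def path_norm(weights):
    """Sum over input-output paths of the product of absolute weights."""
    v = np.ones(weights[0].shape[1])  # one entry per input coordinate
    for W in weights:                 # propagate |W| layer by layer
        v = np.abs(W) @ v
    return float(v.sum())

# Example: a 3-layer network mapping R^4 -> R
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(8, 8))
W3 = rng.normal(size=(1, 8))
print(path_norm([W1, W2, W3]))
```

Monitoring this quantity along (discrete) gradient descent is one way to check empirically the kind of at-most-polynomial growth the abstract attributes to the continuous gradient flow.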


