
A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case
A key element of understanding the efficacy of overparameterized neural networks is characterizing how they represent functions as the number of weights in the network approaches infinity. In this paper, we characterize the norm required to realize a function f : R^d → R as a single hidden-layer ReLU network with an unbounded number of units (infinite width) but with the Euclidean norm of the weights bounded, including precisely characterizing which functions can be realized with finite norm. This was settled for univariate functions in Savarese et al. (2019), where it was shown that the required norm is determined by the L1-norm of the second derivative of the function. We extend the characterization to multivariate functions (i.e., networks with d input units), relating the required norm to the L1-norm of the Radon transform of a (d+1)/2-power Laplacian of the function. This characterization allows us to show that all functions in the Sobolev spaces W^{s,1}(R^d), s ≥ d+1, can be represented with bounded norm, to calculate the required norm for several specific functions, and to obtain a depth separation result. These results have important implications for understanding generalization performance and the distinction between neural networks and more traditional kernel learning.
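The univariate result from Savarese et al. (2019) that the abstract builds on can be checked numerically. The sketch below (our own illustration, not code from the paper) approximates f(x) = x^2 on [-1, 1] by a piecewise-linear interpolant, which corresponds to a finite-width ReLU network whose unit coefficients are the slope jumps at the knots. The network norm (sum of |a_i|·|w_i|) should then come out close to the L1-norm of f'' over the interval; boundary terms in the exact characterization are ignored here for simplicity.

```python
import numpy as np

# Illustration of the univariate characterization (Savarese et al. 2019):
# the minimal weight norm of an infinite-width single-hidden-layer ReLU net
# realizing f is governed by the L1-norm of f''. We approximate f(x) = x^2
# on [-1, 1] by ReLU units placed at a grid of knots.

knots = np.linspace(-1.0, 1.0, 201)
f = knots ** 2

# A piecewise-linear interpolant of f at the knots can be written (up to an
# affine part) as sum_i a_i * relu(x - t_i), where a_i is the jump in slope
# at interior knot t_i.
slopes = np.diff(f) / np.diff(knots)
a = np.diff(slopes)              # slope jumps at interior knots

# Network norm: sum of |a_i| * |w_i|, with inner weights w_i = 1 here.
network_norm = np.abs(a).sum()

# L1-norm of the second derivative: f''(x) = 2 on an interval of length 2,
# so the integral of |f''| over [-1, 1] equals 4.
l1_second_derivative = 4.0

print(network_norm)              # close to 4.0
```

As the knot spacing shrinks, the network norm converges to the L1-norm of f'', which is the quantity the paper generalizes to the multivariate setting via the Radon transform.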