Best k-layer neural network approximations
We investigate the geometry of the empirical risk minimization problem for k-layer neural networks. We provide examples showing that for the classical activation functions σ(x) = 1/(1 + exp(-x)) and σ(x) = tanh(x), there exists a positive-measure set of target functions that do not have best approximations by a fixed number of layers of neural networks. In addition, we study in detail the properties of shallow networks, classifying the cases in which a best k-layer neural network approximation always exists or fails to exist for the ReLU activation σ(x) = max(0, x). We also determine the dimensions of shallow ReLU-activated networks.
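To make the objects in the abstract concrete, here is a minimal sketch (not taken from the paper) of the three activation functions mentioned and of the empirical risk for a shallow, i.e. one-hidden-layer, network. The squared-error loss, the function names, and the network sizes in the usage example are illustrative assumptions, not the paper's setup.

```python
# Minimal illustrative sketch: the three activations discussed in the abstract
# and the empirical (squared-error) risk of a shallow network.
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # sigma(x) = tanh(x)
    return np.tanh(x)

def relu(x):
    # sigma(x) = max(0, x), applied entrywise
    return np.maximum(0.0, x)

def shallow_net(x, W1, b1, W2, b2, sigma=relu):
    """One-hidden-layer network: nu(x) = W2 sigma(W1 x + b1) + b2."""
    return W2 @ sigma(W1 @ x + b1) + b2

def empirical_risk(params, X, Y, sigma=relu):
    """Mean squared error of the network over sample pairs (x_i, y_i)."""
    W1, b1, W2, b2 = params
    preds = np.stack([shallow_net(x, W1, b1, W2, b2, sigma) for x in X])
    return np.mean(np.sum((preds - Y) ** 2, axis=-1))

# Hypothetical usage: random parameters and data, purely for illustration.
rng = np.random.default_rng(0)
params = (rng.normal(size=(3, 2)), rng.normal(size=3),
          rng.normal(size=(1, 3)), rng.normal(size=1))
X = rng.normal(size=(10, 2))
Y = rng.normal(size=(10, 1))
print(empirical_risk(params, X, Y, sigma=relu))
```

The question studied in the paper is whether, for a given target, the infimum of such a risk over the network parameters is actually attained, i.e. whether a best approximation exists.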