Universal Approximation with Deep Narrow Networks
The classical Universal Approximation Theorem certifies that the universal approximation property holds for the class of neural networks of arbitrary width. Here we consider the natural `dual' theorem for width-bounded networks of arbitrary depth. Precisely, let n be the number of input neurons, m be the number of output neurons, and let ρ be any nonaffine continuous function which is continuously differentiable at at least one point, with nonzero derivative at that point. Then we show that the class of neural networks of arbitrary depth, width n + m + 2, and activation function ρ, exhibits the universal approximation property with respect to the uniform norm on compact subsets of R^n. This covers every activation function possible to use in practice; in particular it includes polynomial activation functions, making the result genuinely different from the classical case. We go on to establish some natural extensions of this result. Firstly, we show an analogous result for a certain class of nowhere differentiable activation functions. Secondly, we establish an analogous result for noncompact domains, by showing that deep narrow networks with the ReLU activation function exhibit the universal approximation property with respect to the p-norm on R^n. Finally, we show that a width of only n + m + 1 suffices for `most' activation functions (whilst it is known that a width of n + m - 1 does not suffice in general).
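To make the network class concrete, below is a minimal sketch in PyTorch of a deep narrow network of arbitrary depth with constant hidden width n + m + 2 and a single fixed activation ρ. This illustrates only the shape of the networks the theorem concerns; the class name DeepNarrowNet, the depth parameter, and the default choice of ReLU are illustrative assumptions, not the paper's construction.

```python
import torch
import torch.nn as nn

class DeepNarrowNet(nn.Module):
    """Illustrative network of arbitrary depth and hidden width n + m + 2."""
    def __init__(self, n, m, depth, rho=torch.relu):
        super().__init__()
        width = n + m + 2  # the width bound from the theorem
        self.rho = rho     # the fixed activation function
        layers = [nn.Linear(n, width)]
        layers += [nn.Linear(width, width) for _ in range(depth - 1)]
        self.hidden = nn.ModuleList(layers)
        self.readout = nn.Linear(width, m)  # final affine readout

    def forward(self, x):
        for layer in self.hidden:
            x = self.rho(layer(x))
        return self.readout(x)

# Example: a map R^3 -> R^2, so hidden width is 3 + 2 + 2 = 7.
net = DeepNarrowNet(n=3, m=2, depth=10)
y = net(torch.randn(16, 3))  # output shape: (16, 2)
```

The theorem says that, for suitable ρ, letting depth grow (with everything else fixed) is enough for this family to approximate any continuous function on a compact subset of R^n uniformly.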