Neural Network Approximation: Three Hidden Layers Are Enough
A three-hidden-layer neural network with super approximation power is introduced. This network is built with the floor function (⌊x⌋), the exponential function (2^x), the step function (1_{x≥0}), or their compositions as the activation function in each neuron, and hence we call such networks Floor-Exponential-Step (FLES) networks. For any width hyper-parameter N∈ℕ^+, it is shown that FLES networks of width max{d, N} and three hidden layers can uniformly approximate a Hölder continuous function f on [0,1]^d with an exponential approximation rate 3λ d^{α/2} 2^{-αN}, where α∈(0,1] and λ>0 are the Hölder order and constant, respectively. More generally, for an arbitrary continuous function f on [0,1]^d with a modulus of continuity ω_f(·), the constructive approximation rate is ω_f(√d · 2^{-N}) + 2ω_f(√d) 2^{-N}. As a consequence, this new class of networks overcomes the curse of dimensionality in approximation power when the variation of ω_f(r) as r→0 is moderate (e.g., ω_f(r) ≲ r^α for Hölder continuous functions), since the dominant term in the approximation rate is essentially the modulus of continuity evaluated at √d times a quantity depending only on N, not on d.
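To make the link between the two rates explicit (a short check of ours, implied but not spelled out above): substituting the Hölder modulus ω_f(r) = λ r^α into the general rate gives

ω_f(√d · 2^{-N}) + 2ω_f(√d) 2^{-N} = λ d^{α/2} 2^{-αN} + 2λ d^{α/2} 2^{-N} ≤ 3λ d^{α/2} 2^{-αN},

using 2^{-N} ≤ 2^{-αN} for α∈(0,1], which recovers the Hölder rate with the constant 3.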
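As an illustration of the architecture only (not the authors' explicit weight construction, which is what actually achieves the stated rate), a minimal NumPy sketch of a three-hidden-layer network of width max{d, N} whose hidden neurons apply ⌊x⌋, 2^x, and 1_{x≥0} could look as follows; all weights below are random placeholders:

import numpy as np

def floor_act(x):
    return np.floor(x)              # ⌊x⌋

def exp2_act(x):
    return np.power(2.0, x)         # 2^x

def step_act(x):
    return (x >= 0).astype(x.dtype) # 1_{x≥0}

def fles_forward(x, params, acts=(floor_act, exp2_act, step_act)):
    """Forward pass: three hidden layers with FLES activations, linear output.
    params is a list of (W, b) pairs, one per layer."""
    h = x
    for (W, b), act in zip(params[:-1], acts):
        h = act(h @ W + b)
    W, b = params[-1]
    return h @ W + b

# Example: d = 4 inputs, width hyper-parameter N = 8, so width = max(d, N) = 8.
d, N = 4, 8
width = max(d, N)
rng = np.random.default_rng(0)
sizes = [d, width, width, width, 1]
params = [(rng.standard_normal((m, n)), rng.standard_normal(n))
          for m, n in zip(sizes[:-1], sizes[1:])]
y = fles_forward(rng.random((5, d)), params)  # 5 sample points in [0,1]^4
print(y.shape)  # (5, 1)

Note that the paper's result is constructive: the approximation rate comes from hand-built weights, not from training a randomly initialized network like the one above.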