Optimal Approximation Rate of ReLU Networks in terms of Width and Depth
This paper concentrates on the approximation power of deep feed-forward neural networks in terms of width and depth. It is proved by construction that ReLU networks with width $\mathcal{O}(\max\{d\lfloor N^{1/d}\rfloor,\, N+2\})$ and depth $\mathcal{O}(L)$ can approximate a Hölder continuous function on $[0,1]^d$ with an approximation rate $\mathcal{O}\big(\lambda\sqrt{d}\,(N^2L^2\ln N)^{-\alpha/d}\big)$, where $\alpha\in(0,1]$ and $\lambda>0$ are the Hölder order and constant, respectively. Such a rate is optimal up to a constant in terms of width and depth separately, whereas existing results are only nearly optimal because their approximation rates lack the logarithmic factor. More generally, for an arbitrary continuous function $f$ on $[0,1]^d$, the approximation rate becomes $\mathcal{O}\big(\sqrt{d}\,\omega_f\big((N^2L^2\ln N)^{-1/d}\big)\big)$, where $\omega_f(\cdot)$ is the modulus of continuity. We also extend our analysis to any continuous function $f$ on a bounded set. In particular, if ReLU networks with depth 31 and width $\mathcal{O}(N)$ are used to approximate one-dimensional Lipschitz continuous functions on $[0,1]$ with Lipschitz constant $\lambda>0$, the approximation rate in terms of the total number of parameters, $W=\mathcal{O}(N^2)$, becomes $\mathcal{O}\big(\lambda/(W\ln W)\big)$, a rate not previously reported in the literature for fixed-depth ReLU networks.
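To make the dependence on width and depth concrete, here is a minimal Python sketch that evaluates the width expression and the Hölder approximation rate stated above. It assumes every hidden constant inside $\mathcal{O}(\cdot)$ equals 1, which is an illustrative assumption only. The helper names `floor_root`, `width_bound`, and `holder_rate` are hypothetical and not taken from the paper. The sketch only evaluates the bound; it does not build the paper's network construction.

```python
import math


def floor_root(n: int, d: int) -> int:
    """Exact integer floor of n**(1/d); avoids float error such as 64 ** (1/3) < 4."""
    if n < 0 or d < 1:
        raise ValueError("need n >= 0 and d >= 1")
    r = int(round(n ** (1.0 / d)))
    # Correct any off-by-one introduced by floating-point rounding.
    while r > 0 and r ** d > n:
        r -= 1
    while (r + 1) ** d <= n:
        r += 1
    return r


def width_bound(N: int, d: int) -> int:
    """max{d * floor(N^(1/d)), N + 2}: the width expression inside O(.)."""
    return max(d * floor_root(N, d), N + 2)


def holder_rate(N: int, L: int, d: int, alpha: float, lam: float) -> float:
    """lam * sqrt(d) * (N^2 L^2 ln N)^(-alpha/d), assuming the O(.) constant is 1."""
    if N < 2 or L < 1 or d < 1:
        raise ValueError("need N >= 2 (so ln N > 0), L >= 1, d >= 1")
    if not (0 < alpha <= 1) or lam <= 0:
        raise ValueError("need 0 < alpha <= 1 and lam > 0")
    return lam * math.sqrt(d) * (N ** 2 * L ** 2 * math.log(N)) ** (-alpha / d)


if __name__ == "__main__":
    # Compare how the evaluated bound shrinks as width (N) or depth (L) grows, for d = 2.
    for N, L in [(8, 1), (16, 1), (8, 2), (16, 2)]:
        print(f"N={N:3d} L={L} width~{width_bound(N, 2):3d} "
              f"rate~{holder_rate(N, L, d=2, alpha=1.0, lam=1.0):.4f}")
```

For example, with $d=2$ and $\alpha=1$, doubling $L$ multiplies $(N^2L^2\ln N)^{-\alpha/d}$ by $4^{-1/2}=1/2$, so the evaluated bound halves. For $d=1$ and fixed depth, taking $W\approx N^2$ gives $\ln N=\tfrac12\ln W$, so $\lambda/(N^2\ln N)\approx 2\lambda/(W\ln W)$, which agrees with the $\mathcal{O}(\lambda/(W\ln W))$ rate up to a constant.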