Minimax Lower Bounds for Ridge Combinations Including Neural Nets
Estimation of functions of d variables is considered using ridge combinations of the form ∑_{k=1}^m c_{1,k} ϕ(∑_{j=1}^d c_{0,j,k} x_j − b_k), where the activation function ϕ has bounded value and derivative. These include single-hidden-layer neural networks, polynomials, and sinusoidal models. From a sample of size n of possibly noisy values at random sites X ∈ B = [−1,1]^d, the minimax mean square error is examined for functions in the closure of the ℓ_1 hull of ridge functions with activation ϕ. It is shown to be of order d/n to a fractional power (when d is of smaller order than n), and of order (log d)/n to a fractional power (when d is of larger order than n). Dependence on the constraints v_0 and v_1 on the ℓ_1 norms of the inner parameters c_0 and outer parameters c_1, respectively, is also examined, and lower and upper bounds on the fractional power are given. The heart of the analysis is the development of information-theoretic packing numbers for these classes of functions.
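The model class above can be illustrated with a minimal sketch, assuming tanh as the bounded activation ϕ and randomly drawn parameters (all names and values here are illustrative, not from the paper):

```python
import numpy as np

def ridge_combination(x, c0, c1, b, phi=np.tanh):
    """Evaluate f(x) = sum_k c1[k] * phi(c0[:, k] . x - b[k]).

    x  : (d,) input point in B = [-1, 1]^d
    c0 : (d, m) inner parameters (one column per ridge unit)
    c1 : (m,) outer parameters
    b  : (m,) biases
    phi: activation with bounded value and derivative (tanh here)
    """
    return float(c1 @ phi(x @ c0 - b))

rng = np.random.default_rng(0)
d, m = 5, 3  # illustrative dimension and number of units
c0 = rng.normal(size=(d, m))
c1 = rng.normal(size=m)
b = rng.normal(size=m)

# l1-norm quantities corresponding to the constraints v_0 (inner,
# per unit) and v_1 (outer) mentioned in the abstract
v0 = np.abs(c0).sum(axis=0).max()
v1 = np.abs(c1).sum()

x = rng.uniform(-1.0, 1.0, size=d)
y = ridge_combination(x, c0, c1, b)
```

Since |tanh| ≤ 1, the output magnitude is bounded by the outer ℓ_1 norm v_1, which is one reason these norm constraints control the size of the function class.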