Risk Bounds for High-dimensional Ridge Function Combinations Including Neural Networks
Let $f^{\star}$ be a function on $\mathbb{R}^d$ satisfying a spectral norm condition. For various noise settings, we show that $\mathbb{E}\|\hat{f} - f^{\star}\|^2 \leq v_{f^{\star}}\left(\frac{\log d}{n}\right)^{1/4}$, where $n$ is the sample size and $\hat{f}$ is either a penalized least squares estimator or a greedily obtained version of such using linear combinations of ramp, sinusoidal, sigmoidal or other bounded Lipschitz ridge functions. Our risk bound is effective even when the dimension $d$ is much larger than the available sample size. For settings where the dimension is larger than the square root of the sample size, this quantity is seen to improve on the more familiar risk bound of $v_{f^{\star}}\left(\frac{d\log(n/d)}{n}\right)^{1/2}$, also investigated here.
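As an illustrative comparison of the two rates (our sketch, not taken from the abstract; constants and the spectral factor $v_{f^{\star}}$ are suppressed), take $d = n^{3/4}$, so that $d > \sqrt{n}$:
\[
\left(\frac{\log d}{n}\right)^{1/4} = \left(\frac{3\log n}{4n}\right)^{1/4} \asymp n^{-1/4}(\log n)^{1/4},
\qquad
\left(\frac{d\log(n/d)}{n}\right)^{1/2} = \left(\frac{\log n}{4\,n^{1/4}}\right)^{1/2} \asymp n^{-1/8}(\log n)^{1/2},
\]
so the first rate is of strictly smaller order as $n$ grows, and the two rates agree up to logarithmic factors at the crossover $d \asymp \sqrt{n}$.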