Smaller generalization error derived for deep compared to shallow residual neural networks

10/05/2020
by Aku Kammonen et al.

Estimates of the generalization error are proved for a residual neural network with $L$ random Fourier features layers $\bar z_{\ell+1} = \bar z_\ell + \operatorname{Re}\sum_{k=1}^{K} \bar b_{\ell k}\, e^{\mathrm{i}\,\omega_{\ell k} \bar z_\ell} + \operatorname{Re}\sum_{k=1}^{K} \bar c_{\ell k}\, e^{\mathrm{i}\,\omega'_{\ell k} \cdot x}$. An optimal distribution for the frequencies $(\omega_{\ell k}, \omega'_{\ell k})$ of the random Fourier features $e^{\mathrm{i}\,\omega_{\ell k} \bar z_\ell}$ and $e^{\mathrm{i}\,\omega'_{\ell k} \cdot x}$ is derived. The derivation is based on the corresponding generalization error for approximating function values $f(x)$. This generalization error turns out to be smaller than the estimate $\|\hat f\|_{L^1(\mathbb{R}^d)}^2/(LK)$ of the generalization error for random Fourier features with one hidden layer and the same total number of nodes $LK$, in the case that the $L^\infty$-norm of $f$ is much smaller than the $L^1$-norm of its Fourier transform $\hat f$. This understanding of an optimal distribution for random features is used to construct a new training method for a deep residual network, which shows promising results.
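To make the layer recursion concrete, here is a minimal NumPy sketch of the forward pass, assuming scalar layer states $\bar z_\ell$. The Gaussian frequency sampling and random amplitudes below are placeholder assumptions for illustration only; the paper derives an optimal (non-Gaussian) frequency distribution, and the amplitudes would be fitted to data during training.

```python
import numpy as np

rng = np.random.default_rng(0)

d, L, K = 3, 4, 16  # input dimension, residual layers, features per layer

# Frequencies (omega_{lk}, omega'_{lk}): sampled here from a standard
# normal purely for illustration; the paper derives an optimal distribution.
omega = rng.standard_normal((L, K))           # acts on the scalar state z_l
omega_prime = rng.standard_normal((L, K, d))  # acts on the input x

# Complex amplitudes b_{lk}, c_{lk}: placeholder random values; in
# practice these are fitted to training data.
b = (rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K))) / K
c = (rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K))) / K

def forward(x, z0=0.0):
    """Layer recursion:
    z_{l+1} = z_l + Re sum_k b_{lk} e^{i omega_{lk} z_l}
                  + Re sum_k c_{lk} e^{i omega'_{lk} . x}
    """
    z = z0
    for ell in range(L):
        z = (z
             + np.sum(b[ell] * np.exp(1j * omega[ell] * z)).real
             + np.sum(c[ell] * np.exp(1j * (omega_prime[ell] @ x))).real)
    return z

x = rng.standard_normal(d)
print(forward(x))  # scalar output z_L, the network's approximation of f(x)
```

With $L$ layers of $K$ features each, this network uses $LK$ nodes in total, matching the node count in the one-hidden-layer comparison above.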
