Smaller generalization error derived for deep compared to shallow residual neural networks

10/05/2020 ∙ by Aku Kammonen, et al.

Estimates of the generalization error are proved for a residual neural network with $L$ random Fourier feature layers $\bar z_{\ell+1} = \bar z_\ell + \mathrm{Re}\sum_{k=1}^{K} \bar b_{\ell k}\, e^{\mathrm{i}\omega_{\ell k} \bar z_\ell} + \mathrm{Re}\sum_{k=1}^{K} \bar c_{\ell k}\, e^{\mathrm{i}\omega'_{\ell k}\cdot x}$. An optimal distribution for the frequencies $(\omega_{\ell k}, \omega'_{\ell k})$ of the random Fourier features $e^{\mathrm{i}\omega_{\ell k} \bar z_\ell}$ and $e^{\mathrm{i}\omega'_{\ell k}\cdot x}$ is derived. The derivation is based on the corresponding generalization error for approximating function values $f(x)$. The generalization error turns out to be smaller than the estimate $\|\hat f\|^2_{L^1(\mathbb{R}^d)}/(LK)$ of the generalization error for random Fourier features with one hidden layer and the same total number of nodes $LK$, in the case where the $L^\infty$-norm of $f$ is much smaller than the $L^1$-norm of its Fourier transform $\hat f$. This understanding of an optimal distribution for random features is used to construct a new training method for a deep residual network that shows promising results.
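The layer recursion in the abstract can be sketched as a short forward pass. This is a minimal illustration under several assumptions, not the authors' implementation: the state z_ℓ is treated as a real scalar, the amplitudes b_ℓk and c_ℓk as complex numbers, and the initial state z_0 is set to 0 because the abstract does not specify it. The names `residual_rff_forward` and `random_parameters` are hypothetical, and the standard normal sampling is a placeholder rather than the optimal frequency distribution derived in the paper.

```python
import cmath
import random


def residual_rff_forward(x, b, omega, c, omega_prime, z0=0.0):
    """Evaluate z_L for one input point x (hypothetical helper).

    x           : list of d floats
    b, c        : L lists of K complex amplitudes each
    omega       : L lists of K real frequencies acting on the scalar state z
    omega_prime : L lists of K frequency vectors (d floats each) acting on x
    z0          : initial state; 0.0 is an assumption, not from the abstract
    """
    z = z0
    for b_l, w_l, c_l, wp_l in zip(b, omega, c, omega_prime):
        # Re sum_k b_{lk} * exp(i * omega_{lk} * z_l)
        state_term = sum(
            (b_k * cmath.exp(1j * w_k * z)).real for b_k, w_k in zip(b_l, w_l)
        )
        # Re sum_k c_{lk} * exp(i * omega'_{lk} . x)
        input_term = sum(
            (c_k * cmath.exp(1j * sum(wi * xi for wi, xi in zip(wp_k, x)))).real
            for c_k, wp_k in zip(c_l, wp_l)
        )
        # Residual (skip) update: the previous state is carried forward.
        z = z + state_term + input_term
    return z


def random_parameters(L, K, d, seed=0):
    """Draw illustrative parameters from standard normals.

    This sampling is an assumption for demonstration only; it is not the
    optimal frequency distribution the paper derives.
    """
    rng = random.Random(seed)
    b = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) / K for _ in range(K)]
         for _ in range(L)]
    c = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) / K for _ in range(K)]
         for _ in range(L)]
    omega = [[rng.gauss(0, 1) for _ in range(K)] for _ in range(L)]
    omega_prime = [[[rng.gauss(0, 1) for _ in range(d)] for _ in range(K)]
                   for _ in range(L)]
    return b, omega, c, omega_prime


if __name__ == "__main__":
    L, K, d = 4, 8, 2  # LK = 32 nodes in total
    params = random_parameters(L, K, d)
    print(residual_rff_forward([0.3, -1.2], *params))
```

With L = 1 and all b_ℓk set to zero, the sketch reduces to a single hidden layer of random Fourier features in x, which is the baseline the abstract compares against at the same total node count LK.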


