How isotropic kernels learn simple invariants

06/17/2020
by Jonas Paccolat, et al.

We investigate how the training curve of isotropic kernel methods depends on the symmetry of the task to be learned, in several settings. (i) We consider a regression task in which the target function is a Gaussian random field that depends only on d_∥ variables, fewer than the input dimension d. We compute the expected test error ϵ, which follows ϵ ∼ p^{-β}, where p is the size of the training set. We find that β ∼ 1/d independently of d_∥, supporting previous findings that the presence of invariants does not resolve the curse of dimensionality for kernel regression. (ii) Next we consider support-vector binary classification and introduce the stripe model, where the data label depends on a single coordinate, y(x) = y(x_1), corresponding to parallel decision boundaries separating labels of different signs, with no margin at these interfaces. We argue and confirm numerically that for large bandwidth, β = (d-1+ξ)/(3d-3+ξ), where ξ ∈ (0,2) is the exponent characterizing the singularity of the kernel at the origin. This estimate improves on classical bounds obtained from Rademacher complexity. In this setting there is no curse of dimensionality, since β → 1/3 as d → ∞. (iii) We confirm these findings for the spherical model, for which y(x) = y(||x||). (iv) In the stripe model, we show that if the data are compressed along their invariants by some factor λ (an operation believed to take place in deep networks), the test error is reduced by a factor λ^{-2(d-1)/(3d-3+ξ)}.
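
As an illustration of the scaling claimed in (ii), the following sketch estimates β numerically for the stripe model. It is a minimal example, not the authors' code: it assumes a Laplace kernel (for which ξ = 1), scikit-learn's SVC with a precomputed Gram matrix, and arbitrary illustrative choices of the dimension d, the bandwidth 1/gamma, and the training-set sizes.

import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import laplacian_kernel

rng = np.random.default_rng(0)
d = 5          # input dimension (illustrative choice)
gamma = 0.05   # small gamma = large bandwidth (illustrative choice)

def stripe_labels(X):
    # Stripe model: the label depends on the first coordinate only,
    # here through a single interface at x_1 = 0, with no margin imposed.
    return np.sign(X[:, 0])

def test_error(p, n_test=2000):
    # Draw Gaussian inputs, train a hard-margin-like SVC (large C) with a
    # precomputed Laplace Gram matrix, and return the test misclassification rate.
    X_tr = rng.standard_normal((p, d))
    X_te = rng.standard_normal((n_test, d))
    y_tr, y_te = stripe_labels(X_tr), stripe_labels(X_te)
    K_tr = laplacian_kernel(X_tr, X_tr, gamma=gamma)
    K_te = laplacian_kernel(X_te, X_tr, gamma=gamma)
    clf = SVC(kernel="precomputed", C=1e6).fit(K_tr, y_tr)
    return np.mean(clf.predict(K_te) != y_te)

ps = np.array([64, 128, 256, 512, 1024])
errs = np.array([np.mean([test_error(p) for _ in range(5)]) for p in ps])

# Fit epsilon ~ p^{-beta} on a log-log scale.
beta = -np.polyfit(np.log(ps), np.log(errs), 1)[0]
print(f"measured beta ≈ {beta:.2f}")

For d = 5 and ξ = 1 the formula above gives β = (d-1+ξ)/(3d-3+ξ) = 5/13 ≈ 0.38; the fitted slope can be compared against this prediction, with more repetitions and larger p reducing the scatter.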
