Locality defeats the curse of dimensionality in convolutional teacher-student scenarios

by Alessandro Favero et al.

Convolutional neural networks perform a local and translationally-invariant treatment of the data: quantifying which of these two aspects is central to their success remains a challenge. We study this problem within a teacher-student framework for kernel regression, using "convolutional" kernels inspired by the neural tangent kernel of simple convolutional architectures of given filter size. Using heuristic methods from physics, we find in the ridgeless case that locality is key in determining the learning curve exponent β (which relates the test error ϵ_t ∼ P^{-β} to the size of the training set P), whereas translational invariance is not. In particular, if the filter size of the teacher t is smaller than that of the student s, β is a function of s only and does not depend on the input dimension. We confirm our predictions on β empirically. Theoretically, in some cases (including when teacher and student are equal) it can be shown that this prediction is an upper bound on performance. We conclude by proving, under a natural universality assumption, that performing kernel regression with a ridge that decreases with the size of the training set leads to learning curve exponents similar to those we obtain in the ridgeless case.
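The setup described above can be sketched numerically. The snippet below is a minimal, hypothetical illustration (not the paper's actual experiments): a "convolutional" kernel is built by averaging a Gaussian kernel over all length-s circular patches of the input, a teacher function is sampled from a Gaussian process with filter size t, and a student kernel with filter size s fits it by (near-)ridgeless regression. The exponent β is then estimated from the slope of the test error versus the training set size P on a log-log scale. All function names, kernel choices, and parameter values here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_kernel(X, Y, s):
    """Patch-averaged 'convolutional' kernel (assumed form): mean of
    Gaussian kernels over all length-s circular patches of the inputs."""
    d = X.shape[1]
    K = np.zeros((len(X), len(Y)))
    for i in range(d):
        idx = [(i + j) % d for j in range(s)]
        Xp, Yp = X[:, idx], Y[:, idx]
        sq = ((Xp[:, None, :] - Yp[None, :, :]) ** 2).sum(-1)
        K += np.exp(-sq / (2 * s))
    return K / d

def teacher_student_error(P, d=8, t=2, s=4, P_test=200):
    """Sample a teacher from a GP with filter size t, fit it by
    near-ridgeless regression with a student kernel of filter size s,
    and return the mean-squared test error."""
    X = rng.standard_normal((P + P_test, d))
    K_teacher = conv_kernel(X, X, t)
    y = rng.multivariate_normal(
        np.zeros(P + P_test), K_teacher + 1e-10 * np.eye(P + P_test))
    Xtr, Xte, ytr, yte = X[:P], X[P:], y[:P], y[P:]
    K_student = conv_kernel(Xtr, Xtr, s)
    # tiny ridge purely for numerical stability of the solve
    alpha = np.linalg.solve(K_student + 1e-8 * np.eye(P), ytr)
    pred = conv_kernel(Xte, Xtr, s) @ alpha
    return np.mean((pred - yte) ** 2)

# Estimate beta from the log-log slope of the learning curve.
Ps = [32, 64, 128, 256]
errs = [np.mean([teacher_student_error(P) for _ in range(5)]) for P in Ps]
beta = -np.polyfit(np.log(Ps), np.log(errs), 1)[0]
print(f"estimated learning-curve exponent beta ~ {beta:.2f}")
```

According to the abstract's prediction, with t < s the measured exponent should be governed by the student filter size s rather than by the full input dimension d; the sketch above only shows how such an exponent would be measured, not the paper's theoretical value.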




