Capturing the learning curves of generic features maps for realistic data sets with a teacher-student model

02/16/2021
by Bruno Loureiro, et al.

Teacher-student models provide a powerful framework in which the typical-case performance of high-dimensional supervised learning tasks can be studied in closed form. In this setting, labels are assigned to data (often taken to be i.i.d. Gaussian) by a teacher model, and the goal is to characterise the typical performance of a student model in recovering the parameters that generated the labels. In this manuscript we discuss a generalisation of this setting in which the teacher and the student act on different spaces, generated by fixed but generic feature maps. This is achieved via the rigorous study of a high-dimensional Gaussian covariate model. Our contribution is twofold: first, we prove a rigorous formula for the asymptotic training loss and generalisation error achieved by empirical risk minimisation in this model. Second, we present a number of situations where the learning curve of the model captures that of a realistic data set learned with kernel regression and classification, either with out-of-the-box feature maps such as random projections or scattering transforms, or with pre-learned ones, such as the features learned by training multi-layer neural networks. We discuss both the power and the limitations of the Gaussian teacher-student framework as a typical-case analysis capturing learning curves as encountered in practice on real data sets.
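The sketch below is a minimal, hypothetical illustration of the kind of experiment the abstract describes, not the authors' code: i.i.d. Gaussian inputs, a teacher assigning labels through one fixed feature map, and a student performing ridge regression (regularised empirical risk minimisation with square loss) through a different map, with the test error traced against the number of samples. The random-projection-plus-tanh maps, dimensions, and regularisation strength are illustrative assumptions.

```python
# Minimal teacher-student sketch with generic, fixed feature maps.
# All map choices and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d = 100              # input dimension
p_t, p_s = 150, 120  # teacher / student feature dimensions

# Fixed feature maps: random projections followed by a tanh nonlinearity
F_teacher = rng.standard_normal((p_t, d)) / np.sqrt(d)
F_student = rng.standard_normal((p_s, d)) / np.sqrt(d)
theta = rng.standard_normal(p_t)  # teacher weights

def labels(X):
    # Teacher assigns labels in its own feature space
    return np.tanh(X @ F_teacher.T) @ theta / np.sqrt(p_t)

def ridge_fit(Phi, y, lam=1e-2):
    # Student: regularised empirical risk minimisation with square loss
    p = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y)

X_test = rng.standard_normal((2000, d))
y_test = labels(X_test)

# Learning curve: test error as a function of the number of samples n
for n in [50, 100, 200, 400, 800]:
    X = rng.standard_normal((n, d))    # i.i.d. Gaussian inputs
    Phi = np.tanh(X @ F_student.T)     # student features
    w = ridge_fit(Phi, labels(X))
    err = np.mean((np.tanh(X_test @ F_student.T) @ w - y_test) ** 2)
    print(f"n={n:4d}  test mse={err:.4f}")
```

In the paper's setting the feature maps could equally be scattering transforms or features extracted from a pre-trained network; the point of the sketch is only the structure of the experiment, where teacher and student act on different fixed feature spaces.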



Related research

research · 03/22/2022
Learning curves for the multi-class teacher-student perceptron
One of the most classical results in high-dimensional learning theory pr...

research · 03/23/2023
A Simple and Generic Framework for Feature Distillation via Channel-wise Transformation
Knowledge distillation is a popular technique for transferring the knowl...

research · 05/26/2019
Asymptotic learning curves of kernel methods: empirical data v.s. Teacher-Student paradigm
How many training data are needed to learn a supervised task? It is ofte...

research · 09/06/2023
Knowledge Distillation Layer that Lets the Student Decide
Typical technique in knowledge distillation (KD) is regularizing the lea...

research · 06/16/2021
Locality defeats the curse of dimensionality in convolutional teacher-student scenarios
Convolutional neural networks perform a local and translationally-invari...

research · 11/22/2021
Adaptive Transfer Learning: a simple but effective transfer learning
Transfer learning (TL) leverages previously obtained knowledge to learn ...

research · 05/26/2022
Gaussian Universality of Linear Classifiers with Random Labels in High-Dimension
While classical in many theoretical settings, the assumption of Gaussian...
