Scaling Laws for the Few-Shot Adaptation of Pre-trained Image Classifiers

10/13/2021
by   Gabriele Prato, et al.
Montréal Institute for Learning Algorithms

The empirical science of neural scaling laws is a rapidly growing area of significant importance to the future of machine learning, particularly in light of recent breakthroughs achieved by large-scale pre-trained models such as GPT-3, CLIP and DALL·E. Accurately predicting neural network performance as resources such as data, compute and model size increase provides a more comprehensive evaluation of different approaches across multiple scales, as opposed to traditional point-wise comparisons of fixed-size models on fixed-size benchmarks, and, most importantly, allows focus on the best-scaling, and thus most promising, approaches. In this work, we consider the challenging problem of few-shot learning in image classification, especially when the target data distribution in the few-shot phase differs from the source (training) data distribution, in the sense that it includes new image classes not encountered during training. Our main goal is to investigate how the amount of pre-training data affects the few-shot generalization performance of standard image classifiers. Our key observations are that (1) such performance improvements are well approximated by power laws (linear log-log plots) as the training set size increases, (2) this holds whether the target data come from the same domain as the training data or from a different one (i.e., new classes), and (3) few-shot performance on new classes converges at a faster rate than the standard classification performance on previously seen classes. Our findings shed new light on the relationship between scale and generalization.
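Key observation (1) — that performance follows a power law, i.e. a straight line on a log-log plot — can be illustrated with a minimal fitting sketch. Estimating err ≈ a · n^(−b) reduces to ordinary linear regression in log-log space, where −b is the slope and log(a) the intercept. The data points below are hypothetical stand-ins, not results from the paper.

```python
import numpy as np

def fit_power_law(n, err):
    """Fit err ≈ a * n**(-b) by linear regression in log-log space.

    A power law appears as a straight line on a log-log plot:
    log(err) = log(a) - b * log(n). Returns the pair (a, b).
    """
    slope, intercept = np.polyfit(np.log(n), np.log(err), 1)
    return np.exp(intercept), -slope

# Hypothetical few-shot error rates at increasing pre-training set sizes
n = np.array([1e3, 1e4, 1e5, 1e6])
err = np.array([0.40, 0.25, 0.16, 0.10])

a, b = fit_power_law(n, err)
```

The fitted exponent b quantifies the convergence rate the abstract refers to: comparing b for new-class few-shot performance against b for seen-class performance is one way to express observation (3).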

Related research

12/14/2022 · Reproducible scaling laws for contrastive language-image learning
05/31/2021 · Effect of large-scale pre-training on full and few-shot transfer learning for natural and medical images
04/04/2023 · Strong Baselines for Parameter Efficient Few-Shot Fine-tuning
09/05/2023 · A study on the impact of pre-trained model on Just-In-Time defect prediction
02/02/2021 · Scaling Laws for Transfer
10/12/2022 · How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization
09/15/2023 · Scaling Laws for Sparsely-Connected Foundation Models
