Is Pre-training Truly Better Than Meta-Learning?

06/24/2023
by Brando Miranda, et al.

In the context of few-shot learning, it is currently believed that a fixed pre-trained (PT) model, along with fine-tuning the final layer during evaluation, outperforms standard meta-learning algorithms. We re-evaluate these claims through an in-depth empirical examination of an extensive set of formally diverse datasets, comparing PT to Model-Agnostic Meta-Learning (MAML). Unlike previous work, we emphasize a fair comparison: the same architecture, the same optimizer, and all models trained to convergence. Crucially, we use a more rigorous statistical tool – the effect size (Cohen's d) – to determine the practical significance of the difference between a model trained with PT vs. MAML. We then use a previously proposed metric – the diversity coefficient – to compute the average formal diversity of a dataset. Using this analysis, we demonstrate the following: 1) when the formal diversity of a dataset is low, PT beats MAML on average, and 2) when the formal diversity is high, MAML beats PT on average. The caveat is that the magnitude of the average difference between PT and MAML, measured by the effect size, is small – less than 0.2, the classical threshold for a small effect. Nevertheless, this observation contradicts the currently held belief that a pre-trained model is always better than a meta-learned model. Our extensive experiments consider 21 few-shot learning benchmarks, including the large-scale few-shot learning benchmark Meta-Dataset. We also find no significant difference between a MAML model and a PT model with GPT-2 on OpenWebText. We therefore conclude that a pre-trained model does not always beat a meta-learned model and that the formal diversity of a dataset is a driving factor.
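As a concrete illustration of the statistical tool referenced above, the minimal sketch below computes Cohen's d from two sets of per-episode few-shot accuracies (one for a PT model, one for a MAML model) using the pooled standard deviation. The array names and accuracy values are hypothetical and are not taken from the paper; they only illustrate the |d| < 0.2 threshold discussed in the abstract.

```python
import numpy as np

def cohens_d(acc_a: np.ndarray, acc_b: np.ndarray) -> float:
    """Effect size (Cohen's d) between two sets of per-episode accuracies.

    Uses the pooled standard deviation; by classical convention,
    |d| < 0.2 is read as a negligible-to-small practical difference.
    """
    n_a, n_b = len(acc_a), len(acc_b)
    var_a, var_b = acc_a.var(ddof=1), acc_b.var(ddof=1)
    pooled_std = np.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (acc_a.mean() - acc_b.mean()) / pooled_std

# Hypothetical per-episode accuracies for a PT and a MAML model (illustrative only)
pt_acc = np.array([0.62, 0.58, 0.64, 0.60, 0.61])
maml_acc = np.array([0.63, 0.59, 0.65, 0.62, 0.60])

d = cohens_d(maml_acc, pt_acc)
print(f"Cohen's d = {d:.3f} "
      f"({'negligible/small' if abs(d) < 0.2 else 'non-trivial'} effect)")
```

Reporting the effect size rather than only a p-value is what lets the abstract say the PT vs. MAML gap, while consistent in sign, is practically small (below 0.2) in either direction.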

