General-Purpose In-Context Learning by Meta-Learning Transformers

by Louis Kirsch, et al.

Modern machine learning requires system designers to specify aspects of the learning pipeline, such as losses, architectures, and optimizers. Meta-learning, or learning-to-learn, instead aims to learn those aspects, and promises to unlock greater capabilities with less manual effort. One particularly ambitious goal of meta-learning is to train general-purpose in-context learning algorithms from scratch, using only black-box models with minimal inductive bias. Such a model takes in training data and produces test-set predictions across a wide range of problems, without any explicit definition of an inference model, training loss, or optimization algorithm. In this paper we show that Transformers and other black-box models can be meta-trained to act as general-purpose in-context learners. We characterize phase transitions between algorithms that generalize, algorithms that memorize, and algorithms that fail to meta-train at all, induced by changes in model size, number of tasks, and meta-optimization. We further show that the capabilities of meta-trained algorithms are bottlenecked by the accessible state size (memory) determining the next prediction, unlike standard models, which are thought to be bottlenecked by parameter count. Finally, we propose practical interventions, such as biasing the training distribution, that improve the meta-training and meta-generalization of general-purpose learning algorithms.
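
To make the "training data in, test-set predictions out" interface concrete, the following is a minimal sketch of how a dataset might be laid out as a single input sequence for a black-box sequence model. It is an illustration under stated assumptions, not the paper's implementation: the function names `make_task` and `build_sequence` are hypothetical, and deriving tasks by random input projection and label permutation is an assumed way of producing many tasks from one base dataset.

```python
import numpy as np

def make_task(xs, ys, num_classes, rng):
    """Derive a new task from a base dataset (illustrative assumption):
    randomly project the inputs and permute the label identities."""
    d = xs.shape[1]
    proj = rng.normal(size=(d, d)) / np.sqrt(d)
    perm = rng.permutation(num_classes)
    return xs @ proj, perm[ys]

def build_sequence(xs, ys, num_classes):
    """Row i holds x_i concatenated with the one-hot label of example i-1,
    so a causal sequence model could predict y_i from what it has seen so far."""
    n = xs.shape[0]
    one_hot = np.eye(num_classes)[ys]
    prev_labels = np.zeros((n, num_classes))
    prev_labels[1:] = one_hot[:-1]  # shift labels by one position
    return np.concatenate([xs, prev_labels], axis=1)

rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 4))       # 5 examples, 4 features (toy data)
ys = rng.integers(0, 3, size=5)    # 3 classes
task_xs, task_ys = make_task(xs, ys, 3, rng)
seq = build_sequence(task_xs, task_ys, 3)
print(seq.shape)  # (5, 7)
```

A meta-trained model would consume `seq` row by row and emit a prediction at each position. No loss, inference model, or optimizer is specified for the inner learning problem; whatever learning happens would be carried entirely by the model's internal state.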



Learning to Learn from APIs: Black-Box Data-Free Meta-Learning

Data-free meta-learning (DFML) aims to enable efficient learning of new ...

Meta Learning Black-Box Population-Based Optimizers

The no free lunch theorem states that no model is better suited to every...

Local Nonparametric Meta-Learning

A central goal of meta-learning is to find a learning rule that enables ...

An Easy to Use Repository for Comparing and Improving Machine Learning Algorithm Usage

The results from most machine learning experiments are used for a specif...

From Learning to Meta-Learning: Reduced Training Overhead and Complexity for Communication Systems

Machine learning methods adapt the parameters of a model, constrained to...

An Introduction to Advanced Machine Learning : Meta Learning Algorithms, Applications and Promises

In [1, 2], we have explored the theoretical aspects of feature extractio...

Data-driven simulation for general purpose multibody dynamics using deep neural networks

In this paper, a machine learning-based simulation framework of general-...
