Online gradient-based mixtures for transfer modulation in meta-learning

by Ghassen Jerfel, et al.

Learning-to-learn, or meta-learning, leverages data-driven inductive bias to increase the efficiency of learning on a novel task. This approach encounters difficulty when transfer is not mutually beneficial, for instance, when tasks are sufficiently dissimilar or change over time. Here, we use the connection between gradient-based meta-learning and hierarchical Bayes (Grant et al., 2018) to propose a mixture of hierarchical Bayesian models over the parameters of an arbitrary function approximator such as a neural network. Generalizing the model-agnostic meta-learning (MAML) algorithm (Finn et al., 2017), we present a stochastic expectation maximization procedure to jointly estimate parameter initializations for gradient descent as well as a latent assignment of tasks to initializations. This approach better captures the diversity of training tasks, rather than consolidating all inductive bias into a single set of hyperparameters. Our experiments demonstrate better generalization performance on the standard miniImageNet benchmark for 1-shot classification. We further derive a novel and scalable non-parametric variant of our method that captures the evolution of a task distribution over time, as demonstrated on a set of few-shot regression tasks.
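The stochastic EM idea can be illustrated with a minimal sketch (not the paper's implementation): hypothetical scalar linear-regression tasks drawn from a bimodal distribution, K = 2 candidate initializations, an E-step that soft-assigns each task to an initialization based on post-adaptation loss, and an M-step that uses a first-order (Reptile-style) update in place of the full MAML meta-gradient. All task parameters, learning rates, and the temperature are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    # True slope drawn from one of two modes: a bimodal task distribution.
    w_true = rng.choice([2.0, -2.0]) + 0.1 * rng.normal()
    x = rng.normal(size=10)
    return x, w_true * x + 0.05 * rng.normal(size=10)

def mse(w, x, y):
    return np.mean((w * x - y) ** 2)

def adapt(w0, x, y, lr=0.1, steps=5):
    # Inner-loop gradient descent on one task, starting from initialization w0.
    w = w0
    for _ in range(steps):
        w -= lr * np.mean(2 * (w * x - y) * x)
    return w

K = 2
phi = np.array([0.5, -0.5])  # per-cluster parameter initializations
pi = np.ones(K) / K          # mixing proportions
T = 0.5                      # temperature for soft assignments

for _ in range(500):
    x, y = sample_task()
    adapted = np.array([adapt(phi[k], x, y) for k in range(K)])
    losses = np.array([mse(adapted[k], x, y) for k in range(K)])
    # E-step: responsibilities from a post-adaptation likelihood proxy.
    logits = np.log(pi) - losses / T
    r = np.exp(logits - logits.max())
    r /= r.sum()
    # M-step: move each initialization toward its adapted value,
    # weighted by its responsibility (first-order meta-update).
    phi += 0.05 * r * (adapted - phi)
    pi = 0.9 * pi + 0.1 * r

print(sorted(phi.round(2)))
```

After training, the two initializations specialize to the two task modes (one near +2, one near -2), rather than collapsing to a single averaged initialization as single-cluster MAML would.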




