- Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
- Provable Meta-Learning of Linear Representations
- Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm
- Online gradient-based mixtures for transfer modulation in meta-learning
- Towards Understanding Generalization in Gradient-Based Meta-Learning
- Statistical Optimality of Stochastic Gradient Descent on Hard Learning Problems through Multiple Passes
- Meta-learning with differentiable closed-form solvers
Why Does MAML Outperform ERM? An Optimization Perspective
Model-Agnostic Meta-Learning (MAML) has demonstrated widespread success in training models that can quickly adapt to new tasks via one or a few stochastic gradient descent steps. However, the MAML objective is significantly more difficult to optimize than standard Empirical Risk Minimization (ERM), and little is understood about how much MAML improves over ERM in terms of the fast adaptability of their solutions in various scenarios. We analytically address this question in a linear regression setting consisting of a mixture of easy and hard tasks, where hardness is measured by the number of gradient steps required to solve the task. Specifically, we prove that when Ω(d_eff) labelled test samples are available for gradient-based fine-tuning, where d_eff is the effective dimension of the problem, MAML achieves a substantial gain over ERM only if the optimal solutions of the hard tasks are closely packed together, with their center far from the center of the easy-task optimal solutions. We show that these insights also apply in a low-dimensional feature space when both MAML and ERM learn a representation of the tasks, which reduces the effective problem dimension. Further, our few-shot image classification experiments suggest that our results generalize beyond linear regression.
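To make the comparison concrete, here is a minimal NumPy sketch (not the paper's code) contrasting the ERM objective with the one-step MAML objective on a mixture of easy and hard linear regression tasks. The step size alpha, noise level, and task construction are illustrative assumptions, and for brevity the inner adaptation step and the outer evaluation use the same samples per task rather than a support/query split.

```python
# Sketch: ERM vs. one-step MAML objectives on linear regression tasks.
# Task i has optimum w_i*; its loss is L_i(w) = 0.5 * mean((X_i w - y_i)^2).
# MAML evaluates the loss after one inner gradient step of size alpha.
import numpy as np

rng = np.random.default_rng(0)
d, n, alpha = 5, 50, 0.1          # dimension, samples per task, inner step size

def make_task(w_star):
    """Generate a linear-regression task whose optimum is (approximately) w_star."""
    X = rng.normal(size=(n, d))
    y = X @ w_star + 0.01 * rng.normal(size=n)   # illustrative noise level
    return X, y

def loss(w, X, y):
    return 0.5 * np.mean((X @ w - y) ** 2)

def grad(w, X, y):
    return X.T @ (X @ w - y) / n

# Easy tasks: optima near the origin. Hard tasks: optima closely packed
# around a distant center, echoing the setting described in the abstract.
easy_optima = [0.1 * rng.normal(size=d) for _ in range(5)]
hard_center = 3.0 * np.ones(d)
hard_optima = [hard_center + 0.1 * rng.normal(size=d) for _ in range(5)]
tasks = [make_task(w) for w in easy_optima + hard_optima]

def erm_objective(w):
    """Average task loss evaluated at w itself (no adaptation)."""
    return np.mean([loss(w, X, y) for X, y in tasks])

def maml_objective(w):
    """Average task loss evaluated after ONE inner gradient step from w."""
    return np.mean([loss(w - alpha * grad(w, X, y), X, y) for X, y in tasks])

w = np.zeros(d)
print("ERM objective at w=0: ", erm_objective(w))
print("MAML objective at w=0:", maml_objective(w))
```

The sketch only shows how the two objectives differ in where each task's loss is evaluated; minimizing them (and measuring post-fine-tuning performance on held-out hard tasks) is what the paper analyzes.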