Current deep learning algorithms require a very large amount of data to learn decent task-specific models, and acquiring enough labeled data is often expensive and laborious. Moreover, in many mission-critical applications, such as autonomous vehicles and drones, an agent needs to adapt rapidly to unseen environments. Humans are able to learn new skills and concepts rapidly by leveraging knowledge learned earlier; therefore, we aim to enable artificial agents to do the same. Transfer learning transfers the knowledge obtained from one domain with a large amount of labeled data to other domains with less labeled data [Pan et al.2010]. It achieves this by copying the initial feature-extraction layers and fine-tuning the resulting model on the target task [Yosinski et al.2014]. However, this method is still data hungry because gradient-based optimization algorithms need many iterations over numerous examples to adapt models to new tasks [Ravi et al.2016]. Meta-learning, on the other hand, is a class of machine learning algorithms concerned with the learning process itself. Introduced by [Schmidhuber1987], meta-learning aims to train the model in task space rather than instance space. While transfer learning methods train a base model to use as a transfer source by optimizing a single monolithic task, meta-learning algorithms learn their base models by sampling many different smaller tasks from a large data source. As a result, one might expect the meta-learned model to generalize well to new unseen tasks because of this task-agnostic way of training.
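The transfer recipe described above (copy the pretrained feature-extraction layers, then fine-tune on the target task) can be sketched as follows. This is a minimal toy illustration, not the paper's model: the two-layer linear network, the synthetic source/target data, and all names (`W1`, `w2`, `lr`, etc.) are hypothetical stand-ins for a real feature extractor and head.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-layer linear model: a feature layer W1 and a task head w2.
D, H = 8, 4
W1 = rng.normal(scale=0.5, size=(D, H))     # feature-extraction layer
w2 = rng.normal(scale=0.1, size=H)          # source-task head

# Source domain: plenty of labeled data (toy regression task).
Xs = rng.normal(size=(500, D))
ws = rng.normal(size=D)
ys = Xs @ ws

lr = 0.05
for _ in range(400):                        # pretrain both layers on source
    feats = Xs @ W1
    err = (feats @ w2 - ys) / len(Xs)
    grad_w2 = feats.T @ err
    grad_W1 = np.outer(Xs.T @ err, w2)
    w2 -= lr * grad_w2
    W1 -= lr * grad_W1

# Target domain: few labels. Copy the feature layer W1 unchanged,
# reinitialize the head, and fine-tune only the head on the target data.
Xt = rng.normal(size=(20, D))
yt = Xt @ ws * 0.9                          # related but shifted target task
w2_t = np.zeros(H)
for _ in range(200):
    feats = Xt @ W1
    w2_t -= lr * feats.T @ (feats @ w2_t - yt) / len(Xt)
```

In practice the copied layers may also be fine-tuned with a small learning rate rather than frozen; freezing them here just keeps the sketch short.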
Shortcomings of meta-learning algorithms. As we show in the experiments below, models trained with meta-learning perform worse than transfer learning in two scenarios: (1) when many training examples are available for each class in the target task (here we would like the artificial agent to continue improving its performance as more data becomes available); and (2) when the target task contains many different classes.
The main contribution of this paper is a joint “meta-transfer” learning method that performs well for target tasks of both few and many shots/classes. Our method performs better than both transfer- and meta-learning baselines on all target task sizes we evaluate, and outperforms state-of-the-art few-shot learning algorithms when the target task has a large number of classes and shots.
Meta-Transfer Learning (MTL)
In order to overcome the two issues mentioned above, we propose a new training algorithm that inherits the advantages of both meta-learning and transfer learning. This joint training method employs two loss functions: (1) a task-specific (transfer learning) loss and (2) a task-agnostic (meta-learning) loss. The task-specific loss is defined over the base model’s entire training dataset. The task-agnostic loss, on the other hand, is a meta-learning loss defined over a distribution of tasks (e.g., 5-way classification tasks). Two gradient updates are computed independently from these two loss functions, and the model is updated using the weighted average of the two update vectors (see Algorithm 1). The tasks in meta-learning are sampled from a task distribution, while all instances in the sampled tasks are used for the task-specific optimization. For adaptation to a new unseen task, regular stochastic gradient descent is used. For the meta-learner, we evaluate our method using both Model-Agnostic Meta-Learning (MAML [Finn et al.2017]) and its first-order variant, Reptile [Nichol et al.2018]. We use this class of meta-learning algorithms because, unlike Matching Networks [Vinyals et al.2016] and their variant [Snell et al.2017], they are model agnostic and can be applied directly to any model trained with a gradient descent procedure.
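The combined update can be sketched as follows. This is a toy illustration under stated assumptions, not Algorithm 1 itself: it uses a plain linear model on synthetic data, a Reptile-style first-order meta-update (a full MAML update would require second-order gradients), and hypothetical names (`alpha`, `inner_steps`, `sample_task`) for the mixing weight, inner-loop length, and task sampler.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: linear regression with one shared weight vector.
D = 5
w = np.zeros(D)

# "Full" base dataset, used by the task-specific (transfer) loss.
X_full = rng.normal(size=(200, D))
w_true = rng.normal(size=D)
y_full = X_full @ w_true

def grad_mse(w, X, y):
    """Gradient of the mean squared error (1/2n)*||Xw - y||^2."""
    return X.T @ (X @ w - y) / len(X)

def sample_task(n=10):
    """Sample a small task: here, a random data subset stands in for
    an N-way episode drawn from the task distribution."""
    idx = rng.choice(len(X_full), size=n, replace=False)
    return X_full[idx], y_full[idx]

alpha, lr, meta_lr, inner_steps = 0.5, 0.05, 1.0, 5

for step in range(300):
    # 1) Task-agnostic (meta) update vector, Reptile-style:
    #    adapt a copy of w on a sampled task, then move toward it.
    Xt, yt = sample_task()
    w_adapt = w.copy()
    for _ in range(inner_steps):
        w_adapt -= lr * grad_mse(w_adapt, Xt, yt)
    meta_vec = meta_lr * (w_adapt - w)

    # 2) Task-specific (transfer) update vector: plain gradient
    #    descent direction on the full base-training dataset.
    transfer_vec = -lr * grad_mse(w, X_full, y_full)

    # 3) Weighted average of the two update vectors.
    w += alpha * meta_vec + (1 - alpha) * transfer_vec
```

At adaptation time, the meta-trained `w` would simply be fine-tuned on the new task with ordinary SGD, as described above.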
The proposed model is evaluated on the miniImageNet dataset [Vinyals et al.2016], split into 64 training classes and 36 test classes that serve as unseen tasks. The architecture of the model is shown in Figure 1, and the results are reported in Table 1. The base model in the transfer learning baseline is trained on all 64 training classes of miniImageNet.
Note that for many-class (35-way) tasks, the transfer learning baseline outperforms previous meta-learning algorithms, while for few-class problems the result is reversed: meta-learning beats transfer learning. Our proposed method, MTL, outperforms both of these approaches in all scenarios by addressing the weakness of few-shot learning algorithms in generalizing to many-shot and many-class problems.
Conclusion and Future Work
Having a single general model that is adaptable to many new unseen tasks is a crucial component of artificial intelligence. In this work, we presented a method that extends the capability of few-shot learning algorithms to many-shot and many-class learning problems by integrating them with a transfer learning model. The next step is to apply this approach to a larger dataset and a deeper model, to see whether meta-learning still outperforms transfer learning. Moreover, the effectiveness of this approach can be evaluated on other tasks such as object detection and segmentation. The source code of this work is available online.
- [Finn et al.2017] Finn, C.; Abbeel, P.; and Levine, S. 2017. Model-agnostic meta-learning for fast adaptation of deep networks. CoRR abs/1703.03400.
- [Nichol et al.2018] Nichol, A.; Achiam, J.; and Schulman, J. 2018. On first-order meta-learning algorithms. CoRR abs/1803.02999.
- [Pan et al.2010] Pan, S. J., and Yang, Q. 2010. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 22(10):1345–1359.
- [Ravi et al.2016] Ravi, S., and Larochelle, H. 2016. Optimization as a model for few-shot learning. ICLR.
- [Schmidhuber1987] Schmidhuber, J. 1987. Evolutionary principles in self-referential learning. On learning how to learn: The meta-meta-meta…-hook. Diploma thesis, Technische Universität München, Germany.
- [Snell et al.2017] Snell, J.; Swersky, K.; and Zemel, R. S. 2017. Prototypical networks for few-shot learning. CoRR abs/1703.05175.
- [Vinyals et al.2016] Vinyals, O.; Blundell, C.; Lillicrap, T. P.; Kavukcuoglu, K.; and Wierstra, D. 2016. Matching networks for one shot learning. CoRR abs/1606.04080.
- [Yosinski et al.2014] Yosinski, J.; Clune, J.; Bengio, Y.; and Lipson, H. 2014. How transferable are features in deep neural networks? CoRR abs/1411.1792.