1 Introduction
Meta-learning algorithms for neural networks [18, 8, 29] prepare networks to quickly adapt to unseen tasks. This is done in a meta-training phase that typically involves a large number of supervised learning tasks. Very recently, several approaches have been proposed that perform the meta-training by generating synthetic training tasks from an
unsupervised dataset. This requires us to generate samples with specific pairwise information: in-class pairs of samples that are, with high likelihood, in the same class, and out-of-class pairs that are, with high likelihood, not in the same class. For instance, UMTRA [12] and AAL [1] achieve this through random selection from a domain with many classes for out-of-class pairs and through augmentation for in-class pairs. CACTUs [10] creates synthetic labels through unsupervised clustering of the domain. Unfortunately, these algorithms depend on domain-specific expertise to choose the appropriate clustering and augmentation techniques.

In this paper, we rely on recent advances in the field of generative models, such as variants of generative adversarial networks (GANs) and variational autoencoders (VAEs), to generate the in-class and out-of-class pairs of meta-training data. The fundamental idea of our approach is that in-class pairs are close, while out-of-class pairs are far apart, in the latent space representation of the generative model. Thus, we can generate in-class pairs by interpolating between two out-of-class samples in the latent space and choosing interpolation ratios that put the new sample close to one of the endpoints. From this latent sample, the generative model creates the new in-class object. Our approach requires minimal domain-specific tweaking, and the necessary tweaks are human-comprehensible. For instance, we need to choose thresholds for latent space distance that ensure that classes are in different regions of the latent space, as well as interpolation ratio thresholds that ensure that the sample is in the same class as the nearest endpoint. Another advantage of the approach is that we can take advantage of off-the-shelf, pretrained generative models.
The main contributions of this paper can be summarized as follows:

We describe an algorithm, LAtent Space Interpolation Unsupervised Meta-learning (LASIUM), that creates training data for a downstream meta-learning algorithm, starting from an unlabeled dataset, by taking advantage of interpolation in the latent space of a generative model.

We show that on the most widely used few-shot learning datasets, LASIUM outperforms or performs competitively with other unsupervised meta-learning algorithms, significantly outperforms transfer learning in all cases, and in a number of cases approaches the performance of supervised meta-learning algorithms.
2 Related Work
Meta-learning or “learning to learn” in the field of neural networks is an umbrella term that covers a variety of techniques that involve training a neural network over the course of a meta-training phase such that, when presented with the target task, the network is able to learn it much more efficiently than an unprepared network would. Such techniques have been proposed since the 1980s [27, 2, 19, 30]. In recent years, meta-learning has seen a resurgence through approaches that either “learn to optimize” [8, 24, 17, 20, 26, 22] or learn embedding functions in a non-parametric setting [29, 32, 25, 14]. Hybrids between these two approaches have also been proposed [31, 33].
Most approaches use labeled data during the meta-learning phase. While in some domains there is an abundance of labeled datasets, in many domains such labeled data is difficult to acquire. Unsupervised meta-learning approaches aim to learn from an unsupervised dataset drawn from a domain similar to that of the target task. Typically, these approaches generate synthetic few-shot learning tasks for the meta-learning phase through a variety of techniques. CACTUs [10] uses a progressive clustering method. UMTRA [12] utilizes the statistical diversity properties of the domain and domain-specific augmentations to generate the synthetic training and validation data. AAL [1] uses augmentation of the unlabeled training set to generate the validation data. The accuracy of these approaches was shown to be comparable with, but lower than, that of supervised meta-learning approaches, with the advantage of requiring orders of magnitude less labeled training data. A common weakness of these approaches is that the techniques used to generate the synthetic tasks (clustering, augmentation, random sampling) are highly domain dependent.
Our proposed approach, LASIUM, takes advantage of generative models trained on the specific domain to create the in-class and out-of-class pairs of meta-training data. The most successful neural-network-based generative models in recent years are variational autoencoders (VAEs) [5] and generative adversarial networks (GANs) [9]. The implementation variants of the LASIUM algorithm described in this paper rely on the original VAE model and on two specific variations of the GAN concept, respectively. MSGAN (also known as Miss-GAN) [16] aims to solve the missing mode problem of conditional GANs through a regularization term that maximizes the distance between the generated images with respect to the distance between their corresponding input latent codes. Progressive GANs [11] grow both the generator and the discriminator progressively, an approach resembling the layer-wise training of autoencoders.
3 Method
3.1 Preliminaries
We define an N-way, K-shot supervised classification task, T, as a set composed of N × K data points (x, y) such that there are exactly K samples for each of the N categorical labels. During meta-learning, an additional validation set, V, is attached to each task; it contains another N × K_v data points, separate from the ones in T, with exactly K_v samples for each class.
It is straightforward to package N-way, K-shot tasks, with their sets T and V, from a labeled dataset. However, in the unsupervised meta-learning setting, a key challenge is how to automatically construct tasks from an unlabeled dataset U.
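To make the task structure concrete, the following sketch packages an N-way, K-shot task (with a K_v-sample-per-class validation set) from a labeled dataset. The function name and signature are our own illustration, not part of the paper:

```python
import numpy as np

# Hypothetical helper illustrating the "straightforward" packaging of an
# N-way, K-shot task from a labeled dataset.
def make_task(images, labels, n_way=5, k_shot=1, k_val=5, seed=0):
    rng = np.random.default_rng(seed)
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    train_x, train_y, val_x, val_y = [], [], [], []
    for new_label, c in enumerate(classes):
        # draw k_shot + k_val distinct samples of class c
        idx = rng.permutation(np.flatnonzero(labels == c))[:k_shot + k_val]
        train_x.append(images[idx[:k_shot]])
        train_y += [new_label] * k_shot
        val_x.append(images[idx[k_shot:]])
        val_y += [new_label] * k_val
    return (np.concatenate(train_x), np.array(train_y),
            np.concatenate(val_x), np.array(val_y))
```

The unsupervised setting removes the `labels` argument, which is exactly the gap the rest of this section fills.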
3.2 Generating meta-tasks using generative models
We have seen that in order to generate the training data for the meta-learning phase, we need to generate N-way training tasks with K training and K_v validation samples per class. The label associated with the classes in these tasks is not relevant, as it will be discarded after the meta-learning phase. Our objective is simply to generate samples with the following properties: (a) all the samples are different; (b) any two samples assigned the same class index are in-class samples; and (c) any two samples assigned different class indices are out-of-class samples. In the absence of human-provided labels, the class structure of the domain is defined only implicitly by the sample selection procedure. Previous approaches to unsupervised meta-learning chose samples directly from the unlabeled training data U, or created new samples through augmentation. For instance, we can define the class structure of the domain by assuming that certain types of augmentations keep the samples in-class with the original sample. One challenge of such approaches is that the choice of the augmentation is domain dependent, and the augmentation itself can be a complex mathematical operation.
In this paper we approach the sample selection problem differently. Instead of sampling from U, we use the unsupervised dataset to train a generative model G. Generative models represent the full probability distribution of a domain and allow us to sample new instances from that distribution. For many models, this sampling can be a computationally expensive iterative process. Many successful neural-network-based generative models use the reparametrization trick for training and sampling, which concentrates the random component of the model in a latent representation z. By choosing the latent representation z from a simple (uniform or normal) distribution, we can obtain a sample from the complex domain distribution by passing z through a deterministic generator G(z). Two of the most popular generative models, variational autoencoders (VAEs) and generative adversarial networks (GANs), follow this model.

The idea of the LASIUM algorithm is that, given a generator G, nearby latent space values z1 and z2 map to in-class samples G(z1) and G(z2). Conversely, z1 and z2 values that are far away from each other map to out-of-class samples. Naturally, we still need to define what we mean by “near” and “far” in the latent space and how to choose the corresponding z values. However, this is a significantly simpler task than, for instance, defining the set of complex augmentations that might retain class membership.
Training a generative model: Our method for generating meta-tasks is agnostic to the choice of training algorithm for the generative model and can use either a VAE or a GAN with minimal adjustments. In our VAE experiments, we used a network trained with the standard VAE training algorithm [5]. For the experiments with GANs we used two different methods: mode-seeking GANs (MSGAN) [16] and progressive growing of GANs (ProGAN) [11].
Algorithm 1 describes the steps of our method. We will delve into each step in the following parts of this section.
Sampling out-of-class instances from the latent space representation: Our sampling techniques differ slightly depending on whether we are using a GAN or a VAE. For a GAN, we use rejection sampling to find N latent space vectors that are at a pairwise distance of at least a threshold (see Figure 1(a)). When using a VAE, we also have an encoder network that allows us to map from the domain to the latent space. Taking advantage of this, we can additionally sample data points from our unlabeled dataset and embed them into the latent space. If the latent space representations of these images are too close to each other, we resample; otherwise we use the images and their representations and continue the following steps exactly as in the GAN case (see Figure 2(a) and (b)). We will refer to the vectors selected here as anchor vectors.
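The GAN-case anchor selection can be sketched as a simple rejection-sampling loop. The function and parameter names below are our own, and the distance threshold is left as a free parameter since its value is domain dependent:

```python
import numpy as np

# Rejection-sample n_way latent anchors (z ~ N(0, I)) whose pairwise
# Euclidean distances all exceed `threshold`.
def sample_anchors(n_way, latent_dim, threshold, seed=0, max_tries=1000):
    rng = np.random.default_rng(seed)
    for _ in range(max_tries):
        z = rng.standard_normal((n_way, latent_dim))
        # pairwise distance matrix between candidate anchors
        dists = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
        if dists[np.triu_indices(n_way, k=1)].min() >= threshold:
            return z  # every pair is at least `threshold` apart
    raise RuntimeError("could not satisfy the distance threshold")
```

In a high-dimensional latent space, random Gaussian vectors are almost always far apart, so for reasonable thresholds the loop typically succeeds on the first try.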
Generating in-class latent space vectors: Next, having sampled the N anchor vectors from the latent space, we aim to generate, for each anchor, additional latent vectors such that the images generated from them belong to the same class as the image generated from the anchor. This process is repeated for each of the N anchors.
The sampling strategy takes as input the N sampled anchor vectors and a number of vectors to generate, and returns new latent vectors such that each new vector forms an in-class pair with its anchor. This ensures that no two anchors belong to the same class, and creates N groups of K + K_v vectors in our latent space. We feed these vectors to our generator to obtain N groups of images. From each group we pick the first K images for the training set T and the last K_v images for the validation set V.
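The decode-and-split step above can be written in a few lines. This is a sketch under our own naming; `generator` stands in for the decoder of whichever generative model is used:

```python
import numpy as np

# Decode each class's latent group with the generator, then split the
# first k_shot images per class into the training set and the last
# k_val images into the validation set.
def build_task_images(class_vectors, generator, k_shot, k_val):
    n_way, group_size, dim = class_vectors.shape
    assert group_size == k_shot + k_val
    images = generator(class_vectors.reshape(-1, dim))
    images = images.reshape(n_way, group_size, *images.shape[1:])
    return images[:, :k_shot], images[:, k_shot:]
```

Note that the class labels of the resulting task are just the group indices 0..N-1; as the text explains, they are discarded after meta-training.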
What remains is to define the strategy used to sample the individual in-class vectors. We propose three different sampling strategies, all of which can be seen as variations of the idea of latent space interpolation sampling. This motivates the name of the algorithm: LAtent Space Interpolation Unsupervised Meta-learning (LASIUM).
LASIUM-N (adding Noise): This technique generates in-class samples by adding Gaussian noise with scale σ to the anchor vector (see Figure 3-Left). In the context of LASIUM, we can see this as an interpolation between the anchor vector and a noise vector, with the interpolation factor determined by σ. For the impact of different choices of σ, see the ablation study in Section 4.6.
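A minimal sketch of the noise-based strategy, with our own variable names; σ is the single hyperparameter:

```python
import numpy as np

# LASIUM-N sketch: perturb each anchor with Gaussian noise of scale
# sigma to obtain in-class latent vectors.
def lasium_n(anchors, num_per_class, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n_way, dim = anchors.shape
    noise = sigma * rng.standard_normal((n_way, num_per_class, dim))
    return anchors[:, None, :] + noise  # (n_way, num_per_class, dim)
```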
LASIUM-RO (with Random Out-of-class samples): To generate a new in-class sample for anchor vector z, we first pick a random out-of-class vector v, and choose an interpolated version closer to the anchor: z + α(v − z) (see Figure 3-Middle). Here, α is a hyperparameter which can be tuned to define the size of the class. As we are in a comparatively high-dimensional latent space (in our case, 512 dimensions), we need relatively large values of α to define classes of reasonable size. This model effectively allows us to define complex augmentations (such as a person seen without glasses, or in changed lighting) with only one scalar hyperparameter to tune. By interpolating towards another sample, we ensure that we stay on the manifold that defines the dataset (in the case of Figure 3, human faces).

LASIUM-OC (with Other Classes' samples): This technique is similar to LASIUM-RO, but instead of using a randomly generated out-of-class vector, we interpolate towards the vectors already chosen for the other classes in the same task (see Figure 3-Right). This limits the selection of the samples to the convex hull defined by the anchor vectors. The intuition behind this approach is that choosing the samples this way focuses the attention of the meta-learner on the hard-to-distinguish samples that lie between the classes of the few-shot learning task (e.g. they share certain attributes).
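The two interpolation-based strategies can be sketched as follows. The formula z + α(v − z) is our reading of the text (α moves the new vector a fraction of the way from the anchor toward v), and all names are ours:

```python
import numpy as np

def lasium_ro(anchors, alpha=0.4, seed=0):
    # interpolate each anchor toward a fresh random out-of-class vector
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(anchors.shape)
    return anchors + alpha * (v - anchors)

def lasium_oc(anchors, alpha=0.4):
    # interpolate each anchor toward the anchors of the other classes
    n_way = anchors.shape[0]
    out = []
    for i in range(n_way):
        row = [anchors[i] + alpha * (anchors[j] - anchors[i])
               for j in range(n_way) if j != i]
        out.append(row)
    return np.asarray(out)  # (n_way, n_way - 1, dim)
```

With α < 0.5, every generated vector stays strictly closer to its own anchor than to the vector it was interpolated toward, which is what keeps it in-class.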
4 Experiments
We tested the proposed algorithms on four few-shot learning benchmarks: (a) N-way Omniglot [13], a benchmark for few-shot handwritten character recognition; (b) N-way CelebA few-shot identity recognition; (c) the CelebA attributes dataset [15], proposed as a few-shot learning benchmark by [10], which comprises binary classification (2-way) tasks in which each task is defined by selecting 3 different attributes and 3 boolean values corresponding to each attribute. Every image in a given task-specific class shares these attribute values, while sharing none of them with images in the other class. Last but not least, we evaluate our results on (d) the miniImageNet [23] few-shot learning benchmark.

We partition each dataset into meta-training, meta-validation, and meta-testing splits between classes. To evaluate our method, we use the classes in the test set to generate 1000 tasks as described in Section 3.2, and set the number of evaluation samples per class to 15. We average the accuracy over all tasks and report a confidence interval. To ensure that comparisons are fair, we use the same random seed in the whole task generation process. For the Omniglot dataset, we report results for several (N, K) configurations. For CelebA identity recognition, we report our results for several values of K. For CelebA attributes, we follow the 2-way tasks as proposed by [10].
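The reported numbers could be computed as below: the mean accuracy over the 1000 evaluation tasks together with a 95% confidence interval half-width under the usual normal approximation (we assume this is the interval the tables report):

```python
import numpy as np

# Mean task accuracy and 95% confidence interval half-width
# (normal approximation, as is customary in few-shot learning papers).
def mean_and_ci95(task_accuracies):
    a = np.asarray(task_accuracies, dtype=float)
    mean = a.mean()
    half_width = 1.96 * a.std(ddof=1) / np.sqrt(len(a))
    return mean, half_width
```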
4.1 Baselines
As baseline algorithms for our approach, we follow the practice of recent papers in the unsupervised meta-learning literature. The simplest baseline is to train the same network architecture from scratch using only the task's training images. More advanced baselines can be obtained by learning an unsupervised embedding on the unlabeled dataset U and using it for downstream task training. We used ACAI [3], BiGAN [6, 7], and DeepCluster [4] as representatives of the unsupervised learning literature. On top of these embeddings, we report accuracy for K-nearest neighbors, a linear classifier, a multi-layer perceptron (MLP) with dropout, and cluster matching.
The direct competition for our approach are the current state-of-the-art algorithms in unsupervised meta-learning. We compare our results with CACTUs-MAML [10], CACTUs-ProtoNets [10], and UMTRA [12]. Finally, it is useful to compare our approach with algorithms that require supervised data. We include results for standard supervised transfer learning from VGG19 pretrained on ImageNet [28] and two supervised meta-learning algorithms, MAML [10] and ProtoNets [10].
4.2 Neural network architectures
Since excessive tuning of hyperparameters can lead to overestimation of the performance of a model [21], we keep the hyperparameters of the unsupervised meta-learning as constant as possible (including the MAML and ProtoNets model architectures) in all experiments. Our model architecture consists of four stacked convolutional blocks. Each block comprises 64 filters that carry out convolutions, followed by batch normalization, a ReLU nonlinearity, and max-pooling. For the MAML experiments, classification is performed by a fully connected layer, whereas for the ProtoNets model we compute distances based on the feature vectors produced by the last convolution module, without any dense layers. The input sizes to our model for CelebA and for Omniglot are listed in Tables 5 and 6.

For Omniglot, our VAE model is constructed symmetrically. The encoder is composed of four convolutional blocks, with batch normalization and ReLU activation following each of them. A dense layer is connected at the end such that, given an input image, the encoder produces a latent vector. On the other side, the decoder starts from a dense layer whose output is fed into four modules, each of which consists of a transposed convolutional layer, batch normalization, and the ReLU nonlinearity. We use the same kernel size, channel count, and stride for all the convolutional and transposed convolutional layers, so that the generated image has a size identical to that of the input images. This VAE model is trained for 1000 epochs with a learning rate of 0.001.
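The spatial size of the backbone's output can be traced through the four blocks. The sketch below assumes same-padded convolutions and 2×2 max-pooling, the conventional choices for this backbone (the paper's exact kernel sizes were not preserved in extraction), and the example input sizes of 28×28 for Omniglot and 84×84 for CelebA are likewise our assumption:

```python
# Feature-map shape after the standard four-block few-shot backbone:
# each block is a same-padded conv with 64 filters, batch norm, ReLU,
# then 2x2 max-pooling (which halves the spatial size, rounding down).
def backbone_output_shape(height, width, blocks=4, filters=64):
    for _ in range(blocks):
        height, width = height // 2, width // 2
    return height, width, filters
```

Under these assumptions, a 28×28 input yields a 1×1×64 feature map and an 84×84 input a 5×5×64 one, which is why ProtoNets can use the flattened output directly as an embedding.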
Our GAN generator takes an input whose size is the dimensionality of the latent space and feeds it into a dense layer. After applying a Leaky ReLU, we reshape the output of the dense layer to 128 channels. Then we feed it into two upsampling blocks, where each block has a transposed convolution with 128 channels. Finally, we feed the outcome of the upsampling blocks into a convolution layer with 1 channel and a sigmoid activation. The discriminator takes an input image and feeds it into three convolution layers with 64, 128, and 128 channels, applying a Leaky ReLU activation after each convolution layer. Finally, we apply a global 2D max-pooling layer and feed it into a dense layer with 1 neuron to classify the output as real or fake. We use the same loss function for training as described in [16].

For the CelebA GAN experiments, we use the pretrained network architecture described in [11]. For the VAE, we use the same architecture as described for the Omniglot VAE, with one more convolution block and more channels to handle the larger input size. The exact architecture is described in section 4.6.
4.3 Results on Omniglot
Table 1 shows the results on the Omniglot dataset. We find that the LASIUM-RO-GAN-MAML configuration outperforms all the unsupervised approaches, including the meta-learning based ones like CACTUs [10] and UMTRA [12]. Beyond the increase in performance, we must note that the competing approaches use more domain-specific knowledge (in the case of UMTRA, augmentations; in the case of CACTUs, learned clustering). We also find that on this benchmark, LASIUM outperforms transfer learning using the much larger VGG19 network.

As expected, even the best LASIUM result is worse than the supervised meta-learning models. However, we need to consider that the unsupervised meta-learning approaches use several orders of magnitude fewer labels. For instance, the 95.29% accuracy of LASIUM-RO-GAN-MAML was obtained with only 25 labels, while the supervised approaches used 25,000.
Algorithm  Feature Extractor  K = 1  K = 5 

Training from scratch  
K-nearest neighbors  ACAI  
Linear Classifier  ACAI  
MLP with dropout  ACAI  
Cluster matching  ACAI  
K-nearest neighbors  BiGAN  
Linear Classifier  BiGAN  
MLP with dropout  BiGAN  
Cluster matching  BiGAN  
CACTUs-MAML  BiGAN  
CACTUs-MAML  ACAI  
UMTRA-MAML  
LASIUM-RO-GAN-MAML  
LASIUM-N-VAE-MAML  
CACTUs-ProtoNets  BiGAN  
CACTUs-ProtoNets  ACAI  
LASIUM-RO-GAN-ProtoNets  
LASIUM-OC-VAE-ProtoNets  
Transfer Learning (VGG19)  
Supervised MAML  
Supervised ProtoNets 
4.4 Results on CelebA
Table 2 shows our results on the CelebA identity recognition tasks, where the objective is to recognize N different people given K images of each. We find that on this benchmark as well, the LASIUM-RO-GAN-MAML configuration performs better than the other unsupervised meta-learning models as well as transfer learning with VGG19; it only falls slightly behind LASIUM-RO-GAN-ProtoNets in the one-shot case. As we have discussed in the case of the Omniglot results, the performance remains lower than that of the supervised meta-learning approaches, which use several orders of magnitude more labeled data.

Finally, Table 3 shows our results for the CelebA attributes benchmark introduced in [10]. A peculiarity of this dataset is that, due to the way in which classes are defined based on the attributes, the classes are unbalanced in the dataset, making the job of synthetic task selection more difficult. We find that LASIUM-N-GAN-MAML obtains the second best result on this test, within the confidence interval of the winner, CACTUs-MAML with BiGAN. In this benchmark, transfer learning with the VGG19 network performed better than all unsupervised meta-learning approaches, possibly due to existing representations of the discriminating attributes in that much more complex network.
Algorithm  K = 1  K = 5  K = 15 

Training from scratch  
CACTUs  
UMTRA  
LASIUM-RO-GAN-MAML  
LASIUM-RO-VAE-MAML  
LASIUM-RO-GAN-ProtoNets  
LASIUM-RO-VAE-ProtoNets  
Transfer Learning (VGG19)  
Supervised MAML  
Supervised ProtoNets 
Algorithm  Feature Extractor  Accuracy 

Training from scratch  N/A  
K-nearest neighbors  BiGAN  
Linear Classifier  BiGAN  
MLP with dropout  BiGAN  
Cluster matching  BiGAN  
K-nearest neighbors  DeepCluster  
Linear Classifier  DeepCluster  
MLP with dropout  DeepCluster  
Cluster matching  DeepCluster  
CACTUs-MAML  BiGAN  
CACTUs-MAML  DeepCluster  
LASIUM-N-GAN-MAML  N/A  
CACTUs-ProtoNets  BiGAN  
CACTUs-ProtoNets  DeepCluster  
LASIUM-N-GAN-ProtoNets  N/A  
Transfer Learning (VGG19)  N/A  
Supervised MAML  N/A  
Supervised ProtoNets  N/A 
4.5 Results on miniImageNet
In this section, we evaluate our algorithm on the miniImageNet benchmark. Its complexity is high due to the use of ImageNet images. In total, there are 100 classes with 600 color image samples per class. These 100 classes are divided into 64, 16, and 20 classes for sampling meta-training, meta-validation, and meta-test tasks, respectively. A big difference between miniImageNet and CelebA is that we have to classify a group of concepts instead of just the identity of a subject. This makes interpreting the latent space trickier: for example, it is not meaningful to interpolate between a bird and a piano. However, the assumption that nearby latent vectors belong to instances of the same class is still valid. Thus, we can be confident that by not moving too far from the current latent vector, we generate something which belongs to the same class.
For miniImageNet we use a pretrained BigBiGAN network (https://tfhub.dev/deepmind/bigbigan-resnet50/1). Our experiments show that our method is very effective and can outperform state-of-the-art algorithms. See Table 4 for the results on the miniImageNet benchmark. Figure 4 shows tasks constructed for miniImageNet by LASIUM-N.
Algorithm  Embedding  K = 1  K = 5  K = 20  K = 50 
Training from scratch  N/A  
K-nearest neighbors  BiGAN  
Linear Classifier  BiGAN  
MLP with dropout  BiGAN  
Cluster matching  BiGAN  
K-nearest neighbors  DeepCluster  
Linear Classifier  DeepCluster  
MLP with dropout  DeepCluster  
Cluster matching  DeepCluster  
CACTUs-MAML  BiGAN  
CACTUs-MAML  DeepCluster  
UMTRA-MAML  N/A  
LASIUM-N-GAN-MAML  N/A  
CACTUs-ProtoNets  BiGAN  
CACTUs-ProtoNets  DeepCluster  
LASIUM-N-GAN-ProtoNets  N/A  
Supervised MAML  N/A  
Supervised ProtoNets  N/A  
4.6 Hyperparameters and ablation studies
In this section, we report the hyperparameters of LASIUM-MAML in Table 5 and of LASIUM-ProtoNets in Table 6 for the Omniglot, CelebA, CelebA attributes, and miniImageNet datasets.
We also report ablation studies on different strategies for task construction in Table 7. We ran all the algorithms for just 1000 iterations and compared them. We also apply a small random shift to Omniglot images.
Hyperparameter  Omniglot  CelebA  CelebA attributes  miniImageNet 
Number of classes  
Input size  
Inner learning rate  0.4  0.05  0.05  0.05 
Meta learning rate  0.001  0.001  0.001  0.001 
Metabatch size  4  4  4  4 
K (meta-learning)  1  1  5  1 
K_v (meta-learning)  5  5  5  5 
K_v (evaluation)  15  15  5  15 
Metaadaptation steps  5  5  5  5 
Evaluation adaptation steps  50  50  50  50 
Hyperparameter  Omniglot  CelebA  CelebA attributes  miniImageNet 

Number of classes  
Input size  
Meta learning rate  0.001  0.001  0.001  0.001 
Metabatch size  4  4  4  4 
K (meta-learning)  1  1  5  1 
K_v (meta-learning)  5  5  5  5 
K_v (evaluation)  15  15  5  15 
Sampling Strategy  Hyperparameters  GAN-MAML  VAE-MAML  GAN-Proto  VAE-Proto 
LASIUM-N  σ = 0.5  
LASIUM-N  σ = 1.0  
LASIUM-N  σ = 2.0  
LASIUM-RO  α = 0.2  
LASIUM-RO  α = 0.4  
LASIUM-OC  α = 0.2  
LASIUM-OC  α = 0.4  
5 Conclusion
We described LASIUM, an unsupervised meta-learning algorithm for few-shot classification. The algorithm is based on interpolation in the latent space of a generative model to create synthetic meta-tasks. In contrast to other approaches, LASIUM requires minimal domain-specific knowledge. We found that LASIUM outperforms state-of-the-art unsupervised algorithms on the Omniglot and CelebA identity recognition benchmarks and competes very closely with CACTUs on the CelebA attributes learning benchmark.
6 Acknowledgements
This work was supported in part by the National Science Foundation under Grant Number IIS-1409823.
References
 [1] (2019) Assume, augment and learn: unsupervised few-shot meta-learning via random labels and data augmentation. arXiv preprint arXiv:1902.09884. Cited by: §1, §2.
 [2] (1990) Learning a synaptic learning rule. Université de Montréal, Département d’Informatique et de Recherche Opérationelle. Cited by: §2.
 [3] (2019) Understanding and improving interpolation in autoencoders via an adversarial regularizer. In Int’l Conf. on Learning Representations (ICLR), Cited by: §4.1.

 [4] (2018) Deep clustering for unsupervised learning of visual features. In Proc. of the European Conf. on Computer Vision (ECCV), pp. 132–149. Cited by: §4.1.
 [5] (2014) Auto-encoding variational Bayes. In Proc. of the Int’l Conf. on Learning Representations (ICLR), Vol. 1. Cited by: §2, §3.2.
 [6] (2017) Adversarial feature learning. In Int’l Conf. on Learning Representations (ICLR), Cited by: §4.1.
 [7] (2017) Adversarially learned inference. In Int’l Conf. on Learning Representations (ICLR), Cited by: §4.1.

 [8] (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In Proc. of Int’l Conf. on Machine Learning (ICML), pp. 1126–1135. Cited by: §1, §2, §4.
 [9] (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems (NeurIPS), pp. 2672–2680. Cited by: §2.
 [10] (2019) Unsupervised learning via meta-learning. In Int’l Conf. on Learning Representations (ICLR), Cited by: §1, §2, §4.1, §4.3, §4.4, §4.
 [11] (2018) Progressive growing of GANs for improved quality, stability, and variation. Proc. of the Int’l Conf. on Learning Representations (ICLR). Cited by: §2, §3.2, §4.2.
 [12] (2019) Unsupervised meta-learning for few-shot image classification. In Advances in Neural Information Processing Systems (NeurIPS), pp. 10132–10142. Cited by: §1, §2, §4.1, §4.3.
 [13] (2011) One-shot learning of simple visual concepts. In Proc. of the Annual Meeting of the Cognitive Science Society, Vol. 33. Cited by: §4.
 [14] (2019) Learning to propagate labels: transductive propagation network for few-shot learning. In Int’l Conf. on Learning Representations (ICLR), Cited by: §2.
 [15] (2015) Deep learning face attributes in the wild. In Proc. of Int’l Conf. on Computer Vision (ICCV), Cited by: §4.

 [16] (2019) Mode seeking generative adversarial networks for diverse image synthesis. Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1429–1437. Cited by: §2, §3.2, §4.2.
 [17] (2017) Meta-learning with temporal convolutions. arXiv preprint arXiv:1707.03141. Cited by: §2.
 [18] (2018) A Simple Neural Attentive Meta-Learner. In Int’l Conf. on Learning Representations (ICLR), Cited by: §1.
 [19] (1992) Meta-neural networks that learn by learning. In [Proc. 1992] IJCNN Int’l Joint Conf. on Neural Networks, Vol. 1, pp. 437–442. Cited by: §2.
 [20] (2018) On first-order meta-learning algorithms. arXiv preprint arXiv:1803.02999. Cited by: §2.

 [21] (2018) Realistic evaluation of deep semi-supervised learning algorithms. In Advances in Neural Information Processing Systems (NeurIPS), pp. 3235–3246. Cited by: §4.2.
 [22] (2019) Meta-learning with implicit gradients. In Advances in Neural Information Processing Systems (NeurIPS), pp. 113–124. Cited by: §2.
 [23] (2016) Optimization as a model for few-shot learning. Proc. of Int’l Conf. on Learning Representations (ICLR). Cited by: §4.
 [24] (2016) Optimization as a model for few-shot learning. Int’l Conf. on Learning Representations (ICLR). Cited by: §2.
 [25] (2018) Meta-Learning for Semi-Supervised Few-Shot Classification. In Int’l Conf. on Learning Representations (ICLR), Cited by: §2.
 [26] (2019) Meta-Learning with Latent Embedding Optimization. In Int’l Conf. on Learning Representations (ICLR), Cited by: §2.
 [27] (1987) Evolutionary principles in self-referential learning, or on learning how to learn: the meta-meta… hook. Ph.D. Thesis, Technische Universität München. Cited by: §2.
 [28] (2015) Very deep convolutional networks for large-scale image recognition. Int’l Conf. on Learning Representations (ICLR). Cited by: §4.1.
 [29] (2017) Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems (NeurIPS), pp. 4077–4087. Cited by: §1, §2.
 [30] (1998) Learning to learn. Kluwer Academic Publishers. Cited by: §2.
 [31] (2020) Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples. In Int’l Conf. on Learning Representations (ICLR), Cited by: §2.
 [32] (2016) Matching networks for one shot learning. In Advances in Neural Information Processing Systems (NeurIPS), pp. 3630–3638. Cited by: §2.
 [33] (2019) A hybrid approach with optimization-based and metric-based meta-learner for few-shot learning. Neurocomputing 349, pp. 202–211. Cited by: §2.