1 Introduction
The hallmark of human intelligence is the ability to learn new concepts from very few examples. Though constantly pushing the limits forward in various visual tasks, current deep learning approaches struggle when abundant training data is impractical to gather. A straightforward idea for learning new concepts is to fine-tune a model pretrained on base categories using limited data from another set of novel categories. However, this usually leads to catastrophic forgetting [1], i.e., fine-tuning makes the model overfit the novel classes while becoming agnostic to the majority of base classes [2, 3], deteriorating overall performance.
One way to address this problem is to augment data for the novel classes. Since generating images can be both unnecessary [4] and impractical [5] on large datasets, feature augmentation [6, 7] is preferable in this scenario. Building upon learned representations [8, 9, 10], two recent variants of generative models show promising capability of learning variation modes from base classes to imagine the missing patterns of novel classes. Hariharan and Girshick proposed Feature Hallucination (FH) [11], which learns a finite set of transformation mappings between examples in each base category and directly applies them to seed novel points for extra data. However, since the mappings are enumerable (even if numerous), this model suffers from poor generalization. To address this issue, Wang et al. [12] proposed Feature Imagination (FI), a meta-learning based generation framework that trains an agent to synthesize extra data for a given task. They circumvent the demand for the latent distribution of novel classes by end-to-end optimization, but the generation results usually collapse into certain modes. Finally, it should be noted that both works erroneously assume that the intra-class variances of base classes are sharable with any novel class. For example, the visual variability of the concept lemon cannot be generalized to an irrelevant category such as raccoon.
In this work, we propose a new approach to low-shot learning that enables better feature augmentation beyond current limits. Our approach is novel in two aspects: modeling and training strategy. We propose Covariance-Preserving Adversarial Augmentation Networks (CP-AAN), a new class of Generative Adversarial Networks (GANs) [14, 15] for feature augmentation. We take inspiration from unpaired image-to-image translation [16, 17] and formulate feature augmentation as an imbalanced set-to-set translation problem, where the conditional distribution of examples of each novel class can be conceptually expressed as a mixture of related base classes. We first extract all related base-novel class pairs by an intuitive yet effective approach called Neighborhood Batch Sampling. Our model then aims to learn the latent distribution of each novel class given its base counterparts. Since direct estimation on novel classes can be inductively biased during this process, we explicitly preserve the covariance of base examples during generation.
We systematically evaluate our approach by considering a series of objective functions. Our model achieves state-of-the-art performance on the challenging ImageNet benchmark [18]. With ablation studies, we also demonstrate the effectiveness of each component of our method.
2 Related Work
Low-shot Learning For quick adaptation when very few novel examples are available, the community has often used a meta-agent [19] to further tune base classifiers [8, 9, 10]. Intuitive yet often ignored, feature augmentation was recently brought into the field by Hariharan and Girshick [11] to ease the data-scarce scenario. Compared to traditional meta-learning based approaches, they report noticeable improvement not only in the conventional setting (i.e., testing on novel examples only), but also in the more challenging generalized setting (i.e., testing on all classes). Yet the drawback is that both the original work and its variants [12] fail to synthesize diverse examples because of ill-constrained generation processes. Our approach falls in this line of research while seeking more principled guidance from base examples in a selective, class-specific manner.
Generative Adversarial Networks for Set-to-set Translation GANs [14] map each latent code from an easily sampled prior to a realistic sample of a complex target distribution. Zhu et al. [16] have achieved astounding results on image-to-image translation without any paired training samples. In our case, diverse feature augmentation is feasible through conditional translation given a pair of related novel and base classes. Yet two main challenges remain: first, not all examples are semantically translatable; second, given extremely scarce data for novel classes, we are unable to estimate their latent distributions (see Figure 4). In this work, we thoroughly investigate conditional GAN variants inspired by previous works [5, 15, 17, 20] to enable low-shot generation. Furthermore, we introduce a novel batch sampling technique for learning salient set-to-set mappings using unpaired data with categorical conditions.
Generation from Limited Observations Estimating a latent distribution from a handful of observations is biased and inaccurate [21, 22]. Bayesian approaches model the latent distributions of a variety of classes as a hierarchical Gaussian mixture [23], or alternatively model generation as a sequential decision-making process [24]. For GANs, Gaussian mixture noise has also been incorporated for latent code sampling [25]. Recent works on integral probability metrics [26, 27] provide theoretical guidance towards high-order feature matching. In this paper, building upon the assumption that related classes should have similar intra-class variance, we introduce a new loss term for preserving covariance during the translation process.
3 Imbalanced Set-to-set Translation
In this section, we formulate our low-shot feature augmentation problem under an imbalanced set-to-set translation framework. Concretely, we are given two labeled datasets represented in the same semantic space: (1) a base set $\mathcal{D}^b$ consisting of abundant samples and (2) a novel set $\mathcal{D}^n$ with only a handful of observations. Their discrete label spaces $\mathcal{Y}^b$ and $\mathcal{Y}^n$ are assumed to be non-overlapping, i.e., $\mathcal{Y}^b \cap \mathcal{Y}^n = \emptyset$. Our goal is to learn a mapping function $G$ that translates examples of the base classes into novel categories. After the generation process, a final classifier is trained using both the original examples of the base classes and all (mostly synthesized) examples of the novel classes.
Existing works [11, 12] suffer from the use of arbitrary, and thus possibly unrelated, base classes for feature augmentation. Moreover, their performance is degraded by naive generation methods that do not model the latent distribution of each novel class. Our insight, conversely, is to sample extra features from continuous latent distributions, rather than certain modes from enumerations, by learning a GAN model (see Figure 2).
Specifically, we address two challenges that impede good translation under imbalanced scenarios: (1) through which base-novel class pairs we can translate; and, more fundamentally, (2) through which GAN training objectives we can estimate the latent distribution of novel classes with limited observations. We start by proposing a straightforward batch sampling technique to address the first problem. Then we suggest a simple extension of existing methods and study its weaknesses, which motivates the development of our final approach. For clarity, we introduce a toy dataset for imbalanced set-to-set translation in Figure 3 as a conceptual demonstration of the proposed method compared to baselines.
3.1 Neighborhood Batch Sampling
It is widely acknowledged [28, 8, 9] that a metric-learned high-dimensional space encodes relational semantics between examples. Therefore, to define which base classes are translatable to a novel class, we can rank them by their distance in such a semantic space. For simplicity, we formulate our approach on top of Prototypical Networks [9], learned by a nearest-neighbor classifier on the semantic space under the Euclidean distance. We represent each class $k$ as a cluster and encode its categorical information by the cluster prototype $c_k$:
(1) $c_k = \frac{1}{|\mathcal{D}_k|} \sum_{x \in \mathcal{D}_k} x$
It should be noted that by “prototype” we mean the centroid of all examples of a class. It should not be confused with the centroid of randomly sampled examples that is computed in each episode to train the original Prototypical Networks.
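As a concrete illustration, the class prototypes of Eq. (1) can be computed as centroids with a few lines of NumPy (a minimal sketch; the function name and array layout are ours, not the paper's):

```python
import numpy as np

def class_prototypes(features, labels):
    """Prototype of each class = centroid of all its feature embeddings.

    features: (num_examples, dim) array of embeddings.
    labels:   (num_examples,) array of integer class ids.
    Returns a dict mapping class id -> (dim,) prototype vector.
    """
    protos = {}
    for c in np.unique(labels):
        protos[c] = features[labels == c].mean(axis=0)
    return protos
```

In the full model these prototypes are computed over metric-learned embeddings rather than raw features.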
We introduce a translation mapping $T: \mathcal{Y}^n \to 2^{\mathcal{Y}^b}$, where $2^{\mathcal{Y}^b}$ is the powerset of the collection of all base classes. This defines a many-to-many relationship between novel and base classes, and is used to translate data from selected base classes to each novel class. To this end, given a novel class $y^n$, we compute its similarity scores with all base classes using a softmax over the Euclidean distances between prototypes,
(2) $s(y^b \mid y^n) = \frac{\exp\left(-\| c_{y^n} - c_{y^b} \|_2^2\right)}{\sum_{y' \in \mathcal{Y}^b} \exp\left(-\| c_{y^n} - c_{y'} \|_2^2\right)}$
This results in a soft mapping (NBS-S) between base and novel classes, in which each novel class is paired with all base classes via the soft scores. In practice, translating from all base classes is unnecessary and computationally expensive. Alternatively, we consider a hard version of $T$ based on nearest-neighbor search, where the top-$k$ base classes are selected and treated as equal ($s = 1/k$). This hard mapping (NBS-H) saves memory, but introduces an extra hyperparameter $k$.
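The scoring of Eq. (2) and its hard top-$k$ variant can be sketched as follows (an illustrative NumPy sketch; the helper names `nbs_scores` and `nbs_hard` are ours):

```python
import numpy as np

def nbs_scores(novel_proto, base_protos):
    """Soft NBS (Eq. 2): softmax over negative squared Euclidean
    distances between one novel prototype and all base prototypes."""
    d2 = ((base_protos - novel_proto) ** 2).sum(axis=1)
    logits = -d2
    e = np.exp(logits - logits.max())  # stabilized softmax
    return e / e.sum()

def nbs_hard(novel_proto, base_protos, k):
    """Hard NBS: indices of the k nearest base classes, equally weighted."""
    d2 = ((base_protos - novel_proto) ** 2).sum(axis=1)
    top = np.argsort(d2)[:k]
    return top, np.full(k, 1.0 / k)
```

The soft scores sum to one and decay with squared distance, so nearby base classes dominate the mixture.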
3.2 Adversarial Objective
After constraining our translation process to selected class pairs, we develop a baseline based on the Conditional GAN (cGAN) [15]. A discriminator $D$ is trained to classify real examples as their corresponding novel classes, and to classify synthesized examples as an auxiliary “fake” class [5]. The generator $G$ takes an example from the base classes paired with a novel class $y^n$ via NBS, and aims to fool the discriminator into classifying the generated example as $y^n$ instead of “fake”. More specifically, the adversarial objective can be written as:
(3) $\mathcal{L}_{adv}^{D} = -\mathbb{E}_{x \sim \mathcal{D}_{y^n}}\left[\log D(y^n \mid x)\right] - \mathbb{E}_{x' \sim \mathcal{D}_{T(y^n)}}\left[\log D(\mathrm{fake} \mid G(x', y^n))\right]$

(4) $\mathcal{L}_{adv}^{G} = -\mathbb{E}_{x' \sim \mathcal{D}_{T(y^n)}}\left[\log D(y^n \mid G(x', y^n))\right]$
where $\mathcal{D}_{y^n}$ consists of all novel examples labeled with $y^n$, while $\mathcal{D}_{T(y^n)}$ consists of all base examples labeled with one of the classes in $T(y^n)$.
We train the cGAN by solving the minimax game over the adversarial loss. In this scenario, there is no explicit way to incorporate the base classes' intra-class variance into the generation of new novel examples. Moreover, any mapping that collapses synthesized features onto existing observations yields an optimal solution [14]. These facts lead to the unfavorable generation results shown in Figure 2(b). We next explore different ways to explicitly force the generator to learn the latent conditional distributions.
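For intuition, the fake-class cross-entropy losses above can be sketched in NumPy (a simplified sketch operating directly on logits; in the actual model these would be the outputs of the MLP discriminator, and the function names are ours):

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def d_loss(real_logits, real_labels, fake_logits, n_novel):
    """Discriminator loss: push real novel examples toward their class
    label, and generated examples toward the auxiliary 'fake' class,
    which we place at index n_novel."""
    real_labels = np.asarray(real_labels)
    p_real = softmax(real_logits)
    p_fake = softmax(fake_logits)
    nll_real = -np.log(p_real[np.arange(len(real_labels)), real_labels])
    nll_fake = -np.log(p_fake[:, n_novel])
    return nll_real.mean() + nll_fake.mean()

def g_loss(fake_logits, target_labels):
    """Generator loss: fool D into predicting the target novel class."""
    target_labels = np.asarray(target_labels)
    p = softmax(fake_logits)
    return -np.log(p[np.arange(len(target_labels)), target_labels]).mean()
```

The minimax structure appears here as the generator minimizing the negative log-probability of the class the discriminator is trained to withhold from fakes.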
3.3 Cycle-consistency Objective
A natural idea for preventing modes from being dropped is to apply the cycle-consistency constraint, whose effectiveness has been proven on image-to-image translation tasks [16]. Besides providing extra supervision, it eliminates the demand for paired data, which is impossible to acquire in the low-shot learning setting. We extend this method to our conditional scenario and derive cCycGAN. Specifically, we learn two generators: $G$, which is our main target, and $F$, an auxiliary mapping in the reverse direction that reinforces $G$. We train the generators such that the translation cycle recovers the original embedding in either a forward cycle $F(G(x, y^n, z)) \approx x$ or a backward cycle $G(F(x'), y^n, z) \approx x'$. Our cycle-consistency objective can then be derived as,
(5) $\mathcal{L}_{cyc}^{fwd} = \mathbb{E}_{x \sim \mathcal{D}_{T(y^n)},\, z \sim p(z)}\left[\left\| F(G(x, y^n, z)) - x \right\|_1\right]$

(6) $\mathcal{L}_{cyc}^{bwd} = \mathbb{E}_{x' \sim \mathcal{D}_{y^n},\, z \sim p(z)}\left[\left\| G(F(x'), y^n, z) - x' \right\|_1\right]$
where $z$ is a $d$-dimensional noise vector sampled from a distribution $p(z)$ and injected into $G$'s input, since novel examples lack variability given the very limited amount of data; $p(z)$ is a standard normal distribution $\mathcal{N}(0, I_d)$ for our cCycGAN model. While $F$ is hard to train due to the extremely small data volume, $G$ has more data to learn from and can thus indirectly guide $F$ through its gradient. During our experiments, we found that cycle-consistency is indispensable for stabilizing the training procedure.

Gurumurthy et al. [25] observe that incorporating extra noise from a mixture of Gaussian distributions can yield more diverse results. Hence, we also report a variant called cDeLiGAN, which uses the same objective as cCycGAN, but samples the noise vector from a mixture of $N$ Gaussian distributions,

(7) $p(z) = \sum_{i=1}^{N} \frac{1}{N}\, \mathcal{N}(\mu_i, \sigma_i^2 I_d)$
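The mixture sampling of Eq. (7) can be sketched as follows (a NumPy illustration with our own function name; `mu` and `sigma` would be learnable parameters in the real model):

```python
import numpy as np

rng = np.random.default_rng(0)

def deligan_noise(mu, sigma, batch):
    """Sample z from a uniform mixture of N Gaussians: pick a component
    i uniformly, then reparameterize z = mu_i + sigma_i * eps.

    mu, sigma: (N, d) component means and per-dimension scales.
    Returns a (batch, d) array of noise vectors.
    """
    N, d = mu.shape
    i = rng.integers(0, N, size=batch)      # uniform component choice
    eps = rng.standard_normal((batch, d))   # eps ~ N(0, I_d)
    return mu[i] + sigma[i] * eps
```

The reparameterized form keeps sampling differentiable with respect to the mixture parameters, which is what lets them be learned jointly with the generator.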
We follow the initialization setup of the previous work [25]. Each $\mu_i$ is sampled from a uniform distribution $\mathcal{U}(-1, 1)$. To draw each $z$, we first sample a vector $\epsilon$ from a Gaussian distribution $\mathcal{N}(0, I_d)$, then simply set $z = \mu_i + \sigma_i \epsilon$ for a uniformly chosen component $i$.

3.4 Covariance-preserving Objective
While cycle-consistency alone can transfer some degree of intra-class variance from base classes, we find it rather weak and unreliable, since there are still infinitely many candidate distributions that cannot be discriminated based on limited observations (see Figure 4).
Building upon the assumption that similar classes share similar intra-class variance, one straightforward idea is to penalize the change of “variability” during translation. Hierarchical Bayesian models [23] prescribe each class as a multivariate Gaussian, whose intra-class variability is embedded in a covariance matrix. We generalize this idea and aim to maintain covariance during the translation process, although we model the class distribution by a GAN instead of any prescribed distribution.
To compute the difference between two covariance matrices [26], one typical way is to measure the worst-case distance between them using the Ky Fan $k$-norm, i.e., the sum of the $k$ largest singular values obtained from a truncated SVD, which we denote as $\|\cdot\|_{F_k}$. To this end, we define the pseudo-prototype of each novel class as the centroid of all synthetic samples translated from its related base classes. The covariance distance between a base-novel class pair $(y^b, y^n)$ can then be formulated as,

(8) $d_{cov}(y^b, y^n) = \left\| \Sigma\!\left(\mathcal{D}_{y^b}\right) - \Sigma\!\left(G(\mathcal{D}_{y^b}, y^n, z)\right) \right\|_{F_k}$

where $\Sigma(\cdot)$ denotes the covariance matrix of a set of examples, computed around its (pseudo-)prototype.
Consequently, our covariance-preserving objective can be written as the expectation of the weighted covariance distance under NBS-S,
(9) $\mathcal{L}_{cov} = \mathbb{E}_{y^n \sim \mathcal{Y}^n}\left[ \sum_{y^b \in \mathcal{Y}^b} s(y^b \mid y^n)\, d_{cov}(y^b, y^n) \right]$
Note that, for a matrix $A$, $\|A\|_{F_k}$ is non-differentiable with respect to $A$ itself; thus, in practice, we calculate a subgradient instead. Specifically, we first compute the unitary matrices $U$ and $V$ by truncated SVD [29], and then backpropagate $U V^\top$ for the parameter updates. A proof of correctness is provided in the supplementary material.
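A minimal NumPy sketch of the Ky Fan $k$-norm and its subgradient $U_k V_k^\top$ (illustrative helper names; in the actual implementation an autograd framework would propagate this subgradient automatically):

```python
import numpy as np

def ky_fan_k(A, k):
    """Ky Fan k-norm: sum of the k largest singular values of A."""
    s = np.linalg.svd(A, compute_uv=False)  # singular values, descending
    return s[:k].sum()

def ky_fan_k_subgrad(A, k):
    """Subgradient of the Ky Fan k-norm at A: U_k V_k^T, where U_k and
    V_k hold the top-k left/right singular vectors (truncated SVD)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ Vt[:k, :]
```

For a diagonal matrix with descending positive entries, for example, the subgradient reduces to a diagonal of ones on the retained directions and zeros elsewhere, matching the intuition that only the top-$k$ singular directions receive gradient.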
Finally, we propose our covariance-preserving conditional cycle-GAN, cCovGAN, as:
(10) $\mathcal{L}_{cCovGAN} = \mathcal{L}_{adv} + \lambda_{cyc}\left(\mathcal{L}_{cyc}^{fwd} + \mathcal{L}_{cyc}^{bwd}\right) + \lambda_{cov}\, \mathcal{L}_{cov}$

where $\mathcal{L}_{adv}$ denotes the minimax adversarial terms of Eqs. (3)-(4).
As illustrated in Figure 2(e), preserving covariance information from relevant base classes to a novel class can improve lowshot generation quality. We attribute this empirical result to the interplay of adversarial learning, cycle consistency, and covariance preservation, that respectively lead to realistic generation, semantic consistency, and diversity.
3.5 Training
Following recent works on meta-learning [30, 11, 12], we design a two-stage training procedure. During the “meta-training” phase, we train our generative model with base examples only, mimicking the low-shot scenario it will encounter later. Afterwards, in the “meta-testing” phase, we are given the novel classes as well as their low-shot examples. We use the trained $G$ to augment each novel class until it reaches the average capacity of the base classes. Then we train a classifier as one normally would in a supervised setting, using both real and synthesized data. For this final classifier, we apply the same type used in the original representation learning stage. For example, we use a nearest-neighbor classifier for embeddings from Prototypical Networks, and a standard linear classifier for those from ResNets.
We follow the episodic procedure of [12] during meta-training. In each episode, we sample $n$ “meta-novel” classes from $\mathcal{Y}^b$ and use the rest of $\mathcal{Y}^b$ as “meta-base” classes. Then we sample $m$ examples from each meta-novel class as meta-novel examples. We compute prototypes of each class and similarity scores between each meta-novel and meta-base class. To sample a batch of size $B$, we first include all meta-novel examples, and then sample examples uniformly from the meta-base classes retrieved by the translation mapping $T$. Next, we push the samples through the generators and discriminators to compute the loss. Finally, we update their weights for the current episode and start the next one.
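The episodic sampling above can be sketched as follows (illustrative NumPy; the helper name and default sizes are ours, and for brevity the meta-base examples are drawn uniformly from all remaining classes rather than through the NBS mapping):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_episode(base_labels, n_meta_novel=5, n_shots=2, batch_size=64):
    """One meta-training episode over the base set: hold out n_meta_novel
    classes as 'meta-novel', keep n_shots examples of each, and fill the
    rest of the batch with examples from the remaining 'meta-base' classes.

    base_labels: (num_examples,) integer class ids of the base set.
    Returns index arrays (meta_novel_idx, meta_base_idx) into the base set.
    """
    classes = np.unique(base_labels)
    meta_novel = rng.choice(classes, size=n_meta_novel, replace=False)
    meta_base = np.setdiff1d(classes, meta_novel)

    novel_idx = []
    for c in meta_novel:
        idx = np.flatnonzero(base_labels == c)
        novel_idx.extend(rng.choice(idx, size=n_shots, replace=False))

    base_pool = np.flatnonzero(np.isin(base_labels, meta_base))
    n_fill = batch_size - len(novel_idx)
    base_idx = rng.choice(base_pool, size=n_fill, replace=False)
    return np.array(novel_idx), base_idx
```

Each episode thus mimics the low-shot regime the generator will face at meta-test time, with a handful of "novel" seeds and an abundant translatable pool.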
4 Experiments
This section is organized as follows. In Section 4.1, we conduct low-shot learning experiments on the challenging ImageNet benchmark. In Section 4.2, we further discuss ablations, both quantitative and qualitative, to better understand the performance gain. We demonstrate our model's capacity to generate diverse and reliable examples and its effectiveness in low-shot classification.
Dataset We evaluate our method on the real-world benchmark proposed by Hariharan and Girshick [11]. This is a challenging task because it requires learning a large variety of ImageNet [18] categories given only a few exemplars for each novel class. To this end, our model must be able to capture the visual diversity of a wide range of categories and transfer knowledge between them without confusing unrelated classes. Following [11], we split the 1000 ImageNet classes into four disjoint class sets, consisting of 193, 300, 196, and 311 classes respectively. All parameter tuning is done on the validation splits, while final results are reported on the held-out test splits.
Evaluation We repeat sampling novel examples five times for the held-out novel sets and report mean top-5 accuracy in both conventional low-shot learning (LSL, testing on novel classes only) and the generalized setting (G-LSL, testing on all categories including base classes).
Baselines We compare our results to the exact numbers reported by Feature Hallucination [11] and Feature Imagination [12]. We also compare to other non-generative methods, including classical Siamese Networks [31], Prototypical Networks [9], Matching Networks [8], and MAML [32], as well as the more recent Prototypical Matching Networks [12] and Attentive Weight Generators [33]. For a stricter comparison, we provide two extra baselines to exclude the bias induced by different embedding methods: P-FH builds on Feature Hallucination by substituting its non-episodic representation with learned prototypical features. Another baseline (first row in Table 1), on the contrary, replaces prototypical features with raw ResNet-10 embeddings. The results for MAML and SN are reported using their published codebases.
Implementation Details Our implementation is based on PyTorch [34]. Since deeper networks would unsurprisingly result in better performance, we confine all experiments to a ResNet-10 backbone (released at https://github.com/facebookresearch/lowshotshrinkhallucinate) with a 512-d output layer. We fine-tune the backbone following the procedure described in [11]. For all generators, we use three-layer MLPs with all hidden layers' dimensions fixed at 512, as well as their outputs for synthesized features. Our discriminators are accordingly designed as three-layer MLPs that predict probabilities over the target classes plus an extra fake category. We use leaky ReLU with slope 0.1 and no batch normalization. Our GAN models are trained for 100,000 episodes by ADAM [35] with the initial learning rate fixed at 0.0001, annealed by 0.5 every 20,000 episodes. We fix the hyperparameter $k$ for computing the truncated SVD. For the loss term contributions, we fix $\lambda_{cyc}$ and $\lambda_{cov}$ for all final objectives. We choose $d$ as the dimension of noise vectors for $G$'s input, and $N$ components for the Gaussian mixture. We inject prototype embeddings instead of one-hot vectors as categorical information for all networks (prototypes for novel classes are computed using the low-shot examples only). We empirically fix the batch size $B$, and $n$ and $m$, for all training, regardless of the number of shots at test time. This is more efficient, but possibly less accurate, than [9], who train separate models for each testing scenario so that the number of shots in training and testing always match. All hyperparameters are cross-validated on the validation set using a coarse grid search.

4.1 Main Results
Table 1: Mean top-5 accuracy (%) under the LSL and G-LSL settings on ImageNet, for n = 1, 2, 5, 10, 20 novel examples per class.

Method        | Representation | Generation       | LSL: 1 / 2 / 5 / 10 / 20       | G-LSL: 1 / 2 / 5 / 10 / 20
Baseline      | ResNet-10 [36] | -                | 38.5 / 51.2 / 64.7 / 71.6 / 76.3 | 40.6 / 49.8 / 64.3 / 72.1 / 76.7
SN [31]       | -              | -                | 38.9 /  -   / 64.6 /  -   / 76.4 | 48.7 /  -   / 68.3 /  -   / 73.8
MAML [32]     | -              | -                | 39.2 /  -   / 64.2 /  -   / 76.8 | 49.5 /  -   / 69.6 /  -   / 74.2
PN [9]        | -              | -                | 39.4 / 52.2 / 66.6 / 72.0 / 76.5 | 49.3 / 61.0 / 69.6 / 72.8 / 74.7
MN [8]        | -              | -                | 43.6 / 54.0 / 66.0 / 72.5 / 76.9 | 54.4 / 61.0 / 69.0 / 73.7 / 76.5
PMN [12]      | -              | -                | 43.3 / 55.7 / 68.4 / 74.0 / 77.0 | 55.8 / 63.1 / 71.1 / 75.0 / 77.1
AWG [33]      | -              | -                | 46.0 / 57.5 / 69.2 / 74.8 / 78.1 | 58.2 / 65.2 / 72.7 / 76.5 / 78.7
FH [11]       | ResNet-10      | LR w/ A.         | 40.7 / 50.8 / 62.0 / 69.3 / 76.4 | 52.2 / 59.7 / 68.6 / 73.3 / 76.9
P-FH          | PN             | LR w/ A.         | 41.5 / 52.2 / 63.5 / 71.8 / 76.4 | 53.6 / 61.7 / 69.0 / 73.5 / 75.9
FI [12]       | PN             | meta-learned LR  | 45.0 / 55.9 / 67.3 / 73.0 / 76.5 | 56.9 / 63.2 / 70.6 / 74.5 / 76.5
FI [12]       | PMN            | meta-learned LR  | 45.8 / 57.8 / 69.0 / 74.3 / 77.4 | 57.6 / 64.7 / 71.9 / 75.2 / 77.5
CP-AAN (Ours) | ResNet-10      | cCovGAN          | 47.1 / 57.9 / 68.9 / 76.0 / 79.3 | 52.1 / 60.3 / 69.2 / 72.4 / 76.8
CP-AAN (Ours) | PN             | cGAN             | 38.6 / 51.8 / 64.9 / 71.9 / 76.2 | 49.4 / 61.5 / 69.7 / 73.0 / 75.1
CP-AAN (Ours) | PN             | cCycGAN          | 42.5 / 54.6 / 66.7 / 74.3 / 76.8 | 57.6 / 65.1 / 72.2 / 73.9 / 76.0
CP-AAN (Ours) | PN             | cDeLiGAN         | 46.0 / 58.1 / 68.8 / 74.6 / 77.4 | 58.0 / 65.1 / 72.4 / 74.8 / 76.9
CP-AAN (Ours) | PN             | cCovGAN          | 48.4 / 59.3 / 70.2 / 76.5 / 79.3 | 58.5 / 65.8 / 73.5 / 76.0 / 78.1

LR w/ A.: Logistic Regression with Analogies. All results are averaged over five trials; standard deviations are omitted as they are all of negligible magnitude. The best and second-best methods under each setting are marked accordingly. For comparison, we include numbers reported in previous works under the same experimental settings. We decompose each method into stage-wise operations to break the performance gain down to the detailed choices made at each stage.
We evaluate four models constructed with the different GAN choices justified in Section 3. All of our CP-AAN approaches are trained with NBS-S, which is further investigated with ablations in the next subsection. Results are shown in Table 1. Our best method consistently achieves significant improvement over previous augmentation-based approaches for different values of $n$ under both the LSL and G-LSL settings, with a gain of almost 2 points over the baselines. We also notice that, apart from the overall improvement, our best model achieves its largest boost at the lowest shot, over both the naive baseline and Feature Imagination (FI) [12] under the LSL setting, even though we use a simpler embedding technique (PN compared to their PMN). We believe this performance gain can be attributed to our advanced generation method: at low shots, FI applies the discrete transformations its generator has previously learned, while we sample from a smooth distribution that combines the covariance information of all related base classes.
Note that in the LSL setting, all generative methods assume access to the original base examples when learning the final classifiers, while non-generative baselines usually do not have this constraint.
4.2 Discussions
In this subsection, we carefully examine the design choices behind the final version of CP-AAN. We start by unpacking the performance gain over the standard batch sampling procedure, and proceed by showing both quantitative and qualitative evaluations of generation quality.
Ablation on NBS To validate the effectiveness of the NBS strategy over standard batch sampling for feature augmentation, we conduct an ablation study and show the absolute performance gain in Figure 4(a). In general, we empirically demonstrate that applying NBS improves low-shot recognition. We also show that the performance of NBS-H is sensitive to the hyperparameter $k$ in the nearest-neighbor search. Therefore, the soft assignment is preferable if computational resources allow.
Quantitative Generation Quality We next quantitatively evaluate the generation quality of the variants introduced in Section 3 and of previous works, as shown in Figure 4(b). Note that for FH, we used the published codebase; for FI, we implemented the network and trained it with the procedure described in the original paper. We measure the diversity of generation via the mean average pairwise Euclidean distance of generated examples within each novel class, adopting the same augmentation strategies as in the ImageNet experiments, and compare it against the same statistic computed over real examples. In summary, the results are consistent with our expectations and support our design choices. Feature Hallucination and Feature Imagination show less diversity than real data. The naive cGAN even underperforms those baselines due to mode collapse. Cycle-consistency and Gaussian mixture noise do help generation in both accuracy and diversity; however, they either under- or over-estimate the diversity. Our covariance-preserving objective leads to the best hallucination quality, since the generated distribution most closely resembles the real data's diversity. Another insight from Figure 4(b) is that, not surprisingly, underestimating data diversity is more detrimental to classification accuracy than overestimating it.
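The diversity metric used here, the mean pairwise Euclidean distance within a class, can be computed as follows (a small NumPy sketch; the function name is ours):

```python
import numpy as np

def mean_pairwise_distance(X):
    """Mean Euclidean distance over all unordered pairs of rows of X.
    Averaging this statistic across novel classes gives the diversity
    score reported for each generation method."""
    n = len(X)
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices(n, k=1)  # upper triangle: each pair counted once
    return d[iu].mean()
```

A collapsed generator yields a score near zero, while matching the real-data score indicates the synthesized spread resembles the true intra-class variability.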
Qualitative Generation Quality Figures 4(c), 4(d), and 4(e) show t-SNE [37] visualizations of the data generated by Feature Hallucination, Feature Imagination, and our best model in the prototypical feature space. We fix the number of examples per novel class in all cases and plot the real distribution as translucent point clouds. The 5 real seed examples are plotted as crosses and synthesized examples as stars. Evidently, the naive generators can only synthesize novel examples that are largely pulled together. Although t-SNE may visually drag similar high-dimensional points towards one mode, our model shows more diverse generation that is better aligned with the latent distribution, improving overall recognition performance by spreading the seed examples in meaningful directions.
5 Conclusion
In this paper, we have presented a novel approach to low-shot learning that augments data for novel classes by training a cyclic GAN model while shaping intra-class variability through similar base classes. We introduced and compared several GAN variants in a logical progression and demonstrated the increasing performance of each model variant. Our proposed model significantly outperforms the state of the art on the challenging ImageNet benchmark in various settings. Quantitative and qualitative evaluations show the effectiveness of our method in generating realistic and diverse data for low-shot learning given very few examples.
Acknowledgments This work was supported by the U.S. DARPA AIDA Program No. FA87501820014. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation here on.
References
 [1] Ian J. Goodfellow, Mehdi Mirza, Da Xiao, Aaron Courville, and Yoshua Bengio. An empirical investigation of catastrophic forgetting in gradient-based neural networks. arXiv preprint arXiv:1312.6211, 2013.
 [2] James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526, 2017.
 [3] Konstantin Shmelkov, Cordelia Schmid, and Karteek Alahari. Incremental learning of object detectors without catastrophic forgetting. arXiv preprint arXiv:1708.06977, 2017.
 [4] Yongqin Xian, Tobias Lorenz, Bernt Schiele, and Zeynep Akata. Feature generating networks for zero-shot learning. arXiv preprint arXiv:1712.00981, 2017.
 [5] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training gans. In Advances in Neural Information Processing Systems, pages 2234–2242, 2016.
 [6] Relja Arandjelović and Andrew Zisserman. Three things everyone should know to improve object retrieval. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 2911–2918. IEEE, 2012.
 [7] Ying-Cong Chen, Xiatian Zhu, Wei-Shi Zheng, and Jian-Huang Lai. Person re-identification by camera correlation aware feature augmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(2):392–408, 2018.
 [8] Oriol Vinyals, Charles Blundell, Tim Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. In Advances in Neural Information Processing Systems, pages 3630–3638, 2016.
 [9] Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for fewshot learning. In Advances in Neural Information Processing Systems, pages 4080–4090, 2017.
 [10] Flood Sung, Yongxin Yang, Li Zhang, Tao Xiang, Philip H.S. Torr, and Timothy M. Hospedales. Learning to compare: Relation network for few-shot learning. arXiv preprint arXiv:1711.06025, 2017.
 [11] Bharath Hariharan and Ross Girshick. Low-shot visual recognition by shrinking and hallucinating features. arXiv preprint arXiv:1606.02819, 2016.
 [12] Yu-Xiong Wang, Ross Girshick, Martial Hebert, and Bharath Hariharan. Low-shot learning from imaginary data. arXiv preprint arXiv:1801.05401, 2018.
 [13] Xun Huang, Ming-Yu Liu, Serge Belongie, and Jan Kautz. Multimodal unsupervised image-to-image translation networks. arXiv preprint arXiv:1804.04732, 2018.
 [14] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
 [15] Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
 [16] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint arXiv:1703.10593, 2017.
 [17] Jun-Yan Zhu, Richard Zhang, Deepak Pathak, Trevor Darrell, Alexei A. Efros, Oliver Wang, and Eli Shechtman. Toward multimodal image-to-image translation. In Advances in Neural Information Processing Systems, pages 465–476, 2017.
 [18] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 248–255. IEEE, 2009.
 [19] David Lei, Michael A Hitt, and Richard Bettis. Dynamic core competences through metalearning and strategic context. Journal of management, 22(4):549–569, 1996.
 [20] Augustus Odena, Christopher Olah, and Jonathon Shlens. Conditional image synthesis with auxiliary classifier gans. arXiv preprint arXiv:1610.09585, 2016.
 [21] James Tobin. Estimation of relationships for limited dependent variables. Econometrica: journal of the Econometric Society, pages 24–36, 1958.
 [22] Emmanuel Candes, Terence Tao, et al. The dantzig selector: Statistical estimation when p is much larger than n. The Annals of Statistics, 35(6):2313–2351, 2007.

 [23] Ruslan Salakhutdinov, Joshua Tenenbaum, and Antonio Torralba. One-shot learning with a hierarchical nonparametric bayesian model. In Proceedings of ICML Workshop on Unsupervised and Transfer Learning, pages 195–206, 2012.
 [24] Danilo Jimenez Rezende, Shakir Mohamed, Ivo Danihelka, Karol Gregor, and Daan Wierstra. One-shot generalization in deep generative models. arXiv preprint arXiv:1603.05106, 2016.
 [25] Swaminathan Gurumurthy, Ravi Kiran Sarvadevabhatla, and V Babu Radhakrishnan. Deligan: Generative adversarial networks for diverse and limited data. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, 2017.
 [26] Youssef Mroueh, Tom Sercu, and Vaibhava Goel. McGan: Mean and covariance feature matching GAN. arXiv preprint arXiv:1702.08398, 2017.
 [27] Youssef Mroueh and Tom Sercu. Fisher gan. In Advances in Neural Information Processing Systems, pages 2510–2520, 2017.
 [28] Jacob Goldberger, Geoffrey E Hinton, Sam T Roweis, and Ruslan R Salakhutdinov. Neighbourhood components analysis. In Advances in neural information processing systems, pages 513–520, 2005.
 [29] Peiliang Xu. Truncated SVD methods for discrete linear ill-posed problems. Geophysical Journal International, 135(2):505–514, 1998.
 [30] Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. One-shot learning with memory-augmented neural networks. arXiv preprint arXiv:1605.06065, 2016.
 [31] Gregory Koch, Richard Zemel, and Ruslan Salakhutdinov. Siamese neural networks for one-shot image recognition. In ICML Deep Learning Workshop, volume 2, 2015.
 [32] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. arXiv preprint arXiv:1703.03400, 2017.
 [33] Spyros Gidaris and Nikos Komodakis. Dynamic few-shot visual learning without forgetting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4367–4375, 2018.
 [34] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in pytorch. 2017.
 [35] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 [36] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.

 [37] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.