1 Introduction
We address the problem of object recognition from a very small amount of labeled data. This problem is of particular importance when limited labels can be collected due to either time or financial constraints. Though this is a difficult challenge, we are encouraged by evidence from cognitive science suggesting that infants can quickly learn new concepts from very few examples [19, 1].
Many recognition problems in computer vision involve learning from few labeled examples. Semi-supervised learning, transfer learning, and few-shot recognition all aim to achieve fast generalization from few examples, by leveraging unlabeled data or labeled data from other domains.
The fundamental difficulty of this problem is that naive supervised training with very few examples results in severe overfitting. Because of this, prior work in semi-supervised learning relies on strong regularization such as augmentations [9], temporal consistency [18], and adversarial examples [25] to improve performance. Some related works in few-shot learning do not even refine an online classifier. Instead, they simply apply the similarity metric learned from training categories to new categories without adaptation. Meta-learning [7] seeks to optimize an online parametric classifier with few samples, under the assumption that just a few steps of optimization will generalize effectively with less overfitting. These approaches address the inherent problem of limited training data only indirectly.
In this paper, we propose a novel approach to this problem: we create training data by propagating labels to an unlabeled dataset, so that training a supervised model with great learning capacity no longer faces overfitting. This approach is related to work on "pseudo-labeling" [20, 29], where the model is bootstrapped from limited data and trained on the new data/label pairs it infers. However, this is unlikely to work well when the labeled data is scarce, since the initial model is likely to be poor. Our work shares the spirit of the seminal work [6] showing that label propagation can work well with simple GIST descriptors. We bring it to the context of deep learning, and show that it is metric transfer that enables accurate, diverse, and generalizable label propagation.
Our approach works with three data domains: a source domain to learn a similarity metric, a few labeled examples that define the target problem, and an unlabeled dataset in which to propagate labels. Concretely, we first learn a similarity metric on the source domain, which can be either labeled or unlabeled; supervised or unsupervised (self-supervised) learning is used to learn the metric accordingly. Then, given the few observations of the target problem, we propagate their labels to the unlabeled dataset using the metric learned in the source domain. This creates an abundance of labeled data for learning a classifier. Finally, we train a standard supervised model using the propagated labels.
The main contribution of this work is the metric transfer approach for label propagation. By studying different combinations of metric pretraining methods (e.g., unsupervised, supervised) and label propagation algorithms (e.g., nearest neighbors, spectral clustering), we find that our metric transfer approach on unlabeled data is general enough to work effectively in many settings. For semi-supervised learning on CIFAR-10, we obtain an absolute improvement over the state-of-the-art when labeled data is limited. Our work also provides an alternative approach for transfer learning and few-shot recognition when unlabeled data is given. Compared to pretraining on the source dataset and then fine-tuning on the limited labeled examples, we achieve a notable improvement on transferring representations from ImageNet to CIFAR-10. We also demonstrate improved performance for few-shot recognition on the miniImageNet benchmark.

2 Related Work
Large-scale Recognition. To solve a computer vision problem, it has become common practice to build a large-scale dataset [5, 2] and train deep neural networks [17, 32] on it. This philosophy has achieved unprecedented success on many important computer vision problems [5, 22, 30]. However, constructing a large-scale dataset is often time-consuming and expensive, and this has motivated work on unsupervised learning and on problems defined with few labeled samples.
Semi-supervised Learning. Semi-supervised learning [38] lies between supervised learning and unsupervised learning. It aims to make more accurate predictions by leveraging a large amount of unlabeled data than would be possible using the labeled data alone. In the era of deep learning, one line of work leverages unlabeled data through deep generative models [15, 27]. However, training of generative models is often unstable, making them tricky to apply to recognition tasks. Recent efforts on semi-supervised learning focus on regularization by self-ensembling through a consistency loss, such as temporal ensembling [18], adversarial ensembling [25], and teacher-student distillation [34]. These models treat labeled data and unlabeled data separately, without considering their relationships. The pseudo-labeling approach [20, 29] initializes a model on a small labeled dataset and bootstraps from the new data it predicts. This tends to fail when the labeled set is small. Our work is most closely related to label propagation approaches [6, 37], and we propose metric transfer to significantly improve propagation performance.
Few-shot Recognition. Given training data in a set of training categories, few-shot recognition [1] requires the classifier to generalize to new categories from very few examples, often 1-shot or 5-shot. A body of work approaches this problem by offline metric learning [35, 33, 39], where a generic similarity metric is learned on the training data and directly transferred to the new categories using simple nearest neighbor classifiers without further adaptation. Recent works on meta-learning [7, 21, 24] take a learning-to-learn approach using online algorithms. In order not to overfit to the few examples, they develop meta-learners that find a common embedding space, which can be further fine-tuned with fast convergence to the target problem. Recent works [28, 8] using meta-learning consider the combined problem of semi-supervised learning and few-shot recognition, by allowing access to unlabeled data in few-shot recognition. This drives few-shot recognition toward more realistic scenarios. We follow this setting in our study of few-shot recognition.
Transfer Learning. Since the inception of the ImageNet challenge [30], transfer learning has become ubiquitous in visual recognition, such as in object detection [10] and semantic segmentation [23], by simply transferring the network weights learned on ImageNet classification and fine-tuning on the target task. When the pretraining task and the target task are closely related, this tends to generalize much better than training from scratch on the target task alone. Domain adaptation addresses a more difficult scenario where there is a large gap between the inputs of the source and target domains [13], for example between real images and synthetic images. What we study in this paper is metric transfer. Different from prior work [41], which employs metric transfer merely to reduce the distribution divergence between domains, we use metric transfer to propagate labels. Through this, we show that metric transfer is an effective method for learning with small data.
3 Approach
To deal with the shortage of labeled data, our approach is to enlarge it by propagating labels from annotated images to unlabeled data using the similarity metric between data pairs. The creation of much more labeled data enables us to train deep neural networks to their full learning capacity.
Our framework works on three data domains: the source domain, the target domain, and an additional unlabeled dataset. The source domain has abundant data, which can be labeled or unlabeled; it is used to learn a generic similarity metric between data pairs. The target domain has only a few labeled examples, but it defines the problem we want to solve. The unlabeled dataset is the resource into which labels are propagated; it may contain classes similar to those of the target task, and it may or may not have classes overlapping with the target domain. Below we introduce our approach in detail.
3.1 Metric Pretraining
The source domain is used for pretraining a similarity metric between data pairs. Ideally, we want the metric to capture the inherent structure of the target domain, so that transferring labels to the target is reliable and useful. For this to happen, we usually hold some prior knowledge relating the source and the target. For example, the source domain may be sampled from the same distribution as the target domain but be completely unannotated, or the source domain may be annotated for a different task that is closely related to the target. Formally, a similarity metric between data points $x_i$ and $x_j$ can be defined as

$$s(x_i, x_j) = f_\theta(x_i, x_j), \tag{1}$$

where $f_\theta$ is the similarity function to be learned. In this work, we use deep neural networks as a parametric model of this similarity function. The metric can be trained with either supervised or unsupervised methods, depending on whether labels are given in the source domain. We briefly review the training algorithms as follows.

Unsupervised Metric Pretraining
Recently, there has been growing interest in unsupervised and self-supervised learning. Different algorithms are based on different data properties (e.g., color [43], context [3], motion [44]) and thus may vary in performance on the target task to which we want to transfer. It is not our intent to give a comprehensive comparison over various methods and choose the best one. Instead, we show that general unsupervised transfer is beneficial for label propagation and leads to improved performance.
In this work, we utilize two unsupervised learning methods: instance discrimination [40] and colorization [43]. For instance discrimination, we treat each instance as its own class, and maximize the probability of each example belonging to the class of itself,

$$\max_\theta \sum_i \log P(i \mid x_i), \quad P(i \mid x) = \frac{\exp(v_i^\top f_\theta(x)/\tau)}{\sum_{j=1}^{n} \exp(v_j^\top f_\theta(x)/\tau)}, \tag{2}$$

where $v_j$ is the stored embedding of instance $j$ and $\tau$ is a temperature parameter.
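To make the non-parametric softmax of Eqn (2) concrete, here is a minimal numpy sketch of the per-example objective; the function name, memory-bank layout, and default temperature are illustrative, and the full method [40] uses noise-contrastive estimation rather than the exact softmax for scalability:

```python
import numpy as np

def instance_discrimination_loss(v, memory, idx, tau=0.07):
    """Negative log-probability of example idx being classified as itself.

    v:      (d,) L2-normalized embedding of the current example
    memory: (n, d) L2-normalized embeddings of all n instances
    idx:    index of the current example among the n instances
    tau:    temperature
    """
    logits = memory @ v / tau                 # similarity to every instance
    logits -= logits.max()                    # numerical stability
    p = np.exp(logits) / np.exp(logits).sum() # softmax over all n instances
    return -np.log(p[idx])                    # cross-entropy with "class = itself"
```

Minimizing this loss over all examples pushes each embedding toward its own slot in the memory bank and away from all others.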
For colorization, the idea is to learn a mapping from grayscale images to colorful ones. Following the original paper [43], instead of predicting raw pixel colors, we quantize the color space into soft bins and use the cross-entropy loss on the soft bins,

$$L_{\text{color}} = -\sum_{h,w} \sum_{q} Z_{h,w,q} \log \hat{Z}_{h,w,q}, \tag{3}$$

where $(h, w)$ are spatial indices, $q$ indexes the color bins, $Z$ is the soft-encoded ground-truth color distribution, and $\hat{Z}$ is the prediction. We follow previous work [4] in applying ResNet to colorization: a base network maps inputs to features, and a head network of three convolutional layers converts features to colors. Since colorization does not automatically output a metric, we use the Euclidean distance on the features from the base network to measure similarity.
Supervised Metric Pretraining
In some scenarios, we have access to a labeled dataset, such as PASCAL VOC or ImageNet, that has commonalities with the target task. Traditional metric learning with supervision minimizes the intra-class distance and maximizes the inter-class distance of the labeled samples. For this purpose, many types of loss functions have been proposed, such as the contrastive loss, the triplet loss [12], and neighborhood analysis [11]. In this work, we use neighborhood analysis [11] to learn our metric. Concretely, we maximize the likelihood of each example being supported by other examples belonging to the same category,

$$\max_\theta \sum_i \log \Big( \sum_{j \neq i,\; y_j = y_i} p_{ij} \Big), \quad p_{ij} = \frac{\exp(s(x_i, x_j)/\tau)}{\sum_{k \neq i} \exp(s(x_i, x_k)/\tau)}, \tag{4}$$

where $\tau$ is a temperature parameter.
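As an illustration of the neighborhood-analysis objective in Eqn (4), here is a minimal numpy sketch; the function name and temperature are illustrative, and features are assumed L2-normalized:

```python
import numpy as np

def nca_loss(feats, labels, tau=0.1):
    """Negative log-likelihood that each example is supported by
    same-class neighbors (Eqn 4), excluding self-similarity.

    feats:  (n, d) L2-normalized embeddings
    labels: (n,) integer class labels
    """
    sim = feats @ feats.T / tau
    np.fill_diagonal(sim, -np.inf)            # exclude j = i
    sim -= sim.max(axis=1, keepdims=True)     # numerical stability
    p = np.exp(sim)
    p /= p.sum(axis=1, keepdims=True)         # p_ij over j != i
    same = labels[:, None] == labels[None, :]
    support = (p * same).sum(axis=1)          # sum of p_ij over same-class j
    return -np.log(support).mean()
```

The loss is small when each example's probability mass concentrates on neighbors of its own class, which is exactly the behavior a transferable similarity metric should have.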
3.2 Label Propagation
Given a target set $T$ represented by a small number of labeled examples, and an unlabeled set $U$, we propagate labels from $T$ to $U$ using the similarity function learned from the source domain. Suppose $T = \{(x_1, y_1), \dots, (x_n, y_n)\}$ and $U = \{\hat{x}_1, \dots, \hat{x}_m\}$, where $n$ and $m$ are the number of images in $T$ and $U$, respectively. Each label $y_i$ is represented as a one-hot vector with the ground-truth class element set to 1 and the others set to 0. We consider two propagation algorithms.

Naive Nearest Neighbors
A straightforward propagation approach is to vote for the class of an unlabeled sample based on its similarity to each of the exemplars in the target set $T$. For an unlabeled example $\hat{x}$, we calculate its logits $z_c$ for every class $c$,

$$z_c(\hat{x}) = \frac{1}{n_c} \sum_{i=1}^{n} \mathbb{1}[y_i = c]\, s(\hat{x}, x_i), \tag{5}$$

where $\mathbb{1}[\cdot]$ is the indicator function, $s(\hat{x}, x_i)$ denotes the similarity between $\hat{x}$ and $x_i$, and $n_c$ is the number of labeled images available for class $c$.
The nearest neighbor propagation method is essentially a one-step random walk in which the similarity metric acts as the transition matrix and the indicator function acts as the initial distribution. The effectiveness of such one-step propagation depends heavily on the quality of the similarity metric. In general, it is hard to learn such a metric well, especially when limited supervision is available, because of the visual diversity of images. Figure 2 (left) shows a typical similarity matrix computed from unsupervised features. Data points in the similarity matrix are sparsely connected, thus limiting the one-step label propagation approach.
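The voting rule of Eqn (5) reduces to a masked average of similarities. A minimal numpy sketch, assuming cosine similarity on L2-normalized features (the function and argument names are illustrative):

```python
import numpy as np

def propagate_nn(feat_labeled, y_labeled, feat_unlabeled, num_classes):
    """One-step nearest-neighbor label propagation (Eqn 5).

    feat_labeled:   (n, d) L2-normalized features of the few labeled examples
    y_labeled:      (n,) integer class labels
    feat_unlabeled: (m, d) L2-normalized features of the unlabeled set
    Returns (m, num_classes) logits: per-class average similarity.
    """
    sim = feat_unlabeled @ feat_labeled.T          # (m, n) cosine similarities
    logits = np.zeros((feat_unlabeled.shape[0], num_classes))
    for c in range(num_classes):
        mask = (y_labeled == c)                    # indicator 1[y_i = c]
        logits[:, c] = sim[:, mask].mean(axis=1)   # average over the n_c exemplars
    return logits
```

Taking the argmax of each row yields the propagated class; the next subsection replaces the raw features with a spectral embedding before this same voting step.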
Constrained Spectral Clustering
Constrained spectral clustering [14, 6] can alleviate this problem. Instead of propagating labels in one step as in the naive nearest neighbor approach, constrained spectral clustering propagates labels through multiple steps by taking advantage of structure within the unlabeled dataset. It computes a spectral embedding [31, 36] from the original similarity metric, which is then used as the new metric for label propagation.
The spectral embedding is formulated as

$$e(x_i) = \big(\phi_1(i), \phi_2(i), \dots, \phi_m(i)\big), \tag{6}$$

where $\lambda_1 \le \lambda_2 \le \dots$ and $\phi_1, \phi_2, \dots$ are the eigenvalues and eigenvectors of the normalized Laplacian, in ascending order of eigenvalue. The Laplacian matrix is derived from the original similarity metric $W$ as $L = I - D^{-1/2} W D^{-1/2}$, with degree matrix $D$ given by $D_{ii} = \sum_j W_{ij}$. The parameter $m$ is the total number of eigen components used.

Due to its globalized nature, spectral clustering is able to pass messages between distant areas, in contrast to the local behavior of the naive nearest neighbors approach. The embedded metric is usually densely connected and better aligned with object classes, as illustrated in Figure 2 (right). Using the same voting approach as in Eqn (5), label propagation can be more accurate than with the original raw similarity metric.
Constrained spectral clustering is also efficient. By following the common practice of using nearest neighbors to build the similarity graph [36], propagating labels to the unlabeled images takes about 10 seconds on a regular GPU.
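A compact numpy sketch of the spectral variant follows. It uses a dense eigendecomposition for clarity; in practice a sparse k-NN graph and a sparse eigensolver would be used, and the graph construction is assumed to happen beforehand:

```python
import numpy as np

def spectral_embed(W, m):
    """Spectral embedding (Eqn 6): rows are the first m eigenvector
    components of the normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.

    W: (N, N) symmetric, non-negative similarity matrix (e.g. a k-NN graph)
    Returns an (N, m) embedding whose rows can replace the raw features
    when voting with Eqn (5).
    """
    d = W.sum(axis=1)
    d_is = 1.0 / np.sqrt(np.maximum(d, 1e-12))        # D^{-1/2}, guarded
    L = np.eye(len(W)) - d_is[:, None] * W * d_is[None, :]
    vals, vecs = np.linalg.eigh(L)                    # eigenvalues ascending
    return vecs[:, :m]                                # smoothest m components
```

Points connected through many short paths in the graph end up close in this embedding, which is what allows labels to travel multiple hops.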


Table 1: Quality of the propagated pseudo labels, measured by mean average precision (mAP, %) on CIFAR-10, for varying numbers of labeled examples.

| Metric pretraining | Propagation method | 50 | 100 | 250 | 500 | 1000 | 2000 | 4000 | 8000 |
| Supervised (limited labels) | Nearest neighbor | 22.03 | 25.74 | 48.35 | 68.03 | 77.57 | 77.28 | 87.77 | 90.88 |
| | Spectral | 23.49 | 28.88 | 54.46 | 70.02 | 80.94 | 87.77 | 93.94 | 96.23 |
| Colorization | Nearest neighbor | 57.32 | 67.61 | 75.48 | 79.34 | 80.70 | 82.14 | 83.66 | 84.79 |
| | Spectral | 60.85 | 67.34 | 76.31 | 80.04 | 81.78 | 81.89 | 82.93 | 82.03 |
| Instance discrimination | Nearest neighbor | 54.82 | 62.99 | 77.08 | 84.90 | 88.68 | 91.34 | 92.72 | 93.67 |
| | Spectral | 72.59 | 79.21 | 86.64 | 90.01 | 91.04 | 91.57 | 91.77 | 91.94 |



Table 2: Final semi-supervised classification accuracy (%) on CIFAR-10 for varying numbers of labeled examples.

| Metric pretraining | Propagation method | 50 | 100 | 250 | 500 | 1000 | 2000 | 4000 | 8000 |
| No | No | 20.95 | 25.35 | 41.63 | 54.06 | 65.08 | 73.22 | 81.44 | 86.23 |
| Supervised (limited labels) | Nearest neighbor | 21.79 | 25.37 | 42.70 | 54.14 | 68.08 | 75.17 | 83.30 | 87.68 |
| | Spectral | 22.78 | 27.95 | 47.28 | 60.73 | 72.60 | 78.20 | 85.10 | 88.26 |
| Colorization | No | 49.57 | 55.41 | 64.65 | 68.81 | 73.40 | 77.93 | 82.17 | 86.25 |
| | Nearest neighbor | 49.96 | 52.69 | 65.63 | 65.88 | 70.88 | 76.36 | 80.16 | 84.64 |
| | Spectral | 53.47 | 55.08 | 68.40 | 71.15 | 72.38 | 76.50 | 80.31 | 84.03 |
| Instance discrimination | No | 35.27 | 37.87 | 62.46 | 71.04 | 75.96 | 80.12 | 83.90 | 87.82 |
| | Nearest neighbor | 46.68 | 54.45 | 66.93 | 74.16 | 79.17 | 82.24 | 84.56 | 87.92 |
| | Spectral | 56.34 | 63.53 | 71.26 | 74.77 | 79.38 | 82.34 | 84.52 | 87.48 |

3.3 Confidence Weighted Supervised Training
Given the logits $z_c$, the pseudo label $\hat{y}$ is estimated as

$$\hat{y} = \arg\max_c \; z_c. \tag{7}$$
With the estimated pseudo labels on the unlabeled data, we have considerably more data for training a classifier. However, the pseudo labels may not be accurate, and directly using these labels may lead to degraded performance. For example, not all the data in the unlabeled set are related to the target problem. Here, we devise a simple weighting mechanism to compensate for inaccurate labels.
Given the logits produced by the label propagation algorithm, we first normalize them into a probability distribution,

$$p_c = \frac{\exp(z_c / T)}{\sum_j \exp(z_j / T)}, \tag{8}$$

where $c$ indexes the categories and the temperature $T$ controls the sharpness of the distribution. We then define the confidence measure $w$ of a pseudo label as the difference between the maximum response and the second largest response,

$$w = p_{(1)} - p_{(2)}, \tag{9}$$

where $p_{(1)}$ and $p_{(2)}$ denote the largest and second largest entries of $p$. A high value of $w$ indicates a confident estimate of the pseudo label, and a low value of $w$ indicates an ambiguous one. In Figure 3, we measure the accumulated accuracy of pseudo labels on validation data sorted by this confidence. It can be seen that our confidence measure gives a good indication of the quality of pseudo labels.
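Eqns (8) and (9) amount to a temperature-scaled softmax followed by a top-two margin. A minimal numpy sketch (the function name and default temperature are illustrative):

```python
import numpy as np

def pseudo_label_and_confidence(z, T=0.1):
    """Pseudo label (Eqn 7) and confidence weight (Eqns 8-9) from logits.

    z: (num_classes,) propagated logits for one unlabeled example
    Returns (pseudo_label, confidence), with confidence in [0, 1].
    """
    p = np.exp(z / T - (z / T).max())     # shift for numerical stability
    p /= p.sum()                          # Eqn (8): temperature softmax
    top2 = np.sort(p)[-2:]                # two largest responses
    return int(np.argmax(p)), float(top2[1] - top2[0])   # Eqn (9)
```

A peaked logit vector yields confidence near 1, while two nearly tied classes yield confidence near 0, matching the intended behavior of the weighting scheme.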
Our final training criterion is given by

$$\mathcal{L} = -\sum_i w_i \log p(\hat{y}_i \mid \hat{x}_i), \tag{10}$$

where $\hat{y}_i$ is the pseudo label for example $\hat{x}_i$, $w_i$ is its confidence weight, and $p$ is the softmax probability output of the classification network.
In practice, since some pseudo labels have relatively low confidence and thus contribute negligibly to the overall learning criterion, we may safely discard those examples to speed up learning.
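The criterion of Eqn (10) is an ordinary cross-entropy with per-example confidence weights, plus an optional threshold for discarding low-confidence examples. A minimal numpy sketch (normalizing by the number of kept examples is an implementation choice, not prescribed by the paper):

```python
import numpy as np

def weighted_pseudo_label_loss(probs, pseudo_labels, weights, min_conf=0.0):
    """Confidence-weighted cross-entropy over propagated labels (Eqn 10).

    probs:         (m, num_classes) softmax outputs of the classifier
    pseudo_labels: (m,) labels estimated by propagation (Eqn 7)
    weights:       (m,) confidence weights (Eqn 9)
    min_conf:      examples below this confidence are discarded for speed
    """
    keep = weights >= min_conf
    nll = -np.log(probs[keep, pseudo_labels[keep]] + 1e-12)
    return (weights[keep] * nll).sum() / max(keep.sum(), 1)
```

Examples with weight near zero contribute almost nothing, so raising `min_conf` trades a negligible change in the loss for a smaller effective training set.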
4 Experiments
Through experiments, we show that, given unlabeled data, metric propagation is able to effectively label large amounts of data when little labeled data is given. We verify our approach on semi-supervised learning, where an unsupervised metric is transferred; on transfer learning, where supervised metrics generalize across different data distributions; and on few-shot recognition, where the metric must generalize across open-set object categories. For few-shot recognition, we leverage an extra unlabeled dataset for label propagation, a setting also known as semi-supervised few-shot recognition [28].
Our approach has two major hyperparameters: the number of eigenvectors for spectral clustering and the temperature controlling the confidence distribution. Different parameter settings may slightly change the performance; we use the same fixed values across all experiments. A detailed analysis is provided in the supplementary materials.
4.1 Semi-Supervised Learning


Table 3: Semi-supervised classification accuracy (%) on CIFAR-10 with larger network architectures.

| Method | Network architecture | 50 | 100 | 250 | 500 | 1000 | 2000 | 4000 | 8000 |
| Mean Teacher [34] | WideResNet-28-2 | 29.66 | 36.62 | 45.49 | 57.19 | 65.07 | 79.26 | 84.38 | 87.55 |
| | WideResNet-28-10 | 27.35 | 38.83 | 49.44 | 59.45 | 70.03 | 82.62 | 86.71 | 89.38 |
| Ours | WideResNet-28-2 | 56.34 | 63.53 | 71.26 | 74.77 | 79.38 | 82.34 | 84.52 | 87.48 |
| | WideResNet-28-10 | 73.13 | 75.87 | 80.30 | 81.76 | 84.97 | 86.82 | 88.70 | 91.01 |

We follow a recent evaluation paper [26], which gives a comprehensive benchmark of state-of-the-art semi-supervised learning approaches. All of our experiments are conducted on the CIFAR-10 [16] dataset. We use the same WideResNet [42] architecture with 28 layers and a width factor of 2. We report performance as we vary the number of labeled examples from 50 to 8000 out of the total examples in the original CIFAR-10 dataset.
For training our model, we pretrain the metric on the unlabeled split of CIFAR-10 and propagate labels to the same unlabeled set; in other words, the source domain and the unlabeled set coincide in our framework. We use SGD for optimization with an initial learning rate of 0.01 and a cosine decay schedule. We fix the total number of optimization iterations, as opposed to fixing the number of epochs, because this gives more consistent comparisons as the amount of labeled data varies.
Study of different pretrained metrics.
Our label propagation algorithm needs a pretrained similarity metric to guide it. The pretrained metric can be learned by supervised methods using the limited labeled data, or by unsupervised methods using large-scale unlabeled data. Here, we consider three metric pretraining methods:

- supervised learning on the limited labeled data;
- self-supervised learning by image colorization [43];
- unsupervised learning by instance discrimination [40].
We train the models using the optimal parameters for each pretraining method. Then we use cosine similarity in the feature space for propagating labels to the unlabeled data.
In Table 1, we evaluate the quality of the pseudo labels by the mean average precision (mAP) when sorted by confidence as in Figure 3. Table 2 lists the final semi-supervised recognition accuracy. We can see that both unsupervised methods generalize much better than the supervised bootstrapping method most of the time, until the labeled set grows relatively large at 4000 labels. This confirms our claim that unsupervised transfer is key for label propagation. Among the unsupervised methods, non-parametric metric learning by instance discrimination performs better than colorization, probably because it explicitly learns a similarity metric. We also include the result of the naive baseline, which trains from scratch on the limited labeled data without label propagation.
Study of different label propagation schemes.
Given the pretrained metrics, there are various ways to transfer them. We consider three possible solutions:

- no propagation, transferring only the network weights;
- nearest neighbor metric transfer;
- spectral metric transfer.
The first baseline is common practice: transfer the network weights and then fine-tune on the labeled data. The second is much weaker than the third because it considers only one-hop distances, without taking into account the similarities between unlabeled pairs.
The results are summarized in Table 1 and Table 2. Compared to the state-of-the-art performance in Table 4, even the simple fine-tuning approach outperforms the state of the art when the labeled set is small; for example, fine-tuning from instance discrimination significantly outperforms the best prior result in the low-label regime. This suggests that unsupervised pretraining generally improves semi-supervised learning.
When the unlabeled data is used for label propagation, metric transfer is much stronger than weight transfer alone, further improving performance with few labels. It is also evident that the spectral clustering method performs better than weighted nearest neighbors because of its globalized behavior.


Table 4: Classification accuracy (%) on CIFAR-10 when combining our pseudo labels with prior semi-supervised methods.

| Method | 250 labels | 4000 labels |
| Ours | 71.26 | 84.52 |
| Pi Model [18] | 47.07 | 84.17 |
| + Ours | 74.90 | 85.32 |
| Mean Teacher [34] | 45.49 | 84.38 |
| + Ours | 74.54 | 85.45 |
| VAT [25] | 44.83 | 86.79 |
| + Ours | 78.34 | 86.93 |
| VAT + EM [25] | 46.29 | 86.96 |
| + Ours | 78.63 | 87.20 |



Table 5: Accuracy (%) when transferring metrics from ImageNet to CIFAR-10.

| Metric pretraining | Transfer method | 50 | 100 | 250 | 500 | 1000 | 2000 | 4000 | 8000 |
| Unsupervised ImageNet | Network fine-tuning | 28.92 | 34.56 | 57.14 | 67.54 | 76.20 | 80.92 | 85.01 | 88.74 |
| | Spectral | 44.30 | 46.51 | 61.29 | 68.31 | 72.61 | 77.86 | 84.00 | 88.19 |
| Supervised ImageNet | Network fine-tuning | 54.95 | 61.88 | 73.01 | 78.43 | 84.52 | 88.79 | 91.44 | 93.05 |
| | Spectral | 77.71 | 85.34 | 86.07 | 86.91 | 88.27 | 89.93 | 91.22 | 93.49 |

Scalability to large network architectures.
In contrast to prior methods, which face overfitting issues, our approach easily scales to larger network architectures. Here, we keep all the learning parameters unchanged and experiment with a wider version of WideResNet-28 with a width factor of 10. We consider the state-of-the-art mean teacher method [34] for comparison. In Table 3, mean teacher shows only a limited improvement from the larger network, whereas our method enjoys consistently significant gains in all of the testing scenarios, achieving unprecedented accuracy with very few labels using WideResNet-28-10.
Comparison to stateoftheart methods.
We compare our approach to state-of-the-art methods in Figure 4. Ours is particularly strong when the labeled set is small, though this advantage diminishes as the labeled set grows. However, as most prior approaches focus on self-ensembling, ours is orthogonal to them. We examine the complementarity of our method by combining it with each of the prior approaches. To do so, we generate our most confident pseudo labels and use them as ground truth for the other algorithms. For fair comparison, we run the public code (https://github.com/brain-research/realistic-ssl-evaluation) with our generated pseudo labels. In Table 4, combining our approach leads to improved performance for all of the methods.
4.2 Transfer Learning
So far, our experiments have been conducted on CIFAR-10, splitting the entire dataset into labeled and unlabeled splits. We also examine whether the proposed metric transfer works across different data distributions. For this, we study both supervised and unsupervised pretraining for transfer learning.


Table 6: Few-shot 5-way classification accuracy (%) on miniImageNet.

| Method | Fine-tune | 1-shot | 5-shot |
| NN baseline [35] | No | 41.1 ± 0.7 | 51.0 ± 0.7 |
| MAML [7] | Yes | 48.7 ± 0.7 | 63.2 ± 0.9 |
| Meta-SGD [21] | No | 50.5 ± 1.9 | 64.0 ± 0.9 |
| Matching net [35] | Yes | 46.6 ± 0.8 | 60.0 ± 0.7 |
| Prototypical [33] | No | 49.4 ± 0.8 | 68.2 ± 0.7 |
| Soft k-means [28] | Yes | 50.4 ± 0.3 | 64.4 ± 0.2 |
| SNCA [39] | No | 50.3 ± 0.7 | 64.1 ± 0.8 |
| Our supervised | Yes | 56.1 ± 0.6 | 70.7 ± 0.5 |
| Our unsupervised | Yes | 50.8 ± 0.6 | 66.0 ± 0.5 |

Transferring from labeled ImageNet. We resize ImageNet images to match the CIFAR-10 input resolution and pretrain the metric on them by supervised learning. We keep the WideResNet-28-2 network architecture for meaningful comparison with the semi-supervised settings in Sec 4.1. We then transfer the metric to CIFAR-10, either by network fine-tuning or by metric propagation. In Table 5, we can see that simple network fine-tuning already reaches the best results obtained in the semi-supervised settings of the previous subsection. By using label propagation with spectral clustering, we observe a further large improvement when only a few labeled images are available. This illustrates the generality of our metric transfer approach, where supervised transfer can also take advantage of unlabeled data to improve generalization.
Transferring from unlabeled ImageNet. Instead of supervised training, which encodes prior knowledge about object categories, we treat the ImageNet images as unlabeled and repeat the previous experiment. Different from the earlier unsupervised experiments, this setting involves substantially more unlabeled data, which could potentially lead to a better unsupervised metric. However, our results suggest otherwise: when propagating to CIFAR-10, the unsupervised metric learned from ImageNet is inferior to the metric learned from CIFAR-10 itself, possibly due to the data distribution gap between the two datasets. Nevertheless, our unsupervised transfer from ImageNet still surpasses the state of the art in the semi-supervised setting when labeled samples are limited.
4.3 FewShot Recognition
Few-shot recognition targets a more challenging scenario: generalization across object categories (a.k.a. open-set recognition). Originally, the problem is defined with numerous labeled examples in a source dataset and few examples in the target categories. Recent works [28, 8] also explore the scenario where extra unlabeled data is available for this problem. This fits into our framework for studying label propagation via metric transfer.
We follow the protocols in [28] for conducting the experiments, because it introduces distractor categories in the unlabeled set. The experiments are evaluated on the miniImageNet dataset, consisting of a total of 100 categories, with 64 for training, 16 for validation, and 20 for testing. Images in each category are split into a labeled portion and an unlabeled portion. Training uses only the labeled split of the training categories. During evaluation, a testing episode is constructed by sampling few-shot labeled observations from the labeled split of the testing categories, along with all of the unlabeled images in all the testing categories. A testing episode requires the model to find useful information in the unlabeled set to aid recognition from the few-shot observations. Unlike [28], which includes five distractor categories in the unlabeled set, we consider all categories in the testing set, which better reflects practical scenarios. We report results averaged over multiple testing episodes.
We follow prior work [35] in using a shallow architecture with four convolutional layers and a final fully connected layer. Each convolutional layer has 64 channels, interleaved with ReLU, subsampling, and batch normalization layers. Images are resized to 84 × 84 to train the model. We use the spectral embedding approach for label propagation. During online training, we train for a total of 30 epochs, decaying the initial learning rate partway through training.

Transfer from supervised models. We use a recent supervised metric learning approach [39] as the baseline. After label propagation and fine-tuning on the new data, we obtain a significant performance boost. Prior work [28] improves upon its baselines, but fails to make further improvement because of limited training data. In Figure 5, we visualize the top retrievals from the unlabeled set in the one-shot scenario. These retrievals not only accurately belong to the same class as the ground truth, but their diverse appearance also facilitates learning a strong classifier.
Transfer from unsupervised models. We also investigate pretraining the metric without labels, using instance discrimination [40] to learn the metric. Surprisingly, as shown in Table 6, this obtains better performance than the offline metric learning approach trained with annotations [39], in both 1-shot and 5-shot recognition. This suggests that leveraging unlabeled data in the target problem can be more beneficial than using labeled samples in the source domain for few-shot recognition.
5 Discussions

- The effectiveness of label propagation depends heavily on the learned metric, so advances in metric learning should lead to improved results. Since the prevalent pretraining methods in deep learning use softmax classification, we hope to draw more attention to pretraining networks with metric learning.
- Currently, we study metric pretraining and label propagation separately. It may be beneficial to formulate them jointly in an end-to-end framework, which would be an interesting direction for future work.
- Because of the label propagation process, the complexity of our approach depends on the size of the unlabeled dataset, rather than on the target problem. Our current algorithm cannot run in an online fashion. We hope to address this in the future.
- Our algorithm takes advantage of the unlabeled dataset to create more training data. The overall performance is affected by the relevance of the image content in the unlabeled set to that of the target problem, as this impacts the ability to effectively propagate labels.
Appendix A1 Ablations of Model Parameters
Our model depends on two parameters: the number of eigen components used for spectral clustering, and the temperature used for controlling the confidence. In Figure 6, we show the effects of the two parameters.
The number of eigen components works well over a broad range, and we observe a tradeoff in its value as the number of labeled samples varies: smaller values benefit settings with very few labeled samples, while larger values benefit settings with comparably more labeled samples. The temperature parameter is generally robust over a wide range of values.
Appendix A2 Additional Visualizations
We provide more retrieval visualizations on the CIFAR-10 and miniImageNet datasets in Figure 7 and Figure 8. For CIFAR-10, we show the top retrievals for each class in the unlabeled set given the labeled examples. For miniImageNet, we show the top retrievals in the 5-class 1-shot scenario.
References
 [1] S. Carey and E. Bartlett. Acquiring a single new word. 1978.
 [2] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR. IEEE, 2009.
 [3] C. Doersch, A. Gupta, and A. A. Efros. Unsupervised visual representation learning by context prediction. In ICCV, 2015.
 [4] C. Doersch and A. Zisserman. Multi-task self-supervised visual learning. In ICCV, 2017.
 [5] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes (VOC) challenge. IJCV, 2010.
 [6] R. Fergus, Y. Weiss, and A. Torralba. Semi-supervised learning in gigantic image collections. In NIPS, 2009.
 [7] C. Finn, P. Abbeel, and S. Levine. Model-agnostic meta-learning for fast adaptation of deep networks. arXiv preprint arXiv:1703.03400, 2017.
 [8] V. Garcia and J. Bruna. Few-shot learning with graph neural networks. arXiv preprint arXiv:1711.04043, 2017.
 [9] X. Gastaldi. Shake-shake regularization. arXiv preprint arXiv:1705.07485, 2017.
 [10] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
 [11] J. Goldberger, G. E. Hinton, S. T. Roweis, and R. R. Salakhutdinov. Neighbourhood components analysis. In NIPS, 2005.
 [12] E. Hoffer and N. Ailon. Deep metric learning using triplet network. In International Workshop on Similarity-Based Pattern Recognition. Springer, 2015.
 [13] J. Hoffman, E. Tzeng, T. Park, J.-Y. Zhu, P. Isola, K. Saenko, A. A. Efros, and T. Darrell. CyCADA: Cycle-consistent adversarial domain adaptation. arXiv preprint arXiv:1711.03213, 2017.
 [14] H. Hu, J. Feng, C. Yu, and J. Zhou. Multi-class constrained normalized cut with hard, soft, unary and pairwise priors and its applications to object segmentation. TIP, 2013.
 [15] D. P. Kingma, S. Mohamed, D. J. Rezende, and M. Welling. Semi-supervised learning with deep generative models. In NIPS, 2014.
 [16] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
 [17] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
 [18] S. Laine and T. Aila. Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242, 2016.
 [19] B. Lake, R. Salakhutdinov, J. Gross, and J. Tenenbaum. One shot learning of simple visual concepts. In Proceedings of the Annual Meeting of the Cognitive Science Society, 2011.
 [20] D.-H. Lee. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on Challenges in Representation Learning, ICML, 2013.
 [21] Z. Li, F. Zhou, F. Chen, and H. Li. Meta-SGD: Learning to learn quickly for few-shot learning. arXiv preprint arXiv:1707.09835, 2017.
 [22] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV. Springer, 2014.
 [23] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.
 [24] N. Mishra, M. Rohaninejad, X. Chen, and P. Abbeel. A simple neural attentive meta-learner. 2018.
 [25] T. Miyato, S.-i. Maeda, M. Koyama, and S. Ishii. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. 2017.
 [26] A. Oliver, A. Odena, C. Raffel, E. D. Cubuk, and I. J. Goodfellow. Realistic evaluation of semi-supervised learning algorithms. 2018.
 [27] A. Rasmus, M. Berglund, M. Honkala, H. Valpola, and T. Raiko. Semi-supervised learning with ladder networks. In NIPS, 2015.
 [28] M. Ren, E. Triantafillou, S. Ravi, J. Snell, K. Swersky, J. B. Tenenbaum, H. Larochelle, and R. S. Zemel. Meta-learning for semi-supervised few-shot classification. arXiv preprint arXiv:1803.00676, 2018.
 [29] C. Rosenberg, M. Hebert, and H. Schneiderman. Semi-supervised self-training of object detection models. In WACV/MOTION, 2005.
 [30] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. ImageNet large scale visual recognition challenge. IJCV, 2015.
 [31] J. Shi and J. Malik. Normalized cuts and image segmentation. TPAMI, 2000.
 [32] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
 [33] J. Snell, K. Swersky, and R. Zemel. Prototypical networks for few-shot learning. In NIPS, 2017.
 [34] A. Tarvainen and H. Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In NIPS, 2017.
 [35] O. Vinyals, C. Blundell, T. Lillicrap, D. Wierstra, et al. Matching networks for one shot learning. In NIPS, 2016.
 [36] U. Von Luxburg. A tutorial on spectral clustering. Statistics and computing, 2007.
 [37] J. Wang, S. Kumar, and S.-F. Chang. Semi-supervised hashing for large-scale search. TPAMI, 2012.
 [38] J. Weston, F. Ratle, H. Mobahi, and R. Collobert. Deep learning via semi-supervised embedding. In Neural Networks: Tricks of the Trade. Springer, 2012.
 [39] Z. Wu, A. A. Efros, and S. X. Yu. Improving generalization via scalable neighborhood component analysis. arXiv preprint arXiv:1808.04699, 2018.
 [40] Z. Wu, Y. Xiong, S. X. Yu, and D. Lin. Unsupervised feature learning via non-parametric instance discrimination. In CVPR, 2018.
 [41] Y. Xu, S. J. Pan, H. Xiong, Q. Wu, R. Luo, H. Min, and H. Song. A unified framework for metric transfer learning. IEEE Trans. Knowl. Data Eng., 2017.
 [42] S. Zagoruyko and N. Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.
 [43] R. Zhang, P. Isola, and A. A. Efros. Colorful image colorization. In ECCV. Springer, 2016.
 [44] T. Zhou, M. Brown, N. Snavely, and D. G. Lowe. Unsupervised learning of depth and egomotion from video. In CVPR, 2017.