Few-shot learning Miller et al. (2000); Fei-Fei et al. (2006); Vinyals et al. (2016) aims to learn a model that generalizes well with a few instances of each novel class. In general, a few-shot learner is first trained on a substantial annotated dataset, also referred to as the base-class set, and then adapted to unseen novel classes with a few labeled instances. During evaluation, a set of few-shot tasks is fed to the learner, where each task consists of a few support (labeled) samples and a certain number of query (unlabeled) samples. This research topic has proved immensely appealing in the past few years, and a large number of few-shot learning methods have been proposed from various perspectives. Mainstream methods can be roughly grouped into two categories. The first is learning from episodes Vinyals et al. (2016), also known as meta-learning, which uses the base-class data to create a set of episodes. Each episode is a few-shot learning task, with support and query samples that simulate the evaluation procedure. The second is the transfer-learning based method, which focuses on learning a decent classifier by transferring the domain knowledge from a model pre-trained on the large base-class set Chen et al. (2018); Qiao et al. (2018). This paradigm decouples the few-shot learning process into representation learning and classification, and has shown favorable performance against meta-learning methods in recent works Tian et al. (2020); Ziko et al. (2020). Our method shares a similar motivation with transfer-learning based methods and utilizes extra unlabeled novel-class data together with a pre-trained embedding to tackle the few-shot problem.
Compared with collecting labeled novel-class data, it is much easier to obtain abundant unlabeled data from these classes. Therefore, semi-supervised few-shot learning (SSFSL) Ren et al. (2018); Liu et al. (2018); Li et al. (2019b); Yu et al. (2020) is proposed to combine the auxiliary information from labeled base-class data and extra unlabeled novel-class data to enhance the performance of few-shot learners. The core challenge in SSFSL is how to fully explore the auxiliary information from these unlabeled data. Previous SSFSL works indicate that graph-based models Liu et al. (2018); Ziko et al. (2020) can learn a better classifier than inductive ones Ren et al. (2018); Li et al. (2019b); Yu et al. (2020), since these methods directly model the relationship between the labeled and unlabeled samples during inference. However, current graph-based models adopt Laplace learning Zhu et al. (2003) to conduct label propagation, whose solutions develop localized spikes near the labeled samples but are almost constant far from them, i.e., label values are not propagated well, especially with few labeled samples. Therefore, these models suffer from an underdeveloped message-passing capacity for the labels. On the other hand, most SSFSL methods adopt the feature embedding pre-trained on base-class data (meta- or transfer-pre-trained) as the novel-class embedding. This may lead to the embedding degeneration problem: since the pre-trained model is designed for base-class recognition, it tends to learn an embedding that represents only base-class information and loses information that might be useful outside the base classes.
To address the above issues, we propose a novel transfer-learning based SSFSL method, named Poisson Transfer Network (PTN). Specifically, to improve the message-passing capacity of graph-based SSFSL models, we revise the Poisson model for few-shot problems by incorporating query feature calibration and the Poisson MBO model. Poisson learning Calder et al. (2020) is provably more stable and informative than traditional Laplace learning in low label rate semi-supervised problems. However, directly employing Poisson MBO for SSFSL may suffer from the cross-class bias caused by the data distribution drift between the support and query data. Therefore, we improve the Poisson MBO model by explicitly eliminating the cross-class bias before label inference. To tackle the novel-class embedding degeneration problem, we propose to transfer the pre-trained base-class embedding to a novel-class embedding by adopting unsupervised contrastive training He et al. (2020); Chen et al. (2020) on the extra unlabeled novel-class data. By constraining the distances between augmented positive pairs while pushing the negative ones apart, the proposed transfer scheme captures the novel-class distribution implicitly. This strategy effectively avoids the possible overfitting of retraining the feature embedding on the few labeled instances.
By integrating Poisson learning and the novel-class-specific embedding, the proposed PTN model can fully explore the auxiliary information of the extra unlabeled data for SSFSL tasks. The contributions are summarized as follows:
We propose a Poisson learning based model to improve the capacity of mining the relations between the labeled and unlabeled data for graph-based SSFSL.
We propose to adopt unsupervised contrastive learning on the extra unlabeled data during representation learning to improve the generality of the pre-trained base-class embedding for novel-class recognition.
Extensive experiments are conducted on two benchmark datasets to investigate the effectiveness of PTN, and PTN achieves state-of-the-art performance.
As a representative of learning methods with limited samples, e.g., weakly supervised learning Lan et al. (2017); Zhang et al. (2018) and semi-supervised learning Zhu et al. (2003); Calder and Slepčev (2019), few-shot learning can be roughly grouped into two categories: meta-learning models and transfer-learning models. Meta-learning models adopt the episode training mechanism Vinyals et al. (2016): metric-based models optimize a transferable embedding of both auxiliary and target data, and queries are identified according to embedding distances Sung et al. (2018); Li et al. (2019a); Simon et al. (2020); Zhang et al. (2020), while meta-optimization models Finn et al. (2017); Rusu et al. (2018) aim to design optimization-centered algorithms that adapt knowledge from meta-training to meta-testing. Instead of separating base classes into a set of few-shot tasks, transfer-learning methods Qiao et al. (2018); Gidaris and Komodakis (2018); Chen et al. (2018); Qi et al. (2018) utilize all base classes to pre-train the few-shot model, which is then adapted to novel-class recognition. Most recently, Tian et al. (2020)
decouple the learning procedure into base-class embedding pre-training and novel-class classifier learning. By adopting multivariate logistic regression and knowledge distillation, their model outperforms meta-learning approaches. Our proposed method is inspired by the transfer-learning framework; we adapt it to semi-supervised few-shot learning by exploring both unlabeled novel-class data and base-class data to boost performance on few-shot tasks.
Semi-Supervised Few-shot Learning (SSFSL)
SSFSL aims to leverage extra unlabeled novel-class data to improve few-shot learning. Ren et al. (2018) propose a meta-learning based framework that extends the prototypical network Snell et al. (2017) with unlabeled data to refine the class prototypes. LST Li et al. (2019b) re-trains the base model using the unlabeled data with generated pseudo labels; during evaluation, it dynamically adds unlabeled samples with high prediction confidence into testing. In Yu et al. (2020), TransMatch proposes to initialize the novel-class classifier with pre-trained feature imprinting, and then employs MixMatch Berthelot et al. (2019) to fine-tune the whole model with both labeled and unlabeled data. As research closely related to SSFSL, transductive few-shot approaches Liu et al. (2018); Kim et al. (2019); Ziko et al. (2020) also attempt to utilize unlabeled data to improve few-shot learning. These methods adopt the entire query set as the unlabeled data and perform inference on all query samples together. For instance, TPN Liu et al. (2018) employs graph-based transductive inference to address the few-shot problem, and a semi-supervised extension is also presented in their work.
Unlike the above approaches, in this paper, we adopt the transfer-learning framework and propose to fully explore the extra unlabeled information in both classifier learning and embedding learning with different learning strategies.
In standard few-shot learning, there exists a labeled support set $\mathcal{S} = \{(x_i, y_i)\}$ covering $N$ different classes, where $x_i$ is a labeled sample and $y_i$ denotes its label. We use the standard basis vector $e_j \in \mathbb{R}^N$ to represent the $j$-th class, i.e., $y_i = e_j$ if $x_i$ belongs to class $j$. Given an unlabeled query sample from the query set $\mathcal{Q}$, the goal is to assign the query to one of the support classes. The labeled support set and unlabeled query set share the same label space, and the novel-class dataset is thus defined as $\mathcal{D}_{novel} = \mathcal{S} \cup \mathcal{Q}$. If $\mathcal{S}$ contains $K$ labeled samples for each of the $N$ categories, the task is noted as an $N$-way-$K$-shot problem. It is far from possible to obtain an ideal classifier with the limited annotated $\mathcal{S}$. Therefore, few-shot models usually utilize a fully annotated dataset with a similar data distribution but a label space disjoint from $\mathcal{D}_{novel}$ as an auxiliary dataset, noted as the base-class set $\mathcal{D}_{base}$.
For semi-supervised few-shot learning (SSFSL), we additionally have an extra unlabeled support set $\mathcal{U} = \{\tilde{x}_i\}$. These additional unlabeled samples are usually drawn from each of the support classes in the standard setting, or from other novel classes under the distractor setting. The new novel-class dataset is then defined as $\mathcal{D}_{novel} = \mathcal{S} \cup \mathcal{U} \cup \mathcal{Q}$. The goal of SSFSL is to maximize the value of the extra unlabeled data to improve few-shot methods.
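As a concrete illustration of the task construction above, the following sketch samples one $N$-way-$K$-shot semi-supervised task from a class-indexed pool. The helper name and the class-balanced split of the unlabeled set are our own assumptions, not part of the paper.

```python
import random

def sample_ssfsl_task(data_by_class, n_way=5, k_shot=1, n_query=15, num_unlabeled=50):
    """Sample one N-way-K-shot SSFSL task (hypothetical helper).

    data_by_class: dict mapping class id -> list of samples.
    Returns the labeled support set, the unlabeled set, and the query set.
    """
    classes = random.sample(sorted(data_by_class), n_way)
    support, unlabeled, query = [], [], []
    for label, cls in enumerate(classes):
        pool = random.sample(data_by_class[cls],
                             k_shot + n_query + num_unlabeled // n_way)
        support += [(x, label) for x in pool[:k_shot]]
        query += [(x, label) for x in pool[k_shot:k_shot + n_query]]
        unlabeled += pool[k_shot + n_query:]   # ground-truth labels discarded
    return support, unlabeled, query
```

Under the distractor setting, part of `unlabeled` would instead be drawn from classes outside the sampled `classes`.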
The details of the proposed PTN are introduced as follows: we first present the representation learning, and then illustrate the proposed Poisson learning model for label inference.
The representation learning aims to learn a well-generalized novel-class embedding through Feature Embedding Pre-training and Unsupervised Embedding Transfer.
Feature Embedding Pre-training
On the left side of Figure 1, the first part of PTN is the feature embedding pre-training. By employing the cross-entropy loss between predictions and ground-truth labels on the base-class set, we train the base encoder in a fully supervised way, the same as Chen et al. (2018); Yu et al. (2020); Tian et al. (2020). This stage generates a powerful embedding for the downstream few-shot learner.
Unsupervised Embedding Transfer
Directly employing the pre-trained base-class embedding for novel classes may suffer from the degeneration problem. However, retraining the base-class embedding with the limited labeled instances easily leads to overfitting. How can we train a novel-class embedding to represent things beyond labels when our only supervision is the limited labels? Our solution is unsupervised contrastive learning. Unsupervised learning, especially contrastive learning He et al. (2020); Chen et al. (2020), has recently shown great potential in representation learning for various downstream vision tasks, and most of these works train a model from scratch. However, unsupervised pre-trained models perform worse than fully supervised pre-trained models. Unlike previous works, we propose to adopt contrastive learning to retrain the pre-trained embedding with the unlabeled novel data. In this way, we can learn a decent novel-class embedding by integrating the fully supervised pre-training scheme with unsupervised contrastive fine-tuning.
Specifically, for a minibatch of $B$ examples from the unlabeled novel-class subset $\mathcal{U}$, randomly sampling two data augmentation operators, we generate a new feature set of $2B$ points, i.e., $B$ positive pairs of feature points. We treat the two features computed from the same raw input as a positive pair, and the remaining feature points in the minibatch as negative samples. The contrastive loss for the minibatch is then defined as

$$\mathcal{L}_{con} = -\frac{1}{2B} \sum_{(i,j)\,\mathrm{pos.}} \log \frac{\exp\big(\mathrm{sim}(z_i, z_j)/\tau\big)}{\sum_{k \neq i} \exp\big(\mathrm{sim}(z_i, z_k)/\tau\big)}, \qquad (1)$$
where $(z_i, z_j)$ denotes a positive feature pair, $\tau$ is a temperature parameter, and $\mathrm{sim}(\cdot, \cdot)$ represents the cosine similarity. We further adopt the Kullback-Leibler divergence ($\mathrm{KL}$) between the two augmented feature subsets as a regularization term. The final unsupervised embedding transfer loss is therefore defined as

$$\mathcal{L}_{trans} = \mathcal{L}_{con} + \lambda\, \mathrm{KL}\big(\mathcal{F}^{(1)} \,\|\, \mathcal{F}^{(2)}\big), \qquad (2)$$

where $\mathcal{F}^{(1)}$ and $\mathcal{F}^{(2)}$ denote the two augmented feature subsets and $\lambda$ weights the regularization.
By training on the extra unlabeled data with this loss, we can learn a robust novel-class embedding from $\mathcal{U}$.
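The contrastive term can be sketched in NumPy as follows; this is the standard NT-Xent form, and the function name and default temperature are our own assumptions rather than the paper's exact implementation.

```python
import numpy as np

def nt_xent_loss(z1, z2, tau=0.5):
    """Sketch of the minibatch contrastive loss on two augmented views.

    z1, z2: (B, d) feature arrays for the two augmentations of a batch.
    """
    z = np.concatenate([z1, z2], axis=0)                 # (2B, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)     # for cosine similarity
    sim = z @ z.T / tau                                  # (2B, 2B) scaled sims
    np.fill_diagonal(sim, -np.inf)                       # exclude self-pairs
    b = len(z1)
    pos = np.concatenate([np.arange(b, 2 * b), np.arange(b)])  # positive index
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * b), pos].mean()
```

When the two views are perfectly aligned, the positive similarity is maximal and the loss is correspondingly small, which is the behavior the embedding-transfer stage relies on.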
Poisson Label Inference
Previous studies Zhu et al. (2003); Zhou et al. (2004); Zhu et al. (2005); Liu et al. (2018); Ziko et al. (2020) indicate that graph-based few-shot classifiers show superior performance over inductive ones. Therefore, we propose to construct the classifier with a graph-based Poisson model, which adopts an optimization strategy different from that of the representation learning. The Poisson model Calder et al. (2020) has been shown superior to traditional Laplace-based graph models Zhu et al. (2003); Zhou et al. (2004) both theoretically and experimentally, especially for low label rate semi-supervised problems. However, directly applying this model to the few-shot task suffers from a cross-class bias, caused by the data distribution bias between the support data (including labeled and unlabeled support data) and the query data.
Therefore, we revise this model into our classifier by eliminating the support-query bias: we explicitly apply a query feature calibration strategy before the final Poisson label inference. It is worth noticing that the proposed graph-based classifier can be directly appended to the pre-trained embedding without the unsupervised embedding transfer training; we dub this baseline model the Decoupled Poisson Network (DPN).
Query Feature Calibration
The support-query data distribution bias, also referred to as the cross-class bias Liu et al. (2020), is one of the causes of degeneracy in few-shot learners. In this paper, we propose a simple but effective method to eliminate this distribution bias before Poisson graph inference. For an SSFSL task, we fuse the labeled support set and the extra unlabeled set into the final support set. Denoting the normalized embedded support and query feature sets as $\mathcal{F}_s$ and $\mathcal{F}_q$, the cross-class bias is defined as

$$\Delta = \frac{1}{|\mathcal{F}_s|} \sum_{f \in \mathcal{F}_s} f \;-\; \frac{1}{|\mathcal{F}_q|} \sum_{f \in \mathcal{F}_q} f. \qquad (3)$$
We then add this bias to the query features, so that the support-query bias is largely eliminated. After that, a Poisson MBO model is adopted to infer the query labels.
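Assuming the cross-class bias is simply the difference between the support and query feature means (our reading of the definition above), the calibration step can be sketched as:

```python
import numpy as np

def calibrate_queries(support_feats, query_feats):
    """Shift query features by the cross-class bias, i.e. the support mean
    minus the query mean (sketch of the calibration step)."""
    delta = support_feats.mean(axis=0) - query_feats.mean(axis=0)
    return query_feats + delta
```

After this shift the query feature mean coincides with the support feature mean, which removes the first-order distribution drift between the two sets.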
The Poisson Merriman–Bence–Osher Model
We denote the embedded feature set as $F = \{f_1, \dots, f_n\}$, where the first $m$ feature points belong to the labeled support set, the last feature points belong to the query set, and the remaining points denote the unlabeled support set. We build a graph with the feature points as vertices, where the edge weight $w_{ij}$ is the similarity between feature points $f_i$ and $f_j$, defined as $w_{ij} = \exp\big(-4\|f_i - f_j\|^2 / d_K(i)^2\big)$, with $d_K(i)$ the distance between $f_i$ and its $K$-th nearest neighbor (the hyper-parameter values are given in the implementation details). Correspondingly, we define the weight matrix as $W = [w_{ij}]$, the degree matrix as $D = \mathrm{diag}(d_1, \dots, d_n)$ with $d_i = \sum_j w_{ij}$, and the unnormalized Laplacian as $L = D - W$. As the first $m$ feature points have ground-truth labels, we use $\bar{y} = \frac{1}{m}\sum_{j=1}^{m} y_j$ to denote the average label vector, and we let the indicator $\delta_{ij} = 1$ if $i = j$, else $\delta_{ij} = 0$. The goal of this model is to learn a classifier $u: F \to \mathbb{R}^N$. By solving the Poisson equation

$$L\, u(f_i) = \sum_{j=1}^{m} \big(y_j - \bar{y}\big)\, \delta_{ij}, \quad i = 1, \dots, n, \qquad (4)$$
satisfying $\sum_{i=1}^{n} d_i\, u(f_i) = 0$, we obtain the label prediction function $u$. The predicted label of vertex $f_i$ is then determined as $\hat{y}_i = \arg\max_{j} u_j(f_i)$. Let $U \in \mathbb{R}^{n \times N}$ denote the prediction matrix of all the data, and let $B \in \mathbb{R}^{n \times N}$ denote the initial label matrix, whose row is $(y_j - \bar{y})^{\top}$ for each labeled point and zero for all unlabeled data. The solution of Eq. (4) can then be determined by the iteration:

$$U^{(t+1)} = U^{(t)} + D^{-1}\big(B - L\, U^{(t)}\big), \qquad (5)$$
where $U^{(t)}$ denotes the predicted labels of all the data at iteration $t$. We can obtain a stable classifier within a certain number of iterations of Eq. (5). After that, we adopt a graph-cut method to improve the inference performance by incrementally adjusting the classifier's decision boundary. The graph-cut problem is defined as

$$\min_{u:\,F \to \{e_1,\dots,e_N\}}\; \frac{1}{2}\sum_{i,j=1}^{n} w_{ij}\,\|u(f_i)-u(f_j)\|^{2} \;-\; \mu \sum_{j=1}^{m} (y_j - \bar{y}) \cdot u(f_j) \quad \mathrm{s.t.}\;\; \frac{1}{n}\sum_{i=1}^{n} u(f_i) = \hat{b}, \qquad (6)$$
where $\{y_j\}_{j=1}^{m}$ denotes the annotated samples' label set, $\frac{1}{n}\sum_{i=1}^{n} u(f_i)$ is the fraction of vertices assigned to each of the $N$ classes, and $\hat{b}$ is the prior knowledge of the class-size distribution, i.e., $\hat{b}_j$ is the fraction of data belonging to class $j$. With the constraint $\frac{1}{n}\sum_{i=1}^{n} u(f_i) = \hat{b}$, we can encode this prior knowledge into the Poisson model. The first term is the graph-cut energy of the classification given by $u$, widely used in semi-supervised graph models Zhu et al. (2003, 2005); Zhou et al. (2004).
In Eq. (6), the solution is constrained to take discrete values, which makes the problem hard to solve. To relax the problem, we use the Merriman-Bence-Osher (MBO) scheme Garcia-Cardona et al. (2014), replacing the graph-cut energy with its Ginzburg-Landau approximation:

$$\min_{u \in \mathcal{P}}\; \mathrm{GL}_{\tau}(u) \;-\; \mu \sum_{j=1}^{m} (y_j - \bar{y}) \cdot u(f_j) \quad \mathrm{s.t.}\;\; \frac{1}{n}\sum_{i=1}^{n} u(f_i) = \hat{b}, \qquad (7)$$

where $\mathrm{GL}_{\tau}(u)$ augments the graph-cut energy with a potential term that penalizes values of $u(f_i)$ far from the label vertices $\{e_1, \dots, e_N\}$.
In Eq. (7), $\mathcal{P}$ represents the space of relaxed assignments, which allows the classifier to take on real values instead of the discrete values from $\{e_1, \dots, e_N\}$ in Eq. (6). More importantly, this leads to a more efficient computation of the Poisson model. Eq. (7) can be efficiently solved with an alternating gradient-descent strategy, as shown in lines 9-20 of Algorithm 1.
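To make the alternating scheme concrete, here is a deliberately simplified MBO-style sketch. It is our own illustration, not the paper's Algorithm 1: it alternates a few gradient steps on the relaxed energy with a projection onto one-hot labels, and omits the class-size constraint and the clipped time steps used in PTN.

```python
import numpy as np

def mbo_refine(W, U0, B, mu=1.0, dt=0.5, n_outer=10, n_inner=5):
    """Simplified MBO-style refinement (illustration only).

    W: (n, n) symmetric weight matrix; U0: (n, k) initial label matrix;
    B: (n, k) Poisson source term ((y_j - y_bar) on labeled rows, else zero).
    """
    d = W.sum(axis=1)
    L = np.diag(d) - W                       # unnormalized graph Laplacian
    U = U0.copy()
    k = U.shape[1]
    for _ in range(n_outer):
        for _ in range(n_inner):             # descend the relaxed energy
            U = U - dt * (L @ U - mu * B) / d[:, None]
        U = np.eye(k)[U.argmax(axis=1)]      # project to nearest label vertex
    return U
```

The degree-normalized step keeps the inner iteration stable for `dt <= 1`, and the projection step is what drives the relaxed solution back toward discrete class assignments.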
The overall proposed algorithm is summarized in Algorithm 1. Given the base-class set, the novel-class set, the prior class-size distribution $\hat{b}$, and the other parameters, PTN predicts the query samples' labels, determined by the row-wise $\arg\max$ of the prediction matrix. More specifically, once the encoder is learned on the base set in step 1, we employ the proposed unsupervised embedding transfer in step 2. After that, we build the graph on the feature set and compute the related matrices in steps 3-5. In the label inference stage in steps 6-20, we first apply the Poisson model to robustly propagate the labels, with the stop condition in step 7 following the constraint discussed in Calder et al. (2020), and then solve the graph-cut problem with the MBO scheme in several steps of gradient descent to boost the classification performance. Steps 9-19 aim to solve the graph-cut problem in Eq. (7): we split the objective into two energy functions and employ alternating gradient descent on them. Steps 10-12 optimize the relaxed energy, while steps 14-17 perform the closest-point projection with clipped time steps; the generated solution also satisfies the constraint in Eq. (7). After obtaining the Poisson MBO solution, the query samples' label prediction matrix is resolved by step 20.
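The core propagation of steps 6-8 (the iteration of Eq. (5)) can be sketched in a few lines. This is our own simplified dense-matrix sketch with hypothetical function names; it omits the MBO refinement and the class-size constraint.

```python
import numpy as np

def poisson_learning(W, labels, n_classes, n_iter=200):
    """Minimal sketch of Poisson label propagation (after Calder et al., 2020).

    W: (n, n) symmetric weight matrix of the graph.
    labels: dict {vertex index: class id} for the few labeled points.
    Returns the predicted class of every vertex.
    """
    n = W.shape[0]
    d = W.sum(axis=1)                        # vertex degrees
    L = np.diag(d) - W                       # unnormalized graph Laplacian
    # Source term B: (y_j - y_bar) on labeled vertices, zero elsewhere.
    Y = np.zeros((len(labels), n_classes))
    for row, cls in enumerate(labels.values()):
        Y[row, cls] = 1.0
    y_bar = Y.mean(axis=0)
    B = np.zeros((n, n_classes))
    for row, idx in enumerate(labels):
        B[idx] = Y[row] - y_bar
    # Fixed-point iteration of Eq. (5): U <- U + D^{-1}(B - L U).
    U = np.zeros((n, n_classes))
    for _ in range(n_iter):
        U = U + (B - L @ U) / d[:, None]
    return U.argmax(axis=1)
```

On a toy two-cluster graph, one labeled point per cluster is enough for the propagated labels to follow the cluster structure, which is the low-label-rate behavior that motivates Poisson learning here.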
The main inference complexity of PTN is $O(E)$ per iteration, where $E$ is the number of edges in the graph. As a graph-based model, PTN's inference is heavier than that of inductive models. However, previous studies Liu et al. (2018); Calder et al. (2020) indicate that this complexity is affordable for few-shot tasks, since the data scale is not large. Moreover, we do not claim that our model is the final solution for SSFSL; we aim to design a new method that makes full use of the extra unlabeled information. We report inference time comparisons in Table 4. The average inference time of PTN is 13.68s.
We evaluate the proposed PTN on two few-shot benchmark datasets: miniImageNet and tieredImageNet. The miniImageNet dataset Vinyals et al. (2016) is a subset of ImageNet consisting of 100 classes, each of which contains 600 images of size 84×84. We follow the standard split of 64 base, 16 validation, and 20 test classes Vinyals et al. (2016); Tian et al. (2020). The tieredImageNet Ren et al. (2018) is another subset of ImageNet, with 608 classes instead. We follow the standard split of 351 base, 97 validation, and 160 test classes Ren et al. (2018); Liu et al. (2018). We resize the images of tieredImageNet to 84×84 pixels, and randomly select $N$ classes from the novel classes to construct each few-shot task. Within each class, $K$ examples are selected as the labeled data, and examples from the rest serve as queries. The extra unlabeled samples are selected from the same $N$ classes or from the remaining novel classes, and we study different sizes of the extra unlabeled set. We run 600 few-shot tasks and report the mean accuracy with the 95% confidence interval.
Same as previous works Rusu et al. (2018); Dhillon et al. (2019); Liu et al. (2020); Tian et al. (2020); Yu et al. (2020), we adopt the wide residual network (WRN-28-10) Zagoruyko and Komodakis (2016) as the backbone of our base model, and we follow the protocols in Tian et al. (2020); Yu et al. (2020), fusing the base and validation classes to train the base model from scratch. We set the batch size to 64, using SGD with a learning rate of 0.05 and weight decay. We reduce the learning rate by a factor of 0.1 after 60 and 80 epochs, and train the base model for 100 epochs.
| Methods | Type | Backbone | 5-way-1-shot | 5-way-5-shot |
| --- | --- | --- | --- | --- |
| Prototypical-Net Snell et al. (2017) | Metric, Meta | ConvNet-256 | 49.42 ± 0.78 | 68.20 ± 0.66 |
| Relation Network Sung et al. (2018) | Metric, Meta | ConvNet-64 | 50.44 ± 0.82 | 65.32 ± 0.70 |
| TADAM Oreshkin et al. (2018) | Metric, Meta | ResNet-12 | 58.50 ± 0.30 | 76.70 ± 0.30 |
| DPGN Yang et al. (2020) | Metric, Meta | ResNet-12 | 67.77 ± 0.32 | 84.60 ± 0.43 |
| RFS Tian et al. (2020) | Metric, Transfer | ResNet-12 | 64.82 ± 0.60 | 82.14 ± 0.43 |
| MAML Finn et al. (2017) | Optimization, Meta | ConvNet-64 | 48.70 ± 1.84 | 63.11 ± 0.92 |
| SNAIL Mishra et al. (2018) | Optimization, Meta | ResNet-12 | 55.71 ± 0.99 | 68.88 ± 0.92 |
| LEO Rusu et al. (2018) | Optimization, Meta | WRN-28-10 | 61.76 ± 0.08 | 77.59 ± 0.12 |
| MetaOptNet Lee et al. (2019) | Optimization, Meta | ResNet-12 | 64.09 ± 0.62 | 80.00 ± 0.45 |
| TPN Liu et al. (2018) | Transductive, Meta | ConvNet-64 | 55.51 ± 0.86 | 69.86 ± 0.65 |
| BD-CSPN Liu et al. (2020) | Transductive, Meta | WRN-28-10 | 70.31 ± 0.93 | 81.89 ± 0.60 |
| Transductive Fine-tuning Dhillon et al. (2019) | Transductive, Transfer | WRN-28-10 | 65.73 ± 0.68 | 78.40 ± 0.52 |
| LaplacianShot Ziko et al. (2020) | Transductive, Transfer | DenseNet | 75.57 ± 0.19 | 84.72 ± 0.13 |
| Masked Soft k-Means Ren et al. (2018) | Semi, Meta | | | |
| TPN-semi Liu et al. (2018) | Semi, Meta | ConvNet-64 | 52.78 ± 0.27 | 66.42 ± 0.21 |
| LST Li et al. (2019b) | Semi, Meta | ResNet-12 | 70.10 ± 1.90 | 78.70 ± 0.80 |
| TransMatch Yu et al. (2020) | Semi, Transfer | WRN-28-10 | 62.93 ± 1.11 | 82.24 ± 0.59 |
| DPN (Ours) | Semi, Transfer | WRN-28-10 | 79.67 ± 1.06 | 86.30 ± 0.95 |
| PTN (Ours) | Semi, Transfer | WRN-28-10 | 82.66 ± 0.97 | 88.43 ± 0.67 |
| Methods | Type | Backbone | 5-way-1-shot | 5-way-5-shot |
| --- | --- | --- | --- | --- |
| Prototypical-Net Snell et al. (2017) | Metric, Meta | ConvNet-256 | 53.31 ± 0.89 | 72.69 ± 0.74 |
| Relation Network Sung et al. (2018) | Metric, Meta | ConvNet-64 | 54.48 ± 0.93 | 71.32 ± 0.78 |
| DPGN Yang et al. (2020) | Metric, Meta | ResNet-12 | 72.45 ± 0.51 | 87.24 ± 0.39 |
| RFS Tian et al. (2020) | Metric, Transfer | ResNet-12 | 71.52 ± 0.69 | 86.03 ± 0.49 |
| MAML Finn et al. (2017) | Optimization, Meta | ConvNet-64 | 51.67 ± 1.81 | 70.30 ± 1.75 |
| LEO Rusu et al. (2018) | Optimization, Meta | WRN-28-10 | 66.33 ± 0.05 | 81.44 ± 0.09 |
| MetaOptNet Lee et al. (2019) | Optimization, Meta | ResNet-12 | 65.81 ± 0.74 | 81.75 ± 0.53 |
| TPN Liu et al. (2018) | Transductive, Meta | ConvNet-64 | 59.91 ± 0.94 | 73.30 ± 0.75 |
| BD-CSPN Liu et al. (2020) | Transductive, Meta | WRN-28-10 | 78.74 ± 0.95 | 86.92 ± 0.63 |
| Transductive Fine-tuning Dhillon et al. (2019) | Transductive, Transfer | WRN-28-10 | 73.34 ± 0.71 | 85.50 ± 0.50 |
| LaplacianShot Ziko et al. (2020) | Transductive, Transfer | DenseNet | 80.30 ± 0.22 | 87.93 ± 0.15 |
| Masked Soft k-Means Ren et al. (2018) | Semi, Meta | ConvNet-128 | 52.39 ± 0.44 | 69.88 ± 0.20 |
| TPN-semi Liu et al. (2018) | Semi, Meta | ConvNet-64 | 55.74 ± 0.29 | 71.01 ± 0.23 |
| LST Li et al. (2019b) | Semi, Meta | ResNet-12 | 77.70 ± 1.60 | 85.20 ± 0.80 |
| DPN (Ours) | Semi, Transfer | WRN-28-10 | 82.18 ± 1.06 | 88.02 ± 0.72 |
| PTN (Ours) | Semi, Transfer | WRN-28-10 | 84.70 ± 1.14 | 89.14 ± 0.71 |
In the unsupervised embedding transfer, the data augmentation is defined the same as in Lee et al. (2019); Tian et al. (2020). For fair comparisons against TransMatch Yu et al. (2020), we also augment each labeled image 10 times by random transformations and generate the prototypes of each class as labeled samples. We apply an SGD optimizer with a momentum of 0.9, and the cosine learning rate scheduler is used for 10 epochs. We set the batch size to 80, with the temperature $\tau$ in Eq. (2) fixed. For Poisson inference, we construct the graph by connecting each sample to its $K$ nearest neighbors with Gaussian weights. The weight matrix is symmetrized, which accelerates the convergence of the iteration in Algorithm 1 without changing the solution of Eq. (4). We set the maximum number of iterations in step 7 of Algorithm 1 by referring to the stop constraint discussed in the Proposed Algorithm section. The remaining hyper-parameters are set empirically.
Comparison with the State-Of-The-Art
In our experiments, we group the compared methods into five categories, and the experimental results on the two datasets are summarized in Table 1. With auxiliary unlabeled data available, our proposed PTN outperforms the metric-based and optimization-based few-shot models by large margins, indicating that the proposed model effectively utilizes the unlabeled information to assist few-shot recognition. By integrating the unsupervised embedding transfer and the PoissonMBO classifier, PTN achieves superior performance over both transductive and existing SSFSL approaches. Specifically, under the 5-way-1-shot setting, the classification accuracies are 81.57% vs. 63.02% for TransMatch Yu et al. (2020) and 84.70% vs. 80.30% for LaplacianShot Ziko et al. (2020) on miniImageNet and tieredImageNet, respectively; under the 5-way-5-shot setting, the accuracies are 88.43% vs. 78.70% for LST Li et al. (2019b) and 89.14% vs. 81.89% for BD-CSPN Liu et al. (2020) on miniImageNet and tieredImageNet, respectively. These results demonstrate the superiority of PTN for SSFSL tasks.
Different Extra Unlabeled Samples
We show the results of using different numbers of extra unlabeled instances in Table 2. For Num_U = 0, PTN can be viewed as a transductive model without extra unlabeled data, where we treat the query samples as the unlabeled data; for fair comparison, we do not fine-tune the embedding with query labels, only with the unlabeled query samples themselves. It can be observed that our PTN model achieves better performance with more extra unlabeled samples, which indicates the effectiveness of PTN in mining unlabeled auxiliary information for the few-shot problem.
Results with Distractor Classes
Inspired by Ren et al. (2018); Liu et al. (2018); Yu et al. (2020), we further investigate the influence of distractor classes, where part of the extra unlabeled data is collected from classes that do not overlap with the labeled support classes. We follow the settings in Ren et al. (2018); Liu et al. (2018). As shown in Figure 2, even with distractor-class data, the proposed PTN still outperforms other SSFSL methods by a large margin, which indicates the robustness of PTN in dealing with distracting unlabeled data.
We analyze different components of the PTN and summarize the results in Table 3. All compared approaches are based on the pre-trained WRN-28-10 embedding.
First of all, we investigate the graph propagation component (the classifier). It can be observed that graph-based models such as Label Propagation Zhou et al. (2004) and PoissonMBO Calder et al. (2020) outperform the inductive model TransMatch Yu et al. (2020), which is consistent with previous research Zhu et al. (2005); Liu et al. (2018); Ziko et al. (2020). Compared to directly applying PoissonMBO to few-shot tasks, the proposed DPN (PTN without unsupervised embedding transfer) achieves better performance, which indicates that it is necessary to perform feature calibration to eliminate the cross-class bias between the support and query data distributions before label inference.
To investigate the proposed unsupervised embedding transfer in representation learning, we observe that all graph-based models achieve clear improvements after incorporating the proposed transfer module. For instance, Label Propagation obtains 1.61% and 1.86% performance gains on 5-way-1-shot and 5-way-5-shot miniImageNet classification, respectively. These results indicate the effectiveness of the proposed unsupervised embedding transfer. Finally, by integrating the unsupervised embedding transfer and the graph propagation classifier, the PTN model achieves the best performance among all approaches in Table 3.
We conduct inference time experiments to investigate the computational efficiency of the proposed Poisson Transfer Network (PTN) on the miniImageNet Vinyals et al. (2016) dataset. Same as Ziko et al. (2020), we compute the average inference time required for each 5-shot task. The results are shown in Table 4. Compared with inductive models, the proposed PTN costs more time due to the graph-based Poisson inference. However, our model achieves better classification performance than inductive ones and other transductive models, with affordable inference time.
| Methods | Inference time (s) |
| --- | --- |
| SimpleShot Wang et al. (2019) | 0.009 |
| LaplacianShot Ziko et al. (2020) | 0.012 |
| Transductive fine-tune Dhillon et al. (2019) | 20.7 |
| PTN (Ours) | 13.68 |
| Methods | Num_U (increasing →) | | | | |
| --- | --- | --- | --- | --- | --- |
| TransMatch Yu et al. (2020) | - | 58.43 ± 0.93 | 61.21 ± 1.03 | 63.02 ± 1.07 | 62.93 ± 1.11 |
| Label Propagation Zhou et al. (2004) | 69.74 ± 0.72 | 71.80 ± 1.02 | 72.97 ± 1.06 | 73.35 ± 1.05 | 74.04 ± 1.00 |
| PoissonMBO Calder et al. (2020) | 74.79 ± 1.06 | 76.01 ± 0.99 | 76.67 ± 1.02 | 78.28 ± 1.02 | 79.67 ± 1.02 |
Results with Different Numbers of Extra Unlabeled Samples
We conduct further experiments to investigate how well current semi-supervised few-shot methods mine the value of the unlabeled data. All approaches are based on a pre-trained WRN-28-10 Zagoruyko and Komodakis (2016) model for fair comparisons. As indicated in Table 5, with more unlabeled samples, all models achieve higher classification performance. However, our proposed PTN achieves the highest performance among the compared methods, which validates its superior capacity to use extra unlabeled information for boosting few-shot methods.
| Methods | 1-shot | 5-shot | 1-shot w/D | 5-shot w/D |
| --- | --- | --- | --- | --- |
| Soft K-Means Ren et al. (2018) | 50.09 ± 0.45 | 64.59 ± 0.28 | 48.70 ± 0.32 | 63.55 ± 0.28 |
| Soft K-Means+Cluster Ren et al. (2018) | 49.03 ± 0.24 | 63.08 ± 0.18 | 48.86 ± 0.32 | 61.27 ± 0.24 |
| Masked Soft k-Means Ren et al. (2018) | 50.41 ± 0.31 | 64.39 ± 0.24 | 49.04 ± 0.31 | 62.96 ± 0.14 |
| TPN-semi Liu et al. (2018) | 52.78 ± 0.27 | 66.42 ± 0.21 | 50.43 ± 0.84 | 64.95 ± 0.73 |
| TransMatch Yu et al. (2020) | 63.02 ± 1.07 | 81.19 ± 0.59 | 62.32 ± 1.04 | 80.28 ± 0.62 |
"w/D" means with distractor classes: in this setting, many extra unlabeled samples come from distractor classes, which are different from the labeled support classes. All results are averaged over 600 episodes with 95% confidence intervals. The best results are in bold.
| Methods | 1-shot | 5-shot | 1-shot w/D | 5-shot w/D |
| --- | --- | --- | --- | --- |
| Soft K-Means Ren et al. (2018) | 51.52 ± 0.36 | 70.25 ± 0.31 | 49.88 ± 0.52 | 68.32 ± 0.22 |
| Soft K-Means+Cluster Ren et al. (2018) | 51.85 ± 0.25 | 69.42 ± 0.17 | 51.36 ± 0.31 | 67.56 ± 0.10 |
| Masked Soft k-Means Ren et al. (2018) | 52.39 ± 0.44 | 69.88 ± 0.20 | 51.38 ± 0.38 | 69.08 ± 0.25 |
| TPN-semi Liu et al. (2018) | 55.74 ± 0.29 | 71.01 ± 0.23 | 53.45 ± 0.93 | 69.93 ± 0.80 |
"w/D" means with distractor classes: in this setting, many extra unlabeled samples come from distractor classes, which are different from the labeled support classes. All results are averaged over 600 episodes with 95% confidence intervals. The best results are in bold.
Results with Distractor Classification
We report the results of the proposed PTN on both the miniImageNet Vinyals et al. (2016) and tieredImageNet Ren et al. (2018) datasets under different settings in Table 6 and Table 7, respectively. It can be observed that the classification results of all semi-supervised few-shot models degrade due to the distractor classes. However, the proposed PTN model still outperforms the other semi-supervised few-shot methods by a large margin. This again indicates the superiority of the proposed PTN model over previous approaches in dealing with semi-supervised few-shot classification tasks.
We propose the Poisson Transfer Network (PTN) to tackle the semi-supervised few-shot problem, exploring the value of unlabeled novel-class data from two aspects. First, we employ the Poisson learning model to capture the relations between the few labeled and the unlabeled data, which results in a more stable and informative classifier than previous semi-supervised few-shot models. Second, we adopt unsupervised contrastive learning to improve the generality of the embedding on novel classes, which avoids the possible over-fitting caused by training with few labeled samples. Integrating the two modules, the proposed PTN can fully explore the unlabeled auxiliary information to boost few-shot learning. Extensive experiments indicate that PTN outperforms state-of-the-art few-shot and semi-supervised few-shot methods.
The authors greatly appreciate the financial support from the Rail Manufacturing Cooperative Research Centre (funded jointly by participating rail organizations and the Australian Federal Government’s Business-Cooperative Research Centres Program) through Project R3.7.3 - Rail infrastructure defect detection through video analytics.
- Mixmatch: a holistic approach to semi-supervised learning. In NeurIPS, pp. 5049–5059.
- Poisson learning: graph based semi-supervised learning at very low label rates. In ICML, pp. 1306–1316.
- Properly-weighted graph Laplacian for semi-supervised learning. Applied Mathematics & Optimization, pp. 1–49.
- A simple framework for contrastive learning of visual representations. In ICML.
- A closer look at few-shot classification. In ICLR.
- A baseline for few-shot image classification. In ICLR.
- One-shot learning of object categories. TPAMI 28 (4), pp. 594–611.
- Model-agnostic meta-learning for fast adaptation of deep networks. In ICML, pp. 1126–1135.
- Multiclass data segmentation using diffuse interface methods on graphs. TPAMI 36 (8), pp. 1600–1613.
- Dynamic few-shot visual learning without forgetting. In CVPR, pp. 4367–4375.
- Momentum contrast for unsupervised visual representation learning. In CVPR, pp. 9729–9738.
- Edge-labeling graph neural network for few-shot learning. In CVPR, pp. 11–20.
- Robust mil-based feature template learning for object tracking. In AAAI, pp. 4118–4125.
- Meta-learning with differentiable convex optimization. In CVPR, pp. 10657–10665.
- Distribution consistency based covariance metric networks for few-shot learning. In AAAI, Vol. 33, pp. 8642–8649.
- Learning to self-train for semi-supervised few-shot classification. In NeurIPS, pp. 10276–10286.
- Prototype rectification for few-shot learning. In ECCV, pp. 741–756.
- Learning to propagate labels: transductive propagation network for few-shot learning. In ICLR.
- Learning from one example through shared densities on transforms. In CVPR, Vol. 1, pp. 464–471.
- A simple neural attentive meta-learner. In ICLR.
- Tadam: task dependent adaptive metric for improved few-shot learning. In NeurIPS, pp. 721–731.
- Low-shot learning with imprinted weights. In CVPR, pp. 5822–5830.
- Few-shot image recognition by predicting parameters from activations. In CVPR, pp. 7229–7238.
- Meta-learning for semi-supervised few-shot classification. In ICLR.
- Meta-learning with latent embedding optimization. In ICLR.
- Adaptive subspaces for few-shot learning. In CVPR, pp. 4136–4145.
- Prototypical networks for few-shot learning. In NeurIPS, pp. 4077–4087.
- Learning to compare: relation network for few-shot learning. In CVPR, pp. 1199–1208.
- Rethinking few-shot image classification: a good embedding is all you need? In ECCV, pp. 266–282.
- Matching networks for one shot learning. In NeurIPS, pp. 3630–3638.
- Simpleshot: revisiting nearest-neighbor classification for few-shot learning. arXiv preprint arXiv:1911.04623.
- DPGN: distribution propagation graph network for few-shot learning. In CVPR, pp. 13390–13399.
- TransMatch: a transfer-learning scheme for semi-supervised few-shot learning. In CVPR, pp. 12856–12864.
- Wide residual networks. In BMVC.
- Adversarial complementary learning for weakly supervised object localization. In CVPR, pp. 1325–1334.
- Sg-one: similarity guidance network for one-shot semantic segmentation. TCYB 50 (9), pp. 3855–3865.
- Learning with local and global consistency. In NIPS, pp. 321–328.
- Semi-supervised learning using gaussian fields and harmonic functions. In ICML, pp. 912–919.
- Semi-supervised learning with graphs. Ph.D. thesis, Carnegie Mellon University, Language Technologies Institute, School of Computer Science.
- Laplacian regularized few-shot learning. In ICML, pp. 11660–11670.