PTN: A Poisson Transfer Network for Semi-supervised Few-shot Learning

12/20/2020, by Huaxi Huang, et al.

The central challenge in semi-supervised few-shot learning (SSFSL) is to maximize the value of the extra unlabeled data to boost the few-shot learner. In this paper, we propose a Poisson Transfer Network (PTN) to mine the unlabeled information for SSFSL from two aspects. First, the Poisson Merriman-Bence-Osher (MBO) model builds a bridge between labeled and unlabeled examples. This model serves as a more stable and informative classifier than traditional graph-based SSFSL methods when propagating labels. Second, the extra unlabeled samples are employed to transfer knowledge from base classes to novel classes through contrastive learning. Specifically, we pull the augmented positive pairs together while pushing the negative ones apart. Our contrastive transfer scheme implicitly learns the novel-class embeddings to alleviate the over-fitting problem on the few labeled data. Thus, we can mitigate the degeneration of embedding generality in novel classes. Extensive experiments indicate that PTN outperforms state-of-the-art few-shot and SSFSL models on the miniImageNet and tieredImageNet benchmark datasets.


Introduction

Few-shot learning Miller et al. (2000); Fei-Fei et al. (2006); Vinyals et al. (2016) aims to learn a model that generalizes well with a few instances of each novel class. In general, a few-shot learner is first trained on a substantial annotated dataset, also referred to as the base-class set, and then adapted to unseen novel classes with a few labeled instances. During evaluation, a set of few-shot tasks is fed to the learner, where each task consists of a few support (labeled) samples and a certain number of query (unlabeled) data. This research topic has proved immensely appealing in the past few years, and a large number of few-shot learning methods have been proposed from various perspectives. Mainstream methods can be roughly grouped into two categories. The first is learning from episodes Vinyals et al. (2016), also known as meta-learning, which adopts the base-class data to create a set of episodes. Each episode is a few-shot learning task, with support and query samples that simulate the evaluation procedure. The second type is the transfer-learning based method, which focuses on learning a decent classifier by transferring the domain knowledge from a model pre-trained on the large base-class set Chen et al. (2018); Qiao et al. (2018). This paradigm decouples few-shot learning into representation learning and classification, and has shown favorable performance against meta-learning methods in recent works Tian et al. (2020); Ziko et al. (2020). Our method shares a similar motivation with transfer-learning based methods and proposes to utilize the extra unlabeled novel-class data and a pre-trained embedding to tackle the few-shot problem.

Compared with collecting labeled novel-class data, it is much easier to obtain abundant unlabeled data from these classes. Therefore, semi-supervised few-shot learning (SSFSL) Ren et al. (2018); Liu et al. (2018); Li et al. (2019b); Yu et al. (2020) is proposed to combine the auxiliary information from labeled base-class data and extra unlabeled novel-class data to enhance the performance of few-shot learners. The core challenge in SSFSL is how to fully explore the auxiliary information in these unlabeled samples. Previous SSFSL works indicate that graph-based models Liu et al. (2018); Ziko et al. (2020) can learn a better classifier than inductive ones Ren et al. (2018); Li et al. (2019b); Yu et al. (2020), since they directly model the relationship between the labeled and unlabeled samples during inference. However, current graph-based models adopt Laplace learning Zhu et al. (2003) to conduct label propagation, whose solutions develop localized spikes near the labeled samples but are almost constant far from them, i.e., label values are not propagated well, especially with few labeled samples. These models therefore suffer from an underdeveloped message-passing capacity for the labels. On the other hand, most SSFSL methods reuse the feature embedding pre-trained on base-class data (meta- or transfer-pre-trained) as the novel-class embedding. This may lead to an embedding degeneration problem: since the pre-trained model is designed for base-class recognition, it tends to learn an embedding that represents only base-class information and loses information that might be useful outside the base classes.

To address the above issues, we propose a novel transfer-learning based SSFSL method, named Poisson Transfer Network (PTN). Specifically, to improve the message-passing capacity of graph-based SSFSL models, we revise the Poisson model to tailor it for few-shot problems by incorporating query feature calibration and the Poisson MBO model. Poisson learning Calder et al. (2020) has been proven more stable and informative than traditional Laplace learning in low-label-rate semi-supervised problems. However, directly employing Poisson MBO for SSFSL may suffer from a cross-class bias due to the data distribution drift between the support and query data. Therefore, we improve the Poisson MBO model by explicitly eliminating the cross-class bias before label inference. To tackle the novel-class embedding degeneration problem, we propose to transfer the pre-trained base-class embedding to a novel-class embedding by adopting unsupervised contrastive training He et al. (2020); Chen et al. (2020) on the extra unlabeled novel-class data. By pulling the augmented positive pairs together while pushing the negative ones apart, the proposed transfer scheme captures the novel-class distribution implicitly. This strategy effectively avoids the possible overfitting caused by retraining the feature embedding on the few labeled instances.

By integrating the Poisson learning and the novel-class specific embedding, the proposed PTN model can fully explore the auxiliary information of extra unlabeled data for SSFSL tasks. The contributions are summarized as follows:

  • We propose a Poisson learning based model to improve the capacity of mining the relations between the labeled and unlabeled data for graph-based SSFSL.

  • We propose to adopt unsupervised contrastive learning on the extra unlabeled data during representation learning to improve the generality of the pre-trained base-class embedding for novel-class recognition.

  • Extensive experiments are conducted on two benchmark datasets to investigate the effectiveness of PTN, which achieves state-of-the-art performance.

Related Work

Few-Shot Learning

As a representative of learning with limited samples, alongside, e.g., weakly supervised learning Lan et al. (2017); Zhang et al. (2018) and semi-supervised learning Zhu et al. (2003); Calder and Slepčev (2019), few-shot learning can be roughly grouped into two categories: meta-learning models and transfer-learning models. Meta-learning models adopt the episode training mechanism Vinyals et al. (2016); among them, metric-based models optimize a transferable embedding of both auxiliary and target data and identify queries according to embedding distances Sung et al. (2018); Li et al. (2019a); Simon et al. (2020); Zhang et al. (2020). Meanwhile, meta-optimization models Finn et al. (2017); Rusu et al. (2018) design optimization-centered algorithms to adapt the knowledge from meta-training to meta-testing. Instead of separating base classes into a set of few-shot tasks, transfer-learning methods Qiao et al. (2018); Gidaris and Komodakis (2018); Chen et al. (2018); Qi et al. (2018) utilize all base classes to pre-train the few-shot model, which is then adapted to novel-class recognition. Most recently, Tian et al. Tian et al. (2020) decouple the learning procedure into base-class embedding pre-training and novel-class classifier learning. By adopting multivariate logistic regression and knowledge distillation, their model outperforms meta-learning approaches. Our proposed method is inspired by the transfer-learning framework, which we adapt to semi-supervised few-shot learning by exploring both unlabeled novel-class data and base-class data to boost the performance on few-shot tasks.

Semi-Supervised Few-shot Learning (SSFSL)

SSFSL aims to leverage extra unlabeled novel-class data to improve few-shot learning. Ren et al. Ren et al. (2018) propose a meta-learning based framework that extends the prototypical network Snell et al. (2017) with unlabeled data to refine class prototypes. LST Li et al. (2019b) re-trains the base model using the unlabeled data with generated pseudo labels; during evaluation, it dynamically adds unlabeled samples with high prediction confidence into testing. In Yu et al. (2020), TransMatch initializes the novel-class classifier with pre-trained feature imprinting, and then employs MixMatch Berthelot et al. (2019) to fine-tune the whole model with both labeled and unlabeled data. As research closely related to SSFSL, transductive few-shot approaches Liu et al. (2018); Kim et al. (2019); Ziko et al. (2020) also attempt to utilize unlabeled data to improve few-shot learning. These methods adopt the entire query set as the unlabeled data and perform inference on all query samples together. For instance, TPN Liu et al. (2018) employs graph-based transductive inference to address the few-shot problem, and a semi-supervised extension is also presented in their work.

Unlike the above approaches, in this paper, we adopt the transfer-learning framework and propose to fully explore the extra unlabeled information in both classifier learning and embedding learning with different learning strategies.

Figure 1: The overview of the proposed PTN. We first pre-train a feature embedding on the base-class set using the standard cross-entropy loss. This embedding is then fine-tuned with the external novel-class unlabeled data using the unsupervised transfer loss, producing the transferred embedding. Finally, we revise a graph-based model, Poisson MBO, to conduct the query label inference.

Methodology

Problem Definition

In standard few-shot learning, there exists a labeled support set $\mathcal{S} = \{(x_i, y_i)\}_{i=1}^{N \times K}$ of $N$ different classes, where $x_i$ is a labeled sample and $y_i$ denotes its label. We use the standard basis vector $e_k \in \mathbb{R}^{N}$ to represent the $k$-th class, i.e., $y_i = e_k$ if $x_i$ belongs to class $k$. Given an unlabeled query sample from the query set $\mathcal{Q}$, the goal is to assign the query to one of the $N$ support classes. The labeled support set and unlabeled query set share the same label space, and the novel-class dataset is thus defined as $\mathcal{D}_{novel} = \mathcal{S} \cup \mathcal{Q}$. If $\mathcal{S}$ contains $K$ labeled samples for each of the $N$ categories, the task is denoted an $N$-way-$K$-shot problem. It is hardly possible to obtain an ideal classifier with the limited annotated $\mathcal{S}$. Therefore, few-shot models usually utilize a fully annotated auxiliary dataset $\mathcal{D}_{base}$, referred to as the base-class set, which has a similar data distribution to, but a disjoint label space from, $\mathcal{D}_{novel}$.

For semi-supervised few-shot learning (SSFSL), we have an extra unlabeled support set $\mathcal{U}$. These additional unlabeled samples are usually drawn from each of the $N$ support classes in the standard setting, or from other novel classes under the distractor setting. The novel-class dataset then becomes $\mathcal{D}_{novel} = \mathcal{S} \cup \mathcal{U} \cup \mathcal{Q}$. The goal of SSFSL is to maximize the value of the extra unlabeled data to improve few-shot methods.
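To make the task construction concrete, the following sketch shows how an $N$-way-$K$-shot SSFSL episode could be assembled; the function and variable names are illustrative and not taken from the paper's code.

```python
import random

def sample_ssfsl_episode(novel_data, n_way=5, k_shot=1, n_query=15, n_unlabeled=100):
    """Sample one semi-supervised few-shot episode.

    novel_data: dict mapping a novel-class id to its list of samples.
    Returns the labeled support set, the unlabeled support pool, and the query set.
    """
    classes = random.sample(list(novel_data.keys()), n_way)
    support, unlabeled, query = [], [], []
    for label, cls in enumerate(classes):
        samples = random.sample(novel_data[cls], k_shot + n_query + n_unlabeled)
        support += [(x, label) for x in samples[:k_shot]]
        query += [(x, label) for x in samples[k_shot:k_shot + n_query]]
        # Unlabeled pool: labels are discarded. Under the distractor setting this
        # pool would instead be drawn from classes outside the n_way support classes.
        unlabeled += samples[k_shot + n_query:]
    return support, unlabeled, query
```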

For a clear understanding, the details of the proposed PTN are introduced as follows: we first describe the representation learning, and then illustrate the proposed Poisson learning model for label inference.

Representation Learning

The representation learning aims to learn a well-generalized novel-class embedding through Feature Embedding Pre-training and Unsupervised Embedding Transfer.

Feature Embedding Pre-training

On the left side of Figure 1, the first part of PTN is the feature embedding pre-training. By employing the cross-entropy loss between predictions and ground-truth labels in $\mathcal{D}_{base}$, we train the base encoder $f_{\theta}$ in a fully supervised way, the same as Chen et al. (2018); Yu et al. (2020); Tian et al. (2020). This stage generates a powerful embedding for the downstream few-shot learner.
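As a minimal sketch of this stage, assuming a PyTorch setup and simplifying the learning-rate schedule, the pre-training loop could look as follows; the weight-decay value is an assumed placeholder.

```python
import torch
import torch.nn as nn

def pretrain_base_encoder(encoder, classifier, loader, epochs=100, lr=0.05, wd=5e-4):
    """Supervised pre-training of the base encoder f_theta on the base-class set.

    encoder / classifier: torch modules; loader yields (images, base-class labels).
    lr follows the reported 0.05; wd and the flat schedule here are simplifications.
    """
    params = list(encoder.parameters()) + list(classifier.parameters())
    opt = torch.optim.SGD(params, lr=lr, momentum=0.9, weight_decay=wd)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            loss = ce(classifier(encoder(images)), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return encoder
```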

Unsupervised Embedding Transfer

Directly employing the pre-trained base-class embedding for the novel classes may suffer from the degeneration problem. However, retraining the base-class embedding with the limited labeled instances easily leads to overfitting. How can we train a novel-class embedding to represent things beyond the labels when our only supervision is the limited labels? Our solution is unsupervised contrastive learning. Unsupervised learning, especially contrastive learning He et al. (2020); Chen et al. (2020), has recently shown great potential in representation learning for various downstream vision tasks, and most of these works train a model from scratch. However, unsupervised pre-trained models still perform worse than fully supervised pre-trained ones. Unlike previous works, we propose to adopt contrastive learning to retrain the pre-trained embedding with the unlabeled novel data. In this way, we can learn a decent novel-class embedding by integrating the fully supervised pre-training scheme with unsupervised contrastive fine-tuning.

Specifically, for a minibatch of $B$ examples from the unlabeled novel-class subset $\mathcal{U}$, we randomly sample two data augmentation operators $a_{1}, a_{2}$ and generate a new feature set $Z = \{z_{i}\}_{i=1}^{2B}$, resulting in $B$ pairs of feature points. We treat each feature pair derived from the same raw input as a positive pair and the remaining feature points as negative samples. The contrastive loss for the minibatch is then defined as

(1)   $\mathcal{L}_{con} = \frac{1}{2B} \sum_{(i,j)} -\log \frac{\exp\left(\mathrm{sim}(z_{i}, z_{j}) / \tau\right)}{\sum_{k=1, k \neq i}^{2B} \exp\left(\mathrm{sim}(z_{i}, z_{k}) / \tau\right)},$

where $(z_{i}, z_{j})$ denotes a positive feature pair from $Z$, $\tau$ is a temperature parameter, and $\mathrm{sim}(\cdot, \cdot)$ represents the cosine similarity. We further adopt a Kullback-Leibler divergence $\mathrm{KL}(\cdot \| \cdot)$ between the two augmented feature subsets $Z_{a_{1}}$ and $Z_{a_{2}}$ as a regularization term. The final unsupervised embedding transfer loss is therefore

(2)   $\mathcal{L}_{T} = \mathcal{L}_{con} + \gamma\, \mathrm{KL}\left(Z_{a_{1}} \,\|\, Z_{a_{2}}\right),$

where $\gamma$ balances the two terms. By training on the extra unlabeled data with this loss, we learn a robust novel-class embedding $f_{\varphi}$ from the pre-trained $f_{\theta}$.
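For illustration, a minimal PyTorch sketch of this transfer loss is given below, assuming a SimCLR-style NT-Xent form for Eq. (1); the exact form of the KL regularizer and the names `tau` and `gamma` are our assumptions, not taken from the released code.

```python
import torch
import torch.nn.functional as F

def contrastive_transfer_loss(z1, z2, tau=0.1, gamma=1.0):
    """NT-Xent contrastive loss over two augmented views plus a KL regularizer.

    z1, z2: [B, d] embeddings of the same minibatch under augmentations a1 and a2.
    tau: temperature; gamma: weight of the KL term (both names are assumptions).
    """
    b = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # [2B, d], unit-norm rows
    sim = z @ z.t() / tau                                # cosine similarities / tau
    sim.fill_diagonal_(float('-inf'))                    # exclude self-similarity
    # For row i, its positive view sits at index (i + B) mod 2B.
    targets = (torch.arange(2 * b, device=z.device) + b) % (2 * b)
    nt_xent = F.cross_entropy(sim, targets)
    # KL regularizer between the two views' softmax-normalized features;
    # this specific form is an assumption about the paper's regularization term.
    kl = F.kl_div(F.log_softmax(z1, dim=1), F.softmax(z2, dim=1), reduction='batchmean')
    return nt_xent + gamma * kl
```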

Poisson Label Inference

Previous studies Zhu et al. (2003); Zhou et al. (2004); Zhu et al. (2005); Liu et al. (2018); Ziko et al. (2020) indicate that graph-based few-shot classifiers show superior performance over inductive ones. Therefore, we construct the classifier with a graph-based Poisson model, which adopts a different optimization strategy from the representation learning. The Poisson model Calder et al. (2020) has been proven superior to traditional Laplace-based graph models Zhu et al. (2003); Zhou et al. (2004) both theoretically and experimentally, especially for low-label-rate semi-supervised problems. However, directly applying this model to the few-shot task suffers from a cross-class bias, caused by the data distribution drift between the support data (including labeled and unlabeled support data) and the query data.

Therefore, we revise this powerful model into our classifier by eliminating the support-query bias: we explicitly apply a query feature calibration strategy before the final Poisson label inference. It is worth noting that the proposed graph-based classifier can be directly appended to the pre-trained embedding without the unsupervised embedding transfer; we dub this baseline model the Decoupled Poisson Network (DPN).

Query Feature Calibration

The support-query data distribution bias, also referred to as the cross-class bias Liu et al. (2020), is one cause of the degeneracy of few-shot learners. In this paper, we propose a simple but effective method to eliminate this distribution bias before Poisson graph inference. For an SSFSL task, we fuse the labeled support set $\mathcal{S}$ and the extra unlabeled set $\mathcal{U}$ into the final support set $\mathcal{S}' = \mathcal{S} \cup \mathcal{U}$. Denoting the normalized embedded support feature set and query feature set as $F_{s}$ and $F_{q}$, the cross-class bias is defined as

(3)   $\Delta = \frac{1}{|F_{s}|} \sum_{f \in F_{s}} f \;-\; \frac{1}{|F_{q}|} \sum_{f \in F_{q}} f.$

We then add the bias $\Delta$ to the query features, which largely eliminates the support-query bias. After that, the Poisson MBO model is adopted to infer the query labels.
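Under this reading of Eq. (3), the calibration step amounts to a single mean shift; a short numpy sketch follows (the function name is ours).

```python
import numpy as np

def calibrate_query_features(support_feats, query_feats):
    """Shift query features by the support-query mean difference (cross-class bias).

    support_feats: [n_s, d] normalized features of labeled + extra unlabeled support.
    query_feats:   [n_q, d] normalized query features.
    """
    bias = support_feats.mean(axis=0) - query_feats.mean(axis=0)  # Eq. (3)
    return query_feats + bias                                     # calibrated queries
```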

The Poisson Merriman–Bence–Osher Model

We denote the embedded feature set as $F = \{f_{1}, \dots, f_{n}\}$, where the first $m$ feature points belong to the labeled support set, the last feature points belong to the query set, and the remaining points denote the unlabeled support set. We build a graph with the feature points as vertices, and the edge weight $w_{ij}$ is the similarity between feature points $f_{i}$ and $f_{j}$, defined as $w_{ij} = \exp\!\left(-\|f_{i} - f_{j}\|^{2} / d_{K}(f_{i})^{2}\right)$, where $d_{K}(f_{i})$ is the distance between $f_{i}$ and its $K$-th nearest neighbor; the neighborhood size is set as described in the implementation details. Correspondingly, we define the weight matrix $W = (w_{ij})_{i,j=1}^{n}$, the degree matrix $D = \mathrm{diag}(d_{1}, \dots, d_{n})$ with $d_{i} = \sum_{j} w_{ij}$, and the unnormalized Laplacian $L = D - W$. As the first $m$ feature points have ground-truth labels, we use $\bar{y} = \frac{1}{m} \sum_{j=1}^{m} y_{j}$ to denote the average label vector, and let the indicator $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ otherwise. The goal of this model is to learn a classifier $u: F \to \mathbb{R}^{N}$ by solving the Poisson equation

(4)   $L\, u(f_{i}) = \sum_{j=1}^{m} \left(y_{j} - \bar{y}\right) \delta_{ij}, \quad i = 1, \dots, n,$

satisfying $\sum_{i=1}^{n} d_{i}\, u(f_{i}) = 0$. Solving Eq. (4) yields the label prediction function $u$, and the predicted label of vertex $f_{i}$ is determined as $\hat{y}_{i} = \arg\max_{k \in \{1, \dots, N\}} u_{k}(f_{i})$. Let $U \in \mathbb{R}^{n \times N}$ denote the prediction label matrix of all the data, whose $i$-th row is $u(f_{i})$. We concatenate the support labels to form a label matrix $B = [y_{1}, \dots, y_{m}]^{\top}$, and let $\tilde{B} \in \mathbb{R}^{n \times N}$ denote the initial labels of all the data, whose first $m$ rows are the centered labels $y_{j} - \bar{y}$ and whose remaining (unlabeled) rows are zero. The query labels of Eq. (4) can then be determined by the iteration

(5)   $U^{(t+1)} = U^{(t)} + D^{-1} \left(\tilde{B} - L\, U^{(t)}\right),$

where $U^{(t)}$ denotes the predicted labels of all data at iteration $t$. With a sufficient number of iterations of Eq. (5), we obtain a stable classifier. After that, we adopt a graph-cut method to improve the inference performance by incrementally adjusting the classifier's decision boundary. The graph-cut problem is defined as

(6)   $\min_{u: F \to \{e_{1}, \dots, e_{N}\}} \ \frac{1}{2} \sum_{i,j=1}^{n} w_{ij}\, \|u(f_{i}) - u(f_{j})\|^{2} \quad \text{s.t.} \ u(f_{j}) = y_{j} \ \forall (f_{j}, y_{j}) \in \mathcal{S}, \quad \frac{1}{n} \sum_{i=1}^{n} u(f_{i}) = b,$

where $\mathcal{S}$ denotes the annotated samples' label set, $\frac{1}{n} \sum_{i=1}^{n} u(f_{i})$ is the fraction of vertices assigned to each of the $N$ classes, and $b$ is the prior knowledge of the class size distribution, i.e., $b_{k}$ is the fraction of data belonging to class $k$. With the constraint $\frac{1}{n} \sum_{i=1}^{n} u(f_{i}) = b$, we can encode the prior knowledge into the Poisson model. The objective $\frac{1}{2} \sum_{i,j} w_{ij} \|u(f_{i}) - u(f_{j})\|^{2}$ is the graph-cut energy of the classification given by $u$, widely used in semi-supervised graph models Zhu et al. (2003, 2005); Zhou et al. (2004).

In Eq. (6), the solution takes discrete values, which makes the problem hard to solve. To relax it, we use the Merriman-Bence-Osher (MBO) scheme Garcia-Cardona et al. (2014) and replace the graph-cut energy with its Ginzburg-Landau approximation:

(7)   $\min_{u \in \mathcal{P}} \ \frac{1}{2} \sum_{i,j=1}^{n} w_{ij}\, \|u(f_{i}) - u(f_{j})\|^{2} + \frac{1}{\epsilon} \sum_{i=1}^{n} \prod_{k=1}^{N} \frac{1}{4} \|u(f_{i}) - e_{k}\|^{2} \quad \text{s.t.} \ \frac{1}{n} \sum_{i=1}^{n} u(f_{i}) = b.$

In Eq. (7), $\mathcal{P}$ represents the relaxed space of real-valued classifiers, which allows $u$ to take on any real values instead of the discrete values $\{e_{1}, \dots, e_{N}\}$ in Eq. (6). More importantly, this leads to a more efficient computation of the Poisson model. Eq. (7) can be efficiently solved with an alternating gradient descent strategy, as shown in lines 9-20 of Algorithm 1.
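To make the propagation step concrete, the following numpy sketch iterates Eq. (5) under the notation reconstructed above; the MBO refinement of Eqs. (6)-(7) is omitted here, and a fixed iteration count stands in for the stopping criterion.

```python
import numpy as np

def poisson_propagation(W, Y_labeled, n_iters=500):
    """Iterate Eq. (5): U <- U + D^{-1} (B_tilde - L U).

    W:         [n, n] symmetric weight matrix; the labeled points come first.
    Y_labeled: [m, N] one-hot labels of the m labeled support samples.
    Returns U: [n, N] Poisson label scores for all nodes (argmax gives the class).
    """
    n, m = W.shape[0], Y_labeled.shape[0]
    d = W.sum(axis=1)                              # node degrees
    L = np.diag(d) - W                             # unnormalized graph Laplacian
    B_tilde = np.zeros((n, Y_labeled.shape[1]))    # source term of Eq. (4)
    B_tilde[:m] = Y_labeled - Y_labeled.mean(axis=0)
    U = np.zeros_like(B_tilde)
    for _ in range(n_iters):
        U = U + (B_tilde - L @ U) / d[:, None]
    return U
```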

Input: base-class set $\mathcal{D}_{base}$; novel-class set $\mathcal{D}_{novel} = \mathcal{S} \cup \mathcal{U} \cup \mathcal{Q}$; class-prior vector $b$; iteration numbers, time step and clipping values
Output: query samples' label prediction $\hat{Y}_{\mathcal{Q}}$
1  Train a base model $f_{\theta}$ with all samples and labels from $\mathcal{D}_{base}$;
2  Apply the unsupervised embedding transfer to fine-tune $f_{\theta}$ with the novel unlabeled data $\mathcal{U}$ using $\mathcal{L}_{T}$ in Eq. (2), resulting in $f_{\varphi}$;
3  Apply $f_{\varphi}$ to extract features on $\mathcal{D}_{novel}$ as $F$;
4  Apply query feature calibration using Eq. (3);
5  Compute $W$, $D$ and $L$ according to $F$;
6  Initialize $U^{(0)} \leftarrow 0$;
7  Update $U$ using Eq. (5) for the given number of steps until the stop condition is met;
8  Split Eq. (7) into two energy terms;
9  for $s \leftarrow 1$ to $N_{1}$ do
10     for $i \leftarrow 1$ to $N_{2}$ do
11        update $U$ by a gradient-descent step on the first energy term of Eq. (7);
12     end for
13     clip the entries of $U$ with the given clipping values;
14     for $j \leftarrow 1$ to $N_{3}$ do
15        update $U$ by a gradient step (with the given time step) on the second energy term of Eq. (7);
16        apply the closest-point projection to each row of $U$;
17     end for
18     re-normalize $U$ so that $\frac{1}{n} \sum_{i} u(f_{i}) = b$;
19  end for
20  $\hat{Y}_{\mathcal{Q}} \leftarrow$ the query rows of $U$;
Algorithm 1 PTN for SSFSL

Proposed Algorithm

The overall proposed algorithm is summarized in Algorithm 1. Given the base-class set $\mathcal{D}_{base}$, the novel-class set $\mathcal{D}_{novel}$, the prior class distribution $b$, and other parameters, PTN predicts the query samples' labels $\hat{Y}_{\mathcal{Q}}$; the query label is then determined as the class with the maximum score. More specifically, once the encoder $f_{\theta}$ is learned on the base set $\mathcal{D}_{base}$, we employ the proposed unsupervised embedding transfer in step 2 of Algorithm 1. After that, we build the graph with the feature set and compute the related matrices in steps 3-5. In the label inference stage (steps 6-20), we first apply the Poisson model to robustly propagate the labels in step 7, and then solve the graph-cut problem with the MBO scheme in several steps of gradient descent to boost the classification performance. The stop condition in step 7 checks that the label information of the $m$ labeled points has sufficiently spread over the graph, following Calder et al. (2020). Steps 9-19 solve the graph-cut problem in Eq. (7): we first divide Eq. (7) into two energy terms and then alternate gradient descent on them. Steps 10-12 optimize the first term. We optimize the second term in steps 14-17, where the update is controlled by the time step and the clipping values, and each row of $U$ is mapped back by the closest-point projection; the gradient scheme in steps 14-17 thus generates a solution that also satisfies the constraint in Eq. (7). After obtaining the Poisson MBO solution, the query samples' label prediction matrix is resolved in step 20.
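As a loose illustration of the refinement in steps 9-19, the sketch below alternates gradient steps on the relaxed energy with an MBO-style thresholding; the class-size constraint and clipping are omitted, so this is a simplified stand-in rather than the paper's exact procedure.

```python
import numpy as np

def mbo_refine(U, W, B_tilde, n_outer=20, n_inner=40, dt=0.1, mu=1.0):
    """Simplified MBO-style refinement of the Poisson solution U.

    Alternates a few gradient-descent steps on the relaxed graph-cut energy with a
    projection of every row onto the nearest one-hot label vector. B_tilde is the
    centered label source matrix; mu weights the fidelity to the labeled points.
    """
    d = W.sum(axis=1)
    L = np.diag(d) - W
    n_classes = U.shape[1]
    for _ in range(n_outer):
        for _ in range(n_inner):
            U = U - dt * (L @ U - mu * B_tilde)   # descend the relaxed energy
        # MBO thresholding: snap each node to its closest class vertex.
        U = np.eye(n_classes)[np.argmax(U, axis=1)]
    return U
```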

The main inference complexity of PTN is $O(|E|)$ per propagation step, where $|E|$ is the number of edges in the graph. As a graph-based model, PTN's inference cost is higher than that of inductive models. However, previous studies Liu et al. (2018); Calder et al. (2020) indicate that this complexity is affordable for few-shot tasks since the data scale is small. Moreover, we do not claim that our model is the final solution for SSFSL; we aim to design a new method that makes full use of the extra unlabeled information. We report inference time comparisons in Table 4. The average inference time of PTN is 13.68s.

Experiments

Datasets

We evaluate the proposed PTN on two few-shot benchmark datasets: miniImageNet and tieredImageNet. The miniImageNet dataset Vinyals et al. (2016) is a subset of ImageNet, consisting of 100 classes, where each class contains 600 images of size 84×84. We follow the standard split of 64 base, 16 validation, and 20 test classes Vinyals et al. (2016); Tian et al. (2020). The tieredImageNet dataset Ren et al. (2018) is another subset of ImageNet with 608 classes; we follow the standard split of 351 base, 97 validation, and 160 test classes Ren et al. (2018); Liu et al. (2018). We resize the images from tieredImageNet to 84×84 pixels and randomly select $N$ classes from the novel classes to construct each few-shot task. Within each class, $K$ examples are selected as the labeled data, and examples from the rest serve as queries. The extra unlabeled samples are selected from the same $N$ classes or from the remaining novel classes. We set $N = 5$ and study different sizes of the extra unlabeled set. We run 600 few-shot tasks and report the mean accuracy with the 95% confidence interval.

Implementation Details

Same as previous works Rusu et al. (2018); Dhillon et al. (2019); Liu et al. (2020); Tian et al. (2020); Yu et al. (2020), we adopt the wide residual network (WRN-28-10) Zagoruyko and Komodakis (2016) as the backbone of our base model, and we follow the protocols in Tian et al. (2020); Yu et al. (2020) of fusing the base and validation classes to train the base model from scratch. We use SGD with a batch size of 64, a learning rate of 0.05, and weight decay. We reduce the learning rate by a factor of 0.1 after 60 and 80 epochs. The base model is trained for 100 epochs.

Methods Type Backbone miniImageNet
                       1-shot        5-shot
Prototypical-Net Snell et al. (2017) Metric, Meta ConvNet-256 49.42±0.78 68.20±0.66
Relation Network Sung et al. (2018) Metric, Meta ConvNet-64 50.44±0.82 65.32±0.70
TADAM Oreshkin et al. (2018) Metric, Meta ResNet-12 58.50±0.30 76.70±0.30
DPGN Yang et al. (2020) Metric, Meta ResNet-12 67.77±0.32 84.60±0.43
RFS Tian et al. (2020) Metric, Transfer ResNet-12 64.82±0.60 82.14±0.43
MAML Finn et al. (2017) Optimization, Meta ConvNet-64 48.70±1.84 63.11±0.92
SNAIL Mishra et al. (2018) Optimization, Meta ResNet-12 55.71±0.99 68.88±0.92
LEO Rusu et al. (2018) Optimization, Meta WRN-28-10 61.76±0.08 77.59±0.12
MetaOptNet Lee et al. (2019) Optimization, Meta ResNet-12 64.09±0.62 80.00±0.45
TPN Liu et al. (2018) Transductive, Meta ConvNet-64 55.51±0.86 69.86±0.65
BD-CSPN Liu et al. (2020) Transductive, Meta WRN-28-10 70.31±0.93 81.89±0.60
Transductive Fine-tuning Dhillon et al. (2019) Transductive, Transfer WRN-28-10 65.73±0.68 78.40±0.52
LaplacianShot Ziko et al. (2020) Transductive, Transfer DenseNet 75.57±0.19 84.72±0.13
Masked Soft k-Means Ren et al. (2018) Semi, Meta ConvNet-128 50.41±0.31 64.39±0.24
TPN-semi Liu et al. (2018) Semi, Meta ConvNet-64 52.78±0.27 66.42±0.21
LST Li et al. (2019b) Semi, Meta ResNet-12 70.10±1.90 78.70±0.80
TransMatch Yu et al. (2020) Semi, Transfer WRN-28-10 62.93±1.11 82.24±0.59
DPN (Ours) Semi, Transfer WRN-28-10 79.67±1.06 86.30±0.95
PTN (Ours) Semi, Transfer WRN-28-10 82.66±0.97 88.43±0.67

Methods Type Backbone tieredImageNet
                       1-shot        5-shot
Prototypical-Net Snell et al. (2017) Metric, Meta ConvNet-256 53.31±0.89 72.69±0.74
Relation Network Sung et al. (2018) Metric, Meta ConvNet-64 54.48±0.93 71.32±0.78
DPGN Yang et al. (2020) Metric, Meta ResNet-12 72.45±0.51 87.24±0.39
RFS Tian et al. (2020) Metric, Transfer ResNet-12 71.52±0.69 86.03±0.49
MAML Finn et al. (2017) Optimization, Meta ConvNet-64 51.67±1.81 70.30±1.75
LEO Rusu et al. (2018) Optimization, Meta WRN-28-10 66.33±0.05 81.44±0.09
MetaOptNet Lee et al. (2019) Optimization, Meta ResNet-12 65.81±0.74 81.75±0.53
TPN Liu et al. (2018) Transductive, Meta ConvNet-64 59.91±0.94 73.30±0.75
BD-CSPN Liu et al. (2020) Transductive, Meta WRN-28-10 78.74±0.95 86.92±0.63
Transductive Fine-tuning Dhillon et al. (2019) Transductive, Transfer WRN-28-10 73.34±0.71 85.50±0.50
LaplacianShot Ziko et al. (2020) Transductive, Transfer DenseNet 80.30±0.22 87.93±0.15
Masked Soft k-Means Ren et al. (2018) Semi, Meta ConvNet-128 52.39±0.44 69.88±0.20
TPN-semi Liu et al. (2018) Semi, Meta ConvNet-64 55.74±0.29 71.01±0.23
LST Li et al. (2019b) Semi, Meta ResNet-12 77.70±1.60 85.20±0.80
DPN (Ours) Semi, Transfer WRN-28-10 82.18±1.06 88.02±0.72
PTN (Ours) Semi, Transfer WRN-28-10 84.70±1.14 89.14±0.71
Table 1: The 5-way, 1-shot and 5-shot classification accuracy (%) on the two datasets with 95% confidence intervals. The best results are in bold. The upper and lower parts of the table show the results on miniImageNet and tieredImageNet, respectively.

In unsupervised embedding transfer, the data augmentation is defined the same as in Lee et al. (2019); Tian et al. (2020). For fair comparisons against TransMatch Yu et al. (2020), we also augment each labeled image 10 times by random transformations and generate the prototype of each class as the labeled sample. We apply the SGD optimizer with a momentum of 0.9, and a cosine learning rate scheduler is used for 10 epochs. We set the batch size to 80. For Poisson inference, we construct the graph by connecting each sample to its $K$-nearest neighbors with Gaussian weights; the weight matrix is symmetrized, which accelerates the convergence of the iteration in Algorithm 1 without changing the solution of Eq. (4). We set the maximum number of propagation iterations in step 7 of Algorithm 1 by referring to the stop constraint discussed in the Proposed Algorithm section. The remaining hyper-parameters, including the temperature $\tau$ in Eq. (1) and the weight $\gamma$ in Eq. (2), are set empirically.
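For reference, a possible construction of the $K$-nearest-neighbor graph with Gaussian weights is sketched below; the value of $K$ and the exact kernel scaling are illustrative assumptions rather than the paper's reported settings.

```python
import numpy as np

def knn_gaussian_graph(feats, k=10):
    """Build a symmetric k-NN graph with Gaussian weights over embedded features.

    feats: [n, d] (calibrated) feature matrix. Both k and the self-tuning kernel
    scale below are illustrative choices, not the paper's exact settings.
    """
    n = feats.shape[0]
    dists = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)  # [n, n]
    W = np.zeros((n, n))
    for i in range(n):
        nn_idx = np.argsort(dists[i])[1:k + 1]     # nearest neighbors, excluding self
        sigma_i = dists[i, nn_idx[-1]] + 1e-12     # distance to the k-th neighbor
        W[i, nn_idx] = np.exp(-dists[i, nn_idx] ** 2 / sigma_i ** 2)
    return np.maximum(W, W.T)                       # symmetrize the weight matrix
```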

Experimental Results

Comparison with the State-Of-The-Art

In our experiments, we group the compared methods into five categories, and the experimental results on two datasets are summarized in Table 1. With the auxiliary unlabeled data available, our proposed PTN outperforms the metric-based and optimization-based few-shot models by large margins, indicating that the proposed model effectively utilizes the unlabeled information for assisting few-shot recognition. By integrating the unsupervised embedding transfer and PoissonMBO classifier, PTN achieves superior performance over both transductive and existing SSFSL approaches. Specifically, under the 5-way-1-shot setting, the classification accuracies are 81.57% vs. 63.02% TransMatch Yu et al. (2020), 84.70% vs. 80.30% LaplacianShot Ziko et al. (2020) on miniImageNet and tieredImageNet, respectively; under the 5-way-5-shot setting, the classification accuracies are 88.43% vs. 78.70% LST Li et al. (2019b), 89.14% vs. 81.89% BD-CSPN Liu et al. (2020) on miniImageNet and tieredImageNet, respectively. These results demonstrate the superiority of PTN for SSFSL tasks.

Different Extra Unlabeled Samples

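Methods Num_U 1-shot 5-shot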
Methods Num_U 1-shot 5-shot
  PTN 0 76.200.82 84.250.61
PTN 0 77.010.94 85.320.68
PTN 20 77.200.92 85.930.82
PTN 50 79.921.06 86.090.75
PTN 100 81.570.94 87.170.58
PTN 200 82.660.97 88.430.76
Table 2: The 5-way, 1-shot and 5-shot classification accuracy (%) with different number of extra unlabeled samples on miniImageNet. PTN denotes that we adopt PTN as the transductive model without fine-tune embedding. Best results are in bold.

We show the results of using different numbers of extra unlabeled instances in Table 2. For Num_U = 0, PTN* can be viewed as a transductive model without extra unlabeled data, where we treat the query samples as the unlabeled data and do not fine-tune the embedding, for fair comparison. Contrary to PTN*, the full PTN model utilizes the query samples to fine-tune the embedding when Num_U = 0. It can be observed that our PTN model achieves better performance with more extra unlabeled samples, which indicates the effectiveness of PTN in mining the unlabeled auxiliary information for the few-shot problem.

Results with Distractor Classes

Figure 2: The 5-way, 1-shot and 5-shot classification accuracy (%) with different numbers of extra unlabeled samples on miniImageNet. w/D means with distractor classes.

Inspired by Ren et al. (2018); Liu et al. (2018); Yu et al. (2020), we further investigate the influence of distractor classes, where the extra unlabeled data are collected from classes with no overlap with the labeled support classes. We follow the settings in Ren et al. (2018); Liu et al. (2018). As shown in Figure 2, even with distractor-class data, the proposed PTN still outperforms other SSFSL methods by a large margin, which indicates the robustness of PTN in dealing with distracting unlabeled data.

Ablation Study

Methods 1-shot 5-shot
TransMatch 62.93±1.11 82.24±0.59
Label Propagation (LP) 74.04±1.00 82.60±0.68
PoissonMBO 79.67±1.02 86.30±0.65
DPN 80.00±0.83 87.17±0.51
Unsup Trans + LP¹ 75.65±1.06 84.46±0.68
Unsup Trans + PoissonMBO 80.73±1.11 87.41±0.63
PTN² 82.66±0.97 88.43±0.76
¹ Unsup Trans means Unsupervised Embedding Transfer.
² PTN consists of Unsup Trans and DPN.
Table 3: Ablation studies of the proposed PTN. All methods are based on a pre-trained embedding, with 200 extra unlabeled samples per class, on miniImageNet for 5-way 1-shot and 5-shot classification (%). Best results are in bold.

We analyze different components of the PTN and summarize the results in Table 3. All compared approaches are based on the pre-trained WRN-28-10 embedding.

First of all, we investigate the graph propagation component (classifier). It can be observed that graph-based models such as Label Propagation Zhou et al. (2004) and PoissonMBO Calder et al. (2020) outperform the inductive model TransMatch Yu et al. (2020), which is consistent with previous studies Zhu et al. (2005); Liu et al. (2018); Ziko et al. (2020). Compared to directly applying PoissonMBO to few-shot tasks, the proposed DPN (without unsupervised embedding transfer) achieves better performance, which indicates that it is necessary to perform feature calibration to eliminate the cross-class bias between the support and query data distributions before label inference.

To investigate the proposed unsupervised embedding transfer in representation learning, we observe that all the graph-based models achieve clear improvements after incorporating the proposed transfer module. For instance, Label Propagation obtains 1.61% and 1.86% performance gains on 5-way-1-shot and 5-way-5-shot miniImageNet classification, respectively. These results indicate the effectiveness of the proposed unsupervised embedding transfer. Finally, by integrating the unsupervised embedding transfer and the graph propagation classifier, the PTN model achieves the best performance among all approaches in Table 3.

Inference Time

We conduct inference time experiments to investigate the computational efficiency of the proposed Poisson Transfer Network (PTN) on the miniImageNet Vinyals et al. (2016) dataset. Following Ziko et al. (2020), we compute the average inference time required for each 5-shot task. The results are shown in Table 4. Compared with inductive models, the proposed PTN costs more time due to the graph-based Poisson inference. However, our model achieves better classification performance than inductive and other transductive models, with an affordable inference time.

Methods Inference Time
SimpleShot Wang et al. (2019) 0.009
LaplacianShot Ziko et al. (2020) 0.012
Transductive fine-tune Dhillon et al. (2019) 20.7
PTN(Ours) 13.68
Table 4: Average inference time (in seconds) for the 5-shot tasks on the miniImageNet dataset.
miniImageNet 5-way-1-shot
Methods \ Num_U 0 20 50 100 200
TransMatch Yu et al. (2020) - 58.43±0.93 61.21±1.03 63.02±1.07 62.93±1.11
Label Propagation Zhou et al. (2004) 69.74±0.72 71.80±1.02 72.97±1.06 73.35±1.05 74.04±1.00
PoissonMBO Calder et al. (2020) 74.79±1.06 76.01±0.99 76.67±1.02 78.28±1.02 79.67±1.02
DPN (Ours) 75.85±0.97 76.10±1.06 77.01±0.92 79.55±1.13 80.00±0.83
PTN (Ours) 77.01±0.94 77.20±0.92 79.92±1.06 81.57±0.94 82.66±0.97
miniImageNet 5-way-5-shot
Methods \ Num_U 0 20 50 100 200
TransMatch - 76.43±0.61 79.30±0.59 81.19±0.59 82.24±0.59
Label Propagation 75.50±0.60 78.47±0.60 80.40±0.61 81.65±0.59 82.60±0.68
PoissonMBO 83.89±0.66 84.43±0.67 84.94±0.82 85.51±0.81 86.30±0.65
DPN (Ours) 84.74±0.63 85.04±0.66 85.36±0.60 86.09±0.63 87.17±0.51
PTN (Ours) 85.32±0.68 85.93±0.82 86.09±0.75 87.17±0.58 88.43±0.76
Table 5: Accuracy with various numbers of extra unlabeled samples for different semi-supervised few-shot methods on the miniImageNet dataset. All results are averaged over 600 episodes with 95% confidence intervals. The best results are in bold.

Results with Different Extra Unlabeled

We conduct further experiments to investigate how well current semi-supervised few-shot methods mine the value of the unlabeled data. All approaches are based on a pre-trained WRN-28-10 Zagoruyko and Komodakis (2016) model for fair comparisons. As indicated in Table 5, with more unlabeled samples, all models achieve higher classification performance. However, our proposed PTN achieves the highest performance among the compared methods, which validates the superior capacity of the proposed model in using the extra unlabeled information to boost few-shot methods.

Methods 1-shot 5-shot 1-shot w/D 5-shot w/D
Soft K-Means Ren et al. (2018) 50.09±0.45 64.59±0.28 48.70±0.32 63.55±0.28
Soft K-Means+Cluster Ren et al. (2018) 49.03±0.24 63.08±0.18 48.86±0.32 61.27±0.24
Masked Soft k-Means Ren et al. (2018) 50.41±0.31 64.39±0.24 49.04±0.31 62.96±0.14
TPN-semi Liu et al. (2018) 52.78±0.27 66.42±0.21 50.43±0.84 64.95±0.73
TransMatch Yu et al. (2020) 63.02±1.07 81.19±0.59 62.32±1.04 80.28±0.62
PTN (Ours) 82.66±0.97 88.43±0.67 81.92±1.02 87.59±0.61
  • "w/D" means with distractor classes: in this setting, many extra unlabeled samples come from distractor classes, which differ from the labeled support classes. All results are averaged over 600 episodes with 95% confidence intervals. The best results are in bold.

Table 6: Semi-supervised comparison on the miniImageNet dataset.
Methods 1-shot 5-shot 1-shot w/D 5-shot w/D
Soft K-Means Ren et al. (2018) 51.52±0.36 70.25±0.31 49.88±0.52 68.32±0.22
Soft K-Means+Cluster Ren et al. (2018) 51.85±0.25 69.42±0.17 51.36±0.31 67.56±0.10
Masked Soft k-Means Ren et al. (2018) 52.39±0.44 69.88±0.20 51.38±0.38 69.08±0.25
TPN-semi Liu et al. (2018) 55.74±0.29 71.01±0.23 53.45±0.93 69.93±0.80
PTN (Ours) 84.70±1.14 89.14±0.71 83.84±1.07 88.06±0.62
  • "w/D" means with distractor classes: in this setting, many extra unlabeled samples come from distractor classes, which differ from the labeled support classes. All results are averaged over 600 episodes with 95% confidence intervals. The best results are in bold.

Table 7: Semi-supervised comparison on the tieredImageNet dataset.

Results with Distractor Classification

We report the results of the proposed PTN on both the miniImageNet Vinyals et al. (2016) and tieredImageNet Ren et al. (2018) datasets under different settings in Table 6 and Table 7, respectively. It can be observed that the classification results of all semi-supervised few-shot models degrade due to the distractor classes. However, the proposed PTN model still outperforms other semi-supervised few-shot methods by a large margin, which again indicates the superiority of PTN in dealing with semi-supervised few-shot classification tasks.

Conclusion

We propose a Poisson Transfer Network (PTN) to tackle the semi-supervised few-shot problem, exploring the value of unlabeled novel-class data from two aspects. We employ the Poisson learning model to capture the relations between the few labeled and the unlabeled data, which results in a more stable and informative classifier than previous semi-supervised few-shot models. Moreover, we adopt unsupervised contrastive learning to improve the generality of the embedding on novel classes, which avoids the possible over-fitting problem when training with few labeled samples. By integrating the two modules, the proposed PTN can fully explore the unlabeled auxiliary information to boost the performance of few-shot learning. Extensive experiments indicate that PTN outperforms state-of-the-art few-shot and semi-supervised few-shot methods.

Acknowledgment

The authors greatly appreciate the financial support from the Rail Manufacturing Cooperative Research Centre (funded jointly by participating rail organizations and the Australian Federal Government’s Business-Cooperative Research Centres Program) through Project R3.7.3 - Rail infrastructure defect detection through video analytics.

References