Ensemble Model with Batch Spectral Regularization and Data Blending for Cross-Domain Few-Shot Learning with Unlabeled Data

06/08/2020 · by Zhen Zhao, et al.

Deep learning models struggle to achieve good performance when data is scarce and domain gaps are present. Cross-domain few-shot learning (CD-FSL) is designed to address this problem. We propose an Ensemble Model with Batch Spectral Regularization and Data Blending for Track 2 of the CD-FSL challenge. We use different feature mapping matrices to obtain an ensemble framework. In each branch, batch spectral regularization is used to suppress the singular values of the batch feature matrix and improve the model's transferability. In the fine-tuning stage, a data blending method is used to fuse the information of the unlabeled data with the support set. The prediction results are further refined with label propagation. We conduct experiments on the CD-FSL benchmark tasks, and the results demonstrate the effectiveness of the proposed method.


1 Introduction

Despite the success of deep neural networks on large-scale labelled data, their performance degrades severely on datasets with only a few labeled instances. In order to advance research in this area, the Cross-Domain Few-Shot Learning (CD-FSL) challenge [3] has been initiated, where miniImageNet is used as the source domain, and four other datasets, including plant disease images (CropDiseases [5]), satellite images (EuroSAT [4]), dermoscopic images of skin lesions (ISIC2018 [6, 2]), and X-ray images (ChestX [7]), are used as the target domains. In this challenge, we tackle the Track 2 task, cross-domain few-shot learning with unlabeled data, where a separate unlabeled subset in each target domain can be used during training.

One key challenge of the cross-domain few-shot learning task lies in the large cross-domain gaps, which limit the generalization ability of classic few-shot learning methods. In this paper, we propose a batch spectral regularization (BSR) mechanism that prevents over-fitting of the prediction model to the source domain and increases its generalization capacity across large domain gaps. Moreover, we deploy a feature-transformation-based ensemble model that constructs and learns multiple prediction networks in diverse feature spaces to improve the model's robustness. To exploit the unlabeled data in the target domain, we propose a data blending strategy that combines the unlabeled data with the support set to augment the sparse labeled support instances during a fine-tuning stage in the target domain. We also apply label propagation [8] to refine the classification results on the target query instances. Our overall model demonstrates effective performance on the Track 2 CD-FSL tasks.

2 Proposed Method

Figure 1: An overview of the proposed approach.

We consider the following problem setting. We have a set of labeled images in the source domain. In the target domain, we have a set of few-shot learning tasks. In each task, we have a labeled support set $\mathcal{S}$, which contains a few (N) labeled images from each of the K classes (K-way N-shot), and a set of query images $\mathcal{Q}$ that are used to evaluate a CD-FSL method's performance. In addition, we also have a set of unlabeled data $\mathcal{U}$ in the target domain. Specifically, we follow the learning setting and evaluation strategy in [3]. The overall architecture of our proposed model is illustrated in Fig. 1. We present each component of the model below.

2.1 Ensemble Model

As shown in Fig. 1, we build an ensemble model with multiple prediction branch networks in diverse feature spaces. We increase the diversity of the feature spaces by applying a different feature projection matrix to each branch network. With $M$ branches (e.g., $M = 10$), we use $M$ randomly generated symmetric matrices to produce $M$ orthogonal matrices $\{Q_1, \dots, Q_M\}$, where each $Q_m$ is the orthogonal matrix composed of the eigenvectors of the $m$-th symmetric matrix. The feature vector $z = f(x)$ extracted by the convolutional neural network $f$ can be transformed to a new feature representation vector via $z' = Q_m^{\top} z$, and then sent to the classifier $c_m$ for classification. The $M$ diverse models can be trained in the source domain using the $M$ generated orthogonal matrices. The training of each network is conducted by minimizing a standard cross-entropy loss. The batch-wise loss function for a single network can be written as:

$$\mathcal{L}_{ce} = \frac{1}{b} \sum_{i=1}^{b} \ell\big( c(Q^{\top} f(x_i)),\, y_i \big) \qquad (1)$$

where $\ell$ is the cross-entropy loss function and $b$ is the current batch size.
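To make the projection step concrete, the sketch below builds the per-branch orthogonal matrices and applies them to backbone features. It is an illustrative NumPy/PyTorch reconstruction rather than the authors' released code: the symmetric-matrix construction $(B + B^{\top})/2$, the function names, and the 512-dimensional feature size are assumptions.

```python
import numpy as np
import torch

def make_orthogonal_projections(num_branches, feat_dim, seed=0):
    """Build one orthogonal matrix per branch from a random symmetric matrix."""
    rng = np.random.RandomState(seed)
    projections = []
    for _ in range(num_branches):
        b = rng.randn(feat_dim, feat_dim)
        sym = (b + b.T) / 2.0                 # random symmetric matrix (assumed construction)
        _, eigvecs = np.linalg.eigh(sym)      # eigenvectors of a symmetric matrix form an orthogonal basis
        projections.append(torch.from_numpy(eigvecs).float())
    return projections

def branch_logits(features, q, classifier):
    """Project backbone features with this branch's orthogonal matrix, then classify."""
    projected = features @ q                  # each row is (Q^T z)^T for a feature z in the batch
    return classifier(projected)

# Usage: 10 branches over 512-dim backbone features, with a linear classifier per branch.
qs = make_orthogonal_projections(num_branches=10, feat_dim=512)
classifier = torch.nn.Linear(512, 64)         # 64 output classes here is a placeholder
logits = branch_logits(torch.randn(16, 512), qs[0], classifier)
```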

2.2 Batch Spectral Regularization

Inspired by [1], we introduce a batch spectral regularization (BSR) mechanism to suppress all singular values of the batch feature matrix during pre-training, which helps avoid over-fitting to the source domain. In a single model, given a batch of size $b$, its feature matrix $A$, whose rows are the $b$ feature vectors of the batch, can be obtained. The BSR penalty can then be written as:

$$\mathcal{L}_{bsr} = \sum_{i} \sigma_i^2 \qquad (2)$$

where the $\sigma_i$ are the singular values of the batch feature matrix $A$. The spectrally regularized training loss for each batch will be:

$$\mathcal{L} = \mathcal{L}_{ce} + \lambda\, \mathcal{L}_{bsr} \qquad (3)$$

where $\lambda$ is the regularization weight.
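As a reference for Eqs. (2) and (3), here is a minimal PyTorch sketch of the BSR penalty and the regularized pre-training loss; the function names and `lambda_bsr` are illustrative.

```python
import torch
import torch.nn.functional as F

def bsr_penalty(batch_features: torch.Tensor) -> torch.Tensor:
    """Eq. (2): sum of squared singular values of the batch feature matrix A (b x d)."""
    singular_values = torch.linalg.svdvals(batch_features)
    return (singular_values ** 2).sum()

def pretraining_loss(logits, labels, batch_features, lambda_bsr):
    """Eq. (3): batch cross-entropy plus the weighted BSR penalty."""
    return F.cross_entropy(logits, labels) + lambda_bsr * bsr_penalty(batch_features)
```

Since the sum of squared singular values equals the squared Frobenius norm of $A$, this term penalizes all spectral components of the batch features uniformly, in contrast to the batch spectral shrinkage of [1], which suppresses only the smallest singular values.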

2.3 Data Blending

Inspired by the mixup methodology [9], we introduce a data blending strategy to exploit the information in the additional unlabeled data in the target domain and improve the model performance. Specifically, we propose to generate a pseudo support set using the mixup method. For a pair of random instances $(x_s, x_u)$ in a given mini-batch, where $x_s$ and $x_u$ are instances from the support set and the unlabeled set respectively, we create a new data instance

$$\tilde{x} = \lambda\, x_s + (1 - \lambda)\, x_u \qquad (4)$$

where $\lambda$ is a parameter that controls the degree of blending, and obtain a trainable "pseudo-labeled" instance $(\tilde{x}, y_s)$, which maintains information from both the support set and the unlabeled set. By fine-tuning the pre-trained prediction model on the support set and the newly generated data together, we expect to enhance the robustness and capacity of the prediction network. The fine-tuning loss function with data blending can be written as:

$$\mathcal{L}_{ft} = \mathcal{L}_{s} + \beta\, \mathcal{L}_{p} \qquad (5)$$

where $\mathcal{L}_{s}$ and $\mathcal{L}_{p}$ are the cross-entropy losses on the support set and the generated pseudo support set respectively, and $\beta$ is a trade-off parameter.
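A minimal sketch of the blending step (Eq. 4) and the fine-tuning loss (Eq. 5) follows; the assumption that the blended instance inherits the support label $y_s$, and the names `lam` and `beta`, are illustrative.

```python
import torch
import torch.nn.functional as F

def blend(support_x: torch.Tensor, unlabeled_x: torch.Tensor, lam: float) -> torch.Tensor:
    """Eq. (4): mix a support image with an unlabeled image; lam controls the blending degree."""
    return lam * support_x + (1.0 - lam) * unlabeled_x

def finetune_loss(support_logits, support_y, pseudo_logits, pseudo_y, beta: float):
    """Eq. (5): support-set cross-entropy plus the weighted pseudo-support cross-entropy."""
    loss_s = F.cross_entropy(support_logits, support_y)
    loss_p = F.cross_entropy(pseudo_logits, pseudo_y)
    return loss_s + beta * loss_p
```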

In addition to the components above, we also apply a label propagation (LP) procedure to refine the prediction results on the query set, following the LP procedure in [8].
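As a rough illustration of the LP refinement (not the exact procedure of [8]), the sketch below propagates support labels to query instances over a Gaussian-weighted k-NN graph using the standard closed-form solution; the values of k, alpha, and sigma are placeholders.

```python
import numpy as np

def label_propagation(features, support_labels, num_classes, k=10, alpha=0.5, sigma=1.0):
    """features: (n, d) array with support rows first, then query rows; returns query predictions."""
    n = features.shape[0]
    dist2 = np.square(features[:, None, :] - features[None, :, :]).sum(-1)
    w = np.exp(-dist2 / (2.0 * sigma ** 2))          # Gaussian affinities
    np.fill_diagonal(w, 0.0)
    drop = np.argsort(-w, axis=1)[:, k:]             # indices of all but the k strongest neighbours
    np.put_along_axis(w, drop, 0.0, axis=1)
    w = np.maximum(w, w.T)                           # symmetrize the k-NN graph
    d = np.clip(w.sum(axis=1), 1e-8, None)
    s = w / np.sqrt(np.outer(d, d))                  # normalized affinity D^{-1/2} W D^{-1/2}
    y = np.zeros((n, num_classes))
    y[np.arange(len(support_labels)), support_labels] = 1.0
    f = np.linalg.solve(np.eye(n) - alpha * s, y)    # closed-form propagation (I - alpha*S)^{-1} Y
    return f[len(support_labels):].argmax(axis=1)    # refined predictions on the query set
```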

Methods               ChestX                                             ISIC
                      5-way 5-shot   5-way 20-shot   5-way 50-shot       5-way 5-shot   5-way 20-shot   5-way 50-shot
Fine-tuning [3]       25.97%±0.41%   31.32%±0.45%    35.49%±0.45%        48.11%±0.64%   59.31%±0.48%    66.48%±0.56%
BSDB                  27.50%±0.45%   34.62%±0.50%    37.80%±0.53%        53.52%±0.63%   65.12%±0.62%    70.76%±0.64%
BSDB+LP               27.47%±0.46%   34.93%±0.49%    38.46%±0.55%        54.95%±0.68%   66.55%±0.61%    71.87%±0.62%
BSDB (Ensemble)       28.38%±0.47%   37.75%±0.51%    42.10%±0.54%        54.54%±0.65%   67.66%±0.62%    74.27%±0.56%
BSDB+LP (Ensemble)    28.40%±0.46%   38.17%±0.53%    42.73%±0.53%        56.17%±0.66%   68.95%±0.60%    75.08%±0.54%

Methods               EuroSAT                                            CropDiseases
                      5-way 5-shot   5-way 20-shot   5-way 50-shot       5-way 5-shot   5-way 20-shot   5-way 50-shot
Fine-tuning [3]       79.08%±0.61%   87.64%±0.47%    90.89%±0.36%        89.25%±0.51%   95.51%±0.31%    97.68%±0.21%
BSDB                  83.14%±0.61%   90.63%±0.38%    94.03%±0.28%        93.48%±0.42%   98.19%±0.18%    99.20%±0.11%
BSDB+LP               85.43%±0.58%   92.30%±0.34%    95.06%±0.27%        95.31%±0.37%   98.90%±0.16%    99.54%±0.09%
BSDB (Ensemble)       84.50%±0.55%   92.20%±0.33%    95.17%±0.25%        94.05%±0.41%   98.47%±0.18%    99.30%±0.10%
BSDB+LP (Ensemble)    86.66%±0.54%   93.57%±0.31%    96.07%±0.24%        95.93%±0.37%   99.16%±0.13%    99.62%±0.08%

Table 1: Results on the CD-FSL benchmark tasks with unlabeled data.
Methods               Average
Fine-tuning [3]       67.23% (0.46%)
BSDB                  70.67% (0.46%)
BSDB+LP               71.73% (0.44%)
BSDB (Ensemble)       72.36% (0.43%)
BSDB+LP (Ensemble)    73.38% (0.42%)

Table 2: Average results across all datasets and shot levels.

3 Experiments

3.1 Experiment Details

In the experiments, we follow the protocol of [3], using 15 images of each category as the query set and 600 randomly sampled few-shot learning tasks in each target domain. The average accuracy and 95% confidence interval are reported. We use ResNet-10 as the backbone and a fully connected layer as the classifier. In the pre-training process, models are trained for 400 epochs with a fixed BSR weight $\lambda$ in Eq. (3). The networks are trained by SGD with an initial learning rate of 0.001, a momentum of 0.9, and a weight decay of 0.0005. In the fine-tuning process, we fix the blending parameter in Eq. (4) and the trade-off parameter $\beta$ in Eq. (5), set the learning rate to 0.01, and conduct fine-tuning for 100 epochs. In the label propagation step, we construct a k-NN graph over the support and query instances and fix its hyperparameter.
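As a small illustration of the pre-training optimizer setup described above, the following sketch uses torchvision's ResNet-18 as a stand-in for the ResNet-10 backbone; the 64 output classes assume the standard miniImageNet base-class split.

```python
import torch
import torchvision

# Stand-in backbone: ResNet-18 in place of the paper's ResNet-10; 64 assumed miniImageNet base classes.
model = torchvision.models.resnet18(num_classes=64)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0005)
```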

3.2 Experiment Results

We compare the proposed model with the strong fine-tuning baseline reported in [3]. We report results for several variants of the proposed model, including a single prediction network with BSR and data blending but without the ensemble, denoted BSDB, and its extension with label propagation (LP). We then extend these variants to the ensemble framework with 10 branches. The CD-FSL results on the four target domains are reported in Table 1, and the average accuracy across all datasets and shot levels is shown in Table 2.

We can see that the single model BSDB already outperforms the fine-tuning baseline (70.67% vs. 67.23% on average). With the 10-branch ensemble, the performance is further improved (72.36%). By further adding the LP refinement, we obtain the best performance with BSDB+LP (Ensemble) (73.38%).

4 Conclusion

In this paper, we presented an ensemble model with batch spectral regularization and data blending for cross-domain few-shot learning with unlabeled data. In the pre-training process, batch spectral regularization is deployed, and in the fine-tuning process, the information in the unlabeled data is exploited through data blending. We further refined the prediction results with label propagation. The overall method exhibited strong CD-FSL performance.

References

  • [1] X. Chen, S. Wang, B. Fu, M. Long, and J. Wang (2019) Catastrophic forgetting meets negative transfer: batch spectral shrinkage for safe transfer learning. In NeurIPS, Cited by: §2.2.
  • [2] N. Codella, V. Rotemberg, P. Tschandl, M. E. Celebi, S. Dusza, D. Gutman, B. Helba, A. Kalloo, K. Liopyris, M. Marchetti, et al. (2019) Skin lesion analysis toward melanoma detection 2018: a challenge hosted by the international skin imaging collaboration (isic). arXiv preprint arXiv:1902.03368. Cited by: §1.
  • [3] Y. Guo, N. C. Codella, L. Karlinsky, J. R. Smith, T. Rosing, and R. Feris (2019) A new benchmark for evaluation of cross-domain few-shot learning. arXiv preprint arXiv:1912.07200. Cited by: §1, Table 1, Table 2, §2, §3.1, §3.2.
  • [4] P. Helber, B. Bischke, A. Dengel, and D. Borth (2019) Eurosat: a novel dataset and deep learning benchmark for land use and land cover classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 12 (7), pp. 2217–2226. Cited by: §1.
  • [5] S. P. Mohanty, D. P. Hughes, and M. Salathé (2016) Using deep learning for image-based plant disease detection. Frontiers in plant science 7, pp. 1419. Cited by: §1.
  • [6] P. Tschandl, C. Rosendahl, and H. Kittler (2018) The ham10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Scientific data 5, pp. 180161. Cited by: §1.
  • [7] X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers (2017) Chestx-ray8: hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In CVPR, Cited by: §1.
  • [8] M. Ye and Y. Guo (2017) Labelless scene classification with semantic matching. In BMVC, Cited by: §1, §2.3.
  • [9] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz (2017) Mixup: beyond empirical risk minimization. arXiv preprint arXiv:1710.09412. Cited by: §2.3.