Cross-Domain Few-Shot Classification via Adversarial Task Augmentation

04/29/2021 · Haoqing Wang, et al. · Peking University

Few-shot classification aims to recognize unseen classes with few labeled samples from each class. Many meta-learning models for few-shot classification elaborately design various task-shared inductive biases (meta-knowledge) to solve such tasks, and achieve impressive performance. However, when there is a domain shift between the training tasks and the test tasks, the obtained inductive bias fails to generalize across domains, which degrades the performance of the meta-learning models. In this work, we aim to improve the robustness of the inductive bias through task augmentation. Concretely, we consider the worst-case problem around the source task distribution, and propose an adversarial task augmentation method that can generate inductive-bias-adaptive 'challenging' tasks. Our method can be used as a simple plug-and-play module for various meta-learning models, and improves their cross-domain generalization capability. We conduct extensive experiments under the cross-domain setting, using nine few-shot classification datasets: mini-ImageNet, CUB, Cars, Places, Plantae, CropDiseases, EuroSAT, ISIC and ChestX. Experimental results show that our method can effectively improve the few-shot classification performance of the meta-learning models under domain shift, and outperforms the existing works. Our code is available at https://github.com/Haoqing-Wang/CDFSL-ATA.


1 Introduction

Few-shot classification [10] aims to classify instances from unseen classes with few labeled samples in each class. To this end, many meta-learning based models elaborately design various task-shared inductive biases (e.g., the metric function [18], the inference mechanism [4, 14]) to solve few-shot classification tasks. They demonstrate promising performance when evaluated on tasks from the same domain as the training tasks (e.g., both training and testing on the mini-ImageNet classes). However, some works [3, 8] have shown that existing meta-learning models perform undesirably when there is a domain shift between training tasks and test tasks (e.g., training on the mini-ImageNet classes and testing on the ISIC classes), and even underperform traditional pre-training and fine-tuning. As a result, the cross-domain few-shot classification problem has attracted considerable attention from the machine learning community, especially the difficult single domain generalization problem [19, 8].

Figure 1: Compared with (a) generalizing from the single source task distribution $P_0$, (b) the worst-case problem considers the wider task distribution space $\{P : D(P, P_0) \le \rho\}$. $P_1$, $P_2$ and $P_3$ represent the unknown task distributions.

To generalize to unseen domains without accessing any data from those domains, some domain generalization models have been proposed [20, 12]. They learn classifiers that generalize to the unseen domains, but assume that the source and unseen domains share the same classes. In the few-shot classification problem, however, the classes in the target tasks are unseen during training. The works most similar to ours are [19] and [17], which aim to improve the performance of meta-learning models on cross-domain tasks. [19] introduces feature-wise transformation layers for the metric-based meta-learning models, which modulate the feature activations with affine transformations to improve the robustness of the metric functions. But as mentioned above, different meta-learning models have various inductive biases, not just metric functions. [17] uses explanation-guided training to prevent the feature extractor from overfitting to specific classes, but it needs to manually derive the explanations for different meta-learning models.

We aim to find a method that is general, easy to implement, and able to improve the robustness of various inductive biases. To this end, we resort to task augmentation, which constructs 'challenging' virtual tasks to increase the diversity of training tasks. For image classification, various hand-crafted data augmentation techniques (e.g., horizontal flip, random crop and color jitter) can be used for task augmentation. However, they have limited effect and cannot perform adaptive augmentation for different inductive biases. Recently, some works [16, 20] proposed adaptive sample (e.g., image) augmentation methods to improve the robustness of the model. Inspired by these works, we propose an inductive-bias-adaptive task augmentation method to improve the cross-domain generalization ability of the meta-learning models.

Concretely, we consider the worst-case problem around the source task distribution $P_0$:

$$\min_{\theta} \; \sup_{P:\, D(P, P_0) \le \rho} \; \mathbb{E}_{T \sim P}\big[\mathcal{L}(T; \theta)\big] \qquad (1)$$

where $\theta$ represents the model parameters, $\mathcal{L}$ is the loss function which depends on the model's inductive bias, and $D$ is the distance metric between task distributions. Compared with minimizing the loss function on the source task distribution $P_0$, the solution to the worst-case problem (1) guarantees good performance on the wider space of task distributions which are at most distance $\rho$ away from $P_0$, as illustrated in Figure 1. By solving the worst-case problem (1), we propose a task augmentation method. Since the loss function depends on the inductive bias, our method can adaptively generate 'challenging' tasks according to different inductive biases and increase the diversity of training tasks, which improves the robustness of the model under domain shift. Moreover, our method can be used as a plug-and-play module for various meta-learning models.

The main contributions of this work are as follows:

  • To the best of our knowledge, this is the first work that introduces task augmentation into cross-domain few-shot classification to improve the generalization ability of meta-learning models under domain shift.

  • We consider the worst-case problem around the source task distribution $P_0$, and propose a plug-and-play inductive-bias-adaptive task augmentation method, which can be conveniently used for various meta-learning models.

  • We evaluate our method on the RelationNet [18], the GNN [4] and one of the state-of-the-art models, TPN [14], with extensive experiments under the cross-domain setting. Experimental results show that our method significantly improves the cross-domain generalization performance of these models and outperforms [19] and [17]. Under the same settings, the meta-learning models with our adversarial task augmentation module also outperform traditional pre-training and fine-tuning under domain shift.

2 Related Work

Cross-domain few-shot classification.

Although various meta-learning models for few-shot classification have achieved impressive performance, they fail to generalize to unseen domains. To this end, [19] uses feature-wise transformation layers to simulate various distributions of image features during training and thus improve the generalization capability of the metric function. [17] uses explanation methods to upscale the features which are more relevant to the prediction, and penalizes them more when overfitting occurs, to prevent the intermediate features from specializing towards fixed classes. Different from them, we focus on improving the robustness of various inductive biases. Other models [13, 21] that appear in the CVPR 2020 Cross-Domain Few-Shot Learning Challenge use various techniques to solve cross-domain few-shot classification tasks, e.g., batch spectral regularization, model ensembles and a large margin mechanism.

Domain generalization.

Domain generalization methods [20, 12] have been developed to generalize from single or multiple seen domains to unseen domains without accessing samples from them. However, these models consider the setting where the seen and unseen domains share the same categories. In contrast, in the cross-domain few-shot classification problem, the seen and unseen domains have completely disjoint categories.

Adversarial training.

Adversarial training [7] aims to make deep neural networks resistant to adversarial attacks. [16] proposes principled adversarial training through distributionally robust optimization, where virtual images are model-adaptively generated by maximizing some risk, and the models learned with these new images become more robust. In this work, we introduce a similar model-adaptive augmentation method into the meta-learning models, and propose a plug-and-play module that generates virtual 'challenging' tasks to improve the robustness of various meta-learning models.

3 Method

3.1 Preliminaries

3.1.1 Few-Shot Classification

Each few-shot classification task $T$ consists of a support set $S$ and a query set $Q$. If the support set contains $N$ classes with $K$ samples in each class, the few-shot classification task is called $N$-way $K$-shot. The query set $Q$ contains samples from the same $N$ classes as the support set $S$. Formally, a few-shot task can be defined as $T = (S, Q)$, where $S = \{(x_i, y_i)\}_{i=1}^{N \times K}$ and $Q = \{(\tilde{x}_j, \tilde{y}_j)\}_{j=1}^{N_q}$. Given the support set $S$, our goal is to classify the samples in the query set $Q$ correctly into one of the $N$ classes. Typically, a base learner $\mathcal{A}$ is needed to output the optimal classifier of the task based on the support set $S$, i.e., $f = \mathcal{A}(S; \theta)$, and it depends on the inductive bias.
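To make the episodic setup concrete, the following PyTorch-style sketch builds one $N$-way $K$-shot task. It is our illustration, not the authors' code; `images_by_class` is a hypothetical class-indexed image pool.

```python
import random
import torch

def sample_episode(images_by_class, n_way=5, k_shot=1, k_query=16):
    """Build one N-way K-shot episode (support set S, query set Q).

    `images_by_class` is a hypothetical dict: class id -> tensor of images
    with shape (num_images, C, H, W). Classes are relabeled to 0..n_way-1,
    since episode labels are arbitrary placeholders for the real classes.
    """
    classes = random.sample(list(images_by_class), n_way)
    support_x, support_y, query_x, query_y = [], [], [], []
    for episode_label, c in enumerate(classes):
        pool = images_by_class[c]
        idx = torch.randperm(len(pool))[: k_shot + k_query]
        support_x.append(pool[idx[:k_shot]])
        query_x.append(pool[idx[k_shot:]])
        support_y += [episode_label] * k_shot
        query_y += [episode_label] * k_query
    S = (torch.cat(support_x), torch.tensor(support_y))
    Q = (torch.cat(query_x), torch.tensor(query_y))
    return S, Q
```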

The main difference among meta-learning models for few-shot classification lies in the design choices for the inductive bias. For example, the RelationNet [18] chooses a metric function based on convolutional neural networks (CNNs), the GNN [4] applies a generic message-passing inference mechanism on a partially observed graphical model, and the TPN [14] utilizes transductive label propagation. Meta-learning models aim to learn these inductive biases over a collection of tasks which are assumed to be sampled from the task distribution $P_0$, and the learning objective is

$$\min_{\theta} \; \mathbb{E}_{T \sim P_0}\big[\mathcal{L}(T; \theta)\big] \qquad (2)$$

where $\mathcal{L}$ is the loss function, such as the classification loss of the samples in the query set $Q$, and $\theta$ represents the model parameters.
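In code, the objective in Equation (2) corresponds to the usual episodic training loop. The sketch below assumes a `model(S, Q)` callable that implements the base learner $\mathcal{A}$ and returns the query-set classification loss; the names `model` and `task_sampler` are illustrative, not the paper's API.

```python
import torch

def meta_train(model, task_sampler, num_iterations, lr=1e-3):
    """Minimize E_{T ~ P0}[ L(T; theta) ] by episodic gradient descent.

    `model(S, Q)` is assumed to return the scalar query-set loss of the
    classifier that the base learner builds from the support set S.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(num_iterations):
        S, Q = task_sampler()      # one task T = (S, Q) drawn from P0
        loss = model(S, Q)         # L(T; theta)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```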

3.1.2 Cross-Domain Setting

Generally, the target tasks are assumed to come from the source task distribution $P_0$. However, in this work we consider few-shot classification under domain shift. Concretely, we focus on the single domain generalization problem, because data from multiple training domains may not always be available due to data acquisition budgets or privacy issues. We denote a domain as a distribution over few-shot classification tasks. The target tasks come from several unknown domains $P_1, \ldots, P_M$. The goal is to learn a meta-learning model using the single source domain $P_0$, such that the model can generalize to the several unseen domains.

Input: Source task distribution $P_0$; initialized parameters $\theta_0$
Require: Learning rates $\eta$ and $\alpha$; iteration number $T_{max}$ for early stopping; probability $p$ of using the original data; candidate pool $K$ of filter sizes
Output: Learned parameters $\theta$

1:  Initialize: $\theta \leftarrow \theta_0$
2:  while training do
3:     Randomly sample a source task $T^0$ from $P_0$
4:     $T^0 \leftarrow \mathrm{RC}(T^0)$ (keep $T^0$ unchanged with probability $p$, otherwise apply a random convolution with filter size sampled from $K$)
5:     for $t = 0, \ldots, T_{max} - 1$ do
6:        $X^{t+1} \leftarrow X^t + \eta \nabla_{X^t} \mathcal{L}(X^t, Y^0; \theta)$
7:     end for
8:     $\theta \leftarrow \theta - \alpha \nabla_\theta \mathcal{L}(T^{T_{max}}; \theta)$
9:  end while
Algorithm 1 Adversarial Task Augmentation
Table 1: Few-shot classification accuracy (%) of 5-way 1-shot/5-shot tasks trained on the mini-ImageNet dataset and evaluated on CUB, Cars, Places, Plantae, CropDiseases, EuroSAT, ISIC and ChestX. Each block compares a base model (RelationNet, GNN or TPN) with its '+FT', '+LRP' (RelationNet and GNN only) and '+ATA (Ours)' variants. +FT means using the feature-wise transformation layers [19], +LRP means using the explanation-guided training [17], and +ATA means using our adversarial task augmentation. Marked in bold are the best results in each block, as well as other results with an overlapping confidence interval.

3.2 Adversarial Task Augmentation

Next, we solve the worst-case problem (1) to get a plug-and-play model-adaptive task augmentation module. In order to make the loss function depend on the inductive bias of the meta-learning models, inspired by Equation (2), we define it as

$$\mathcal{L}(T; \theta) = \ell\big(\mathcal{A}(S; \theta),\, Q\big) \qquad (3)$$

i.e., the classification loss on the query set $Q$ of the classifier that the base learner $\mathcal{A}$ outputs from the support set $S$. To allow task distributions that have different support to that of the source task distribution $P_0$, we use the Wasserstein distance as the metric $D$. Concretely, for task distributions $P$ and $P'$ both supported on the task space $\mathcal{T}$, let $\Pi(P, P')$ denote their couplings, meaning measures $\mu$ on $\mathcal{T} \times \mathcal{T}$ with $\mu(\cdot, \mathcal{T}) = P$ and $\mu(\mathcal{T}, \cdot) = P'$. The Wasserstein distance between $P$ and $P'$ is

$$D_c(P, P') = \inf_{\mu \in \Pi(P, P')} \mathbb{E}_{(T, T') \sim \mu}\big[c(T, T')\big] \qquad (4)$$

where $c(T, T')$ is the transportation cost from $T$ to $T'$, satisfying $c(T, T') \ge 0$ and $c(T, T) = 0$.

Based on Proposition 1 in [16] and Theorem 1 in [1], we have the following duality result.

Lemma 1.

Let $\mathcal{L}(\cdot\,; \theta)$ and $c$ be continuous. Let $\phi_\gamma(\theta; T_0) = \sup_{T \in \mathcal{T}} \{\mathcal{L}(T; \theta) - \gamma c(T, T_0)\}$ be the cross-domain surrogate. For any distribution $P_0$ and any $\rho > 0$,

$$\sup_{P:\, D_c(P, P_0) \le \rho} \mathbb{E}_{T \sim P}\big[\mathcal{L}(T; \theta)\big] = \inf_{\gamma \ge 0} \Big\{ \gamma \rho + \mathbb{E}_{T_0 \sim P_0}\big[\phi_\gamma(\theta; T_0)\big] \Big\} \qquad (5)$$

and for any $\gamma \ge 0$, we have

$$\sup_{P} \Big\{ \mathbb{E}_{T \sim P}\big[\mathcal{L}(T; \theta)\big] - \gamma D_c(P, P_0) \Big\} = \mathbb{E}_{T_0 \sim P_0}\big[\phi_\gamma(\theta; T_0)\big] \qquad (6)$$

Thus, the continuity of the loss function $\mathcal{L}$ and the transportation cost $c$ with respect to the task $T$ needs to be satisfied to solve the worst-case problem (1). For this, we model the task $T$ as a vector with fixed dimension. A common approach is to use a task embedding to model tasks, but it is not applicable here. The reasons are as follows: 1) it conflicts with the definition of the loss function $\mathcal{L}$, i.e., calculating $\mathcal{L}(T; \theta)$ requires the support set $S$ and query set $Q$, not a task embedding; 2) we expect $P$ and $P_0$ to be distributions over tasks so as to generate virtual tasks, not task embeddings. We treat each task as the vector concatenating the samples and labels it contains, i.e.,

$$T = x_1 \oplus \cdots \oplus x_n \oplus y_1 \oplus \cdots \oplus y_n \qquad (7)$$

where $\oplus$ denotes the concatenation operation and $n$ is the number of samples in the task. This definition is equivalent to treating the distribution of tasks as the joint distribution of the samples and labels within a task, i.e.,

$$P(T) = P(x_1, y_1, \ldots, x_n, y_n) \qquad (8)$$

Meanwhile, we assume that the number of samples in a task is fixed, and so is the dimension of $T$. A change of the elements of the samples in task $T$ leads to a change of $T$, so the continuity of $\mathcal{L}$ and $c$ can be satisfied. Another consideration for assuming a fixed number of samples in a task is that we want to generate virtual tasks containing the same number of samples as the source task $T_0$.
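As a minimal illustration of Equation (7), a task with a fixed number of samples can be flattened into a single differentiable vector; this is a sketch, and the tensor layouts are our assumptions.

```python
import torch

def task_to_vector(S, Q):
    """Concatenate all samples and labels of a task T = (S, Q), Eq. (7).

    With the sample count fixed, T always has the same dimension, and T
    changes continuously as the pixel values of its samples change.
    """
    (xs, ys), (xq, yq) = S, Q
    X = torch.cat([xs, xq]).flatten()           # x_1 (+) ... (+) x_n
    Y = torch.cat([ys, yq]).float().flatten()   # y_1 (+) ... (+) y_n
    return torch.cat([X, Y])                    # T = X (+) Y
```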

In the worst-case problem (1), the supremum over task distributions is intractable, so we consider its Lagrangian relaxation with penalty parameter $\gamma \ge 0$:

$$\min_{\theta} \; \sup_{P} \Big\{ \mathbb{E}_{T \sim P}\big[\mathcal{L}(T; \theta)\big] - \gamma D_c(P, P_0) \Big\} \qquad (9)$$

Applying Lemma 1, our optimization problem becomes

$$\min_{\theta} \; \mathbb{E}_{T_0 \sim P_0}\Big[ \sup_{T \in \mathcal{T}} \big\{\mathcal{L}(T; \theta) - \gamma c(T, T_0)\big\} \Big] \qquad (10)$$

Further, applying Theorem 4.13 in [2], we have the following result to solve problem (10).

Lemma 2.

Let $\mathcal{L}(\cdot\,; \theta)$ be $L$-Lipschitz smooth and $c(\cdot\,, T_0)$ be $\mu$-strongly convex for each $T_0$. If $\gamma \mu > L$, there is a unique

$$T^* = \arg\max_{T \in \mathcal{T}} \big\{\mathcal{L}(T; \theta) - \gamma c(T, T_0)\big\} \qquad (11)$$

satisfying

$$\nabla_\theta \phi_\gamma(\theta; T_0) = \nabla_\theta \mathcal{L}(T^*; \theta) \qquad (12)$$

In Lemma 2, $\gamma \mu > L$ ensures that the function $\mathcal{L}(T; \theta) - \gamma c(T, T_0)$ is $(\gamma\mu - L)$-strongly concave in $T$, so that the unique maximizer $T^*$ exists.

According to Lemma 2, we can solve Equation (11) to generate the virtual cross-domain task $T^*$ and use it to update the model parameters:

$$\theta \leftarrow \theta - \alpha \nabla_\theta \mathcal{L}(T^*; \theta) \qquad (13)$$

where $\alpha$ is the learning rate. From Equation (11), we can make two observations: 1) for the meta-learning models, the virtual task $T^*$ is more 'challenging' than the source task $T_0$, and the loss function satisfies $\mathcal{L}(T^*; \theta) \ge \mathcal{L}(T_0; \theta)$, so a model learned with it tends to be more robust; 2) since the loss function depends on the inductive bias, solving Equation (11) is equivalent to adaptively generating the virtual task that is most 'challenging' to the currently learned inductive bias.

For deep networks and other complex models, the supremum problem in Equation (11) cannot be solved exactly, so we use a gradient ascent process with early stopping to solve it approximately. Concretely, let the set of all samples in a task be $X$ and their corresponding labels be $Y$, i.e.,

$$X = x_1 \oplus x_2 \oplus \cdots \oplus x_n \qquad (14)$$
$$Y = y_1 \oplus y_2 \oplus \cdots \oplus y_n \qquad (15)$$

so that $T = X \oplus Y$ and $\mathcal{L}(T; \theta) = \mathcal{L}(X, Y; \theta)$. We use the source task $T_0$ as the initialization of $T$, and the task vector defined in Equation (7) as the optimization variable. Considering that in different few-shot classification tasks, samples with the same labels can correspond to different real categories (e.g., cat, dog), the change of labels is not considered here, i.e., we keep $Y^t = Y^0$. In the $t$-th iteration, the update is

$$X^{t+1} = X^t + \eta \nabla_{X^t} \mathcal{L}(X^t, Y^0; \theta) \qquad (16)$$

where $\eta$ is the learning rate of the gradient ascent. Here the regularization term $-\gamma c(T, T_0)$ is removed from the iteration objective for the following reasons: 1) this term is used to constrain the proximity of the virtual task $T$ to the source task $T_0$, but using the source task as the initialization together with early stopping achieves the same effect; see Section 4.3 for a detailed discussion; 2) it reduces the computational overhead and the number of hyper-parameters requiring hand-tuning. After $T_{max}$ iterations, we get the virtual 'challenging' task $T^{T_{max}}$ and update the model parameters with it. See Algorithm 1 for the full description of the training process. Given an unseen task, the inference process is the same as for the original meta-learning model. Note that if $T_{max} = 0$, Algorithm 1 reduces to the original meta-learning training process, so our method is a plug-and-play module.
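Putting the pieces together, here is a condensed PyTorch-style sketch of one training step of Algorithm 1 (lines 3–8); `model(S, Q)` returning the query-set loss is the same assumption as before, and `t_max`/`eta` correspond to $T_{max}$/$\eta$. This is a sketch of the method, not the released implementation.

```python
import torch

def ata_training_step(model, S, Q, optimizer, t_max=5, eta=1.0):
    """One training step of Adversarial Task Augmentation (Algorithm 1).

    Starting from the source task T^0 = (S, Q), ascend the query loss
    with respect to the task's images for t_max iterations (Eq. 16,
    labels kept fixed), then descend the model parameters on the
    resulting virtual 'challenging' task (Eq. 13).
    """
    (xs, ys), (xq, yq) = S, Q
    # The image part X of the task vector is the optimization variable.
    xs = xs.clone().requires_grad_(True)
    xq = xq.clone().requires_grad_(True)
    for _ in range(t_max):                   # early-stopped gradient ascent
        loss = model((xs, ys), (xq, yq))     # L(T^t; theta)
        gs, gq = torch.autograd.grad(loss, [xs, xq])
        with torch.no_grad():
            xs += eta * gs                   # X^{t+1} = X^t + eta * grad
            xq += eta * gq
    # Update theta on the virtual task T^{T_max} (Algorithm 1, line 8).
    final_loss = model((xs.detach(), ys), (xq.detach(), yq))
    optimizer.zero_grad()
    final_loss.backward()
    optimizer.step()
    return final_loss.item()
```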

In this paper, we mainly consider cross-domain few-shot image classification, where convolutional neural networks (CNNs) are the necessary tools. However, CNNs tend to overfit to superficial local textures [5], so we use random convolutions [11], which change local textures while keeping shapes unchanged, as an auxiliary augmentation technique for our adversarial task augmentation. Concretely, given an input image $x \in \mathbb{R}^{H \times W \times C}$, where $H$ and $W$ are the height and width and $C$ is the number of feature channels, the filter size $k$ is first randomly sampled from the candidate pool $K$, then the Xavier normal distribution [6] is used to initialize the convolution weights. The stride and padding size are determined to make the transformed image have the same size as $x$. In practice, for each task sampled from $P_0$, we keep all its samples unchanged with probability $p$, or apply the same random convolution to all its samples to get a new task for training, as shown in line 4 of Algorithm 1.

(a) 5-way 1-shot setting (b) 5-way 5-shot setting
Figure 2: Average classification accuracy on eight unseen domains (CUB, Cars, Places, Plantae, CropDiseases, EuroSAT, ISIC and ChestX) under the 5-way 1-shot/5-shot settings, showing the results without the random convolution ('w/o RC') and those obtained by our complete task augmentation module. The results of the base meta-learning models ('base'), the models with explanation-guided training ('+LRP') and the models with feature-wise transformation layers ('+FT') are also shown for a clearer comparison.

4 Experiments

In this section, we evaluate the adversarial task augmentation method on the RelationNet [18], the GNN [4] and one of the state-of-the-art meta-learning models, TPN [14], and compare it with [19] and [17]. These meta-learning models have different kinds of inductive biases, which allows us to verify the versatility and effectiveness of our method.

4.1 Experimental Settings

Datasets.

We conduct extensive experiments under cross-domain settings, using nine few-shot classification datasets: mini-ImageNet [15], CUB, Cars, Places, Plantae, CropDiseases, EuroSAT, ISIC and ChestX, which are introduced by [19] and [8]. Each dataset consists of train/val/test splits; please refer to these references for more details. We use the mini-ImageNet domain as the single source domain, and evaluate the trained model on the other eight domains. We select the model parameters with the best accuracy on the validation set of mini-ImageNet for model evaluation.

Implementation details.

In all experiments, we use the ResNet-10 [9] as the feature extractor and the Adam optimizer for meta-training. We find that a small iteration number $T_{max}$ is sufficient to obtain satisfactory results, and we choose the learning rate $\eta$ of the gradient ascent process from a small candidate pool. We fix the probability $p$ across all experiments and choose the filter-size pool $K$ from a few candidates. We evaluate the model in the 5-way 1-shot/5-shot settings using 2,000 randomly sampled episodes with 16 query samples per class, and report the average accuracy (%) as well as the 95% confidence interval.

Pre-trained feature extractor.

Instead of optimizing from scratch, we apply an additional pre-training strategy as in [19], which pre-trains the feature extractor by minimizing the standard cross-entropy classification loss on the 64 training classes of the mini-ImageNet dataset.

(a) 5-way 1-shot setting (b) 5-way 5-shot setting
Figure 3: Average classification accuracy on eight unseen domains (CUB, Cars, Places, Plantae, CropDiseases, EuroSAT, ISIC and ChestX) under the 5-way 1-shot/5-shot settings, showing the results of the iteration objective without the regularization term ('w/o Reg'), with the sample-wise Euclidean distance regularization term ('Euclid') and with the maximum mean discrepancy distance regularization term ('MMD').
Table 2: Few-shot classification accuracy (%) of 5-way 1-shot/5-shot tasks trained with the mini-ImageNet dataset and fine-tuned with the augmented support data from the unseen tasks. Rows compare traditional fine-tuning with RelationNet+ATA, GNN+ATA and TPN+ATA on CUB, Cars, Places, Plantae, CropDiseases, EuroSAT, ISIC and ChestX; the meta-learning models are adapted to each unseen task using the fine-tuning method described in Section 4.4.

4.2 Evaluation for Adversarial Task Augmentation

We apply the adversarial task augmentation module to the RelationNet, the GNN, and the TPN models to evaluate its effect on improving the cross-domain generalization ability of meta-learning models, and compare it with [19], which adds feature-wise transformation layers to the feature extractor, and [17], which uses explanation-guided training. All models are trained and tested in the same environment for a fair comparison, and the results are shown in Table 1.

We can observe that with our adversarial task augmentation module, the cross-domain few-shot classification accuracy of the meta-learning models is consistently and significantly improved. Compared with [19] and [17], our method achieves comparable or larger improvements, which suggests that adaptive enhancement of each model's own inductive bias is more effective than enhancing an artificially determined one. Moreover, applying the feature-wise transformation layers even harms the cross-domain generalization performance of the TPN model, while our method is still effective, which shows that our method is more general and not just suitable for the metric-based meta-learning models (the RelationNet and the GNN models).

4.3 Ablation Study

Effect of the random convolution.

As aforementioned, we use the random convolution for auxiliary task augmentation, and here we study its effect. Figure 2 shows the average few-shot classification accuracy on eight unseen domains without the random convolution and that obtained by the complete method. As we can see, even without the random convolution, our method still improves the cross-domain generalization ability of the meta-learning models and outperforms '+FT' [19] and '+LRP' [17]; using the random convolution achieves further improvements.

Is the regularization term useful?

In Section 3.2, we removed the regularization term $-\gamma c(T, T_0)$ from the iteration objective, and here we show that this is reasonable. We consider two common candidates for the distance $c$ and find that they do not bring benefits. As assumed, the label composition of the few-shot classification tasks is identical across tasks, so the distance between tasks $T$ and $T_0$ depends only on their samples $X$ and $X^0$. Let the feature vectors of $X$ and $X^0$ be $\{z_i\}_{i=1}^n$ and $\{z_i^0\}_{i=1}^n$. The first candidate is the direct sample-wise Euclidean distance, i.e., $c(T, T_0) = \frac{1}{n}\sum_{i=1}^{n} \|z_i - z_i^0\|_2^2$, and the second candidate is the maximum mean discrepancy (MMD) distance between the two feature sets, i.e., $c(T, T_0) = \frac{1}{n^2}\sum_{i,j} \big[k(z_i, z_j) + k(z_i^0, z_j^0) - 2k(z_i, z_j^0)\big]$ for a kernel $k$. Figure 3 shows the average few-shot classification accuracy on eight unseen domains without the regularization term, with the sample-wise Euclidean distance regularization term, and with the MMD distance regularization term. We fix the hyper-parameter $\gamma$ and do not use the random convolution for a clean comparison. As we can see, using the regularization term does not bring obvious benefits and can even be harmful, which shows that early stopping already imposes sufficient constraints, and adding the regularization term limits the virtual tasks excessively.
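For reference, the two candidate regularization terms compared above could be computed as follows; this is a sketch over per-sample feature vectors, with `feats`/`feats0` standing for the features of the current virtual task and of the source task, and the RBF bandwidth is an assumed placeholder.

```python
import torch

def euclidean_regularizer(feats, feats0):
    """Sample-wise (squared) Euclidean distance between paired features
    of the virtual task and of the source task."""
    return (feats - feats0).pow(2).sum(dim=1).mean()

def mmd_regularizer(feats, feats0, sigma=1.0):
    """Maximum mean discrepancy between the two feature sets, with an
    RBF kernel whose bandwidth `sigma` is an assumed placeholder."""
    def rbf(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return (rbf(feats, feats).mean() + rbf(feats0, feats0).mean()
            - 2 * rbf(feats, feats0).mean())
```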

4.4 Comparison with Fine-tuning

[8] shows that in the cross-domain few-shot classification problem, traditional pre-training and fine-tuning outperform the meta-learning models. Here we re-examine this phenomenon through a different, fair comparison, i.e., using data augmentation while solving an unseen task. Given an unseen task consisting of support samples and query samples, for fine-tuning we use the pre-trained feature extractor as the initialization and a fully connected layer as the classification head. For each epoch, we generate pseudo samples for each class from the support samples using the data augmentation method of [21], and use these pseudo samples together with the support samples for fine-tuning, where we use the SGD optimizer with learning rate 0.01 and momentum 0.9 as in [8]. For the meta-learning models, we use the parameters trained on mini-ImageNet with our adversarial task augmentation method as the initialization and adapt the meta-learning models to the same samples as above at each iteration, where the pseudo samples are used as a pseudo query set; here we use the Adam optimizer with learning rate 0.001. All models are fine-tuned for 30 (or 50) epochs for the 5-way 1-shot (or 5-shot) tasks. Since all models use the same amount of target domain data when solving each unseen task, this is a fair comparison. The results are shown in Table 2. As we can see, the meta-learning models with our adversarial task augmentation module significantly outperform traditional pre-training and fine-tuning under domain shift.

5 Conclusion

In this paper, we aim to design a new method that improves the cross-domain generalization capability of meta-learning models in cross-domain few-shot learning. To this end, we consider the worst-case problem around the source task distribution $P_0$, and propose a plug-and-play inductive-bias-adaptive task augmentation method, which significantly improves the cross-domain few-shot classification capability of various meta-learning models and outperforms the existing works. This is the first work to achieve the above objective by generating 'challenging' virtual tasks. We also compare the meta-learning models with pre-training and fine-tuning under the same settings, and find that the meta-learning models with our method outperform fine-tuning under domain shift.

References

  • [1] J. Blanchet and K. Murthy (2019) Quantifying distributional model risk via optimal transport. Mathematics of Operations Research 44 (2), pp. 565–600. Cited by: §3.2.
  • [2] J. F. Bonnans and A. Shapiro (2013) Perturbation analysis of optimization problems. Springer Science & Business Media. Cited by: §3.2.
  • [3] W. Chen, Y. Liu, Z. Kira, Y. F. Wang, and J. Huang (2019) A closer look at few-shot classification. In 7th International Conference on Learning Representations, ICLR 2019. Cited by: §1.
  • [4] V. Garcia and J. B. Estrach (2018) Few-shot learning with graph neural networks. In 6th International Conference on Learning Representations, ICLR 2018. Cited by: 3rd item, §1, §3.1.1, §4.
  • [5] R. Geirhos, P. Rubisch, C. Michaelis, M. Bethge, F. A. Wichmann, and W. Brendel (2019) ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. In 7th International Conference on Learning Representations, ICLR 2019. Cited by: §3.2.
  • [6] X. Glorot and Y. Bengio (2010) Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256. Cited by: §3.2.
  • [7] I. J. Goodfellow, J. Shlens, and C. Szegedy (2015) Explaining and harnessing adversarial examples. In 3rd International Conference on Learning Representations, ICLR 2015. Cited by: §2.
  • [8] Y. Guo, N. C. Codella, L. Karlinsky, J. V. Codella, J. R. Smith, K. Saenko, T. Rosing, and R. Feris (2020) A broader study of cross-domain few-shot learning. In ECCV. Cited by: §1, §4.1, §4.4.
  • [9] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. Cited by: §4.1.
  • [10] B. M. Lake, R. Salakhutdinov, and J. B. Tenenbaum (2015) Human-level concept learning through probabilistic program induction. Science 350 (6266), pp. 1332–1338. Cited by: §1.
  • [11] K. Lee, K. Lee, J. Shin, and H. Lee (2020) Network randomization: a simple technique for generalization in deep reinforcement learning. In 8th International Conference on Learning Representations, ICLR 2020. Cited by: §3.2.
  • [12] Y. Li, Y. Yang, W. Zhou, and T. M. Hospedales (2019) Feature-critic networks for heterogeneous domain generalization. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019. Cited by: §1, §2.
  • [13] B. Liu, Z. Zhao, Z. Li, J. Jiang, Y. Guo, H. Shen, and J. Ye (2020) Feature transformation ensemble model with batch spectral regularization for cross-domain few-shot classification. arXiv preprint arXiv:2005.08463. Cited by: §2.
  • [14] Y. Liu, J. Lee, M. Park, S. Kim, E. Yang, S. J. Hwang, and Y. Yang (2019) Learning to propagate labels: transductive propagation network for few-shot learning. In 7th International Conference on Learning Representations, ICLR 2019. Cited by: 3rd item, §1, §3.1.1, §4.
  • [15] S. Ravi and H. Larochelle (2017) Optimization as a model for few-shot learning. In 5th International Conference on Learning Representations, ICLR 2017. Cited by: §4.1.
  • [16] A. Sinha, H. Namkoong, and J. C. Duchi (2018) Certifying some distributional robustness with principled adversarial training. In 6th International Conference on Learning Representations, ICLR 2018. Cited by: §1, §2, §3.2.
  • [17] J. Sun, S. Lapuschkin, W. Samek, Y. Zhao, N. Cheung, and A. Binder (2020) Explanation-guided training for cross-domain few-shot classification. arXiv preprint arXiv:2007.08790. Cited by: 3rd item, §1, §2, Table 1, §4.2, §4.3, §4.
  • [18] F. Sung, Y. Yang, L. Zhang, T. Xiang, P. H. Torr, and T. M. Hospedales (2018) Learning to compare: relation network for few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1199–1208. Cited by: 3rd item, §1, §3.1.1, §4.
  • [19] H. Tseng, H. Lee, J. Huang, and M. Yang (2020) Cross-domain few-shot classification via learned feature-wise transformation. In 8th International Conference on Learning Representations, ICLR 2020. Cited by: 3rd item, §1, §2, Table 1, §4.1, §4.2, §4.3, §4.
  • [20] R. Volpi, H. Namkoong, O. Sener, J. C. Duchi, V. Murino, and S. Savarese (2018) Generalizing to unseen domains via adversarial data augmentation. In Advances in Neural Information Processing Systems, pp. 5334–5344. Cited by: §1, §2.
  • [21] J. Yeh, H. Lee, B. Tsai, Y. Chen, P. Huang, and W. H. Hsu (2020) Large margin mechanism and pseudo query set on cross-domain few-shot learning. arXiv preprint arXiv:2005.09218. Cited by: §2, §4.4.