General deep neural networks (DNNs) for computer vision problems heavily rely on a large amount of labeled training data and need to be re-trained or fine-tuned when encountering different problems. Besides, the generalization ability of these networks is highly correlated with the diversity and size of the training set. However, collecting adequate amounts of data for practical problems is usually difficult and expensive.
Therefore, learning to characterize different classes with few labeled samples, known as few-shot learning, is necessary. The problem is posed in the meta-learning setting, which consists of a meta-training phase and a meta-testing phase. In the first phase, models are trained on classification tasks whose data consist of labeled samples, known as support sets, and unlabeled samples, known as query sets, both sampled from base classes. In the second phase, both sets are sampled from novel classes. Note that base classes and novel classes are disjoint, so quick adaptation from base classes to novel classes is both indispensable and challenging.
With the increasing popularity of few-shot learning, many works [7, 8] design different metrics to distinguish novel classes. However, a recent work [1] claims that a general deep neural network, denoted as Baseline, trained under standard practice achieves competitive results after several fine-tuning iterations. With benchmarks well established, the authors also conduct experiments with few-shot learning methods under the cross-domain setting, where a domain shift exists between base classes and novel classes, such as meta-training on mini-ImageNet and meta-testing on the CUB dataset. As these two datasets both contain photos of common objects, such a domain difference might be inadequate, because humans, in practice, often cope with problems and knowledge dissimilar from their background experience. Therefore, the cross-domain few-shot learning problem is worth further investigation, especially when a large difference lies between the domains of base classes and novel classes.
The biggest difference between general DNNs (e.g., Baseline) and few-shot methods is that the former train and update parameters with training data at every iteration, whereas the latter make inferences based on support sets and update parameters with the loss over query sets. However, in the meta-testing phase, query sets are reserved for evaluation, which prevents few-shot methods from adapting to novel domains, while general DNNs can still be fine-tuned with support sets. Therefore, few-shot methods without further fine-tuning generally show inferior performance compared to general DNNs [1, 3].
In this paper, we tackle the cross-domain few-shot learning problem and propose a fine-tuning method for few-shot models that generates pseudo query images as a substitute for the query set, so that fine-tuning requires no access to the query set. Moreover, we propose a prototypical triplet loss and apply the large margin mechanism from face recognition models to improve performance.
Our main contributions are threefold. First, we propose the pseudo query set and analyze its importance. Second, inspired by the similarity between face recognition and the few-shot learning problem, we apply the large margin mechanism to the few-shot learning problem. Third, we provide a performance comparison between backbones from Baseline and from few-shot models.
2 Related Work
Few-shot image classification is the task of recognizing novel classes from only a few examples. ProtoNet [7] aims to learn representative embeddings of the data in order to classify query samples by measuring Euclidean distance. RelationNet [8] is based on the same concept and introduces a learnable similarity metric.
Cross-domain few-shot learning is a branch of few-shot image classification in which base and novel classes are sampled from different domains. Data from different domains can be used to make the few-shot classifier more robust. Tseng et al. [9] use feature-wise transformation layers to generalize image features by simulating different domains. Guo et al. [3] propose a new benchmark for this problem and provide a detailed comparison between various classifiers.
The large margin mechanism is popular in the open-set face recognition problem. Many models apply a large margin mechanism to learn highly discriminative features by maximizing the inter-class cosine margin; CosFace [11] and ArcFace [2] are two representative models. How to set the margin is the focus of this line of research. We consider the cross-domain few-shot learning problem to have properties similar to those of the open-set face recognition problem.
3.1 Cross-domain few-shot learning problem
In the general few-shot scenario, a model $f_\theta$ with parameters $\theta$ is meta-trained on tasks sampled from base-class data and meta-tested on tasks sampled from novel-class data. A task contains a support set $S$ and a query set $Q$. This is known as an “N-way K-shot” few-shot learning problem, as the support set has N classes and each class contains K labeled samples. In addition, the query set is used to evaluate the model's inference performance, but the inference results can be used to help model training during the meta-training phase.
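The task construction described above can be sketched in a few lines of Python. This is only an illustrative sketch: the helper name `sample_episode`, the toy label list, and the choice of 15 query samples per class are assumptions made for the example and are not taken from the paper.

```python
import random
from collections import defaultdict

def sample_episode(labels, n_way, k_shot, n_query, rng=None):
    """Return (support, query) lists of sample indices for one N-way K-shot task."""
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Only classes with enough samples can supply both support and query items.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query]
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(by_class[c], k_shot + n_query)
        support += picked[:k_shot]   # K labeled samples per class
        query += picked[k_shot:]     # held-out samples of the same classes
    return support, query

# Toy dataset (hypothetical): 10 classes with 30 samples each.
labels = [c for c in range(10) for _ in range(30)]
support, query = sample_episode(labels, n_way=5, k_shot=5, n_query=15,
                                rng=random.Random(0))
print(len(support), len(query))  # prints: 25 75
```

In the meta-training phase the labels would come from base classes, and in the meta-testing phase from novel classes; the sampling logic itself is the same.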
To formalize the cross-domain few-shot learning problem, we follow the definition in [3]. A domain is defined as a joint probability distribution $P$ over the input space $\mathcal{X}$ and the label space $\mathcal{Y}$. The pair $(x, y)$ represents a sample $x$ and its corresponding label $y$ sampled from $P$. We denote the marginal distribution of $\mathcal{X}$ as $P_{\mathcal{X}}$. The base-class data are sampled from the source domain ($\mathcal{X}_s \times \mathcal{Y}_s$) with joint probability distribution $P_s$, and the novel-class data are sampled from the target domain ($\mathcal{X}_t \times \mathcal{Y}_t$) with joint probability distribution $P_t$, where in particular $P_{\mathcal{X}_s} \neq P_{\mathcal{X}_t}$. In the meta-testing phase, models are allowed to be fine-tuned before inferring query samples. During fine-tuning, a model adapts to the task by training on the support set. After that, the accuracy of its predictions on the query samples is used to evaluate the model's performance. Notably, both the input space and the label space of the source domain and the target domain are disjoint in the cross-domain few-shot learning problem.
Our motivation is to provide a fine-tuning method that addresses the restriction that few-shot models are forbidden from inferring the query set during fine-tuning. By generating a pseudo query set during fine-tuning, few-shot models can be run in the same way as in the meta-training phase.
In addition, we try fine-tuning the models in a different way, inspired by [1, 3]. In [1], the Baseline model is trained in the standard way. Its trained backbone is then extracted and combined with a new linear classifier in the meta-testing phase, and the support set is used to fine-tune the backbone and train the linear classifier. Furthermore, [3] combines the backbone with various classifiers, such as a cosine classifier or a mean-centroid classifier.
We are interested in whether backbones from different models perform differently. Thus, the trained backbones of Baseline and ProtoNet are extracted, and a cosine mean-centroid classifier is applied in the meta-testing phase. More precisely, the backbone computes the feature embeddings of each query and support sample, and the class prototypes are then calculated in the same way as in ProtoNet. Finally, the classifier compares the cosine similarities between the embeddings of the query samples and the class prototypes to infer the category. Moreover, additional operations are applied to assist the adaptation of the backbone during the fine-tuning stage, including the prototypical triplet loss (PTLoss) and the large margin mechanism (LMM). For PTLoss, we integrate the “prototype” concept with the original triplet loss. The LMM, on the other hand, is inspired by the similarity between the few-shot learning problem and the face recognition problem. Detailed explanations are given in the following sections.
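As a rough illustration of the cosine mean-centroid classifier described above, the following NumPy sketch operates on embeddings that are assumed to have already been produced by a backbone. The function name `cosine_centroid_predict` and the toy embeddings are hypothetical and only serve the example.

```python
import numpy as np

def cosine_centroid_predict(support_emb, support_labels, query_emb, eps=1e-12):
    """Classify query embeddings by cosine similarity to class prototypes."""
    classes = np.unique(support_labels)
    # Class prototype = mean of the support embeddings of that class (as in ProtoNet).
    prototypes = np.stack([support_emb[support_labels == c].mean(axis=0)
                           for c in classes])
    # L2-normalize so that the dot product equals the cosine similarity.
    q = query_emb / (np.linalg.norm(query_emb, axis=1, keepdims=True) + eps)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + eps)
    cos = q @ p.T                      # shape: (num_query, num_classes)
    return classes[cos.argmax(axis=1)], cos

# Toy 2-D embeddings (hypothetical): two classes, two support samples each.
support_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
support_labels = np.array([0, 0, 1, 1])
query_emb = np.array([[2.0, 0.2], [0.1, 3.0]])
preds, cos = cosine_centroid_predict(support_emb, support_labels, query_emb)
print(preds)  # prints: [0 1]
```

Because only the direction of each embedding matters for cosine similarity, the first query is assigned to class 0 even though its norm differs from that of the support samples.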
3.3 Fine-tune with pseudo query set
As mentioned before, few-shot models need both support sets and query sets to update their parameters, but inferring the query set is prohibited during fine-tuning. Therefore, we generate pseudo query sets from the support sets with a sequence of digital image processing operations, including gamma correction, random erasing, color channel shuffle, flip, and rotation. Gamma correction and color channel shuffle are each applied with probability 0.3, and the remaining operations with probability 0.5. The behavior of each operation is described below.
For the gamma correction, the gamma is uniformly selected from the range [1.0, 1.5] each time. The color channel shuffle randomizes the order of RGB channels of input images. The flip operation applies horizontal or vertical flip to images. For the rotation operation, the rotating degree is randomly chosen from [, , ], and the random erasing operation replaces a random block in an image with its mean RGB values.
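A minimal NumPy sketch of such a pseudo query generator is given below. Several details are assumptions made for illustration and are not stated in the text: the rotation angles (90, 180, 270 degrees), the erased block size (a quarter of each side), the use of the whole image's mean color for erasing, and the order in which the operations are applied. The function name `make_pseudo_query` is likewise hypothetical.

```python
import numpy as np

def make_pseudo_query(image, rng, rotation_angles=(90, 180, 270)):
    """image: float array of shape (H, W, 3) with values in [0, 1]."""
    out = image.copy()
    if rng.random() < 0.3:                        # gamma correction
        out = out ** rng.uniform(1.0, 1.5)
    if rng.random() < 0.5:                        # random erasing
        h, w, _ = out.shape
        bh, bw = max(1, h // 4), max(1, w // 4)   # block size: assumption
        top = rng.integers(0, h - bh + 1)
        left = rng.integers(0, w - bw + 1)
        # Fill the block with the image's mean RGB value (assumed interpretation).
        out[top:top + bh, left:left + bw] = out.mean(axis=(0, 1))
    if rng.random() < 0.3:                        # color channel shuffle
        out = out[..., rng.permutation(3)]
    if rng.random() < 0.5:                        # horizontal or vertical flip
        out = out[:, ::-1] if rng.random() < 0.5 else out[::-1]
    if rng.random() < 0.5:                        # rotation (angles: assumption)
        angle = int(rng.choice(rotation_angles))
        out = np.rot90(out, k=angle // 90)
    return np.ascontiguousarray(out)

# 5-way 5-shot example: 25 support images, 4 pseudo queries per image.
rng = np.random.default_rng(0)
support_images = rng.random((25, 32, 32, 3))      # random stand-in images
pseudo_query = np.stack([make_pseudo_query(img, rng)
                         for img in support_images for _ in range(4)])
print(pseudo_query.shape)  # prints: (100, 32, 32, 3)
```

Each pseudo query image keeps the label of the support image it was generated from, so the resulting set can stand in for a labeled query set during fine-tuning. Square images are used here so that rotation preserves the array shape.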
Each support image may be used to generate one or more pseudo query images, depending on the number of support images; the exact numbers are given in the experiment section. With pseudo query sets, few-shot models can update their parameters during the fine-tuning stage, just as they do with normal query sets in the meta-training phase.
3.4 Prototypical triplet loss (PTLoss)
Inspired by ProtoNet [7] and the triplet loss [6], we pull data of the same class closer to their prototype and push them away from the prototypes of other classes. To be more specific, we first calculate the prototype of each class in the support set. Then each sample (anchor) and its class prototype (positive) are paired with the prototypes of all other classes (negatives), and the original triplet loss is applied to each triplet. Because there are N classes in the support set, each sample yields N-1 triplet loss values. Afterwards, we loop over the samples and sum up all the values as the PTLoss, which can be formalized as

$$L_{PT} = \sum_{i} \sum_{j \neq c_i} \ell_{tri}\left(x_i,\, p_{c_i},\, p_j\right),$$

where $i$ is the sample index and $j$ is the class index, respectively, and $c_i$ is the class index of sample $x_i$. The function $\ell_{tri}$ is the original triplet loss, $p_{c_i}$ is the prototype of the class of $x_i$, and $p_j$ is the prototype of another class. By applying the PTLoss, we expect the model to recognize different categories more easily.
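The loss described above can be sketched in NumPy as follows. The sketch assumes the common hinge form of the triplet loss, max(d(a, p) - d(a, n) + margin, 0), with Euclidean distance on precomputed embeddings; the exact distance used by the authors is not stated here, and the function name `prototypical_triplet_loss` is hypothetical. The default margin of 1.0 follows the value listed for PTLoss in the hyper-parameter table.

```python
import numpy as np

def prototypical_triplet_loss(embeddings, labels, margin=1.0):
    """Sum of triplet losses over (sample, own prototype, other prototype)."""
    classes = np.unique(labels)
    prototypes = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    # dists[i, j] = Euclidean distance from sample i to the prototype of class j.
    dists = np.linalg.norm(embeddings[:, None, :] - prototypes[None, :, :], axis=2)
    own = labels[:, None] == classes[None, :]      # True at each sample's own class
    pos = dists[own][:, None]                      # distance to own prototype, (n, 1)
    hinge = np.maximum(pos - dists + margin, 0.0)  # triplet loss per (sample, class)
    return hinge[~own].sum()                       # keep the N-1 negatives per sample

# Two well-separated toy classes (hypothetical embeddings).
emb = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]])
lab = np.array([0, 0, 1, 1])
print(prototypical_triplet_loss(emb, lab))  # prints: 0.0
```

In this toy case every sample is already closer to its own prototype than to the other one by more than the margin, so the loss vanishes; a larger margin or overlapping classes would give a positive value.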
3.5 Large margin mechanism (LMM)
In face recognition, models are evaluated under either the closed-set setting or the open-set setting. In the closed-set setting, all testing identities are predefined in the training data; hence, it can be regarded as a classification problem. Under the open-set setting, on the other hand, the testing identities and the training identities are usually disjoint. Thus, face recognition models need to learn how to recognize unseen identities in the testing phase.
Many face recognition models [11, 2] apply large margin mechanisms to improve their performance. We argue that the cross-domain few-shot learning problem is similar to the open-set face recognition problem. Consequently, those large margin mechanisms should also work in the cross-domain few-shot learning problem. We apply CosFace [11] as one of the losses during the fine-tuning stage to help the model recognize different novel categories.
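For reference, the large margin cosine loss of CosFace subtracts a margin m from the target-class cosine and scales all cosines by s before a softmax cross-entropy. The NumPy sketch below illustrates this with the values s = 30.0 and m = 0.35 from the hyper-parameter table. How the class weight vectors are obtained during fine-tuning (for example, from class prototypes) is not specified in this section, so `class_weights` is simply taken as an input; the function name and the random test data are assumptions for illustration.

```python
import numpy as np

def cosface_loss(embeddings, class_weights, labels, s=30.0, m=0.35):
    """Mean large-margin cosine loss; labels are integers in [0, num_classes)."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = e @ w.T                                  # (n, num_classes) cosines
    onehot = np.eye(w.shape[0])[labels]
    logits = s * (cos - m * onehot)                # margin only on the target class
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -(log_prob * onehot).sum(axis=1).mean()

# Random stand-in data: 10 samples, 8-dim embeddings, 5 classes.
rng = np.random.default_rng(0)
emb = rng.normal(size=(10, 8))
weights = rng.normal(size=(5, 8))
labels = rng.integers(0, 5, size=10)
loss_with_margin = cosface_loss(emb, weights, labels)
loss_without_margin = cosface_loss(emb, weights, labels, m=0.0)
print(loss_with_margin > loss_without_margin)  # prints: True
```

The margin lowers the target logit for every sample, so the loss with m > 0 is always larger than the plain scaled-cosine softmax loss on the same inputs; minimizing it therefore requires a larger gap between the target cosine and the others.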
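Table 1: Hyper-parameters used in the meta-testing phase.
| Hyper-parameter | Value |
|---|---|
| # of iterations (tasks) | 600 |
| # of fine-tune epochs | |
| margin in PTLoss | 1.0 |
| s in LMM (CosFace) | 30.0 |
| m in LMM (CosFace) | 0.35 |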
4.1 Experiment setting
We use the benchmark proposed in [3]: models are trained or meta-trained on the mini-ImageNet dataset and meta-tested on datasets from four different domains, namely CropDisease, EuroSAT, ISIC, and ChestX. Following all rules of the benchmark, the models are evaluated under the 5-way 5-shot, 5-way 20-shot, and 5-way 50-shot settings for each dataset. We refer to [3] for a detailed introduction.
For the backbone architecture, we choose ResNet10 for a fair comparison. Meanwhile, we also provide results with other backbones. The size of the pseudo query set differs across evaluation settings. For 5-shot, each support sample generates 4 pseudo query images, so the size of the pseudo query set is 100. For 20-shot, each support sample produces 2 pseudo query images, so the size of the pseudo query set is 200. Finally, for 50-shot, we select 40 support samples from each class and each sample generates 1 pseudo query image, so the size of the pseudo query set is also 200. In addition, the transductive inference mentioned in  is applied in all experiments.
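The pseudo query set sizes quoted above follow directly from the 5-way setting, as the short check below shows. The helper name `pseudo_query_size` is hypothetical.

```python
def pseudo_query_size(n_way, used_per_class, per_sample):
    """Number of pseudo query images generated for one task."""
    return n_way * used_per_class * per_sample

# (support samples used per class, pseudo query images per support sample)
settings = {
    "5-shot": (5, 4),
    "20-shot": (20, 2),
    "50-shot": (40, 1),   # only 40 of the 50 support samples per class are used
}
sizes = {name: pseudo_query_size(5, used, per)
         for name, (used, per) in settings.items()}
print(sizes)  # prints: {'5-shot': 100, '20-shot': 200, '50-shot': 200}
```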
For the hyper-parameters used in the training (or meta-training) phase, we follow all the settings in , except for the number of tasks per epoch, which we increase from 100 to 300 so that few-shot models solve more tasks in the meta-training phase. The hyper-parameters used in the meta-testing phase are listed in Table 1.
4.2 Results and analysis
We first discuss the importance of the pseudo query set. The results of two few-shot models, ProtoNet (PN) [7] and RelationNet (RN) [8], are shown in Table 2. A check mark in the pqs column indicates that the model is fine-tuned with the pseudo query set. The results show that few-shot models fine-tuned with the pseudo query set perform better than those inferring the task directly.
Now we compare the performance of backbones from Baseline and few-shot models. The fine-tune column indicates whether the fine-tuning method from [3] or ours is applied. Our method contains all components introduced in Section 3. Notably, for [3] we report the highest result, which may come from different classifiers. According to the results in Table 3, the backbone from Baseline achieves higher performance in most cases under the 5-shot setting, but there is no significant difference between the results under the 20-shot and 50-shot settings. Comparing our fine-tuning method with that of [3], ours improves the performance significantly on the first three datasets under all settings. For the ChestX dataset, we obtain competitive performance under the 20-shot and 50-shot settings. Moreover, we further investigate the performance of different backbone architectures. Not surprisingly, ResNet18 generally obtains better results than ResNet10.
In this paper, we tackle the cross-domain few-shot learning problem and propose pseudo query sets for few-shot models, which address the restriction that models cannot infer the query set during fine-tuning. We observe that few-shot models obtain strong results after several fine-tuning iterations. Table 2 shows that few-shot models still need appropriate fine-tuning when there is a large domain shift between base classes and novel classes. In addition, we fine-tune the backbones extracted from the models with PTLoss and the large margin mechanism, and find, surprisingly, that the backbones from Baseline and from few-shot models perform comparably under the same network architecture. We conclude that a backbone trained in the standard way is robust: even if parameter updating is switched to the few-shot style, the backbone can still adapt to the tasks rapidly. Experimental results show that our fine-tuning method significantly improves performance over the baseline methods. How to improve the performance on the ChestX dataset remains a subject for further study.
- [1] (2019) A closer look at few-shot classification. In ICLR.
- [2] (2019) ArcFace: additive angular margin loss for deep face recognition. In CVPR.
- [3] (2019) A new benchmark for evaluation of cross-domain few-shot learning. arXiv preprint arXiv:1912.07200.
- [4] (2018) Few-shot learning with metric-agnostic conditional embeddings. arXiv preprint arXiv:1802.04376.
- [5] (2012) ImageNet classification with deep convolutional neural networks. In NeurIPS.
- [6] (2015) FaceNet: a unified embedding for face recognition and clustering. In CVPR.
- [7] (2017) Prototypical networks for few-shot learning. In NeurIPS.
- [8] (2018) Learning to compare: relation network for few-shot learning. In CVPR.
- [9] (2020) Cross-domain few-shot classification via learned feature-wise transformation. In ICLR.
- [10] (2016) Matching networks for one shot learning. In NeurIPS.
- [11] (2018) CosFace: large margin cosine loss for deep face recognition. In CVPR.