1 Introduction
Few-Shot Learning (FSL) promises to allow a machine to learn novel concepts from limited experience, i.e. a few novel target samples plus data-rich source data. Typically, the default FSL setting assumes that the source and target data are in the same domain but belong to different classes. In practice, FSL is required to generalise to different target domains. Cross-Domain Few-Shot Learning (CDFSL) [tseng2020cross_fwt, guo2020broader, liang2021boosting_cdfsl, fu2021metamixup] has been studied more recently. In CDFSL, the target data not only has a different label space but also comes from a different domain than the source data.
It is non-trivial to directly extend general FSL approaches to address the CDFSL challenges. In fact, many promising FSL methods [snell2017prototypical, finn2017model, sung2018learning, satorras2018fewgnn] perform poorly in CDFSL [guo2020broader, tseng2020cross_fwt]. The central idea of these general FSL methods is to transfer and generalise the visual representations learned from source data to target data. However, the significant visual domain gap between the source and target data in CDFSL makes it fundamentally difficult to learn a shared visual representation across different domains.
A few recent CDFSL studies [tseng2020cross_fwt, liu2020urt, liang2021boosting_cdfsl, wang2021cross_ata] try to learn a generalisable feature extractor to improve model transferability, a popular idea in domain generalisation and domain adaptation [volpi2018generalizing_dataaug, NEURIPS2020_adver_dataaug, zhou2021dg_mixstyle, li2021simple, wu2021striking], where the source and target domains share the same label space. Empirically, this approach shows some improvement on CDFSL, but it does not model any visual and label characteristics of the target domain nor, more importantly, their cross-domain impact on the pre-trained source domain representation. We argue that this cross-domain mapping between the source domain representation and its interpretation in the context of the target domain data characteristics is essential for effective CDFSL. From a related perspective, other CDFSL studies have considered fine-tuning the source domain feature representation by augmenting additional support data in the target domain, e.g. either explicitly augmenting the support data by adversarial training [wang2021cross_ata] or image transformations [guo2020broader], or implicitly augmenting the support data by training an autoencoder [liang2021boosting_cdfsl]. However, these are straightforward data-augmentation methods for increasing training data in target domain model fine-tuning, without considering how to quantify the cross-domain relevance of the pre-trained source domain representation.
In this work, we consider an alternative approach with a new perspective that treats cross-domain few-shot learning as an image retrieval task. We wish to optimise model adaptation by leveraging the target domain retrieval task context, that is, not only the labelled support data but also the unlabelled query data. To that end, we use a generic representation pre-trained by a cross-entropy loss with a simple distance-based classifier as a baseline, then employ a reciprocal neighbour discovery (as in Fig. 1) and encoding process to calibrate the pairwise distances between each unlabelled query image and its likely matches. Our idea is both orthogonal and complementary to other generalisable model learning methods [guo2020broader, liu2020feature, liang2021boosting_cdfsl]. It can be flexibly used in model learning either with or without fine-tuning.
Generally, the distance matrix for a CDFSL task contains many incorrect results, as it is built on a potentially biased pre-trained source domain representational space. To calibrate this distance matrix towards the target domain so as to reduce its bias to the source domain, we explore the re-ranking concept in the target domain by considering CDFSL optimisation as re-ranking in a retrieval task given the few shots as anchor points. As in Fig. 1, re-ranking first computes a nearest-neighbour ranking list. This is further expanded by discovering the reciprocal nearest neighbours in the target domain. The expanded ranking list is used to recompute a Jaccard distance measuring the difference between the original ranking list and the expanded ranking list, yielding a more robust and accurate distance matrix.
Critically, a representation pre-trained on the source domain is biased and generalises poorly across domains in CDFSL. The reason is that conventional FSL methods implicitly assume mostly linear transformations between the source and target data, as they are sampled from the same domain. This becomes invalid in CDFSL, where the transformations across source and target domains are mostly nonlinear. To address this problem, we propose a task-aware subspace mapping to minimise transferring task-irrelevant representational information from the source domain. In particular, instead of mapping to a linear projection space, we explore a hyperbolic tangent function to project the source domain representation to a nonlinear space. Compared to the linear Euclidean space, this nonlinear space performs a dimensionality reduction to optimise the retention of transferable information from the source to the target domain. Moreover, we explore the idea of re-ranking to calibrate and align two distance matrices in the two representational spaces: the original pre-trained source domain linear space and the new nonlinear subspace. The calibrated matrices are combined to construct a single distance matrix for the target domain in CDFSL. We call this Ranking Distance Calibration (RDC). To impose the above distance calibration on the representational space transform, we approximate the distance matrices by their corresponding distributions, and then optimise a Kullback-Leibler (KL) divergence loss function to iteratively map the original distance distribution from the source domain towards the calibrated space. This provides an additional RDC Fine-Tuning (RDC-FT) model optimisation.
Our contributions from this work are threefold: (1) To transform the biased distance matrix in the source domain representational space towards the target domain in CDFSL, we use a re-ranking method to recompute a Jaccard distance for distance calibration by discovering the reciprocal nearest neighbours within the task. We call this Ranking Distance Calibration (RDC). (2) We propose a nonlinear subspace to shadow the pre-trained source domain representational space. This is designed to model any inherent nonlinear transform in CDFSL and is used to facilitate the distance calibration process between the source and target domains. By modelling this nonlinearity explicitly, we formulate a more robust and generalisable RDC model for CDFSL. (3) We further impose RDC as a constraint on the model optimisation process. This is achieved by RDC with Fine-Tuning (RDC-FT), which iteratively maps the original source domain distance distribution to a calibrated target domain distance distribution for more stable and improved CDFSL.
We evaluated the proposed RDC and RDC-FT methods for CDFSL on eight target domains. The results show that RDC notably improves the conventional distance-based classifier, and that RDC-FT improves the representation for the target domain to achieve competitive or better performance than the state-of-the-art CDFSL models.
2 Related Work
Few-shot learning. Approaches for general FSL can be broadly divided into two categories: optimisation-based methods [finn2017model, ravi2016optimization, oh2021boil], which learn a generalisable model initialisation and then adapt the model to a novel task with limited labelled data, and metric learning methods [snell2017prototypical, sung2018learning, zhang2020deepemd, li2021plain], which meta-learn a discriminative embedding space where the samples in a novel task can be well classified by a common or learned distance metric. Recently, several studies [chen2019closer, tian2020rethinking, li2020few_self] have shown that a simple pre-training method followed by a fine-tuning stage can achieve competitive or better performance than the metric learning methods. This observation also seems to hold in CDFSL [guo2020broader].
Cross-domain few-shot learning. The problem of CDFSL was preliminarily studied in FSL [chen2019closer, tian2020rethinking, pan2021mfl]; then [tseng2020cross_fwt, guo2020broader] expanded this setup and proposed two benchmarks that train a model on a single source domain and then generalise it to other domains. Some CDFSL works [tseng2020cross_fwt, wang2021cross_ata, liang2021boosting_cdfsl] focus on learning a generalisable model by explicit or implicit data augmentation. These approaches improve the model generalisation ability but easily result in ambiguous optimisation since they ignore the adaptation process for the target domain. Other methods [phoo2021STARTUP, das2021distractor_cdfsl, fu2021metamixup] target adaptation on the target domain by leveraging additional unlabelled data [phoo2021STARTUP], labelled data [fu2021metamixup] or the base data [das2021distractor_cdfsl]. In practice, the additional data can help model adaptation on the target domain, but such information is not easy to obtain. In this work, we address the CDFSL problem from an image retrieval perspective and mine the intra-task information to guide a ranking distance calibration process.
Ranking in image retrieval. Image Retrieval (IR) is a classical vision task that aims to search unlabelled gallery data for the images most relevant to a probe image. Ranking is a classical approach in the IR field [he2004manifold_rank, huang2015cross_retrivel, liu2013pop_rank, loy2013person_rank]. Generally, a ranking method computes a ranking list based on a distance metric, and a series of re-ranking methods have been proposed as a post-processing step to improve the initial ranking result. One typical re-ranking method [zhong2017rerank] used the concept of k-reciprocal nearest neighbours [qin2011hello_kreciprocal] to discover more hard positive samples; the enlarged nearest-neighbour list is then used to recompute a Jaccard distance as an auxiliary distance matrix. A number of works [liu2018adaptivererank, sarfraz2018posererank] extended this idea to further improve retrieval performance. In this work, we repurpose the re-ranking approach to address the CDFSL problem.
3 Methodology
Problem formulation. We start by defining a general FSL problem: given a source dataset $\mathcal{D}_s$ and a target dataset $\mathcal{D}_t$, the classes in $\mathcal{D}_s$ and $\mathcal{D}_t$ are disjoint. FSL aims to address an $N$-way $K$-shot classification task in $\mathcal{D}_t$ by leveraging the limited data in $\mathcal{D}_t$ and the prior knowledge learned from $\mathcal{D}_s$, which contains many labelled images. Specifically, an FSL task $\mathcal{T}$ contains a labelled support set $\mathcal{S}$ and an unlabelled query set $\mathcal{Q}$, where the images in $\mathcal{S}$ and $\mathcal{Q}$ are from the same $N$ classes and $K$/$Q$ are the numbers of images per class in $\mathcal{S}$/$\mathcal{Q}$. The goal of FSL is to recognise the unlabelled query set when $K$ is small. Notably, $\mathcal{D}_s$ and $\mathcal{D}_t$ in CDFSL are from different domains. For instance, $\mathcal{D}_s$ may be a dataset containing many natural images while $\mathcal{D}_t$ is a dataset collected from the remote sensing field.
Nearest prototype classifier. Before fully developing our main method, we define the prototype classifier used here. Given a feature extractor $f_\theta$, we can extract the embedding $f_\theta(x_i)$ for an image $x_i$ in an FSL task $\mathcal{T}$. The Nearest Prototype Classifier (NPC) first computes the prototypes for the $N$ classes, where the prototype $c_k$ for class $k$ is:

$c_k = \frac{1}{|\mathcal{S}_k|} \sum_{x_i \in \mathcal{S}_k} f_\theta(x_i)$,   (1)

where $\mathcal{S}_k$ denotes the support images of class $k$. With the prototypes, the label $\hat{y}_j$ for $x_j$ in $\mathcal{Q}$ is assigned by:

$\hat{y}_j = \arg\min_{k} d\big(f_\theta(x_j), c_k\big)$,   (2)

where $d(\cdot,\cdot)$ is a distance metric, e.g. the Euclidean distance in this work, and $d(f_\theta(x_j), c_k)$ is the distance between $f_\theta(x_j)$ and $c_k$.
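As a concrete reference, the NPC baseline of Eqs. (1)-(2) can be sketched in a few lines of NumPy. Function and variable names here are illustrative, not from the paper's code:

```python
import numpy as np

def nearest_prototype_classify(support_feats, support_labels, query_feats):
    """Nearest Prototype Classifier (NPC), Eqs. (1)-(2).

    support_feats: (N_s, D) embeddings of the labelled support set
    support_labels: (N_s,) integer class labels
    query_feats: (N_q, D) embeddings of the unlabelled query set
    Returns predicted class labels for the query set.
    """
    classes = np.unique(support_labels)
    # Eq. (1): each class prototype is the mean of its support embeddings
    prototypes = np.stack([support_feats[support_labels == k].mean(axis=0)
                           for k in classes])
    # Eq. (2): assign each query to the class of the nearest prototype
    # (Euclidean distance, as used in this work)
    dists = np.linalg.norm(query_feats[:, None, :] - prototypes[None, :, :],
                           axis=-1)
    return classes[np.argmin(dists, axis=1)]
```

In practice the embeddings would come from the pre-trained feature extractor; here any arrays of matching shape work.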
Overview. The key insight of this paper is to formulate FSL as an Image Retrieval (IR) task, sharing the same view as [triantafillou2017fewretrivel], where the authors propose to optimise the mean average precision for FSL. Furthermore, our view of "FSL as IR" also emphasises the importance of maximally leveraging all the available information in this low-data regime, while focusing on the calibration of pairwise distances in FSL. In particular, this work follows this view and considers each sample in an FSL task as the probe data in IR, treating the whole FSL task data as the gallery data. To this end, we propose a ranking distance calibration process for CDFSL, and our key methodology is to repurpose re-ranking to find the relevant images within the FSL task for a given image. We give an overview of the proposed method in Fig. 2.
3.1 Ranking Distance Calibration (RDC)
Motivation. Previous works [zhong2017rerank, qin2011hello_kreciprocal] have suggested that discovering the reciprocal nearest neighbours within the gallery data can benefit the re-ranking result for image retrieval. This observation encourages us, when considering few-shot learning as image retrieval, to reuse this reciprocal nearest-neighbour discovery process to calibrate the pairwise distances within an FSL task. We give an intuitive example in Fig. 1: mining the relative relationships among the samples via reciprocal nearest neighbours clusters the potential positive instances of the same class and re-weights the pairwise distances. In this manner, the pairwise distances in an FSL task can be encoded as a new distance matrix.
Here, we briefly describe the re-ranking process for our ranking distance calibration and detail the Jaccard distance computation in Sec. D of the supplementary material. For an FSL task, we start by computing an original pairwise Euclidean distance matrix $D = \{\hat{d}(x_i, x_j)\}$ with:

$\hat{d}(x_i, x_j) = \|\bar{f}_\theta(x_i) - \bar{f}_\theta(x_j)\|_2$,   (3)

where $\hat{d}(x_i, x_j)$ is the Euclidean distance between $x_i$ and $x_j$, and $\bar{f}_\theta$ represents the normalised embedding. Referring to $D$, we can obtain each sample's nearest-neighbour set, where $R(x_i, k)$ is the $k$-nearest-neighbour set of $x_i$. The re-ranking idea [zhong2017rerank] is to expand $R(x_i, k)$ by discovering more hard-positive samples for $x_i$. The expansion is guided by a $k$-reciprocal nearest-neighbour algorithm [qin2011hello_kreciprocal], and the expanded ranking list is used to estimate a calibrated distance matrix.
$k$-reciprocal discovery and encoding. The principle of $k$-reciprocal nearest-neighbour discovery is that if $x_j$ is in the $k$-nearest-neighbour set $R(x_i, k)$ of $x_i$, then $x_i$ should also occur in $R(x_j, k)$ [qin2011hello_kreciprocal]. With this assumption, $R(x_i, k)$ can be refined into a reciprocal set by:

$\mathcal{R}(x_i, k) = \{\, x_j \mid x_j \in R(x_i, k) \ \wedge\ x_i \in R(x_j, k) \,\}$,   (4)

where $x_j$ is a sample in the task and $k$ is the number of neighbours. Moreover, for many-shot cases, we further expand $\mathcal{R}(x_i, k)$ by uniting $\mathcal{R}(x_s, k)$, where $x_i$ and $x_s$ are both in the support set and of the same class. In this manner, the nearest-neighbour set is finally expanded to $\mathcal{R}^*(x_i, k)$. To assign larger weights to closer neighbours and smaller weights to farther ones, $\mathcal{R}^*(x_i, k)$ is further used to encode the pairwise distances into a vector $V_{x_i}$, whose entry $V_{x_i, x_j}$ is defined as the Gaussian kernel of the pairwise distance:

$V_{x_i, x_j} = e^{-\hat{d}(x_i, x_j)}$ if $x_j \in \mathcal{R}^*(x_i, k)$, and $0$ otherwise.   (5)

After that, a query expansion strategy is employed to integrate the most likely samples, updating the encoding of $x_i$ by averaging the encodings of its nearest neighbours.
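The discovery-and-encoding step above can be sketched as follows. This is a simplified version: it applies only the plain reciprocal test and a mean-based query expansion, omitting the half-k expansion of [qin2011hello_kreciprocal] and the support-set union for many-shot tasks; the function name and the default neighbourhood sizes are illustrative assumptions:

```python
import numpy as np

def k_reciprocal_encode(dist, k=10, qe=5):
    """Reciprocal-neighbour discovery and Gaussian-kernel encoding (sketch).

    dist: (N, N) pairwise distance matrix within the task.
    Returns V: (N, N) row-normalised encodings of the reciprocal
    neighbour sets (cf. Eq. (5)), after query expansion over qe neighbours.
    """
    n = dist.shape[0]
    knn = np.argsort(dist, axis=1)[:, :k + 1]    # k-NN lists (incl. self)
    V = np.zeros((n, n))
    for i in range(n):
        # reciprocal test: keep j only if i is also among j's k-NN
        recip = [j for j in knn[i] if i in knn[j]]
        # Eq. (5): closer neighbours get larger weights via a Gaussian kernel
        w = np.exp(-dist[i, recip])
        V[i, recip] = w / w.sum()
    # query expansion: average the encodings of the qe nearest neighbours
    V_qe = np.zeros_like(V)
    for i in range(n):
        V_qe[i] = V[knn[i, :qe]].mean(axis=0)
    return V_qe
```

Each row of the result is a probability-like neighbourhood signature that the Jaccard step below can compare.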
Jaccard distance. Referring to [bai2016sparse, zhong2017rerank], the expanded ranking lists $\mathcal{R}^*(\cdot, k)$ are used as contextual knowledge to compute a Jaccard distance matrix by:

$d_J(x_i, x_j) = 1 - \frac{|\mathcal{R}^*(x_i, k) \cap \mathcal{R}^*(x_j, k)|}{|\mathcal{R}^*(x_i, k) \cup \mathcal{R}^*(x_j, k)|}$.   (6)

Following the re-weighting method in [zhong2017rerank], the numbers of candidates in the intersection and union sets can be calculated as $\|\min(V_{x_i}, V_{x_j})\|_1$ and $\|\max(V_{x_i}, V_{x_j})\|_1$, where $\min$ and $\max$ operate the element-wise minimisation and maximisation of two input vectors, and $\|\cdot\|_1$ is the $L_1$ norm. The Jaccard distance in Eq. (6) can then be reformulated as:

$d_J(x_i, x_j) = 1 - \frac{\|\min(V_{x_i}, V_{x_j})\|_1}{\|\max(V_{x_i}, V_{x_j})\|_1}$.   (7)
Distance calibration. The Jaccard distance matrix $D_J$ is then used to calibrate the original distance matrix $D$ by a weighting strategy:

$D^* = (1 - \lambda)\, D_J + \lambda\, D$,   (8)

where $\lambda$ is a trade-off scalar to balance the two matrices.
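A minimal sketch of the re-weighted Jaccard distance (Eq. (7)) and the weighted calibration (Eq. (8)), assuming `V` holds the Gaussian-kernel encodings from the discovery step; the default trade-off value is a placeholder, not the paper's setting:

```python
import numpy as np

def jaccard_calibrate(dist, V, lam=0.25):
    """Re-weighted Jaccard distance (Eq. (7)) + calibration (Eq. (8)).

    dist: (N, N) original pairwise distance matrix.
    V:    (N, N) neighbourhood encodings (rows as fuzzy membership vectors).
    lam:  trade-off scalar between Jaccard and original distances.
    """
    n = dist.shape[0]
    d_jac = np.zeros_like(dist)
    for i in range(n):
        # element-wise min/max realise the fuzzy intersection/union sizes
        inter = np.minimum(V[i], V).sum(axis=1)
        union = np.maximum(V[i], V).sum(axis=1)
        d_jac[i] = 1.0 - inter / np.maximum(union, 1e-12)
    # Eq. (8): weighted combination of the Jaccard and original distances
    return (1.0 - lam) * d_jac + lam * dist
```

The loop keeps memory linear in N; for typical 5-way tasks (N of order 100) this is negligible.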
3.2 RDC in a Task-adaptive Nonlinear Subspace
To further bridge the domain gap, we propose improving the RDC in a nonlinear subspace. In particular, we tailor a discriminative subspace to help calibrate the ranking in our CDFSL task. The subspace is built upon Principal Component Analysis (PCA) to extract the crucial features from the original space. Specifically, given the feature representations $F \in \mathbb{R}^{n \times u}$ of a task, we have:

$\tilde{F} = F\,W$,   (9)

where $W \in \mathbb{R}^{u \times v}$ is a transformation matrix mapping a feature with $u$ dimensions to a reduced feature with $v$ dimensions.
Hyperbolic tangent transformation. Generally, PCA can be applied directly to the original embedding space. However, the original representation is scattered due to the biased and less discriminative embedding; thus the dimensionality reduction easily causes an information loss problem. To mitigate this issue, we consider transforming the original representations to a compact and representative nonlinear space. Using the idea of kernels, we adopt a hyperbolic tangent function to construct a task-adaptive nonlinear subspace. Our nonlinear PCA method first applies a feature-wise hyperbolic tangent transformation:

$\hat{F} = \tanh(F)$.   (10)

Then we use Singular Value Decomposition (SVD) to compute the eigenvalues of $\hat{F}$ and select the $v$ most relevant eigen-directions, formulating the transformation matrix $W$. In this way, a task-adaptive nonlinear subspace is constructed by Eq. (9) and Eq. (10).

Complementary distance calibration. Our distance calibration process can be applied in both the original linear embedding space (Sec. 3.1) and the nonlinear subspace (Sec. 3.2). The original space has higher dimensions carrying the full information but is also disturbed by noisy task-irrelevant features, while the nonlinear subspace reduces some task-irrelevant signal but loses some information. To this end, our RDC method co-leverages the calibrated distances in the two spaces to obtain a robust and complementary distance matrix. The computing process of the RDC method is given in lines 4-8 of Alg. 1.
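The subspace construction can be sketched as below, under the assumption that the hyperbolic tangent is applied element-wise to the (centred) task features before a PCA-style SVD; the paper's exact kernel form is not fully specified in this text, so treat this as one plausible reading:

```python
import numpy as np

def tanh_subspace(feats, v=64):
    """Task-adaptive nonlinear subspace (Sec. 3.2), sketched.

    feats: (n, u) task embeddings; v: reduced dimensionality.
    Returns the (n, v) subspace representation of the task.
    """
    F_hat = np.tanh(feats)                 # Eq. (10): nonlinear transform
    F_hat = F_hat - F_hat.mean(axis=0)     # centre before PCA
    # SVD: right singular vectors give the principal directions
    _, S, Vt = np.linalg.svd(F_hat, full_matrices=False)
    W = Vt[:v].T                           # top-v eigen-directions
    return F_hat @ W                       # Eq. (9): project to subspace
```

RDC would then compute a second distance matrix on this projection and average it with the one from the original space.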
Remark. As the hyperbolic nonlinear space has a greater capacity than the Euclidean space [khrulkov2020hyperbolic, fang2021kernel_hyperbolic, yan2021unsupervised_hyperbolic], it can alleviate the information loss caused by the dimensionality reduction. Therefore, we use a hyperbolic tangent transformation to map the source domain linear space to a nonlinear space. We note that subspace learning has been preliminarily explored in FSL [yoon2019tapnet, simon2020adaptive] to learn task-adaptive or class-adaptive subspaces. Critically, our subspace construction method is different from [yoon2019tapnet, simon2020adaptive] in that it does not need a sophisticated episodic training process.
3.3 Fine-tuning with RDC
As the calibrated matrix provides a more robust and discriminative distance measure, it is natural to ask whether this calibration knowledge can be used to optimise the feature extractor. To achieve this, we fine-tune the feature extractor by iteratively mapping the original distance distribution to the calibrated distance distribution, formulating the RDC with Fine-Tuning (RDC-FT) method as in Alg. 1.
Expanded reciprocal list as attention. As in Eq. (6), the expanded ranking list $\mathcal{R}^*$ is used to recompute the pairwise distances. The calibrated pairwise distances of pairs within $\mathcal{R}^*$ are more robust than those outside it. Thus $\mathcal{R}^*$ can naturally be used to build an attention mask $A$. In particular, $A$ is computed by:

$A_{ij} = 1$ if $x_j \in \mathcal{R}^*(x_i, k)$, and $A_{ij} = \alpha$ otherwise,   (11)

where $\alpha$ is an attention scalar. During the fine-tuning process, $A$ is used to re-weight the original distance matrix $D$ and the calibrated distance matrix $D^*$ as $A \odot D$ and $A \odot D^*$, respectively.

Choices of loss functions. To achieve the distance distribution alignment, the Mean Squared Error (MSE) loss and the Kullback-Leibler (KL) divergence loss are both candidates. The MSE loss directly regresses towards the target distances, while the KL divergence loss focuses on distribution matching [kim2021comparing]. As the KL loss learns this mapping in a softening way, it is better suited to embedding the calibration knowledge into the representations. Here we use:

$\mathcal{L}_{KL} = \mathrm{KL}\big(P^* \,\|\, P\big)$,   (12)

where $\tau$ is a temperature-scaling hyper-parameter, and $P^*$ and $P$ are the softened distributions of the re-weighted distance matrices $A \odot D^*$ and $A \odot D$. Given a row vector $d_i$ of a re-weighted distance matrix, its softened distribution is obtained by a temperature-scaled softmax over the entries of $d_i$.
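A small NumPy sketch of the loss in Eq. (12), assuming the softened distribution is a temperature-scaled softmax over negative distances (the sign convention and the default temperature are assumptions here) and that the attention re-weighting has already been applied to the input matrices:

```python
import numpy as np

def softened(dist_row, tau=4.0):
    """Temperature-softened distribution over a row of distances
    (an assumed form of the softening used in Eq. (12))."""
    z = -dist_row / tau
    z = z - z.max()                     # numerical stability
    e = np.exp(z)
    return e / e.sum()

def rdc_kl_loss(d_orig, d_calib, tau=4.0, eps=1e-12):
    """KL divergence of Eq. (12): pull the original distance
    distribution towards the calibrated one, row by row."""
    loss = 0.0
    for row_o, row_c in zip(d_orig, d_calib):
        p = softened(row_c, tau)        # calibrated target distribution
        q = softened(row_o, tau)        # original, learnable side
        loss += np.sum(p * np.log((p + eps) / (q + eps)))
    return loss / len(d_orig)
```

In RDC-FT this scalar would be backpropagated through the original distances to update the feature extractor; the NumPy version only illustrates the objective itself.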
5-way 1-shot
Method | CUB | Cars | Places | Plantae | CropDisease | EuroSAT | ISIC | ChestX | Ave.
ProtoNet [snell2017prototypical] | 38.66±0.4 | 31.34±0.3 | 47.89±0.5 | 31.75±0.4 | 51.22±0.5 | 52.93±0.5 | 29.20±0.3 | 21.57±0.2 | 38.07
NPC | 37.72±0.4 | 32.66±0.3 | 41.08±0.4 | 33.37±0.4 | 63.06±0.5 | 53.95±0.5 | 29.39±0.3 | 22.47±0.2 | 39.21
NPC + L2 norm | 43.88±0.4 | 35.91±0.4 | 48.16±0.4 | 38.61±0.5 | 66.53±0.5 | 63.15±0.5 | 31.34±0.3 | 22.49±0.2 | 43.76
RDC (ours) | 47.77±0.5 | 38.74±0.5 | 58.82±0.5 | 41.88±0.5 | 80.88±0.5 | 67.58±0.5 | 32.29±0.3 | 22.66±0.2 | 48.83

5-way 5-shot
Method | CUB | Cars | Places | Plantae | CropDisease | EuroSAT | ISIC | ChestX | Ave.
ProtoNet [snell2017prototypical] | 57.55±0.4 | 43.98±0.4 | 68.05±0.4 | 46.18±0.4 | 79.98±0.3 | 75.36±0.4 | 39.98±0.3 | 24.19±0.2 | 54.41
NPC | 59.36±0.4 | 51.28±0.4 | 67.61±0.4 | 51.41±0.4 | 86.37±0.3 | 75.94±0.4 | 38.72±0.3 | 25.55±0.2 | 57.03
NPC + L2 norm | 62.00±0.4 | 52.14±0.4 | 70.18±0.4 | 53.87±0.4 | 87.39±0.3 | 78.81±0.4 | 40.75±0.3 | 25.98±0.2 | 58.89
RDC (ours) | 63.39±0.4 | 52.75±0.4 | 72.83±0.4 | 55.30±0.4 | 88.03±0.3 | 79.12±0.4 | 42.10±0.3 | 25.10±0.2 | 59.83
4 Experiments
Datasets. Following the benchmarks in [wang2021cross_ata, liang2021boosting_cdfsl], we used miniImageNet as the source domain and eight other datasets, i.e. CUB, Cars, Places, Plantae, CropDisease, EuroSAT, ISIC and ChestX, as target domains. Specifically, miniImageNet [vinyals2016matching] is a subset of ILSVRC-2012. CUB, Cars, Places and Plantae are the target domains proposed in [tseng2020cross_fwt] for evaluation on natural image domains, while CropDisease, EuroSAT, ISIC and ChestX are four domains proposed in [guo2020broader] for generalising the model to domains with different visual characteristics. For all experiments, we resized all images to 224×224 pixels and used the data augmentations in [wang2021cross_ata, tseng2020cross_fwt] as image transformations.
Evaluation protocol. We followed the evaluation protocol in [wang2021cross_ata] to evaluate our method on CDFSL. Specifically, for each target domain, we randomly selected 2000 FSL tasks, each containing 5 different classes. Each class has 1/5 labelled support images and an additional 15 unlabelled images for evaluating performance, formulating the 5-way 1/5-shot CDFSL problem.
In all experiments, we report the mean classification accuracy as well as the 95% confidence interval on the query set of each domain. For comprehensive comparison, we list the average accuracy (shown as Ave. in Tab. 1, 2 and 5) over the 8 domains.

Implementation details. Following previous works [tseng2020cross_fwt, wang2021cross_ata, guo2020broader], we used a ResNet-10 as the feature extractor. Further, we used the same hyper-parameters for the experiments on different domains to fairly validate the generalisation ability. Specifically, the feature extractor is pre-trained for 400 epochs on the base classes of miniImageNet with an Adam optimizer. We set the learning rate to 0.001 and the batch size to 64. For our RDC method, the neighbourhood sizes, the trade-off scalar and the reduced dimensionality of the nonlinear subspace are fixed across all domains. For the fine-tuning stage in RDC-FT, the attention scalar, the temperature and the number of training epochs are likewise fixed, using an Adam optimizer with a learning rate of 0.001.
4.1 Comparison with baselines
As our methods are based on a simple NPC classifier, we start by comparing our RDC method with baseline methods that also use an NPC classifier and do not need fine-tuning on a target domain. These baselines are: NPC, which uses an NPC classifier on the pre-trained embedding; NPC + L2 norm, which applies an NPC classifier to the normalised feature embeddings; and ProtoNet [snell2017prototypical], which meta-learns a task-agnostic NPC classifier on miniImageNet. The results in Tab. 1 show that RDC largely outperforms these baselines, boosting the simple NPC classifier into a strong one. In particular, the performance on 1-shot learning is improved notably, with clear increases in average accuracy over the baselines. This observation indicates that RDC effectively calibrates the distances by fully leveraging the task information. We also note that the improvement on 5-shot is not as large as that on 1-shot. The reason is that the prototypes for the NPC classifier are more robust in the many-shot setting, so the original distances are less biased and the calibration process improves less when the embedding is fixed. This limitation can be mitigated by the fine-tuning stage of our RDC-FT method.
5-way 1-shot
Method | CUB | Cars | Places | Plantae | CropDisease | EuroSAT | ISIC | ChestX | Ave.
GNN+FT [tseng2020cross_fwt] | 45.50±0.5 | 32.25±0.4 | 53.44±0.5 | 32.56±0.4 | 60.74±0.5 | 55.53±0.5 | 30.22±0.3 | 22.00±0.2 | 41.53
GNN+LRP [sun2021explanationcdfsl] | 43.89±0.5 | 31.46±0.4 | 52.28±0.5 | 33.20±0.4 | 59.23±0.5 | 54.99±0.5 | 30.94±0.3 | 22.11±0.2 | 41.01
TPN+ATA [wang2021cross_ata] | 50.26±0.5 | 34.18±0.4 | 57.03±0.5 | 39.83±0.4 | 77.82±0.5 | 65.94±0.5 | 34.70±0.4 | 21.67±0.2 | 47.68
Fine-tuning [guo2020broader] | 43.53±0.4 | 35.12±0.4 | 50.57±0.4 | 38.77±0.4 | 73.43±0.5 | 66.17±0.5 | 34.60±0.3 | 22.13±0.2 | 45.54
ConFT [das2021distractor_cdfsl] | 45.57±0.8 | 39.11±0.7 | 49.97±0.8 | 43.09±0.8 | 69.71±0.9 | 64.79±0.8 | 34.47±0.6 | 23.31±0.4 | 46.25
RDC-FT (ours) | 50.09±0.5 | 39.04±0.5 | 61.17±0.6 | 41.30±0.6 | 85.79±0.5 | 70.51±0.5 | 36.28±0.4 | 22.32±0.2 | 50.82

5-way 5-shot
Method | CUB | Cars | Places | Plantae | CropDisease | EuroSAT | ISIC | ChestX | Ave.
GNN+FT [tseng2020cross_fwt] | 64.97±0.5 | 46.19±0.4 | 70.70±0.5 | 49.66±0.4 | 87.07±0.4 | 78.02±0.4 | 40.87±0.4 | 24.28±0.2 | 57.72
GNN+LRP [sun2021explanationcdfsl] | 62.86±0.5 | 46.07±0.4 | 71.38±0.5 | 50.31±0.4 | 86.15±0.4 | 77.14±0.4 | 44.14±0.4 | 24.53±0.3 | 57.82
TPN+ATA* [wang2021cross_ata] | 65.31±0.4 | 46.95±0.4 | 72.12±0.4 | 55.08±0.4 | 88.15±0.5 | 79.47±0.3 | 45.83±0.3 | 23.60±0.2 | 59.57
Fine-tuning [guo2020broader] | 63.76±0.4 | 51.21±0.4 | 70.68±0.4 | 56.45±0.4 | 89.84±0.3 | 81.59±0.3 | 49.51±0.3 | 25.37±0.2 | 61.06
ConFT [das2021distractor_cdfsl] | 70.53±0.7 | 61.53±0.7 | 72.09±0.7 | 62.54±0.7 | 90.90±0.6 | 81.52±0.6 | 50.79±0.6 | 27.50±0.5 | 64.68
NSAE(CE+CE) [liang2021boosting_cdfsl] | 68.51±0.8 | 54.91±0.7 | 71.02±0.7 | 59.55±0.8 | 93.14±0.5 | 83.96±0.6 | 54.05±0.6 | 27.10±0.4 | 64.03
RDC-FT (ours) | 67.23±0.4 | 53.49±0.5 | 74.91±0.4 | 57.47±0.4 | 93.30±0.3 | 84.29±0.3 | 49.91±0.3 | 25.07±0.2 | 63.21
4.2 Comparison with stateoftheart methods
We further compare our RDC-FT method with State-of-The-Art (SoTA) methods: 1) meta-learners: GNN+FT [tseng2020cross_fwt], which meta-trains a GNN [satorras2018fewgnn] model with an additional feature transformation layer; GNN+LRP [sun2021explanationcdfsl], which uses Layer-wise Relevance Propagation to guide the GNN model training; and TPN+ATA [wang2021cross_ata], which meta-learns TPN [liu2019fewTPN] with Adversarial Task Augmentation. 2) fine-tuning methods: a general Fine-tuning [guo2020broader] method; ConFT [das2021distractor_cdfsl], which fine-tunes the model reusing the base classes; and NSAE [liang2021boosting_cdfsl], which pre-trains and fine-tunes the model with an additional autoencoder task to improve model generalisation. From Tab. 2, we observe that our RDC-FT method is superior to the SoTA methods on 1-shot learning and competitive with them on 5-shot learning. We also notice that its performance is not superior to the ConFT and NSAE methods on 5-shot learning. The underlying reasons are: 1) our method explores the task information in an unsupervised way, while the others focus on fine-tuning with more labelled data; thus these methods benefit considerably from the many-shot setting. 2) ConFT reuses more data from the base classes for model fine-tuning. Thus, classes similar across the source and target domains, e.g. birds and cars, help build more robust decision boundaries when learning on related target domains, e.g. CUB and Cars. But this approach requires more data and expensive computing resources. 3) NSAE adopts an autoencoder to implicitly augment data to pre-train a generalisable model, and our method is theoretically orthogonal to this method for solving the CDFSL problem.

4.3 Ablation study
Component analysis. To investigate the efficacy of the different components of RDC-FT, we ablate the contribution of each element: RDC w/o subspace, RDC (in two spaces), RDC-FT w/o subspace and RDC-FT (in two spaces). As in Tab. 3, a simple RDC process without subspace learning, which calibrates the distances only on the pre-trained representation, largely boosts the baseline NPC classifier, i.e. an 8.72% (1.87%) improvement on 1-shot (5-shot). The fine-tuning process, as in the results of RDC-FT w/o subspace, enlarges this improvement through the iterative mapping process, achieving an 11.15% (5.82%) improvement on 1-shot (5-shot). Interestingly, we observe that the contribution of the subspace for RDC (0.90% on 1-shot) is larger than that for RDC-FT (0.46% on 1-shot). This indicates that the fine-tuning process can gradually alleviate the bias of the pre-trained representations, so the benefit of the subspace becomes smaller in RDC-FT.
Method | 5-way 1-shot | 5-way 5-shot
Baseline NPC | 39.21 | 57.03
+RDC w/o subspace | 47.93 (+8.72%) | 58.90 (+1.87%)
+RDC | 48.83 (+9.62%) | 59.83 (+2.80%)
+RDC-FT w/o subspace | 50.36 (+11.15%) | 62.85 (+5.82%)
+RDC-FT | 50.82 (+11.61%) | 63.21 (+6.18%)
5-way 1-shot
Method | N/A | linear | Gaussian | Poly. | Sigmoid | Ours
NPC + L2 norm | 43.76 | 45.30 | 45.29 | 45.23 | 45.46 | 46.31
RDC | 47.70 | 47.75 | 47.15 | 47.14 | 47.20 | 48.42

5-way 5-shot
Method | N/A | linear | Gaussian | Poly. | Sigmoid | Ours
NPC + L2 norm | 58.89 | 58.17 | 58.18 | 58.09 | 58.23 | 59.50
RDC | 58.92 | 58.48 | 58.49 | 58.42 | 58.52 | 59.68
Comparison of different PCA methods. We compare our nonlinear subspace to subspaces constructed by Kernel PCA (KPCA) methods with different kernel types (linear, Gaussian, Polynomial and Sigmoid). For a fair comparison, all subspace dimensionalities are set to 64 and we used the default parameters of the KPCA methods in scikit-learn. Table 4 shows that our nonlinear subspace performs better than the other KPCA methods. Interestingly, without the RDC method, the KPCA methods can largely improve the performance (compared to N/A) on 1-shot learning, but they are merely competitive with the original space when the RDC method is applied on the different subspaces. In contrast, our nonlinear subspace achieves consistent and stable improvement both with and without the RDC method, verifying its robustness and superiority.
Effect of loss choices. We evaluated the performance of RDC-FT with different loss functions. The results in Fig. 3 show that the losses achieve competitive performance on CUB, Cars, ISIC and ChestX, while the KL loss mostly performs better than the MSE loss on Places, Plantae and EuroSAT. These observations suggest the superiority of matching the distance matrices as softened distributions. We attribute this to the softening process, which can alleviate the negative effect of imperfectly calibrated distances. Moreover, we note that the performance of the KL loss can be further improved by an attention strategy on the distance matrices, verifying the efficacy of employing the expanded nearest-neighbour list as an attention reference.
Visualisation. To qualitatively show the effectiveness of our RDC and RDC-FT methods, we first present a case study of an FSL task from CUB by comparing the original ranking list with the ranking list after RDC. As in Fig. 7, for a given query image, our RDC method pulls the ground-truth support data closer to the query, arriving at a more accurate position; this is achieved by the calibration process of our RDC method. For the RDC-FT method, we use t-SNE [van2008visualizing] to visualise the feature embeddings of FSL tasks randomly selected from the target domains CUB, CropDisease and EuroSAT. As in Fig. 5, the feature representations with RDC-FT (second-row plots) have smaller within-class variations and larger class margins than those without RDC-FT (first-row plots), showing that RDC-FT can guide a task-specific embedding in which the samples are easily classified by a simple NPC classifier. Moreover, our RDC-FT method, as expected, functions as an implicit clustering process for the FSL task, which is qualitatively verified by the clustering effect in the second-row plots of Fig. 5.
Incorporation with other methods. As RDC is a post-processing method, it can flexibly combine with other methods. Here we applied RDC on top of a general data augmentation method [yeh2020large]. The results in Tab. 5 indicate that RDC achieves consistent improvement over this method, showing its generalisation ability. Currently we cannot evaluate our method on [liang2021boosting_cdfsl, liu2020feature] until their code is released.
5-way 1-shot  
Method  CUB  Cars  Places  Plantae  Crop.  Euro.  ISIC  ChestX  
NPC+DA  42.63  36.16  48.04  38.20  71.61  66.79  34.49  22.38 
RDC+DA  46.81  38.51  58.00  40.18  81.93  71.76  35.67  22.27 
5-way 5-shot  
Method  CUB  Cars  Places  Plantae  Crop.  Euro.  ISIC  ChestX  
NPC+DA  63.63  53.16  69.57  55.06  89.88  81.61  50.25  25.56 
RDC+DA  65.39  54.10  72.82  56.46  91.27  81.88  50.91  25.21 
5 Conclusions
In this paper, we proposed a Ranking Distance Calibration (RDC) method to calibrate the biased distances in CDFSL. The calibration is achieved by a re-ranking method with a reciprocal discovery and encoding process. As the pre-trained linear embedding is biased for the target domain, we further proposed a non-linear subspace, followed by a calibration process on it. Our RDC method averages the calibrated distances of the two spaces into a robust distance matrix. Moreover, we introduced an RDC-FT method to fine-tune the embedding with the calibrated distances, yielding a discriminative representation for the CDFSL task.
Limitation and discussion. As our approach takes an image-retrieval perspective and discovers task information in an unsupervised manner, a more comprehensive way to jointly leverage the label information and the task information should be considered, especially in the many-shot cases, e.g. 5-shot.
We notice that a concurrent work [xi2021_reranking] also uses a re-ranking process for few-shot learning. Although our work and [xi2021_reranking] share the image-retrieval angle on few-shot learning, their key differences are summarised as follows: 1) Our work focuses on the reciprocal process to calibrate the pairwise distances both in the pre-trained linear space and in a task-adaptive non-linear space, while [xi2021_reranking] uses a graph view to improve the subgraph similarity in the pre-trained space; 2) For the representation-optimisation stage, we use the calibrated distance matrix as a guidance to fine-tune the pre-trained feature extractor with a Kullback-Leibler divergence loss, whilst [xi2021_reranking] designs a Multi-Layer Perceptron to meta-learn a subgraph-similarity refiner and optimises the feature extractor with a cross-entropy loss. In particular, the meta-learning strategy in [xi2021_reranking] and the fine-tuning strategy in our work are orthogonal, as they focus on learning on the source data and the target data, respectively.
References
Appendix A Overview
In this supplementary material, we present:

- A sensitivity analysis of the key hyperparameters, to validate the robustness of the proposed RDC and RDC-FT methods (in Sec. B);
- Additional visualisations of the RDC and RDC-FT results (in Sec. C);
- An illustration of the algorithm for computing the Jaccard distance, for a better understanding of its computing process (in Sec. D);
- The definitions of all symbols and hyperparameters used in the main paper (in Sec. E).
Appendix B Sensitivity analysis of the hyperparameters
In all experiments of the main paper, we reported the results on 8 target domains with the same hyperparameters. In practice, our method is robust to the hyperparameter selection, as shown in Fig. 6. Below, we analyse in depth three key hyperparameters: the trade-off scalar, the reduced dimension of the subspace, and the attention scalar.
b.1 Effect of the tradeoff scalar
The trade-off scalar balances the original distance and the Jaccard distance in the proposed RDC method, and is thus a critical hyperparameter for RDC. We conducted experiments to test RDC on the pre-trained space with the trade-off scalar set to values in {0.1, 0.3, 0.5, 0.7, 0.9}.
The results are shown in Tab. 6 and Fig. 6(a), from which we can see that assigning a smaller weight to the original distance (a smaller trade-off scalar) is a better choice for RDC. In particular, the best value for 1-shot is 0.3, while that for 5-shot is 0.5. This indicates that the original distance becomes more robust as the number of shots increases, so the original space should receive a larger weight in the calibrated distance. Besides, when the trade-off scalar is between 0.1 and 0.5, the average accuracies of RDC are stable, verifying its robustness. We therefore fixed this trade-off scalar for all experiments in the main paper.
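The weighted combination itself is simple; below is a minimal sketch, where the function name and the default of 0.3 (the 1-shot optimum in Tab. 6) are illustrative rather than the paper's fixed setting:

```python
import numpy as np

def calibrate(d_orig, d_jaccard, lam=0.3):
    # Convex combination of the original (Euclidean) distance matrix and
    # the Jaccard distance matrix. `lam` weights the original distance,
    # so smaller values trust the Jaccard (re-ranking) evidence more.
    return lam * np.asarray(d_orig) + (1.0 - lam) * np.asarray(d_jaccard)
```

With more shots the original distances become more reliable, which matches the observation above that the best `lam` grows from 0.3 (1-shot) to 0.5 (5-shot).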
5way 1shot  
CUB  Car  Places  Plantae  Crop  Euro.  ISIC  Chestx  Ave.  
0.1  46.53  36.59  56.84  40.91  79.93  68.68  31.04  22.22  47.84 
0.3  46.92  37.16  57.03  41.15  79.62  68.67  31.32  22.36  48.03 
0.5  46.89  37.52  56.64  41.14  78.62  68.21  31.58  22.48  47.89 
0.7  46.44  37.53  55.29  40.72  76.35  67.11  31.63  22.48  47.19 
0.9  45.02  37.03  51.95  39.77  71.59  64.99  31.34  22.49  45.52 
5way 5shot  
CUB  Car  Places  Plantae  Crop  Euro.  ISIC  Chestx  Ave.  
0.1  60.16  49.99  70.02  54.69  88.39  79.14  40.34  24.61  58.42 
0.3  61.04  51.17  70.58  55.08  88.37  79.27  41.16  24.77  58.93 
0.5  61.42  51.85  70.54  54.86  87.80  78.95  41.39  24.86  58.96 
0.7  61.11  51.96  69.68  54.07  86.41  77.96  41.11  24.82  58.39 
0.9  59.90  51.25  67.27  52.65  83.31  76.17  40.46  24.64  56.96 
b.2 Influence of the reduced dimensions
The reduced dimension of the subspace is a key parameter for building our non-linear space. We test dimensions from {16, 32, 64, 128, 256, 512}, where 512 corresponds to the original space.
Table 7 and Fig. 6(b) show that the performance on the different subspaces is stable when the dimension is no larger than 128. This observation shows that the subspaces constructed by the hyperbolic tangent transformation are not sensitive to the reduced dimension. In particular, a dimension of 16 is the best for 1-shot learning, and 128 is the best for 5-shot learning. To strike a balance between the different shot settings, we fixed a single dimension for all experiments in the main paper.
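The non-linear subspace applies a hyperbolic tangent transformation to dimension-reduced features. The exact reduction is not restated here, so the following sketch assumes a PCA-style projection and an illustrative dimension of 64; it is not the paper's implementation.

```python
import numpy as np

def tanh_subspace(features, dim=64):
    # Centre the task features, project onto the top `dim` principal
    # directions (PCA-style reduction, assumed here), and squash with
    # tanh to obtain a bounded non-linear subspace.
    x = features - features.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    reduced = x @ vt[:dim].T
    return np.tanh(reduced)
```

The tanh squashing bounds every coordinate in (-1, 1), which damps the influence of extreme feature values on the pairwise distances computed in this space.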
5way 1shot  
CUB  Car  Places  Plantae  Crop  Euro.  ISIC  Chestx  Ave.  
16  47.05  37.10  53.39  39.89  71.31  63.52  31.81  22.56  45.83 
32  46.13  37.40  51.69  39.37  68.77  62.54  31.62  22.54  45.01 
64  46.30  37.86  52.16  39.49  69.09  62.41  31.69  22.52  45.19 
128  46.56  38.31  53.23  39.95  69.97  63.07  31.95  22.63  45.71 
256  43.77  36.02  48.16  39.31  68.18  63.45  32.01  22.56  44.18 
512  43.29  35.96  48.32  38.69  67.43  63.18  31.45  22.54  43.86 
5way 5shot  
CUB  Car  Places  Plantae  Crop  Euro.  ISIC  Chestx  Ave.  
16  63.69  50.11  70.29  53.72  88.32  78.35  40.42  25.75  58.83 
32  64.22  52.08  70.17  54.36  88.20  78.65  41.28  25.98  59.37 
64  64.05  52.64  70.46  54.68  88.20  78.74  41.84  26.18  59.60 
128  64.03  53.29  71.11  54.91  88.22  78.75  41.87  26.16  59.79 
256  63.16  52.49  69.66  55.09  88.43  79.35  41.89  26.15  59.53 
512  62.38  52.37  69.74  54.16  87.86  78.79  41.05  25.92  59.03 
5way 1shot  
CUB  Car  Places  Plantae  Crop  Euro.  ISIC  Chestx  Ave.  
0  50.18  39.30  60.13  41.55  85.52  68.14  35.17  22.39  50.30 
0.1  49.86  38.85  58.94  41.61  85.54  68.84  35.36  22.27  50.16 
0.3  50.00  39.07  59.12  42.05  85.46  69.30  35.52  22.48  50.38 
0.5  50.09  39.04  61.17  41.30  85.79  70.51  36.28  22.32  50.82 
0.7  49.90  38.65  59.37  41.54  85.36  69.83  35.68  22.23  50.32 
0.9  50.31  38.01  59.75  41.22  85.35  70.05  36.00  22.35  50.38 
5way 5shot  
CUB  Car  Places  Plantae  Crop  Euro.  ISIC  Chestx  Ave.  
0  66.37  53.49  71.81  56.52  93.15  81.47  47.38  25.28  62.01 
0.1  66.34  53.91  72.65  57.11  93.12  82.59  47.61  25.08  62.30 
0.3  66.13  52.97  73.05  57.89  93.22  83.72  48.94  25.06  62.62 
0.5  67.23  53.49  74.91  57.47  93.30  84.29  49.91  25.07  63.21 
0.7  67.04  53.45  73.73  57.83  93.56  83.91  49.53  25.04  63.01 
0.9  66.96  53.20  73.74  57.68  93.44  84.07  49.69  25.18  63.00 
b.3 Effect of the attention scalar
The attention scalar is used to increase the weights of the calibrated distances whose entries occur in the expanded nearest-neighbour list. Here we investigate the effect of different attention scalars and show the results in Tab. 8 and Fig. 6(c).
The results show that this attention strategy benefits the representation adaptation for the FSL task in the target domain. Specifically, moderately increasing the attention scalar (from 0.1 to 0.5) improves the effectiveness of the attention strategy. On the contrary, increasing it too far introduces negative effects, resulting in a decrease in performance (from 0.5 to 0.9). Therefore, the value chosen in the main paper is a moderate and robust setting for the attention strategy.
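One way to realise such an attention strategy is to build a per-entry weight matrix for the loss on the calibrated distances. The additive (1 + beta) form and the function name below are assumptions, not the paper's exact formulation:

```python
import numpy as np

def attention_weights(n, expanded_lists, beta=0.5):
    # Build an n-by-n weight matrix that emphasises entries of the
    # calibrated distance matrix whose column index occurs in the row's
    # expanded nearest-neighbour list; other entries keep weight 1.
    w = np.ones((n, n))
    for i, neighbours in enumerate(expanded_lists):
        w[i, list(neighbours)] += beta
    return w
```

Multiplying the per-entry loss by such a matrix makes the fine-tuning pay more attention to distances that the reciprocal discovery deemed reliable, consistent with the trend that moderate `beta` helps while large `beta` over-commits to those entries.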
Appendix C Visualisation
To visualise the advantages and limitations of the proposed RDC and RDC-FT methods, we use t-SNE [maaten2008visualizing] to display a 5-way 1-shot FSL task from CropDisease, as in Fig. 7. From the figure we observe that:

RDC can correct some misclassified samples that are close to the support exemplars, i.e. the samples in red solid rectangles in plot (II) and plot (III). However, RDC cannot well address the misclassified samples lying between different support exemplars, i.e. the failure cases in plot (III).

From the samples in red dashed rectangles of plot (I) and plot (IV), we note that RDC-FT can calibrate the distance-based distributions in the representational space, encouraging the feature representations to have smaller within-class variations and larger class margins. Thus the fine-tuned representations are more discriminative for classification.

The failure cases of RDC, i.e. MR1, MR3, MR24, and MR5 in plot (III), can be correctly classified by RDC-FT with a simple NPC classifier, as in plot (V). This verifies the superiority of RDC-FT, which gradually embeds the calibration information from RDC into the representational space.
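The qualitative t-SNE observations above (smaller within-class variation, larger class margins) can also be quantified directly on the features. The sketch below is one such crude proxy; the function name and the choice of metrics are ours, not the paper's.

```python
import numpy as np

def class_stats(features, labels):
    # Mean within-class variance (lower = tighter clusters) and the
    # minimum between-class centroid distance (higher = larger margin).
    classes = np.unique(labels)
    cents = np.array([features[labels == c].mean(axis=0) for c in classes])
    within = np.mean([features[labels == c].var(axis=0).sum()
                      for c in classes])
    dists = np.linalg.norm(cents[:, None] - cents[None, :], axis=2)
    margin = dists[~np.eye(len(classes), dtype=bool)].min()
    return float(within), float(margin)
```

Computing these two numbers before and after RDC-FT on the same task gives a numeric counterpart to the visual clustering effect in Fig. 5 and Fig. 7.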
Appendix D Details of Jaccard distance
Computing the Jaccard distance is an important part of RDC. Specifically, the concept of the Jaccard distance derives from [bai2016sparse], and the re-weighting strategy for the Jaccard distance is also used in [zhong2017rerank]. We briefly introduced the computing process of the Jaccard distance in the main paper; here, we provide more details in Algorithm 1. In this pseudocode, the reciprocal discovery and encoding are computed in lines 3-9, with the discovery process in lines 5-8 and the encoding process in line 9. The query expansion and the Jaccard distance computation are then illustrated in lines 11-14 of Algorithm 1.
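As a rough reference, the pipeline of reciprocal discovery, Gaussian-weighted encoding, and Jaccard distance can be sketched in the spirit of the k-reciprocal re-ranking of [zhong2017rerank]. The query-expansion step is omitted and all names are illustrative; this is not Algorithm 1 itself.

```python
import numpy as np

def jaccard_distance(dist, k=10):
    # `dist`: symmetric pairwise distance matrix of the task samples.
    n = dist.shape[0]
    ranks = np.argsort(dist, axis=1)
    # k-nearest-neighbour lists (each includes the sample itself).
    knn = [set(ranks[i, :k + 1]) for i in range(n)]
    # Reciprocal discovery: keep j only if i is also among j's neighbours.
    recip = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    # Encoding: Gaussian-kernel-weighted membership vectors.
    v = np.zeros((n, n))
    for i in range(n):
        idx = np.array(sorted(recip[i]))
        w = np.exp(-dist[i, idx])
        v[i, idx] = w / w.sum()
    # Jaccard distance from the encoded vectors.
    inter = np.minimum(v[:, None, :], v[None, :, :]).sum(axis=2)
    union = np.maximum(v[:, None, :], v[None, :, :]).sum(axis=2)
    return 1.0 - inter / np.maximum(union, 1e-12)
```

The cubic broadcasting at the end is acceptable here because an FSL task contains only tens of samples.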
Appendix E Symbols and hyperparameters
For a clear and quick understanding of the equations in the main paper, we list the symbols and hyperparameters in Tab. 9 and Tab. 10, respectively.
Symbol  Meaning 

FSL task in the target domain  
Feature of th sample in  
Euclidean distance matrix in the original space  
Jaccard distance matrix  
Calibrated distance matrix in the original space  
Calibrated distance matrix in the subspace  
Complementary calibrated distance matrix  
Pairwise distance between and  
Pairwise distances between and  
Jaccard distance between and  
nearest neighbors ranking list of  
Expanded nearest neighbors ranking list of  
Gaussian kernel of pairwise distance between and 
Hyperparameter  Meaning 

Number of candidates in  
Number of samples for updating  
Tradeoff scalar to balance and  
Dimensions of feature in the subspace  
Number of epochs in finetuning stage  
Temperaturescaling hyperparameter  
Attention scalar 