Self-Supervised Prototypical Transfer Learning for Few-Shot Classification
Most approaches in few-shot learning rely on costly annotated data related to the goal task domain during (pre-)training. Recently, unsupervised meta-learning methods have exchanged the annotation requirement for a reduction in few-shot classification performance. Simultaneously, in settings with realistic domain shift, common transfer learning has been shown to outperform supervised meta-learning. Building on these insights and on advances in self-supervised learning, we propose a transfer learning approach which constructs a metric embedding that clusters unlabeled prototypical samples and their augmentations closely together. This pre-trained embedding is a starting point for few-shot classification by summarizing class clusters and fine-tuning. We demonstrate that our self-supervised prototypical transfer learning approach ProtoTransfer outperforms state-of-the-art unsupervised meta-learning methods on few-shot tasks from the mini-ImageNet dataset. In few-shot experiments with domain shift, our approach even has comparable performance to supervised methods, but requires orders of magnitude fewer labels.
Few-shot classification (Fei-Fei et al., 2006) is a learning task in which a classifier must adapt to distinguish novel classes not seen during training, given only a few examples (shots) of these classes. Meta-learning (Finn et al., 2017; Ren et al., 2018) is a popular approach to few-shot classification that mimics the test setting during training through so-called episodes of learning with few examples from the training classes. However, several works (Chen et al., 2019b; Guo et al., 2019) show that common (non-episodic) transfer learning outperforms meta-learning methods in the realistic cross-domain setting, where training and novel classes come from different distributions.
Nevertheless, most few-shot classification methods still require much annotated data for pre-training. Recently, several unsupervised meta-learning approaches, which construct episodes via pseudo-labeling (Hsu et al., 2019; Ji et al., 2019) or image augmentations (Khodadadeh et al., 2019; Antoniou and Storkey, 2019; Qin et al., 2020), have addressed this problem. To our knowledge, unsupervised non-episodic techniques for transfer learning to few-shot tasks have not yet been explored.
Our approach ProtoTransfer performs self-supervised pre-training on an unlabeled training domain and can transfer to few-shot target domain tasks. During pre-training, we minimize a pairwise distance loss in order to learn an embedding that clusters noisy transformations of the same image around the original image. Our pre-training loss can be seen as a self-supervised version of the prototypical loss in Snell et al. (2017) in line with contrastive learning, which has driven recent advances in self-supervised representation learning (Ye et al., 2019; Chen et al., 2020; He et al., 2019). In the few-shot target task, in line with pre-training, we summarize class information in class prototypes for nearest neighbor inference similar to ProtoNet (Snell et al., 2017) and we support fine-tuning to improve performance when multiple examples are available per class.
We highlight our main contributions and results:
We show that our approach outperforms state-of-the-art unsupervised meta-learning methods by 4% to 8% on mini-ImageNet few-shot classification tasks and has competitive performance on Omniglot.
Compared to the fully supervised setting, our approach achieves competitive performance on mini-ImageNet and multiple datasets from the cross-domain few-shot learning (CDFSL) benchmark, with the benefit of not requiring labels during training.
In an ablation study and cross-domain experiments, we show that two ingredients are key to matching the performance of supervised approaches: training with a larger number of equivalent classes per batch than is commonly possible with episodic meta-learning, and parametric fine-tuning on the target task.
Section 2.1 introduces the few-shot classification setting and relevant terminology. Further, we describe ProtoTransfer’s pre-training stage, ProtoCLR, in Section 2.2 and its fine-tuning stage, ProtoTune, in Section 2.3. Figure 1 illustrates the procedure.
The goal of few-shot classification is to predict classes for a set of unlabeled points (the query set) given a small set of labeled examples (the support set) from the same classes. Few-shot classification approaches commonly consist of two subsequent learning phases, each using its own set of classes.
The first learning phase utilizes samples from base (training) classes $\mathcal{C}_b$ contained within a training set $\mathcal{D}_b = \{(\mathbf{x}_i, y_i)\}$, where $\mathbf{x}_i$ is a sample with label $y_i$ in label set $\mathcal{C}_b$. An important aspect of our specific unsupervised learning setting is that the first phase has no access to the per-sample label information, the distribution of classes, or the size of the label set during pre-training. This first phase serves as a preparation for the actual few-shot learning in the target domain, i.e. the second learning phase. This second, supervised learning phase involves novel (testing) classes $\mathcal{C}_n$, of which only few examples are available. Concretely, an $N$-way $K$-shot classification task consists of $K$ labeled examples for each of the $N$ novel classes. In the few-shot learning literature, a task is also commonly referred to as an episode.
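To make the terminology concrete, the following minimal sketch samples an $N$-way $K$-shot episode with a separate query set from a class-indexed image collection; the function name and the `data_by_class` structure are hypothetical and only illustrate the setting.

```python
import random

def sample_episode(data_by_class, n_way=5, k_shot=5, n_query=15):
    """Sample an N-way K-shot episode: returns (support, query) lists of
    (image, episode_label) pairs from data_by_class ({class_id: [images]})."""
    classes = random.sample(list(data_by_class.keys()), n_way)
    support, query = [], []
    for episode_label, c in enumerate(classes):
        images = random.sample(data_by_class[c], k_shot + n_query)
        support += [(img, episode_label) for img in images[:k_shot]]
        query += [(img, episode_label) for img in images[k_shot:]]
    return support, query
```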
Similar to the few-shot target tasks, we frame every ProtoCLR pre-training step as an $n$-way 1-shot classification task optimized by a contrastive loss function as described below. In this, we draw inspiration from recent progress in unsupervised meta-learning (Khodadadeh et al., 2019) and self-supervised visual contrastive learning of representations (Chen et al., 2020; Ye et al., 2019).
Algorithm 1 details ProtoCLR, which comprises the following parts:
Batch generation (Algorithm 1 lines 4-10): Each mini-batch contains $n$ random samples $\mathbf{x}_i$ from the training set. As our self-supervised setting does not assume any knowledge about the base class labels $\mathcal{C}_b$, we treat each sample as its own class. Thus, each sample serves as a 1-shot support sample and class prototype. For each prototype $\mathbf{x}_i$, $Q$ different randomly augmented versions $\tilde{\mathbf{x}}_{i,q}$ are used as query samples.
Contrastive prototypical loss optimization (Algorithm 1 lines 11-13): The pre-training loss encourages clustering of the augmented query samples around their prototype in the embedding space of $f_\theta$ through a distance metric $d$. The softmax cross-entropy loss over the $n$ classes is minimized with respect to the embedding parameters $\theta$ with mini-batch stochastic gradient descent (SGD).
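To make this step concrete, the following is a minimal PyTorch sketch of a single ProtoCLR update under the description above; `encoder`, `augment` (a batched stochastic augmentation) and the other names are illustrative placeholders rather than the released implementation.

```python
import torch
import torch.nn.functional as F

def protoclr_step(encoder, augment, batch, optimizer, n_queries=3):
    """One ProtoCLR update: each original image acts as its own 1-shot class
    prototype, and its augmented copies act as the query samples."""
    n = batch.size(0)
    prototypes = encoder(batch)                                   # (n, d) prototype embeddings
    queries = encoder(torch.cat([augment(batch) for _ in range(n_queries)]))  # (n*Q, d)
    dists = torch.cdist(queries, prototypes) ** 2                 # squared Euclidean distances
    targets = torch.arange(n, device=batch.device).repeat(n_queries)  # index of each query's prototype
    loss = F.cross_entropy(-dists, targets)                       # softmax over the n pseudo-classes
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```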
Commonly, unsupervised pre-training approaches for few-shot classification (Hsu et al., 2019; Khodadadeh et al., 2019; Antoniou and Storkey, 2019; Qin et al., 2020; Ji et al., 2019) rely on meta-learning. Thus, they are required to create small artificial $N$-way ($K$-shot) tasks identical to the downstream few-shot classification tasks. Our approach does not use meta-learning and can use any batch size $n$. Larger batch sizes have been shown to help self-supervised representation learning (Chen et al., 2020) and supervised pre-training for few-shot classification (Snell et al., 2017). We also find that larger batches yield a significant performance improvement for our approach (see Section 3.3). To generate the query examples, we use image augmentations similar to Chen et al. (2020) and adjust them for every dataset. The exact transformations are listed in Appendix A.3. Following Snell et al. (2017), we use the Euclidean distance, but our method is generic and works with any distance metric.
After pre-training the metric embedding $f_\theta$, we address the target task of few-shot classification. For this, we extend the prototypical nearest-neighbor classifier ProtoNet (Snell et al., 2017) with prototypical fine-tuning of a final classification layer, which we refer to as ProtoTune. First, the class prototypes $\mathbf{c}_k$ are computed as the mean of the embedded class samples in the support set $S_k$ of the few-shot task:

$$\mathbf{c}_k = \frac{1}{|S_k|} \sum_{\mathbf{x}_i \in S_k} f_\theta(\mathbf{x}_i).$$
ProtoNet uses non-parametric nearest-neighbor classification with respect to the prototypes $\mathbf{c}_k$ and can be interpreted as a linear classifier applied to the learned representation $f_\theta(\mathbf{x})$. Following the derivation in Snell et al. (2017), we initialize a final linear layer with weights $\mathbf{W}_k = 2\mathbf{c}_k$ and biases $b_k = -\lVert\mathbf{c}_k\rVert^2$. Then, this final layer is fine-tuned with a softmax cross-entropy loss on the samples from the support set, while keeping the embedding parameters $\theta$ fixed. Triantafillou et al. (2020) proposed a similar fine-tuning approach with prototypical initialization, but their approach always fine-tunes all model parameters.
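A minimal sketch of this prototypical initialization is given below, assuming a PyTorch `encoder` and support tensors; the helper name is hypothetical. The resulting head reproduces ProtoNet's nearest-prototype decision, since $-\lVert f_\theta(\mathbf{x}) - \mathbf{c}_k\rVert^2 = 2\mathbf{c}_k^\top f_\theta(\mathbf{x}) - \lVert\mathbf{c}_k\rVert^2 - \lVert f_\theta(\mathbf{x})\rVert^2$ and the last term does not depend on the class $k$.

```python
import torch
import torch.nn as nn

def init_prototypical_classifier(encoder, support_x, support_y, n_classes):
    """Build a linear head whose logits match nearest-prototype classification
    under squared Euclidean distance (up to a class-independent constant)."""
    with torch.no_grad():
        z = encoder(support_x)                                          # (N*K, d) support embeddings
        prototypes = torch.stack(
            [z[support_y == k].mean(dim=0) for k in range(n_classes)])  # (n_classes, d)
    head = nn.Linear(z.size(1), n_classes)
    head.weight.data = 2 * prototypes                                   # W_k = 2 c_k
    head.bias.data = -(prototypes ** 2).sum(dim=1)                      # b_k = -||c_k||^2
    return head
```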
We carry out several experiments to benchmark and analyze ProtoTransfer. In Section 3.1, we conduct in-domain classification experiments on the Omniglot (Lake et al., 2011) and mini-ImageNet (Vinyals et al., 2016) benchmarks to compare to state-of-the-art unsupervised few-shot learning approaches and methods with supervised pre-training. In Section 3.2, we test our method on a more challenging cross-domain few-shot learning benchmark (Guo et al., 2019). Section 3.3 contains an ablation study showing how the different components of ProtoTransfer contribute to its performance. In Section 3.4, we study how pre-training with varying class diversities affects performance. In Section 3.5, we give insight into generalization from training classes to novel classes from both unsupervised and supervised perspectives. Experimental details can be found in Appendix A, and our code and pre-trained models are available at https://www.github.com/indy-lab/ProtoTransfer.
For our in-domain experiments, where the disjoint training class set and novel class set come from the same distribution, we used the popular few-shot datasets Omniglot (Lake et al., 2011) and mini-ImageNet (Vinyals et al., 2016). For comparability we use the Conv-4 architecture proposed in Vinyals et al. (2016). Specifics on the datasets, architecture and optimization can be found in Appendices A.1 and A.2. We apply limited hyperparameter tuning, as suggested in Oliver et al. (2018), and use a batch size of $n = 50$ and $Q = 3$ query augmentations for all datasets.
In Table 1, we report few-shot accuracies on the mini-ImageNet and Omniglot benchmarks. We compare to the unsupervised clustering-based methods CACTUs (Hsu et al., 2019) and UFLST (Ji et al., 2019) as well as the augmentation-based methods UMTRA (Khodadadeh et al., 2019), AAL (Antoniou and Storkey, 2019) and ULDA (Qin et al., 2020). More details on how these approaches compare to ours can be found in Section 4. Pre+Linear represents classical supervised transfer learning, where a deep neural network classifier is (pre-)trained on the training classes and then only the last linear layer is fine-tuned on the novel classes. On mini-ImageNet, ProtoTransfer outperforms all other state-of-the-art unsupervised pre-training approaches by at least 4% and up to 8%, and mostly outperforms the supervised meta-learning method MAML (Finn et al., 2017), while requiring orders of magnitude fewer labels. On Omniglot, ProtoTransfer shows competitive performance with most unsupervised meta-learning approaches.
For our cross-domain experiments, where training and novel classes come from different distributions, we turn to the CDFSL benchmark (Guo et al., 2019). This benchmark specifically tests how well methods trained on mini-ImageNet can transfer to few-shot tasks with only limited similarity to mini-ImageNet. In order of decreasing similarity, the four datasets are plant disease images from CropDiseases (Mohanty et al., 2016), satellite images from EuroSAT (Helber et al., 2019), dermatological images from ISIC2018 (Tschandl et al., 2018; Codella et al., 2019) and grayscale chest X-ray images from ChestX (Wang et al., 2017). Following Guo et al. (2019), we use a ResNet-10 neural network architecture. As there is no validation data available for the target tasks in CDFSL, we keep the same ProtoTransfer hyperparameters ($n = 50$, $Q = 3$) as used in the mini-ImageNet experiments. Experimental details are listed in Appendices A.1.2 and A.2.2.
For comparison to unsupervised meta-learning, we include our results on UMTRA-ProtoNet and its fine-tuned version UMTRA-ProtoTune (Khodadadeh et al., 2019). Both use our augmentations instead of those from (Khodadadeh et al., 2019). For further comparison, we include ProtoNet (Snell et al., 2017) for supervised few-shot learning and Pre+Mean-Centroid and Pre+Linear as the best-on-average performing transfer learning approaches from Guo et al. (2019). As the CDFSL benchmark presents a large domain shift with respect to mini-ImageNet, all model parameters are fine-tuned in ProtoTransfer during the few-shot fine-tuning phase with ProtoTune.
We report results on the CDFSL benchmark in Table 2. ProtoTransfer consistently outperforms its meta-learned counterparts by at least 0.7% up to 19% and performs mostly on par with the supervised transfer learning approaches. Comparing the results of UMTRA-ProtoNet and UMTRA-ProtoTune, starting from 5 shots, parametric fine-tuning gives improvements ranging from 1% to 13%. Notably, on the dataset with the largest domain shift (ChestX), ProtoTransfer outperforms all other approaches.
We conduct an ablation study of ProtoTransfer’s components to see how they contribute to its performance. Starting from ProtoTransfer we successively remove components to arrive at the equivalent UMTRA-ProtoNet which shows similar performance to the original UMTRA approach (Khodadadeh et al., 2019) on mini-ImageNet. As a reference, we provide results of a ProtoNet classifier on top of a fixed randomly initialized network.
Table 3 shows that increasing the batch size from $n = 5$ for UMTRA-ProtoNet to $n = 50$ for ProtoCLR-ProtoNet, keeping everything else equal, is crucial to our approach and yields a 5% to 9% performance improvement. Importantly, UMTRA-ProtoNet uses our augmentations instead of those from Khodadadeh et al. (2019). Thus, this improvement cannot be attributed to using different augmentations than UMTRA. Increasing the number of training queries to $Q = 3$ gives better gradient information and yields a relatively small but consistent performance improvement. Fine-tuning in the target domain does not always give a net improvement. Generally, when many shots are available, fine-tuning gives a significant boost in performance, as exemplified by ProtoCLR-ProtoTune and UMTRA-MAML in the 50-shot case. Interestingly, our approach reaches competitive performance in the few-shot regime even before fine-tuning.
While ProtoTransfer already does not require any labels during pre-training, for some applications, e.g. rare medical conditions, even the collection of sufficiently similar data might be difficult. Thus, we test our approach when reducing the total number of available training images under the controlled setting of mini-ImageNet. Moreover, not all training datasets will have such a diverse set of classes to learn from as the different animals, vehicles and objects in mini-ImageNet. Therefore, we also test the effect of reducing the number of training classes and thereby the class diversity. To contrast the effects of reducing the number of classes or reducing the number of samples, we either remove whole classes from the mini-ImageNet training set or remove the corresponding amount of samples randomly from all classes. The number of samples is decreased in multiples of 600, as each mini-ImageNet class contains exactly 600 samples. We compare the mini-ImageNet few-shot classification accuracies of ProtoTransfer to the popular supervised transfer learning baseline Pre+Linear in Figure 2.
As expected, when uniformly reducing the number of images from all classes (Figure 2a), the few-shot classification accuracy is reduced as well. The performance of ProtoTransfer and the supervised baseline closely match in this case. When reducing the number of training classes in Figure 2b, ProtoTransfer consistently and significantly outperforms the supervised baseline when the number of mini-ImageNet training classes drops below 16. For example, in the 20-shot case with only two training classes, ProtoTransfer outperforms the supervised baseline by a large margin of 16.9% (64.59% vs 47.68%). Comparing ProtoTransfer in Figures 2a and 2b, there is only a small difference between reducing images randomly from all classes or taking entire classes away. In contrast, the supervised baseline performance suffers substantially from having fewer classes.
To validate these in-domain observations in a cross-domain setting, following Devos and Grossglauser (2019), we compare few-shot classification performance when training on CUB (Welinder et al., 2010; Wah et al., 2011) and testing on mini-ImageNet (Vinyals et al., 2016). CUB consists of 200 classes of birds, while only three of the 64 mini-ImageNet training classes are birds (see A.1.3, A.2.3 for details on CUB). Thus, CUB possesses a lower class diversity than mini-ImageNet. Table 4 confirms our previous observation numerically and shows that ProtoTransfer achieves a 2% to 4% higher transfer accuracy than the supervised approach when limited diversity is available in the training classes.
We conjecture that this difference is due to the fact that our self-supervised approach does not distinguish between samples coming from the same or different (latent) classes during training. Thus, we expect it to learn discriminative features despite a low training class diversity. In contrast, the supervised case forces multiple images with rich features into the same classes. We thus expect the generalization gap between tasks coming from training classes and testing classes to be smaller with self-supervision. We provide evidence to support this conjecture in Section 3.5.
Table 4: mini-ImageNet (way, shot) test accuracies (%) after training on CUB.

| Training | Few-shot classifier | (5,1) | (5,5) | (5,20) | (5,50) |
|---|---|---|---|---|---|
| ProtoCLR | ProtoNet | 34.56 ± 0.61 | 52.76 ± 0.63 | 62.76 ± 0.59 | 66.01 ± 0.55 |
| ProtoCLR | ProtoTune | 35.37 ± 0.63 | 52.38 ± 0.66 | 63.82 ± 0.59 | 68.95 ± 0.57 |
| Pre(-training) | Linear | 33.10 ± 0.60 | 47.01 ± 0.65 | 59.94 ± 0.62 | 65.75 ± 0.63 |
To compare the generalization of ProtoCLR with its supervised embedding learning counterpart ProtoNet (Snell et al., 2017), we visualize the learned embedding spaces with t-SNE (Maaten and Hinton, 2008) in Figure 3. We compare both methods on samples from 5 random classes from the training and testing sets of mini-ImageNet. In Figures 3a and 3b we observe that, for the same training classes, ProtoNet shows more structure. Comparing all subfigures of Figure 3, the ProtoCLR embeddings of training and testing classes (Figures 3a and 3c) resemble each other more closely than the corresponding ProtoNet embeddings (Figures 3b and 3d).
These visual observations are supported numerically in Table 5. Self-supervised embedding approaches, such as UMTRA and our ProtoCLR approach, show a much smaller task generalization gap than supervised ProtoNet. ProtoCLR shows virtually no classification performance drop. However, supervised ProtoNet suffers a significant accuracy reduction of 6% to 12%.
Table 5: mini-ImageNet (way, shot) accuracies (%) on tasks sampled from the training, validation and test class splits.

| Training | Few-shot classifier | Split | (5,1) | (5,5) | (5,20) | (5,50) |
|---|---|---|---|---|---|---|
| ProtoNet | ProtoNet | Train | 53.74 ± 0.95 | 79.09 ± 0.69 | 85.53 ± 0.53 | 86.62 ± 0.48 |
| ProtoNet | ProtoNet | Val | 46.62 ± 0.82 | 67.34 ± 0.69 | 76.44 ± 0.57 | 79.00 ± 0.53 |
| ProtoNet | ProtoNet | Test | 46.44 ± 0.78 | 66.33 ± 0.68 | 76.73 ± 0.54 | 78.91 ± 0.57 |
| UMTRA | ProtoNet | Train | 41.03 ± 0.79 | 56.43 ± 0.78 | 64.48 ± 0.71 | 66.28 ± 0.66 |
| UMTRA | ProtoNet | Test | 38.92 ± 0.69 | 53.37 ± 0.68 | 61.69 ± 0.66 | 65.12 ± 0.59 |
| ProtoCLR | ProtoNet | Train | 45.33 ± 0.63 | 63.47 ± 0.58 | 71.51 ± 0.51 | 73.99 ± 0.49 |
| ProtoCLR | ProtoNet | Test | 44.89 ± 0.58 | 63.35 ± 0.54 | 72.27 ± 0.45 | 74.31 ± 0.45 |
Both CACTUs (Hsu et al., 2019) and UFLST (Ji et al., 2019) alternate between clustering for support and query set generation and standard meta-learning. In contrast, our method unifies self-supervised clustering and inference in a single model. Khodadadeh et al. (2019) propose an unsupervised model-agnostic meta-learning approach (UMTRA), where artificial $N$-way 1-shot tasks are generated by randomly sampling support examples from the training set and generating corresponding queries by augmentation. Antoniou and Storkey (2019) (AAL) generalize this approach to more support shots by randomly grouping augmented images into classes for classification tasks. ULDA (Qin et al., 2020) induces a distribution shift between the support and query set by applying different types of augmentations to each. In contrast, ProtoTransfer uses a single un-augmented support sample, similar to Khodadadeh et al. (2019), but extends to several query samples for better gradient signals and steps away from artificial few-shot task sampling by using larger batch sizes, which is key to learning stronger embeddings.
Several works have proposed to use a self-supervised loss either alongside supervised meta-learning episodes (Gidaris et al., 2019; Liu et al., 2019) or to initialize a model prior to supervised meta-learning on the source domain (Chen et al., 2019a; Su et al., 2019). In contrast, we do not require any labels during training.
Chen et al. (2019b) show that adaptation on the target task is key for good cross-domain few-shot classification performance. Similar to ProtoTune, Triantafillou et al. (2020) also initialize a final layer with prototypes after supervised meta-learning, but always fine-tune all parameters of the model.
Contrastive losses have fueled recent progress in learning strong embedding functions (Ye et al., 2019; Chen et al., 2020; He et al., 2019; Tian et al., 2020; Li et al., 2020). Most similar to our approach is Ye et al. (2019), who propose a per-batch contrastive loss that minimizes the distance between an image and an augmented version of it. Unlike our approach, they do not generalize to multiple augmented query images per prototype and use two extra fully connected layers during training. Concurrently, Li et al. (2020) also use a prototype-based contrastive loss. They compute the prototypes as centroids after clustering augmented images via $k$-means. They also separate the learning and clustering procedures, whereas ProtoTransfer combines them in a single procedure.
In this work, we proposed ProtoTransfer for few-shot classification. ProtoTransfer performs transfer learning from an unlabeled source domain to a target domain with only a few labeled examples. Our experiments show that on mini-ImageNet it outperforms all prior unsupervised few-shot learning approaches by a large margin. On a more challenging cross-domain few-shot classification benchmark, ProtoTransfer shows similar performance to fully supervised approaches. Our ablation studies show that large batch sizes are crucial to learning good representations for downstream few-shot classification tasks and that parametric fine-tuning on target tasks can significantly boost performance.
This work received support from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 754354.
DeVries, T. and Taylor, G. W. (2017). Improved Regularization of Convolutional Neural Networks with Cutout. arXiv preprint arXiv:1708.04552.
Finn, C., Abbeel, P., and Levine, S. (2017). Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In Proceedings of the 34th International Conference on Machine Learning, pp. 1126–1135.
Helber, P., Bischke, B., Dengel, A., and Borth, D. (2019). EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 12(7), pp. 2217–2226.
Oliver, A., Odena, A., Raffel, C., Cubuk, E. D., and Goodfellow, I. (2018). Realistic Evaluation of Deep Semi-Supervised Learning Algorithms. In Advances in Neural Information Processing Systems (NeurIPS), pp. 3235–3246.
Omniglot consists of 1623 handwritten characters from 50 alphabets and 20 examples per character. Identical to Vinyals et al. (2016), the grayscale images are resized to 28x28. Following Santoro et al. (2016), we use 1200 characters for training and 423 for testing.
Mini-ImageNet is a subset of the ILSVRC-12 dataset (Russakovsky et al., 2015), which contains 60,000 color images that we resized to 84x84. For comparability, we use the splits introduced by Ravi and Larochelle (2017) over 100 classes with 600 images each. 64 classes are used for pre-training and 20 for testing. We only use the 16 validation set classes for limited hyperparameter tuning of the batch size $n$, the number of queries $Q$ and the augmentation strengths.
We evaluate all cross-domain experiments on the CDFSL benchmark (Guo et al., 2019). It comprises four datasets with decreasing similarity to mini-ImageNet. In order of similarity, they are plant disease images from CropDiseases (Mohanty et al., 2016), satellite images from EuroSAT (Helber et al., 2019), dermatological images from ISIC2018 (Tschandl et al., 2018; Codella et al., 2019) and grayscale chest X-ray images from ChestX (Wang et al., 2017).
We use the Caltech-UCSD Birds-200-2011 (CUB) dataset Welinder et al. (2010); Wah et al. (2011) in our ablation studies. It is composed of 11,788 images from 200 different bird species. We follow the splits proposed by Hilliard et al. (2018) with 100 training, 50 validation and 50 test classes. We do not use the validation set classes.
In the following, we describe the experimental details for the individual experiments. We deliberately stay close to the parameters reported in prior work and do not perform an extensive hyperparameter search for our specific setup, as this can easily lead to performance overestimation compared to simpler approaches (Oliver et al., 2018). Table 6 summarizes the hyperparameters we used for ProtoTransfer.
| Hyperparameter | Omniglot | mini-ImageNet | CDFSL | CUB |
|---|---|---|---|---|
| Image input size | 28x28 | 84x84 | 224x224 | 84x84 |
| Learning rate decay factor | 0.5 | 0.5 | / | 0.5 |
| Learning rate decay period | 25,000 | 25,000 | / | 25,000 |
| Augmented queries ($Q$) | 3 | 3 | 3 | 3 |
| Training batch size ($n$) | 50 | 50 | 50 | 50 |
| Fine-tuning learning rate | 0.001 | 0.001 | 0.001 | 0.001 |
| Fine-tuning batch size | 5 | 5 | 5 | 5 |
| Fine-tune last layer | ✓ | ✓ | ✓ | ✓ |
Our mini-ImageNet and Omniglot experiments use the Conv-4 architecture proposed in Vinyals et al. (2016). Optimization uses Adam (Kingma and Ba, 2015) with an initial learning rate of 0.001, which is multiplied by a factor of 0.5 every 25,000 iterations. We use a batch size of 50. We do not use the validation set to select the best training epoch. Instead, training stops after 20,000 iterations without improvement in training accuracy.
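A sketch of this setup is shown below, assuming PyTorch; the Conv-4 definition follows the standard four-block architecture of Vinyals et al. (2016), and the variable names are illustrative.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """One Conv-4 block: 3x3 convolution, batch norm, ReLU, 2x2 max-pooling."""
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                         nn.BatchNorm2d(out_ch), nn.ReLU(), nn.MaxPool2d(2))

# Conv-4 embedding network: four 64-filter blocks followed by flattening.
encoder = nn.Sequential(conv_block(3, 64), conv_block(64, 64),
                        conv_block(64, 64), conv_block(64, 64), nn.Flatten())

# Adam with initial learning rate 0.001, halved every 25,000 iterations;
# the scheduler is stepped once per mini-batch iteration, not per epoch.
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=25000, gamma=0.5)
```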
Our experiments on the CDFSL challenge are based on the code provided by Guo et al. (2019). Following Guo et al. (2019), we use a ResNet-10 architecture that is pre-trained on mini-ImageNet images of size 224x224 for 400 epochs with Adam (Kingma and Ba, 2015) and the default learning rate of 0.001 for best comparability with the results reported in Guo et al. (2019). The batch size for self-supervised pre-training is 50. We do not use a validation set.
The CUB training is identical in terms of architecture (Conv-4) and optimization to the setup for our in-domain experiments.
During the fine-tuning stage, we add a fully connected classification layer after the embedding function and initialize it as described in Section 2.3. We split the support examples into batches of 5 images each and perform 15 fine-tuning epochs with Adam (Kingma and Ba, 2015) and an initial learning rate of 0.001. For the target datasets mini-ImageNet and Omniglot, only the last fully connected layer is optimized, while for the CDFSL benchmark experiments the embedding network is adapted as well.
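The following is a minimal sketch of this fine-tuning loop under the settings above; the function name and tensors are assumed for illustration, and the prototype-initialized `head` is the one described in Section 2.3.

```python
import torch
import torch.nn.functional as F

def prototune_finetune(encoder, head, support_x, support_y,
                       epochs=15, batch_size=5, lr=1e-3, tune_encoder=False):
    """Fine-tune the prototype-initialized head on the support set;
    set tune_encoder=True to also adapt the embedding network (as for CDFSL)."""
    params = list(head.parameters())
    if tune_encoder:
        params += list(encoder.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        perm = torch.randperm(support_x.size(0))
        for i in range(0, perm.numel(), batch_size):
            idx = perm[i:i + batch_size]
            logits = head(encoder(support_x[idx]))
            loss = F.cross_entropy(logits, support_y[idx])
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return head
```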
For the CDFSL-benchmark (Guo et al., 2019) experiments we employ the same augmentations as Chen et al. (2020), as these have proven to work well for ImageNet (Russakovsky et al., 2015) images of size 224x224. They are as follows:
Random crop and resize: scale , aspect ratio , Bilinear filter with interpolation = 2
Random horizontal flip
Randomly applied color jitter: brightness = contrast = saturation = 0.8, hue = 0.2
Randomly applied grayscale
Gaussian blur, random radius
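For illustration, the transformations listed above could be composed in torchvision roughly as follows; the crop range, blur kernel size and application probabilities are the SimCLR defaults (Chen et al., 2020) and are assumptions here, since the exact values are not restated above.

```python
from torchvision import transforms

# Sketch of the CDFSL pre-training augmentation pipeline (assumed SimCLR-default values).
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomApply([transforms.ColorJitter(0.8, 0.8, 0.8, 0.2)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.RandomApply([transforms.GaussianBlur(23, sigma=(0.1, 2.0))], p=0.5),
    transforms.ToTensor(),
])
```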
For the mini-ImageNet and CUB experiments we used lighter versions of the Chen et al. (2020) augmentations, namely no Gaussian blur, lower color jitter strengths and smaller rescaling and cropping ranges. They are as follows:
Random crop and resize: scale , aspect ratio , Bilinear filter with interpolation = 2
Random horizontal flip
Random vertical flip
Randomly applied color jitter: brightness = contrast = saturation = 0.4, hue = 0.2
Randomly applied grayscale
For Omniglot we use a set of custom augmentations, namely random resizing and cropping, horizontal and vertical flipping, Image-Pixel Dropout (Krizhevsky et al., 2012) and Cutout (DeVries and Taylor, 2017). They are as follows:
Resize to a size of 28x28 pixels
Random crop and resize: scale , aspect ratio , Bilinear filter with interpolation = 2
Random horizontal flip
Random vertical flip
Randomly applied image-pixel dropout
Random erasing of a rectangular region in an image (Zhong et al., 2020), setting pixel values to 0: scale , aspect ratio
The classes in the t-SNE plots are a random subset of classes from the mini-ImageNet base classes (classes 1-5) and the mini-ImageNet novel classes (classes 6-10). Their corresponding labels are the following:
n02687172 aircraft carrier
n02823428 beer bottle
n03400231 frying pan
n03272010 electric guitar
n03775546 mixing bowl
n04146614 school bus
Each of the t-SNE plots in Figure 3 shows 500 randomly selected embedded images from within those classes.
Table 1: Accuracies (%) on Omniglot (way, shot) few-shot classification tasks.

| Method | (5,1) | (5,5) | (20,1) | (20,5) |
|---|---|---|---|---|
| Training (scratch) | 52.50 ± 0.84 | 74.78 ± 0.69 | 24.91 ± 0.33 | 47.62 ± 0.44 |
| CACTUs-MAML¹ | 68.84 ± 0.80 | 87.78 ± 0.50 | 48.09 ± 0.41 | 73.36 ± 0.34 |
| CACTUs-ProtoNet¹ | 68.12 ± 0.84 | 83.58 ± 0.61 | 47.75 ± 0.43 | 66.27 ± 0.37 |
| UMTRA² | 83.80 | 95.43 | 74.25 | 92.12 |
| AAL-ProtoNet³ | 84.66 ± 0.70 | 89.14 ± 0.27 | 68.79 ± 1.03 | 74.28 ± 0.46 |
| AAL-MAML++³ | 88.40 ± 0.75 | 97.96 ± 0.32 | 70.21 ± 0.86 | 88.32 ± 1.22 |
| UFLST⁴ | 97.03 | 99.19 | 91.28 | 97.37 |
| ProtoTransfer (ours) | 88.00 ± 0.64 | 96.48 ± 0.26 | 72.27 ± 0.47 | 89.08 ± 0.23 |
| MAML¹ | 94.46 ± 0.35 | 98.83 ± 0.12 | 84.60 ± 0.32 | 96.29 ± 0.13 |
| ProtoNet | 97.70 ± 0.29 | 99.28 ± 0.10 | 94.40 ± 0.23 | 98.39 ± 0.08 |
| Pre+Linear | 94.30 ± 0.43 | 99.08 ± 0.10 | 86.05 ± 0.34 | 97.11 ± 0.11 |

Table 1 (continued): Accuracies (%) on mini-ImageNet (way, shot) few-shot classification tasks.

| Method | (5,1) | (5,5) | (5,20) | (5,50) |
|---|---|---|---|---|
| Training (scratch) | 27.59 ± 0.59 | 38.48 ± 0.66 | 51.53 ± 0.72 | 59.63 ± 0.74 |
| CACTUs-MAML¹ | 39.90 ± 0.74 | 53.97 ± 0.70 | 63.84 ± 0.70 | 69.64 ± 0.63 |
| CACTUs-ProtoNet¹ | 39.18 ± 0.71 | 53.36 ± 0.70 | 61.54 ± 0.68 | 63.55 ± 0.64 |
| UMTRA² | 39.93 | 50.73 | 61.11 | 67.15 |
| AAL-ProtoNet³ | 37.67 ± 0.39 | 40.29 ± 0.68 | - | - |
| AAL-MAML++³ | 34.57 ± 0.74 | 49.18 ± 0.47 | - | - |
| UFLST⁴ | 33.77 ± 0.70 | 45.03 ± 0.73 | 53.35 ± 0.59 | 56.72 ± 0.67 |
| ULDA-ProtoNet⁵ | 40.63 ± 0.61 | 55.41 ± 0.57 | 63.16 ± 0.51 | 65.20 ± 0.50 |
| ULDA-MetaOptNet⁵ | 40.71 ± 0.62 | 54.49 ± 0.58 | 63.58 ± 0.51 | 67.65 ± 0.48 |
| ProtoTransfer (ours) | 45.67 ± 0.79 | 62.99 ± 0.75 | 72.34 ± 0.58 | 77.22 ± 0.52 |
| MAML¹ | 46.81 ± 0.77 | 62.13 ± 0.72 | 71.03 ± 0.69 | 75.54 ± 0.62 |
| ProtoNet | 46.44 ± 0.78 | 66.33 ± 0.68 | 76.73 ± 0.54 | 78.91 ± 0.57 |
| Pre+Linear | 43.87 ± 0.69 | 63.01 ± 0.71 | 75.46 ± 0.58 | 80.17 ± 0.51 |

¹ Hsu et al. (2019)
² Khodadadeh et al. (2019)
³ Antoniou and Storkey (2019)
⁴ Ji et al. (2019)
⁵ Qin et al. (2020)