ReLaB: Reliable Label Bootstrapping for Semi-Supervised Learning

07/23/2020 ∙ by Paul Albert, et al. ∙ Insight Centre for Data Analytics

Reducing the amount of labels required to train convolutional neural networks without performance degradation is key to effectively reducing human annotation effort. We propose Reliable Label Bootstrapping (ReLaB), an unsupervised preprocessing algorithm that paves the way for semi-supervised learning solutions, enabling them to work with much lower supervision. Given a dataset with few labeled samples, we first exploit a self-supervised learning algorithm to learn unsupervised latent features, then apply a label propagation algorithm on these features, and finally select only correctly labeled samples using a label noise detection algorithm. This enables ReLaB to create a reliable extended labeled set from the initially few labeled samples that can then be used for semi-supervised learning. We show that the choice of the network architecture and the self-supervised method is important to achieve successful label propagation and demonstrate that ReLaB substantially improves semi-supervised learning in scenarios of very limited supervision in CIFAR-10, CIFAR-100, and mini-ImageNet. Code:




I Introduction

Convolutional neural networks (CNNs) are now the established standard for visual representation learning [10, 26, 60], yet one of their most prevalent limitations is the large quantity of labeled data required to better exploit them. Although enormous quantities of unlabeled data are now accessible and can be collected with minimal effort, the annotation process remains limited by human intervention [12, 30, 58, 67]. Representation learning has great potential to address this and the research community is actively developing new algorithms to train CNNs with little to no supervision [7, 19].

In the absence of labels, the self-supervised paradigm for unsupervised visual representation learning has recently become popular [3, 14, 18, 19, 65]. Self-supervised learning defines a pretext task where labels are automatically generated and serve as a supervisory training signal. By solving pretext tasks such as colorization of grayscale images [65], prediction of image rotations [19], or prediction of automatically estimated cluster assignments [9], CNNs can learn general representations that reduce the amount of supervision needed for downstream tasks.

Despite improvements in methods for learning general representations using self-supervision, labels are still required to solve downstream tasks [36]. Automatic annotation of data is a plausible answer [32], but it unavoidably introduces some incorrect or noisy labels. To prevent these labels from harming the learned representations [39], label noise-resistant training of CNNs is often necessary [13, 40, 44, 49]. In particular, the small loss trick [39] associates examples with a low (high) training loss with clean (noisy) labels. Distinguishing between clean and noisy samples helps with discarding noisy labels [13, 39], correcting labels [1, 44], or reducing their effect on parameter updates [20].

Aiming to reduce the labeling effort, semi-supervised learning jointly exploits a small set of labeled samples and large quantities of unlabeled ones. In particular, consistency regularization methods (e.g. [7, 50]) encourage consistency in the predictions for the same sample under different perturbations while pseudo-labeling methods (e.g. [46, 2]) directly generate labels for the unlabeled samples. Recent work [56, 6] has allowed semi-supervised algorithms to work with very few labels, aiming to minimize human annotation. Berthelot et al. [6] use self-supervised regularization based on [19] to stabilize network training in cases of extremely few labels and Wang et al. [56] use the self-supervision approach from [64] to regularize the MixMatch algorithm [7]. Finally, Rebuffi et al. [43] make use of self-supervision [19] to initialize the network before a two-stage semi-supervised training, achieving substantial improvements over a random initialization.

This paper contributes to a further reduction of human supervision by proposing Reliable Label Bootstrapping (ReLaB), a novel approach to exploiting knowledge transfer from self-supervised learning and paving the way for semi-supervised learning with very scarce annotation. We exploit synergies between label noise, self-supervised, and semi-supervised learning to bootstrap additional reliable labels from a small set of seed samples. In particular, we leverage label propagation algorithms in a self-supervised feature space to extend the provided labels to the entirety of the samples, select a trusted clean subset from this noisy dataset and use the selected subset for semi-supervised training. This enables strong performance for very limited supervision, where we outperform direct training of recent semi-supervised methods and reduce the sensitivity to the initial labeled samples.

II Related Work

There have been many attempts in the literature to reduce the amount of strong supervision required to train deep neural networks. These include tasks such as transfer learning [61] or few-shot learning [17], where supervised pre-trained features are exploited, and semi-supervised learning [38], self-supervised learning [29], or learning with label noise [1], where all features are learned on the same dataset. This paper focuses on the latter; the following reviews some closely related literature.

Semi-supervised learning seeks to reduce human supervision by jointly learning from sparsely labeled data and extensive unlabeled data. Semi-supervised learning has evolved rapidly in recent years by exploiting two main strategies [38]: consistency regularization and pseudo-labeling. Consistency regularization promotes consistency in the network’s predictions for the same unlabeled sample altered by different perturbations. Notable examples of consistency regularization algorithms are [35], where samples are perturbed by virtual adversarial attacks, [50], where a teacher network is built from the exponential moving average of a student network’s weights to produce perturbed predictions, and [52], which encourages predictions of interpolated samples to be consistent with the interpolation of the predictions. Recently, Berthelot et al. [7] proposed MixMatch, where perturbed predictions are generated by means of data-augmented sharpened labels, and labeled and unlabeled examples are mixed together using mixup [63]. MixMatch was extended in ReMixMatch [6] by exploiting distribution alignment [8] and an augmentation anchoring policy. Pseudo-labeling, on the other hand, directly exploits the network predictions on unlabeled samples by using them as labels (pseudo-labels) to regularize training. [15] is an early attempt at pseudo-labeling but is limited to a finetuning stage on a pre-trained network. [23] implements a graph-based, weighted pseudo-label generation based on a label propagation algorithm, and [45] derive certainty weights for unlabeled samples from their distance to neighboring samples in the feature space. Recently, Arazo et al. [2] have shown that pure pseudo-labeling without consistency regularization can reach competitive performance when addressing confirmation bias [33].
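The two MixMatch ingredients mentioned above, sharpening guessed labels and mixing examples with mixup [63], can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not the MixMatch implementation: the temperature T = 0.5 and the Beta parameter alpha = 0.75 are assumed defaults, and taking max(λ, 1 − λ) is the MixMatch variant of mixup that keeps the mixed example closer to its first input.

```python
import numpy as np


def sharpen(p, T=0.5):
    """Temperature sharpening: raise probabilities to 1/T and renormalize.

    T = 0.5 is an assumed value; a lower T yields more confident targets.
    """
    p = np.asarray(p, dtype=float) ** (1.0 / T)
    return p / p.sum(axis=-1, keepdims=True)


def mixup(x1, y1, x2, y2, alpha=0.75, rng=None):
    """Convex combination of two examples and their (soft) labels.

    lam ~ Beta(alpha, alpha); max(lam, 1 - lam) keeps the mixed example closer
    to (x1, y1), as in MixMatch. alpha = 0.75 is an assumed value.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2


# Label guessing: average predictions over two augmentations, then sharpen.
guess = sharpen(np.mean([[0.5, 0.3, 0.2], [0.7, 0.2, 0.1]], axis=0))
```

Sharpening lowers the entropy of the guessed label, while mixup interpolates both inputs and targets, so labeled and unlabeled examples can be mixed with the same function.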

Self-supervised learning defines proxy or pretext tasks to learn useful representations without human intervention [29]. Context prediction [14], colorization [65], puzzle solving [37], instance discrimination [55], image rotation prediction [19], and image transformation prediction [64] are some examples of pretext tasks. Some recent efforts on self-supervised learning generate meta-labels via k-means clustering in the feature space [9] or by solving an optimal transport problem [3]. Conversely, [25] explore the feature space by iteratively constructing local neighborhoods with a high instance discrimination consistency to learn useful representations.
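As an illustration of how a pretext task generates its labels automatically, the rotation-prediction task of RotNet [19] can be built by rotating each image by 0, 90, 180, and 270 degrees and using the rotation index as the label. The sketch below assumes square images stored as a NumPy array of shape (B, H, W, C); the function name is hypothetical, and training the CNN on the resulting batch is not shown.

```python
import numpy as np


def rotation_pretext_batch(images):
    """Return every image rotated by 0, 90, 180 and 270 degrees plus rotation labels.

    images: array of shape (B, H, W, C) with H == W (assumed), so that all
    rotated copies share one shape and can be stacked.
    """
    rotated, labels = [], []
    for img in images:
        for k in range(4):  # k quarter-turns -> pretext label k
            rotated.append(np.rot90(img, k=k, axes=(0, 1)))  # rotate in the H-W plane
            labels.append(k)
    return np.stack(rotated), np.array(labels)
```

A classifier trained to predict these labels must recognize object orientation, which is what makes the learned features useful for downstream tasks.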

Recent contributions show that coupling self-supervised and semi-supervised learning can increase accuracy with fewer labels. Rebuffi et al. [43] use RotNet [19] as a network initialization strategy, ReMixMatch [6] exploits RotNet [19] together with its semi-supervised algorithm to achieve stability with few labels, and EnAET [56] leverages transformation encoding from AET [64] to improve the consistency of predictions on transformed images.

Label propagation transfers information from labeled data to an unlabeled dataset [5]. The process stems from random walk diffusion for image retrieval [16, 48, 66], where a pairwise affinity matrix is constructed, relating images to each other, before diffusing the affinity values to the entire graph. The diffusion result can be directly used to estimate labels and finetune pre-trained networks in few-shot learning [17] or to define pseudo-labels for semi-supervised learning [23]. Other attempts at semi-supervised learning exploit label propagation to dynamically capture the manifold’s structure and regularize it to form compact clusters that facilitate class separation [27], or to encourage random walks that end up in the same class from which they started while penalizing walks that end in a different class [22].

Label noise is a topic of increasing interest for the community [39]; research in this area aims at limiting the degradation of CNN representations when learning under label noise conditions [62]. Label noise algorithms can be categorized into three different approaches: loss correction [20, 40, 44], relabeling [49, 59], and semi-supervised [13, 39]. Loss correction seeks to reduce the contribution of the incorrect or noisy labels in the training objective. The authors of [44] define per-sample losses based on combining both the potentially noisy label and the potentially clean network prediction, and [1] extend this idea by dynamically defining such combinations in an attempt to fully dismiss the contribution of noisy labels. Other loss correction approaches multiply the softmax probability by a label noise transition matrix that specifies the probability of one label being flipped to another [21, 40], whereas per-sample weights that reduce the influence of noisy samples have also been explored [20, 54]. Relabeling approaches propose to avoid fitting noisy labels by relabeling all samples using either the network predictions [49] or estimated label distributions [59] as soft labels. Semi-supervised methods detect the noisy samples, discard their harmful labels, and exploit their content in a semi-supervised setup [13, 28, 39]. Finally, a recurrent observation used to identify clean samples is the small loss trick [1, 20, 39, 49], where clean samples exhibit a lower loss as they represent easier patterns. It is worth mentioning that mixup data augmentation [63] has shown good performance when dealing with label noise in real scenarios [39] without explicitly addressing it.
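Two of the loss-correction ideas above can be sketched for a batch of logits. Forward correction multiplies the softmax output by a transition matrix T, where T[i, j] is assumed to be the probability that clean class i is observed as class j; the bootstrapping-style loss, in the spirit of the label-plus-prediction combination of [44], replaces the target with a convex combination of the given label and the network prediction. Function names and beta = 0.95 are illustrative assumptions, and in practice T has to be estimated (for example from trusted data, as in [21]).

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def forward_corrected_ce(logits, noisy_labels, T):
    """Cross-entropy of the observed labels under p @ T.

    T[i, j] is the (assumed known) probability that clean class i is
    observed as noisy class j, so each row of T sums to 1.
    """
    p_noisy = softmax(logits) @ T
    rows = np.arange(len(noisy_labels))
    return -np.log(p_noisy[rows, noisy_labels] + 1e-12)


def soft_bootstrapping_ce(logits, noisy_labels, beta=0.95):
    """Cross-entropy against beta * given label + (1 - beta) * prediction."""
    p = softmax(logits)
    y = np.eye(p.shape[1])[noisy_labels]  # one-hot given labels
    target = beta * y + (1.0 - beta) * p
    return -(target * np.log(p + 1e-12)).sum(axis=1)
```

With T equal to the identity, or with beta = 1, both functions reduce to the standard per-sample cross-entropy.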

Fig. 1: Reliable Label Bootstrapping (ReLaB) overview (best viewed in color). Unlike traditional SSL (bottom), which directly uses the labeled examples provided (airplane), ReLaB bootstraps additional labels before applying SSL (top). Unsupervised learning using labeled (black) and unlabeled (gray) samples is done to obtain discriminative representations, and label propagation jointly exploits unsupervised representations and labeled examples to label all data, which leads to both correct (green) and incorrect (red) labels. A sample selection is finally performed to avoid noisy labels and create a reliable extended labeled set.

III Reliable label bootstrapping for semi-supervised learning

We formulate a semi-supervised classification task for $C$ classes as learning a model $h_\theta$ given a training set $\mathcal{D}$ of $N$ samples. The dataset consists of the labeled set $\mathcal{D}_l = \{x_i\}_{i=1}^{N_l}$ with corresponding one-hot encoded labels $\{y_i\}_{i=1}^{N_l}$ and the unlabeled set $\mathcal{D}_u = \{x_i\}_{i=N_l+1}^{N}$, being $N = N_l + N_u$. We consider a CNN $h_\theta(x)$ for $C$ classes, where $\theta$ denotes the model parameters. The network comprises a feature extractor $g_\phi$ with parameters $\phi$, which maps the input space into the feature space, and a classifier $f_\psi$ with parameters $\psi$. Substantially decreasing the number of labels significantly decreases semi-supervised learning performance [6]. We therefore propose to bootstrap additional labels for unlabeled samples from $\mathcal{D}_u$. First, label propagation [16, 23, 24, 48, 51] is performed using self-supervised visual representations to estimate labels for the unlabeled set and create an extended dataset $\tilde{\mathcal{D}}$. Second, the small loss trick from the label noise literature [39] is used to select reliable samples from $\tilde{\mathcal{D}}$ whose labels can be trusted (i.e. are not noisy) to create a reliable extended labeled set $\mathcal{D}_r$. Finally, semi-supervised learning is applied to the reliable extended labeled set $\mathcal{D}_r$ and the unlabeled set. Figure 1 presents an overview of the proposed approach.

III-A Leveraging self-supervised representations for label propagation

Knowledge transfer from the labeled set to the unlabeled set is implicitly done by semi-supervised learning approaches, as network predictions for unlabeled samples can be seen as estimated labels. With few labeled samples, however, it is difficult to learn useful initial representations from the labeled set alone, and performance is substantially degraded [6] (see Subsection IV-E).

Although label propagation for semi-supervised learning has previously been studied as a regularization or as a semi-supervised objective [23], we propose here to follow an alternative direction, as our goal is first to leverage self-supervised features and second to label only a reliable subset. Given a set of descriptors learnt in an unsupervised manner, we seek an efficient label propagation algorithm capable of fitting the data manifold. Diffusion [16, 23, 24, 48, 51] is a well-documented label propagation algorithm that provides a good solution to our problem, and we formulate label estimation with diffusion in a similar fashion to [23]. Here we study the estimation of labels for the unlabeled set as a label propagation task using unsupervised visual representations learned from all data samples. In particular, we fit a feature extractor using self-supervision to obtain class-discriminative image representations [29] and subsequently propagate labels from the labeled images to estimate labels for the unlabeled samples. We do so by solving a label propagation problem based on graph diffusion [23]. First, the descriptors $\{v_i\}_{i=1}^{N}$ of the $N$ training samples are used to define the normalized affinity matrix:

$$\mathcal{W} = D^{-1/2} A D^{-1/2},$$

where $D = \mathrm{diag}(A\mathbf{1})$ is the degree matrix of the graph and the adjacency matrix $A$ is computed as $a_{ij} = [v_i^\top v_j]_+^{\gamma}$ if $i \neq j$ and $a_{ij} = 0$ otherwise, with $[\cdot]_+$ denoting the positive part. The exponent $\gamma$ weighs the affinity term to control the sensitivity to far neighbors and is set to 3 as in [23]. The diffusion process estimates the matrix $Z$ as:

$$Z = (I - \alpha \mathcal{W})^{-1} Y,$$

where $\alpha$ denotes the probability of jumping to adjacent vertices in the graph and $Y$ is the $N \times C$ label matrix for $C$ classes, defined such that $y_{ij} = 1$ if sample $x_i$ is labeled and belongs to the $j$-th class, and $y_{ij} = 0$ otherwise, where $i$ ($j$) indexes the rows (columns) of $Y$. Finally, the estimated one-hot label $\hat{y}_i$ of each unlabeled sample $x_i$ has its nonzero entry at

$$\hat{j}_i = \arg\max_{j} z_{ij}.$$

These estimated labels allow the creation of the extended dataset, which contains every training sample with either its given or its estimated label. Note that we follow common practices for image retrieval [4, 41] and perform PCA whitening as well as $\ell_2$ normalization on the features $v_i$.
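A minimal dense sketch of the diffusion described above, assuming few enough samples that an N × N matrix fits in memory. It ℓ2-normalizes the descriptors, clamps negative similarities to zero before raising them to γ = 3, normalizes the graph symmetrically, solves (I − αW)Z = Y, and takes the arg max of each row. The value α = 0.99, the clamping, and the function name are assumptions, and the PCA whitening step is omitted.

```python
import numpy as np


def diffusion_label_propagation(features, labeled_idx, labels, num_classes,
                                gamma=3.0, alpha=0.99):
    """Estimate a label for every sample by graph diffusion (dense sketch).

    features: (N, d) self-supervised descriptors; labeled_idx / labels: indices
    and class ids of the seed samples. gamma = 3 follows the text; alpha = 0.99
    and clamping negative similarities to 0 are assumptions.
    """
    v = features / np.linalg.norm(features, axis=1, keepdims=True)  # l2-normalize
    A = np.clip(v @ v.T, 0.0, None) ** gamma  # a_ij = max(v_i . v_j, 0)^gamma
    np.fill_diagonal(A, 0.0)                  # no self-loops (a_ii = 0)
    d = A.sum(axis=1)                          # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    W = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]  # D^{-1/2} A D^{-1/2}
    n = A.shape[0]
    Y = np.zeros((n, num_classes))
    Y[labeled_idx, labels] = 1.0               # one-hot rows for seed samples only
    Z = np.linalg.solve(np.eye(n) - alpha * W, Y)  # (I - alpha W) Z = Y
    return Z.argmax(axis=1)                    # estimated class per sample


# Two well-separated groups with one seed each (samples 0 and 3).
feats = np.array([[1.0, 0.1], [1.0, -0.1], [0.95, 0.2],
                  [0.1, 1.0], [-0.1, 1.0], [0.2, 0.95]])
pred = diffusion_label_propagation(feats, np.array([0, 3]), np.array([0, 1]), 2)
```

Because α < 1 and the normalized affinity matrix has spectral radius at most 1, the system is always solvable. For full training sets a dense N × N solve is impractical; a sparse k-nearest-neighbour graph with an iterative solver (conjugate gradient in [23]) is the usual alternative.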

III-B Reliable sample selection: dealing with noisy labels

Propagating existing labels using self-supervised representations as described in Subsection III-A results in estimated labels that might be incorrect, i.e. label noise. Using these noisy labels as a supervised objective on the extended dataset leads to performance degradation due to label noise memorization [39, 62] (see Table III in Subsection IV-C). Since the label noise present in the extended dataset comes from features extracted from the data, noisy samples tend to be visually similar to the seed samples. Consequently, state-of-the-art robust algorithms for supervised learning with label noise [1, 49, 63], principally designed to work on symmetric noise distributions, underperform (see Subsection IV-C). The small loss trick [1, 39, 49] states that samples with a smaller loss are more likely to be clean than their high-loss counterparts. Previous works utilizing the small loss trick have proven its efficiency for artificial noise distributions, and our selection of a clean subset followed by semi-supervised training follows a similar approach [13, 28, 39]. However, our feature-based noise generated after label propagation is unbalanced in both the number of samples and the level of noise in each class, thus posing a difficult scenario that has not been addressed in the label noise literature. We therefore propose a different method to identify clean samples using the per-sample cross-entropy loss:

$$\ell_i = -\sum_{c=1}^{C} \tilde{y}_{ic} \log h_c(x_i),$$

where $\tilde{y}_i$ is the one-hot label (given or estimated) of sample $x_i$ in the extended dataset, $C$ is the number of classes, and $h(x_i)$ denotes the softmax-normalized logits of a network trained on the extended dataset with a high learning rate, which helps prevent label noise memorization [1]. Samples whose associated loss is low are more likely to have a correct label. The reliable set is then created by selecting, for each class, the originally labeled samples of that class and the samples of that class in the extended dataset with the lowest loss, i.e. highly reliable samples. The challenging noise present in the extended dataset makes the loss during any particular epoch unstable (see Figure 2). We therefore propose to average it over the final training epochs before ranking the samples. We set the number of reliable samples per class equally for all classes and choose it based on the amounts of labeled data traditionally reported as baselines for semi-supervised experiments [2, 6, 38], which in CIFAR-10 usually suffice for convergence to reasonable performance. Table II shows that the approach and the noise percentage of the generated dataset are not overly sensitive to this hyperparameter.
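The selection step described above can be sketched as follows, assuming the per-sample losses of every epoch were recorded in an array `loss_history` of shape (epochs, N) while training on the extended dataset with a high learning rate (the training loop itself is not shown). `per_sample_ce` computes the cross-entropy from logits with a numerically stable log-softmax; `select_reliable` averages the losses over the last `last_epochs` epochs, keeps every originally labeled sample, and fills each class up to `per_class` samples with the lowest average loss. All names and arguments are hypothetical, and the estimated labels of seed samples are assumed to equal their given labels.

```python
import numpy as np


def per_sample_ce(logits, labels):
    """Per-sample cross-entropy with softmax-normalized logits (stable log-softmax)."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]


def select_reliable(loss_history, est_labels, seed_idx, per_class, last_epochs):
    """Indices of the reliable set: all seeds plus the lowest average-loss samples per class."""
    avg_loss = loss_history[-last_epochs:].mean(axis=0)  # average over the final epochs
    seed_idx = np.asarray(seed_idx)
    selected = []
    for c in np.unique(est_labels):
        seeds_c = seed_idx[est_labels[seed_idx] == c]  # originally labeled samples of class c
        candidates = np.setdiff1d(np.flatnonzero(est_labels == c), seeds_c)
        n_extra = max(per_class - len(seeds_c), 0)
        ranked = candidates[np.argsort(avg_loss[candidates], kind="stable")]
        selected.extend(seeds_c.tolist() + ranked[:n_extra].tolist())
    return np.sort(np.array(selected))


# Toy example: 6 samples, classes [0, 0, 0, 1, 1, 1], seeds 0 and 3, 2 samples per class.
est = np.array([0, 0, 0, 1, 1, 1])
history = np.array([
    [0.1, 0.1, 5.0, 0.1, 5.0, 0.1],  # an unstable early epoch
    [0.1, 2.0, 0.5, 0.1, 0.3, 1.5],
    [0.1, 1.8, 0.7, 0.1, 0.5, 1.7],
])
print(select_reliable(history, est, seed_idx=[0, 3], per_class=2, last_epochs=2))  # [0 2 3 4]
```

In the toy example, averaging over the last two epochs selects samples 2 and 4, whereas averaging over all three epochs would let the spike in the first epoch push the selection towards samples 1 and 5. Averaging over a window rather than trusting a single epoch is what makes the ranking robust to such fluctuations.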

III-C Semi-supervised learning

Unlike traditional learning from the labeled and unlabeled sets, ReLaB empowers semi-supervised algorithms with a (larger) reliable labeled set extended from the original (smaller) labeled set. The extension is done in a completely unsupervised manner and, as a consequence, we greatly reduce the error rates of SSL algorithms when few labels are given; e.g. the error of ReMixMatch [6] in CIFAR-10 with one labeled sample per class is substantially reduced by ReLaB when using representative labeled samples (see Subsection IV-E).

Fig. 2: Label noise percentage of the reliable extended set. Selecting samples with the per-epoch loss makes the noise percentage fluctuate strongly, whereas averaging losses across epochs provides a stable and low label noise percentage. Example obtained when applying ReLaB to CIFAR-10 with 1 labeled sample per class.

IV Experiments

IV-A Datasets and implementation details

We experiment with three image classification datasets: CIFAR-10 [31], CIFAR-100 [31], and mini-ImageNet [53]. Each dataset consists of 60K RGB images (of higher resolution in mini-ImageNet), split into 50K training samples and 10K test samples. CIFAR-10 samples are organized into 10 classes, while CIFAR-100 and mini-ImageNet samples are organized into 100.

We construct the reliable set by training for 60 epochs with a high learning rate (0.1) to prevent label noise memorization [1] and selecting the lowest-loss samples per class at the end of training. We average the per-sample loss over the final epochs of training to stabilize the reliable sample selection (see Figure 2). Regarding SSL, we always use a standard WideResNet-28-2 [60] for a fair comparison with related work. We combine our approach with state-of-the-art pseudo-labeling [2] and consistency regularization-based [6] semi-supervised methods to show that ReLaB is stable across different semi-supervised strategies. We use the default configuration for pseudo-labeling except for the network initialization, where we make use of self-supervision [19] and freeze all the layers up to the last convolutional block, in a similar fashion to [43]. The network is warmed up on the labeled set as in [2] and then trained on the whole dataset. For ReMixMatch, we found that no initialization was necessary. Experiments in Subsection IV-C for the supervised alternatives for dealing with label noise [1, 63] follow the authors’ configurations, while cross-entropy training in Table III is done for 150 epochs with an initial learning rate of 0.1 that we divide by 10 at epochs 80 and 130.
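The cross-entropy baseline schedule described above (initial learning rate 0.1, divided by 10 at epochs 80 and 130, 150 epochs in total) is a step schedule. The helper below is a hypothetical sketch that assumes 0-indexed epochs and one learning-rate update per epoch; in PyTorch the same behaviour is typically obtained with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80, 130], gamma=0.1)` stepped once per epoch.

```python
def step_lr(epoch, base_lr=0.1, milestones=(80, 130), gamma=0.1):
    """Learning rate at a 0-indexed epoch: base_lr divided by 10 at each milestone reached."""
    num_decays = sum(epoch >= m for m in milestones)  # milestones already passed
    return base_lr * gamma ** num_decays


# Full schedule for the 150-epoch cross-entropy baseline.
schedule = [step_lr(e) for e in range(150)]
```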

Labels/class CIFAR-10: 1 4 10 CIFAR-100: 4 10 25
RotNet [19] WRN-28-2
NPID [55] WRN-28-2
UEL [34] WRN-28-2
AND [25] WRN-28-2
TABLE I: Label noise percentage in the extended dataset after label propagation for different self-supervised methods and architectures. The average and the standard deviation are reported over 3 runs with different labeled samples. Lower is better.
Noise (%) SSL error Noise (%) SSL error
ReLaB + RMM ()
ReLaB + RMM ()
ReLaB + RMM ()
ReLaB + RMM ()
TABLE II: Sensitivity to the reliable subset size with 4 labeled samples per class. We report the label noise percentage in the reliable set and the final error rates after semi-supervised training with ReMixMatch (RMM) [6].

IV-B Importance of the self-supervised representations for label propagation

Label propagation relies upon representations extracted from the data and is therefore conditioned by their quality. We propose to exploit unsupervised learning to obtain these representations, and this choice strongly impacts the label propagation proposed in Subsection III-A (see Table I). In particular, we present the label noise percentage of the extended labeled set in CIFAR-10 (100) formed after label propagation with the specified self-supervised representations, using 1, 4, and 10 (4, 10, and 25) labeled samples per class. We select RotNet [19], NPID [55], UEL [34], and AND [25] as four recent self-supervised methods, and experiment with the WideResNet-28-2 (WRN-28-2) [60], ResNet-18 (RN-18), and ResNet-50 (RN-50) [26] architectures. We confirm that the architecture has a key impact on the label noise percentage, which agrees with previous observations on the quality of self-supervised features from larger architectures [29]. More capacity does not reduce the noise percentage for RotNet, whereas NPID, UEL, and AND are more stable across architectures and different amounts of labels. We select AND coupled with ResNet-50 for learning self-supervised features suitable for label propagation in the subsequent experiments.
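The label noise percentage reported in Table I can be computed by comparing the propagated labels with the ground-truth labels, which are used here only for evaluation. The sketch below also reports the noise per estimated class, since Subsection III-B notes that the propagated noise is unbalanced across classes; the function name and the choice of grouping by estimated label are assumptions.

```python
import numpy as np


def label_noise_percentage(est_labels, true_labels, num_classes):
    """Percentage of wrongly propagated labels, overall and per estimated class.

    Ground-truth labels are used only to evaluate the propagation; classes that
    received no samples get NaN.
    """
    est = np.asarray(est_labels)
    wrong = est != np.asarray(true_labels)
    overall = 100.0 * wrong.mean()
    per_class = np.array([
        100.0 * wrong[est == c].mean() if np.any(est == c) else np.nan
        for c in range(num_classes)
    ])
    return overall, per_class
```

Grouping by the estimated label measures how trustworthy each propagated class is, which is the quantity the per-class reliable selection has to cope with.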

IV-C Dealing with noisy labels

M [63]
DB [1]
DB + AA [11]
ReLaB + PL [2]
ReLaB + RMM [6]
TABLE III: Error rates for ReLaB followed by SSL with 4 labeled samples per class in CIFAR-10 and CIFAR-100, compared to training directly on the noisy extended set with label noise robust methods [1, 63].
Labeled samples 10 40 100 250
Π-model [42] - - -
MT [50] - - -
PL [2]
MM [7] - -
UDA [57] - -
RMM [6]
EnAET [56] - -
ReLaB + PL
Labeled samples 100 400 1000 2500
Π-model [42] - - -
MT [50] - - -
PL [2]
MM [7] - -
UDA [57] - - - -
RMM [6]
EnAET [56] - - -
ReLaB + PL
TABLE IV: Effect of ReLaB on semi-supervised learning in CIFAR-10 (top) and CIFAR-100 (bottom) with very limited amounts of labeled data. Marked results are from [47] or [56], while the rest are from our own runs. Bold denotes best. The average error and the standard deviation are reported over 3 runs with different labeled samples. We do not report higher numbers of labeled samples, as the differences among recent SSL algorithms become statistically insignificant [47].

The extended dataset obtained after label propagation contains label noise; in Subsection III-B we proposed to reduce this noise by selecting a subset of the most reliable samples via the small loss trick. This reliable set is an extended labeled set when compared to the small initial labeled set. Here we analyze how the size of the reliable set affects its label noise percentage and the SSL performance. Table II shows that, although selecting more samples slightly increases the noise percentage, the semi-supervised errors are relatively insensitive to this and are sometimes even reduced, since more samples are considered. This tendency eventually ceases: beyond a certain subset size, more samples no longer compensate for the higher noise percentage. Based on this experiment and the typical amounts of labeled samples needed to perform successful SSL [2, 7, 23, 50], we fix the reliable set size for CIFAR-10 (100) in further experiments.

There are also supervised alternatives for dealing with label noise [1, 63]. Table III compares the proposed approach with standard cross-entropy (CE) training on the extended dataset and with recent label noise robust methods such as the noise-resistant mixup (M) augmentation [63] and the Dynamic Bootstrapping (DB) loss correction method [1]. In both CIFAR-10 and CIFAR-100, ReLaB + ReMixMatch (RMM) outperforms the supervised alternatives. This does not hold for ReLaB + Pseudo-labeling (PL) in CIFAR-100, which is slightly outperformed by DB. To demonstrate that ReLaB + RMM does not lead to better performance solely because of the stronger data augmentation used in RMM, we equip DB with the strong augmentation policy AutoAugment (AA) [11] (DB + AA). This improved DB is still far from ReLaB + RMM performance, demonstrating the utility of the reliable set selection followed by SSL compared to supervised alternatives.

IV-D Semi-supervised learning with Reliable Label Bootstrapping

Table IV shows the benefits of ReLaB for semi-supervised learning with PL [2] and ReMixMatch (RMM) [6] compared to direct application of semi-supervised methods in CIFAR-10/100. We focus on very low numbers of labeled samples, since semi-supervised methods [6] already achieve very good performance with larger numbers of labeled samples. ReLaB acts as a preprocessing step that extends the set of available labeled samples, thus enabling better performance of semi-supervised methods. We further study the 1 sample per class scenario in Subsection IV-E.

Table VI demonstrates the scalability of our approach to higher-resolution images by evaluating ReLaB + PL [2] on mini-ImageNet [53]. Due to GPU memory constraints, we use ResNet-18 instead of ResNet-50 to train AND with an acceptable batch size in the mini-ImageNet experiments.

IV-E Very low levels of labeled samples

The high standard deviation with 1 sample per class in CIFAR-10 (Table IV) calls for a more reliable protocol to compare against other approaches. To this end, we use the labeled subsets of [47], whose authors proposed 8 different labeled subsets for 1 sample per class in CIFAR-10, ordered from more to less representative. We reduce the experiments to 3 subsets: the most representative, the least representative, and one in the middle. Figure 3 shows the selected subsets; the exact sample ids are available on

Table V reports the performance for each subset and compares against FixMatch [47] and our configuration of ReMixMatch [6]. Note that the results obtained for the less representative samples reflect the results that can be expected on average when drawing labeled samples randomly (see Table IV). Furthermore, although accuracy varies considerably with 1 sample per class on CIFAR-10, the standard deviation over the CIFAR-100 and mini-ImageNet runs is low enough that results can be directly compared with other methods even when drawing the labeled samples randomly; we therefore omit the fixed-samples comparison for these datasets.

Fig. 3: Labeled samples used for the 1 sample per class study on CIFAR-10 and taken from [47], ordered from top to bottom from most representative to least representative.
ReMixMatch [6]
FixMatch [47]
ReLaB + PL
TABLE V: Error rates for 1 sample per class on CIFAR-10 with different labeled sets. ReLaB converges to better-than-random accuracy even for the least representative subset. All results are from our own runs except FixMatch [47]. Key: MR (Most Representative), LR (Less Representative), NR (Not Representative).
Labeled samples 100 400 1000 4000
PL [2]
ReLaB + PL
TABLE VI: Effect of ReLaB on semi-supervised learning on mini-ImageNet with very limited amounts of labeled data. We run the experiment with a WideResNet-28-2 to set a baseline comparable to the CIFAR datasets. Bold denotes best results. The average error and the standard deviation are reported over 3 runs with different labeled samples. For 4000 labeled samples, ReLaB does not bootstrap additional samples, since the labeled set is already at least as large as the reliable set.

V Conclusion

ReLaB is a label bootstrapping method that enables the use of standard semi-supervised algorithms with very sparsely labelled data by efficiently leveraging self-supervised learning. We extend the labeled pool through propagation in a self-supervised feature space and properly deal with label noise resulting from the automatic label assignment to extract an extended clean subset of labeled samples before training in a semi-supervised fashion. We demonstrate the direct impact of better unsupervised features on the performance of ReLaB and enable traditional semi-supervised algorithms to reach remarkable and stable accuracies with very few labeled samples on standard datasets.


This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under grant number [SFI/15/SIRG/3283] and [SFI/12/RC/2289_P2] as well as from the Department of Agriculture, Food and Marine on behalf of the Government of Ireland under Grant Number [16/RC/3835].


  • [1] E. Arazo, D. Ortego, P. Albert, N. O’Connor, and K. McGuinness (2019) Unsupervised Label Noise Modeling and Loss Correction. In

    International Conference on Machine Learning (ICML)

    Cited by: §I, §II, §II, §III-B, §IV-A, §IV-C, TABLE III.
  • [2] E. Arazo, D. Ortego, P. Albert, N.E. O’Connor, and K. McGuinness (2019) Pseudo-Labeling and Confirmation Bias in Deep Semi-Supervised Learning. arXiv: 1908.02983. Cited by: §I, §II, §III-B, §IV-A, §IV-C, §IV-D, §IV-D, TABLE III, TABLE IV, TABLE VI.
  • [3] Y. M. Asano, C. Rupprecht, and A. Vedaldi (2020) Self-labelling via simultaneous clustering and representation learning. In International Conference on Learning Representations (ICLR), Cited by: §I, §II.
  • [4] A. Babenko and V. S. Lempitsky (2015) Aggregating Deep Convolutional Features for Image Retrieval. In

    European Conference on Computer Vision (ECCV)

    Cited by: §III-A.
  • [5] Y. Bengio, O. Delalleau, and N. Le Roux (2006) Label propagation and quadratic criterion. Technical report Carnegie Mellon University. Cited by: §II.
  • [6] D. Berthelot, N. Carlini, E. Cubuk, A. Kurakin, K. Sohn, H. Zhang, and C. Raffel (2020) ReMixMatch: Semi-Supervised Learning with Distribution Matching and Augmentation Anchoring. In International Conference on Learning Representations (ICLR), Cited by: §I, §II, §II, §III-A, §III-B, §III-C, §III, §IV-A, §IV-D, §IV-E, TABLE II, TABLE III, TABLE IV, TABLE V.
  • [7] D. Berthelot, N. Carlini, I. Goodfellow, N. Papernot, A. Oliver, and C. Raffel (2019) MixMatch: A Holistic Approach to Semi-Supervised Learning. In Advances in Neural Information Processing Systems (NeuRIPS), Cited by: §I, §I, §II, §IV-C, TABLE IV.
  • [8] J. Bridle, A. Heading, and D.. MacKay (1992) Unsupervised Classifiers, Mutual Information and’Phantom Targets. In Advances in Neural Information Processing Systems (NeurIPS), Cited by: §II.
  • [9] M. Caron, P. Bojanowski, A. Joulin, and M. Douze (2018) Deep clustering for unsupervised learning of visual features. In Proceedings of the European Conference on Computer Vision (ECCV), Cited by: §I, §II.
  • [10] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam (2018) Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In European Conference on Computer Vision (ECCV), Cited by: §I.
  • [11] E. Cubuk, B. Zoph, D. Mané, V. Vasudevan, and Q. Le (2019) AutoAugment: Learning Augmentation Policies from Data. In

    IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

    Cited by: §IV-C, TABLE III.
  • [12] D. Damen, H. Doughty, G.M. Farinella, S. Fidler, A. Furnari, E. Kazakos, D. Moltisanti, J. Munro, T. Perrett, W. Price, and M. Wray (2018) Scaling Egocentric Vision: The EPIC-KITCHENS Dataset. In European Conference on Computer Vision (ECCV), Cited by: §I.
  • [13] Y. Ding, L. Wang, D. Fan, and B. Gong (2018) A Semi-Supervised Two-Stage Approach to Learning from Noisy Labels. In IEEE Winter Conference on Applications of Computer Vision (WACV), Cited by: §I, §II, §III-B.
  • [14] C. Doersch, A. Gupta, and A. Efros (2015) Unsupervised Visual Representation Learning by Context Prediction. In IEEE International Conference on Computer Vision (ICCV), Cited by: §I, §II.
  • [15] D.-H. Lee (2013) Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks. International Conference on Machine Learning Workshops (ICMLW). Cited by: §II.
  • [16] M. Donoser and H. Bischof (2013) Diffusion Processes for Retrieval Revisited. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §II, §III-A, §III.
  • [17] M. Douze, A. Szlam, B. Hariharan, and H. Jégou (2018) Low-shot learning with large-scale diffusion. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §II, §II.
  • [18] Z. Feng, C. Xu, and D. Tao (2019) Self-Supervised Representation Learning by Rotation Feature Decoupling. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §I.
  • [19] S. Gidaris, P. Singh, and N. Komodakis (2018) Unsupervised Representation Learning by Predicting Image Rotations. In International Conference on Learning Representations (ICLR), Cited by: §I, §I, §I, §II, §II, §IV-A, §IV-B, TABLE I.
  • [20] B. Han, Q. Yao, X. Yu, G. Niu, M. Xu, W. Hu, I. Tsang, and M. Sugiyama (2018) Co-teaching: Robust Training of Deep Neural Networks with Extremely Noisy Labels. In Advances in Neural Information Processing Systems (NeurIPS), Cited by: §I, §II.
  • [21] D. Hendrycks, M. Mazeika, D. Wilson, and K. Gimpel (2018) Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise. In Advances in Neural Information Processing Systems (NeurIPS), Cited by: §II.
  • [22] P. Haeusser, A. Mordvintsev, and D. Cremers (2017) Learning by Association - A versatile semi-supervised training method for neural networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §II.
  • [23] A. Iscen, G. Tolias, Y. Avrithis, and O. Chum (2019) Label propagation for deep semi-supervised learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §II, §II, §III-A, §III, §IV-C.
  • [24] A. Iscen, G. Tolias, Y. Avrithis, T. Furon, and O. Chum (2017) Efficient Diffusion on Region Manifolds: Recovering Small Objects with Compact CNN Representations. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §III-A, §III.
  • [25] J. Huang, Q. Dong, S. Gong, and X. Zhu (2019) Unsupervised Deep Learning by Neighbourhood Discovery. In International Conference on Machine Learning (ICML), Cited by: §II, §IV-B, TABLE I.
  • [26] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep Residual Learning for Image Recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §I, §IV-B.
  • [27] K. Kamnitsas, D. Castro, L. Le Folgoc, I. Walker, R. Tanno, D. Rueckert, B. Glocker, A. Criminisi, and A. V. Nori (2018) Semi-Supervised Learning via Compact Latent Space Clustering. In International Conference on Machine Learning (ICML), Cited by: §II.
  • [28] Y. Kim, J. Yim, J. Yun, and J. Kim (2019) NLNL: Negative Learning for Noisy Labels. In IEEE International Conference on Computer Vision (ICCV), Cited by: §II, §III-B.
  • [29] A. Kolesnikov, X. Zhai, and L. Beyer (2019) Revisiting self-supervised visual representation learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §II, §II, §III-A, §IV-B.
  • [30] R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L.-J. Li, D.A. Shamma, M. Bernstein, and F.-F. Li (2016) Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations. arXiv: 1602.07332. Cited by: §I.
  • [31] A. Krizhevsky and G. Hinton (2009) Learning multiple layers of features from tiny images. Technical report University of Toronto. Cited by: §IV-A.
  • [32] W. Li, L. Wang, W. Li, E. Agustsson, and L. Van Gool (2017) WebVision Database: Visual Learning and Understanding from Web Data. arXiv: 1708.02862. Cited by: §I.
  • [33] Y. Li, L. Liu, and R. Tan (2019) Certainty-Driven Consistency Loss for Semi-supervised Learning. arXiv: 1901.05657. Cited by: §II.
  • [34] M. Ye, X. Zhang, P. C. Yuen, and S.-F. Chang (2019) Unsupervised Embedding Learning via Invariant and Spreading Instance Feature. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §IV-B, TABLE I.
  • [35] T. Miyato, S. Maeda, S. Koyama, and S. Ishii (2017) Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). Cited by: §II.
  • [36] M. Noroozi, A. Vinjimoor, P. Favaro, and H. Pirsiavash (2018) Boosting Self-Supervised Learning via Knowledge Transfer. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §I.
  • [37] M. Noroozi and P. Favaro (2016) Unsupervised learning of visual representations by solving jigsaw puzzles. In European Conference on Computer Vision (ECCV), Cited by: §II.
  • [38] A. Oliver, A. Odena, C. Raffel, E. D. Cubuk, and I. Goodfellow (2018) Realistic evaluation of deep semi-supervised learning algorithms. In Advances in Neural Information Processing Systems (NeurIPS), Cited by: §II, §II, §III-B.
  • [39] D. Ortego, E. Arazo, P. Albert, N. O’Connor, and K. McGuinness (2019) Towards Robust Learning with Different Label Noise Distributions. arXiv: 1912.08741. Cited by: §I, §II, §III-B, §III.
  • [40] G. Patrini, A. Rozza, A. Krishna Menon, R. Nock, and L. Qu (2017) Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §I, §II.
  • [41] F. Radenovic, G. Tolias, and O. Chum (2018) Fine-tuning CNN Image Retrieval with No Human Annotation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). Cited by: §III-A.
  • [42] A. Rasmus, H. Valpola, M. Honkala, M. Berglund, and T. Raiko (2015) Semi-Supervised Learning with Ladder Networks. In Advances in Neural Information Processing Systems (NeurIPS), Cited by: TABLE IV.
  • [43] S.-A. Rebuffi, S. Ehrhardt, K. Han, A. Vedaldi, and A. Zisserman (2019) Semi-Supervised Learning with Scarce Annotations. arXiv: 1905.08845. Cited by: §I, §II, §IV-A.
  • [44] S. Reed, H. Lee, D. Anguelov, C. Szegedy, D. Erhan, and A. Rabinovich (2014) Training deep neural networks on noisy labels with bootstrapping. arXiv: 1412.6596. Cited by: §I, §II.
  • [45] W. Shi, Y. Gong, C. Ding, Z. Ma, X. Tao, and N. Zheng (2018) Transductive Semi-Supervised Deep Learning Using Min-Max Features. In European Conference on Computer Vision (ECCV), Cited by: §II.
  • [46] W. Shi, Y. Gong, C. Ding, Z. Ma, X. Tao, and N. Zheng (2018) Transductive Semi-Supervised Deep Learning Using Min-Max Features. In European Conference on Computer Vision (ECCV), Cited by: §I.
  • [47] K. Sohn, D. Berthelot, C.-L. Li, Z. Zhang, N. Carlini, E. Cubuk, A. Kurakin, H. Zhang, and C. Raffel (2020) FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence. arXiv: 2001.07685. Cited by: Fig. 3, §IV-E, §IV-E, TABLE IV, TABLE V.
  • [48] M. Szummer and T. Jaakkola (2002) Partially labeled classification with Markov random walks. In Advances in Neural Information Processing Systems (NeurIPS), Cited by: §II, §III-A, §III.
  • [49] D. Tanaka, D. Ikami, T. Yamasaki, and K. Aizawa (2018) Joint Optimization Framework for Learning with Noisy Labels. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §I, §II, §III-B.
  • [50] A. Tarvainen and H. Valpola (2017) Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In Advances in Neural Information Processing Systems (NeurIPS), Cited by: §I, §II, §IV-C, TABLE IV.
  • [51] G. Tolias, Y. Avrithis, and H. Jégou (2013) To Aggregate or Not to aggregate: Selective Match Kernels for Image Search. In IEEE International Conference on Computer Vision (ICCV), Cited by: §III-A, §III.
  • [52] V. Verma, A. Lamb, J. Kannala, Y. Bengio, and D. Lopez-Paz (2019) Interpolation Consistency Training for Semi-Supervised Learning. In International Joint Conference on Artificial Intelligence (IJCAI), Cited by: §II.
  • [53] O. Vinyals, C. Blundell, T. Lillicrap, K. Kavukcuoglu, and D. Wierstra (2016) Matching Networks for One Shot Learning. In Advances in Neural Information Processing Systems (NeurIPS), Cited by: §IV-A, §IV-D.
  • [54] Y. Wang, W. Liu, X. Ma, J. Bailey, H. Zha, L. Song, and S.-T. Xia (2018) Iterative learning with open-set noisy labels. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §II.
  • [55] Z. Wu, Y. Xiong, S. X. Yu, and D. Lin (2018) Unsupervised feature learning via non-parametric instance discrimination. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §II, §IV-B, TABLE I.
  • [56] X. Wang, D. Kihara, J. Luo, and G.-J. Qi (2019) EnAET: Self-Trained Ensemble AutoEncoding Transformations for Semi-Supervised Learning. arXiv: 1911.09265. Cited by: §I, §II, TABLE IV.
  • [57] Q. Xie, Z. Dai, E. Hovy, M.-T. Luong, and Q. Le (2019) Unsupervised Data Augmentation for Consistency Training. arXiv: 1904.12848. Cited by: TABLE IV.
  • [58] N. Xu, L. Yang, Y. Fan, J. Yang, D. Yue, Y. Liang, B. Price, S. Cohen, and T. Huang (2018) YouTube-VOS: Sequence-to-Sequence Video Object Segmentation. In European Conference on Computer Vision (ECCV), Cited by: §I.
  • [59] K. Yi and J. Wu (2019) Probabilistic End-To-End Noise Correction for Learning With Noisy Labels. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §II.
  • [60] S. Zagoruyko and N. Komodakis (2016) Wide residual networks. arXiv: 1605.07146. Cited by: §I, §IV-A, §IV-B.
  • [61] A.R. Zamir, A. Sax, W. Shen, L.J. Guibas, J. Malik, and S. Savarese (2018) Taskonomy: disentangling task transfer learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §II.
  • [62] C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals (2017) Understanding deep learning requires rethinking generalization. In International Conference on Learning Representations (ICLR), Cited by: §II, §III-B.
  • [63] H. Zhang, M. Cisse, Y. Dauphin, and D. Lopez-Paz (2018) mixup: Beyond Empirical Risk Minimization. In International Conference on Learning Representations (ICLR), Cited by: §II, §II, §III-B, §IV-A, §IV-C, TABLE III.
  • [64] L. Zhang, G.-J. Qi, L. Wang, and J. Luo (2019) AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transformations rather than Data. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §I, §II, §II.
  • [65] R. Zhang, P. Isola, and A. A. Efros (2016) Colorful image colorization. In European Conference on Computer Vision (ECCV), Cited by: §I, §II.
  • [66] D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf (2003) Learning with Local and Global Consistency. In Advances in Neural Information Processing Systems (NeurIPS), Cited by: §II.
  • [67] P. Zhu, L. Wen, X. Bian, H. Ling, and Q. Hu (2018) Vision Meets Drones: A Challenge. arXiv: 1804.07437. Cited by: §I.