Part of the success of deep learning models stems from large amounts of labeled data. However, the high cost of acquiring so much labeled data hinders the widespread application of deep learning models, especially in fields that require expert annotations, such as medical images or marine biology images [1, 8]. Semi-supervised learning is one of the most powerful methods for drastically reducing the amount of labeled data required [33, 42]. The basic idea is to select confident model predictions as pseudo-labels to update the model's parameters. Recent works [33, 42] have shown that it can achieve accuracy similar to supervised learning with far fewer annotations.
Existing semi-supervised learning with randomly sampled labeled sets suffers from two problems: difficulty in further reducing the number of labels, and an extremely high computational burden. As shown in Fig. 0(a), even the state-of-the-art method Flexmatch experiences a sharp drop in accuracy when the number of annotations decreases. Moreover, semi-supervised learning adds considerable computational burden relative to other methods for improving labeling efficiency, as shown in Fig. 0(b). For example, even on the small dataset CIFAR-10, standard semi-supervised training takes over one week on a single GPU, roughly 100 times the cost of supervised training. In this paper, we therefore aim to improve model performance with fewer labels while also speeding up training.
Because selecting samples in active learning takes much less time than training a model, we estimate the total time by multiplying the single-round model training time by the number of active learning rounds, assuming that active learning consists of 4 rounds.
These two phenomena may be caused by poor pseudo-labels in the early stage of semi-supervised training. The result of semi-supervised learning is closely tied to the quality of its pseudo-labels. Current semi-supervised models are trained from scratch, so in the early training stage the pseudo-labels are poor. When there are enough labeled samples, the supervised loss can gradually guide the model to improve its pseudo-labels, but when labeled samples are few, this constraint becomes less influential, making it hard for the semi-supervised model to correct its pseudo-labels. Furthermore, improving pseudo-labels from scratch requires a large number of training iterations.
To address these issues, in this paper we propose Active Self-Semi-Supervised Learning (AS3L), which guides semi-supervised learning with prior pseudo-labels. AS3L performs label propagation on self-supervised features to generate good pseudo-labels that bootstrap semi-supervised learning. Since the quality of pseudo-labels depends not only on the quality of the features but also on which samples are labeled, especially with few annotations, we develop an active learning strategy that selects labeled samples so as to generate accurate pseudo-labels. Furthermore, to make better use of these pseudo-labels, we design a training mechanism that integrates priors and model predictions in the early training stage, and then updates or removes the priors.
The contributions of our work are summarized as follows:
1. We propose the AS3L framework that extends the success of semi-supervised learning to cases with fewer labels. Our proposed method consistently outperforms state-of-the-art algorithms with fewer annotations.
2. We provide a new idea for accelerating semi-supervised training: bootstrapping with good prior pseudo-labels. Specifically, we note that AS3L consistently achieves a 3x speedup, with improved accuracy in most cases.
3. We develop an active learning strategy that is tightly coupled to the AS3L framework, which can greatly improve the quality of the obtained prior pseudo-labels when there are few labeled samples, thereby helping AS3L work well with fewer labels and train quickly.
2 Related Work
2.1 Active Learning
2.1.1 Active Learning Strategy for Supervised Learning
The goal of active learning is to find and annotate the most valuable samples so that the model performs better with limited annotations. Most active sampling strategies work in an annotation-training loop: select samples, annotate them, and retrain the model with the labeled samples, until the annotation budget is exhausted. Max entropy, max-margin, and BALD are representative uncertainty-based methods; they find samples for which the model is not confident. K-medoids, Coreset, and Wass are typical diversity-based strategies; they sample a subset that covers the entire feature space as well as possible. Hybrid strategies select samples that account for both diversity and uncertainty: suggestive annotation constructs a candidate set with high uncertainty and then selects diverse samples within it, while BADGE samples in a gradient embedding space. All these strategies operate in the context of supervised learning, training the model with labeled samples only, so they perform worse than strategies designed for semi-supervised learning, which trains with all samples. In this paper, we propose an active strategy in the context of semi-supervised learning to obtain better performance.
2.1.2 Active Strategy for Semi-Supervised Learning
Active learning strategies based on adversarial learning also use unlabeled data for training, but their ability to exploit unlabeled data is weaker than that of existing semi-supervised models, so their performance remains far below semi-supervised learning [32, 34, 41]. Some researchers have also tried to directly combine existing semi-supervised learning with active strategies developed for supervised learning, but the results were unexpectedly poor, even worse than the random selection baseline. To bridge this gap, new active strategies have been designed for the semi-supervised setting, coupled more tightly to existing semi-supervised learning. Consistency-based methods argue that we should choose samples that are hard for semi-supervised models, i.e., samples with inconsistent predictions under data augmentation. Guo et al. proposed combining adversarial training and graph-based label propagation to select high-uncertainty samples close to cluster boundaries. However, these are multiple-shot strategies, which implies an unbearably high computational burden in semi-supervised scenarios, since the model must be retrained in every round.
2.1.3 Single-shot Active Strategy
Single-shot active learning strategies request all labels at once, which may allow us to benefit from active learning with little extra computational burden. Until now, little attention has been paid to this setting. Pseudo-annotator trains multiple models on randomly guessed labels to build a single-shot active learning strategy; reducing the number of annotation rounds at the cost of extra model training is not feasible in the deep learning era, because deep models are far more expensive to train than traditional machine learning models such as SVMs. Xudong et al. propose a diversity-based sampling strategy that selects samples in the self-supervised feature space. Although this is similar to our work, their method follows traditional diversity criteria, and the active learning and semi-supervised learning parts are independent of each other. In contrast, our method builds a more tightly coupled active self-semi-supervised learning framework and constructs an active learning strategy from the perspective of improving the accuracy of prior pseudo-labels, reducing both labeling and training costs.
2.2 Semi-Supervised Learning
2.2.1 Semi-Supervised Learning from Scratch
Semi-supervised training typically exploits unlabeled samples with consistency regularization [7, 6, 36, 33] and pseudo-labeling techniques [23, 30], which force the model to predict consistently across variously augmented samples. Conventional semi-supervised learning ignores the influence of which samples are labeled: the labeled set is built by random or stratified random sampling. This makes it perform poorly when only a limited number of annotations are available. Furthermore, semi-supervised learning from scratch is plagued by high training cost. Although a recent work, Flexmatch, accelerates convergence by curriculum pseudo labeling, it still requires a large number of training iterations. In this paper, we address both problems by utilizing information from self-supervised learning.
2.2.2 Semi-Supervised Learning based on Self-Supervised Models
In this line of work, the model is first trained in a purely self-supervised manner; supervised training is then performed on a subset of labeled samples, followed by self-distillation using all labeled and unlabeled samples. The performance of the model therefore depends strongly on the classifier trained during the supervised phase, and it is reasonable to expect poor performance when annotations are limited. Recently, Bai et al. proposed using label propagation instead of complex semi-supervised techniques. However, self-supervised features are not tailored to a specific task, so simple label propagation on these features does not give ideal results. In this paper, we construct a framework in which pseudo-labels generated from self-supervised features guide semi-supervised learning.
2.2.3 Train Semi-Supervised Model Faster
In addition to stopping training early, choosing an appropriate unlabeled set for training is the core idea behind accelerating semi-supervised training [26, 21, 22]. RETRIEVE is representative of these methods: it constructs a bi-level optimization problem that selects the subset of unlabeled samples minimizing the loss on labeled samples. However, these methods usually lose model accuracy, especially when the number of labeled samples is small. In this paper, by guiding training with prior pseudo-labels, we directly reduce the number of training iterations and accelerate semi-supervised training with almost no loss of, or even an improvement in, model accuracy.
2.3 Self-Supervised Learning
Self-supervised learning has received extensive attention for its ability to provide high-quality feature representations without any labels. After early explorations of simple pretext tasks such as rotation prediction and jigsaw puzzles, contrastive training became mainstream. Contrastive losses in SimCLR and Moco [19, 11] are built from positive pairs (augmented versions of the same input) and negative pairs (different images); training forces the model to produce more similar representations for positive pairs than for negative pairs. However, their extremely high computational burden is a major disadvantage. BYOL and Simsiam provide a more user-friendly alternative that trains with positive pairs only and requires far fewer computational resources. In this paper, we construct our framework on top of self-supervised features.
3 Active Self-Semi-Supervised Learning
Suppose we collect an unlabeled dataset of images and have the budget to ask an oracle to annotate a subset of them. We aim to train the model on the resulting labeled set together with the unlabeled set at an acceptable training cost. In this section, we first give an overview of our active self-semi-supervised learning (AS3L) framework and then describe each module of AS3L: the proposed single-shot active learning strategy, the pseudo-label generation method, and a semi-supervised training method guided by prior pseudo-labels.
3.1 AS3L Framework
Existing semi-supervised learning methods improve model performance by building a virtuous cycle of updating the model and improving pseudo-labels. In this loop, some unlabeled samples with confident predictions are used to update the model, and the updated model then makes more accurate predictions for unlabeled samples. Generally, the model needs many training iterations to correct its predictions before it produces sufficiently accurate pseudo-labels. The intuition is that if we have accurate priors at the starting point, the model does not need to be trained for so many iterations. Also, when there are too few labels, the model may not get enough correct guesses to enter a virtuous cycle, leading to poor results. Our results (Sec. 4.2) suggest that providing accurate prior pseudo-labels is a good way to bootstrap semi-supervised models into a virtuous cycle, even with fewer labels, by giving the model more constraints that are likely correct.
In our framework, self-supervised learning and label propagation are used to produce prior pseudo-labels before semi-supervised training begins. To improve the accuracy of these prior pseudo-labels, we develop an active strategy to select appropriate labeled samples. This active strategy is designed as a single-shot strategy to avoid adding too much extra running time.
We incorporate the prior pseudo-labels into the existing semi-supervised training framework with the following rationale: ideally, while model predictions are inaccurate (the early stage of semi-supervised training), the prior guides training; once model predictions become more accurate than the prior, using them as pseudo-labels lets semi-supervised learning correct the pseudo-labels itself. However, the exact switching point is hard to find. We address this by combining a rough switching point with a posterior pseudo-label. The rest of the approach follows existing semi-supervised learning, using a supervised loss and consistency regularization to update the model.
To sum up, as shown in Fig. 2, we first obtain a feature representation from self-supervised training. Our active learning strategy then selects samples and queries their labels, and the annotations are propagated in this feature space to obtain prior pseudo-labels for the unlabeled samples. Finally, a semi-supervised model is trained with a combination of these priors and model predictions.
3.2 Single-shot Active Labeled Sample Selection
The goal of our active learning strategy is to obtain accurate prior pseudo-labels. Thus, beyond the traditional expectation that selected samples should cover the entire feature space well, we also want samples whose label matches the dominant label among surrounding samples. This is based on the observation that, in the self-supervised feature space, samples in the same cluster tend to have similar labels, and samples near the cluster center are more likely to share the label of the majority of that cluster. We therefore design our active strategy to find samples close to cluster centers.
Although self-supervised training has been proven to provide very good features, with excellent linear-evaluation performance [9, 12], we find experimentally that clustering directly on these features is not a good choice. One possible reason is that self-supervised features are distributed differently from features trained with labels: as shown in Fig. 3, self-supervised features are generally scattered, since they are trained on finer-grained surrogate tasks. The distance between features of the same class is thus large while the distance between features of different classes is small, which degrades the clustering algorithm and, potentially, the accuracy of our pseudo-labels. To obtain a more suitable feature space, we fine-tune these features based on the clusters before selecting samples to label, as explained next.
3.2.1 Fine-tuning features
To pull samples in the same cluster closer together, we use a mean squared error (MSE) loss that forces each sample toward its cluster center. To improve robustness, we run K-means several times, with the final loss defined by Eq. 1 as the MSE between each sample's fine-tuned feature and the cluster center it is assigned to in each run. Because the loss is defined over multiple clusterings with randomness, samples that stay in the same cluster across runs are drawn closer together, while samples attracted by different cluster centers do not approach any single center, which improves the clustering results.
Additionally, to limit computational cost while roughly preserving the self-supervised feature structure, we add a single linear layer on top of the self-supervised encoder. During training, the encoder weights are frozen and only the new linear layer is updated. We then select labeled samples and generate prior pseudo-labels based on the output features of this linear layer.
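As a concrete illustration, the fine-tuning step above can be sketched as follows. This is a minimal NumPy sketch under our assumptions, not the paper's implementation: the frozen encoder is represented by fixed input features, the trainable linear layer by a single weight matrix `W`, and plain scikit-learn K-means stands in for the repeated clusterings; all function and parameter names are our own.

```python
import numpy as np
from sklearn.cluster import KMeans

def finetune_features(features, n_clusters, n_runs=6, lr=0.05, epochs=40, seed=0):
    """Sketch: a single linear layer `W` on top of frozen features is
    trained with an MSE loss pulling each fine-tuned sample toward the
    centers it is assigned to across several K-means runs (centers are
    treated as constants when differentiating)."""
    d = features.shape[1]
    W = np.eye(d)                        # trainable linear layer, identity init
    for _ in range(epochs):
        z = features @ W.T               # fine-tuned features
        grad = np.zeros_like(W)
        for r in range(n_runs):
            km = KMeans(n_clusters=n_clusters, n_init=1,
                        random_state=seed + r).fit(z)
            centers = km.cluster_centers_[km.labels_]   # per-sample assigned center
            # gradient of mean ||z - centers||^2 w.r.t. W, centers held fixed
            grad += 2.0 * (z - centers).T @ features / len(z)
        W -= lr * grad / n_runs
    return features @ W.T
```

In practice the paper trains this layer with SGD on top of a deep encoder; the sketch only shows the loss structure over multiple random clusterings.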
3.2.2 Select Labeled Set Based on Multiple Clusters
Similarly, we perform K-means clustering several times on the fine-tuned features to improve robustness. We believe that samples assigned to the same cluster across multiple random clusterings are more likely to belong to the same semantic category. We therefore find the groups of samples that are clustered together in all runs, and select their medoids for annotation. The detailed algorithm is given in appendix Algorithm 1.
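The multi-clustering selection can be sketched as below. This is a hedged reconstruction of the idea rather than Algorithm 1 itself: samples sharing the same cluster assignment signature across all K-means runs form stable groups, and the medoid of each of the largest groups is queried for a label. The grouping criterion and all names are our assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_labeled_set(features, budget, n_runs=6, seed=0):
    """Sketch: run K-means `n_runs` times (with `budget` clusters, as in
    Sec. 4.1), group samples by their full assignment signature across
    runs, and annotate the medoid of each of the largest stable groups."""
    labels = np.stack([
        KMeans(n_clusters=budget, n_init=1, random_state=seed + r)
        .fit_predict(features)
        for r in range(n_runs)
    ])                                            # (n_runs, n_samples)
    groups = {}
    for i in range(features.shape[0]):
        groups.setdefault(tuple(labels[:, i]), []).append(i)
    chosen = []
    for members in sorted(groups.values(), key=len, reverse=True):
        pts = features[members]
        # medoid: member minimizing total squared distance to its group
        dists = ((pts[:, None] - pts[None]) ** 2).sum(-1).sum(1)
        chosen.append(members[int(np.argmin(dists))])
        if len(chosen) == budget:
            break
    return chosen
```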
3.3 Prior Pseudo-label Generation
We generate prior pseudo-labels by propagating labels based on clusters. We use constrained seed K-Means instead of plain K-means so that clustering benefits from the labeled-sample constraints. The labels of the labeled samples are propagated to all unlabeled samples in the same cluster and then normalized to obtain the prior pseudo-labels. We note that as the number of clusters increases, the probability that samples in the same cluster share a label increases. To improve the accuracy of the prior pseudo-labels, we should therefore increase the number of clusters. However, as it increases, each cluster contains fewer samples and more clusters contain no labeled sample at all (especially when the number of clusters exceeds the number of labeled samples), so more unlabeled samples receive no propagated label. As a trade-off, we use a different number of clusters in each of the multiple clusterings, so that most unlabeled samples are covered by labeled samples, while those far from any labeled sample receive lower confidence.
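A minimal sketch of this propagation step, under our assumptions: plain scikit-learn K-means stands in for the paper's Constrained Seed K-Means, each labeled sample votes its label to every sample in its cluster, and votes accumulated over clusterings with different numbers of clusters are normalized into soft priors. All names are ours.

```python
import numpy as np
from sklearn.cluster import KMeans

def propagate_labels(features, labeled_idx, labels, n_classes,
                     ks=(10, 20, 30, 40, 50, 60), seed=0):
    """Sketch of cluster-based label propagation: for each clustering,
    every labeled sample votes its label to all samples sharing its
    cluster; votes are summed over clusterings with different K and
    normalized into soft prior pseudo-labels. Samples never reached by
    any labeled sample keep an all-zero (low-confidence) prior."""
    n = features.shape[0]
    votes = np.zeros((n, n_classes))
    for r, k in enumerate(ks):
        assign = KMeans(n_clusters=k, n_init=1,
                        random_state=seed + r).fit_predict(features)
        for i, y in zip(labeled_idx, labels):
            votes[assign == assign[i], y] += 1.0
    total = votes.sum(1, keepdims=True)
    total[total == 0] = 1.0              # avoid division by zero for uncovered samples
    return votes / total
```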
3.4 Semi-Supervised Training Guided by Prior
After active learning and label propagation, we have a labeled set and an unlabeled set with prior pseudo-labels. We formulate the semi-supervised loss in Eq. 2 as the sum of the cross-entropy loss on labeled samples and the consistency loss on unlabeled samples, weighted by a trade-off coefficient. The consistency loss is defined in Eq. 4 in terms of: the ratio of unlabeled to labeled samples in each training batch, the batch size, an adaptive threshold as used in Flexmatch, a random data augmentation function, the model's predictions for weakly and strongly augmented samples, the final pseudo-label together with its 'hard' one-hot form, and the cross-entropy loss.
As described in Sec. 3.1, semi-supervised models have a strong ability to improve pseudo-labels once they enter a virtuous cycle. To let the priors guide training while avoiding overfitting to their noise, we use the normalized sum of model predictions and priors as the final pseudo-labels during early training, as in Eq. 5, switching at a pre-defined training iteration. We assume the model has enough correct pseudo-labels after this iteration. To maximize the effect of semi-supervised training, we tried two options: removing the prior after the switching iteration, so the model becomes a standard semi-supervised framework, or updating the prior by re-clustering on the semi-supervised features. We find experimentally that, except with very few annotations, removing the prior after the switching iteration is a good choice.
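The mixing rule described above can be sketched as follows, under our reading of Eq. 5: before a pre-defined switching iteration, the model's (weak-augmentation) prediction and the prior pseudo-label are summed and renormalized; afterwards the prior is dropped and the prediction is used alone. The function name and signature are ours.

```python
import numpy as np

def final_pseudo_label(pred, prior, step, switch_step):
    """Sketch of the early-training mixing rule: normalized sum of the
    model prediction and the prior before `switch_step`; prediction
    only afterwards. Not the paper's exact Eq. 5."""
    if step < switch_step:
        mix = pred + prior
        return mix / mix.sum(axis=-1, keepdims=True)
    return pred
```

A hard one-hot pseudo-label (as used in the consistency loss) would then be the argmax of this distribution.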
Our method is evaluated on CIFAR-10, CIFAR-100, and STL-10, standard benchmarks for semi-supervised learning. We experiment with labeled sets of various sizes, in particular with fewer labeled samples than previous papers (10 labeled samples for CIFAR-10, 200 for CIFAR-100, and 20 for STL-10).
We compare our method with Flexmatch, a state-of-the-art semi-supervised method, for both the standard number of training iterations and a reduced number of training iterations. To compare the effect of labeled-sample selection, we use the following baselines: Coreset (K-center greedy), K-medoids, and the stratified random sampling commonly used in semi-supervised learning. We also compare with linear evaluation, which trains a linear classifier on top of the frozen encoder from self-supervised learning.
4.1.3 Implementation Details
For a fair comparison, we set hyper-parameters and network architectures similar to most semi-supervised learning algorithms: SGD with momentum 0.9, initial learning rate 0.03, loss trade-off coefficient 1, an unlabeled-to-labeled ratio of 7 per batch, batch size 64, and a cosine annealing learning rate scheduler. Network architectures: WRN-28-2 for CIFAR-10, WRN-28-8 for CIFAR-100, and WRN-37-2 for STL-10. The standard semi-supervised algorithm is trained for the standard number of iterations, while in the remaining experiments comparing active learning strategies the number of training iterations is reduced (with a separate setting for CIFAR-10 with 10 labels). The change point is set to 6900 iterations. To avoid a heavy computational burden, we adopt Simsiam for self-supervised learning with the same backbone as the semi-supervised stage, and follow Simsiam's hyper-parameter settings in the self-supervised training stage. The network weights from self-supervised training initialize the encoder for semi-supervised training. Clustering is run 6 times in both the active sampling strategy and label propagation. For active sampling, the number of clusters equals the number of selected samples. For label propagation, clustering is done with Constrained Seed K-Means, and the number of clusters is set to 10, 20, 30, 40, 50, and 60 for the six runs, respectively. The linear layer used in Sec. 3.2 has the same dimension as the final layer of the backbone, and feature fine-tuning runs for 40 epochs.
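For reference, the hyper-parameters listed above can be consolidated into a single configuration sketch; descriptive key names stand in for the paper's symbols, and values not stated in the text are omitted rather than guessed.

```python
# Assumed consolidation of the hyper-parameters above; key names are ours.
AS3L_CONFIG = {
    "optimizer": "SGD",
    "momentum": 0.9,
    "initial_lr": 0.03,
    "lr_schedule": "cosine_annealing",
    "loss_tradeoff_coeff": 1,           # weight on the consistency loss (Eq. 2)
    "unlabeled_to_labeled_ratio": 7,    # per training batch
    "batch_size": 64,
    "backbone": {
        "CIFAR-10": "WRN-28-2",
        "CIFAR-100": "WRN-28-8",
        "STL-10": "WRN-37-2",
    },
    "change_point_iteration": 6900,     # when the prior is removed or updated
    "num_clusterings": 6,               # for both active sampling and propagation
    "label_prop_num_clusters": [10, 20, 30, 40, 50, 60],
    "feature_finetune_epochs": 40,
    "self_supervised_method": "Simsiam",
}
```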
4.2 Main Results
The experimental results are shown in Table 1. Here, we report the best model accuracy, following prior work. Our method consistently outperforms the other active learning strategies and, in most cases, even standard semi-supervised learning (which uses many more training iterations than ours). When the number of labeled samples is close to the number of true classes, the random baseline is better than the existing active sampling strategies K-medoids and Coreset-greedy, because those strategies cannot cover most classes in the dataset. With enough labeled samples, our method also matches the accuracy of standard semi-supervised training while taking only about 1/3 of the training time. Furthermore, on the more complex STL-10 dataset, the accuracy of semi-supervised training is very close to linear evaluation. This is possibly because STL-10 contains images of categories outside the classification task, which affects the semi-supervised algorithm more than the linear evaluation.
Table 1: Results for labeled-set sizes of 10 and 40 (CIFAR-10), 200 and 400 (CIFAR-100), and 20 and 40 (STL-10). All results are averaged over 3 runs. * denotes results from torch-SSL. The best results are shown in red and the second best in blue.
4.2.1 Training Cost
We compare the training cost from two aspects: the number of training iterations and the actual running time. We note that the computational cost of each self-supervised and semi-supervised iteration is similar when adopting Simsiam and following the hyper-parameter settings in Sec. 4.1; the main difference lies in the loss function, so the total number of training iterations is a good approximate metric for training cost, independent of the specific classification task. For an exact comparison, we also provide actual running times on a single RTX-3090 GPU. Because training on STL-10 requires multiple GPUs, and a multi-GPU implementation may introduce additional runtime differences, we only report results for the two datasets that can be run on a single GPU. As shown in Table 2, in most cases our method is about 3 times faster than standard semi-supervised learning.
Table 2: Number of training iterations and running time (hours) for labeled-set sizes of 10, 40, and 200/400.
4.3 Detailed Model Analysis
We further analyze our approach from three aspects: an ablation study of the whole framework, an ablation study of our active sampling strategy, and a detailed analysis of pseudo-label propagation.
4.3.1 Ablation Experiment of Framework
Ablation experiments were performed on CIFAR-10. We evaluate three components of our method: active labeled-set selection, prior pseudo-label warm-up, and re-clustering to update the prior pseudo-labels. The results are shown in Table 3. (1) Actively selecting annotated samples effectively improves semi-supervised performance when the number of annotations is small. (2) Guiding semi-supervised training with the prior early on improves model performance regardless of whether labeled samples are selected by the proposed strategy. This confirms our claim that a better starting point for semi-supervised training improves model performance. (3) The best choice of guidance in the later stage of training depends on the number of annotations: when the annotations outnumber the true classes, prior guidance is unnecessary in the later stage; otherwise, the continuously updated prior should be retained.
Table 3: Results with 10 labels and with 40 labels.
4.3.2 Ablation Study of Active Sampling Strategy
We compare the effects of the various components of our proposed active learning strategy, again with ablation experiments on CIFAR-10. Here, K-medoids means clustering only once, and multi-clustering means clustering six times sequentially, as described in Sec. 4.1. As shown in Table 4, fine-tuning the features leads to significant improvements: better class coverage and more accurate pseudo-labels. Multi-clustering and feature fine-tuning bring the greatest benefits in the cases with fewer annotations.
Table 4: Pseudo-label accuracy and class coverage.
We study the effect of the number of clusterings on the samples selected by our proposed active learning strategy. The experiment is run on CIFAR-10 with 10 labels. As shown in Fig. 4, our strategy is robust to the number of clusterings: for different settings, it selects samples that better cover all classes, even with very few annotations. The number of clusterings mainly affects the accuracy of the prior pseudo-labels: the larger it is, the smaller the variance, and the accuracy of the pseudo-labels improves slightly.
4.3.3 Prior Pseudo-label Propagation
For prior pseudo-label generation, we compare our method with LLGC, a typical baseline for label spreading, with LLGC's hyper-parameters kept at their original settings. Table 5 also compares the effect of different active learning methods for selecting labeled samples on the accuracy of the pseudo-labels. The results confirm that the labeled-sample selection strategy has a large impact on pseudo-label accuracy: the samples selected by our active learning strategy yield more accurate pseudo-labels under either label propagation method. Our pseudo-label propagation method is much better than LLGC when the number of labeled samples is close to the number of true classes, but slightly weaker when there are more labels.
Table 5: Label propagation methods compared across sampling strategies.
Furthermore, we study the impact of the number of clusters used in our label propagation. The experiments consist of three settings with different numbers of clusters, as shown in Table 6. Expected calibration error (ECE) measures how well the prior pseudo-labels are calibrated to the true accuracy; smaller values indicate less miscalibration. The results confirm that using different numbers of clusters across the clusterings in label propagation is a good compromise between accuracy and calibration.
In this paper, we show that prior pseudo-labels can serve as a good intermediate step to transfer information from self-supervised features to improve semi-supervised training. We also show that a single-shot active learning strategy can enhance this prior. Semi-supervised training guided by this prior can greatly improve the performance of the model with few annotations while reducing computational cost.
-  Inigo Alonso, Matan Yuval, Gal Eyal, Tali Treibitz, and Ana C Murillo. Coralseg: Learning coral segmentation from sparse annotations. Journal of Field Robotics, 36:1456–1477, 2019.
-  Jordan T. Ash, Chicheng Zhang, Akshay Krishnamurthy, John Langford, and Alekh Agarwal. Deep batch active learning by diverse, uncertain gradient lower bounds. In International Conference on Learning Representations, 2020.
-  Haoping Bai, Meng Cao, Ping Huang, and Jiulong Shan. Self-supervised semi-supervised learning for data labeling and quality evaluation. arXiv preprint arXiv:2111.10932, 2021.
-  Maria-Florina Balcan, Andrei Broder, and Tong Zhang. Margin based active learning. In International Conference on Computational Learning Theory, pages 35–50. Springer, 2007.
-  Sugato Basu, Arindam Banerjee, and Raymond Mooney. Semi-supervised clustering by seeding. In Proceedings of the 19th International Conference on Machine Learning. Citeseer, 2002.
-  David Berthelot, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Kihyuk Sohn, Han Zhang, and Colin Raffel. Remixmatch: Semi-supervised learning with distribution matching and augmentation anchoring. In International Conference on Learning Representations, 2020.
-  David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, and Colin A Raffel. Mixmatch: A holistic approach to semi-supervised learning. Advances in Neural Information Processing Systems, 32, 2019.
-  Michael Bewley, Ariell Friedman, Renata Ferrari, Nicole Hill, Renae Hovey, Neville Barrett, Ezequiel M Marzinelli, Oscar Pizarro, Will Figueira, Lisa Meyer, et al. Australian sea-floor survey data, with images and expert annotations. Scientific data, 2:1–13, 2015.
-  Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In International conference on machine learning, pages 1597–1607. PMLR, 2020.
-  Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, and Geoffrey E Hinton. Big self-supervised models are strong semi-supervised learners. Advances in neural information processing systems, 33:22243–22255, 2020.
-  Xinlei Chen, Haoqi Fan, Ross Girshick, and Kaiming He. Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297, 2020.
-  Xinlei Chen and Kaiming He. Exploring simple siamese representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15750–15758, 2021.
-  Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, 2009.
-  Yarin Gal, Riashat Islam, and Zoubin Ghahramani. Deep bayesian active learning with image data. In International Conference on Machine Learning, pages 1183–1192. PMLR, 2017.
-  Mingfei Gao, Zizhao Zhang, Guo Yu, Sercan Ö Arık, Larry S Davis, and Tomas Pfister. Consistency-based semi-supervised active learning: Towards minimizing labeling cost. In European Conference on Computer Vision, pages 510–526. Springer, 2020.
-  Spyros Gidaris, Praveer Singh, and Nikos Komodakis. Unsupervised representation learning by predicting image rotations. In International Conference on Learning Representations, 2018.
-  Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, et al. Bootstrap your own latent-a new approach to self-supervised learning. Advances in Neural Information Processing Systems, 33:21271–21284, 2020.
-  Jiannan Guo, Haochen Shi, Yangyang Kang, Kun Kuang, Siliang Tang, Zhuoren Jiang, Changlong Sun, Fei Wu, and Yueting Zhuang. Semi-supervised active learning for semi-supervised models: exploit adversarial examples with graph-based virtual labels. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2896–2905, 2021.
-  Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9729–9738, 2020.
-  Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, and Ondrej Chum. Label propagation for deep semi-supervised learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5070–5079, 2019.
-  Krishnateja Killamsetty, S Durga, Ganesh Ramakrishnan, Abir De, and Rishabh Iyer. Grad-match: Gradient matching based data subset selection for efficient deep model training. In International Conference on Machine Learning, pages 5464–5474. PMLR, 2021.
-  Krishnateja Killamsetty, Xujiang Zhao, Feng Chen, and Rishabh Iyer. Retrieve: Coreset selection for efficient and robust semi-supervised learning. Advances in Neural Information Processing Systems, 34, 2021.
-  Dong-Hyun Lee et al. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on challenges in representation learning, ICML, volume 3, page 896, 2013.
-  David D Lewis and Jason Catlett. Heterogeneous uncertainty sampling for supervised learning. In Machine learning proceedings 1994, pages 148–156. Elsevier, 1994.
-  Rafid Mahmood, Sanja Fidler, and Marc T Law. Low budget active learning via wasserstein distance: An integer programming approach. arXiv preprint arXiv:2106.02968, 2021.
-  Baharan Mirzasoleiman, Jeff Bilmes, and Jure Leskovec. Coresets for data-efficient training of machine learning models. In International Conference on Machine Learning, pages 6950–6960. PMLR, 2020.
-  Sudhanshu Mittal, Maxim Tatarchenko, Özgün Çiçek, and Thomas Brox. Parting with illusions about deep active learning. arXiv preprint arXiv:1912.05361, 2019.
-  Mahdi Pakdaman Naeini, Gregory Cooper, and Milos Hauskrecht. Obtaining well calibrated probabilities using bayesian binning. In Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
-  Mehdi Noroozi and Paolo Favaro. Unsupervised learning of visual representations by solving jigsaw puzzles. In European conference on computer vision, pages 69–84. Springer, 2016.
-  Mamshad Nayeem Rizve, Kevin Duarte, Yogesh S Rawat, and Mubarak Shah. In defense of pseudo-labeling: An uncertainty-aware pseudo-label selection framework for semi-supervised learning. In International Conference on Learning Representations, 2021.
-  Ozan Sener and Silvio Savarese. Active learning for convolutional neural networks: A core-set approach. In International Conference on Learning Representations, 2018.
-  Samarth Sinha, Sayna Ebrahimi, and Trevor Darrell. Variational adversarial active learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5972–5981, 2019.
-  Kihyuk Sohn, David Berthelot, Nicholas Carlini, Zizhao Zhang, Han Zhang, Colin A Raffel, Ekin Dogus Cubuk, Alexey Kurakin, and Chun-Liang Li. Fixmatch: Simplifying semi-supervised learning with consistency and confidence. Advances in Neural Information Processing Systems, 33:596–608, 2020.
-  Toan Tran, Thanh-Toan Do, Ian Reid, and Gustavo Carneiro. Bayesian generative active deep learning. In International Conference on Machine Learning, pages 6295–6304. PMLR, 2019.
-  Xudong Wang, Long Lian, and Stella X Yu. Unsupervised data selection for data-centric semi-supervised learning. arXiv preprint arXiv:2110.03006, 2021.
-  Qizhe Xie, Zihang Dai, Eduard Hovy, Thang Luong, and Quoc Le. Unsupervised data augmentation for consistency training. Advances in Neural Information Processing Systems, 33:6256–6268, 2020.
-  Lin Yang, Yizhe Zhang, Jianxu Chen, Siyuan Zhang, and Danny Z Chen. Suggestive annotation: A deep active learning framework for biomedical image segmentation. In International conference on medical image computing and computer-assisted intervention, pages 399–407. Springer, 2017.
-  Yazhou Yang and Marco Loog. Single shot active learning using pseudo annotators. Pattern Recognit., 89:22–31, 2019.
-  Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. In British Machine Vision Conference 2016. British Machine Vision Association, 2016.
-  Xiaohua Zhai, Avital Oliver, Alexander Kolesnikov, and Lucas Beyer. S4l: Self-supervised semi-supervised learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1476–1485, 2019.
-  Beichen Zhang, Liang Li, Shijie Yang, Shuhui Wang, Zheng-Jun Zha, and Qingming Huang. State-relabeling adversarial active learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8756–8765, 2020.
-  Bowen Zhang, Yidong Wang, Wenxin Hou, Hao Wu, Jindong Wang, Manabu Okumura, and Takahiro Shinozaki. Flexmatch: Boosting semi-supervised learning with curriculum pseudo labeling. Advances in Neural Information Processing Systems, 34, 2021.
-  Dengyong Zhou, Olivier Bousquet, Thomas Lal, Jason Weston, and Bernhard Schölkopf. Learning with local and global consistency. Advances in neural information processing systems, 16, 2003.