
ABC: Auxiliary Balanced Classifier for Class-imbalanced Semi-supervised Learning

Existing semi-supervised learning (SSL) algorithms typically assume class-balanced datasets, although the class distributions of many real-world datasets are imbalanced. In general, classifiers trained on a class-imbalanced dataset are biased toward the majority classes. This issue becomes more problematic for SSL algorithms because they utilize the biased predictions of unlabeled data for training. However, traditional class-imbalanced learning techniques, which are designed for labeled data, cannot be readily combined with SSL algorithms. We propose a scalable class-imbalanced SSL algorithm that can effectively use unlabeled data, while mitigating class imbalance by introducing an auxiliary balanced classifier (ABC) of a single layer, which is attached to a representation layer of an existing SSL algorithm. The ABC is trained with a class-balanced loss of a minibatch, while using the high-quality representations learned from all data points in the minibatch by the backbone SSL algorithm, to avoid overfitting and information loss. Moreover, we use consistency regularization, a recent SSL technique for utilizing unlabeled data, in a modified way to train the ABC to be balanced among the classes by selecting unlabeled data with the same probability for each class. The proposed algorithm achieves state-of-the-art performance in various class-imbalanced SSL experiments using four benchmark datasets.


1 Introduction

Recently, numerous deep neural network (DNN)-based semi-supervised learning (SSL) algorithms have been proposed to improve the performance of DNNs by utilizing unlabeled data when only a small amount of labeled data is available. These algorithms have shown effective performance in various tasks. However, most existing SSL algorithms assume class-balanced datasets, whereas the class distributions of many real-world datasets are imbalanced. It is well known that classifiers trained on class-imbalanced data tend to be biased toward the majority classes. This issue can be more problematic for SSL algorithms that use predicted labels of unlabeled data for their training, because the labels predicted by an algorithm trained on class-imbalanced data become even more severely imbalanced (NEURIPS2020_a7968b43). For example, Figure 1 (b) presents the biased predictions of ReMixMatch (berthelot2019ReMixMatch), a recent SSL algorithm, trained on CIFAR-10-LT, a class-imbalanced dataset in which Class 0 has 100 times more data than Class 9, as depicted in Figure 1 (a). Although there are various class-imbalanced learning techniques, they are usually designed for labeled data, and thus cannot be simply combined with SSL algorithms under class-imbalanced SSL (CISSL) scenarios. Recently, a few CISSL algorithms have been proposed, but the CISSL problem is still underexplored.

(a) Class-imbalanced training set (b) ReMixMatch (c) Proposed algorithm
Figure 1: Predictions on a class-balanced test set using ReMixMatch (b) and the proposed algorithm (c) trained on a class-imbalanced training set (a).

We propose a new CISSL algorithm that can effectively use unlabeled data, while mitigating class imbalance by using an existing DNN-based SSL algorithm (berthelot2019ReMixMatch; sohn2020FixMatch) as the backbone and introducing an auxiliary balanced classifier (ABC) of a single layer. The ABC is attached to a representation layer immediately preceding the classification layer of the backbone, based on the argument that a classification algorithm (i.e., backbone) can learn high-quality representations even if its classifier is biased toward the majority classes (kang2019decoupling). The ABC is trained to be balanced across all classes by using a mask that rebalances the class distribution, similar to re-sampling in previous SSL studies (barandela2003restricted; chawla2002smote; he2009learning; japkowicz2000class). Specifically, the mask stochastically regenerates a class-balanced subset of a minibatch on which the ABC is trained. The ABC is trained simultaneously with the backbone, so that the ABC can use high-quality representations learned from all data points in the minibatch using the backbone. In this way, the ABC can overcome the limitations of the previous resampling techniques, overfitting on minority-class data or loss of information on majority-class data (NEURIPS2019_621461af; NEURIPS2020_2ba61cc3).

Moreover, to place decision boundaries in low-density regions by utilizing unlabeled data, we use consistency regularization, a recent SSL technique, which enforces the classification outputs of two augmented or perturbed versions of the same unlabeled example to remain unchanged. In particular, we encourage the ABC to be balanced across classes when using consistency regularization by selecting unlabeled examples with the same probability for each class using a mask. Figure 1 (c) illustrates that compared to the results of ReMixMatch in Figure 1 (b), the class distribution of the predicted labels becomes more balanced using the proposed algorithm trained on the same dataset. Our experimental results under various scenarios demonstrate that the proposed algorithm achieves state-of-the-art performance. Through qualitative analysis and an ablation study, we further investigate the contribution of each component of the proposed algorithm. The code for the proposed algorithm is available at https://github.com/LeeHyuck/ABC.

2 Related Work

Semi-supervised learning (SSL) Recently, several SSL techniques that utilize unlabeled data have been proposed. Entropy minimization (NIPS2004_96f2b50b) encourages the classifier outputs to have low entropy for unlabeled data, as in pseudo-labeling (lee2013pseudo). Mixup regularization (NEURIPS2019_1cd138d0; ijcai2019-504) moves the decision boundaries farther away from the data clusters by encouraging the prediction for an interpolation of two inputs to be the same as the interpolation of the predictions for each input. Consistency regularization (park2018adversarial; miyato2018virtual; NIPS2017_68053af2) encourages a classifier to produce similar predictions for perturbed versions of the same unlabeled input. To create perturbed unlabeled inputs, various data augmentation techniques have been used. For example, FixMatch (sohn2020FixMatch) and ReMixMatch (berthelot2019ReMixMatch) used strong augmentation methods such as Cutout (devries2017improved) and RandAugment (cubuk2020randaugment). FixMatch and ReMixMatch are used as the backbone of the proposed algorithm; they are described in Section 3.2.

Class-imbalanced learning (CIL) As a popular approach for CIL, re-sampling techniques (japkowicz2000class; chawla2002smote; barandela2003restricted; he2009learning) balance the number of training samples for each class in the training set. As another popular approach, re-weighting techniques (NIPS2013_9aa42b31; huang2016learning; NIPS2017_147ebe63) re-weight the loss for each class by a factor inversely proportional to the number of data points belonging to that class. Although these approaches are simple, they have some drawbacks. For example, oversampling from minority classes can cause overfitting, whereas undersampling from majority classes can cause information loss (NEURIPS2019_621461af). In the case of re-weighting, gradients can be calculated to be abnormally large when the class imbalance is severe, resulting in unstable training (NEURIPS2019_621461af; an2021why). Many attempts have been made to alleviate these problems, such as effective re-weighting (cui2019class) and meta-learning-based re-weighting (ren2018learning; jamal2020rethinking). New forms of losses have also been proposed (NEURIPS2019_621461af; NEURIPS2020_2ba61cc3). In (yin2018feature; kim2020m2m), knowledge is transferred from the data of majority classes to the data of minority classes. These CIL algorithms were designed for labeled data and require label information; thus, they are not applicable to unlabeled data. In (kang2019decoupling), it was found that biased classification is mainly due to the classification layer and that a classification algorithm can learn meaningful representations even from a class-imbalanced training set. Based on this finding, we design the ABC to use high-quality representations learned from class-imbalanced data utilizing FixMatch (sohn2020FixMatch) and ReMixMatch (berthelot2019ReMixMatch).
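To make the re-weighting approach concrete, the following PyTorch-style sketch scales the per-class cross-entropy loss by an inverse-frequency factor. It is only a minimal illustration of the general idea, assuming inverse-frequency weights normalized to average 1; the cited methods use more elaborate weighting schemes.

import torch
import torch.nn.functional as F

def class_weighted_ce(logits, labels, class_counts):
    # Re-weighting baseline: scale each class's loss by a factor inversely
    # proportional to its number of training samples (normalized to average 1).
    counts = torch.as_tensor(class_counts, dtype=torch.float, device=logits.device)
    weights = counts.sum() / (len(counts) * counts)
    return F.cross_entropy(logits, labels, weight=weights)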

Class-imbalanced semi-supervised learning (CISSL) There have been few studies on CISSL. In (NEURIPS2020_e025b627), it was found that more accurate decision boundaries can be obtained in class-imbalanced settings through self-supervised learning and semi-supervised learning. DARP (NEURIPS2020_a7968b43) refines biased pseudo-labels by solving a convex optimization problem. CReST (wei2021crest), a recent self-training technique, mitigates class imbalance by using pseudo-labeled unlabeled data points classified as minority classes with a higher probability than those classified as majority classes.

3 Methodology

3.1 Problem setting

Suppose that we have a labeled dataset $\mathcal{X}=\{(x_n, y_n)\}_{n=1}^{N}$, where $x_n$ is the $n$th labeled data point and $y_n \in \{1,\dots,L\}$ is the corresponding label, and an unlabeled dataset $\mathcal{U}=\{u_m\}_{m=1}^{M}$, where $u_m$ is the $m$th unlabeled data point. We express the ratio of the amount of labeled data as $\beta$. Generally, $\beta$ is small, because label acquisition is costly and laborious. We denote the number of labeled data points of class $k$ as $N_k$, i.e., $\sum_{k=1}^{L} N_k = N$, and assume that the classes are sorted according to cardinality in descending order, i.e., $N_1 \ge N_2 \ge \dots \ge N_L$. We denote the ratio of the class imbalance as $\gamma = N_1 / N_L$. Under class-imbalanced scenarios, $\gamma > 1$. Following previous CIL studies, we define the half of the classes containing a large amount of data as the majority classes, and the other half of the classes, containing a small amount of data, as the minority classes. Following wei2021crest, we assume that $\mathcal{X}$ and $\mathcal{U}$ share the same class distribution, i.e., the labeled and unlabeled datasets are class-imbalanced to the same extent. From $\mathcal{X}$ and $\mathcal{U}$, we generate minibatches $\{(x_b, y_b)\}_{b=1}^{B}$ and $\{u_b\}_{b=1}^{B}$ for each iteration of training, where $B$ is the minibatch size. Using these minibatches for training, we aim to learn a model that performs effectively on a class-balanced test set.
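As a minimal illustration of this setting, the NumPy sketch below splits each class with the same labeled-data ratio so that the labeled and unlabeled subsets share the same (imbalanced) class distribution. It assumes that the samples of each class occupy a contiguous index range; the function names, counts, and ratio are illustrative and not taken from the paper.

import numpy as np

def split_labeled_unlabeled(class_counts, labeled_ratio, rng):
    # Split each class into labeled/unlabeled index sets with the same ratio,
    # so that both sets follow the same class distribution.
    labeled, unlabeled, offset = [], [], 0
    for n_k in class_counts:
        idx = offset + rng.permutation(n_k)  # indices of class k (assumed contiguous)
        n_lab = max(1, int(round(labeled_ratio * n_k)))
        labeled.extend(idx[:n_lab])
        unlabeled.extend(idx[n_lab:])
        offset += n_k
    return np.array(labeled), np.array(unlabeled)

# Illustrative example: 10 classes with a 100:1 head-to-tail imbalance, 20% labeled.
rng = np.random.default_rng(0)
counts = [5000, 2997, 1796, 1077, 645, 387, 232, 139, 83, 50]
lab_idx, unlab_idx = split_labeled_unlabeled(counts, 0.2, rng)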

3.2 Backbone SSL algorithm

We attach the ABC to the backbone’s representation layer, so that it can utilize the high-quality representations learned by the backbone. We use FixMatch (sohn2020FixMatch) or ReMixMatch (berthelot2019ReMixMatch) as the backbone, as these two have achieved state-of-the-art SSL performance. FixMatch uses a classification loss calculated from weakly augmented labeled data points, generated by flipping and cropping the image, and a consistency regularization loss calculated from weakly augmented unlabeled data points and strongly augmented unlabeled data points generated by Cutout (devries2017improved) and RandAugment (cubuk2020randaugment). ReMixMatch predicts the class label of a weakly augmented unlabeled data point using distribution alignment and sharpening, and assigns the predicted label to the strongly augmented versions of that data point. The strongly augmented unlabeled data points and strongly augmented labeled data points are then used for mixup regularization. ReMixMatch also conducts consistency regularization in a manner similar to FixMatch, as well as self-supervised learning based on image rotation (gidaris2018unsupervised; zhai2019s4l). FixMatch and ReMixMatch have greatly improved SSL performance by learning high-quality representations using strong data augmentation. However, these algorithms can be significantly biased toward the majority classes in class-imbalanced settings.

Using FixMatch and ReMixMatch as the backbone of the proposed algorithm, we ensure that the ABC enjoys high-quality representations learned by the backbone, while replacing the backbone’s biased classifier. To train the ABC, we reuse the weakly augmented data and strongly augmented data used by the backbone to decrease the computational cost. Although we use FixMatch and ReMixMatch as the backbone in this study, the ABC can also be combined with other DNN-based SSL algorithms, as long as they use weakly augmented data and strongly augmented data.

3.3 ABC for class-imbalanced Semi-supervised learning

To train the ABC to be balanced, we first generate a 0/1 mask $M(x_b)$ for each labeled data point $x_b$ in a minibatch using a Bernoulli distribution whose parameter is set to be inversely proportional to the number of data points in the class of $x_b$. This setting makes the mask equal to 1 with high probability for the data points in the minority classes, but with low probability for those in the majority classes. Then, the classification loss is multiplied by the generated mask, so that the ABC can be trained with a balanced classification loss. Multiplying the classification loss by the mask can be interpreted as oversampling of the data points in the minority classes and, at the same time, as undersampling of those in the majority classes. In representation learning, oversampling and undersampling techniques have shown overfitting and information-loss problems, respectively. In contrast, the ABC can overcome these problems because it uses the representations learned by the backbone, which is trained on all data points in the minibatch. The use of the mask to construct the balanced loss, instead of directly creating a balanced subset, allows the backbone and the ABC to be trained from the same minibatches. Therefore, the representations of a minibatch calculated for training the backbone can be used again for training the ABC. Consequently, the proposed algorithm only requires a slightly increased time cost compared to training the backbone alone, as confirmed in Section 4.3. The overall procedure of balanced training with the mask for the ABC attached to a representation layer of the backbone is presented in Figure 2. The classification loss for the ABC, $\mathcal{L}_{\mathrm{cls}}$, with the mask $M(x_b)$ is expressed as

$$\mathcal{L}_{\mathrm{cls}} = \frac{1}{B}\sum_{b=1}^{B} M(x_b)\,\mathrm{H}\!\left(y_b,\, q_b\right), \qquad (1)$$
$$M(x_b) \sim \mathrm{Bernoulli}\!\left(N_L / N_{y_b}\right), \qquad (2)$$

where $\mathrm{H}(\cdot,\cdot)$ is the standard cross-entropy loss, $\mathcal{A}(x_b)$ is an augmented labeled data point, $q_b$ is the predicted class distribution of the ABC for $\mathcal{A}(x_b)$, $y_b$ is the one-hot label for $x_b$, and $N_{y_b}$ denotes the number of labeled data points in the class of $x_b$, so that the Bernoulli parameter equals 1 for the smallest class and decreases in inverse proportion to the class size.

Figure 2: Overall procedure for balanced training of the ABC with a mask
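A minimal PyTorch sketch of the masked classification loss in Eqs. (1)-(2) as reconstructed above. The Bernoulli parameter $N_L/N_{y_b}$ (probability 1 for the smallest class, smaller for larger classes) is one concrete realization of a parameter "inversely proportional to the class size" and is an assumption, as are the function and argument names.

import torch
import torch.nn.functional as F

def balanced_classification_loss(logits, labels, class_counts):
    # logits:       [B, L] ABC outputs for augmented labeled data points
    # labels:       [B]    integer class labels
    # class_counts: [L]    number of labeled data points per class
    counts = torch.as_tensor(class_counts, dtype=torch.float, device=logits.device)
    keep_prob = counts.min() / counts[labels]        # Bernoulli parameter per example
    mask = torch.bernoulli(keep_prob)                # 0/1 mask M(x_b)
    ce = F.cross_entropy(logits, labels, reduction='none')
    return (mask * ce).mean()                        # masked, minibatch-averaged loss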

3.4 Consistency regularization for ABC

To increase the margin between the decision boundary and the data points using unlabeled data, we conduct consistency regularization for the ABC, in a way similar to FixMatch. Specifically, we first obtain the predicted class distribution $q_b^{w}$ of the ABC for a weakly augmented unlabeled data point $\mathcal{A}_w(u_b)$ and use it as a soft pseudo-label. Then, for two strongly augmented unlabeled data points $\mathcal{A}_1(u_b)$ and $\mathcal{A}_2(u_b)$, we train the ABC so that their predicted class distributions, $q_b^{s_1}$ and $q_b^{s_2}$, are close to $q_b^{w}$.

In class-imbalanced settings, because most unlabeled data points belong to majority classes, most weakly augmented unlabeled data points can be predicted as the majority classes. Consistency regularization would then be conducted more frequently for the majority classes, which can cause a classifier to be biased toward them. To prevent this issue, we conduct consistency regularization in a modified manner that is suitable for class-imbalance problems. Specifically, whereas FixMatch minimizes entropy by converting the predicted class distribution for a weakly augmented data point into a one-hot pseudo-label, we directly use the predicted class distribution as a soft pseudo-label. We do not pursue entropy minimization for the ABC because it can accelerate biased classification toward certain classes. Moreover, we once again generate a 0/1 mask $M(u_b)$ for each unlabeled data point $u_b$ based on its soft pseudo-label $q_b^{w}$, and multiply the consistency regularization loss for $u_b$ by the generated mask, so that the ABC can be trained with a class-balanced consistency regularization loss. Note that existing re-sampling techniques are not applicable to unlabeled data, because they require a label for each data point. In contrast, we make it possible to resample unlabeled data by using the soft pseudo-label and the mask. The consistency regularization loss, $\mathcal{L}_{\mathrm{con}}$, with the mask $M(u_b)$ is expressed as

$$\mathcal{L}_{\mathrm{con}} = \frac{1}{B}\sum_{b=1}^{B} M(u_b)\,\mathbb{1}\!\left(\max(q_b^{w}) \ge \tau\right)\left[\mathrm{H}\!\left(q_b^{w},\, q_b^{s_1}\right) + \mathrm{H}\!\left(q_b^{w},\, q_b^{s_2}\right)\right], \qquad (3)$$
$$M(u_b) \sim \mathrm{Bernoulli}\!\left(N_L / N_{\hat{y}_b}\right), \qquad (4)$$

where $\mathbb{1}(\cdot)$ is the indicator function, $\max(q_b^{w})$ is the highest predicted assignment probability for any class, representing the confidence of the prediction, and $\tau$ is the confidence threshold. To avoid the unwanted effects of inaccurate soft pseudo-labels during consistency regularization, we only use the weakly augmented unlabeled data points whose confidence is higher than the threshold $\tau$, similar to FixMatch. To take full advantage of the few unlabeled data points whose prediction confidence exceeds the confidence threshold in the early stage of training, we gradually decrease the parameter of the Bernoulli distribution for $M(u_b)$ from 1 to $N_L / N_{\hat{y}_b}$, where $\hat{y}_b$ is the one-hot pseudo-label obtained from $q_b^{w}$. Following previous studies (berthelot2019ReMixMatch; miyato2018virtual; sohn2020FixMatch; NEURIPS2019_1cd138d0), we do not backpropagate gradients for pseudo-label prediction. The overall procedure for consistency regularization for the ABC is shown in Appendix A.
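A PyTorch sketch of the consistency regularization described above, assuming the same $N_L/N_k$ parameterization of the Bernoulli mask as in Section 3.3 and a linear warm-up of its parameter from 1 toward the class-balanced value (the exact schedule is an assumption); the pseudo-label is kept soft and no gradients flow through it.

import torch
import torch.nn.functional as F

def abc_consistency_loss(logits_weak, logits_strong1, logits_strong2,
                         class_counts, tau=0.95, warmup=1.0):
    # logits_weak / logits_strong*: ABC outputs for one weakly and two strongly
    # augmented views of the same unlabeled minibatch, shape [B, L].
    # warmup in [0, 1] moves the Bernoulli parameter from 1 to the balanced value.
    with torch.no_grad():                              # no gradients through pseudo-labels
        q = torch.softmax(logits_weak, dim=1)          # soft pseudo-label q_b^w
        conf, hard = q.max(dim=1)                      # confidence and hard pseudo-label
        counts = torch.as_tensor(class_counts, dtype=torch.float,
                                 device=logits_weak.device)
        balanced_p = counts.min() / counts[hard]
        keep_prob = (1.0 - warmup) + warmup * balanced_p
        mask = torch.bernoulli(keep_prob) * (conf >= tau).float()

    # Cross-entropy between the soft pseudo-label and each strong prediction.
    ce1 = -(q * F.log_softmax(logits_strong1, dim=1)).sum(dim=1)
    ce2 = -(q * F.log_softmax(logits_strong2, dim=1)).sum(dim=1)
    return (mask * (ce1 + ce2)).mean()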

3.5 End-to-end training

Unlike a recent CIL trend that finetunes a classifier in a balanced manner after representation learning is completed (i.e., decoupled learning of representations and a classifier) (kang2019decoupling; NEURIPS2020_2ba61cc3), we obtain a balanced classifier by training the proposed algorithm end-to-end. We train the proposed algorithm with the sum of the losses from Sections 3.3 and 3.4 and the loss for the backbone, $\mathcal{L}_{\mathrm{back}}$. The total loss function $\mathcal{L}$ is expressed as

$$\mathcal{L} = \mathcal{L}_{\mathrm{back}} + \mathcal{L}_{\mathrm{cls}} + \mathcal{L}_{\mathrm{con}}. \qquad (5)$$

Whereas we use the sum of the losses for the backbone and ABC for training the proposed algorithm, we predict the class labels of new data points using only the ABC. In our experiments in Sections 4.4 and 4.5, we show that the proposed algorithm trained end-to-end produces better performance than competing algorithms with decoupled learning of representations and a classifier, and we analyze possible reasons. We present the pseudo code of the proposed algorithm in Appendix B.
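Putting the pieces together, a schematic training step for Eq. (5) might look as follows. It reuses the two loss sketches above; backbone.ssl_losses_and_features is a hypothetical helper standing in for the FixMatch/ReMixMatch losses and the shared representations, and the dictionary keys are placeholders.

def train_step(backbone, abc_head, optimizer, labeled_batch, unlabeled_batch,
               class_counts, tau, warmup):
    # Backbone computes its own SSL loss and exposes the shared representations.
    loss_back, feats_lab, feats_unl = backbone.ssl_losses_and_features(
        labeled_batch, unlabeled_batch)                       # hypothetical helper

    l_cls = balanced_classification_loss(abc_head(feats_lab['aug']),
                                         labeled_batch['labels'], class_counts)
    l_con = abc_consistency_loss(abc_head(feats_unl['weak']),
                                 abc_head(feats_unl['strong1']),
                                 abc_head(feats_unl['strong2']),
                                 class_counts, tau, warmup)

    loss = loss_back + l_cls + l_con                           # Eq. (5): plain sum
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()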

4 Experiments

4.1 Experimental setup

We created class-imbalanced versions of the CIFAR-10, CIFAR-100 (krizhevsky2009learning), and SVHN (netzer2011reading) datasets to conduct experiments under various ratios of class imbalance $\gamma$ and various ratios of the amount of labeled data $\beta$. For the class-imbalance types, we first consider long-tailed (LT) imbalance, in which the number of data points decreases exponentially from the largest to the smallest class, i.e., $N_k = N_1 \cdot \gamma^{-\frac{k-1}{L-1}}$ for $k = 1, \dots, L$. We also consider step imbalance (buda2018systematic), in which all majority classes have the same amount of data and all minority classes also have the same amount of data. The two types of class imbalance for the considered datasets are illustrated in Appendix C. For the main setting, we used one set of values of $\gamma$ and $\beta$ for CIFAR-10 and SVHN, and a different set for CIFAR-100. Similar to (NEURIPS2020_a7968b43), we set $\gamma$ of CIFAR-100 to be relatively small because CIFAR-100 has only 500 training data points for each class. To evaluate the performance of the proposed algorithm on large-scale datasets, we also conducted experiments on 7.5M data points of 256 by 256 images from the LSUN dataset (yu15lsun).
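A short sketch of how the per-class counts for the two imbalance types can be generated from the largest-class size and the imbalance ratio; the specific numbers are illustrative only.

def lt_counts(n_max, num_classes, gamma):
    # Long-tailed imbalance: counts decay exponentially from n_max to n_max/gamma.
    return [int(n_max * gamma ** (-k / (num_classes - 1))) for k in range(num_classes)]

def step_counts(n_max, num_classes, gamma):
    # Step imbalance: the first half of the classes keep n_max samples each,
    # the second half keep n_max/gamma samples each.
    half = num_classes // 2
    return [n_max if k < half else int(n_max / gamma) for k in range(num_classes)]

print(lt_counts(5000, 10, 100))    # decays from 5000 down to 50
print(step_counts(5000, 10, 100))  # five classes with 5000, five classes with 50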

We compared the performance of the proposed algorithm with that of various baseline algorithms. Specifically, we considered the following baseline algorithms:

  • Deep CNN (vanilla algorithm): This is trained on only labeled data with the cross-entropy loss.

  • BALMS (NEURIPS2020_2ba61cc3) (CIL algorithm): This state-of-the-art CIL algorithm does not use unlabeled data.

  • VAT (miyato2018virtual), ReMixMatch (berthelot2019ReMixMatch), and FixMatch (sohn2020FixMatch) (SSL algorithms): These are state-of-the-art SSL algorithms, but do not consider class imbalance.

  • FixMatch+CReST+PDA and ReMixMatch+CReST+PDA (CISSL algorithms): CReST+PDA (wei2021crest) mitigates class imbalance by using unlabeled data points classified as the minority classes with a higher probability than those classified as the majority classes.

  • ReMixMatch+DARP and FixMatch+DARP (CISSL algorithms): These algorithms use DARP (NEURIPS2020_a7968b43) to refine the pseudo labels obtained from ReMixMatch or FixMatch.

  • ReMixMatch+DARP+cRT and FixMatch+DARP+cRT (CISSL algorithms): Compared to ReMixMatch+DARP and FixMatch+DARP, these algorithms finetune the classifier using cRT (kang2019decoupling).

For the structure of the deep CNN used in the proposed and baseline algorithms, we used a Wide ResNet (zagoruyko2016wide). We trained the proposed algorithm for a fixed number of iterations with a batch size of 64. The confidence threshold $\tau$ was set to 0.95 based on the experiments with various values of $\tau$ in Appendix D. We used the Adam optimizer (DBLP:journals/corr/KingmaB14) with a fixed learning rate, and used Cutout (devries2017improved) and RandAugment (cubuk2020randaugment) for strong data augmentation, following (NEURIPS2020_a7968b43). Similar to (berthelot2019ReMixMatch; NEURIPS2019_1cd138d0), we evaluated the performance of the proposed algorithm using an exponential moving average of the parameters over iterations, instead of scheduling the learning rate. In Tables 1-5, we used the overall accuracy and the accuracy only for the minority classes as performance measures. We repeated the experiments five times under the main setting, and three times under the step imbalance setting and the other settings of $\gamma$ and $\beta$. We report the average and standard deviation of the performance measures over the repeated experiments. For the vanilla algorithm, FixMatch+DARP+cRT, and ReMixMatch+DARP+cRT, which suffered from overfitting, we measured performance every 500 iterations and recorded the best performance. Further details of the experimental setup are described in Appendix E.
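A minimal sketch of evaluating with an exponential moving average (EMA) of the parameters; the decay value used below is a placeholder, since the exact rate is not reproduced here.

import copy
import torch

class EMA:
    # Keeps an exponential moving average of a model's parameters for evaluation.
    def __init__(self, model, decay=0.999):      # decay value is an assumption
        self.decay = decay
        self.shadow = copy.deepcopy(model).eval()
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        for ema_p, p in zip(self.shadow.parameters(), model.parameters()):
            ema_p.mul_(self.decay).add_(p, alpha=1.0 - self.decay)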

4.2 Experimental results

The performance of the competing algorithms under the main setting is summarized in Table 1. We can observe that the proposed algorithm achieved the highest overall performance, with improved performance for the minority classes. Interestingly, VAT, an SSL algorithm, showed performance similar to the vanilla algorithm, and worse performance than BALMS, a CIL algorithm. Similarly, FixMatch and ReMixMatch, which do not consider class imbalance, showed poor performance for the minority classes. Although BALMS mitigated class imbalance, it produced poor overall performance, as it did not use unlabeled data for training. This demonstrates the importance of using unlabeled data for training, even in the class-imbalanced setting. FixMatch+CReST+PDA and ReMixMatch+CReST+PDA mitigated class imbalance by using unlabeled data points classified as the minority classes with a higher probability, but produced lower performance than the proposed algorithm. This may be because, even if all unlabeled data points classified as minority classes are additionally used for training, their amount is still less than that of the data in the majority classes, while the proposed algorithm uses class-balanced minibatches by generating the mask. FixMatch+DARP and ReMixMatch+DARP slightly mitigated class imbalance by refining biased pseudo-labels, but resulted in lower performance than the proposed algorithm. This may be because even perfect pseudo-labels cannot change the underlying class-imbalanced distribution of the training data. By additionally using the rebalancing technique cRT, FixMatch(ReMixMatch)+DARP+cRT performed better than FixMatch(ReMixMatch)+DARP. However, FixMatch(ReMixMatch)+DARP+cRT still performed worse than FixMatch(ReMixMatch)+ABC, although it also uses high-quality representations learned by FixMatch(ReMixMatch) and techniques for mitigating class imbalance. The superior performance of FixMatch(ReMixMatch)+ABC over FixMatch(ReMixMatch)+DARP+cRT is probably because FixMatch(ReMixMatch)+ABC was trained end-to-end and the ABC was also trained using unlabeled data. We discuss this in more detail in Sections 4.4 and 4.5. Overall, the algorithms combined with ReMixMatch performed better than the algorithms combined with FixMatch. In addition to the overall accuracy and minority-class accuracy, we also compared the performance of the competing algorithms in terms of the geometric mean (G-mean) of class-wise accuracy under the main setting in Appendix F.

CIFAR-10-LT SVHN-LT CIFAR-100-LT
Algorithm
Vanilla / / /
VAT (miyato2018virtual) / / /
BALMS (NEURIPS2020_2ba61cc3) / / /
FixMatch (sohn2020FixMatch) / / /
w/ CReST+PDA (wei2021crest) / / /
w/ DARP (NEURIPS2020_a7968b43) / / /
w/ DARP+cRT (NEURIPS2020_a7968b43) / / /
w/ ABC 81.1 / 72.0 92.0 / 87.9 56.3 / 43.4
ReMixMatch (berthelot2019ReMixMatch) / / /
w/ CReST+PDA (wei2021crest) / / /
w/ DARP (NEURIPS2020_a7968b43) / / /
w/ DARP+cRT (NEURIPS2020_a7968b43) / / /
w/ ABC 82.4 / 75.7 93.9 / 92.5 57.6 / 46.7
Table 1: Overall accuracy/minority-class-accuracy under the main setting

To evaluate the performance of the proposed algorithm in various settings, we conducted experiments using ReMixMatch, FixMatch, and the CISSL algorithms considered in Table 1, while changing the ratio of class imbalance $\gamma$ and the ratio of the amount of labeled data $\beta$. The results for CIFAR-10 are presented in Table 2, and the results for SVHN and CIFAR-100 are presented in Appendix G. In Table 2, we can observe that the proposed algorithm achieved the highest overall accuracy, with greatly improved performance for the minority classes, in all settings. Because FixMatch+DARP+cRT and ReMixMatch+DARP+cRT do not use unlabeled data for classifier tuning, the difference in performance between FixMatch(ReMixMatch)+DARP+cRT and the proposed algorithm increased as the ratio of the amount of labeled data $\beta$ decreased and as the ratio of class imbalance $\gamma$ increased. In addition, the difference in performance between FixMatch(ReMixMatch)+CReST+PDA and the proposed algorithm tended to increase as the ratio of class imbalance increased, because the difference between the number of labeled data points belonging to the majority classes and the number of unlabeled data points classified as the minority classes increases with $\gamma$.

CIFAR-10-LT
Algorithm
FixMatch (sohn2020FixMatch) / / / /
w/ CReST+PDA (wei2021crest) / / / /
w/ DARP+cRT (NEURIPS2020_a7968b43) / / / /
w/ ABC 77.2 / 65.7 81.5 / 72.9 85.2 / 80.2 77.1 / 64.4
ReMixMatch (berthelot2019ReMixMatch) / / / /
w/ CReST+PDA (wei2021crest) / / / /
w/ DARP+cRT (NEURIPS2020_a7968b43) / / / /
w/ ABC 79.8 / 70.8 84.3 / 80.6 87.5 / 84.6 80.6 / 72.1
Table 2: Overall accuracy/minority-class accuracy for CIFAR-10 under various settings

We also conducted experiments under a step-imbalance setting, where the class imbalance is more pronounced. This setting assumes a more severely imbalanced class distribution than the LT imbalance settings, because half of the classes have very scarce data. The experimental results for CIFAR-10 are presented in Table 3, and the results for SVHN and CIFAR-100 are presented in Appendix H. In Table 3, we can see that the proposed algorithm achieved the best performance, with a larger performance margin than in the LT imbalance settings. ReMixMatch+CReST+PDA showed relatively low performance compared to the other algorithms.

CIFAR-10-Step
Algorithm w/ - w/ CReST+PDA (wei2021crest) w/ DARP+cRT (NEURIPS2020_a7968b43) w/ ABC
FixMatch (sohn2020FixMatch) / / / 75.9 / 57.0
ReMixMatch (berthelot2019ReMixMatch) / / / 76.4 / 65.7
Table 3: Overall accuracy/minority-class accuracy on CIFAR-10 under a step imbalance setting

To evaluate the performance of the proposed algorithm on a large-scale dataset, we also conducted experiments on the LSUN dataset (yu15lsun), which is naturally long-tailed. Among the algorithms considered in Tables 2 and 3, those combined with CReST were excluded from the comparison, because CReST requires loading the whole unlabeled dataset in the repeated process of updating pseudo-labels, which is not possible for the large-scale LSUN dataset. Instead, we additionally considered FixMatch+cRT and ReMixMatch+cRT for comparison. The experimental results are presented in Table 4. The proposed algorithm showed better performance than the other baseline algorithms. DARP degraded the performance, possibly because the scale of the LSUN dataset is very large. Specifically, DARP solves a convex optimization problem over all unlabeled data points to refine the pseudo-labels. As the scale of the unlabeled dataset increases, this optimization problem becomes more difficult to solve and, consequently, the pseudo-labels could be refined inaccurately. Unlike the results for the other datasets, the algorithms combined with FixMatch performed better than the algorithms combined with ReMixMatch.

LSUN
Algorithm w/ - w/ cRT (kang2019decoupling) w/ DARP (NEURIPS2020_a7968b43) w/ DARP+cRT (NEURIPS2020_a7968b43) w/ ABC
FixMatch (sohn2020FixMatch) / / / / 78.9 / 75.5
ReMixMatch (berthelot2019ReMixMatch) / / / / 76.9 / 69.5
Table 4: Overall accuracy/minority-class accuracy for the large-scale LSUN dataset

4.3 Complexity of the proposed algorithm

The proposed algorithm requires additional parameters for the ABC, but the number of additional parameters is negligible compared to the number of parameters of the backbone. For example, because the ABC is a single classification layer, it adds only a negligible fraction of the number of backbone parameters for CIFAR-10 with 10 classes and CIFAR-100 with 100 classes. Moreover, because the ABC shares the representation layer of the backbone, it does not significantly increase the memory usage or training time. Furthermore, we could train the proposed algorithm on the large-scale LSUN dataset without a significant increase in computation cost, because the entire training procedure could be carried out using minibatches of data. In contrast, the algorithms combined with DARP required convex optimization over all pseudo-labels, which significantly increased the computation cost as the number of classes or the amount of data increased. Similarly, it required significant time to train the algorithms combined with CReST, because CReST requires iterative re-training with a labeled set expanded by adding unlabeled data points with pseudo-labels. We present the floating point operations per second (FLOPS) for each algorithm, measured on an Nvidia Tesla V100, in Appendix I.
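To make the parameter-count argument concrete: the ABC is a single linear layer on the backbone's representation, so it adds only feature_dim x L + L parameters. The feature dimension below is an assumption for illustration; the actual value depends on the backbone.

import torch.nn as nn

feature_dim, num_classes = 128, 10                 # illustrative values only
abc_head = nn.Linear(feature_dim, num_classes)     # the entire ABC
abc_params = sum(p.numel() for p in abc_head.parameters())
print(abc_params)                                  # 128 * 10 + 10 = 1290 parameters
# Compare with the backbone: sum(p.numel() for p in backbone.parameters()),
# which is several orders of magnitude larger for a Wide ResNet.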

4.4 Qualitative analysis of high-quality representations and balanced classification

The ABC can use high-quality representations learned by the backbone when performing balanced classification. To verify this, in Figure 3, we present the t-distributed stochastic neighbor embedding (t-SNE) (van2008visualizing) of the representations of the CIFAR-10 test set learned by the ABC (without SSL backbone), FixMatch+ABC, and ReMixMatch+ABC trained on CIFAR-10-LT under the main setting. Different colors indicate different classes. As expected, "ABC (without SSL backbone)" failed to learn class-separable representations because, with the mask applied, not enough data were used for training. In contrast, by training the backbone (FixMatch or ReMixMatch) together with the ABC, the proposed algorithm could use all of the data and learn high-quality representations. In this example, ReMixMatch produced more separable representations than FixMatch, which shows that the choice of the backbone affects the performance of the proposed algorithm, as expected.

                   
(a) ABC (without SSL backbone) (b) FixMatch+ABC (c) ReMixMatch+ABC
Figure 3: t-SNE of the proposed algorithm and the ABC (without SSL backbone)

The proposed algorithm can also mitigate class imbalance by using the ABC. To verify this, we compare the confusion matrices of the predictions on the test set of CIFAR-10 using ReMixMatch, ReMixMatch+DARP+cRT, and ReMixMatch+ABC trained on CIFAR-10 under the main setting in Figure 4. In the confusion matrices, the value in the ith row and the jth column represents the fraction of the data points belonging to the ith class that are predicted as the jth class. Each cell has a darker red color when the ratio is larger. We can see that ReMixMatch often misclassified data points in the minority classes as majority classes. This may be because ReMixMatch does not consider class imbalance, and thus biased pseudo-labels were used for training. ReMixMatch+DARP+cRT produced a more balanced class distribution than ReMixMatch by additionally using DARP+cRT. However, a significant number of data points in the minority classes were still misclassified as majority classes. In contrast, ReMixMatch+ABC classified the test data points in the minority classes with higher accuracy, and produced a significantly more balanced class distribution than ReMixMatch+DARP+cRT, as shown in Figure 4 (c). As both ReMixMatch+DARP+cRT and ReMixMatch+ABC use ReMixMatch to learn representations, the performance gap between these two algorithms results from the different characteristics of the ABC versus DARP+cRT, as follows. First, DARP+cRT does not use unlabeled data for training its classifier after representation learning is completed, whereas the ABC uses unlabeled data with unbiased pseudo-labels for its training. Second, whereas DARP+cRT decouples the learning of representations and the training of a classifier, the ABC is trained end-to-end interactively with the representations learned by the backbone. We also present the confusion matrices of the predictions on the test set of CIFAR-10 using FixMatch, FixMatch+DARP+cRT, and FixMatch+ABC, as well as the confusion matrices of the pseudo-labels on the same dataset using ReMixMatch, ReMixMatch+DARP+cRT, ReMixMatch+ABC, FixMatch, FixMatch+DARP+cRT, and FixMatch+ABC, in Appendix J. Moreover, we compare the ABC and the classifier of DARP+cRT in more detail using the validation loss plots in Appendix K.

(a) ReMixMatch (b) ReMixMatch+DARP+cRT (c) ReMixMatch+ABC
Figure 4: Confusion matrices of the predictions on the test set of CIFAR-10
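A short sketch of the confusion matrices used here, assuming they are row-normalized: entry (i, j) is the fraction of class-i test points that are predicted as class j.

import numpy as np

def row_normalized_confusion(y_true, y_pred, num_classes):
    # Entry (i, j): fraction of class-i examples predicted as class j.
    cm = np.zeros((num_classes, num_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1.0
    row_sums = cm.sum(axis=1, keepdims=True)
    return cm / np.maximum(row_sums, 1.0)   # avoid division by zero for empty rows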

4.5 Ablation study

We conducted an ablation study on CIFAR-10-LT under the main setting to investigate the effect of each element of the proposed algorithm. The results for ReMixMatch+ABC are presented in Table 5, where each row reports the proposed algorithm under the condition described in that row. The results are summarized as follows. 1) If we did not gradually decrease the parameter of the Bernoulli distribution when conducting consistency regularization, an overbalance problem occurred because of unlabeled data misclassified as minority classes. 2) Without consistency regularization for the ABC, the decision boundary did not clearly separate the classes. 3) Without using the mask for the classification loss and the consistency regularization loss, the ABC was trained to be biased toward the majority classes. 4) Without the confidence threshold for consistency regularization, training became unstable and, consequently, the ABC was trained to be biased toward certain classes. 5) Similarly, if hard pseudo-labels were used for consistency regularization instead of soft pseudo-labels, the ABC was biased toward certain classes. 6) If the ABC was used alone without the backbone, the performance decreased because the ABC could not use the high-quality representations learned by the backbone. 7) When we used a re-weighting technique (he2009learning) instead of the mask for the ABC, training became unstable because of abnormally large gradients calculated for training on the data of the minority classes. 8) Decoupled training of the backbone and the ABC resulted in decreased classification performance, as also analyzed in Section 4.4. The results of the corresponding ablation study for FixMatch+ABC are presented in Appendix L.

Ablation study Overall Minority
ReMixMatch+ABC (proposed algorithm)
Without gradually decreasing the parameter of the Bernoulli distribution for consistency regularization
Without consistency regularization for the ABC
Without using the 0/1 mask for the consistency regularization loss
Without using the 0/1 mask for the classification loss
Without using the confidence threshold for consistency regularization
Using hard pseudo labels for consistency regularization
Without training backbone (ABC without SSL backbone)
Training the ABC with a re-weighting technique
Decoupled training of the backbone and ABC
Table 5: Ablation study for ReMixMatch+ABC on CIFAR-10-LT

5 Conclusion

We introduced the ABC, which is attached to a state-of-the-art SSL algorithm, for CISSL. The ABC can utilize the high-quality representations learned by the backbone, while being trained to make class-balanced predictions. The ABC also utilizes unlabeled data by conducting consistency regularization in a way modified for class-imbalance problems. The experimental results obtained under various settings demonstrate that the proposed algorithm outperforms the baseline algorithms. We also conducted a qualitative analysis and an ablation study to verify the contribution of each element of the proposed algorithm. The proposed algorithm assumes that the labeled and unlabeled data are class-imbalanced to the same extent. In the future, we plan to relax this assumption by adopting a module for estimating the class distribution. Deep learning algorithms can be applied to many societal problems. However, if the training data are imbalanced, the algorithms could be trained to make socially biased decisions in favor of the majority groups. The proposed algorithm can contribute to solving these issues. However, there is also a potential risk that the proposed algorithm could be used as a tool to identify minorities and discriminate against them. It should be ensured that the proposed method cannot be used for any purpose that may have negative social impacts.

6 Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2018R1C1B6004511, 2020R1A4A10187747).

References

Appendix A Overall procedure of consistency regularization for ABC

Figure 5 illustrates the overall procedure of consistency regularization for the ABC. The detailed procedure is described in Section 3.4 of the main paper.

Figure 5: Overall procedure of consistency regularization for the ABC

Appendix B Pseudo code of the proposed algorithm

The pseudo code of the proposed algorithm is presented in Algorithm 1. The for loop (lines 2-14) can be run in parallel. The classification loss $\mathcal{L}_{\mathrm{cls}}$ and consistency regularization loss $\mathcal{L}_{\mathrm{con}}$ are expressed in detail in Sections 3.3 and 3.4 of the main paper.

Input: $\mathcal{X}$, $\mathcal{U}$
      Output: Classification model (backbone with the attached ABC)
      Parameters: parameters of the Wide ResNet backbone and the ABC

1:while Training do
2:     for $b = 1$ to $B$ do
3:         $\mathcal{A}(x_b) \leftarrow$ Augment$(x_b)$
4:         $\mathcal{A}_w(u_b) \leftarrow$ WeakAugment$(u_b)$
5:         $\mathcal{A}_1(u_b), \mathcal{A}_2(u_b) \leftarrow$ StrongAugment$(u_b)$
6:         $q_b \leftarrow$ predicted class distribution of the ABC for $\mathcal{A}(x_b)$
7:         Generate mask $M(x_b)$.
8:         $q_b^{w} \leftarrow$ soft pseudo-label of the ABC for $\mathcal{A}_w(u_b)$
9:         if $\max(q_b^{w}) \ge \tau$ then
10:              $q_b^{s_1}, q_b^{s_2} \leftarrow$ predicted class distributions of the ABC for $\mathcal{A}_1(u_b)$ and $\mathcal{A}_2(u_b)$
11:              Generate mask $M(u_b)$.
12:         end if
13:         Loss from the backbone += backbone loss for $x_b$ and $u_b$
14:     end for
15:     Calculate the classification loss $\mathcal{L}_{\mathrm{cls}}$ and consistency regularization loss $\mathcal{L}_{\mathrm{con}}$.
16:     Total loss $\mathcal{L} = \mathcal{L}_{\mathrm{back}} + \mathcal{L}_{\mathrm{cls}} + \mathcal{L}_{\mathrm{con}}$
17:     Update the parameters of the backbone and the ABC by taking a gradient step on $\mathcal{L}$.
18:end while
Algorithm 1 Pseudo code of the proposed algorithm

Appendix C Two types of class imbalance for the considered datasets

         
(a) Long-tailed imbalance (b) Step imbalance
Figure 6: Long-tailed imbalance and step imbalance

Two types of class imbalance for the considered datasets are illustrated in Figure 6. Both types of imbalance were generated with a common imbalance ratio $\gamma$ and labeled-data ratio $\beta$. In Figure 6 (b), we can see that each minority class has a very small amount of data. Existing SSL algorithms can be significantly biased toward the majority classes under step-imbalanced settings.

Appendix D Specification of the confidence threshold

ReMixMatch+ABC, CIFAR-10-LT
τ      Mean, STD
1      78.9, 0.36
0.98   81.8, 0.34
0.95   82.3, 0.2
0.9    81.3, 0.32
0.85   81.5, 0.39
0.8    81.2, 0.63
0.75   80.0, 2.87
0.7    79.0, 5.76
Table 6: Mean and standard deviation (STD) of validation accuracy during the last 50 epochs

In general, the confidence threshold $\tau$ should be set high enough, but not too high. If $\tau$ is low, training becomes unstable because many misclassified unlabeled data points are used for training. However, if $\tau$ is too high, most of the unlabeled data points are not used for consistency regularization. Based on these insights, we set $\tau$ to 0.95 in our experiments. We confirmed via experiments that this value of $\tau$ enabled high accuracy as well as stability. Specifically, we conducted experiments on CIFAR-10-LT under the main setting while changing the value of $\tau$. We measured the validation accuracy of ReMixMatch+ABC during the last 50 epochs (1 epoch = 500 iterations) of training and calculated the mean and standard deviation (STD) of these values. As can be seen from Table 6, the proposed algorithm achieved the highest mean and lowest STD of the validation accuracy when $\tau$ was 0.95. When $\tau$ was set higher or lower than 0.95, the mean of the validation accuracy decreased. In particular, as the value of $\tau$ decreased below 0.95, the STD increased rapidly, indicating instability of the training.

Appendix E Further details of the experimental setup

We describe further details of the experimental setup. To train ReMixMatch, we gradually increased the coefficient of the loss associated with the unlabeled data points, following (NEURIPS2020_a7968b43). We found that without this gradual increase, the validation loss of ReMixMatch did not converge. To train FixMatch, we additionally used the labeled dataset as an unlabeled dataset by removing the labels for the experiments using CIFAR-, following the previous study (sohn2020FixMatch), but not for the experiments using CIFAR- and SVHN, because it did not improve the performance. We followed the default settings of ReMixMatch (berthelot2019ReMixMatch) and FixMatch (sohn2020FixMatch) unless mentioned otherwise.

To train the ABC, we also gradually decreased the parameter of the Bernoulli distribution used for the classification loss in the experiments using CIFAR- and SVHN under the step imbalanced setting. This prevents unstable training by allowing each labeled data point of the majority classes to be used more frequently for training.

Appendix F Geometric mean (G-mean) of class-wise accuracy under the main setting

CIFAR-10-LT SVHN-LT CIFAR-100-LT
Algorithm
FixMatch (sohn2020FixMatch)
w/ CReST+PDA (wei2021crest)
w/ DARP (NEURIPS2020_a7968b43)
w/ DARP+cRT (NEURIPS2020_a7968b43)
w/ ABC 80.5 91.8 49.0
ReMixMatch (berthelot2019ReMixMatch)
w/ CReST+PDA (wei2021crest)
w/ DARP (NEURIPS2020_a7968b43)
w/ DARP+cRT (NEURIPS2020_a7968b43)
w/ ABC 81.9 93.8 50.8
Table 7: Performance comparison using G-mean for the main setting

To evaluate whether the proposed algorithm performs in a balanced way for all classes, we also measured the performance under the main setting using the geometric mean (G-mean) of class-wise accuracy with a correction to avoid zeroing. We set the hyperparameter for this correction to 1, which means that the minimum class-wise accuracy entering the G-mean is 1. The results in Table 7 demonstrate that the proposed algorithm performed in a balanced way.
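A sketch of the corrected G-mean described above: each class-wise accuracy (in percent) is floored at 1 before taking the geometric mean, so a single zero-accuracy class does not zero out the score.

import numpy as np

def g_mean_accuracy(per_class_acc, floor=1.0):
    # Geometric mean of class-wise accuracies with a floor to avoid zeroing.
    acc = np.maximum(np.asarray(per_class_acc, dtype=float), floor)
    return float(np.exp(np.mean(np.log(acc))))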

Appendix G Experimental results on SVHN and CIFAR-100 under various settings

For one of the SVHN settings, the solution of the convex optimization problem of ReMixMatch+DARP+cRT for refining the pseudo-labels did not converge, and thus we could not measure its performance. The experimental results for SVHN and CIFAR-100 under various settings showed the same trend as those for CIFAR-10, which is described in Section 4.2 of the main paper.

SVHN-LT
Algorithm
FixMatch (sohn2020FixMatch) / / / /
w/ CReST + PDA (wei2021crest) / / / /
w/ DARP + cRT (NEURIPS2020_a7968b43) / / / /
w/ ABC 92.3 / 88.7 92.3 / 88.3 93.5 / 90.7 91.2 / 86.2
ReMixMatch (berthelot2019ReMixMatch) / / / /
w/ CReST + PDA (wei2021crest) / / / /
w/ DARP + cRT (NEURIPS2020_a7968b43) / / / /
w/ ABC 93.2 / 92.2 94.4 / 93.3 94.7 / 93.5 93.2 / 91.8
Table 8: Overall accuracy/minority-class-accuracy on SVHN under various settings
CIFAR-100-LT
Algorithm
FixMatch (sohn2020FixMatch) / / / /
w/ CReST + PDA (wei2021crest) / / / /
w/ DARP + cRT (NEURIPS2020_a7968b43) / / / /
w/ ABC 49.7 / 34.6 58.3 / 46.7 61.6 / 53.0 53.6 / 38.8
ReMixMatch (berthelot2019ReMixMatch) / / / /
w/ CReST + PDA (wei2021crest) / / / /
w/ DARP + cRT (NEURIPS2020_a7968b43) / / / /
w/ ABC 52.5 / 38.5 59.3 / 49.5 63.5 / 57.1 55.4 / 42.8
Table 9: Overall accuracy/minority-class accuracy on CIFAR-100 under various settings

Appendix H Experimental results on SVHN and CIFAR-100 under the step imbalanced setting

Experimental results for SVHN and CIFAR-100 under the step imbalanced setting showed the same tendency as those for CIFAR-10, which is described in Section 4.2 of the main paper.

SVHN-Step
Algorithm w/ - w/ CReST + PDA (wei2021crest) w/ DARP + cRT (NEURIPS2020_a7968b43) w/ ABC
FixMatch (sohn2020FixMatch) / / / 91.2 / 85.6
ReMixMatch (berthelot2019ReMixMatch) / / / 91.3 / 89.8
Table 10: Overall accuracy/minority-class-accuracy on SVHN under step imbalanced setting
CIFAR-100-Step
Algorithm w/ - w/ CReST + PDA (wei2021crest) w/ DARP + cRT (NEURIPS2020_a7968b43) w/ ABC
FixMatch (sohn2020FixMatch) / / / 54.7 / 32.1
ReMixMatch (berthelot2019ReMixMatch) / / / 56.0 / 38.3
Table 11: Overall accuracy/minority-class accuracy on CIFAR-100 under the step imbalanced setting

Appendix I Floating point operations per second (FLOPS) of each algorithm

As mentioned in Section 4.3 of the main paper, the computation cost of the algorithms combined with DARP increased as the number of classes or the amount of data increased. In contrast, the computation cost of the proposed algorithm did not increase significantly, because the whole training procedure can be carried out using minibatches. The FLOPS of FixMatch+CReST and ReMixMatch+CReST are the same as those of FixMatch and ReMixMatch, but the algorithms combined with CReST required iterative re-training with a labeled set expanded by adding unlabeled data points with pseudo-labels. We measured FLOPS using an Nvidia Tesla V100. For the experiments on CIFAR-10 and CIFAR-100, we used only one GPU, whereas we used four GPUs in parallel for the experiments on LSUN.

Algorithm CIFAR-10 CIFAR-100 LSUN
FixMatch (sohn2020FixMatch) iter/sec iter/sec iter/sec
FixMatch+DARP (NEURIPS2020_a7968b43) iter/sec iter/sec iter/sec
FixMatch+ABC iter/sec iter/sec iter/sec
ReMixMatch (berthelot2019ReMixMatch) iter/sec iter/sec iter/sec
ReMixMatch+DARP (NEURIPS2020_a7968b43) iter/sec iter/sec iter/sec
ReMixMatch+ABC iter/sec iter/sec iter/sec
Table 12: FLOPS of each algorithm

Appendix J Further qualitative analysis and quantitative comparison

Figure 7 (b) presents the biased predictions of FixMatch (sohn2020FixMatch), a recent SSL algorithm, trained on CIFAR-10, in which Class 0 has 100 times more data than Class 9, as depicted in Figure 7 (a). In contrast, Figure 7 (c) shows that the class distribution of the predicted labels became more balanced using FixMatch+ABC trained on the same dataset. These results are consistent with those in Figure 1 of the main paper.

(a) Class-imbalanced training set (b) FixMatch (c) FixMatch+ABC
Figure 7: Predictions on a class-balanced test set of CIFAR-10 using FixMatch (b) and FixMatch+ABC (c) trained on a class-imbalanced training set (a).

Because the use of the mask for the ABC plays a role similar to that of re-sampling techniques, we compare the representations of the proposed algorithm with those of SMOTE (an oversampling technique) (chawla2002smote)+CNN and random undersampling (he2009learning)+CNN. Figure 8 (a), (b), and (c) present the t-SNE representations obtained using SMOTE+CNN, undersampling+CNN, and the ABC only. Because re-sampling techniques can only be applied to labeled data, they cannot be combined with the SSL algorithms, and thus they were combined with a CNN instead. SMOTE+CNN and undersampling+CNN learned less separable representations than the ABC only. These results show that using the mask instead of re-sampling techniques is more effective because we could utilize unlabeled data. In addition, the 0/1 mask enabled the ABC to be combined with the backbone, so that the ABC could use the high-quality representations learned by the backbone, as shown in Figure 8 (d).

(a) SMOTE (chawla2002smote)+CNN (b) Undersampling (he2009learning)+CNN (c) ABC only (d) ReMixMatch+ABC
Figure 8: t-SNE of the representations of the CIFAR-10 test set using re-sampling+CNN, ABC only, and ReMixMatch+ABC trained on CIFAR-10-LT

We also compared the performance of the proposed algorithm with those of SMOTE+CNN and undersampling+CNN. The results in Table 13 show the importance of using unlabeled data for training and of using the high-quality representations obtained from the backbone.

Performance of each algorithm in Figure 8 and FixMatch+ABC Overall Minority
ReMixMatch+ABC
FixMatch+ABC
Without training backbone (ABC only)
SMOTE+CNN
Undersampling+CNN
Table 13: Performance of each algorithm in Figure 8 and FixMatch+ABC. The algorithms were trained on CIFAR-10-LT and tested on the test set of CIFAR-10.

Figure 9 presents the confusion matrices of FixMatch, FixMatch+DARP+cRT, and FixMatch+ABC trained on CIFAR-10-LT. Similar to Figure 4 of the main paper, FixMatch and FixMatch+DARP+cRT often misclassified test data points in the minority classes as majority classes. In contrast, FixMatch+ABC classified the test data points in the minority classes with higher accuracy, and produced a significantly more balanced class distribution than FixMatch and FixMatch+DARP+cRT.

(a) FixMatch (b) FixMatch+DARP+cRT (c) FixMatch+ABC
Figure 9: Confusion matrices of the predictions on the test set of CIFAR-10

Figure 10 presents the confusion matrices of the predictions on the unlabeled data. Similar to Figure 9 and Figure 4 of the main paper, FixMatch+ABC and ReMixMatch+ABC classified the unlabeled data points in the minority classes with higher accuracy, and produced significantly more balanced pseudo-labels than the other algorithms. By using these balanced pseudo-labels for training, the proposed algorithm could make more balanced predictions on the test set.

(a) FixMatch (b) FixMatch+DARP+cRT (c) FixMatch+ABC
(d) ReMixMatch (e) ReMixMatch+DARP+cRT (f) ReMixMatch+ABC
Figure 10: Confusion matrices of the predictions on the unlabeled data of CIFAR-10

Appendix K Detailed comparison between the end-to-end training of the proposed algorithm and decoupled learning of representations and a classifier

Although FixMatch+DARP+cRT and ReMixMatch+DARP+cRT also use the representations learned by ReMixMatch (berthelot2019ReMixMatch) and FixMatch (sohn2020FixMatch), they showed worse performance than the proposed algorithm. The performance gap between FixMatch(ReMixMatch)+DARP+cRT and the proposed algorithm results from the different characteristics of the ABC versus DARP+cRT, as follows. First, whereas DARP+cRT decouples the learning of representations and the training of a classifier, the ABC is trained end-to-end interactively with the representations that the backbone learns. Second, DARP+cRT does not use unlabeled data for training its classifier after representation learning is finished, while the ABC is trained with unlabeled data through consistency regularization so that decision boundaries can be placed in low-density regions. To verify these reasons, we compare the validation loss graphs of the algorithms based on end-to-end training and on decoupled learning of representations and a classifier in Figure 11. We recorded the validation loss for 100 epochs after the representations were fixed, where 1 epoch was set as 500 iterations. For the proposed algorithm, we recorded the validation loss of the last 100 epochs. In Figure 11 (a) and (b), we can see that the validation loss of the algorithms based on decoupled learning of representations and a classifier tended to increase after a few epochs. The validation loss was reduced by conducting consistency regularization (C/R) using unlabeled data, but it still tended to increase. In the case of ReMixMatch+DARP+cRT+C/R* and FixMatch+DARP+cRT+C/R*, which do not fix the representations (algorithms marked with *), the high-quality representations learned by the backbone were gradually replaced by representations learned with a re-balanced classifier, which caused overfitting on the minority classes. We can observe a similar tendency in Figure 11 (c) under the supervised learning setting. In contrast, the validation losses of ReMixMatch+ABC, FixMatch+ABC, and the proposed algorithm under the supervised learning setting decreased steadily and reached the lowest values. The performances of the algorithms based on end-to-end training and on decoupled learning of representations and a classifier are summarized in Table 14.

(a) With ReMixMatch (b) With FixMatch (c) Supervised setting
Figure 11: Validation loss graphs of algorithms based on end-to-end training and decoupled learning of representations and a classifier, evaluated on the test set of CIFAR-10. The algorithms in (a) and (b) were trained on the training set of CIFAR-10-LT under the semi-supervised setting, and the algorithms in (c) were trained on the training set of CIFAR-10-LT under the supervised setting. C/R and * in the graphs indicate consistency regularization and non-fixed representations, respectively.
Performance of the algorithms based on end-to-end training versus decoupled learning Overall Minority
Under the semi-supervised learning setting
ReMixMatch+ABC (end-to-end training)
ReMixMatch+DARP+cRT+C/R* (Decoupled learning)
ReMixMatch+DARP+cRT+C/R (Decoupled learning)
ReMixMatch+DARP+cRT (Decoupled learning)
FixMatch+ABC (end-to-end training)
FixMatch+DARP+cRT+C/R* (Decoupled learning)
FixMatch+DARP+cRT+C/R (Decoupled learning)
FixMatch+DARP+cRT (Decoupled learning)
Under the supervised learning setting
End-to-End training of CNN with the ABC (end-to-end training)
cRT* (Decoupled learning of representations and the classifier of CNN)
cRT (Decoupled learning of representations and the classifier of CNN)
Table 14: Performance of the algorithms based on end-to-end training and decoupled learning of representations and a classifier. The algorithms were trained on the training sets described in the caption of Figure 11 and tested on the test set of CIFAR-10.

Appendix L Ablation study for FixMatch (sohn2020FixMatch) + ABC on CIFAR-10

The results in Table 15 show a similar tendency to that for ReMixMatch+ABC in Section 4.5 of the main paper.

Ablation study Overall Minority
FixMatch+ABC (proposed algorithm)
Without gradually decreasing the parameter of the Bernoulli distribution for consistency regularization
Without consistency regularization for the ABC
Without using the 0/1 mask for the consistency regularization loss
Without using the 0/1 mask for the classification loss
Without using the confidence threshold for consistency regularization
Using hard pseudo labels for consistency regularization
Without training backbone (ABC only)
Training the ABC with a re-weighting technique
Decoupled training of the backbone and ABC
Table 15: Ablation study for FixMatch+ABC on CIFAR-10-LT