1 Introduction
Deep learning has achieved outstanding results in several important classification problems, using large and well-curated training data [12, 3]. However, many of the interesting data sets available in society are orders of magnitude larger, but poorly curated, which means that the data may contain acquisition and labelling mistakes that can lead to poor generalisation [26]. Therefore, one of the important challenges of the field is the development of methods that can cope with such noisy-label data sets. Lately, researchers have fostered the development of this field by studying controlled synthetic label noise and discovering theories and methodologies that can then be applied to real-world noisy data sets.
The types of label noise investigated thus far can be classified into two categories: closed-set and open-set noise. Although these terms were coined only recently by Wang et al. [24], who introduced the open-set noisy label problem, the closed-set noisy label problem has been studied extensively for much longer. When handling closed-set label noise, the majority of learning algorithms assume a fixed set of training labels [14, 22]. In this setting, some training samples are annotated with an incorrect label, while their true class is present in the training label set. These mistakes can be completely random, where labels are flipped arbitrarily to an incorrect class, or consistent, when the annotator is genuinely confused about the annotation of a particular sample. A less studied type of label noise is the open-set noisy label problem [24], where some data observations are incorrectly sampled, such that their true annotation is not contained in the set of known training labels. An extreme example of such a setting is the presence of a horse image in the training set of a cats vs. dogs binary classifier. As evident from their definitions, these two types of label noise are mutually exclusive, i.e., a given noisy label cannot be closed-set and open-set at the same time.
It is easy to substantiate that both open-set and closed-set noise are likely to co-occur in real-world data sets. For instance, recent methods for large-scale data collection propose querying commercial search engines (e.g., Google Images), where the search keywords serve as the labels of the queried images. It is evident from Figure 1 that collecting images in this way can lead to both open-set and closed-set noise. However, thus far, no systematic study with controlled label noise has been presented where the training data set contains both types of label noise simultaneously.
Even though some papers have evaluated their proposed methods on both types [13, 24], the training data sets have been exclusively corrupted with either closed-set noise or open-set noise, but never with a combination of the two.
In this paper, we formulate a novel benchmark evaluation to address the noisy label learning problem that consists of a combination of closed-set and open-set noise. This proposed benchmark evaluation is defined by three variables: 1) the total proportion of label noise in the training set, represented by ρ; 2) the proportion of closed-set noise within the set of samples containing noisy labels, denoted by ω (this implies that ρω of the entire data set has a closed-set noisy label and ρ(1 − ω) of the entire data set has an open-set noisy label); and 3) the source of the open-set noisy-label data. Note that this setup generalises both types of label noise, as it collapses to one of the two noise types when ω ∈ {0, 1}.
The state-of-the-art (SOTA) approaches that aim to solve the closed-set noisy label problem focus on methods that identify the samples that were incorrectly annotated and update their labels with semi-supervised learning (SSL) approaches [14] for the next training iteration. This strategy is likely to fail in the open-set problem because it assumes that there exists a correct class in the training labels for every training sample, which is not the case. On the other hand, the main approach addressing the open-set noise problem targets the identification of noisy samples to reduce their weights in the learning process [24]. Such a strategy is inefficient in closed-set problems because the closed-set noisy-label samples are still very meaningful during the SSL stage. Hence, to be robust in scenarios where both closed-set and open-set noise samples are present, the learning algorithm must be able to identify the type of label noise affecting each training sample, and then either update the label, if it is closed-set noise, or reduce its weight, if it is open-set noise. To achieve this, we propose a new learning algorithm, called EvidentialMix (EDM); see Fig. 2. The key contributions of our proposed algorithm are the following:
EDM is able to accurately distinguish between clean, open-set and closed-set samples, thus allowing it to apply different learning mechanisms depending on the type of label noise. In comparison, previous methods [14, 24] can only separate clean samples from noisy ones, but not closed-set noise from open-set noise samples.

We show that our method learns better feature representations than previous methods, as evident from the t-SNE plot in Figure 4, where our method produces a distinct cluster for each known label/class and a separate cluster for the open-set samples. In comparison, previous methods have been shown to largely overfit the open-set samples and incorrectly cluster them into one of the known classes.

We experimentally show that EDM produces classification accuracy that is comparable to or better than previous methods for various label noise rates, including the extreme cases where only one type of noise is present.
2 Prior Work
There is an increasing interest in the study of training deep learning classifiers with noisy labels. For closed-set noise, Reed et al. [19] proposed one of the first approaches, which uses a transition matrix to learn how labels switch between different classes. The use of transition matrices has been further explored in many different ways [18, 5], but none of them show competitive results, likely because they do not include mechanisms to identify and handle samples containing noisy labels. Data augmentation approaches [27] have been successfully explored by closed-set noisy label methods, where the idea is that they can naturally increase training robustness to label noise. Meta-learning is another technique explored in closed-set noisy label problems [15, 20], but the need for clean validation sets or artificial new training tasks makes this technique relatively unexplored. The use of curriculum learning (CL) [7] for closed-set problems has been explored to relabel training samples dynamically during training, based on their loss values. This approach has been extended with the training of multiple models [17, 25] that focus training on small-loss samples that are inconsistently classified by the multiple models. Recently, the explicit identification of noisy samples using negative learning has been explored by Kim et al. [9], with competitive results. Another important approach to handling label noise is model ensembling, as proposed by Tarvainen et al. [23]. The use of robust generative classifiers (RoG) to improve the performance of discriminative classifiers has been explored by Lee et al. [13], where they build an ensemble of robust linear discriminative models using features extracted from several layers of the trained discriminative model. In principle, this approach has the potential to improve the performance of any method and has been tested in both closed-set and open-set scenarios.
Learning with open-set noisy labels has only recently been explored by Wang et al. [24], where the idea is to identify the samples containing noisy labels and reduce their weight in the training process, since they almost certainly belong to a class not represented in the training set. Given that theirs is the only method to explicitly address open-set noise, it is the main baseline for that problem.
The current SOTA closed-set noisy label approaches are SELF [22] and DivideMix [14], both of which combine several of the approaches described above. SELF [22] combines model ensembling, relabelling, noisy-sample identification, and data augmentation, while DivideMix [14] uses multiple-model training, noisy-sample identification, and data augmentation [1]. These two approaches are likely to be vulnerable to open-set noise since they assume that every training sample must belong to one of the training classes, an assumption that does not hold for open-set noise.
3 Method
3.1 Problem Definition
We define the training set as D = {(x_i, ỹ_i)}_{i=1}^{|D|}, where x_i ∈ X ⊂ R^{H×W×3} is an RGB image (with H×W representing the image lattice), and the set of training labels is denoted by Y, with labels represented by one-hot vectors ỹ ∈ {0,1}^C that form the standard basis of C dimensions (C > 2, representing a multi-class problem). Note that ỹ_i is the noisy label for x_i, and the hidden clean label is represented by y_i.
For the closed-set noise problem with noise rate ρ ∈ [0,1], we assume that x_i is labelled as ỹ_i = y_i with probability 1 − ρ, and as ỹ_i = r(y_i) with probability ρ, with r(·) representing a random function that picks one of the labels in Y following a particular distribution. For the open-set noise problem with noise rate ρ, we need to define a new data set D' (with D ∩ D' = ∅), where the label set for D' is represented by Y' (with Y ∩ Y' = ∅); this means that the images in D' do not have labels in Y. In such an open-set problem, a proportion 1 − ρ of samples is drawn from D with labels in Y, while a proportion ρ of samples is obtained from D' but annotated with labels in Y.
The combined closed-set and open-set problem with rates (ρ, ω) is defined by mixing the two types of noise above. More specifically, 1 − ρ of the training set contains images x_i ∈ D annotated with the correct label ỹ_i = y_i, while ρω of the images are sampled from D with an incorrect label ỹ_i = r(y_i), and ρ(1 − ω) of the images belong to D' and are labelled with a random label from Y.
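To make the combined noise model above concrete, the following sketch simulates the corruption of a toy label set with rates ρ and ω. This is an illustrative implementation, not the authors' code; the function name, the seeding scheme, and the use of an index mask to mark open-set samples are our own choices.

```python
import numpy as np

def corrupt_labels(labels, rho, omega, num_classes, seed=0):
    """Simulate combined noise: a fraction rho of samples becomes noisy;
    of those, a proportion omega receives a random *incorrect*
    in-distribution label (closed-set), while the remaining 1 - omega are
    marked open-set, i.e. their images would be replaced by
    out-of-distribution images carrying a randomly assigned label."""
    rng = np.random.default_rng(seed)
    n = len(labels)
    noisy = labels.copy()
    is_open = np.zeros(n, dtype=bool)
    idx = rng.choice(n, size=int(rho * n), replace=False)  # noisy subset
    n_closed = int(omega * len(idx))
    closed_idx, open_idx = idx[:n_closed], idx[n_closed:]
    # closed-set: flip to a uniformly chosen *wrong* class
    shift = rng.integers(1, num_classes, size=len(closed_idx))
    noisy[closed_idx] = (labels[closed_idx] + shift) % num_classes
    # open-set: image replaced by an OOD sample, label assigned at random
    noisy[open_idx] = rng.integers(0, num_classes, size=len(open_idx))
    is_open[open_idx] = True
    return noisy, is_open
```

For example, with ρ = 0.6 and ω = 0.5 on a 1000-sample set, exactly 300 samples become closed-set noise and 300 become open-set noise, matching the ρω and ρ(1 − ω) proportions above.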
3.2 Noise Classification
The main impediment when dealing with this problem is the need to identify closed-set and open-set noisy samples, since they must be dealt with differently by the method. One possible way of doing so is by associating closed-set samples with the high losses computed from confident but incorrect classification [14], and open-set samples with uncertain classification. To achieve this, we propose the use of the subjective logic (SL) loss function [21], which relies on the theory of evidential reasoning and SL to quantify classification uncertainty. The SL loss makes use of the Dirichlet distribution to represent subjective opinions, encoding belief and uncertainty. A network trained with the SL loss learns the parameters of a predictive posterior, represented by a Dirichlet density function, for the classification of the training samples. The resulting output for a given sample is considered as the evidence for the classification of that sample over the set of class labels [8]. Figure 3 shows the per-sample loss distribution of training samples from a network trained using the SL loss, where the separation between clean, closed-set and open-set samples is easy to capture.
3.3 EvidentialMix
Our proposed EvidentialMix simultaneously trains two networks: NetS, which uses the SL loss [21], and NetD, which uses the SSL training mechanism and the DivideMix (DM) loss [14]. Broadly speaking, the ability of the SL loss to estimate classification uncertainty allows NetS to divide the training set into clean, open-set and closed-set samples. The predicted clean and closed-set samples are then used to train NetD using MixMatch, as outlined in [14], while the predicted open-set samples are discarded for that epoch. Following this, NetD relabels the entire training data set (including the predicted open-set samples), which is then used to train NetS.
As NetS iteratively learns from the labels predicted by NetD, it gets better at splitting the data into the three sets. This is so because the labels from NetD become more accurate over the training process, given that NetD is only trained on predicted clean and closed-set samples, and never on predicted open-set samples. The two networks thus complement each other to produce accurate classification results for the combined closed-set and open-set noise problem. A detailed explanation is given below, while Alg. 1 delineates the full training algorithm.
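The alternating scheme described above can be summarised by the following pseudocode sketch (names are illustrative, not the authors' notation):

```text
warm_up(NetD, cross_entropy, D);  warm_up(NetS, SL_loss, D)
for epoch in 1..num_epochs:
    losses              = per_sample_SL_loss(NetS, D)     # Eq. (2)
    clean, closed, open = gmm_split(losses)               # Sec. 3.2
    NetD                = mixmatch_train(NetD, clean, closed)  # DM loss
    D                   = relabel(NetD, D)                # all samples, incl. open
    NetS                = sl_train(NetS, D)               # Eq. (1)
```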
Algorithm 1 trains NetD, represented by f_D(·; θ_D), and NetS, denoted by f_S(·; θ_S), both of which return a logit vector in R^C. In the warm-up stage (see WarmUp() in Alg. 1), we train both models for a limited number of epochs, using the cross-entropy loss for NetD, where the probability p(x; θ_D) is obtained by applying a softmax activation to the logit f_D(x; θ_D), and the following SL loss for NetS [21]:
(1) ℓ_SL(θ_S, D) = (1/|D|) Σ_{i=1}^{|D|} ℓ_i,
with [21]:
(2) ℓ_i = Σ_{j=1}^{C} (ỹ_{ij} − α_{ij}/S_i)² + α_{ij}(S_i − α_{ij}) / (S_i²(S_i + 1)),
where α_{ij} = e_{ij} + 1 for class j, with e_{ij} denoting the evidence obtained by applying the ReLU activation function to the j-th logit of f_S(x_i; θ_S), and S_i = Σ_{j=1}^{C} α_{ij}.
The classification of samples into the clean, closed-set and open-set groups is performed using the SL loss values from Eq. (2) for the entire training set D. More specifically, we take the set of losses {ℓ_i}_{i=1}^{|D|} and fit a
multi-component Gaussian mixture model (GMM) using the Expectation-Maximization (EM) algorithm. The idea we explore in this paper lies in the fact that the model output for the clean samples will tend to be confident and, at the same time, agree with the original label, producing a small loss. The model output for closed-set noise samples will also tend to be confident, but disagree with the original label, generating a large loss value. The model output for open-set noise samples, however, will not be confident, resulting in a loss value that is neither large nor small. Therefore, the multi-component GMM will capture each of these sets: the clean probability is the posterior probability of the Gaussian components with small means (i.e., small losses), the closed-set probability is computed from the components with large means (i.e., large losses), and the open-set probability is the posterior of the remaining components with intermediate means. Using these posteriors, we build the set of clean samples, represented by X, containing the samples whose clean posterior probability is larger than the other two probabilities, and the closed-set samples, denoted by U, containing the samples whose closed-set posterior probability is larger than the other two.
Next, we train NetD with the clean set X and closed-set U defined above. A mini-batch is sampled from X and U, and we augment each sample in each set M times [14]. The average classification probabilities for the clean and closed-set samples are then computed from the augmented samples, which, after temperature sharpening (denoted by TempSharpen(·), with temperature T), form the 'new' samples and labels X' and U' for the clean and closed-set samples, respectively. The last stage before stochastic gradient descent (SGD) is the MixMatch process [1], where samples from X' and U' are linearly combined to form X̂ and Û. SGD minimises the DM loss that combines the following functions [14]:
(3) ℓ_DM = ℓ_X + λ_u ℓ_U + λ_r ℓ_reg,
where λ_u denotes the weight of the loss associated with the unlabelled (closed-set) data and λ_r weights the regularisation loss ℓ_reg. The loss terms in Eq. (3) are defined by
(4) ℓ_X = −(1/|X̂|) Σ_{(x,y)∈X̂} Σ_{c=1}^{C} y_c log(p_c(x; θ_D))
and
(5) ℓ_U = (1/|Û|) Σ_{(x,y)∈Û} ||y − p(x; θ_D)||²,
where p(x; θ_D) represents the model output over all labels for input x. After training NetD, we train the NetS model by minimising the SL loss of Eq. (1) with an updated training set, represented by D̂ = {(x_i, ŷ_i)}_{i=1}^{|D|}, where NetD produces the new labels ŷ_i.
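As a concrete illustration, the per-sample SL loss of Eq. (2) can be sketched in a few lines of NumPy. This is a minimal sketch of the loss of Sensoy et al. [21], not the authors' implementation; the function name and toy logits are ours, and we use the algebraically equivalent form p(1 − p)/(S + 1) for the variance term.

```python
import numpy as np

def sl_loss(logits, y_onehot):
    """Subjective-logic (evidential) loss: evidence e = ReLU(logits),
    Dirichlet parameters alpha = e + 1, strength S = sum_j alpha_j.
    Per-sample loss = squared error + predictive variance (Eq. (2))."""
    e = np.maximum(logits, 0.0)            # evidence
    alpha = e + 1.0                        # Dirichlet parameters
    S = alpha.sum(axis=1, keepdims=True)   # Dirichlet strength
    p = alpha / S                          # expected class probabilities
    err = ((y_onehot - p) ** 2).sum(axis=1)        # squared-error term
    var = (p * (1.0 - p) / (S + 1.0)).sum(axis=1)  # variance term
    return err + var                       # one loss value per sample
```

For a confidently correct sample the loss is small; for a confidently wrong sample (closed-set noise) it is large; and for an uncertain sample (no evidence for any class, as expected of open-set noise) it sits in between, which is exactly the separation exploited by the GMM fit of Sec. 3.2.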
The inference for a test image x relies entirely on the NetD classifier, as follows: ĉ = argmax_c p_c(x; θ_D).
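The loss-based three-way split described in Sec. 3.2 can be sketched with scikit-learn as follows. This is an illustrative version, not the authors' code: the number of GMM components and the rule of grouping components by the rank of their means (smallest = clean, largest = closed-set, the rest = open-set) are our assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def split_by_loss(losses, n_components=3, seed=0):
    """Fit a GMM to per-sample SL losses and assign each sample to
    clean (0), open-set (1) or closed-set (2) via component posteriors."""
    gmm = GaussianMixture(n_components=n_components, random_state=seed)
    gmm.fit(losses.reshape(-1, 1))
    order = np.argsort(gmm.means_.ravel())       # components by mean loss
    resp = gmm.predict_proba(losses.reshape(-1, 1))
    p_clean = resp[:, order[0]]                  # smallest-mean component
    p_open = resp[:, order[1:-1]].sum(axis=1)    # intermediate means
    p_closed = resp[:, order[-1]]                # largest-mean component
    return np.argmax(np.stack([p_clean, p_open, p_closed], axis=1), axis=1)
```

On well-separated loss distributions like those in Fig. 3, this split recovers the three groups almost perfectly; when the distributions overlap, the posteriors degrade gracefully rather than forcing a hard threshold.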
3.4 Implementation
We train an 18-layer PreAct ResNet [6] (for both NetS and NetD) using stochastic gradient descent (SGD) with a momentum of 0.8, a weight decay of 0.0005 and a batch size of 64. The learning rate is 0.02 for WarmUp and for the first 100 epochs of the main training process, and is reduced to 0.002 afterwards. The WarmUp stage lasts for 10 and 30 epochs for NetD and NetS, respectively, where NetD is trained with a cross-entropy loss (i.e., ℓ_X in Eq. (4) using the unchanged training set D) while NetS is trained with the subjective logic loss in Eq. (1), also using D. After WarmUp, both models are trained for the remaining epochs. Similar to [14], we use the same number of augmented samples M, sharpening temperature T, MixMatch parameter α and regularisation weight λ_r for the DM loss in Eq. (3). However, unlike [14], which manually selects the value of λ_u based on the value of ρ, we set λ_u = 25 for all our experiments. For the GMM, we fix the number of components and the mean ranges used to assign components to the clean, closed-set and open-set groups, since these values produced stable results.
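To make the DM loss of Eq. (3) concrete, here is a minimal NumPy sketch of its three terms: cross-entropy on the mixed labelled set (Eq. (4)), squared error on the mixed unlabelled set (Eq. (5)) and a uniform-prior regulariser following [14]. The function name, toy inputs and default weights (λ_u = 25 as above, λ_r = 1 as an assumption) are ours.

```python
import numpy as np

def dm_loss(p_x, y_x, p_u, y_u, lambda_u=25.0, lambda_r=1.0):
    """DivideMix-style loss: l_x + lambda_u * l_u + lambda_r * l_reg,
    with p_* the predicted class probabilities and y_* the (mixed) labels."""
    l_x = -np.mean(np.sum(y_x * np.log(p_x), axis=1))   # cross-entropy, Eq. (4)
    l_u = np.mean(np.sum((y_u - p_u) ** 2, axis=1))     # squared error, Eq. (5)
    pi = np.full(p_x.shape[1], 1.0 / p_x.shape[1])      # uniform class prior
    p_bar = np.concatenate([p_x, p_u]).mean(axis=0)     # mean prediction
    l_reg = np.sum(pi * np.log(pi / p_bar))             # anti-collapse regulariser
    return l_x + lambda_u * l_u + lambda_r * l_reg
```

With λ_u this large, the unlabelled (closed-set) term dominates whenever the predictions on Û disagree with the guessed labels, which is what drives the relabelling of closed-set samples.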
Table 1: Test accuracy (%) on CIFAR-10 for total noise rates ρ = 0.3 and ρ = 0.6, varying the closed-set proportion ω, with open-set samples drawn from ImageNet32 (top) or CIFAR-100 (bottom).

| Open-set source | Method | Acc. | ρ=0.3, ω=0 | 0.25 | 0.5 | 0.75 | 1 | ρ=0.6, ω=0 | 0.25 | 0.5 | 0.75 | 1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ImageNet32 | RoG [13] | Best | 91.9 | 90.7 | 90.2 | 89.6 | 89.5 | 87.8 | 85.7 | 84.5 | 83.1 | 82.9 |
| | | Last | 91.0 | 88.7 | 86.6 | 86.2 | 83.9 | 85.9 | 78.1 | 70.3 | 64.7 | 59.8 |
| | ILON [24] | Best | 91.8 | 90.7 | 88.0 | 86.5 | 85.8 | 87.7 | 83.4 | 81.2 | 78.7 | 77.3 |
| | | Last | 90.6 | 86.9 | 82.0 | 77.3 | 72.7 | 85.5 | 72.6 | 58.9 | 54.4 | 46.5 |
| | DivideMix [14] | Best | 92.4 | 92.5 | 93.4 | 93.9 | 94.3 | 92.5 | 92.8 | 93.2 | 93.9 | 94.7 |
| | | Last | 92.0 | 92.5 | 93.0 | 93.7 | 94.1 | 92.5 | 92.2 | 92.8 | 93.2 | 94.6 |
| | EDM (Ours) | Best | 93.2 | 94.4 | 94.7 | 95.1 | 95.2 | 91.2 | 93.7 | 94.0 | 94.1 | 94.1 |
| | | Last | 92.5 | 93.7 | 94.5 | 94.7 | 94.8 | 90.9 | 93.1 | 93.4 | 93.9 | 94.1 |
| CIFAR-100 | RoG [13] | Best | 91.4 | 90.9 | 89.8 | 90.4 | 89.9 | 88.2 | 85.2 | 84.1 | 83.7 | 83.1 |
| | | Last | 89.8 | 87.4 | 85.9 | 84.9 | 84.5 | 82.1 | 72.9 | 66.3 | 62.0 | 59.5 |
| | ILON [24] | Best | 90.4 | 88.7 | 87.4 | 87.2 | 86.3 | 83.4 | 82.6 | 80.5 | 78.4 | 77.1 |
| | | Last | 87.4 | 84.3 | 80.0 | 74.6 | 73.8 | 78.0 | 67.9 | 55.2 | 48.7 | 45.6 |
| | DivideMix [14] | Best | 89.3 | 90.5 | 91.5 | 93.0 | 94.3 | 89.0 | 90.6 | 91.8 | 93.4 | 94.4 |
| | | Last | 88.7 | 90.1 | 90.9 | 92.8 | 94.0 | 88.7 | 89.8 | 91.5 | 93.0 | 94.3 |
| | EDM (Ours) | Best | 92.9 | 93.8 | 94.5 | 94.8 | 95.3 | 90.6 | 92.9 | 93.4 | 93.7 | 94.3 |
| | | Last | 91.9 | 93.1 | 94.0 | 94.5 | 95.1 | 89.4 | 91.4 | 92.8 | 93.4 | 94.0 |
4 Experiments
Following the prior work on closed-set and open-set noise problems [14, 24, 13], we conduct our experiments on the CIFAR-10 data set [11] for closed-set noise [14, 13], and include the CIFAR-100 (small-scale) [11] and ImageNet32 (large-scale) [2] data sets for the open-set noise scenario [24, 13]. CIFAR-10 has 10 classes with 5000 32×32-pixel training images per class (for a total of 50000 training images), and a testing set of 10000 32×32-pixel images with 1000 images per class. CIFAR-100 has 100 classes with 500 32×32-pixel training images per class, and ImageNet32 is a down-sampled variant of ImageNet [3], with 1281149 images and 1000 classes, resized to 32×32 pixels per image. All of the data sets above have curated labels, so below we introduce a new noisy-label benchmark evaluation that combines closed-set and open-set synthetic label noise.
4.1 Combined Open-set and Closed-set Noisy Label Benchmark
The proposed benchmark is defined by the rate of label noise in the experiment, denoted by ρ, and the proportion of closed-set noise within the label noise, denoted by ω. The closed-set label noise is simulated by randomly selecting ρω of the training samples from CIFAR-10 and symmetrically shuffling their labels, similarly to the synthetic label noise used in [14]. The open-set label noise is simulated by randomly selecting ρ(1 − ω) of the training images from CIFAR-10 and replacing them with images randomly selected from either CIFAR-100 [11] or ImageNet32 [2], where a CIFAR-10 label is randomly assigned to each of these images. Results are based on the classification accuracy on the clean testing set from CIFAR-10, using the benchmark proposed above. We also show a comparison of the sample distribution in feature space between EDM and related approaches using t-SNE [16], and the effectiveness of EDM at separating clean, closed-set and open-set noisy samples.
4.2 Related Approaches for Comparison
We compare our proposed approach with the three methods listed below:
DivideMix [14]¹ is the current SOTA method, which converts the problem of closed-set noisy label learning into a semi-supervised learning problem. It follows a multiple-model approach that splits the training data into clean and noisy subsets by fitting a 2-component Gaussian mixture model (GMM) to the loss values of the training samples at each epoch. Next, the framework discards the labels of the predicted noisy samples and uses MixMatch [1] to train the model.
ILON [24]² introduces the open-set noisy label learning problem, where the proposed approach is based on an iterative solution that re-weights the samples based on the outlier measure of the local outlier factor (LOF) algorithm [10].
RoG [13]¹ builds an ensemble of generative classifiers formed from features extracted from multiple layers of the ResNet model. The authors of RoG tested their approach on both closed-set and open-set noise separately, which makes it an important baseline to consider in our combined setup.
¹We used the publicly available code provided by the authors of the paper to produce our results.
²As the authors did not make their code publicly available, we implemented their method from scratch and trained a Siamese network of 18-layer PreAct ResNets to produce our results.
4.3 Results and Discussion
Classification accuracy: Table 1 shows the results computed from the benchmark evaluation of the proposed EDM, in comparison with the results of RoG [13], ILON [24] and DivideMix [14]. The evaluation relies on different rates of total label noise (ρ) and closed-set noise (ω), using CIFAR-100 and ImageNet32 as open-set data sets. The results show that our method EDM outperforms all competing approaches in 17 of the 20 noise settings and is a close second in the remaining 3. For ρ = 0.3, EDM produces better results than all competing methods for all values of ω and both choices of open-set data set, with a clear improvement over the next best method in some cases. On the other hand, both RoG and ILON perform significantly worse than EDM and DivideMix, particularly at the higher noise rate, where the difference in accuracy is over 15% in some cases. In general, RoG and ILON are observed to perform worse as the proportion of closed-set noise increases, while the converse is true for DivideMix and EDM. It is also apparent that EDM is more robust to open-set noise than DivideMix, as evident from the classification results when ω is small.
Feature representations: We show the t-SNE plots [16] in Fig. 4 for all methods for one combination of total noise rate ρ and closed-set proportion ω, using CIFAR-100 and ImageNet32 as open-set data sets. In particular, the features for all methods are extracted from the last layer of the models (in our case, we use the features from NetD, which is the network used for classification, as explained in Sec. 3.3). In the visualisation, the brown samples come from the open-set data sets, while all other colours represent the true CIFAR-10 classes. The plots clearly show that our proposed EDM is effective at separating open-set samples from the clean and closed-set training samples, while DivideMix and ILON largely overfit these samples, as evident from the spread of open-set samples around the CIFAR-10 classes. Interestingly, RoG also shows good separation, but with apparently more complex distributions than EDM.
Noise classification: Fig. 5 shows the distribution of loss values for the clean, open-set and closed-set samples for a fixed noise rate ρ and several closed-set rates ω, using samples from both CIFAR-100 and ImageNet32 as open-set noise. From these graphs, it is clear that the SL loss in Eq. (2) successfully distinguishes the samples of the three sets, even when only one of the noise types is present, such as when ω ∈ {0, 1}. This suggests that exploring the uncertainty captured by the SL loss to identify open-set noise samples is effective. Among the methods tested in this paper, DivideMix [14] also tries to separate the training samples into clean and noisy sets, using the loss in Eq. (3). However, the resulting distribution seems inadequate for a clear separation of the three sets because the open-set and closed-set noisy labels are essentially indistinguishable. Consequently, DivideMix can separate clean samples from noisy samples, but not closed-set noise from open-set noise, forcing it to treat both noise types similarly during training (i.e., both types are treated as closed-set noise). This is not ideal given that the open-set samples will be assigned to one of the incorrect training labels, which can ultimately cause the training to overfit these samples.
5 Conclusion
In this paper, we investigate a variant of the noisy label problem that combines open-set [24, 13] and closed-set noisy labels [14, 13]. To test various methods for this new problem, we propose a new benchmark that systematically changes the total noise rate and the proportion of closed-set and open-set noise. The open-set samples are sourced from either a small-scale data set (CIFAR-100) or a large-scale data set (ImageNet32), such that the true labels of these samples are not contained in the primary data set (CIFAR-10). We argue that such a problem setup is more general and closer to real-life noisy label scenarios. We then propose the EvidentialMix algorithm to successfully address this new noise type, with the use of the subjective logic loss [21] that produces low loss for clean samples, high loss for closed-set noisy samples, and mid-range loss for open-set samples. The clear division of the training data allows us to (1) identify and thereby remove the open-set samples from training to avoid overfitting them, given that they do not belong to any of the known classes, and (2) learn from the predicted closed-set samples in a semi-supervised fashion, as in [14]. The evaluation shows that our proposed EDM addresses this new combined open-set and closed-set label noise problem more effectively than the current state-of-the-art approaches for closed-set problems [14, 13] and open-set problems [24, 13].
Future work: The motivation for introducing this problem was to open a dialogue in the research community to investigate combined open-set and closed-set label noise. Moving forward, we aim to explore more challenging noise settings, such as incorporating asymmetric [19] and semantic noise [13] into the proposed combined label noise problem. Since we are the first to address this problem in a controlled setup, there is no precedent on how these more challenging noise scenarios could be meaningfully incorporated. For instance, even though asymmetric closed-set noise has previously been studied in the literature [19], it is not obvious what its counterpart, asymmetric open-set noise, entails; for instance, it is not immediately clear how to build a noise transition matrix between CIFAR-10 and ImageNet classes. In addition, we see merit in investigating other types of uncertainty to identify open-set noise, such as Bayesian learning [4], and aim to explore such methods.
Acknowledgements: IR and GC gratefully acknowledge the support of the Australian Research Council through the Centre of Excellence for Robotic Vision (CE140100016) and a Future Fellowship to GC (FT190100525). GC acknowledges the support of the Alexander von Humboldt-Stiftung for the renewed research stay sponsorship. RS acknowledges the support of the Playford Trust Honours Scholarship.
References
[1] MixMatch: a holistic approach to semi-supervised learning. arXiv:1905.02249, 2019.
[2] A downsampled variant of ImageNet as an alternative to the CIFAR datasets. arXiv:1707.08819, 2017.
[3] ImageNet: a large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, 2009.
[4] Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In International Conference on Machine Learning, pp. 1050–1059, 2016.
[5] Training deep neural networks using a noise adaptation layer. 2017.
[6] Identity mappings in deep residual networks. In European Conference on Computer Vision, pp. 630–645, 2016.
[7] MentorNet: learning data-driven curriculum for very deep neural networks on corrupted labels. In International Conference on Machine Learning, pp. 2304–2313, 2018.
[8] Uncertainty aware AI ML: why and how. arXiv:1809.07882, 2018.
[9] NLNL: negative learning for noisy labels. In Proceedings of the IEEE International Conference on Computer Vision, pp. 101–110, 2019.
[10] LoOP: local outlier probabilities. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, pp. 1649–1652, 2009.
[11] Learning multiple layers of features from tiny images. Technical report, 2009.
[12] Deep learning. Nature 521, pp. 436–444, 2015.
[13] Robust inference via generative classifiers for handling noisy labels. arXiv:1901.11300, 2019.
[14] DivideMix: learning with noisy labels as semi-supervised learning. arXiv:2002.07394, 2020.
[15] Learning to learn from noisy labeled data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5051–5059, 2019.
[16] Visualizing data using t-SNE. Journal of Machine Learning Research 9 (Nov), pp. 2579–2605, 2008.
[17] Decoupling "when to update" from "how to update". In Advances in Neural Information Processing Systems, pp. 960–970, 2017.
[18] Making deep neural networks robust to label noise: a loss correction approach. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1944–1952, 2017.
[19] Training deep neural networks on noisy labels with bootstrapping. arXiv:1412.6596, 2014.
[20] Learning to reweight examples for robust deep learning. arXiv:1803.09050, 2018.
[21] Evidential deep learning to quantify classification uncertainty. arXiv:1806.01768, 2018.
[22] SELF: learning to filter noisy labels with self-ensembling. arXiv:1910.01842, 2019.
[23] Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results. In Advances in Neural Information Processing Systems, pp. 1195–1204, 2017.
[24] Iterative learning with open-set noisy labels. arXiv:1804.00092, 2018.
[25] How does disagreement help generalization against label corruption? arXiv:1901.04215, 2019.
[26] Understanding deep learning requires rethinking generalization. arXiv:1611.03530, 2016.
[27] mixup: beyond empirical risk minimization. arXiv:1710.09412, 2017.