Where is the Bottleneck of Adversarial Learning with Unlabeled Data?

Deep neural networks (DNNs) are incredibly brittle due to adversarial examples. To robustify DNNs, adversarial training was proposed, which requires large-scale but well-labeled data. However, it is quite expensive to annotate large-scale data well. To compensate for this shortage, several seminal works utilize large-scale unlabeled data. In this paper, we observe that these seminal works do not perform well, since the quality of pseudo labels on unlabeled data is quite poor, especially when the amount of unlabeled data is significantly larger than that of labeled data. We believe that the quality of pseudo labels is the bottleneck of adversarial learning with unlabeled data. To tackle this bottleneck, we leverage deep co-training, which trains two deep networks and encourages the two networks to diverge by exploiting each peer's adversarial examples. Based on deep co-training, we propose robust co-training (RCT) for adversarial learning with unlabeled data. We conduct comprehensive experiments on CIFAR-10 and SVHN datasets. Empirical results demonstrate that our RCT can significantly outperform baselines (e.g., robust self-training (RST)) in both standard test accuracy and robust test accuracy w.r.t. different datasets, different network structures, and different types of adversarial training.




1 Introduction

Due to their superior performance, deep neural networks (DNNs) have been deployed in real systems in many fields, such as image recognition [11] and natural language processing [38]. Real-world systems may receive inputs shifted by various perturbations, e.g., different lighting effects on an image or ambient noise in a conversation. Such shifts can cause unreliable predictions by DNNs. In particular, crafted adversarial examples [26] can easily flip the predictions of deployed DNNs by adding imperceptible noise to natural data. This raises concerns about deploying DNNs in safety-critical fields, such as autonomous driving [7] and medical image analysis [16].

Recently, many efforts have been made to learn robust DNNs that resist such adversarial examples. In general, there are two broad branches in adversarial machine learning, i.e., certified robust training [35, 30, 8, 14] and empirical robust training [17, 36, 33]. Their common purpose is to construct robust DNNs that mimic naturally occurring systems (e.g., the human visual system). A system is believed to be robust and invariant to adversarial perturbations since its output is smooth w.r.t. its input [32].

To acquire such smoothness, we can conduct data augmentation using adversarial data [17, 33] or data perturbed with Gaussian random noise [16, 8] during training. In this way, predictions of DNNs around the input data become insensitive to imperceptible perturbations. Nonetheless, Tsipras et al. showed that adversarial robustness may be at odds with standard accuracy [29]. To mitigate the large gap between robustness and accuracy, more well-labeled samples are needed during training [25], which also achieves greater smoothness, closer to that of a naturally occurring system. However, it is expensive to gather well-labeled data, not to mention large-scale well-labeled data for the smoothness requirement [25]. Fortunately, this issue can be alleviated through seminal efforts [20, 5, 31], namely utilizing unlabeled data to improve adversarial robustness. Conceptually, the above works consist of three components:

  • (a) Given the existing labeled training data, they collected extra unlabeled data. For example, to obtain unlabeled data whose distribution is similar to that of CIFAR-10 [12], the 80 Million Tiny Images dataset [28] could be utilized, of which CIFAR-10 is a human-annotated subset.

  • (b) Based on those existing training data , they annotated unlabeled data to get , where , and . The goal is to minimize the divergence (e.g., KL divergence) between and .

  • (c) By jointly using the labeled and pseudo-labeled data, they train robust DNNs using existing strategies, such as Madry’s adversarial training [17], adversarial training TRADES [36], and randomized smoothing [8].

For example, UAT++ in [31] and SSDRL in [20] integrate parts (b) and (c) into the objective functions of DNNs, which encourages the model output on unlabeled data to be close to the unknown ground-truth labels when the number of labeled training samples is large. However, the integration limits the diversity of annotation methods (i.e., part (b)), and it only enables regularization-based methods (e.g., VAT [19]) to annotate extra unlabeled data. Many potential methods are excluded, such as co-training [4, 23] and graph-based models [39]. By contrast, robust self-training (RST) in [5] has three independent modules for parts (a)–(c), and each fungible part has a clear purpose. Thus, it is believed to be the best by the standard of modular design [34].

The purpose of part (a) is to gather as much qualified unlabeled data as possible, e.g., scraping websites for unlabeled images or collecting medical images without a doctor’s diagnosis. There are many standard methods to gather such data, so improving this part is outside the scope of the current paper. Meanwhile, different training methods in part (c) seem to hit their limits and hardly narrow the gap between robust generalization and standard generalization any further [25]. For example, on CIFAR-10, the state-of-the-art TRADES achieves a robust accuracy of around 50% [36], while the standard accuracy should be above 90% [11, 37].

Figure 1: With out of CIFAR-10 training data separated out as labeled dataset and the remaining treated as unlabeled dataset , we compare the adversarial training performance, i.e., standard test accuracy and robust test accuracy. In each cell we conduct an adversarial training. For example, for the cell with unlabeled data number 20000 and label accuracy 0.87, we adversarially train a ResNet10 [11] based on training dataset , where is pseudo label dataset of and its labels’ accuracy is 87%. After adversarial training, we conduct the evaluation using CIFAR-10 test data. Standard test accuracy is evaluated on the natural data. Robust test accuracy (PGD-5) and robust test accuracy (PGD-10) are evaluated on adversarial data generated from its corresponding natural data using PGD-5 and PGD-10 [17] respectively. We use ResNet10 for all adversarial training. The three panels above use Madry’s adversarial training [17]. The three panels below use adversarial training TRADES [36].

Thus, this motivates us to improve part (b), namely obtaining more well-labeled data. The label quality is crucial to boosting adversarial robustness. For example, in Figure 1, given a fixed amount of unlabeled data, increased label accuracy boosts both standard accuracy and robust accuracy significantly. As the amount of unlabeled data increases, high label accuracy has a positive effect on adversarial robustness, and low label accuracy has a negative one. Meanwhile, the negative effects of low-quality labels are reinforced during training. RST [5] first learns a classifier based merely on the labeled dataset, then uses the learned classifier to annotate all unlabeled data with pseudo labels. We name this classifier the pre-determined annotator. The pre-determined annotator does not consider the knowledge of unlabeled data. As a simple example in Figure 2 illustrates, an annotator based solely on the labeled data may give wrong labels to a large portion of unlabeled data. The bottleneck of RST is that it may give poor pseudo labels to unlabeled data, so the later adversarial training is fed many erroneous data. Even worse, this error is accumulated and reinforced over training. The quality of pseudo labels decides the success of adversarial training.

Fortunately, there remains a lot of room to improve the quality of these labels. To break the bottleneck of RST [5], we leverage deep co-training to improve the quality of pseudo labels in part (b), and thus propose robust co-training (RCT) for adversarial learning with unlabeled data. The proposed algorithm utilizes two networks that correct each other's mistakes by reaching consensus on unlabeled data. Meanwhile, each network robustly trains on adversarial examples generated by its peer network, which keeps both networks diverged in function. Our experiments confirm its effectiveness on the quality of pseudo labels, which further boosts both standard test accuracy and robust test accuracy in adversarial training. Our proposed method takes a substantial step toward closing the gap between adversarially robust generalization and standard generalization.

2 Related Work

2.1 Semi-supervised Deep Learning

Many works that boost the label quality of unlabeled data are located in the area of semi-supervised learning (SSL). Self-training [18, 15] is one of the simplest approaches in SSL. Self-training produces pseudo labels for unlabeled data using the model itself to obtain additional training data. Unlabeled data with confident predictions are recruited into training. However, self-training is hardly able to correct its own mistakes. If the model’s prediction on unlabeled data is confident but wrong, the wrongly pseudo-labeled data is permanently incorporated into training and amplifies the model error over training iterations.

Multi-view training aims to train multiple models with different views of the data. These views enhance each other and can help to correct each other's mistakes. The most exemplar method is co-training [4]. To be specific, in [4] different views refer to different independent sets of features on the same data. For example, in web page classification, one set of features is the text on the web page, and another set of features is the anchor text of hyperlinks pointing to that web page. Two models look at different sets of features, and each model is trained on its respective feature set. Over training iterations, unlabeled data with confident predictions by one model are moved to the training set of its peer model.

Regularization-based semi-supervised learning encourages the outputs on different perturbations of the input data to be close, by adding a regularization term to the loss function. For example, [2, 13, 27] use random perturbations and [19] uses virtual adversarial perturbations. For a comprehensive review of SSL, including generative-model-based SSL and graph-based SSL, we refer to [6].

2.2 Adversarial Defense

Many works focus on building models that are robust against adversarial perturbations. In general, they are divided into two branches: certified defenses and empirical defenses.

In certified defenses, the model’s prediction is expected to be unchanged for any perturbed data around its corresponding natural data. There are some exemplar works [24, 35, 8]. For example, [14, 8] use randomized smoothing to transform a base classifier into a new smoothed classifier. However, due to its strong assumptions, certified robustness scales poorly to large models and high-dimensional data, and its robustness certification is computationally inefficient.

Another line of defense is empirical defense. Empirical defense dynamically generates adversarial examples and recruits them into training along with natural data. Adversarial examples are crafted from natural data: the network has a large loss on them, but they are visually indistinguishable from their natural counterparts. The most exemplar methods are Madry’s adversarial training [17] and adversarial training TRADES [36].

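In empirical defense, the purpose is to minimize the adversarial risk, i.e.,

$$\mathbb{E}_{(x, y) \sim \mathcal{D}} \left[ \max_{\tilde{x} \in \mathcal{B}(x)} \ell\big(f_\theta(\tilde{x}), y\big) \right],$$

where $\mathcal{D}$ denotes the true distribution over samples and $\mathcal{B}(x)$ denotes the allowed perturbation region of the sample point $x$. The empirical defense finds the parameter $\theta$ that minimizes the empirical risk

$$\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \max_{\tilde{x}_i \in \mathcal{B}(x_i)} \ell\big(f_\theta(\tilde{x}_i), y_i\big),$$

where $\{(x_i, y_i)\}_{i=1}^{n}$ is a finite set of samples drawn i.i.d. from $\mathcal{D}$. To solve this min-max problem, [17] applies Danskin’s theorem [3]. At each training iteration, Madry’s adversarial training first generates adversarial examples that maximize the loss and then updates the classifier based on these adversarial examples, i.e.,

$$\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \ell_{CE}\big(f_\theta(\tilde{x}_i), y_i\big), \tag{1}$$

where $\tilde{x}_i$ is the adversarial example of $x_i$ within its allowed perturbation region $\mathcal{B}(x_i)$, i.e., $\tilde{x}_i = \arg\max_{x' \in \mathcal{B}(x_i)} \ell_{CE}\big(f_\theta(x'), y_i\big)$. The inner maximization is a non-convex optimization problem whose exact solution is difficult to obtain. Projected gradient descent (PGD) [17] is utilized to approximately search for a local maximum. $\ell_{CE}$ is the cross-entropy loss, which encourages the prediction on the adversarial example $\tilde{x}_i$ to be near the true label $y_i$ of its corresponding natural example $x_i$.

Another exemplar work is TRADES [36]. Similar to VAT [19], it introduces a regularization term in the loss function that encourages similarity between the predictions on $x_i$ and on its adversarial example $\tilde{x}_i$, i.e.,

$$\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \Big[ \ell_{CE}\big(f_\theta(x_i), y_i\big) + \beta \cdot \mathrm{KL}\big(f_\theta(x_i) \,\|\, f_\theta(\tilde{x}_i)\big) \Big], \tag{2.2}$$

where $\mathrm{KL}$ is the Kullback-Leibler divergence, which measures the prediction difference, $\ell_{CE}$ is the cross-entropy loss, and $\beta$ is the trade-off parameter. It also uses PGD to approximately solve the inner maximization.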
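The PGD inner maximization described above can be illustrated on a toy model. The following is only a minimal sketch under stated assumptions: the classifier is a hypothetical two-class linear softmax model (so the input gradient has a closed form), the perturbation region is assumed to be an L-infinity ball, inputs are assumed to lie in [0, 1], and the attack starts from the natural input rather than a random point. The names `predict`, `ce_grad_x`, and `pgd` are illustrative and not taken from the paper.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def predict(W: Matrix, b: Vector, x: Vector) -> Vector:
    """Softmax probabilities of a toy linear classifier (assumed model)."""
    z = [sum(w * xj for w, xj in zip(row, x)) + bk for row, bk in zip(W, b)]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def ce_loss(W: Matrix, b: Vector, x: Vector, y: int) -> float:
    """Cross-entropy loss of the model on (x, y)."""
    return -math.log(predict(W, b, x)[y])


def ce_grad_x(W: Matrix, b: Vector, x: Vector, y: int) -> Vector:
    """Closed-form gradient of the cross-entropy loss w.r.t. the input x."""
    p = predict(W, b, x)
    err = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
    return [sum(err[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]


def pgd(W: Matrix, b: Vector, x: Vector, y: int,
        eps: float, step: float, n_steps: int) -> Vector:
    """Signed-gradient ascent on the loss, projected onto the eps-ball around x."""
    x_adv = list(x)
    for _ in range(n_steps):
        g = ce_grad_x(W, b, x_adv, y)
        x_adv = [xa + step * ((gj > 0) - (gj < 0)) for xa, gj in zip(x_adv, g)]
        # Project back into the L-infinity ball, then into the valid input range.
        x_adv = [min(max(xa, xj - eps), xj + eps) for xa, xj in zip(x_adv, x)]
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv


# Hypothetical weights and input, chosen only for illustration.
W = [[2.0, -1.0], [-1.0, 2.0]]
b = [0.0, 0.0]
x, y = [0.6, 0.4], 0
x_adv = pgd(W, b, x, y, eps=0.1, step=0.03, n_steps=5)
print(ce_loss(W, b, x, y), ce_loss(W, b, x_adv, y))
```

In adversarial training, the model parameters would then be updated on `x_adv` instead of `x`; with a deep network the gradient would come from automatic differentiation rather than a closed form.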

3 Methodology

In order to achieve greater smoothness in adversarial training, three seminal works leverage large-scale unlabeled data [20, 5, 31]. Figure 1 shows that, given a fixed amount of unlabeled data, both standard test accuracy and robust test accuracy improve when the accuracy of the pseudo labels improves. Thus, high-quality pseudo labels on the unlabeled data are essential, and part (b) plays a vital role in the success of adversarial training.

Although Carmon et al. use the term “robust self-training” (RST) to characterize their algorithm [5], their actual operation for part (b) is not conventional self-training. Specifically, they train a classifier based merely on the labeled dataset. Then, they use the pre-trained classifier to annotate all unlabeled data in a single pass, which yields pseudo labels on the unlabeled data. We name such a pre-trained classifier the pre-determined annotator. Finally, they jointly use the labeled and pseudo-labeled datasets to train an adversarially robust deep neural network. However, is the pre-determined annotator good enough to annotate the unlabeled dataset? The answer is negative.

Figure 2: An example of the influence of decision boundary by unlabeled data. Blue circle point and red triangle point represent labeled dataset . Grey points represent unlabeled dataset . The shape of the points (circle or triangle) represents their true labels. The orange dashed line represents optimal decision boundary.

Figure 2 shows a simple example to explain why RST is not an optimal solution. Following RST, the left panel shows the decision boundary (orange line) learned merely from the labeled dataset (blue circle, red triangle). Based on the labeled dataset alone, the best annotator we get is the vertical orange line. It perfectly classifies the labeled dataset with zero errors. Nonetheless, it inevitably annotates some of the unlabeled data (grey points) with wrong labels (middle panel of Figure 2). For example, if we adopted the orange line in the left panel as the annotator, at least 2 grey circle points would be wrongly annotated as triangles and 2 grey triangle points wrongly annotated as circles.

To address the above issues, we can train a classifier based on both the labeled and the unlabeled datasets. Specifically, we first train a classifier based on the labeled dataset. Then, we use the pre-trained classifier to annotate all unlabeled data, which yields a pseudo-labeled dataset. We jointly use the labeled and pseudo-labeled datasets to re-train the classifier, and then re-annotate the unlabeled data via the re-trained classifier until convergence (i.e., multiple times). Finally, we jointly use the labeled and pseudo-labeled datasets to train an adversarially robust deep neural network. The key step is to use the re-trained classifier to annotate all unlabeled data multiple times during training.

Take the right panel of Figure 2 as an example, which learns a new decision boundary (i.e., a good annotator). This annotator utilizes labeled data together with unlabeled data (grey points), and these unlabeled data elucidate the data distribution well. An annotator embedded with the knowledge of both labeled and unlabeled data can characterize the true distribution accurately. Thus, it assigns ground-truth labels to those unlabeled data (grey points). To sum up, the pre-determined annotator (left panel of Figure 2) is not good enough to annotate the unlabeled dataset, which motivates us to explore a re-trained annotator (Sections 3.1 and 3.2).

3.1 The Simple Realization

The simplest realization is to utilize conventional self-training [18, 15]. The key idea of self-training is to utilize the DNN’s predictions on unlabeled data over training iterations, namely annotating unlabeled data multiple times. Specifically, if the probability that the network assigns to the most likely class of an unlabeled sample is higher than a predetermined threshold, the sample is added to the training set for further training, with that most likely class as its pseudo label. This process is repeated for a fixed number of iterations or until no more unlabeled data are available or confidently predicted.
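The confidence-thresholding step of self-training can be sketched as follows. This is an illustrative sketch only: the function name `select_confident`, the example probabilities, and the threshold value 0.95 are assumptions for demonstration, not values from the paper.

```python
from typing import List, Sequence, Tuple


def select_confident(
    probs: Sequence[Sequence[float]], threshold: float
) -> List[Tuple[int, int]]:
    """Return (sample_index, pseudo_label) pairs whose top probability exceeds threshold."""
    selected = []
    for idx, p in enumerate(probs):
        label = max(range(len(p)), key=lambda k: p[k])  # most likely class
        if p[label] > threshold:
            selected.append((idx, label))
    return selected


# Hypothetical softmax outputs of the current model on three unlabeled samples.
unlabeled_probs = [
    [0.97, 0.02, 0.01],  # confident: recruited with pseudo label 0
    [0.40, 0.35, 0.25],  # not confident: stays unlabeled for now
    [0.03, 0.01, 0.96],  # confident: recruited with pseudo label 2
]
pseudo = select_confident(unlabeled_probs, threshold=0.95)
print(pseudo)  # [(0, 0), (2, 2)]
```

In a full self-training loop, the selected samples would be moved into the training set, the model re-trained, and the selection repeated on the remaining unlabeled data.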

Figure 3 empirically justifies the efficacy of self-training (red line), which significantly improves the quality of pseudo labels compared to the pre-determined annotator (black line). However, conventional self-training has a drawback: the network is hardly able to correct its own mistakes. Assume that the network's prediction on an unlabeled sample is incorrect at the early training stage. The sample with its incorrect pseudo label will nonetheless be utilized in future training iterations. Due to the memorization effect of deep networks [1], the network will fit the wrongly labeled data, which seriously hurts test performance [40]. This negative effect becomes even worse when the domain of unlabeled data is different from that of labeled data [22].

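To ameliorate the inferiority of self-training, the straightforward approach is to introduce a pair of networks that correct each other's mistakes, namely vanilla co-training [4]. Specifically, we train a pair of DNNs (i.e., $f_1$ and $f_2$) simultaneously. We encourage the two networks to make consistent predictions on unlabeled data. Meanwhile, the two DNNs are fed labeled data in different orders to keep their training paces inconsistent. To be specific, at each training iteration, the two deep networks $f_1$ and $f_2$ feed forward a common unlabeled mini-batch $x_u$ and different labeled mini-batches $(x_{l_1}, y_{l_1})$ and $(x_{l_2}, y_{l_2})$, and then update their parameters $\theta_1$ and $\theta_2$ by

$$\theta_1 \leftarrow \theta_1 - \eta \nabla_{\theta_1} \Big[ \ell_{CE}\big(f_1(x_{l_1}), y_{l_1}\big) + \lambda \cdot \mathrm{JS}\big(f_1(x_u), f_2(x_u)\big) \Big], \tag{3}$$

$$\theta_2 \leftarrow \theta_2 - \eta \nabla_{\theta_2} \Big[ \ell_{CE}\big(f_2(x_{l_2}), y_{l_2}\big) + \lambda \cdot \mathrm{JS}\big(f_1(x_u), f_2(x_u)\big) \Big], \tag{4}$$

where $\eta$ is the learning rate, $\lambda$ is the trade-off parameter, $\ell_{CE}$ is the cross-entropy loss for labeled data, and $\mathrm{JS}$ is the Jensen-Shannon divergence between the probabilities predicted by $f_1$ and $f_2$ on the same unlabeled data $x_u$.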

We leverage Jensen-Shannon (JS) divergence to measure the similarity between the probabilities predicted by the two networks. The JS value is bounded and non-negative, and a smaller value denotes larger similarity between two probability distributions, and vice versa. By minimizing the JS divergence between the two networks' predicted probabilities on unlabeled data, Eq. (3) and (4) encourage the two networks to make similar predictions on unlabeled data. Meanwhile, at each training iteration, the two DNNs learn from different labeled data, which keeps them diverged. Thus, the two networks can be complementary and help to correct each other's mistakes on unlabeled data. Besides Jensen-Shannon divergence, we can also use other divergences, such as KL divergence and Hellinger distance.
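To make the consistency term concrete, the sketch below computes the Jensen-Shannon divergence between two predicted distributions and combines it with a cross-entropy term in the way the text describes for vanilla co-training. The function names, the trade-off value, and the example probabilities are hypothetical; the sketch operates on probability vectors directly and does not perform the gradient update.

```python
import math
from typing import Sequence


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; terms with p_k = 0 contribute zero."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0.0)


def js(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: symmetric and bounded in [0, ln 2]."""
    m = [0.5 * (pk + qk) for pk, qk in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def cross_entropy(p: Sequence[float], label: int) -> float:
    """Cross-entropy of a predicted distribution against a hard label."""
    return -math.log(p[label])


def cotrain_loss(p_labeled: Sequence[float], label: int,
                 p_self_unlabeled: Sequence[float],
                 p_peer_unlabeled: Sequence[float],
                 lam: float) -> float:
    """Supervised loss on labeled data plus lam times the JS consistency term."""
    return cross_entropy(p_labeled, label) + lam * js(p_self_unlabeled, p_peer_unlabeled)


# Hypothetical predictions of the two networks on one unlabeled sample.
p1 = [0.7, 0.2, 0.1]
p2 = [0.5, 0.3, 0.2]
print(js(p1, p2), js(p1, p1))
print(cotrain_loss([0.8, 0.1, 0.1], 0, p1, p2, lam=1.0))
```

The divergence is zero when the two networks agree exactly and reaches ln 2 when their predictions have disjoint support, which is why a small value indicates similar predictions.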

From Figure 3, we observe an obvious improvement by vanilla co-training (yellow line) compared with self-training (red line). Taking a closer look at pseudo-label accuracy on unlabeled data, we find that the result of vanilla co-training is better than that of self-training. We believe that the interaction between peer networks (i.e., vanilla co-training) has a positive effect, while there is no interaction in a single network (i.e., self-training). This point is also supported by the philosophy of collaborative learning [9], where each member interacts with others actively by sharing experiences. Each member takes on an asymmetric role so that new knowledge can be created among members. Nonetheless, the improvement of label accuracy is not completely satisfying. When the number of unlabeled data increases from 20k to 30k, the label accuracy of vanilla co-training is similar to that of self-training, since vanilla co-training has a collapsing problem. Namely, the two networks gradually become the same in function and can no longer correct each other's mistakes on unlabeled data.

As shown in Figure 4 (yellow line), the total variance of predictions is large before 50 epochs. The high total variance denotes that the two networks are diverged in function, since they have different views on unlabeled data at the initial training stage. Their different views come from different initializations and different orders of fetching labeled data. The benefit of such divergence is that one network can gain information from observing its peer network. Thus, they have sufficient capacity to correct each other's mistakes on unlabeled data. However, as training proceeds, the total variance of the two networks gradually decreases and approaches zero after 350 epochs. This means that the two networks gradually converge to the same function, and they cannot correct each other's mistakes. Thus, vanilla co-training gradually degenerates into self-training, which suffers from the accumulated error problem at later training epochs.

Figure 3: Comparison of pseudo-label accuracy on unlabeled data for annotators obtained by various methods. The pre-determined annotator is used in RST [5]; self-training, vanilla co-training, and deep co-training are our proposed methods to improve the quality of the annotator.

3.2 The Powerful Realization

To address the collapsing problem of vanilla co-training, we should keep the two networks diverged in function. Especially at later training epochs, we should add an extra force that pulls them apart, so that they always retain the capacity to correct each other's mistakes. Inspired by [23], we encourage the two networks to diverge by exploiting each peer’s adversarial examples, namely deep co-training.

In general, adversarial example is modified from natural example, where the network has large loss on adversarial example while it has small loss on its corresponding natural example. Adversarial example unveils the input space where the network could easily make mistakes. Such space is the weakest part (i.e., leading to unreliable prediction) corresponding to the network. Two networks can keep inconsistent from each other by learning from the weakest part of each other. Namely, each network robustly trains on adversarial examples generated by its peer network. Intuitively, each network always “looks” into peer’s weakest part. Thus, two networks could prevent themselves from collapsing into one function and constantly keep diverged.

Figure 4: Comparison of divergence (evaluated by total variance) between the two DNNs trained by vanilla co-training and by deep co-training. Total variance is empirically calculated from the probability predictions of the two DNNs on test data.

Mathematically, at each training iteration with , and , two networks update themselves by


where and are weights of and , is the learning rate, , and are trade-off parameters, is cross entropy loss for labeled data, and is Jensen-Shannon divergence between two predicted probability between and . Most importantly, adversarial data , , and are exploited due to


where , are predicted labels by networks and on respectively, and is cross entropy loss.

Compared with vanilla co-training, the extra under-braced loss terms in Eq. (5) and (6) are introduced in deep co-training. Note that the trade-off parameters in Eq. (5) and (6) and the perturbation bound in Eq. (7) - (10) control the “force” that pulls the two networks apart. Specifically, the trade-off parameters control the importance of the under-braced divergence term, while the perturbation bound decides the allowable size of the norm ball around the natural data, within which adversarial examples are generated. Increasing them enables more divergence between the two networks. In practice, Eq. (7) - (10) are hard to solve analytically, and thus we approximate their solutions by PGD [17] or FGSM [10].
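The idea of training one network on adversarial examples generated against its peer can be sketched with single-step FGSM on toy models. This is not a reproduction of Eq. (5) - (10): the two networks are hypothetical linear softmax classifiers, the perturbation region is assumed to be an L-infinity ball with inputs in [0, 1], and the pseudo label is assumed to be the peer's own prediction on the natural input. All names (`fgsm`, `peer_adversarial_loss`) are illustrative.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Model = Tuple[Matrix, Vector]  # (weights, biases) of a toy linear softmax classifier


def predict(model: Model, x: Vector) -> Vector:
    """Softmax probabilities of the toy classifier."""
    W, b = model
    z = [sum(w * xj for w, xj in zip(row, x)) + bk for row, bk in zip(W, b)]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def ce(model: Model, x: Vector, y: int) -> float:
    """Cross-entropy loss of the model on (x, y)."""
    return -math.log(predict(model, x)[y])


def fgsm(model: Model, x: Vector, y: int, eps: float) -> Vector:
    """Single signed-gradient step that increases the model's loss on (x, y)."""
    W, _ = model
    p = predict(model, x)
    err = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
    grad = [sum(err[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]
    return [min(max(xj + eps * ((g > 0) - (g < 0)), 0.0), 1.0)
            for xj, g in zip(x, grad)]


def peer_adversarial_loss(learner: Model, peer: Model, x_u: Vector, eps: float) -> float:
    """Loss of `learner` on an adversarial example crafted against `peer`."""
    p_peer = predict(peer, x_u)
    y_hat = max(range(len(p_peer)), key=lambda k: p_peer[k])  # peer's predicted label
    x_adv = fgsm(peer, x_u, y_hat, eps)                       # attack the peer
    return ce(learner, x_adv, y_hat)


# Hypothetical networks and unlabeled input, for illustration only.
f1: Model = ([[2.0, -1.0], [-1.0, 2.0]], [0.0, 0.0])
f2: Model = ([[1.5, -0.5], [-0.5, 1.5]], [0.1, -0.1])
x_u = [0.6, 0.4]
print(peer_adversarial_loss(f1, f2, x_u, eps=0.05))
```

Minimizing such a term for each network, with the roles of `f1` and `f2` swapped, makes each network robust in the region where its peer is weakest, which is the intuition the text gives for why the pair stays diverged.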

Figure 3 validates the efficacy of deep co-training (blue line). We set , , and in Eq. (5) and (6). We utilize FGSM with single step to search for peer network’s adversarial example. For fair comparison, we set in Eq. (3) and (4) (vanilla co-training). It empirically shows that the quality of pseudo labels by deep co-training (blue line) is significantly higher than that of vanilla co-training (yellow line).

To better understand deep co-training, we analyze the total variance of the two networks over training epochs (blue line in Figure 4). Similar to vanilla co-training, both networks start to converge to each other. However, at late training epochs (e.g., after 300 epochs), the total variance of vanilla co-training approaches zero. In contrast, the total variance of deep co-training stays positive, since deep co-training exploits each peer’s adversarial examples and prevents the two networks from collapsing into the same function. This brings us to Algorithm 1, called robust co-training, which connects deep co-training and adversarial training. Our proposed algorithm can empirically boost both standard accuracy and robust accuracy, as shown in the following experiments.

Algorithm 1 Robust co-training (RCT)
Input: networks, maximum training epoch, learning rate, trade-off hyperparameters, and perturbation bound
Data: Labeled dataset
Fetch: Unlabeled dataset
/* Deep co-training */
for epoch = 1, 2, …, maximum training epoch do
       Fetch: Two mini-batches from the labeled dataset and one mini-batch from the unlabeled dataset.
       Obtain: Adversarial sets targeting one network on the labeled and unlabeled mini-batches according to Eq. (7) and Eq. (9).
       Obtain: Adversarial sets targeting the other network on the labeled and unlabeled mini-batches according to Eq. (8) and Eq. (10).
       Update: Both networks according to Eq. (5) and Eq. (6).
end
Label: Annotate the unlabeled dataset using either network to get the pseudo-labeled dataset.
Obtain: Augmented dataset by joining the labeled and pseudo-labeled datasets.
/* Adversarial training */
Train: An adversarially robust network on the augmented dataset.   // Use Eq. (1) or Eq. (2.2)

In Algorithm 1, the quality of pseudo labels on unlabeled data is improved significantly via deep co-training. Thus, we can obtain the augmented dataset by joining the labeled dataset and the high-quality pseudo-labeled dataset. Then, we train an adversarially robust deep network on the augmented dataset using either Madry’s adversarial training (i.e., Eq. (1)) [17] or TRADES (i.e., Eq. (2.2)) [36].
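The annotation and dataset-joining steps can be sketched as below. This is a toy illustration: `threshold_annotator` is a hypothetical stand-in for one of the co-trained networks, and the one-dimensional data and labels are made up. The helper `pseudo_label_accuracy` computes the kind of label accuracy reported in the figures, which is only measurable when the true labels of the unlabeled data are held out for evaluation.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Sequence[float]
LabeledSample = Tuple[Sample, int]


def annotate(predict_label: Callable[[Sample], int],
             unlabeled: List[Sample]) -> List[LabeledSample]:
    """Attach the annotator's predicted label to every unlabeled sample."""
    return [(x, predict_label(x)) for x in unlabeled]


def pseudo_label_accuracy(pseudo: List[LabeledSample],
                          true_labels: Sequence[int]) -> float:
    """Fraction of pseudo labels that match the held-out true labels."""
    correct = sum(1 for (_, y_hat), y in zip(pseudo, true_labels) if y_hat == y)
    return correct / len(true_labels)


def threshold_annotator(x: Sample) -> int:
    """Hypothetical annotator: class 1 if the single feature exceeds 0.5."""
    return int(x[0] > 0.5)


# Made-up toy data for illustration.
labeled: List[LabeledSample] = [([0.1], 0), ([0.9], 1)]
unlabeled: List[Sample] = [[0.2], [0.4], [0.6], [0.8]]
true_unlabeled = [0, 0, 1, 1]

pseudo = annotate(threshold_annotator, unlabeled)
augmented = labeled + pseudo  # dataset handed to adversarial training
print(len(augmented), pseudo_label_accuracy(pseudo, true_unlabeled))  # 6 1.0
```

The augmented list is what would then be passed to an adversarial training routine such as Madry's adversarial training or TRADES.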

4 Experiments

We conduct experiments on the real-world datasets CIFAR-10 and SVHN [21]. We compare our robust co-training (RCT) with robust self-training (RST) [5]. We show that our algorithm gives better pseudo labels than RST. As a result, our algorithm boosts both standard test accuracy and robust test accuracy of adversarial training by a large margin. Thus, we empirically justify our main claim: improving the quality of pseudo labels on unlabeled data leads to better adversarial training.

4.1 Quality Improvement of Pseudo Labels

Deep co-training in Algorithm 1 achieves a significant improvement in pseudo-label accuracy. Compared to the pre-determined annotator used in RST, it annotates unlabeled data more accurately. In particular, when more unlabeled data are available, deep co-training further increases the quality of pseudo labels, while the pre-determined annotator does not have such an effect.

Figure 5: Pseudo-label quality comparison between the pre-determined annotator and deep co-training. The pre-determined annotator is based on a single CNN13. Deep co-training is based on a pair of CNN13 networks.

To justify these effects, on CIFAR-10 we randomly select 4k training data as the labeled set and use 4k, 8k, 16k, 32k, 40k, or 46k of the remaining training data as the simulated unlabeled dataset. On SVHN, we randomly select 1k training data as the labeled dataset and use 1k, 2k, 4k, 8k, 17k, 35k, or 72k of the remaining data as the simulated unlabeled dataset.

In Figure 5, we compare pseudo-label accuracy on unlabeled data generated by the pre-determined annotator [5] and by deep co-training, respectively. We use CNN13 [13] as the network backbone. Specifically, the pre-determined annotator utilizes only a single CNN13, and deep co-training utilizes a pair of CNN13 networks. For the pre-determined annotator, we train a single CNN13 based merely on the labeled dataset until convergence. Then we use the converged CNN13 to yield pseudo labels on all unlabeled data (yellow line).

In deep co-training, we learn an annotator involving unlabeled data. During training, we keep two networks diverged in function by setting and in Eq. (5) and Eq. (6), which is inspired by [23]. We co-train two CNN13 networks according to Algorithm 1, where maximum epoch , SGD with 0.9 momentum and learning rate starting from and decaying over epochs. Adversarial examples of Eq. (7) - (10) are generated by FGSM [10] with a single step, and the is set to . Then, we randomly choose one of the CNN13 pair as the annotator to label the unlabeled data.

Figure 5 shows the gap in pseudo-label accuracy between the pre-determined annotator and deep co-training on both the CIFAR-10 and SVHN datasets. Specifically, the pre-determined annotator (yellow line) does not involve unlabeled data in the learning process. When more unlabeled data are available, its pseudo-label accuracy on unlabeled data does not increase and even decreases. Therefore, RST [5], which leverages the pre-determined annotator, does not incorporate the knowledge of unlabeled data and performs undesirably (Section 4.2). By comparison, deep co-training (blue line) incorporates unlabeled data to learn the annotator. Besides, it introduces the paradigm of collaborative learning so that the two networks correct each other's mistakes. As a result, when more unlabeled data are available, pseudo-label accuracy increases correspondingly. Therefore, RCT (Algorithm 1), which leverages the re-trained annotator, incorporates the knowledge of unlabeled data and performs desirably (Section 4.2).

4.2 Improved Performance of Adversarial Training

We annotate unlabeled data to achieve , where is pseudo label. Note that different methods can acquire , such as pre-determined annotator (in RST [5]), deep co-training (in Algorithm 1), and experts labeling. Then, we combine 4000 labeled dataset with pseudo-labeled dataset into , and conduct adversarial training based on . In Figure 6, we use adversarial training TRADES [36] (i.e., Eq. (2.2)) or Madry’s adversarial training [17] (i.e., Eq. (1)) to conduct experiments on CNN13 and ResNet10, where in Eq. (2.2) is set to 1 for all experiments by TRADES. Both Madry’s adversarial training and TRADES use PGD-10 to exploit adversarial examples. For CIFAR-10, and step size is 0.007. For SVHN, and step size is 0.007. Inputs are normalized between 0 and 1.

Figure 6: Performance comparisons between supervised oracle, robust self-training and our robust co-training. The first column represents adversarial training TRADES on CNN13 on SVHN dataset. The second column represents adversarial training TRADES on CNN13 on CIFAR-10 dataset. The third column represents Madry’s adversarial training on CNN13 on CIFAR-10 dataset. The fourth column represents Madry’s adversarial training on ResNet10 on CIFAR-10 dataset. The first row represents standard test accuracy evaluated by natural test data. The second row represents robust test accuracy evaluated by adversarial data (PGD-5). The third row represents the robust test accuracy evaluated by adversarial data (PGD-20).

To sum up, we compare three adversarial training methods leveraging unlabeled data in Figure 6.

  • Supervised oracle (red line): the unlabeled data are labeled by experts, who assign 100% correct labels to all unlabeled data.

  • Robust self-training (yellow line): the unlabeled data are labeled by the pre-determined annotator, which provides around 73% correct labels on unlabeled CIFAR-10 data and around 82% correct labels on unlabeled SVHN data (yellow line in Figure 5).

  • Robust co-training (blue line): the unlabeled data are labeled by deep co-training. Depending on the amount of unlabeled data, deep co-training gives around 80%-90% correct labels on unlabeled CIFAR-10 data and around 89%-92% correct labels on unlabeled SVHN data (blue line in Figure 5).
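The pseudo-label accuracies quoted in the list above can only be measured in a benchmark setting, where the true labels of the "unlabeled" data are held out. The following minimal sketch shows that measurement together with the merging of labeled and pseudo-labeled data into one training set. The function names and the toy data are hypothetical and are not taken from the paper.

```python
def pseudo_label_accuracy(pseudo_labels, true_labels):
    """Fraction of pseudo labels that match the held-out true labels."""
    if len(pseudo_labels) != len(true_labels):
        raise ValueError("label sequences must have the same length")
    correct = sum(p == t for p, t in zip(pseudo_labels, true_labels))
    return correct / len(true_labels)


def merge_datasets(labeled_pairs, unlabeled_inputs, pseudo_labels):
    """Return the labeled pairs followed by (input, pseudo label) pairs."""
    if len(unlabeled_inputs) != len(pseudo_labels):
        raise ValueError("each unlabeled input needs exactly one pseudo label")
    return list(labeled_pairs) + list(zip(unlabeled_inputs, pseudo_labels))


# Toy data: string names stand in for images, integers for class indices.
labeled_pairs = [("img_a", 0)]
unlabeled_inputs = ["u1", "u2", "u3", "u4", "u5"]
held_out_labels = [0, 1, 2, 1, 0]   # known only because this is a benchmark
pseudo_labels = [0, 1, 1, 1, 0]     # one mistake, on "u3"

quality = pseudo_label_accuracy(pseudo_labels, held_out_labels)
training_set = merge_datasets(labeled_pairs, unlabeled_inputs, pseudo_labels)
```

Any mislabeled pair, such as `("u3", 1)` here, enters the training set unchanged, which is why annotator quality carries over directly into adversarial training.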

To evaluate performance, we compute the standard test accuracy on natural test data and the robust test accuracy on the corresponding adversarial test data. Adversarial test data are generated by PGD-5 and PGD-20 respectively, with the and step size is 0.003. Figure 6 shows that, across different datasets, adversarial training methods, and network structures, improving the quality of pseudo labels clearly improves adversarial training: both standard test accuracy and robust test accuracy improve significantly.
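The two metrics differ only in the inputs on which the classifier is scored, which the small harness below tries to make concrete. It is a sketch under stated assumptions: `predict` and `attack` are hypothetical callables, and the one-dimensional threshold classifier and bounded-shift attack are toy stand-ins for a trained network and for PGD-5 or PGD-20.

```python
def accuracy(predict, inputs, labels):
    """Fraction of inputs whose predicted label equals the true label."""
    return sum(predict(x) == y for x, y in zip(inputs, labels)) / len(labels)


def evaluate(predict, attack, inputs, labels):
    """Return (standard accuracy, robust accuracy) for one attack."""
    standard = accuracy(predict, inputs, labels)
    adversarial_inputs = [attack(x, y) for x, y in zip(inputs, labels)]
    robust = accuracy(predict, adversarial_inputs, labels)
    return standard, robust


def toy_predict(x):
    """Toy classifier: class 1 if the scalar input exceeds 0.5."""
    return int(x > 0.5)


def toy_attack(x, y, eps=0.1):
    """Toy attack: shift x by eps toward the wrong class, staying in [0, 1]."""
    shifted = x - eps if y == 1 else x + eps
    return min(1.0, max(0.0, shifted))


inputs = [0.1, 0.45, 0.55, 0.9]
labels = [0, 0, 1, 1]
standard_acc, robust_acc = evaluate(toy_predict, toy_attack, inputs, labels)
```

The two points near the decision boundary are classified correctly on natural data but flip under the bounded shift, so the robust accuracy is lower than the standard accuracy on the same test set.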

5 Conclusion

In this paper, we investigate the bottleneck of adversarial learning with unlabeled data and find that the answer is “the quality of pseudo labels on unlabeled data”. To break this bottleneck, we leverage deep co-training to boost the quality of pseudo labels, and thus propose robust co-training (RCT) for adversarial learning with unlabeled data. We conduct comprehensive experiments on the CIFAR-10 and SVHN datasets. Empirical results demonstrate that RCT can significantly outperform robust self-training (RST) in both standard test accuracy and robust test accuracy across different datasets, network structures, and types of adversarial training. In the future, we will investigate the theory of RCT and explore more robust adversarial learning methods.


MS was supported by JST CREST Grant Number JPMJCR1403.


  • [1] D. Arpit, S. Jastrzębski, N. Ballas, D. Krueger, E. Bengio, M. S. Kanwal, T. Maharaj, A. Fischer, A. Courville, Y. Bengio, et al. (2017) A closer look at memorization in deep networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 233–242. Cited by: §3.1.
  • [2] P. Bachman, O. Alsharif, and D. Precup (2014) Learning with pseudo-ensembles. In Advances in Neural Information Processing Systems, pp. 3365–3373. Cited by: §2.1.
  • [3] D. P. Bertsekas (1997) Nonlinear programming. Journal of the Operational Research Society 48 (3), pp. 334–334. Cited by: §2.2.
  • [4] A. Blum and T. Mitchell (1998) Combining labeled and unlabeled data with co-training. In Proceedings of the eleventh annual conference on Computational learning theory, pp. 92–100. Cited by: §1, §2.1, §3.1.
  • [5] Y. Carmon, A. Raghunathan, L. Schmidt, P. Liang, and J. C. Duchi (2019) Unlabeled data improves adversarial robustness. arXiv preprint arXiv:1905.13736. Cited by: §1, §1, §1, §1, Figure 3, §3, §3, §4.1, §4.1, §4.2, §4.
  • [6] O. Chapelle, B. Schölkopf, and A. Zien (2010) Semi-supervised learning. 1st edition, The MIT Press. External Links: ISBN 0262514125, 9780262514125 Cited by: §2.1.
  • [7] C. Chen, A. Seff, A. Kornhauser, and J. Xiao (2015) Deepdriving: learning affordance for direct perception in autonomous driving. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2722–2730. Cited by: §1.
  • [8] J. M. Cohen, E. Rosenfeld, and J. Z. Kolter (2019) Certified adversarial robustness via randomized smoothing. See DBLP:conf/icml/2019, pp. 1310–1320. Cited by: §1, §1, §1, §2.2.
  • [9] P. Dillenbourg (1999) Collaborative learning: cognitive and computational approaches. Advances in Learning and Instruction Series. ERIC. Cited by: §3.1.
  • [10] I. J. Goodfellow, J. Shlens, and C. Szegedy (2015) Explaining and harnessing adversarial examples. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Cited by: §3.2, §4.1.
  • [11] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: Figure 1, §1, §1.
  • [12] A. Krizhevsky (2009) Learning multiple layers of features from tiny images. Technical report Cited by: §1.
  • [13] S. Laine and T. Aila (2017) Temporal ensembling for semi-supervised learning. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, Cited by: §2.1, §4.1.
  • [14] M. Lécuyer, V. Atlidakis, R. Geambasu, D. Hsu, and S. Jana (2019) Certified robustness to adversarial examples with differential privacy. See DBLP:conf/sp/2019, pp. 656–672. Cited by: §1, §2.2.
  • [15] D. Lee (2013) Pseudo-label: the simple and efficient semi-supervised learning method for deep neural networks. In Workshop on Challenges in Representation Learning, ICML, Vol. 3, pp. 2. Cited by: §2.1, §3.1.
  • [16] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. Van Der Laak, B. Van Ginneken, and C. I. Sánchez (2017) A survey on deep learning in medical image analysis. Medical image analysis 42, pp. 60–88. Cited by: §1, §1.
  • [17] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2018) Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, Cited by: Figure 1, §1, §1, §1, §2.2, §2.2, §3.2, §4.2, Remark.
  • [18] D. McClosky, E. Charniak, and M. Johnson (2006) Effective self-training for parsing. In Proceedings of the main conference on human language technology conference of the North American Chapter of the Association of Computational Linguistics, pp. 152–159. Cited by: §2.1, §3.1.
  • [19] T. Miyato, S. Maeda, M. Koyama, and S. Ishii (2019) Virtual adversarial training: A regularization method for supervised and semi-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 41 (8), pp. 1979–1993. Cited by: §1, §2.1, §2.2.
  • [20] A. Najafi, S. Maeda, M. Koyama, and T. Miyato (2019) Robustness to adversarial perturbations in learning from incomplete data. arXiv preprint arXiv:1905.13021. Cited by: §1, §1, §3.
  • [21] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng (2011) Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, Cited by: §4.
  • [22] A. Oliver, A. Odena, C. A. Raffel, E. D. Cubuk, and I. Goodfellow (2018) Realistic evaluation of deep semi-supervised learning algorithms. In Advances in Neural Information Processing Systems, pp. 3235–3246. Cited by: §3.1.
  • [23] S. Qiao, W. Shen, Z. Zhang, B. Wang, and A. Yuille (2018) Deep co-training for semi-supervised image recognition. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 135–152. Cited by: §1, §3.2, §4.1.
  • [24] A. Raghunathan, J. Steinhardt, and P. Liang (2018) Certified defenses against adversarial examples. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, Cited by: §2.2.
  • [25] L. Schmidt, S. Santurkar, D. Tsipras, K. Talwar, and A. Madry (2018) Adversarially robust generalization requires more data. In Advances in Neural Information Processing Systems, pp. 5014–5026. Cited by: §1, §1.
  • [26] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2014) Intriguing properties of neural networks. In International Conference on Learning Representations, Cited by: §1.
  • [27] A. Tarvainen and H. Valpola (2017) Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results. In Advances in neural information processing systems, pp. 1195–1204. Cited by: §2.1.
  • [28] A. Torralba, R. Fergus, and W. T. Freeman (2008) 80 million tiny images: a large data set for nonparametric object and scene recognition. IEEE transactions on pattern analysis and machine intelligence 30 (11), pp. 1958–1970. Cited by: §1.
  • [29] D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, and A. Madry (2019) Robustness may be at odds with accuracy. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, Cited by: §1.
  • [30] Y. Tsuzuku, I. Sato, and M. Sugiyama (2018) Lipschitz-Margin training: Scalable certification of perturbation invariance for deep neural networks. In Advances in Neural Information Processing Systems 31, pp. 6541–6550. Cited by: §1.
  • [31] J. Uesato, P.-S. Huang, A. Fawzi, and P. Kohli (2019) Are labels required for improving adversarial robustness? arXiv preprint arXiv:1905.13725. Cited by: §1, §1, §3.
  • [32] G. Wahba (1990) Spline models for observational data. Vol. 59, Siam. Cited by: §1.
  • [33] Y. Wang, X. Ma, J. Bailey, J. Yi, B. Zhou, and Q. Gu (2019) On the convergence and robustness of adversarial training. In International Conference on Machine Learning, pp. 6586–6595. Cited by: §1, §1.
  • [34] E. Westra (2016) Modular programming with python. Packt Publishing Ltd. Cited by: §1.
  • [35] E. Wong and J. Z. Kolter (2018) Provable defenses against adversarial examples via the convex outer adversarial polytope. See DBLP:conf/icml/2018, pp. 5283–5292. Cited by: §1, §2.2.
  • [36] H. Zhang, Y. Yu, J. Jiao, E. P. Xing, L. E. Ghaoui, and M. I. Jordan (2019) Theoretically principled trade-off between robustness and accuracy. In International Conference on Machine Learning, pp. 7472–7482. Cited by: Figure 1, §1, §1, §1, §2.2, §2.2, §4.2, Remark.
  • [37] J. Zhang, B. Han, L. Wynter, B. K. H. Low, and M. S. Kankanhalli (2019) Towards robust resnet: A small step but a giant leap. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, pp. 4285–4291. Cited by: §1.
  • [38] X. Zhang, J. Zhao, and Y. LeCun (2015) Character-level convolutional networks for text classification. In Advances in neural information processing systems, pp. 649–657. Cited by: §1.
  • [39] D. Zhou, J. Huang, and B. Schölkopf (2005) Learning from labeled and unlabeled data on a directed graph. In Proceedings of the 22nd international conference on Machine learning, pp. 1036–1043. Cited by: §1.
  • [40] X. Zhu and X. Wu (2004) Class noise vs. attribute noise: a quantitative study. Artificial intelligence review 22 (3), pp. 177–210. Cited by: §3.1.