
ClustTR: Clustering Training for Robustness

This paper studies how encouraging semantically-aligned features during deep neural network training can increase network robustness. Recent works observed that Adversarial Training leads to robust models, whose learnt features appear to correlate with human perception. Inspired by this connection from robustness to semantics, we study the complementary connection: from semantics to robustness. To do so, we provide a tight robustness certificate for distance-based classification models (clustering-based classifiers), which we leverage to propose ClusTR (Clustering Training for Robustness), a clustering-based and adversary-free training framework to learn robust models. Interestingly, ClusTR outperforms adversarially-trained networks by up to 4% under strong PGD attacks. Moreover, it can be equipped with simple and fast adversarial training to improve the current state-of-the-art in robustness by 16%-29% on CIFAR10, SVHN, and CIFAR100.



1 Introduction

Deep neural networks (DNNs) have demonstrated tremendous success in various fields, from computer vision imagenet_classification_krizhevsky; long2015fully and reinforcement learning human_level_drl; atari_drl to natural language processing neural_machine_translation; transformers and speech recognition hinton2012deep. Despite this breakthrough in performance, robustness is becoming a rising concern in DNNs. Specifically, DNNs have been shown to be vulnerable to imperceptible input perturbations intriguing; explaining_harnessing, known as adversarial attacks, which can entirely alter their output. This vulnerability has popularized a new line of research known as network robustness. Robust DNNs should not only be accurate, but also resilient against input perturbations. Given the importance of the problem, a plethora of network robustness approaches have been proposed, including those based on regularization cisse2017parseval; trades; loss_curv_reg_robustness; logit_pairing, distillation distillation_papernot, and feature denoising feature_denoising, among many others. In this paper, we focus our attention on the popular and effective adversarial training approach madryadv.

Adversarial training explicitly trains DNNs on adversarial attacks generated on-the-fly through projected gradient descent (PGD). This technique has proven to significantly improve network robustness, and has become a standard for training robust networks. Interestingly, and as a byproduct, adversarially-trained networks seem to learn features that are more semantically-aligned with human perception learning_perceptually_aligned; engstrom2019adversarial, to such a degree that the learnt DNNs can be used for several image synthesis tasks computer_vision_with_single_robust_classifier. The existence of a connection from robustness to semantics raises an interesting dual question:

While adversarial robustness can encourage more semantically-aligned priors on learnt DNN features, can one conversely achieve network robustness by encouraging the learnt features to be more semantically-aligned?

Learning more semantically-aligned features in DNNs remains an open problem. A promising direction for obtaining features with such properties is through Deep Metric Learning (DML) techniques. DML learns feature representations by preserving a notion of similarity between inputs and their feature representations deep_met_triplet_net; norouzi_conse, and has achieved remarkable performance in face recognition, image retrieval frome_dist_fun, and zero-shot learning frome2013devise. The preservation of similarity that DML seeks often involves clustering semantically-similar instances. Hence, recent clustering-based losses deep_met_triplet_net; magnet have been designed with this objective in mind, showing significant progress in learning semantic representations that are also competitive in performance with modern classification approaches.

Figure 1: Closing the loop on robustness and semantics. Earlier work showed that adversarial training results in more semantically-aligned features, i.e. features of same-class instances tend to cluster together (left figure). We study the complementary path, i.e. the effect of learning more semantically-aligned features (via clustering) on network robustness (right figure).

Inspired by these developments, we theoretically show an intimate relation between semantics (through clustering approaches) and robustness, as illustrated in Figure 1. In particular, we show that, under certain continuity properties of the DNN, clustering-based classifiers enjoy a tight robustness radius against norm-bounded input perturbations. Furthermore, we observe that this radius can be maximized by optimizing a Clustering Loss, i.e. a loss that encourages clustering of semantically-similar instances in feature space. Inspired by this observation, we show that training DNNs with such a loss results in high-performing and robust classifiers. We enhance this clustering-based approach with standard techniques for DNN training, and dub this framework Clustering Training for Robustness (ClusTR). To validate the idea behind ClusTR, we experiment on several datasets and find that ClusTR can yield significant robustness gains. In summary, our contributions are three-fold:

  1. We study the connection from semantics to robustness by analyzing classifiers that employ clustering in representation space. We use this analysis to derive a tight robustness radius, under which all perturbations are unable to change the classifier’s predictions. Moreover, we show that a deep metric learning approach for semantic clustering that optimizes a Clustering Loss is directly related to maximizing the derived robustness radius.

  2. Motivated by our theoretical findings, we propose the ClusTR framework, which employs a popular Clustering Loss (the Magnet Loss magnet), to learn robust models without generating adversaries during training. We validate the theory behind ClusTR through extensive experiments and find that ClusTR results in a significant boost in robustness without relying on adversarial training. Specifically, we observe that classifiers learnt using ClusTR outperform (in robustness) adversarially-trained classifiers freeadv by and under strong PGD attacks on the CIFAR10 cifars and SVHN svhn datasets, respectively.

  3. We show that equipping ClusTR with a quick and cheap version of adversarial training can improve state-of-the-art robustness results against PGD attacks by 16%-29% on the CIFAR10, CIFAR100, and SVHN benchmarks. Interestingly, our proposed pipeline achieves this substantial improvement in robustness while also being several times faster in training than the current state-of-the-art. Lastly, we evaluate ClusTR-trained DNNs against adaptive attacks, showing consistent improvements over the state-of-the-art.[1]

[1] Our code can be found at

2 Related Work

Adversarial Robustness. The existence of adversarial perturbations has dramatically increased security concerns in DNNs. Consequently, there has been a surge of research aiming at learning adversarially-robust models buckman2018thermometer; ma2018characterizing; cisse2017parseval. Despite its high computational cost, adversarial training madryadv remains one of the most popular, successful and reliable techniques for attaining adversarial robustness. Furthermore, adversarial training was regularized by enforcing similarity between the logits of natural and adversarial pairs logit_pairing. This work was further developed in TRADES trades, a framework that uses a theoretically-motivated loss resulting in large robustness gains. Moreover, other works studied robustness from a data-complexity perspective, demonstrating an inherent sample-complexity barrier to robust learning more_data_robustness, and showing that pre-training or learning from unlabeled data can vastly improve the robustness of adversarially-trained networks pretraining; carmon2019unlabeled. In this work, we tackle robustness from a complementary view, mainly by studying the effect of clustering-based classifiers on robustness.

Robust Features. Recent work demonstrated that networks trained adversarially enjoy an unexpected benefit: the learnt features tend to align with salient data characteristics and human perception robustness_odds_acc. Moreover, the learnt features, commonly referred to as robust features adversarial_ara_not_bugs, seem to be clustered in feature space, while being perceptually aligned learning_perceptually_aligned. Based on these findings, the power of such semantically-aligned features was harnessed to perform image synthesis tasks with a single robust classifier computer_vision_with_single_robust_classifier. In this paper, we take an orthogonal direction to robustness, in which we encourage robustness by training DNNs to specifically learn more semantically-aligned features via clustering.

Metric Learning. The idea of encouraging learnt features to be more semantically meaningful to the human visual system has been extensively studied in the metric learning community, where the goal is to learn a similarity measure in feature space that correlates with a similarity measure between inputs dist_metric_learning_clustering; dml_neighbor; deep_met_triplet_net; dml_survey. In such a setting, semantically-similar inputs (e.g. those belonging to the same class) are expected to be clustered together. This paradigm has shown remarkable performance in several tasks facenet; mikolov2013distributed; frome_dist_fun. Closely related to our work, the approach of mao2019metric used the Triplet Loss facenet to regularize learnt features and enhance network robustness. We complement the previous art with a theoretical justification on the intimate relation between robustness and the general family of metric-learning classifiers that subsumes the Triplet Loss as a special case. Namely, we find a connection between the Magnet Loss magnet and theoretical guarantees of network robustness.

3 From Robustness to Clustering Loss

Recent work has shown that adversarially-trained DNNs, while robust, also tend to learn more semantically-aligned features robustness_odds_acc; learning_perceptually_aligned. Inspired by these findings, we are interested in studying the converse implication, i.e. whether DNNs trained to learn such features enjoy robustness properties. To this end, we start by studying the robustness of a common family of classifiers used in deep metric learning deep_met_triplet_net; magnet, namely classifiers that are based on clustering semantically-similar inputs.

3.1 Robustness

Clustering-based classifiers. Consider a training set consisting of input-label pairs $(x_i, y_i)$, where each $y_i$ belongs to one of $K$ classes, and a parameterized function $f_\theta$, which can be a DNN. A clustering-based classifier learns parameters $\theta$ such that $f_\theta$ clusters semantically-similar inputs (inputs with similar labels) in feature space. That is, $f_\theta$ clusters each of the $K$ classes into different clusters (the number of clusters may vary across classes). Hence, an input $x$ is assigned a label $k$ if and only if $f_\theta(x)$ is closest, under some notion of distance, to one of the clusters representing class $k$. To analyze the robustness of such classifiers, and without loss of generality, we consider a binary classification problem, where inputs belong to one of two classes, $C_1$ or $C_2$, and each class is represented with a single cluster center. Let the cluster centers of $C_1$ and $C_2$ be $c_1$ and $c_2$, respectively. Thus, $x$ is classified as $C_1$ if and only if $\|f_\theta(x) - c_1\|_2 \le \|f_\theta(x) - c_2\|_2$, and as $C_2$ otherwise. Throughout this paper, we assume that $f_\theta$ is $L$-Lipschitz continuous cisse2017parseval: $\|f_\theta(x) - f_\theta(y)\|_2 \le L \|x - y\|_2$, where $\|\cdot\|_2$ denotes the $\ell_2$ norm.

Figure 2: Illustration of Theorem 1. For a classifier trained with a Clustering Loss, an instance $x$ is classified by assigning it to the class of the cluster closest to its feature representation $f_\theta(x)$. The resulting decision boundaries form a Voronoi diagram in feature space. As a consequence, the robustness radius in Theorem 1 is proportional to the distance to the decision boundary separating the two clusters closest to $f_\theta(x)$.

We are interested in the maximum norm of an input perturbation $\delta$ such that the clustering-based binary classifier assigns the same class to both $x$ and $x + \delta$. The following theorem provides a bound on the norm of such a perturbation, denoted the robustness radius. The detailed proof is left for the Appendix.

Theorem 1.

Consider the clustering-based binary classifier that classifies $x$ as class $C_1$, i.e. $\|f_\theta(x) - c_1\|_2 \le \|f_\theta(x) - c_2\|_2$, with $L$-Lipschitz $f_\theta$. The classifier output for the perturbed input $x + \delta$ will not differ from that for $x$, for all perturbations $\delta$ that satisfy:

$\|\delta\|_2 \le \frac{\|f_\theta(x) - c_2\|_2^2 - \|f_\theta(x) - c_1\|_2^2}{2 L \|c_1 - c_2\|_2} \qquad (1)$
Proof Sketch.

It suffices to observe that the clustering-based classifier is equivalent to a linear classifier operating in representation space, defined by the hyperplane $\{z : (c_1 - c_2)^\top z + \frac{1}{2}(\|c_2\|_2^2 - \|c_1\|_2^2) = 0\}$. The result is deduced from the Cauchy-Schwarz inequality and the Lipschitz continuity of $f_\theta$, where the bound is proportional to the distance to the hyperplane, as illustrated in Figure 2.

It is to be observed that the robustness radius is agnostic to the choice of $c_1$ and $c_2$. That is to say, the robustness radius in Theorem 1 is not concerned with the accuracy of the classifier, but only with changes in the prediction under input perturbations. Therefore, the cluster centers $c_1$ and $c_2$ are often learnt jointly with the classifier parameters $\theta$, such that the feature representations of inputs belonging to a class are close to some learnt center of that class, while being far from the cluster center representing the other class. Note that if the clustering is performed with K-means, then the cluster center of a class is the average of the features belonging to that class, i.e. $c_k = \frac{1}{|S_k|} \sum_{x_i \in S_k} f_\theta(x_i)$, where $S_k$ is the set of training inputs of class $k$.

Generalization to the Multi-Class Multi-Cluster Setting. We first consider the multi-class single-cluster case, where each of the $K$ classes is represented by a single cluster center, as depicted in Figure 2. Analyzing the robustness around an input $x$ in this case is equivalent to analyzing the previously discussed binary classification case with respect to the two closest cluster centers, i.e. $c_1$ is the cluster center closest to $f_\theta(x)$, and $c_2$ is the closest cluster center belonging to a different class. For the multi-class multi-cluster case, where $c_j^k$ denotes the $j$-th cluster of the $k$-th class, it again suffices to analyze the binary case between the closest cluster centers of two different classes. We leave the rest of the details for the Appendix.
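To make the bound concrete, here is a minimal numpy sketch (not from the paper; the function name and toy values are our own) that evaluates the certified radius of Theorem 1 for a feature vector, assuming $\ell_2$ distances and a known Lipschitz constant:

```python
import numpy as np

def robustness_radius(z, c_same, c_other, lipschitz):
    """Certified l2 radius of Theorem 1 for a feature vector z assigned to
    center c_same, against the nearest center c_other of a different class."""
    num = np.linalg.norm(z - c_other) ** 2 - np.linalg.norm(z - c_same) ** 2
    den = 2.0 * lipschitz * np.linalg.norm(c_same - c_other)
    return max(num / den, 0.0)

# Toy example: a feature closer to its own center than to the other one.
z = np.array([0.2, 0.0])
c1 = np.array([0.0, 0.0])   # center of the predicted class
c2 = np.array([1.0, 0.0])   # nearest center of another class
r = robustness_radius(z, c1, c2, lipschitz=1.0)
# r == 0.3: the feature sits 0.3 away from the midpoint boundary at 0.5,
# and with L = 1 the input-space radius equals the feature-space margin.
```

Note that a smaller Lipschitz constant directly enlarges the certified radius, matching the paper's first observation about the bound.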

3.2 Clustering Loss as a Robustness Regularizer

Theorem 1 provides a tight robustness radius for each input (a formal tightness analysis of the bound is included in the Appendix). So, to attain both accurate and robust models, one can train DNNs to achieve accuracy while simultaneously maximizing the robustness radius in Theorem 1 for every training input. Several observations can be made about the robustness radius. First, it is inversely proportional to the DNN's Lipschitz constant $L$, i.e. networks with smaller $L$ tend to enjoy better robustness. This is consistent with previous work that exploited this observation to enhance network robustness cisse2017parseval. In this paper, we focus on the numerator of the robustness radius, and on learning parameters $\theta$ that maximize it, i.e. that push features far from the cluster centers of other classes (increasing $\|f_\theta(x) - c_2\|_2$) and pull features closer to the cluster centers of their own class (decreasing $\|f_\theta(x) - c_1\|_2$). As such, a general class of robustness-based clustering losses can be formulated as follows:


$\mathcal{L}(\theta) = \sum_{i} \Phi\Big( D\big(f_\theta(x_i), \{c_j^{y_i}\}_j\big),\; D\big(f_\theta(x_i), \{c_j^{k}\}_{j,\, k \neq y_i}\big) \Big) \qquad (2)$

where $y_i$ is the class to which $x_i$ belongs. The first argument measures the separation between the feature of $x_i$, i.e. $f_\theta(x_i)$, and the cluster centers of its class. Similarly, the second argument measures the separation between $f_\theta(x_i)$ and the cluster centers of all other classes. The function $\Phi$ combines the two measurements into an overall stable loss, so that minimization of the loss incites larger values for the numerator in Theorem 1. Note that iterative optimization of this loss requires updating the cluster centers. Hence, after every update, cluster centers can be recomputed by any clustering algorithm, e.g. K-means. Moreover, many losses commonly used in the deep metric learning literature NCM conform with Equation (2) as special cases, one of which is the popular Magnet Loss defined as:


$\mathcal{L}_{\text{Magnet}}(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left\{ -\log \frac{ e^{ -\frac{1}{2\sigma^2} \| r_i - \hat{\mu}(r_i) \|_2^2 - \alpha } }{ \sum_{\mu^k : k \neq y_i} e^{ -\frac{1}{2\sigma^2} \| r_i - \mu^k \|_2^2 } } \right\}_{+} \qquad (3)$

where $r_i = f_\theta(x_i)$, $\hat{\mu}(r_i)$ is the center of the cluster to which $r_i$ is assigned, $\sigma^2$ is the variance of all samples around their cluster centers, $\alpha \ge 0$ is a margin, and $\{\cdot\}_+ = \max(0, \cdot)$ is the hinge function. While the Magnet Loss was introduced to address performance issues in metric learning algorithms, our objective of learning more semantically-aligned features and our subsequent analysis of Theorem 1 suggest that this loss inherently encourages robustness. Regarding inference, DNNs trained with the Magnet Loss predict the class of a test input $x$ by computing a soft probability over the cluster centers as follows:

$p(k \mid x) = \frac{ \sum_{j : \mu_j \in \text{class } k} e^{ -\frac{1}{2\sigma^2} \| f_\theta(x) - \mu_j \|_2^2 } }{ \sum_{j} e^{ -\frac{1}{2\sigma^2} \| f_\theta(x) - \mu_j \|_2^2 } } \qquad (4)$

Hence, $x$ is assigned to class $\arg\max_k \, p(k \mid x)$. We refer the reader to magnet for more details. Next, we introduce ClusTR, a simple framework for training robust models based on our analytical findings.
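As an illustration of this inference rule, the following numpy sketch (our own simplification; variable names are assumed) computes soft class probabilities by summing Gaussian similarities over each class's cluster centers and normalizing:

```python
import numpy as np

def soft_class_probs(z, centers, labels, sigma2=1.0):
    """Clustering-based inference: a class's probability is the normalized
    sum of exp(-||z - mu||^2 / (2*sigma2)) over its cluster centers."""
    d2 = np.sum((centers - z) ** 2, axis=1)       # squared distance to each center
    sims = np.exp(-d2 / (2.0 * sigma2))           # Gaussian similarity per center
    probs = np.zeros(labels.max() + 1)
    for c, s in zip(labels, sims):                # pool similarities per class
        probs[c] += s
    return probs / probs.sum()

centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
labels = np.array([0, 0, 1])                      # two clusters for class 0, one for class 1
p = soft_class_probs(np.array([0.1, 0.0]), centers, labels)
pred = int(np.argmax(p))                          # pred == 0: closest clusters belong to class 0
```

In practice the similarities would be computed on the penultimate-layer features of the trained DNN, with the centers maintained by K-means, as described above.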

3.3 ClusTR: Clustering Training for Robustness

Our theoretical study finds an intrinsic connection between clustering and robustness: clustering-based classifiers intrinsically possess a robustness radius. As such, optimizing a loss designed for clustering tends to maximize this robustness radius. We also observe that a Clustering Loss such as Equation (2), which is designed to induce robustness according to Theorem 1, can be reduced to the Magnet Loss of Equation (3) as a special case. Based on these observations, we propose Clustering Training for Robustness (ClusTR): a simple and theoretically-motivated framework for inducing robustness during DNN training without the need to generate adversaries. ClusTR exploits our theoretical findings by combining a Clustering Loss with simple DNN training techniques. For the Clustering Loss, ClusTR incorporates the well-studied Magnet Loss to induce semantic clustering of instances in feature space. Although effective in its task, this loss suffers from slow convergence magnet. ClusTR mitigates this issue by introducing a simple warm start initialization. For a given model and dataset, ClusTR first conducts nominal training, i.e. standard Cross Entropy training, until reasonable performance is achieved. Then, it removes the last linear layer and fine-tunes the resulting DNN by applying the Magnet Loss on the output of the penultimate layer. The Magnet Loss in ClusTR aims at optimizing the robustness radius of Theorem 1, while using a warm start initialization to increase convergence speed without hindering test set accuracy. In this work, we choose the Magnet Loss to be the Clustering Loss in ClusTR. However, the result in Theorem 1 is agnostic to this choice, so we expect our results to extend to other choices of Clustering Loss.

4 Experiments

Figure 3: Effect of warm start on certified accuracy. Figures 3(a) and (c) show the synthetic datasets, while Figures 3(b) and (d) show the effect of warm start in ClusTR on certified accuracy. On both datasets, warm start induces a larger robustness radius than training with the Magnet Loss from random initialization.

In this section, we conduct several experiments on synthetic and real datasets to validate the idea behind ClusTR. Specifically, we study (a) the effect of warm start on convergence speed and robustness, (b) how ClusTR-trained DNNs compare to adversarially-trained counterparts, and (c) how ClusTR can be equipped with a quick version of adversarial training to further enhance robustness.

4.1 Effect of warm start Initialization in ClusTR

Convergence. We assess the training convergence and the overall test accuracy performance for our proposed ClusTR-training of ResNet18 on CIFAR10 and SVHN. In CIFAR10, we observe that training without warm start (i.e. Magnet Loss only) requires 106 minutes to fully train, while introducing the warm start reduces the required training time to 83 minutes on a GTX 1080Ti GPU.

Robustness. We study the effect of the warm start initialization on robustness by conducting controlled synthetic experiments and computing exact robustness radii. We train a 3-layer neural network with 20 hidden units on the synthetic binary classification datasets depicted in Figures 3(a) and (c). On both datasets, we train (1) the Magnet Loss with random initialization and (2) ClusTR. For simplicity, each class is represented with a single cluster. Upon convergence, both models achieve accuracy. Given the model predictions, we compute the robustness radius for each instance and report certified accuracy under varying radii in Figures 3(b) and (d), in line with common practice in the network certification literature randomized_smoothing. Note that certified accuracy at radius $r$ is defined as the percentage of instances that are both correctly classified and have a robustness radius larger than $r$, as given by Theorem 1. We find that the ClusTR-trained DNNs, while accurate, also enjoy a larger robustness radius than DNNs trained with the Magnet Loss without the warm start.
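The certified-accuracy metric used in these plots follows directly from the per-instance radii; a small sketch (our own, with toy values):

```python
import numpy as np

def certified_accuracy(radii, correct, eps):
    """Fraction of instances that are both correctly classified and
    have a certified robustness radius larger than eps."""
    radii = np.asarray(radii, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    return float(np.mean(correct & (radii > eps)))

# Four instances: three correct, with radii 0.5, 0.1, 0.4; one misclassified.
radii = [0.5, 0.1, 0.4, 0.0]
correct = [True, True, True, False]
acc = certified_accuracy(radii, correct, eps=0.3)
# acc == 0.5: only the first and third instances are correct AND certified at 0.3.
```

Sweeping `eps` over a grid of radii produces curves like those in Figures 3(b) and (d).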

4.2 ClusTR Robustness and Comparison with State-of-the-Art

Setup and Implementation Details. In this section, we conduct experiments with ResNet18 on the CIFAR10, CIFAR100, and SVHN datasets. We train models using our proposed ClusTR framework. Specifically, we first conduct nominal training until we get a reasonable performance (models with test accuracies of , , and on CIFAR10, CIFAR100, and SVHN, respectively). We then remove the last linear layer and fine-tune the network by applying the Magnet Loss

on the output feature of the remaining DNN. Fine-tuning is done for 30 epochs on CIFAR10 and SVHN, and 60 epochs on CIFAR100. Following magnet, we use K-means kmeans++ to update cluster centers after each training epoch. To assess model robustness, we follow prior work and perform projected gradient descent (PGD) madryadv attacks with $\ell_\infty$-bounded perturbations that take the following form:

$x^{t+1} = \Pi_{\mathcal{S}}\Big( x^t + \alpha \, \mathrm{sign}\big( \nabla_{x} \, \mathcal{L}_{CE}( p(x^t), y ) \big) \Big),$

where $\Pi_{\mathcal{S}}$ denotes the projection of the perturbed input onto the set $\mathcal{S} = \{x' : \|x' - x\|_\infty \le \epsilon\}$, $p(\cdot)$ is the probability prediction vector computed from Equation (4), and $\mathcal{L}_{CE}$ is the Cross Entropy loss. In all experiments, we perform PGD attacks with 10 random restarts around each input for 20 and 100 iterations, denoted by PGD20 and PGD100, respectively. Following common practice in the literature freeadv; fastadv, we set the PGD step size to . We report the attacks with and leave experiments with other choices of for the Appendix.
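The PGD iteration above can be sketched as follows. This is a self-contained toy version (our own, with an analytic gradient standing in for a network's backward pass) of $\ell_\infty$ PGD with a random start:

```python
import numpy as np

def pgd_linf(x, grad_fn, eps, alpha, steps, rng):
    """l_inf PGD with a random start: repeatedly ascend the loss with
    signed-gradient steps, projecting delta back onto the eps-ball."""
    delta = rng.uniform(-eps, eps, size=x.shape)
    for _ in range(steps):
        g = grad_fn(x + delta)                    # gradient of the attacked loss
        delta = np.clip(delta + alpha * np.sign(g), -eps, eps)
    return x + delta

# Toy loss L(x) = w @ x, whose gradient is the constant w. PGD should
# saturate every coordinate at the ball boundary, eps * sign(w_i).
w = np.array([1.0, -2.0, 0.5])
adv = pgd_linf(np.zeros(3), lambda x: w, eps=0.1, alpha=0.05, steps=20,
               rng=np.random.default_rng(0))
# adv == [0.1, -0.1, 0.1]
```

A real attack would additionally clip the adversary to the valid input range (e.g. [0, 1] for images) and obtain `grad_fn` via backpropagation through the model; random restarts repeat this loop from fresh initializations and keep the most damaging adversary.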

                                    CIFAR10                         SVHN
                           Natural  PGD20   PGD100       Natural  PGD20   PGD100
Nominal Training            95.01    0.00    0.00         98.38    0.00    0.00
Free AT freeadv             85.96   46.33   46.19         86.98   46.52   46.06
AT + Pre-Training
  pretraining               87.30   57.40   57.20         85.12   47.18   46.72
TRADES trades               84.92   56.61   56.43         91.63   57.45   55.28
Magnet Loss magnet          83.14   23.71   22.54         91.95   40.73   38.59
ClusTR                      87.34   49.04   47.76         94.28   50.78   50.77
QTRADES                     81.07   44.18   43.42         86.36   43.05   42.24
ClusTR + QTRADES            91.03   74.44   74.04         95.06   84.76   84.75

Table 1: Adversarial accuracy comparison on CIFAR10 and SVHN. We compare ClusTR and ClusTR+QTRADES against Magnet Loss, Free Adversarial Training (Free AT), AT with ImageNet pre-training, TRADES, and QTRADES under PGD attacks. ClusTR+QTRADES outperforms the adversarially-trained state-of-the-art by a large margin. All numbers are percentages.
              Free AT  TRADES  QTRADES  Magnet Loss  ClusTR  ClusTR+QTRADES
Time (min.)     147      534      42        106         83         115

Table 2: Comparison of training time on CIFAR10. ClusTR boosts the convergence speed of Magnet Loss training through warm start. Note that ClusTR+QTRADES is faster than both TRADES and Free AT, and that QTRADES is faster than TRADES. Training time is computed on the same workstation using the same software platform (PyTorch pytorch_neurips) and GPU (GTX 1080Ti).

Experiments on CIFAR10 and SVHN. We evaluate the robustness of nominal training (as a baseline), the Magnet Loss (i.e. ClusTR without warm start), and ClusTR, and we compare against several approaches that achieve state-of-the-art robustness in this experimental setup: Free Adversarial Training (Free AT) freeadv with its reported best setting of 8 minibatch replays, which outperforms vanilla adversarial training madryadv; Adversarial Training with ImageNet pre-training (AT + Pre-Training) pretraining, which leverages external data to improve robustness; and TRADES trades, the current state-of-the-art method. Note that all the robustness methods in this comparison employ some form of adversarial training.

We evaluate using both natural accuracy, i.e. test set accuracy on clean images, and PGD test accuracy. Table 1 reports these results. First, we observe that training with the Magnet Loss only on clean images results in substantial gains in robustness compared to nominal training. In fact, this loss increases PGD accuracy from 0% to 23.71%, while natural accuracy drops from 95.01% to 83.14%. This result constitutes empirical evidence of the theoretical robustness properties we presented for clustering-based classifiers. Furthermore, training with ClusTR consistently outperforms Free AT in both natural and PGD accuracy on both CIFAR10 and SVHN. Specifically, ClusTR outperforms Free AT in PGD accuracy by up to 4%, even though the former only trains with clean images. We note that ClusTR's robustness gains over adversarial training do not come at the cost of natural accuracy. In fact, the natural accuracy of ClusTR is about 1% higher on CIFAR10 and 7% higher on SVHN.

These results show that the design of ClusTR inherently provides robustness without introducing adversaries during training. We complement this finding by studying the following question: Can equipping ClusTR with some form of adversarial training provide even more robustness? To do so, we equip ClusTR with a TRADES loss term, where the total loss becomes:

$\mathcal{L}(\theta) = \mathcal{L}_{\text{Magnet}}(\theta) + \lambda \, \mathcal{KL}\big( p(x) \,\|\, p(x_{\text{adv}}) \big) \qquad (5)$

Note that the Cross Entropy based TRADES formulation trades is similar to Equation (5), but with the first term replaced by the Cross Entropy loss between the output logits of the last linear layer and the true label. In order to keep the framework simple and computationally efficient, we compute a quick estimate of the adversary in Equation (5). Namely, we start from a random uniform initialization and perform a single PGD step, as opposed to TRADES' multiple iterations. We refer to this setup as QTRADES (the rest of the implementation details of QTRADES are left for the Appendix). Formally, for an input $x$, we construct an intermediate input by perturbing $x$ with uniform noise, i.e. $\hat{x} = x + u$ with $u \sim \mathcal{U}(-\epsilon, \epsilon)$, and then generate the adversary $x_{\text{adv}}$ by:

$x_{\text{adv}} = \Pi_{\mathcal{S}}\Big( \hat{x} + \alpha \, \mathrm{sign}\big( \nabla_{\hat{x}} \, \mathcal{KL}\big( p(x) \,\|\, p(\hat{x}) \big) \big) \Big)$
While QTRADES alone achieves slightly lower natural accuracy and robustness than Free AT, Table 1 shows that equipping ClusTR with QTRADES sets new state-of-the-art robustness results on both datasets, outperforming all other methods. We observe that ClusTR+QTRADES achieves the highest natural accuracy among all methods, with 91.03% and 95.06% on CIFAR10 and SVHN, respectively, thus also improving upon the best competitor on both datasets. Moreover, ClusTR+QTRADES surpasses current state-of-the-art methods by impressive margins, 16%-29%, under strong PGD attacks on CIFAR10 and SVHN.

                                 Natural  PGD20   PGD100
Nominal Training                  78.84    0.00    0.00
Free AT freeadv                   62.13   25.88   25.58
AT+Pre-Training pretraining       59.23   34.22   33.91
TRADES trades                     55.36   28.11   27.96
ClusTR+QTRADES                    69.25   52.47   52.40

Table 3: Adversarial accuracy on CIFAR100. We compare ClusTR+QTRADES against Free AT, AT+Pre-Training, and TRADES under PGD attacks. Our proposed ClusTR+QTRADES framework surpasses all competition by a large margin. All numbers are percentages.

Experiments on CIFAR100. We extend our analysis of ClusTR+QTRADES to CIFAR100 and assess robustness with PGD attacks. We report the results of this setup in Table 3, which shows that ClusTR+QTRADES significantly outperforms the strongest competitor, thus setting a new state-of-the-art robustness result on CIFAR100. We note that these large gains in robustness also come with a substantial increase in natural accuracy. Since the total number of clusters is large for CIFAR100, and following how magnet tackles the large-cluster-number regime, ClusTR+QTRADES inference in this case does not consider all clusters, as in Equation (4), but only the nearest clusters. We find that the choice of the number of nearest clusters has a marginal impact on robustness, and we leave a comprehensive ablation for the Appendix.

Adaptive Attacks. Although PGD attacks are the current standard in the network robustness literature, it has been argued that they are insufficient to demonstrate network robustness. Recent work shows that many defenses can be broken with carefully-crafted attacks obfus, dubbed adaptive attacks, that are tailored to break the underlying defense adaptive_attacks. Therefore, we construct an example of a potentially powerful attack tailored to our trained networks. Namely, we construct adversaries that maximize our training objective, as opposed to the Cross Entropy loss, in PGD. Similar to previous experiments, the attacks are performed with 10 random restarts for 100 iterations. Note that this attack precisely targets the objective with which our models are trained, and thus is expected to be stronger. Indeed, running this adaptive attack lowers the robust accuracy on both CIFAR10 and SVHN. Despite this drop, our ClusTR+QTRADES approach still outperforms the state-of-the-art by a substantial margin. It is essential to note that this drop in robustness is rather marginal, as other defenses, when subjected to such tailored attacks, have their robustness drop close to 0, or below that of baseline robust models adaptive_attacks; obfus.

Training Time. We report the training times of the aforementioned methods in Table 2. We note that ClusTR+QTRADES outperforms Free AT and TRADES both in terms of robustness and training time. The speedup is due to two factors. First, the warm start initialization boosts the convergence of ClusTR compared to the Magnet Loss. Second, QTRADES delivers on its promise of efficiency, training substantially faster than TRADES. We leave the training time comparison on SVHN for the Appendix.

It is worthwhile to mention that our choice of QTRADES, out of the many adversarial training schemes with which ClusTR can be equipped, is motivated by (i) the theoretical support behind TRADES trades and (ii) QTRADES’ low computational cost. We also emphasize here that robustness could possibly be improved further by incorporating another adversarial training technique with ClusTR instead of QTRADES. We leave the search for this optimal choice for future work.

5 Conclusion

Inspired by work that observed a connection from robustness to semantics, this paper explores the complementary connection: from semantics to robustness. We showed that clustering-based classifiers inherently enjoy a tight robustness radius against input perturbations, and that this radius can be maximized by clustering semantically-similar instances in representation space. Motivated by these findings, we proposed ClusTR (Clustering Training for Robustness), a simple and theoretically-motivated framework for learning robust models without generating adversaries during training. Extensive experiments validated the theory motivating ClusTR and showed that ClusTR can achieve network robustness superior to that of adversarially-trained models. ClusTR can also be equipped with a quick version of adversarial training to set new state-of-the-art robustness results against strong adversarial attacks on three benchmark datasets, while also training several times faster than the current state-of-the-art robust training method.

6 Broader Impact

While the current performance of deep learning algorithms is unparalleled in several fields, the existence of adversarial examples hinders the inclusion of deep learning as a component in security-critical applications. This issue prevents industrial applications from safely including deep learning in systems related to autonomous cars, computer-aided surgical procedures, and healthcare, among others, as deep learning components would constitute a security liability against potential malicious attacks. Hence, approaches addressing the robustness of deep learning, such as the one studied in this work, can make these algorithms more reliable, allowing the inclusion of powerful deep learning systems and so permitting applications to enjoy their strong performance. On the other hand, addressing adversarial robustness removes a vulnerability that could otherwise be exploited as a protection against deep learning systems designed with malicious intent.

Acknowledgments. This work was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research.


Appendix A Proof of Theorem 1

Theorem 1.

Consider the clustering-based binary classifier that classifies $x$ as class $1$, i.e. $\|g(x)-\mu_1\|_2 < \|g(x)-\mu_2\|_2$, with $L$-Lipschitz $g$. The classifier output for the perturbed input $(x+\delta)$ will not differ from that for $x$, i.e. $\|g(x+\delta)-\mu_1\|_2 < \|g(x+\delta)-\mu_2\|_2$, for all perturbations $\delta$ that satisfy:

$$\|\delta\|_2 < \frac{\|g(x)-\mu_2\|_2^2 - \|g(x)-\mu_1\|_2^2}{2L\,\|\mu_1-\mu_2\|_2}.$$

Proof. It suffices that $\|g(x+\delta)-\mu_1\|_2^2 < \|g(x+\delta)-\mu_2\|_2^2$ for $(x+\delta)$ to be classified as class $1$. Expanding both squared norms, this condition is equivalent to:

$$2(\mu_2-\mu_1)^\top g(x+\delta) < \|\mu_2\|_2^2 - \|\mu_1\|_2^2. \qquad (6)$$

Writing $g(x+\delta) = g(x) + \big(g(x+\delta)-g(x)\big)$, the perturbation term is bounded by Cauchy-Schwarz and the Lipschitz continuity of $g$, i.e.

$$2(\mu_2-\mu_1)^\top\big(g(x+\delta)-g(x)\big) \le 2\,\|\mu_2-\mu_1\|_2\,\|g(x+\delta)-g(x)\|_2 \le 2L\,\|\mu_1-\mu_2\|_2\,\|\delta\|_2.$$

Thus, by rearranging the inequality in (6), the bound on $\|\delta\|_2$ stated in the Theorem guarantees $\|g(x+\delta)-\mu_1\|_2 < \|g(x+\delta)-\mu_2\|_2$, completing the proof.
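As a quick numerical sanity check of the certificate (a sketch, not part of the paper's experiments), one can take a linear feature map $g(x) = Wx$ as a toy stand-in for the DNN, whose Lipschitz constant is the spectral norm of $W$, and verify that perturbations strictly inside the certified radius never flip the nearest-centroid prediction:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_feat = 8, 4
W = rng.normal(size=(d_feat, d_in))
L = np.linalg.norm(W, 2)                  # Lipschitz constant of g(x) = W x
mu1, mu2 = rng.normal(size=d_feat), rng.normal(size=d_feat)

def predict(x):
    z = W @ x
    return 1 if np.linalg.norm(z - mu1) < np.linalg.norm(z - mu2) else 2

x = rng.normal(size=d_in)
if predict(x) == 2:                       # relabel so that x is of class 1
    mu1, mu2 = mu2, mu1

z = W @ x
# Certified radius: (||g(x)-mu2||^2 - ||g(x)-mu1||^2) / (2 L ||mu1 - mu2||)
radius = (np.linalg.norm(z - mu2) ** 2 - np.linalg.norm(z - mu1) ** 2) / (
    2 * L * np.linalg.norm(mu1 - mu2))

# Random perturbations strictly inside the certified ball never flip the label.
for _ in range(1000):
    delta = rng.normal(size=d_in)
    delta *= 0.999 * radius / np.linalg.norm(delta)
    assert predict(x + delta) == 1
```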

A.1 Generalization to the multi-class, multi-cluster case

The analysis that leads to Theorem 1, based on the single-cluster binary classification problem (one cluster per class and two classes), can be extended to the multi-cluster, multi-class case. This extension is achieved by reducing the multi-class, multi-cluster case to the single-cluster binary classification problem we studied. Namely, denote by $\mu_j^i$ the $j$-th cluster of the $i$-th class, and select the centroids $\mu_1$ and $\mu_2$ as follows:

$$\mu_1 = \underset{\mu_j^i}{\arg\min}\ \|g(x)-\mu_j^i\|_2, \qquad \mu_2 = \underset{\mu_j^k,\ k \neq i^*}{\arg\min}\ \|g(x)-\mu_j^k\|_2,$$

where $i^*$ is the class of $\mu_1$. These assignments state that: (i) the selected centroids are from different classes, hence fooling the classifier is well-defined; and (ii) the centroids are the two nearest centroids to $g(x)$ that are from different classes.
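A minimal sketch of this reduction (hypothetical helper names; `centroids` holds all cluster centers and `classes` their class labels):

```python
import numpy as np

def reduce_to_binary(z, centroids, classes):
    """Select (mu1, mu2) as in Appendix A.1: mu1 is the nearest centroid
    overall, mu2 the nearest centroid belonging to a *different* class
    than mu1. Returns their indices into `centroids`."""
    dists = np.linalg.norm(centroids - z, axis=1)
    i1 = int(np.argmin(dists))
    other = classes != classes[i1]            # mask of other-class centroids
    i2 = int(np.flatnonzero(other)[np.argmin(dists[other])])
    return i1, i2
```

The certified radius of Theorem 1 can then be evaluated on the returned pair.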

Appendix B Decision Boundaries as a Voronoi Diagram

Here, we show that the decision boundaries of such a classifier form a Voronoi diagram constructed around the cluster centers. Following the earlier notation, and for the multi-class classifier where each class $i$ is clustered in a single cluster with center $\mu_i$, one can characterize the decision boundary between each two classes $i$ and $k$ as follows:

$$\{z : \|z-\mu_i\|_2 = \|z-\mu_k\|_2\},$$

which is precisely the definition of the Voronoi diagram for the metric space $(\mathbb{R}^d, \|\cdot\|_2)$ over the cluster centers $\mu_i$ and $\mu_k$.

Appendix C Tightness Analysis.

Proposition 1.

Consider the clustering-based binary classifier that classifies $x$ as class $1$, i.e. $\|g(x)-\mu_1\|_2 < \|g(x)-\mu_2\|_2$, with $L$-Lipschitz $g$. If

$$\|\delta\|_2 > \frac{\|g(x)-\mu_2\|_2^2 - \|g(x)-\mu_1\|_2^2}{2L\,\|\mu_1-\mu_2\|_2},$$

then there exists a direction in representation space along which the classifier is fooled, i.e. a perturbation $\nu$ with $\|\nu\|_2 \le L\|\delta\|_2$ such that $g(x)+\nu$ is classified as class $2$.

Proof. We start by observing that the clustering-based classifier, which classifies $x$ as class $1$ when $\|g(x)-\mu_1\|_2 < \|g(x)-\mu_2\|_2$ and as class $2$ otherwise, has decision boundaries given by the set $\{z : \|z-\mu_1\|_2 = \|z-\mu_2\|_2\}$. That is, the clustering-based classifier is equivalent to a linear classifier in the feature $z$, such that $z$ is classified as class $1$ when $2(\mu_2-\mu_1)^\top z < \|\mu_2\|_2^2 - \|\mu_1\|_2^2$ and as class $2$ otherwise. Thus, it suffices to show that there exists $\nu$, satisfying the norm bound in the proposition, such that $2(\mu_2-\mu_1)^\top (g(x)+\nu) \ge \|\mu_2\|_2^2 - \|\mu_1\|_2^2$. We have that

$$2(\mu_2-\mu_1)^\top (g(x)+\nu) = \|\mu_2\|_2^2 - \|\mu_1\|_2^2 - \big(\|g(x)-\mu_2\|_2^2 - \|g(x)-\mu_1\|_2^2\big) + 2(\mu_2-\mu_1)^\top \nu. \qquad (7)$$

The last equality follows from expanding the squared norms as in Equation (6). Now, consider the choice of $\nu$ in the direction $(\mu_2-\mu_1)$, in particular $\nu = L\|\delta\|_2\,\frac{\mu_2-\mu_1}{\|\mu_2-\mu_1\|_2}$. Substituting back in Equation (7), we have:

$$2(\mu_2-\mu_1)^\top (g(x)+\nu) = \|\mu_2\|_2^2 - \|\mu_1\|_2^2 - \big(\|g(x)-\mu_2\|_2^2 - \|g(x)-\mu_1\|_2^2\big) + 2L\,\|\delta\|_2\,\|\mu_2-\mu_1\|_2.$$

Lastly, note that for any $\delta$ satisfying the bound in the proposition we have $2L\|\delta\|_2\|\mu_2-\mu_1\|_2 > \|g(x)-\mu_2\|_2^2 - \|g(x)-\mu_1\|_2^2$, hence $2(\mu_2-\mu_1)^\top (g(x)+\nu) > \|\mu_2\|_2^2 - \|\mu_1\|_2^2$, i.e. $g(x)+\nu$ is classified as class $2$, completing the proof.
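The fooling construction can also be checked numerically. The sketch below takes $g$ to be the identity (so $L = 1$) and verifies that a representation-space perturbation along the direction used in the proof leaves the prediction unchanged inside the certified radius and flips it just beyond:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
mu1, mu2 = rng.normal(size=d), rng.normal(size=d)
z = mu1 + 0.3 * (mu2 - mu1)       # a feature vector classified as class 1
L = 1.0                           # Lipschitz constant (g is the identity here)

gap = np.linalg.norm(z - mu2) ** 2 - np.linalg.norm(z - mu1) ** 2
radius = gap / (2 * L * np.linalg.norm(mu1 - mu2))   # certified radius

def cls(f):
    return 1 if np.linalg.norm(f - mu1) < np.linalg.norm(f - mu2) else 2

# Worst-case representation-space perturbation: points from mu1 towards mu2.
direction = (mu2 - mu1) / np.linalg.norm(mu2 - mu1)
nu_inside = L * 0.99 * radius * direction   # within the certified budget
nu_beyond = L * 1.01 * radius * direction   # just beyond it
assert cls(z + nu_inside) == 1              # certificate holds...
assert cls(z + nu_beyond) == 2              # ...and is tight: the label flips
```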

Appendix D Implementation Details

Next, we describe the implementation details of ClusTR, along with details regarding QTRADES. Note that the supplementary material zip file includes the implementation reproducing our results.

Architecture. We use a ResNet18 resnets modified to accept the input images of the considered datasets. The size of the network's output at the penultimate layer, i.e. the feature dimension, is set to the same value for all experiments.

Optimization. For the warm-start stage of training ClusTR, we use the Adam optimizer kingma2014adam with the cross-entropy loss for 90 epochs, decaying the learning rate at epochs 30 and 60. After that, we fine-tune the DNN with the Magnet Loss and a reduced learning rate for another 30 epochs for CIFAR10 and 60 epochs for CIFAR100 and SVHN.


Images are normalized by their channel-wise mean and standard deviation. For CIFAR10 and CIFAR100, we apply standard data augmentation of random crops with a padding of 4. For SVHN, we do not employ any data augmentation.

Magnet Loss. Following Rippel et al. magnet, we compute a stochastic approximation of the Magnet Loss. Hence, Magnet Loss training requires sampling neighborhoods of points in representation space, rather than independent samples. These neighborhoods are defined by a number of clusters and a number of samples per cluster. This sampling procedure guarantees neither that every instance will be sampled nor that an instance will be sampled only once. Therefore, we define an epoch as passing as many instances as there are in the dataset, regardless of whether some instances were sampled more than once or not at all. For sampling, we set the total number of sampled clusters to 12, and the number of samples per cluster to 20; hence, the total number of samples in each batch is 240. Cluster assignments are recomputed at the end of every epoch with the K-means clustering algorithm with K-means++ initialization. We run a grid search to tune the margin parameter of the Magnet Loss separately for ClusTR and ClusTR+QTRADES on CIFAR10, SVHN, and CIFAR100.
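A sketch of this neighborhood sampling (hypothetical helper; `cluster_members` maps each cluster id to the instance indices currently assigned to it by K-means):

```python
import numpy as np

def sample_neighborhood_batch(cluster_members, m_clusters=12, d_samples=20,
                              rng=None):
    """Sample a Magnet-style batch: m_clusters clusters, d_samples instances
    each (240 samples with our settings). Instances may repeat across batches
    and some may never be drawn, which is why an 'epoch' is defined by
    instance count rather than dataset coverage."""
    rng = np.random.default_rng(rng)
    chosen = rng.choice(list(cluster_members), size=m_clusters, replace=False)
    batch = []
    for c in chosen:
        members = cluster_members[c]
        # Sample with replacement only when the cluster is too small.
        batch.extend(rng.choice(members, size=d_samples,
                                replace=len(members) < d_samples))
    return np.array(batch)
```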

QTRADES. We initialize the adversary by adding uniform noise to the original instance, computing the Cross Entropy between the model's outputs on the original and adversarial instances, and following one step of gradient ascent on this Cross Entropy. The result of gradient ascent is always clipped so that the adversarial instance lies in image space. The total loss with which the network is trained is a weighted sum of the Clustering Loss and the Cross Entropy between the original and adversarial instances. We cross-validate the regularization coefficient balancing the two terms in Equation (5) separately on CIFAR10, SVHN, and CIFAR100.
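A minimal sketch of the single-step QTRADES adversary on a toy linear softmax model (an illustrative stand-in, not our actual implementation, which uses the DNN's predictions):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def qtrades_adversary(x, W, eps, alpha, rng=None):
    """One-step QTRADES adversary for a linear softmax model W @ x:
    1) start from x plus uniform noise in [-eps, eps];
    2) take a single gradient-ascent step on the Cross Entropy between the
       model's predictions at x and at the adversarial point;
    3) clip back to the eps-ball and to valid image range [0, 1]."""
    rng = np.random.default_rng(rng)
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)
    p_clean = softmax(W @ x)                   # prediction on the original
    p_adv = softmax(W @ x_adv)                 # prediction on the adversary
    # Gradient of CE(p_clean, p_adv) w.r.t. x_adv for a linear softmax model.
    grad = W.T @ (p_adv - p_clean)
    x_adv = x_adv + alpha * np.sign(grad)      # single ascent step
    x_adv = np.clip(x_adv, x - eps, x + eps)   # stay within the eps-ball
    return np.clip(x_adv, 0.0, 1.0)            # stay in image space
```

The network would then be trained on a weighted sum of the Clustering Loss on the original instance and the Cross Entropy consistency term between the original and returned adversarial instances, mirroring Equation (5).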

Appendix E Additional Experiments

E.1 Combining CE with a Distance-Based Classifier

The robustness radius in Theorem 1 holds for any clustering-based classifier operating on features produced by a Lipschitz-continuous function. Therefore, we start by addressing the following question: if robustness is the aim, can one simply replace the last layer of a nominally-trained DNN with a clustering-based classifier to achieve robustness? Addressing this question is essential to establish the necessity of enforcing clustering during training, i.e. training with ClusTR. To answer it, we study a nominally-trained ResNet18 on CIFAR10. We observe that directly applying K-means on the representations of the penultimate layer, and performing classification according to Equation (4), incurs a substantial drop in accuracy relative to the original classifier. As adversaries aim at changing the classifier's predictions, the highest adversarial accuracy that this classifier can attain is upper bounded by this degraded natural accuracy. This result demonstrates that features learnt through nominal training are not spatially configured for clustering-based classification. Hence, exploiting the benefits of clustering-based classification requires explicitly enforcing clustering during DNN training.
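This diagnostic can be sketched as follows, simplified to a single cluster per class (class means as centers) on precomputed penultimate-layer features; the helper name is hypothetical:

```python
import numpy as np

def nearest_centroid_accuracy(feats, labels):
    """Replace the linear head with nearest-centroid classification on frozen
    features (one cluster per class, centered at the class mean) and measure
    how much accuracy survives the swap."""
    classes = np.unique(labels)
    centroids = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    # Distance of every feature vector to every class centroid.
    dists = np.linalg.norm(feats[:, None, :] - centroids[None, :, :], axis=2)
    preds = classes[np.argmin(dists, axis=1)]
    return float((preds == labels).mean())
```

A large gap between this accuracy and the original test accuracy indicates that the features are not clustered in representation space.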

E.2 Results of PGD Attacks with Other Perturbation Budgets

81.99 81.54 87.48 87.47 60.15 59.77
57.67 57.05 80.04 80.00 33.32 33.25
35.88 34.98 71.56 71.45 17.76 17.65
Table 4: Performance of ClusTR+QTRADES on CIFAR10, CIFAR100, and SVHN. We report the PGD accuracy of ClusTR+QTRADES under additional perturbation budgets, showing that the robustness of the resulting model is largely insensitive to the choice of budget.

Table 4 reports the adversarial accuracies of ClusTR+QTRADES under PGD attacks with additional perturbation budgets, since the extensive results and comparisons for the standard budget were reported in the main paper. Note that the robustness of our model is not limited to a specific perturbation budget.

E.3 Training Time on SVHN

Analogous to Table 2, we report a training time comparison for various methods in Table 5. The reported times are those the models take to converge, based on the stopping criterion discussed in the earlier section, or to reach the last epoch. Note that ClusTR converges significantly faster than training with the Magnet Loss from random initialization. Moreover, ClusTR+QTRADES improves on TRADES, the state of the art, in both PGD test accuracy and training time.

Time (min.) 25 763 12 150 52 192
Table 5: Comparison of training time on SVHN. Training time is computed on the same workstation using the same software platform (PyTorch pytorch_neurips) and GPU (GTX 1080Ti).

E.4 Ablation on the Number of Considered Clusters

Figure 4: Effect of the number of considered nearest clusters on PGD test accuracy. Note that under the assumption of our theoretical analysis, our method outperforms the state of the art. Moreover, considering only a small fraction of the total number of clusters yields the best performance.

ClusTR predicts the class of an input as a soft nearest cluster through Equation (4). These probabilities can also be computed by considering only the nearest clusters, as reported in the Experiments Section. Next, we report the effect of varying the number of considered clusters on the natural and adversarial accuracies.
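A sketch of this restricted soft nearest-cluster rule (hypothetical helper; `sigma` plays the role of the Gaussian bandwidth in the Equation (4)-style rule):

```python
import numpy as np

def soft_knearest_probs(z, centroids, classes, k, sigma=1.0):
    """Soft nearest-cluster class probabilities restricted to the k nearest
    clusters. Each retained cluster contributes a weight that decays as
    exp(-||z - mu||^2 / (2 sigma^2)); weights are summed per class and
    normalized into a distribution."""
    d2 = np.sum((centroids - z) ** 2, axis=1)
    nearest = np.argsort(d2)[:k]              # indices of the k nearest clusters
    w = np.exp(-d2[nearest] / (2 * sigma ** 2))
    probs = np.zeros(int(classes.max()) + 1)
    np.add.at(probs, classes[nearest], w)     # accumulate weights per class
    return probs / probs.sum()
```

With k equal to the total number of clusters, this recovers the full soft nearest-cluster prediction; with k = 1 it degenerates to hard nearest-centroid classification.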

Figure 4 depicts the behavior of the clean and adversarial accuracies on CIFAR10 as the number of considered clusters varies. We observe that the effect of this number is negligible on both CIFAR10 and SVHN. On the other hand, the effect is stronger on CIFAR100. It is worthwhile to mention that more than 50% of the choices yield better robustness than the state of the art. Moreover, under the exact setup of our theoretical result in Theorem 1, ClusTR+QTRADES surpasses the state of the art on all of the datasets by a significant margin. Finally, the best PGD accuracy on CIFAR100 is 53.25%.