1 Introduction
Deep neural networks (DNNs) have demonstrated tremendous success in various fields, from computer vision imagenet_classification_krizhevsky; long2015fully; human_level_drl; atari_drl to natural language processing neural_machine_translation; transformers and speech recognition hinton2012deep. Despite these breakthroughs in performance, robustness is a growing concern for DNNs. Specifically, DNNs have been shown to be vulnerable to imperceptible input perturbations intriguing; explaining_harnessing, known as adversarial attacks, which can entirely alter their output. This vulnerability has popularized a new line of research known as network robustness. Robust DNNs should not only be accurate, but also resilient against input perturbations. Given the importance of the problem, a plethora of network robustness approaches have been proposed, including those based on regularization cisse2017parseval; trades; loss_curv_reg_robustness; logit_pairing, distillation distillation_papernot, and feature denoising feature_denoising, among many others. In this paper, we focus our attention on the popular and effective adversarial training approach madryadv.
Adversarial training explicitly trains DNNs on adversarial attacks generated on-the-fly through projected gradient descent (PGD). This technique has proven to significantly improve network robustness, and has become a standard for training robust networks. Interestingly, and as a byproduct, adversarially trained networks seem to learn features that are more semantically aligned with human perception learning_perceptually_aligned; engstrom2019adversarial, to such a degree that the learnt DNNs can be used for several image synthesis tasks computer_vision_with_single_robust_classifier. The existence of a connection from robustness to semantics raises an interesting dual question:
While adversarial robustness can encourage more semantically aligned priors on learnt DNN features, can one conversely achieve network robustness by encouraging the learnt features to be more semantically aligned?
Learning more semantically aligned features in DNNs remains an open problem. A promising direction for obtaining features with such properties is through Deep Metric Learning (DML) techniques. DML learns feature representations by preserving a notion of similarity between inputs and their feature representations deep_met_triplet_net; norouzi_conse, and has achieved remarkable performance in face recognition facenet; frome_dist_fun and zero-shot learning frome2013devise. The preservation of similarity that DML seeks often involves clustering semantically similar instances. Hence, recent clustering-based losses deep_met_triplet_net; magnet have been designed with this objective in mind, showing significant progress in learning semantic representations that are also competitive in performance with modern classification approaches.
Inspired by these developments, we theoretically show an intimate relation between semantics (through clustering approaches) and robustness, as illustrated in Figure 1. In particular, we show that, under certain continuity properties of the DNN, clustering-based classifiers enjoy a tight robustness radius against bounded input perturbations. Furthermore, we observe that this radius can be maximized by optimizing a Clustering Loss, i.e. a loss that encourages clustering of semantically similar instances in feature space. Inspired by this observation, we show that training DNNs with such a loss results in high-performing and robust classifiers. We enhance this clustering-based approach with standard techniques for DNN training, and dub this framework Clustering Training for Robustness (ClusTR). To validate the idea behind ClusTR, we experiment on several datasets and find that ClusTR can yield significant robustness gains. In summary, our contributions are threefold:

We study the connection from semantics to robustness by analyzing classifiers that employ clustering in representation space. We use this analysis to derive a tight robustness radius, within which no perturbation can change the classifier’s prediction. Moreover, we show that a deep metric learning approach for semantic clustering that optimizes a Clustering Loss is directly related to maximizing the derived robustness radius.

Motivated by our theoretical findings, we propose the ClusTR framework, which employs a popular Clustering Loss (the Magnet Loss magnet) to learn robust models without generating adversaries during training. We validate the theory behind ClusTR through extensive experiments and find that ClusTR results in a significant boost in robustness without relying on adversarial training. Specifically, we observe that classifiers learnt using ClusTR are more robust than adversarially trained classifiers freeadv under strong PGD attacks on the CIFAR10 cifars and SVHN svhn datasets (Table 1).

We show that equipping ClusTR with a quick and cheap version of adversarial training improves on state-of-the-art robustness results against PGD attacks on the CIFAR10, CIFAR100, and SVHN benchmarks (Tables 1 and 3). Interestingly, our proposed pipeline achieves this substantial improvement in robustness while training several times faster than the current state-of-the-art (Table 2). Lastly, we evaluate ClusTR-trained DNNs against adaptive attacks, showing consistent improvements over the state-of-the-art. (Our code can be found at github.com/clustrofficialaccount/ClusTRClusteringTrainingForRobustness.)
2 Related Work
Adversarial Robustness. The existence of adversarial perturbations has dramatically increased security concerns in DNNs. Consequently, there has been a surge of research aiming at learning adversarially robust models buckman2018thermometer; ma2018characterizing; cisse2017parseval. Despite its high computational cost, adversarial training madryadv remains one of the most popular, successful and reliable techniques for attaining adversarial robustness. Furthermore, adversarial training was regularized by enforcing similarity between the logits of natural and adversarial pairs logit_pairing. This work was further developed in TRADES trades, a framework that uses a theoretically motivated loss resulting in large robustness gains. Moreover, robustness was also studied from the data-complexity perspective, demonstrating an inherent sample complexity barrier on robust learning more_data_robustness, and showing that pre-training or learning from unlabeled data can vastly improve the robustness of adversarially trained networks pretraining; carmon2019unlabeled. In this work, we tackle robustness from a complementary view, mainly by studying the effect of clustering-based classifiers on robustness.
Robust Features. Recent work demonstrated that networks trained adversarially enjoy an unexpected benefit: the learnt features tend to align with salient data characteristics and human perception robustness_odds_acc. Moreover, the learnt features, commonly referred to as robust features adversarial_ara_not_bugs, seem to be clustered in feature space, while being perceptually aligned learning_perceptually_aligned. Based on these findings, the power of such semantically aligned features was harnessed to perform image synthesis tasks with a single robust classifier computer_vision_with_single_robust_classifier. In this paper, we take an orthogonal direction to robustness, in which we encourage robustness by training DNNs to specifically learn more semantically aligned features via clustering.
Metric Learning. The idea of encouraging learnt features to be more semantically meaningful to the human visual system has been extensively studied in the metric learning community, where the goal is to learn a similarity measure in feature space that correlates with a similarity measure between inputs dist_metric_learning_clustering; dml_neighbor; deep_met_triplet_net; dml_survey. In such a setting, semantically similar inputs (e.g. those belonging to the same class) are expected to be clustered together. This paradigm has shown remarkable performance in several tasks facenet; mikolov2013distributed; frome_dist_fun. Closely related to our work, the approach of mao2019metric used the Triplet Loss facenet to regularize learnt features and enhance network robustness. We complement the previous art with a theoretical justification of the intimate relation between robustness and the general family of metric-learning classifiers that subsumes the Triplet Loss as a special case. Namely, we find a connection between the Magnet Loss magnet and theoretical guarantees of network robustness.
3 From Robustness to Clustering Loss
Recent work has shown that adversarially trained DNNs, while robust, also tend to learn more semantically aligned features robustness_odds_acc; learning_perceptually_aligned. Inspired by these findings, we are interested in studying the converse implication, i.e. whether DNNs trained to learn such features enjoy robustness properties. To this end, we start by studying the robustness of a common family of classifiers used in deep metric learning deep_met_triplet_net; magnet, namely classifiers that are based on clustering semantically similar inputs.
3.1 Robustness
Clustering-based classifiers. Consider a training set of input-label pairs $(x_i, y_i)$, where each label $y_i$ belongs to one of a fixed number of classes, and a parameterized function $f_\theta$, which can be a DNN. A clustering-based classifier learns parameters $\theta$ such that $f_\theta$ clusters semantically similar inputs (inputs with similar labels) in feature space. That is, $f_\theta$ groups the inputs of each class into one or more clusters (the number of clusters may vary across classes). Hence, an input $x$ is assigned a given label if and only if $f_\theta(x)$ is closest, under some notion of distance, to one of the clusters representing that class. To analyze the robustness of such classifiers, and without loss of generality, we consider a binary classification problem, where inputs belong to one of two classes, $\mathcal{C}_1$ or $\mathcal{C}_2$, and each class is represented with a single cluster center. Let the cluster centers of $\mathcal{C}_1$ and $\mathcal{C}_2$ be $\mu_1$ and $\mu_2$, respectively, in feature space. Thus, $x$ is classified as $\mathcal{C}_1$ if and only if $\|f_\theta(x) - \mu_1\|_2 < \|f_\theta(x) - \mu_2\|_2$, and as $\mathcal{C}_2$ otherwise. Throughout this paper, we assume that $f_\theta$ is $L$-Lipschitz continuous cisse2017parseval: $\|f_\theta(x) - f_\theta(x')\|_2 \le L \|x - x'\|_2$ for all inputs $x$ and $x'$, where $\|\cdot\|_2$ denotes the $\ell_2$ norm.
We are interested in the maximum norm of an input perturbation $\delta$ such that the clustering-based binary classifier assigns the same class to both $x$ and $x + \delta$. The following theorem provides a bound on the norm of such a $\delta$, denoted the robustness radius. The detailed proof is left for the Appendix.
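Theorem 1.
Consider the clustering-based binary classifier that classifies $x$ as class $\mathcal{C}_1$, i.e. $\|f_\theta(x) - \mu_1\|_2 < \|f_\theta(x) - \mu_2\|_2$, with $L$-Lipschitz $f_\theta$, where $\mu_1$ and $\mu_2$ are the cluster centers of the two classes. The classifier output for the perturbed input $x + \delta$ will not differ from that of $x$, i.e. $\|f_\theta(x + \delta) - \mu_1\|_2 < \|f_\theta(x + \delta) - \mu_2\|_2$, for all perturbations $\delta$ that satisfy:
$$\|\delta\|_2 < \frac{\|f_\theta(x) - \mu_2\|_2^2 - \|f_\theta(x) - \mu_1\|_2^2}{2 L \|\mu_1 - \mu_2\|_2} \qquad (1)$$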
Proof Sketch.
It suffices to observe that the clustering-based classifier is equivalent to a linear classifier, operating in representation space, whose decision boundary is the hyperplane of points equidistant from the two cluster centers. The result is deduced from the Cauchy-Schwarz inequality and the Lipschitz continuity of the feature map, where the bound is proportional to the distance to the hyperplane, as illustrated in Figure 2.
It is to be observed that the robustness radius is agnostic to the choice of the two cluster centers. That is to say, the robustness radius in Theorem 1 is not concerned with the accuracy of the classifier, but only with changes in the prediction under input perturbations. Therefore, the cluster centers are often learnt jointly along with the classifier parameters, such that the feature representations of inputs belonging to a class are close to the learnt center of that class, while being far from the cluster center representing the other class. Note that if the clustering is performed with K-means, then each cluster center is the average of the features belonging to that class.
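The equivalence invoked in the proof sketch can be written out explicitly. The following is a sketch under assumed notation ($f_\theta$ for the feature map, $\mu_1, \mu_2$ for the two cluster centers, $L$ for the Lipschitz constant, Euclidean norms throughout). It shows how the nearest-center rule becomes a linear rule, and how the distance to the hyperplane turns into a radius:

```latex
\begin{aligned}
\|f_\theta(x)-\mu_1\|_2 < \|f_\theta(x)-\mu_2\|_2
  &\iff w^\top f_\theta(x) + b > 0,
  \quad w = \mu_1 - \mu_2,\;
  b = \tfrac{1}{2}\left(\|\mu_2\|_2^2 - \|\mu_1\|_2^2\right), \\
d(x) := \frac{w^\top f_\theta(x) + b}{\|w\|_2}
  &= \frac{\|f_\theta(x)-\mu_2\|_2^2 - \|f_\theta(x)-\mu_1\|_2^2}
          {2\,\|\mu_1-\mu_2\|_2}, \\
\|f_\theta(x+\delta) - f_\theta(x)\|_2 \le L\,\|\delta\|_2
  &\;\Longrightarrow\;
  \text{the prediction is unchanged whenever } L\,\|\delta\|_2 < d(x).
\end{aligned}
```

Here $d(x)$ is the distance from the feature to the separating hyperplane, so dividing it by $L$ gives a radius that shrinks as the Lipschitz constant grows.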
Generalization to the Multi-Class Multi-Cluster Setting. We first consider the multi-class single-cluster case, where each class is represented by a single cluster center, as depicted in Figure 2. Analyzing the robustness around an input in this case is equivalent to analyzing the previously discussed binary classification case with respect to the two cluster centers closest to the feature of that input. For the multi-class multi-cluster case, it is sufficient to analyze the binary case between the closest cluster centers of two different classes. In this case, the first center is the one nearest to the feature of the input, and the second is the nearest center among those belonging to any other class. We leave the rest of the details for the Appendix.
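As an illustration of how such a radius could be evaluated in practice, the sketch below computes it for a nearest-centroid classifier. It assumes Euclidean distances, a known Lipschitz constant, and the pairwise radius (‖f − μ₂‖² − ‖f − μ₁‖²) / (2L‖μ₁ − μ₂‖). As a conservative variant of the two-center reduction described above, it takes the minimum of the pairwise radius over all centers of other classes. The function names and the toy centers are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def robustness_radius(feature, centers, lipschitz):
    """Certified radius of a nearest-centroid prediction (illustrative).

    feature:   feature vector f(x) of the input.
    centers:   list of (class_label, center_vector) pairs; a class may
               own several centers (multi-cluster case).
    lipschitz: assumed Lipschitz constant L > 0 of the feature map.
    Returns (predicted_label, radius).
    """
    # mu_1: the center nearest to the feature; it decides the prediction.
    label_1, mu_1 = min(centers, key=lambda c: sq_dist(feature, c[1]))
    d1 = sq_dist(feature, mu_1)
    radius = math.inf
    for label_k, mu_k in centers:
        if label_k == label_1:
            continue  # same-class centers cannot change the prediction
        # Pairwise bound against a center of another class
        # (assumes mu_k differs from mu_1).
        gap = sq_dist(feature, mu_k) - d1
        bound = gap / (2.0 * lipschitz * math.sqrt(sq_dist(mu_1, mu_k)))
        radius = min(radius, bound)
    return label_1, radius


# Toy example: class 0 has one center, class 1 has two.
toy_centers = [(0, [0.0, 0.0]), (1, [4.0, 0.0]), (1, [0.0, 10.0])]
toy_label, toy_radius = robustness_radius([1.0, 0.0], toy_centers, lipschitz=2.0)
print(toy_label, toy_radius)
```

In the toy example the feature lies at distance 1 from the bisecting hyperplane between the two nearest centers, so with an assumed Lipschitz constant of 2 the radius is half that distance.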
3.2 Clustering Loss as a Robustness Regularizer
Theorem 1 provides a tight robustness radius for each input (a formal tightness analysis of the bound developed in Theorem 1 is included in the Appendix). So, to attain both accurate and robust models, one can train DNNs to achieve accuracy while simultaneously maximizing the robustness radius in Theorem 1 for every training input. Several observations can be made about the robustness radius. First, it is inversely proportional to the DNN’s Lipschitz constant, i.e. networks with a smaller Lipschitz constant tend to enjoy better robustness. This is consistent with previous work that exploited this observation to enhance network robustness cisse2017parseval. In this paper, we focus on the numerator of the radius, and on learning parameters that maximize it, i.e. that push features far from cluster centers of different classes and pull features closer to cluster centers of their own class. As such, a general class of robustness-based clustering losses can be formulated as follows:
(2) 
where the loss depends on the class to which each training input belongs. The first function measures the separation between the feature of an input and the cluster centers of its class. Similarly, the second function measures the separation between that feature and the cluster centers of all other classes. A third function combines the two measurements in an overall stable loss, so that minimization of the loss incites larger values for the numerator in Theorem 1. Note that iterative optimization of this loss requires updating the network parameters. Hence, after every update, cluster centers can be recomputed by any clustering algorithm, e.g. K-means. Moreover, many losses commonly used in the deep metric learning literature NCM conform with Equation (2) as special cases, one of which is the popular Magnet Loss, defined as:
(3) 
While the Magnet Loss was introduced to address performance issues in metric learning algorithms, our objective of learning more semantically aligned features and our subsequent analysis of Theorem 1 suggest that this loss inherently encourages robustness. Regarding inference, DNNs trained with the Magnet Loss predict the class of a test input by computing a soft probability over the features, as follows:
(4)
Hence, the input is assigned to the class with the highest probability. We refer the reader to magnet for more details. Next, we introduce ClusTR, a simple framework for training robust models based on our analytical findings.
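To make the inference rule concrete, the sketch below turns distances to cluster centers into class probabilities. It assumes a form in the spirit of the Magnet Loss inference of Rippel et al.: each center contributes exp(−‖r − μ‖² / (2σ²)), contributions are summed per class, and the sums are normalized. The variance `sigma2`, the function names, and the toy centers are illustrative assumptions, not values taken from this paper.

```python
import math


def soft_class_probabilities(feature, centers, sigma2=1.0):
    """Map distances to cluster centers into class probabilities.

    centers: list of (label, center_vector) pairs.
    sigma2:  assumed variance that scales the squared distances.
    Returns a dict {label: probability}.
    """
    d2 = [sum((f - m) ** 2 for f, m in zip(feature, mu)) for _, mu in centers]
    d2_min = min(d2)  # shift by the minimum for numerical stability
    scores = {}
    for (label, _), dist in zip(centers, d2):
        weight = math.exp(-(dist - d2_min) / (2.0 * sigma2))
        scores[label] = scores.get(label, 0.0) + weight
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


def predict(feature, centers, sigma2=1.0):
    """Assign the class with the highest soft probability."""
    probs = soft_class_probabilities(feature, centers, sigma2)
    return max(probs, key=probs.get)


demo_centers = [(0, [0.0, 0.0]), (1, [2.0, 0.0])]
print(soft_class_probabilities([0.5, 0.0], demo_centers))
print(predict([0.5, 0.0], demo_centers))
```

Subtracting the smallest squared distance before exponentiating leaves the normalized probabilities unchanged and avoids a zero denominator when all centers are far from the feature.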
3.3 ClusTR: Clustering Training for Robustness
Our theoretical study finds an intrinsic connection between clustering and robustness: clustering-based classifiers intrinsically possess a robustness radius. As such, optimizing a loss designed for clustering tends to maximize this robustness radius. We also observe that a Clustering Loss such as Equation (2), which is designed to induce robustness according to Theorem 1, can be reduced to the Magnet Loss of Equation (3) as a special case. Based on these observations, we propose Clustering Training for Robustness (ClusTR): a simple and theoretically motivated framework for inducing robustness during DNN training without the need to generate adversaries. ClusTR exploits our theoretical findings by combining a Clustering Loss with simple DNN training techniques. For the Clustering Loss, ClusTR incorporates the well-studied Magnet Loss to induce semantic clustering of instances in feature space. Although effective in its task, this loss suffers from slow convergence magnet. ClusTR mitigates this issue by introducing a simple warm start initialization. For a given model and dataset, ClusTR first conducts nominal training, i.e. standard Cross Entropy training, until reasonable performance is achieved. Then, it removes the last linear layer and fine-tunes the resulting DNN by applying the Magnet Loss on the output of the penultimate layer. The Magnet Loss in ClusTR aims at optimizing the robustness radius of Theorem 1, while the warm start initialization increases convergence speed without hindering test set accuracy. In this work, we choose the Magnet Loss to be the Clustering Loss in ClusTR. However, the result in Theorem 1 is agnostic to this choice, so we expect our results to extend to other choices of Clustering Loss.
4 Experiments
In this section, we conduct several experiments on synthetic and real datasets to validate the idea behind ClusTR. Specifically, we study (a) the effect of warm start on convergence speed and robustness, (b) how ClusTR-trained DNNs compare to adversarially trained counterparts, and (c) how ClusTR can be equipped with a quick version of adversarial training to further enhance robustness.
4.1 Effect of Warm Start Initialization in ClusTR
Convergence. We assess the training convergence and the overall test accuracy of ClusTR training of ResNet18 on CIFAR10 and SVHN. On CIFAR10, we observe that training without warm start (i.e. Magnet Loss only) requires 106 minutes to fully train, while introducing the warm start reduces the required training time to 83 minutes on a GTX 1080Ti GPU.
Robustness. We study the effect of the warm start initialization on robustness by conducting controlled synthetic experiments and computing exact robustness radii. We train a 3-layered neural network with 20 hidden units on the synthetic binary classification datasets depicted in Figures 3(a) and (c). On both datasets, we train (1) Magnet Loss with random initialization and (2) ClusTR. For simplicity, each class is represented with a single cluster. Both models are trained until convergence. Given model predictions, we compute the robustness radius for each instance and report certified accuracy under varying radius in Figures 3(b) and (d). This is in line with common practice in the network certification literature randomized_smoothing. Note that certified accuracy at a given radius is defined as the percentage of instances that are both correctly classified and have a robustness radius, as given by Theorem 1, larger than that radius. We find that the ClusTR-trained DNNs, while accurate, also enjoy a larger robustness radius than DNNs trained with Magnet Loss without the warm start.
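The definition of certified accuracy used above can be sketched in a few lines. The per-instance records below are made-up placeholder values for illustration only, and the function name is hypothetical.

```python
def certified_accuracy(records, radius):
    """Percentage of instances that are correctly classified AND whose
    robustness radius is strictly larger than `radius`.

    records: list of (is_correct, robustness_radius) pairs, one per instance.
    """
    certified = sum(1 for is_correct, r in records if is_correct and r > radius)
    return 100.0 * certified / len(records)


# Placeholder records: (prediction correct?, robustness radius).
example_records = [(True, 0.5), (True, 0.2), (False, 0.9), (True, 0.05)]

# Certified accuracy as a function of the radius (one point per radius).
example_curve = [(r, certified_accuracy(example_records, r))
                 for r in (0.0, 0.1, 0.3, 0.6)]
print(example_curve)
```

Sweeping the radius over a grid in this way yields a monotonically non-increasing curve, which is the kind of plot reported for certified accuracy under varying radius.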
4.2 ClusTR Robustness and Comparison with State-of-the-Art
Setup and Implementation Details. In this section, we conduct experiments with ResNet18 on the CIFAR10, CIFAR100, and SVHN datasets. We train models using our proposed ClusTR framework. Specifically, we first conduct nominal training until we reach reasonable test accuracy on each of CIFAR10, CIFAR100 and SVHN. We then remove the last linear layer and fine-tune the network by applying the Magnet Loss on the output feature of the remaining DNN. Fine-tuning is done for 30 epochs on CIFAR10 and SVHN, and 60 epochs on CIFAR100. Following magnet, we use K-means++ kmeans++ to update cluster centers after each training epoch. To assess model robustness, we follow prior work and perform projected gradient descent (PGD) madryadv attacks with bounded perturbations. Each PGD iteration takes a step that increases the Cross Entropy loss of the probability prediction vector computed from Equation (4), and then projects the perturbed input back onto the set of allowed perturbations. In all experiments, we perform PGD attacks with 10 random restarts around each input for 20 and 100 iterations, denoted by PGD^{20} and PGD^{100}, respectively. Following common practice in the literature freeadv; fastadv, we use a fixed PGD step size. We report the attacks with a single perturbation budget and leave experiments with other budgets for the Appendix.

| Method | CIFAR10 Natural | CIFAR10 PGD^{20} | CIFAR10 PGD^{100} | SVHN Natural | SVHN PGD^{20} | SVHN PGD^{100} |
|---|---|---|---|---|---|---|
| Nominal Training | 95.01 | 0.00 | 0.00 | 98.38 | 0.00 | 0.00 |
| Free AT freeadv | 85.96 | 46.33 | 46.19 | 86.98 | 46.52 | 46.06 |
| AT + Pre-Training pretraining | 87.30 | 57.40 | 57.20 | 85.12 | 47.18 | 46.72 |
| TRADES trades | 84.92 | 56.61 | 56.43 | 91.63 | 57.45 | 55.28 |
| Magnet Loss magnet | 83.14 | 23.71 | 22.54 | 91.95 | 40.73 | 38.59 |
| ClusTR | 87.34 | 49.04 | 47.76 | 94.28 | 50.78 | 50.77 |
| QTRADES | 81.07 | 44.18 | 43.42 | 86.36 | 43.05 | 42.24 |
| ClusTR + QTRADES | 91.03 | 74.44 | 74.04 | 95.06 | 84.76 | 84.75 |

Table 1: We compare ClusTR and ClusTR+QTRADES against Magnet Loss, Free Adversarial Training (Free AT), AT with ImageNet pre-training, TRADES, and QTRADES under PGD attacks. ClusTR+QTRADES outperforms the adversarially trained state-of-the-art by a large margin. All numbers are percentages.

| | Free AT | TRADES | QTRADES | Magnet | ClusTR | ClusTR+QTRADES |
|---|---|---|---|---|---|---|
| Time (min.) | 147 | 534 | 42 | 106 | 83 | 115 |

Table 2: Training time on CIFAR10. ClusTR+QTRADES trains roughly 4.6 times faster than TRADES (115 versus 534 minutes). Training time is computed on the same workstation using the same software platform (PyTorch pytorch_neurips) and GPU (GTX 1080Ti).

Experiments on CIFAR10 and SVHN. We evaluate the robustness of nominal training (as baseline), the Magnet Loss (i.e. ClusTR without warm start), and ClusTR, and we compare against several approaches that achieve state-of-the-art robustness in this experimental setup: Free adversarial training (Free AT) freeadv with its reported best setting of 8 minibatch replays, which outperforms vanilla adversarial training madryadv; Adversarial Training with ImageNet pre-training (AT + Pre-Training), which leverages external data to improve robustness; and TRADES trades, the current state-of-the-art method. Note that all the robustness methods in this comparison employ various forms of adversarial training.
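The PGD evaluation loop described in the setup (random restarts, a signed gradient step on the Cross Entropy loss, and a projection) can be sketched on a toy model. The sketch below makes several assumptions that are not stated in this text: the perturbation set is an ℓ∞ ball of radius `eps`, a two-feature logistic model stands in for the DNN so that the gradient is available in closed form, and the values of `eps`, `step`, and the weights are arbitrary placeholders.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_grad(x, y, w, b):
    """Cross Entropy loss of a toy logistic model and its gradient w.r.t. x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard the logarithms
    loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    grad = [(p - y) * wi for wi in w]
    return loss, grad


def pgd_attack(x, y, w, b, eps, step, iters, restarts, seed=0):
    """Return the highest-loss point found inside the l_inf ball around x."""
    rng = random.Random(seed)
    best_x, best_loss = list(x), loss_and_grad(x, y, w, b)[0]
    for _ in range(restarts):
        # Random start inside the ball of radius eps around the clean input.
        adv = [xi + rng.uniform(-eps, eps) for xi in x]
        for _ in range(iters):
            _, grad = loss_and_grad(adv, y, w, b)
            # Signed-gradient ascent step on the loss.
            adv = [ai + step * math.copysign(1.0, g) if g != 0 else ai
                   for ai, g in zip(adv, grad)]
            # Projection back onto the ball around the clean input.
            adv = [min(max(ai, xi - eps), xi + eps) for ai, xi in zip(adv, x)]
        loss = loss_and_grad(adv, y, w, b)[0]
        if loss > best_loss:
            best_x, best_loss = adv, loss
    return best_x, best_loss


clean_x, label, weights, bias = [0.5, -0.2], 1, [2.0, -1.0], 0.0
adv_x, adv_loss = pgd_attack(clean_x, label, weights, bias,
                             eps=0.1, step=0.025, iters=20, restarts=3)
print(adv_x, adv_loss)
```

For a real network, the closed-form gradient would be replaced by automatic differentiation, and robust accuracy would be the fraction of test inputs whose prediction survives every restart.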
We evaluate using both natural accuracy, i.e. test set accuracy on clean images, and PGD test accuracy. Table 1 reports these results. First, we observe that training with Magnet Loss only on clean images results in substantial gains in robustness compared to nominal training. In fact, this loss increases PGD accuracy from 0% to 23.71%, while natural accuracy drops from 95.01% to 83.14%. This result constitutes empirical evidence of the theoretical robustness properties we presented for clustering-based classifiers. Furthermore, training with ClusTR consistently outperforms Free AT in both natural and PGD accuracy for both CIFAR10 and SVHN. Specifically, under the 100-iteration PGD attack, ClusTR outperforms Free AT by 1.57 and 4.71 percentage points on CIFAR10 and SVHN, respectively (47.76 versus 46.19, and 50.77 versus 46.06), even though ClusTR only trains with clean images. We note that ClusTR’s robustness gains over adversarial training are not accompanied by lower natural accuracy. In fact, the natural accuracy of ClusTR is about 1% higher on CIFAR10 and 7% higher on SVHN.
These results show that the design of ClusTR inherently provides robustness properties without introducing adversaries during training. We complement this finding by studying the following question: Can equipping ClusTR with some form of adversarial training provide even more robustness? To do so, we equip ClusTR with a TRADES loss term, where the total loss becomes:
(5) 
Note that the Cross Entropy based TRADES formulation trades is similar to Equation (5), but with the first term replaced by the Cross Entropy loss between the output logits of the last linear layer and the true label. In order to keep the framework simple and computationally efficient, we compute a quick estimate of the adversary in Equation (5). Namely, we start from a random uniform initialization and perform a single PGD step, as opposed to TRADES’ multiple iterations. We refer to this setup as QTRADES (the rest of the implementation details of QTRADES are left for the Appendix). Formally, for an input, we construct an adversarial input by first perturbing the input with uniform noise, and then applying a single PGD step to the perturbed point.
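The quick estimate of the adversary (uniform random start followed by one projected step) can be sketched as follows. This is only an illustration under stated assumptions: the perturbation set is taken to be an ℓ∞ ball of radius `eps`, and `grad_fn` is a hypothetical callable that returns the gradient of whichever loss the adversary maximizes, evaluated at the perturbed input.

```python
import random


def quick_adversary(x, grad_fn, eps, step, seed=None):
    """Single-step adversary from a uniform random start (illustrative).

    x:       clean input as a list of floats.
    grad_fn: hypothetical callable mapping an input to the gradient of the
             attacked loss with respect to that input.
    eps:     assumed l_inf radius of the allowed perturbation.
    step:    step size of the single signed-gradient step.
    """
    rng = random.Random(seed)
    # Uniform random initialization inside the ball around x.
    x_hat = [xi + rng.uniform(-eps, eps) for xi in x]
    grad = grad_fn(x_hat)
    # One signed-gradient step; (g > 0) - (g < 0) is the sign of g.
    stepped = [xh + step * ((g > 0) - (g < 0)) for xh, g in zip(x_hat, grad)]
    # Project back onto the ball around the clean input.
    return [min(max(s, xi - eps), xi + eps) for s, xi in zip(stepped, x)]


# Toy usage with a constant gradient, so the result is easy to inspect.
quick_adv = quick_adversary([0.0, 0.0], lambda z: [1.0, -1.0],
                            eps=0.1, step=0.2, seed=1)
print(quick_adv)
```

Compared to a multi-iteration attack, this costs a single gradient evaluation per input, which is where the computational savings come from.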
While QTRADES alone achieves slightly lower natural accuracy and robustness than Free AT, Table 1 shows that equipping ClusTR with QTRADES sets new state-of-the-art robustness results on both datasets, outperforming all other methods. We observe that ClusTR+QTRADES achieves the highest natural accuracy among all robust training methods, with 91.03% and 95.06% on CIFAR10 and SVHN, respectively, thus improving upon the best competitor on both datasets. Also, ClusTR+QTRADES surpasses the strongest competing methods by impressive margins under the 100-iteration PGD attack: 16.84 percentage points on CIFAR10 (74.04 versus 57.20) and 29.47 percentage points on SVHN (84.75 versus 55.28).
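| Method | CIFAR100 Natural | CIFAR100 PGD^{20} | CIFAR100 PGD^{100} |
|---|---|---|---|
| Nominal Training | 78.84 | 0.00 | 0.00 |
| Free AT freeadv | 62.13 | 25.88 | 25.58 |
| AT + Pre-Training pretraining | 59.23 | 34.22 | 33.91 |
| TRADES trades | 55.36 | 28.11 | 27.96 |
| ClusTR + QTRADES | 69.25 | 52.47 | 52.40 |

Table 3: Natural and PGD accuracy on CIFAR100. All numbers are percentages.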
Experiments on CIFAR100. We extend our analysis of ClusTR+QTRADES to CIFAR100, and assess robustness with PGD attacks. We report the results of this setup in Table 3, which shows that ClusTR+QTRADES significantly outperforms the strongest competitor, by 18.49 percentage points under the 100-iteration PGD attack (52.40 versus 33.91), thus setting a new state-of-the-art robustness result on CIFAR100. We note that these large gains in robustness also come with a substantial increase in natural accuracy. For CIFAR100, the total number of clusters is large. Following how magnet tackles the large-cluster-number regime, ClusTR+QTRADES inference in this case does not consider all clusters, as in Equation (4), but only a fixed number of nearest clusters. We find that the choice of this number has marginal impact on robustness, and we leave a comprehensive ablation of it for the Appendix.
Adaptive Attacks. While going against the current paradigm in the network robustness literature, it has been argued that PGD attacks are insufficient to demonstrate network robustness. Recent work shows that many defenses can be broken with carefully crafted attacks obfus, dubbed adaptive attacks, tailored to break the underlying defense adaptive_attacks. Therefore, we construct an example of a potentially powerful attack tailored to our trained networks. Namely, we construct adversaries that maximize the loss used to train our models, as opposed to the Cross Entropy loss in PGD. Similar to previous experiments, the attacks are performed with 10 random restarts for 100 iterations. Note that this attack precisely targets the objective with which our models are trained; thus, the attack is expected to be stronger. Indeed, running this adaptive attack lowers the robust accuracy on both CIFAR10 and SVHN. Despite this drop, our ClusTR+QTRADES approach still outperforms the state-of-the-art by a substantial margin. It is essential to note here that this drop in robustness is rather marginal, as other defenses, when subjected to such tailored attacks, have their robustness drop close to 0, or lower than baseline robust models adaptive_attacks; obfus.
Training Time. We report the training time of the previous methods in Table 2. We note that ClusTR+QTRADES outperforms Free AT and TRADES both in terms of robustness and training time. The speedup is owed to two factors. First, the warm start initialization speeds up the convergence of ClusTR compared to Magnet Loss. Second, QTRADES is very efficient (42 minutes versus 534 minutes for TRADES). We leave the training time comparison on SVHN to the Appendix.
It is worthwhile to mention that our choice of QTRADES, out of the many adversarial training schemes with which ClusTR can be equipped, is motivated by (i) the theoretical support behind TRADES trades and (ii) QTRADES’ low computational cost. We also emphasize here that robustness could possibly be improved further by incorporating another adversarial training technique with ClusTR instead of QTRADES. We leave the search for this optimal choice for future work.
5 Conclusion
Inspired by work that observed a connection from robustness to semantics, this paper explores the complementary connection: from semantics to robustness. We showed that clustering-based classifiers inherently enjoy a tight robustness radius against input perturbations, and that this radius can be maximized by clustering semantically similar instances in representation space. Motivated by these findings, we proposed ClusTR (Clustering Training for Robustness), a simple and theoretically motivated framework for learning robust models without generating adversaries in training. Extensive experiments validated the theory motivating ClusTR and showed that ClusTR can achieve network robustness that is superior to adversarially trained models. ClusTR can also be equipped with a quick version of adversarial training to set new state-of-the-art robustness results against strong adversarial attacks on three benchmark datasets, while also maintaining a training time that is several times shorter than that of the current state-of-the-art robust training method.
6 Broader Impact
While the current performance of deep learning algorithms is unparalleled in several fields, the existence of adversarial examples hinders the inclusion of deep learning as a component in security-critical applications. This issue prevents industrial applications from safely including deep learning in systems related to autonomous cars, computer-aided surgical procedures and healthcare, among others, as deep learning components would constitute a security liability to potential malicious attacks. Hence, approaches addressing the robustness of deep learning, such as the one studied in this work, can make these algorithms more reliable, allowing the inclusion of powerful deep learning systems, and so permitting applications to enjoy their great performance. On the other hand, addressing adversarial robustness could prevent the use of this security leak as a protection against deep learning systems designed with malicious intent.
Acknowledgments. This work was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research.
References
Appendix A Proof of Theorem 1
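Theorem 1.
Consider the clustering-based binary classifier that classifies $x$ as class $\mathcal{C}_1$, i.e. $\|f_\theta(x) - \mu_1\|_2 < \|f_\theta(x) - \mu_2\|_2$, with $L$-Lipschitz $f_\theta$, where $\mu_1$ and $\mu_2$ are the cluster centers of the two classes. The classifier output for the perturbed input $x + \delta$ will not differ from that of $x$, i.e. $\|f_\theta(x + \delta) - \mu_1\|_2 < \|f_\theta(x + \delta) - \mu_2\|_2$, for all perturbations $\delta$ that satisfy:
$$\|\delta\|_2 < \frac{\|f_\theta(x) - \mu_2\|_2^2 - \|f_\theta(x) - \mu_1\|_2^2}{2 L \|\mu_1 - \mu_2\|_2}$$
Proof.
It suffices that $\|f_\theta(x + \delta) - \mu_2\|_2^2 - \|f_\theta(x + \delta) - \mu_1\|_2^2 > 0$ for $x + \delta$ to be classified as $\mathcal{C}_1$. Expanding the squared norms gives
$$\begin{aligned}
\|f_\theta(x + \delta) - \mu_2\|_2^2 - \|f_\theta(x + \delta) - \mu_1\|_2^2
&= 2 f_\theta(x + \delta)^\top (\mu_1 - \mu_2) + \|\mu_2\|_2^2 - \|\mu_1\|_2^2 \\
&= \|f_\theta(x) - \mu_2\|_2^2 - \|f_\theta(x) - \mu_1\|_2^2 + 2 \big(f_\theta(x + \delta) - f_\theta(x)\big)^\top (\mu_1 - \mu_2) \\
&\ge \|f_\theta(x) - \mu_2\|_2^2 - \|f_\theta(x) - \mu_1\|_2^2 - 2 L \|\delta\|_2 \|\mu_1 - \mu_2\|_2 . \qquad (6)
\end{aligned}$$
The inequality follows by Cauchy-Schwarz and the Lipschitz continuity of $f_\theta$, i.e.
$$\big| \big(f_\theta(x + \delta) - f_\theta(x)\big)^\top (\mu_1 - \mu_2) \big| \le \|f_\theta(x + \delta) - f_\theta(x)\|_2 \, \|\mu_1 - \mu_2\|_2 \le L \|\delta\|_2 \|\mu_1 - \mu_2\|_2 .$$
Thus, by rearranging the inequality in (6), the bound on $\|\delta\|_2$ stated in the Theorem guarantees that the left-hand side of (6) is positive, completing the proof.
∎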
A.1 Generalization to the Multi-Class Multi-Cluster Case
The analysis that leads to Theorem 1, based on the single-cluster binary classification problem, can be extended to the multi-cluster multi-class case. This extension is achieved by reducing the multi-class multi-cluster case to the single-cluster binary classification problem we studied. Namely, among the cluster centers of all classes, select two centroids as follows: the first is the centroid nearest to the feature of the input, and the second is the nearest centroid among those belonging to any other class.
These assignments state that: (i) the selected centroids are from different classes, hence fooling the classifier is well-defined, and (ii) the centroids are the two nearest centroids to the feature of the input that are from different classes.
Appendix B Decision Boundaries as a Voronoi Diagram
Here, we show that the decision boundaries of such a classifier form a Voronoi diagram that is constructed around the cluster centers. Following the earlier notation, and for the multi-class classifier where each class is clustered in a single cluster with its own center, one can characterize the decision boundary between any two classes as the set of features that are equidistant from the two corresponding cluster centers and at least as close to them as to any other center, which is precisely the definition of the Voronoi diagram for the metric space over the cluster centers.
Appendix C Tightness Analysis
Proposition 1.
Consider the clustering-based binary classifier that classifies as class , i.e. , with Lipschitz . If
then there exists a direction along which the classifier is fooled, i.e.
Proof.
We start by observing that the clustering-based classifier, which assigns an input to the first class when its feature is closer to the first cluster center and to the second class otherwise, has a decision boundary given by the set of features that are equidistant from the two cluster centers. That is, the clustering-based classifier is equivalent to a linear classifier in feature space. Thus, for an input assigned to the first class, it suffices to show that there exists a perturbation, satisfying the norm bound in the proposition, whose perturbed feature lies on the other side of this boundary. We have that
(7)  
The last equality follows from Equation (6). Now, consider the choice of such that is in the direction , in particular, . Substituting back in Equation (7), we have:
Lastly, note that for any satisfying the bound in the proposition we have , i.e. is classified as , completing the proof.
∎
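The equivalence between the clustering-based classifier and a linear classifier in feature space, used at the start of the proof, follows from expanding the squared distances. The symbols below ($z$ for the feature vector, $c_1$ and $c_2$ for the two cluster centers) are notation assumed for this sketch and may differ from the paper's:

```latex
\begin{align*}
\|z - c_1\|_2^2 \le \|z - c_2\|_2^2
&\iff \|z\|_2^2 - 2 c_1^\top z + \|c_1\|_2^2 \le \|z\|_2^2 - 2 c_2^\top z + \|c_2\|_2^2 \\
&\iff (c_2 - c_1)^\top z \le \tfrac{1}{2}\left(\|c_2\|_2^2 - \|c_1\|_2^2\right) \\
&\iff (c_1 - c_2)^\top \left(z - \tfrac{c_1 + c_2}{2}\right) \ge 0 .
\end{align*}
```

The last line is a half-space test whose boundary is the hyperplane through the midpoint of the two centers with normal $c_1 - c_2$.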
Appendix D Implementation Details
Next, we describe the implementation details of ClusTR, along with details regarding QTRADES. Note that the supplementary material zip file includes the implementation reproducing our results.
Architecture. We use a ResNet18 resnets modified to accept input images. The size of the output of the network in the penultimate layer, i.e. the feature dimension, is set to for all experiments.
Optimization. For the warm-start stage of training ClusTR, we use the Adam optimizer kingma2014adam for 90 epochs with the cross-entropy loss and a learning rate of that is multiplied by at epochs 30 and 60. After that, we fine-tune the DNN with the Magnet Loss with a learning rate of for another 30 epochs for CIFAR10 and 60 epochs for CIFAR100 and SVHN.
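The step schedule of the warm-start stage can be sketched as below. The initial learning rate and the decay factor are not preserved in this text, so `BASE_LR` and `DECAY` are placeholder values chosen only for illustration; the function name `step_lr` and the convention that epochs are 0-indexed with the drop applied from the milestone epoch onward are likewise assumptions.

```python
def step_lr(epoch, base_lr, decay, milestones=(30, 60)):
    """Learning rate for a 0-indexed `epoch`: `base_lr` multiplied by `decay`
    once for every milestone that has already been reached."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * decay ** drops

# Placeholder values (assumptions): the actual learning rate and decay
# factor are not recoverable from the text.
BASE_LR, DECAY = 1e-3, 0.1

# Learning rate for each of the 90 warm-start epochs.
schedule = [step_lr(epoch, BASE_LR, DECAY) for epoch in range(90)]
```

In a training loop, `schedule[epoch]` would be handed to the optimizer at the start of each epoch; the fine-tuning stage with the Magnet Loss then uses its own fixed learning rate.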
Preprocessing. Images are normalized by their channel-wise mean and standard deviation. For CIFAR10 and CIFAR100, we apply standard data augmentation of random crops with a padding of 4. For SVHN, we do not employ any data augmentation.
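The two preprocessing steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the released code: images are stored as nested lists indexed `image[channel][row][col]`, the padding is filled with zeros, and the function names and the mean and standard deviation values in the usage lines are hypothetical.

```python
import random

def normalize(image, mean, std):
    """Channel-wise normalization of an image stored as image[channel][row][col]."""
    return [[[(px - m) / s for px in row] for row in chan]
            for chan, m, s in zip(image, mean, std)]

def random_crop_with_padding(image, padding=4, rng=random):
    """Zero-pad each channel by `padding` pixels per side, then crop a window
    of the original size at a random offset."""
    height, width = len(image[0]), len(image[0][0])
    top = rng.randint(0, 2 * padding)
    left = rng.randint(0, 2 * padding)

    def pixel(chan, r, c):
        # (r, c) are coordinates in the padded image; map back to the original.
        r, c = r - padding, c - padding
        inside = 0 <= r < height and 0 <= c < width
        return chan[r][c] if inside else 0.0

    return [[[pixel(chan, top + r, left + c) for c in range(width)]
             for r in range(height)] for chan in image]

# Usage on a dummy 3-channel 8x8 image with made-up statistics.
image = [[[1.0] * 8 for _ in range(8)] for _ in range(3)]
cropped = random_crop_with_padding(image, padding=4, rng=random.Random(0))
normalized = normalize(cropped, mean=(0.5, 0.5, 0.5), std=(0.25, 0.25, 0.25))
```

For SVHN only `normalize` would be applied, since no augmentation is used there.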
Magnet Loss. Following Rippel et al. magnet, we compute a stochastic approximation of the Magnet Loss. Hence, Magnet Loss training requires sampling neighborhoods of points in representation space, rather than independent samples. These neighborhoods are defined by a number of clusters and a number of samples per cluster. This sampling procedure does not guarantee that every instance will be sampled, nor that an instance will be sampled only once. Therefore, we define an epoch as passing as many instances as there are in the dataset, regardless of whether some instances were seen more than once. For sampling, we set the total number of sampled clusters to 12, and the number of samples per cluster to 20. Hence, the total number of samples in each batch is 12 × 20 = 240. Cluster assignments are recomputed at the end of every epoch with the K-means clustering algorithm with the K-means initialization. We run a grid search to optimize the parameter in the Magnet Loss. We set to for ClusTR and ClusTR+QTRADES on CIFAR10; to for ClusTR and to for ClusTR+QTRADES on SVHN; to for ClusTR+QTRADES on CIFAR100.
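A minimal sketch of the neighborhood sampling and of the epoch definition is given below. It simplifies the procedure of Rippel et al. and is not the authors' code: the seed cluster is drawn uniformly, neighboring clusters are chosen by center distance without regard to class labels, and the function names and dictionary layouts are assumptions made for the example.

```python
import math
import random

def sample_neighborhood(cluster_centers, cluster_members,
                        num_clusters=12, samples_per_cluster=20, rng=random):
    """Sample one batch as a neighborhood in representation space.

    cluster_centers: dict cluster_id -> center vector
    cluster_members: dict cluster_id -> list of instance indices
    A seed cluster is drawn uniformly (an assumption), its nearest clusters
    are added, and instances are drawn from each selected cluster.
    """
    seed = rng.choice(sorted(cluster_centers))
    others = sorted(
        (cid for cid in cluster_centers if cid != seed),
        key=lambda cid: math.dist(cluster_centers[seed], cluster_centers[cid]),
    )
    chosen = [seed] + others[:num_clusters - 1]
    batch = []
    for cid in chosen:
        members = cluster_members[cid]
        k = min(samples_per_cluster, len(members))
        batch.extend(rng.sample(members, k))
    return batch

def batches_per_epoch(dataset_size, num_clusters=12, samples_per_cluster=20):
    """An epoch passes (at least) as many instances as the dataset holds."""
    return math.ceil(dataset_size / (num_clusters * samples_per_cluster))

# Toy setup: 20 clusters on a line, 30 instances each.
centers = {i: (float(i), 0.0) for i in range(20)}
members = {i: list(range(i * 30, (i + 1) * 30)) for i in range(20)}
batch = sample_neighborhood(centers, members, rng=random.Random(0))
```

Because batches are drawn independently, an instance can appear in several batches of the same epoch while another never appears, which is why the epoch is counted in instances passed rather than in dataset coverage.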
QTRADES. We initialize the adversary by adding uniform noise in to the original instance, computing Cross Entropy between the original and adversarial instances, and following one step of gradient ascent on the Cross Entropy. The result of gradient ascent is always clipped so that the adversarial instance lies in image space, i.e. . The total loss with which the network is trained is a weighted sum of the Clustering Loss and the Cross Entropy between the original and adversarial instances. We cross-validate over the regularization term balancing the two terms in Equation (5). We set to on CIFAR10, to on SVHN, and to on CIFAR100.
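The three steps of the adversary construction (random start, one ascent step, clipping) can be sketched as below. The sketch is framework-free and makes several assumptions: the instance is a flat list of pixel values in [0, 1], the gradient of the attack loss is supplied by the caller through the hypothetical `grad_fn`, the raw gradient is used for the ascent step, and the `noise` and `step` values are placeholders since the actual values are not preserved in this text.

```python
import random

def one_step_adversary(x, grad_fn, noise=0.001, step=0.03, rng=random):
    """Build an adversarial instance from `x` (a flat list of pixels in [0, 1]).

    grad_fn(x_adv) must return the gradient of the attack loss with respect
    to x_adv; how it is computed (e.g. by autodiff) is left to the caller.
    """
    # Random start: uniform noise around the original instance.
    x_adv = [xi + rng.uniform(-noise, noise) for xi in x]
    # One step of gradient ascent on the loss.
    grad = grad_fn(x_adv)
    x_adv = [xi + step * gi for xi, gi in zip(x_adv, grad)]
    # Clip so the result stays in image space.
    return [min(1.0, max(0.0, xi)) for xi in x_adv]

# Usage with a dummy constant gradient standing in for a real model.
x = [0.99, 0.5]
x_adv = one_step_adversary(x, lambda v: [10.0] * len(v), rng=random.Random(0))
```

Only a single gradient evaluation is needed per adversarial instance, which is where a one-step scheme saves time relative to multi-step attacks.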
Appendix E Additional Experiments
E.1 Combining CE with Distance-Based Classifier
The robustness radius in Theorem 1 holds for any clustering-based classifier of features produced by a Lipschitz-continuous function . Therefore, we start by addressing the following question: if robustness is the aim, can one replace the last layer of a nominally trained DNN with a clustering-based classifier to achieve robustness? Addressing this question is essential to establish the necessity of enforcing clustering during training, i.e. training with ClusTR. To answer this question, we study a nominally trained ResNet18 on CIFAR10, which achieves an accuracy of . We observe that directly applying K-means on the representations of the penultimate layer, and performing classification according to Equation (4), achieves an accuracy of , i.e. a performance drop of over . As adversaries aim to change the classifier's predictions, the highest adversarial accuracy that this classifier can attain is upper bounded by . This result demonstrates that features learnt through nominal training are not spatially configured for clustering-based classification. Hence, exploiting the benefits of clustering-based classification requires explicitly enforcing clustering during DNN training.
E.2 Results of PGD Attacks with Other Values
CIFAR10  CIFAR10  SVHN  SVHN  CIFAR100  CIFAR100
PGD  PGD  PGD  PGD  PGD  PGD
81.99  81.54  87.48  87.47  60.15  59.77
57.67  57.05  80.04  80.00  33.32  33.25
35.88  34.98  71.56  71.45  17.76  17.65
Table 4 reports the adversarial accuracies of ClusTR+QTRADES under PGD attacks with , since we reported the extensive results and comparison for in the main paper. Note that the robustness of our model is not limited to a specific value of .
E.3 Training Time on SVHN
Analogous to Table 2, we report a training time comparison for various methods in Table 5. The reported times are the times it takes for the models to converge based on the stopping criterion discussed in the earlier section, or to reach the last epoch. Note that ClusTR converges significantly faster than training with Magnet Loss with random initialization. Moreover, ClusTR+QTRADES improves on TRADES, the state of the art, in both PGD test accuracy and training time.
Method  Free AT  TRADES  QTRADES  Magnet  ClusTR  ClusTR+QTRADES
Time (min.)  25  763  12  150  52  192
E.4 Ablation on
ClusTR predicts the class of an input as a soft nearest cluster through Equation (4). The probabilities can also be computed by only considering the nearest clusters, as reported in the Experiments Section. Next, we report the effect of varying in terms of the natural and adversarial accuracies.
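A minimal sketch of a soft nearest-cluster prediction restricted to the closest clusters is shown below. It is not Equation (4) itself: the Gaussian kernel on squared distances, the bandwidth `sigma`, the function name, and the list-of-pairs cluster layout are assumptions made for illustration.

```python
import math

def class_probabilities(feature, clusters, num_nearest=None, sigma=1.0):
    """Soft nearest-cluster prediction.

    clusters: list of (class_label, center) pairs.
    num_nearest: if given, only that many closest clusters contribute.
    Each contributing cluster adds a Gaussian weight of its distance to its
    class; this kernel is an assumption, not the paper's Equation (4).
    """
    scored = sorted((math.dist(feature, center), label)
                    for label, center in clusters)
    if num_nearest is not None:
        scored = scored[:num_nearest]
    d_min = scored[0][0]
    weights = {}
    for d, label in scored:
        # Subtracting the smallest squared distance keeps the largest weight
        # at 1 and avoids underflow; it cancels in the normalization.
        w = math.exp(-(d * d - d_min * d_min) / (2 * sigma ** 2))
        weights[label] = weights.get(label, 0.0) + w
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

# Toy example: two clusters for class 0 and one for class 1.
clusters = [(0, (0.0, 0.0)), (0, (1.0, 0.0)), (1, (5.0, 0.0))]
probs_all = class_probabilities((0.2, 0.0), clusters)
probs_one = class_probabilities((0.2, 0.0), clusters, num_nearest=1)
```

With `num_nearest=1` the prediction reduces to a hard nearest-cluster rule, while larger values let several nearby clusters vote for their classes.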
Figure 4 depicts the behavior of clean and adversarial accuracies with varying on CIFAR10. We observe that the effect of varying on both CIFAR10 and SVHN is negligible. The best PGD accuracy for both CIFAR10 and SVHN under the strong PGD attack was and , respectively (corresponding to ). On the other hand, this effect seems to be stronger on CIFAR100. It is worth mentioning that more than 50% of the choices of yield better robustness than the state of the art. Moreover, with , which is the exact setup of our theoretical result in Theorem 1, ClusTR+QTRADES surpasses the state of the art on all of the datasets by a significant margin. Finally, the best PGD accuracy on CIFAR100 is 53.25% with .