# Sparse DNNs with Improved Adversarial Robustness

Deep neural networks (DNNs) are computation- and memory-intensive and vulnerable to adversarial attacks, making them prohibitive in some real-world applications. By converting dense models into sparse ones, pruning appears to be a promising solution for reducing the computation/memory cost. This paper studies classification models, especially DNN-based ones, to demonstrate that there exist intrinsic relationships between their sparsity and adversarial robustness. Our analyses reveal, both theoretically and empirically, that nonlinear DNN-based classifiers behave differently under $l_2$ attacks from some linear ones. We further demonstrate that an appropriately higher model sparsity implies better robustness of nonlinear DNNs, whereas over-sparsified models can find it more difficult to resist adversarial examples.


## 1 Introduction

Although deep neural networks (DNNs) have advanced the state of the art of many artificial intelligence techniques, some undesired properties may hinder them from being deployed in real-world applications. With the continued proliferation of deep learning powered applications, one major concern raised recently is the heavy computation and storage burden that DNN models lay upon mobile platforms. Such burden stems from substantially redundant feature representations and parameterizations Denil2013. To address this issue and make DNNs less resource-intensive, a variety of solutions have been proposed. In particular, it has been reported that more than 90% of connections in a well-trained DNN can be removed using pruning strategies Han2015; Guo2016; Ullrich2017; Molchanov2017; Neklyudov2017, while no accuracy loss is observed. Such remarkable network sparsity leads to considerable compression and speedups on both GPUs and CPUs Park2017. Aside from being efficient, sparse representations are theoretically attractive Candes2006; Donoho2006 and have made their way into numerous applications over the past decade.

Orthogonal to the inefficiency issue, it has also been discovered that DNN models are vulnerable to adversarial examples, i.e., maliciously generated images which are perceptually similar to benign ones but can fool classifiers into making arbitrary predictions Szegedy2014; Carlini2017. Furthermore, generic regularizations (e.g., dropout and weight decay) do not really help in resisting adversarial attacks Goodfellow2015. Such an undesirable property may prohibit DNNs from being applied to security-sensitive applications. The cause of this phenomenon seems mysterious and remains an open question. One reasonable explanation is the local linearity of modern DNNs Goodfellow2015. Many attempts, including adversarial training Goodfellow2015; Tramer2018; Madry2018, knowledge distillation Papernot2016, detecting and rejecting Lu2017, and some gradient masking techniques like randomization Xie2018, have been made to ameliorate this issue and defend against adversarial attacks.

It is crucial to study potential relationships between the inefficiency (i.e., redundancy) and adversarial robustness of classifiers, so as to avoid “robbing Peter to pay Paul” if possible. Towards shedding light on such relationships, especially for DNNs, we provide comprehensive analyses in this paper from both theoretical and empirical perspectives. By introducing reasonable metrics, we reveal, somewhat surprisingly, that there is a discrepancy between the robustness of sparse linear classifiers and nonlinear DNNs under $l_2$ attacks. Our results also demonstrate that an appropriately higher sparsity implies better robustness of nonlinear DNNs, whereas over-sparsified models can find it more difficult to resist adversarial examples, under both the $l_\infty$ and $l_2$ circumstances.

## 2 Related Works

In light of the “Occam’s razor” principle, we presume there exist intrinsic relationships between the sparsity and robustness of classifiers, and thus perform a comprehensive study in this paper. Our theoretical and empirical analyses cover both linear classifiers and nonlinear DNNs, in which the middle-layer activations and connection weights can all become sparse.

The (in)efficiency and robustness of DNNs have seldom been discussed together, especially from a theoretical point of view. Very recently, Gopalakrishnan et al. Gopalakrishnan2018; Marzi2018 propose to sparsify the input representations as a defense and provide provable evidence on resisting attacks. Though intriguing, their theoretical analyses are limited to linear and binary classification cases. Contemporaneous with our work, Wang et al. Wang2018 and Ye et al. Ye2018 experimentally discuss how pruning affects the robustness of some DNNs but surprisingly draw opposite conclusions. Galloway et al. Galloway2018 focus on binary DNNs instead of sparse ones and show that the difficulty of performing adversarial attacks on binary DNNs remains as that of training.

To some extent, several very recent defense methods also utilize the sparsity of DNNs. For improved model robustness, Gao et al. Gao2017 attempt to detect the feature activations exclusive to the adversarial examples and prune them away. Dhillon et al. Dhillon2018 choose an alternative way that prunes activations stochastically to mask gradients. These methods focus only on the sparsity of middle-layer activations and pay little attention to the sparsity of connections.

## 3 Sparsity and Robustness of Classifiers

This paper aims at analyzing and exploring potential relationships between the sparsity and robustness of classifiers to untargeted white-box adversarial attacks, from both theoretical and practical perspectives. To be more specific, we consider models which learn parameterized mappings $x_i \mapsto y_i$, when given a set of labelled training samples $\{(x_i, y_i)\}$ for supervision. Similar to many other theoretical efforts, our analyses start from linear classifiers and will be generalized to nonlinear DNNs later in Section 3.2.

Generally, the sparsity of a DNN model can be considered in two aspects: the sparsity of connections among neurons and the sparsity of neuron activations. In particular, the sparsity of activations also includes that of middle-layer activations and inputs, which can be treated as a special case. Knowing that the input sparsity has been previously discussed Gopalakrishnan2018, we shall focus primarily on the weight and activation sparsity for nonlinear DNNs and just study the weight sparsity for linear models.

### 3.1 Linear Models

For simplicity of notation, we first give theoretical results for binary classifiers with $\hat{y} = \mathrm{sgn}(w^T x)$, in which $w, x \in \mathbb{R}^n$. We also ignore the bias term $b$ for clarity. Notice that $w^T x + b$ can be simply rewritten as $w'^T x'$ in which $w' = [w; b]$ and $x' = [x; 1]$, so all our theoretical results in the sequel apply directly to linear cases with bias. Given ground-truth labels $y \in \{+1, -1\}$, a classifier can be effectively trained by minimizing some empirical loss, e.g., $\sum_i \log(1 + \exp(-y_i \cdot w^T x_i))$, using a logistic sigmoid function like softplus: $\log(1 + \exp(\cdot))$ Goodfellow2015.

Adversarial attacks typically minimize an $l_p$ norm (e.g., $l_0$, $l_2$, and $l_\infty$) of the required perturbation under certain (box) constraints. Though not completely equivalent to the distinctions in our visual domain, such norms play a crucial role in evaluating adversarial robustness. We study both the $l_\infty$ and $l_2$ attacks in this paper. Aiming to cover both, we propose to evaluate the robustness of linear models using the following metrics, which describe the ability to resist them respectively:

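$$\text{Binary:}\quad r_\infty := \mathbb{E}_{x,y}\left(1_{y = \mathrm{sgn}(w^T \check{x})}\right), \qquad r_2 := \mathbb{E}_{x,y}\left(1_{y = \hat{y}} \cdot d(x, \tilde{x})\right). \tag{1}$$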

Here we let $\check{x} = x - \epsilon y \cdot \mathrm{sgn}(w)$ and $\tilde{x} = x - w (w^T x) / \|w\|_2^2$ be the adversarial examples generated by applying the fast gradient sign (FGS) Goodfellow2015 and DeepFool Moosavi2016 methods as representatives. Without box constraints on the image domain, they can be regarded as the optimal $l_\infty$ and $l_2$ attacks targeting the linear classifiers Marzi2018; Moosavi2016. Function $d$ calculates the Euclidean distance between two $n$-dimensional vectors and we know that $d(x, \tilde{x}) = |w^T x| / \|w\|_2$.

The two introduced metrics evaluate the robustness of classifiers from two different perspectives: $r_\infty$ calculates the expected accuracy on (FGS) adversarial examples and $r_2$ measures a decision margin between benign examples from the two classes. For both of them, a higher value indicates stronger adversarial robustness. Note that unlike some metrics calculating (maybe normalized) Euclidean distances between all pairs of benign and adversarial examples, our $r_2$ omits the originally misclassified examples, which makes more sense if the classifiers are imperfect in the sense of prediction accuracy. We will refer to $\mu_k$, which is the conditional expectation for class $k$.
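The two metrics can be estimated empirically along the following lines. This is a minimal NumPy sketch, not the authors' code: it assumes a bias-free binary linear classifier $\mathrm{sgn}(w^T x)$, the closed-form FGS example $x - \epsilon y \cdot \mathrm{sgn}(w)$ and the DeepFool example obtained by projecting $x$ onto the hyperplane $w^T x = 0$. The toy Gaussian data, the choice `w = mu` and `eps=0.1` are illustrative assumptions only.

```python
import numpy as np

def fgs_example(x, y, w, eps):
    """FGS (l_inf) example for sgn(w^T x): shift every coordinate by eps against the label."""
    return x - eps * y[:, None] * np.sign(w)

def deepfool_example(x, w):
    """DeepFool (l_2) example for a linear classifier: projection onto the boundary w^T x = 0."""
    return x - np.outer(x @ w, w) / np.dot(w, w)

def robustness_metrics(x, y, w, eps):
    """Empirical estimates of r_inf and r_2 as defined in Eq. (1)."""
    y_hat = np.sign(x @ w)
    r_inf = np.mean(np.sign(fgs_example(x, y, w, eps) @ w) == y)
    dist = np.linalg.norm(x - deepfool_example(x, w), axis=1)
    r_2 = np.mean((y_hat == y) * dist)  # misclassified benign examples contribute zero
    return r_inf, r_2

# Toy two-class Gaussian data (an assumption made purely for illustration).
rng = np.random.default_rng(0)
n = 20
mu = rng.normal(size=n)
y = rng.choice([-1.0, 1.0], size=2000)
x = y[:, None] * mu + rng.normal(size=(2000, n))
w = mu.copy()  # hypothetical classifier weights
r_inf, r_2 = robustness_metrics(x, y, w, eps=0.1)
```

Calling `robustness_metrics` on the weights of a dense model and of its pruned versions gives one way to trace how the two quantities move with sparsity.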

Be aware that although there exist attack-agnostic guarantees on the model robustness Hein2017; Weng2018, they are all instance-specific. Instead of generalizing them to the entire input space for analysis, we focus on the proposed statistical metrics and present their connections to the guarantees later in Section 3.2. Some other experimentally feasible metrics are involved in Section 4. The following theorem sheds light on intrinsic relationships between the described robustness metrics and norms of $w$.

###### Theorem 3.1.

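(The sparsity and robustness of binary linear classifiers). Suppose that $\Pr(y = k) = 1/2$ for $k = \pm 1$, and an obtained linear classifier achieves the same expected accuracy $t$ on different classes, then we have

$$r_2 = \frac{t}{2} \cdot \frac{w^T(\mu_{+1} - \mu_{-1})}{\|w\|_2} \quad \text{and} \quad r_\infty \le \frac{t}{2} \cdot \frac{w^T(\mu_{+1} - \mu_{-1})}{\epsilon \|w\|_1}. \tag{2}$$
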
###### Proof.

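For $r_\infty$, we first rewrite it in the form of $\Pr(y = \mathrm{sgn}(w^T \check{x}))$. We know from the assumptions that $\Pr(y = k) = 1/2$ and $\Pr(\hat{y} = k \mid y = k) = t$, so we further get

$$r_\infty = \sum_{k = \pm 1} \frac{t}{2} \Pr\left(k \cdot w^T x > \epsilon \|w\|_1 \mid y = k, \hat{y} = k\right), \tag{3}$$

by using the law of total probability and substituting $\check{x}$ with $x - \epsilon y \cdot \mathrm{sgn}(w)$. Lastly, the result follows after using Markov’s inequality.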
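The Markov step can be spelled out as follows. This is a sketch of the standard argument rather than a transcription of the authors' derivation: on the event $\hat{y} = k$ the quantity $k \cdot w^T x$ equals $|w^T x|$ and is therefore non-negative, so Markov's inequality applies to each conditional probability in Eq. (3).

```latex
\Pr\left(k\cdot w^{T}x > \epsilon\|w\|_1 \,\middle|\, y=k,\ \hat{y}=k\right)
\;\le\; \frac{\mathbb{E}\left(k\cdot w^{T}x \,\middle|\, y=k,\ \hat{y}=k\right)}{\epsilon\|w\|_1}
\quad\Longrightarrow\quad
r_\infty \;\le\; \sum_{k=\pm 1}\frac{t}{2}\cdot
\frac{k\cdot w^{T}\,\mathbb{E}\left(x \,\middle|\, y=k,\ \hat{y}=k\right)}{\epsilon\|w\|_1}.
```

The right-hand side takes the form of the bound in Eq. (2) once the two conditional means are identified with $\mu_{+1}$ and $\mu_{-1}$.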

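As for $r_2$, the proof is straightforward by similarly casting its definition into the sum of conditional expectations. That is,

$$r_2 = \sum_{k = \pm 1} \frac{t}{2}\, \mathbb{E}_{x \mid y, \hat{y}}\left(\frac{|w^T x|}{\|w\|_2} \,\middle|\, y = k, \hat{y} = k\right). \tag{4}$$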

Theorem 3.1 indicates clear relationships between the sparsity and robustness of linear models. In terms of $r_\infty$, optimizing a problem that penalizes $\|w\|_1$, the denominator of its bound, gives rise to a sparse solution of $w$. By duality, maximizing the squared upper bound of $r_\infty$ also resembles solving a sparse PCA problem d2008. Reciprocally, we might also conclude that a highly sparse $w$ implies relatively robust classification results. Nevertheless, it seems that the defined $r_2$ has nothing to do with the sparsity of $w$. For a fixed $t$, it gets maximized iff the direction of $w$ approaches that of $\mu_{+1} - \mu_{-1}$; however, sparsifying $w$ probably does not help in reaching this goal. In fact, under some assumptions about data distributions, the dense reference model can be nearly optimal in the sense of $r_2$. We will see that this phenomenon remains in multi-class linear classification in Theorem 3.2 but does not remain in nonlinear DNNs in Section 3.2. One can check Sections 4.1 and 4.2 for more detailed experimental discussions.

Having realized that the robustness of binary linear classifiers is closely related to $\|w\|_1$, we now turn to multi-class cases with the ground truth $y \in \{1, \ldots, c\}$ and prediction $\hat{y} = \arg\max_k w_k^T x$, in which $w_k$ indicates the $k$-th column of a matrix $W$. Here the training objective calculates the cross-entropy loss between ground-truth labels and outputs of a softmax function. The two introduced metrics are slightly modified to:

$$\text{Multi-class:}\quad r_\infty := \mathbb{E}_{x,y}\left(1_{y = \arg\max_k (w_k^T \check{x})}\right), \qquad r_2 := \mathbb{E}_{x,y}\left(1_{y = \hat{y}} \cdot d(x, \tilde{x})\right). \tag{5}$$

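Likewise, $\check{x}$ and $\tilde{x}$ are the FGS and DeepFool adversarial examples under multi-class circumstances, in which $\tilde{x}$ lies on the decision boundary between $\hat{y}$ and another class that is carefully chosen such that $d(x, \tilde{x})$ is minimized. Denoting an averaged classifier by $\bar{w} := \frac{1}{c} \sum_{k=1}^{c} w_k$, we provide upper bounds for both $r_2$ and $r_\infty$ in the following theorem.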

###### Theorem 3.2.

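(The sparsity and robustness of multi-class linear classifiers). Suppose that $\Pr(y = k) = 1/c$ for $k \in \{1, \ldots, c\}$, and an obtained linear classifier achieves the same expected accuracy $t$ on different classes, then we have

$$r_2 \le \frac{t}{c} \sum_{k=1}^{c} \frac{(w_k - \bar{w})^T \mu_k}{\|w_k - \bar{w}\|_2} \quad \text{and} \quad r_\infty \le \frac{t}{c} \sum_{k=1}^{c} \frac{(w_k - \bar{w})^T \mu_k}{\epsilon \|w_k - \bar{w}\|_1} \tag{6}$$

under two additional assumptions: (I) FGS achieves higher per-class success rates than a weaker perturbation like $-\epsilon \cdot \mathrm{sgn}(w_y - \bar{w})$, (II) the FGS perturbation does not correct misclassifications.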

We present in Theorem 3.2 bounds for multi-class classifiers similar to those provided in Theorem 3.1, under some mild assumptions. Our proof is deferred to the supplementary material. We emphasize that the two additional assumptions are intuitively acceptable. First, increasing the classification loss in a more principled way, say using FGS, ought to diminish the expected accuracy more effectively. Second, with high probability, an original misclassification cannot be fixed using the FGS method, as one intends to do precisely the opposite.

Similarly, the presented bound for $r_\infty$ also implies sparsity, though it is the sparsity of $w_k - \bar{w}$. In fact, this is directly related to the sparsity of $w_k$, considering that the classifiers can be post-processed to subtract their average simultaneously whilst the classification decision will not change for any possible input. In particular, Theorem 3.2 also partially suits linear DNN-based classification. Let the classifier be factorized into a product of weight matrices; it is then evident that higher sparsity of the multipliers encourages a higher probability of a sparse $W$.
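The post-processing argument, that subtracting the averaged classifier from every $w_k$ leaves the decision unchanged, can be checked numerically. The snippet below is an illustrative sketch with hypothetical sizes and random weights (the names `n`, `c`, `W`, `X` are assumptions): subtracting $\bar{w}$ shifts all class scores of an input by the same amount $\bar{w}^T x$, so the arg max is preserved.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c = 8, 5                      # hypothetical input dimension and number of classes
W = rng.normal(size=(n, c))      # column k plays the role of w_k
X = rng.normal(size=(100, n))    # random inputs used only for the check

w_bar = W.mean(axis=1, keepdims=True)   # averaged classifier
W_centered = W - w_bar                   # columns are w_k - w_bar

pred = np.argmax(X @ W, axis=1)
pred_centered = np.argmax(X @ W_centered, axis=1)
same_decision = bool(np.array_equal(pred, pred_centered))
```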

### 3.2 Deep Neural Networks

A nonlinear feedforward DNN is usually specified by a directed acyclic graph Cisse2017 with a single root node for final outputs. According to the forward propagation rule, the activation value of each internal (and also output) node is calculated based on its incoming nodes and learnable weights corresponding to the edges. Nonlinear activation functions are incorporated to ensure the capacity. With biases, some nodes output a special value of one. We omit them for simplicity as before.

Classifications are performed by comparing the prediction scores corresponding to different classes, which means $\hat{y} = \arg\max_k g_k(x)$. Benefiting from some very recent theoretical efforts Hein2017; Weng2018, we can directly utilize well-established robustness guarantees for nonlinear DNNs. Let us first denote by $B_p(x, R)$ a closed ball centred at $x$ with radius $R$ and then denote by $L^k_{q,x}$ the (best) local Lipschitz constant of function $g_{\hat{y}}(x) - g_k(x)$ over a fixed $B_p(x, R)$, if there exists one. It has been proven that the following proposition offers a reasonable lower bound for the required $l_p$ norm of instance-specific perturbations when all classifiers are Lipschitz continuous Weng2018.

###### Proposition 3.1.

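(Weng2018) Let $\hat{y} = \arg\max_k g_k(x)$ and $1/p + 1/q = 1$, then for any $\Delta x \in B_p(0, R)$, $p \ge 1$, and a set of Lipschitz continuous functions $\{g_k\}$, with

$$\|\Delta x\|_p \le \min\left\{\min_{k \ne \hat{y}} \frac{g_{\hat{y}}(x) - g_k(x)}{L^k_{q,x}},\; R\right\} := \gamma, \tag{7}$$

it holds that $\hat{y} = \arg\max_k g_k(x + \Delta x)$, which means the classification decision does not change on $B_p(x, \gamma)$.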
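As an illustration of Eq. (7), the guaranteed radius can be computed from the prediction scores and per-class Lipschitz constants. The helper name `certified_radius` and the numbers below are hypothetical; in practice the Lipschitz constants would have to be bounded or estimated separately.

```python
import numpy as np

def certified_radius(scores, lipschitz, R):
    """gamma from Eq. (7): smallest margin-to-Lipschitz ratio over k != y_hat, capped by R."""
    scores = np.asarray(scores, dtype=float)
    lipschitz = np.asarray(lipschitz, dtype=float)
    y_hat = int(np.argmax(scores))
    others = np.arange(scores.size) != y_hat     # the entry for y_hat itself is ignored
    margins = scores[y_hat] - scores[others]
    return min(float(np.min(margins / lipschitz[others])), R)

# Hypothetical scores g_k(x) and local Lipschitz constants for three classes.
gamma = certified_radius(scores=[2.0, 0.5, 1.0], lipschitz=[1.0, 3.0, 2.0], R=1.0)
```

Here both competing classes give a ratio of 0.5 (1.5/3 and 1.0/2), so the radius is 0.5; a smaller `R` would cap it.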

Here the introduced $\gamma$ is basically an instance-specific lower bound that guarantees the robustness of multi-class classifiers. We shall later discuss its connections with our $r_p$s, for $p \in \{\infty, 2\}$, and now we try providing a local Lipschitz constant (which may not be the smallest) of function $g_{\hat{y}}(x) - g_k(x)$, to help us delve deeper into the robustness of nonlinear DNNs. Without loss of generality, we let the following discussion be made under a fixed radius $R$ and a given instance $x$.

Some modern DNNs can be structurally very complex. Let us simply consider a multi-layer perceptron (MLP) parameterized by a series of weight matrices $W_1, \ldots, W_d$ with $W_j \in \mathbb{R}^{n_{j-1} \times n_j}$, in which $n_0 = n$ and $n_d = c$. Discussions about networks with more advanced architectures like convolutions, pooling and skip connections can be directly generalized Bartlett2017. Specifically, we have

$$g_k(x_i) = w_k^T \sigma\left(W_{d-1}^T \sigma\left(\ldots \sigma\left(W_1^T x_i\right)\right)\right), \tag{8}$$

in which $w_k = W_d[:, k]$ and $\sigma$ is the nonlinear activation function. Here we mostly focus on “ReLU networks” with rectified-linear-flavoured nonlinearity, so the neuron activations in middle layers are naturally sparse. For clarity, we discuss the weight and activation sparsities separately. Mathematically, we let $a_0 = x$ and $a_j = \sigma(W_j^T a_{j-1})$ for $0 < j < d$ be the layer-wise activations. We will refer to

$$D_j(x) := \mathrm{diag}\left(1_{W_j[:,1]^T a_{j-1} > 0}, \ldots, 1_{W_j[:,n_j]^T a_{j-1} > 0}\right), \tag{9}$$

which is a diagonal matrix whose entries taking value one correspond to nonzero activations within the $j$-th layer, and $M_j \in \{0, 1\}^{n_{j-1} \times n_j}$, which is a binary mask corresponding to each (possibly sparse) $W_j$. Along with some analyses, the following lemma and theorem present intrinsic relationships between the adversarial robustness and (both weight and activation) sparsity of nonlinear DNNs.

###### Lemma 3.1.

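(A local Lipschitz constant for ReLU networks). Let $1/p + 1/q = 1$, then for any $x \in \mathbb{R}^n$, $k \in \{1, \ldots, c\}$ and $q \in \{1, 2\}$, the local Lipschitz constant of function $g_{\hat{y}}(x) - g_k(x)$ satisfies

$$L^k_{q,x} \le \|w_{\hat{y}} - w_k\|_q \sup_{x' \in B_p(x, R)} \prod_{j=1}^{d-1} \left(\|D_j(x')\|_p \|W_j\|_p\right). \tag{10}$$
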
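To make the quantities in Eq. (10) concrete, the sketch below evaluates the product on the right-hand side at a single point for $p = q = 2$ and compares it with the norm of the local gradient of $g_{\hat{y}} - g_k$. It is an illustration under stated assumptions, not the lemma itself: the supremum over the ball is not computed, the network is a bias-free ReLU MLP with random hypothetical weights, and `w_diff` stands in for $w_{\hat{y}} - w_k$.

```python
import numpy as np

def activation_masks(x, weights):
    """Diagonals of D_j(x) for each hidden layer, plus the last hidden activation."""
    masks, a = [], x
    for W in weights:
        pre = W.T @ a
        masks.append((pre > 0).astype(float))
        a = np.maximum(pre, 0.0)
    return masks, a

def bound_at_point(x, weights, w_diff):
    """Right-hand side of Eq. (10) evaluated at x only (p = q = 2, no supremum)."""
    masks, _ = activation_masks(x, weights)
    bound = np.linalg.norm(w_diff)
    for W, m in zip(weights, masks):
        d_norm = 1.0 if m.any() else 0.0       # spectral norm of a 0/1 diagonal matrix
        bound *= d_norm * np.linalg.norm(W, 2)
    return bound

def local_gradient(x, weights, w_diff):
    """Gradient of g_yhat - g_k at x, using the locally linear form of a ReLU network."""
    masks, _ = activation_masks(x, weights)
    J = np.eye(x.size)
    for W, m in zip(weights, masks):
        J = (m[:, None] * W.T) @ J             # multiply by D_j W_j^T
    return J.T @ w_diff

rng = np.random.default_rng(0)
weights = [rng.normal(size=(6, 5)), rng.normal(size=(5, 4))]   # hypothetical W_1, W_2
w_diff = rng.normal(size=4)
x = rng.normal(size=6)
grad = local_gradient(x, weights, w_diff)
bound = bound_at_point(x, weights, w_diff)
```

The factor for $D_j$ is either one or zero, so at a single point the product only shrinks when a whole layer is deactivated or when the weight norms themselves get smaller.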
###### Theorem 3.3.

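(The sparsity and robustness of nonlinear DNNs). Let the weight matrix be represented as $W_j = W'_j \circ M_j$, in which the entries of $M_j$ are independent Bernoulli random variables with $\Pr(M_j[u, v] = 0) = \alpha_j$, for $j \in \{1, \ldots, d-1\}$. Then for any $x \in \mathbb{R}^n$ and $k \in \{1, \ldots, c\}$, it holds that

$$\mathbb{E}_{M_1, \ldots, M_{d-1}}\left(L^k_{2,x}\right) \le c_2 \cdot \left(1 - \eta(\alpha_1, \ldots, \alpha_{d-1}; x)\right) \tag{11}$$

and

$$\mathbb{E}_{M_1, \ldots, M_{d-1}}\left(L^k_{1,x}\right) \le c_1 \cdot \left(1 - \eta(\alpha_1, \ldots, \alpha_{d-1}; x)\right), \tag{12}$$

in which function $\eta$ is monotonically increasing w.r.t. each $\alpha_j$, and $c_1$, $c_2$ are two constants.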

###### Proof Sketch.

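The function inside the supremum in Lemma 3.1, defined on $B_p(x, R)$, is bounded from above and below, thus we know there exists an $\hat{x} \in B_p(x, R)$ satisfying

$$L^k_{q,x} \le \|w_{\hat{y}} - w_k\|_q \prod_j \|D_j(\hat{x})\|_p \|W_j\|_p. \tag{13}$$

In particular, $\|D_j(\hat{x})\|_p = 0$ is fulfilled iff $D_j(\hat{x}) = 0$ (i.e., it equals 1 for any $D_j(\hat{x}) \ne 0$). Under the assumptions on $M_j$, we know that the entries of $W_j$ are independent of each other, thus the probability of a whole layer being deactivated can be bounded through a newly introduced scalar for each neuron, which is equal to or less than the probability of that neuron being deactivated. In this manner, we can recursively define the function $\eta$ and it is easy to validate its monotonicity. Additionally, we prove the corresponding bounds for $q \in \{1, 2\}$ and the result follows. See the supplementary material for a detailed proof. ∎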

In Lemma 3.1 we introduce probably smaller local Lipschitz constants than the commonly known ones (i.e., those built from products of weight-matrix norms alone), and subsequently in Theorem 3.3 we build theoretical relationships between $L^k_{q,x}$ and the network sparsity, for $q \in \{1, 2\}$ (i.e., $p \in \{\infty, 2\}$). Apparently, $L^k_{q,x}$ is prone to get smaller if any weight matrix gets more sparse. It is worth noting that the local Lipschitz constant is of great importance in evaluating the robustness of DNNs, and it is effective to regularize DNNs by just minimizing $L^k_{q,x}$, or equivalently the $l_q$ norm of the gradient of $g_{\hat{y}} - g_k$ for differentiable continuous functions Hein2017. Thus we reckon that, when the network is over-parameterized, an appropriately higher weight sparsity implies a larger $\gamma$ and stronger robustness. There are similar conclusions if $a_j$ gets more sparse.

Recall that in the linear binary case, we apply the DeepFool adversarial example $\tilde{x}$ when evaluating the robustness using $r_2$. It is not difficult to validate that the equality $d(x, \tilde{x}) = \gamma$ holds for such $\tilde{x}$ and $p = 2$, which means the DeepFool perturbation ideally minimizes the Euclidean norm and helps us measure a lower bound in this regard. This can be directly generalized to multi-class classifiers. Unlike $r_2$, which represents a margin, our $r_\infty$ is basically an expected accuracy. Nevertheless, we also know that a perturbation of $-\epsilon y \cdot \mathrm{sgn}(w)$ shall successfully fool the classifiers if $\epsilon \ge \gamma$ (with $p = \infty$).

## 4 Experimental Results

In this section, we conduct experiments to verify our theoretical results. To be consistent, we still start from linear models and turn to nonlinear DNNs afterwards. As previously discussed, we perform both $l_\infty$ and $l_2$ attacks on the classifiers to evaluate their adversarial robustness. In addition to the FGS Goodfellow2015 and DeepFool Moosavi2016 attacks which have been thoroughly discussed in Section 3, we introduce two more attacks in this section for extensive comparisons of the model robustness.

We use the FGS and randomized FGS (rFGS) Tramer2018 methods to perform $l_\infty$ attacks. As a well-known $l_\infty$ attack, FGS has been widely exploited in the literature. In order to generate adversarial examples, it calculates the gradient of the training loss w.r.t. benign inputs and uses its sign as perturbations, in an element-wise manner. The rFGS attack is a computationally efficient alternative to multi-step $l_\infty$ attacks with an ability to break adversarial training-based defences. We keep its hyper-parameters fixed for all experiments in this paper. For $l_2$ attacks, we choose DeepFool and the C&W’s $l_2$ attack Carlini2017. DeepFool linearises nonlinear classifiers locally and approximates the optimal perturbations iteratively. C&W’s method casts the problem of constructing adversarial examples as optimizing an objective function without constraints, such that some recent gradient-descent-based solvers can be adopted. On the basis of the different attacks, four $r_\infty$ and $r_2$ values in total can be calculated for each classification model.

### 4.1 The Sparse Linear Classifier Behaves Differently under l∞ and l2 Attacks

In our experiments on linear classifiers, both the binary and multi-class scenarios are evaluated. We choose the well-established MNIST dataset as a benchmark, which consists of 70,000 images of handwritten digits. According to the official test protocol, 10,000 of them should be used for performance evaluation and the remaining 60,000 for training. For experiments on the binary cases, we randomly choose a pair of digits (e.g., “0” and “8” or “1” and “7”) as positive and negative classes. Linear classifiers are trained following our previous discussions and utilizing the softplus function: $\log(1 + \exp(\cdot))$. Parameters $w$ and $b$ are randomly initialized and learnt by means of stochastic gradient descent with momentum. For the “1” and “7” classification case, we train 10 reference models from different initializations and achieve a prediction accuracy of

on the benign test set. For the classification of all 10 classes, we train 10 references similarly and achieve a test-set accuracy of .

To produce models with different weight sparsities, we use a progressive pruning strategy Han2015. That is, we follow a pipeline of iteratively pruning and re-training. Within each iteration, a fixed portion of the nonzero entries of $W$, whose magnitudes are relatively small in comparison with the others, is directly set to zero and shall never be activated again. After repeating such “pruning” several times, we collect the resulting models from all 10 dense references; the pruning portion and the number of iterations together determine the achieved final percentage of zero weights. We calculate the prediction accuracies on adversarial examples (i.e., $r_\infty$) under different $l_\infty$ attacks and the average Euclidean norm of required perturbations (i.e., $r_2$) under different $l_2$ attacks to evaluate the adversarial robustness of different models in practice. For $l_\infty$ attacks, a fixed $\epsilon$ is used.
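The prune-and-retrain loop described above can be sketched as follows. This is an assumption-laden illustration rather than the authors' pipeline: `rho` and `rounds` are hypothetical values (not the settings used in the paper), and `retrain` is a placeholder callback standing in for the re-training step. With a portion `rho` removed in each of `rounds` iterations, the final fraction of zero weights is $1 - (1 - \rho)^{\text{rounds}}$.

```python
import numpy as np

def prune_step(W, mask, rho):
    """Zero out the fraction rho of the still-active weights with the smallest magnitudes."""
    alive = np.flatnonzero(mask)
    n_prune = int(round(rho * alive.size))
    new_mask = mask.ravel().copy()
    if n_prune > 0:
        order = np.argsort(np.abs(W.ravel()[alive]))
        new_mask[alive[order[:n_prune]]] = False   # pruned entries are never re-activated
    return new_mask.reshape(mask.shape)

def progressive_pruning(W, rho, rounds, retrain=None):
    """Iteratively prune and (optionally) re-train; returns the sparse weights and the mask."""
    mask = np.ones(W.shape, dtype=bool)
    for _ in range(rounds):
        mask = prune_step(W, mask, rho)
        W = W * mask
        if retrain is not None:
            W = retrain(W, mask) * mask            # hypothetical re-training callback
    return W, mask

rng = np.random.default_rng(0)
W_dense = rng.normal(size=(50, 40))
W_sparse, mask = progressive_pruning(W_dense, rho=0.5, rounds=3)
sparsity = 1.0 - mask.mean()                       # fraction of zero weights
```

In this toy run 2000 weights shrink to 1000, 500 and then 250 active entries, i.e. a sparsity of 0.875.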

Figure 1 illustrates how our metrics of robustness vary with the weight sparsity. We only demonstrate the variability of the first 12 points (from left to right) on each curve, to make the bars more resolvable. The upper and lower subfigures correspond to binary and multi-class cases, respectively. The experimental results are consistent with our previous theoretical ones. While sparse linear models are prone to be more robust in the sense of $r_\infty$, their $r_2$ robustness stays similar to or becomes even slightly weaker than that of the dense references, until there emerge inevitable accuracy degradations on benign examples (i.e., when $r_\infty$ may drop as well). We also observe from Figure 1 that, in both the binary and multi-class cases, $r_2$ starts decreasing much earlier than the benign-set accuracy. Though very slight in the binary case, the degradation of $r_2$ actually occurs after the first round of pruning (with both DeepFool and the C&W’s attack).

### 4.2 Sparse Nonlinear DNNs Can be Consistently More Robust

Regarding nonlinear DNNs, we follow the same experimental pipeline as described in Section 4.1. We train MLPs with 2 hidden fully-connected layers and convolutional networks with 2 convolutional layers, 2 pooling layers and 2 fully-connected layers as references on MNIST, following the “LeNet-300-100” and “LeNet-5” architectures in network compression papers Han2015; Guo2016; Ullrich2017; Molchanov2017. We also follow the training policy suggested by Caffe Jia2014 and train network models for 50,000 iterations with a batch size of 64 such that the training cross-entropy loss does not decrease any longer. The well-trained reference models (LeNet-300-100 and LeNet-5) achieve much higher prediction accuracies than the previously tested linear ones on the benign test set.

#### Weight sparsity.

Then we prune the dense references and illustrate some major results regarding the robustness and weight sparsity in Figure 2 (a)-(d). (See Figure 3 in our supplementary material for results under rFGS and the C&W’s attack.) Weight matrices/tensors within each layer are uniformly pruned so the network sparsity should be approximately equal to the layer-wise sparsity. As expected, we observe similar results to previous linear cases in the context of our $r_\infty$ but significantly different results in $r_2$. Unlike previous linear models which behave differently under $l_\infty$ and $l_2$ attacks, nonlinear DNN models show a consistent trend of adversarial robustness with respect to the sparsity. In particular, we observe increased $r_\infty$ and $r_2$ values under different attacks when continually pruning the models, until the sparsity reaches some thresholds and leads to inevitable capacity degradations. For additional verification, we calculate the CLEVER Weng2018 scores that approximate attack-agnostic lower bounds of the $l_p$ norms of required perturbations (in Table 3 in the supplementary material).

Experiments are also conducted on CIFAR-10, in which deeper nonlinear networks can be involved. We train 10 VGG-like network models Neklyudov2017 (each incorporates 12 convolutional layers and 2 fully-connected layers) and 10 ResNet models He2016 (each incorporates 31 convolutional layers and a single fully-connected layers) from scratch. Such deep architectures lead to average prediction accuracies of and . Still, we prune dense network models in the progressive manner and illustrate quantitative relationships between the robustness and weight sparsity in Figure 2 (e)-(h). The first and last layers in each network are kept dense to avoid early accuracy degradation on the benign set. The same observations can be made. Note that the ResNets are capable of resisting some DeepFool examples, for which the second and subsequent iterations make little sense and can be disregarded.

#### Activation sparsity.

Having verified the relationship between the robustness and weight sparsity of nonlinear DNNs, we now examine the activation sparsity. As previously mentioned, the middle-layer activations of ReLU-incorporated DNNs are naturally sparse. We simply add a norm regularization of weight matrices/tensors to the learning objective to encourage higher sparsities and calculate $r_\infty$ and $r_2$ accordingly. Experiments are conducted on MNIST. Table 1 summarizes the results, in which “Sparsity” indicates the percentage of deactivated (i.e., zero) neurons feeding to the last fully-connected layer. Here the $r_\infty$ and $r_2$ values are calculated using the FGS and DeepFool attacks, respectively. We still observe positive correlations between the robustness and (activation) sparsity in a certain range.
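The activation sparsity reported in such a table can be measured as the share of exactly-zero entries among the activations fed to the last fully-connected layer. The sketch below assumes those activations have already been collected into an array; the helper name `activation_sparsity` and the random stand-in features are hypothetical.

```python
import numpy as np

def activation_sparsity(features):
    """Percentage of deactivated (exactly zero) entries among the given activations."""
    features = np.asarray(features)
    return 100.0 * np.mean(features == 0)

rng = np.random.default_rng(0)
# Stand-in for ReLU outputs of the last hidden layer on a batch of 256 inputs.
hidden = np.maximum(rng.normal(size=(256, 100)), 0.0)
sparsity_pct = activation_sparsity(hidden)
```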

### 4.3 Avoid “Over-pruning”

We discover from Figure 2 that the sharp decrease of the adversarial robustness, especially in the sense of $r_2$, may occur in advance of the benign-set accuracy degradation. Hence, it can be necessary to evaluate the adversarial robustness of DNNs during an aggressive surgery, even though the prediction accuracy of compressed models may remain competitive with their references on benign test sets. To further explore this, we collect some off-the-shelf sparse models (including a compressed LeNet-300-100 and a compressed LeNet-5) Guo2016 and their corresponding dense references from the Internet and evaluate their $l_\infty$ and $l_2$ robustness. Table 2 compares the robustness of different models. These extremely sparse models are more vulnerable to the DeepFool attack and, what is worse, the over-pruned LeNet-5 also seems more vulnerable to FGS, which suggests that researchers should take care to avoid “over-pruning” if possible. One might also observe this with other pruning methods.

## 5 Conclusions

In this paper, we study some intrinsic relationships between the adversarial robustness and the sparsity of classifiers, both theoretically and empirically. By introducing plausible metrics, we demonstrate that unlike some linear models which behave differently under $l_\infty$ and $l_2$ attacks, sparse nonlinear DNNs can be consistently more robust to both of them than their corresponding dense references, until their sparsity reaches certain thresholds and inevitably causes harm to the network capacity. Our results also demonstrate that such sparsity, including sparse connections and middle-layer neuron activations, can be effectively imposed using network pruning and regularization of weight tensors.

## Acknowledgement

We would like to thank anonymous reviewers for their constructive suggestions. Changshui Zhang is supported by NSFC (Grant No. 61876095, No. 61751308 and No. 61473167) and Beijing Natural Science Foundation (Grant No. L172037).

## References

• [1] Peter L Bartlett, Dylan J Foster, and Matus J Telgarsky. Spectrally-normalized margin bounds for neural networks. In NIPS, 2017.
• [2] Emmanuel J Candès, Justin Romberg, and Terence Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489–509, 2006.
• [3] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In SP, 2017.
• [4] Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, and Nicolas Usunier. Parseval networks: Improving robustness to adversarial examples. In ICML, 2017.
• [5] Alexandre d’Aspremont, Francis Bach, and Laurent El Ghaoui. Optimal solutions for sparse principal component analysis. JMLR, 9(July):1269–1294, 2008.
• [6] Misha Denil, Babak Shakibi, Laurent Dinh, Marc’Aurelio Ranzato, and Nando De Freitas. Predicting parameters in deep learning. In NIPS, 2013.
• [7] Guneet S Dhillon, Kamyar Azizzadenesheli, Zachary C Lipton, Jeremy Bernstein, Jean Kossaifi, Aran Khanna, and Anima Anandkumar. Stochastic activation pruning for robust adversarial defense. In ICLR, 2018.
• [8] David L Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006.
• [9] Angus Galloway, Graham W Taylor, and Medhat Moussa. Attacking binarized neural networks. In ICLR, 2018.
• [10] Ji Gao, Beilun Wang, Zeming Lin, Weilin Xu, and Yanjun Qi. Deepcloak: Masking deep neural network models for robustness against adversarial samples. In ICLR Workshop, 2017.
• [11] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
• [12] Soorya Gopalakrishnan, Zhinus Marzi, Upamanyu Madhow, and Ramtin Pedarsani. Combating adversarial attacks using sparse representations. In ICLR Workshop, 2018.
• [13] Yiwen Guo, Anbang Yao, and Yurong Chen. Dynamic network surgery for efficient dnns. In NIPS, 2016.
• [14] Song Han, Jeff Pool, John Tran, and William Dally. Learning both weights and connections for efficient neural network. In NIPS, 2015.
• [15] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
• [16] Matthias Hein and Maksym Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. In NIPS, 2017.
• [17] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. In MM, 2014.
• [18] Jiajun Lu, Theerasit Issaranon, and David Forsyth. Safetynet: Detecting and rejecting adversarial examples robustly. In ICCV, 2017.
• [19] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In ICLR, 2018.
• [20] Zhinus Marzi, Soorya Gopalakrishnan, Upamanyu Madhow, and Ramtin Pedarsani. Sparsity-based defense against adversarial attacks on linear classifiers. arXiv preprint arXiv:1801.04695, 2018.
• [21] Dmitry Molchanov, Arsenii Ashukha, and Dmitry Vetrov. Variational dropout sparsifies deep neural networks. In ICML, 2017.
• [22] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In CVPR, 2016.
• [23] Kirill Neklyudov, Dmitry Molchanov, Arsenii Ashukha, and Dmitry P Vetrov. Structured bayesian pruning via log-normal multiplicative noise. In NIPS, 2017.
• [24] Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In SP, 2016.
• [25] Jongsoo Park, Sheng Li, Wei Wen, Ping Tak Peter Tang, Hai Li, Yiran Chen, and Pradeep Dubey. Faster cnns with direct sparse convolutions and guided pruning. In ICLR, 2017.
• [26] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In ICLR, 2014.
• [27] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. In ICLR, 2018.
• [28] Karen Ullrich, Edward Meeds, and Max Welling. Soft weight-sharing for neural network compression. In ICLR, 2017.
• [29] Luyu Wang, Gavin Weiguang Ding, Ruitong Huang, Yanshuai Cao, and Yik Chau Lui. Adversarial robustness of pruned neural networks. In ICLR Workshop submission, 2018.
• [30] Tsui-Wei Weng, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, Dong Su, Yupeng Gao, Cho-Jui Hsieh, and Luca Daniel. Evaluating the robustness of neural networks: An extreme value theory approach. In ICLR, 2018.
• [31] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan Yuille. Mitigating adversarial effects through randomization. In ICLR, 2018.
• [32] Shaokai Ye, Siyue Wang, Xiao Wang, Bo Yuan, Wujie Wen, and Xue Lin. Defending DNN adversarial attacks with pruning and logits augmentation. In ICLR Workshop submission, 2018.