We provide recovery guarantees for compressible signals that have been corrupted with noise and extend the framework introduced in  to defend neural networks against $\ell_0$-norm and $\ell_2$-norm attacks. Concretely, for a signal that is approximately sparse in some transform domain and has been perturbed with noise, we provide guarantees for accurately recovering the signal in the transform domain. We can then use the recovered signal to reconstruct the signal in its original domain while largely removing the noise. Our results are general as they can be directly applied to most unitary transforms used in practice and hold for both $\ell_0$-norm bounded noise and $\ell_2$-norm bounded noise. In the case of $\ell_0$-norm bounded noise, we prove recovery guarantees for Iterative Hard Thresholding (IHT) and Basis Pursuit (BP). For the case of $\ell_2$-norm bounded noise, we provide recovery guarantees for BP. These guarantees theoretically bolster the defense framework introduced in  for defending neural networks against adversarial inputs. Finally, we experimentally demonstrate this defense framework using both IHT and BP against the One Pixel Attack , Carlini-Wagner $\ell_0$ and $\ell_2$ attacks , Jacobian Saliency Based attack , and the DeepFool attack  on CIFAR-10 , MNIST , and Fashion-MNIST  datasets. This expands beyond the experimental demonstrations of .
Signal measurements are often corrupted due to measurement errors and can even be corrupted due to adversarial noise injection. Assuming some structure in the measurement mechanism, is it possible for us to retrieve the original signal from a corrupted measurement? Indeed, it is generally possible to do so using the theory of Compressive Sensing  if certain constraints on the measurement mechanism and the signal hold. In order to make the question more concrete, let us consider the class of machine learning problems where the inputs are compressible (i.e., approximately sparse) in some domain. For instance, images and audio signals are known to be compressible in their frequency domain and machine learning algorithms have been shown to perform exceedingly well on classification tasks that take such signals as input [12, 23]. However, it was found in  that neural networks can be easily forced into making incorrect predictions with high confidence by adding adversarial perturbations to their inputs; see also [24, 9, 19, 4]. Further, the adversarial perturbations that led to incorrect predictions were shown to be very small (in either $\ell_0$-norm or $\ell_2$-norm) and often imperceptible to human beings. For this class of machine learning tasks, we show that it is possible to recover original inputs from adversarial inputs and defend the neural network.
In this paper, we first provide recovery guarantees for compressible signals that have been corrupted by noise bounded in either $\ell_0$-norm or $\ell_2$-norm. Then we extend the framework introduced in  to defend neural networks against $\ell_0$-norm and $\ell_2$-norm attacks. In the case of $\ell_0$-norm attacks on neural networks, the adversary can perturb a bounded number of elements in the input but has no restriction on how much each element is perturbed in absolute value. In the case of $\ell_2$-norm attacks, the adversary can perturb as many elements as they choose as long as the $\ell_2$-norm of the perturbation vector is bounded. Our recovery guarantees cover both cases and provide a partial theoretical explanation for the robustness of the defense framework against adversarial inputs. Our contributions can be summarized as follows:
We provide recovery guarantees for IHT and BP when the noise budget is bounded in the $\ell_0$-norm.
We provide recovery guarantees for BP when the noise budget is bounded in the $\ell_2$-norm.
We extend the framework introduced in  to defend neural networks against $\ell_0$-norm bounded and $\ell_2$-norm bounded attacks.
The paper is organized as follows. We present the defense framework introduced in , which we call Compressive Recovery Defense (CRD), in Section 3.1. We present our main theoretical results (i.e., the recovery guarantees) in Section 3.2 and compare these results to related work in Section 3.3. We establish the Restricted Isometry Property (RIP) in Section 4 and provide the proofs of our main results in Sections 5 and 6. We show that CRD can be used to defend against $\ell_0$-norm and $\ell_2$-norm bounded attacks in Section 7 and conclude the paper in Section 8.
Let be a vector in and let with . The support of , denoted by , is the set of indices of the non-zero entries of , that is, . The $\ell_0$-norm of , denoted , is defined to be the number of non-zero entries of , i.e. . We say that is -sparse if . We denote by either the sub-vector in consisting of the entries indexed by or the vector in that is formed by starting with and setting the entries with indices outside of to zero. For example, if and , then is either or . In the latter case, note . It will always be clear from the context which meaning is intended. If is a matrix, we denote by the column sub-matrix of consisting of the columns indexed by .
We use to denote a -sparse vector in consisting of the largest (in absolute value) entries of with all other entries zero. For example, if then . Note that may not be uniquely defined. In contexts where a unique meaning for is needed, we can choose out of all possible candidates according to a predefined rule (such as the lexicographic order). We also define .
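The best sparse approximation described above can be sketched in a few lines. This is only an illustration: the function name `best_k_term` and the tie-breaking rule (lowest index wins, via a stable sort) are assumptions chosen here, standing in for whatever predefined rule is adopted.

```python
import numpy as np

def best_k_term(x, k):
    """Return a k-sparse copy of x that keeps its k largest-magnitude entries."""
    x = np.asarray(x)
    # Stable sort on negated magnitudes: ties go to the lowest index,
    # which is one possible predefined rule for making the choice unique.
    keep = np.argsort(-np.abs(x), kind="stable")[:k]
    out = np.zeros_like(x)
    out[keep] = x[keep]
    return out

x = np.array([3.0, -5.0, 1.0, 5.0, 0.0])
print(best_k_term(x, 2))  # keeps the two entries of magnitude 5
print(best_k_term(x, 1))  # indices 1 and 3 tie; index 1 is kept
```

The second call shows why a rule is needed: two entries share the largest magnitude, so without a convention the result would be ambiguous.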
Let with , then is called -sparse if is -sparse and is -sparse. We define , which is a -sparse vector in . Again, may not be uniquely defined, but when a unique meaning for is needed (such as in Algorithm 1), we can choose out of all possible candidates according to a predefined rule.
3 Main Results
In this section we outline the problem and the framework introduced in , state our main theorems, and compare our results to related work.
3.1 Compressive Recovery Defense (CRD)
Consider an image classification problem in which a machine learning classifier takes an image reconstructed from its largest Fourier coefficients as input and outputs a classification decision. Let be the image vector (we can assume the image is of size for instance). Then, letting be the unitary Discrete Fourier Transform (DFT) matrix, we get the Fourier coefficients of as .
It is well known that natural images are approximately sparse in the frequency domain and therefore we can assume that is -sparse, that is . In our example of the image classification problem, this means that our machine learning classifier can accept as input the image reconstructed from , and still output the correct decision. That is, the machine learning classifier can accept as input and still output the correct decision. Now, suppose an adversary corrupts the original image and we observe . Noting that can also be written as , we are interested in recovering an approximation to upon observing , such that when we feed as input to the classifier, it can still output the correct classification decision.
More generally, this basic framework can be used for adversarial inputs in any input domain, as long as there exists a matrix such that , where is approximately sparse and for some . If we can recover an approximation to with bounds on the recovery error, then we can use to reconstruct an approximation to with controlled error.
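When the transform is unitary, as with the DFT example, the link between recovery error and reconstruction error is direct: a unitary map preserves the Euclidean norm, so the error in the signal domain equals the error in the coefficient domain. The sketch below checks this numerically; the variable names and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64

# Hypothetical transform-domain coefficients and an imperfect recovery of them.
x_hat = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x_hat_rec = x_hat + 0.01 * rng.standard_normal(n)

# norm="ortho" makes NumPy's inverse DFT a unitary map back to the signal domain.
x = np.fft.ifft(x_hat, norm="ortho")
x_rec = np.fft.ifft(x_hat_rec, norm="ortho")

coef_err = np.linalg.norm(x_hat - x_hat_rec)
signal_err = np.linalg.norm(x - x_rec)
print(coef_err, signal_err)  # the two errors agree up to rounding
```

For a non-unitary matrix the reconstruction error would instead be scaled by at most the largest singular value of that matrix.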
This general framework was proposed by . Moving forward, we refer to this general framework as Compressive Recovery Defense (CRD) and utilize it to defend neural networks against $\ell_0$-norm and $\ell_2$-norm attacks. As observed in , in Algorithm 1, can be initialized randomly to defend against a reverse-engineering attack. In the case of Algorithm 2, the minimization problem can be posed as a Second Order Cone Programming (SOCP) problem and it appears non-trivial to create a reverse-engineering attack that will retain the adversarial noise through the recovery and reconstruction process.
Our main results are stated below. Theorem 1 and Theorem 2 provide bounds on the recovery error with Algorithm 1 and Algorithm 2 respectively when the noise is bounded in the $\ell_0$-norm. Theorem 3 covers the case when the noise is bounded in the $\ell_2$-norm. We start by providing bounds on the approximation error using IHT when the noise is bounded in the $\ell_0$-norm.
The result above applies to unitary transformations such as the Fourier Transform, Cosine Transform, Sine Transform, Hadamard Transform, and other wavelet transforms. Since the constant in the above bound can be made arbitrarily small, the recovery error in equations (2) and (3) depends primarily on which is small for sparse signals.
Next, we consider the recovery error when using BP instead of IHT. Providing bounds for BP is useful as there are cases (shown in Section 7.1.1 and Section 7.2.2) when (i) BP provides recovery guarantees against a larger noise budget than IHT and (ii) BP leads to a better reconstruction than IHT.
Let , where is a unitary matrix with and is the identity matrix. Let , and let be positive integers. Define
If and , then for a solution of Algorithm 2, we have the error bound
where we write with .
Our final result covers the case when the noise is bounded in the $\ell_2$-norm. Note that the result covers all unitary matrices and removes the restriction on the magnitude of their elements. We will utilize this result in defending against $\ell_2$-norm attacks.
Let be a unitary matrix and let , where is -sparse and . If , then for a solution of Algorithm 2, we have the error bound
3.3 Comparison to Related Work
We now summarize research efforts for the problem of defending neural networks against adversarial inputs.
The authors of  introduced the CRD framework which inspired this work. The main theorem (Theorem 2.2) of  is an analog of our Theorem 1 and provides a similar bound on the approximation error for recovery via IHT. First note that the statement of Theorem 2.2 of  is missing the required hypothesis . This hypothesis appears in Lemma 3.6 of , which is used to prove Theorem 2.2, but it appears to have been accidentally dropped from the statement of Theorem 2.2. We note that, by making the constants explicit, the proof of Lemma 3.6 of  gives the same restricted isometry property that we do in Theorem 6. Therefore, the guarantees we obtain for IHT are essentially the same as in . The main difference is that, to derive recovery guarantees for IHT from the restricted isometry property, we utilize Theorem 7 below (which is a modified version of Theorem 6.18 of ) while the authors of  utilize Theorem 3.4 in  (which is taken from ).
Other works that provide guarantees include  and  where the authors frame the problem as one of regularizing the Lipschitz constant of a network and provide a lower bound on the norm of the perturbation required to change the classifier decision. The authors of  use robust optimization to perturb the training data and provide a training procedure that updates parameters based on worst-case perturbations. A similar approach to  is  in which the authors use robust optimization to provide lower bounds on the norm of adversarial perturbations on the training data. In , the authors use techniques from Differential Privacy  in order to augment the training procedure of the classifier to improve robustness to adversarial inputs. Another approach using randomization is  in which the authors add i.i.d. Gaussian noise to the input and provide guarantees of maintaining classifier predictions as long as the -norm of the attack vector is bounded by a function that depends on the output of the classifier.
Most defenses against adversarial inputs do not come with theoretical guarantees. Instead, a large body of research has focused on finding practical ways to improve robustness to adversarial inputs by either augmenting the training data , using adversarial inputs from various networks , or by reducing the dimensionality of the input . For instance,  use robust optimization to make the network robust to worst-case adversarial perturbations on the training data. However, the effectiveness of their approach is determined by the amount and quality of training data available and its similarity to the distribution of the test data. An approach similar to ours but without any theoretical guarantees is . In this work, the authors use Generative Adversarial Networks (GANs) to estimate the distribution of the training data and during inference, use a GAN to reconstruct an input that is most similar to a given test input and is not adversarial.
4 Restricted Isometry Property
All of our recovery guarantees are based on the following theorem which establishes a restricted isometry property for certain structured matrices. First, we give some definitions.
Let be a matrix in , let , and let . We say that satisfies the -restricted isometry property (or M-RIP) with constant if
for all .
We define to be the set of all -sparse vectors in and define to be the collection of subsets of of cardinality less than or equal to . Note that is the collection of supports of vectors in . Similarly, we define to be the set of -sparse vectors in . In other words, is the following subset of :
We define to be the following collection of subsets of :
Note that is the collection of supports of vectors in .
Let , where is a unitary matrix with and is the identity matrix. Then
for all . In other words, satisfies the -RIP property with constant .
In this proof, if denotes an matrix in , then denote the eigenvalues of ordered so that . It suffices to fix an and prove (7) for all non-zero .
Since is normal, there is an orthonormal basis of eigenvectors for , where corresponds to the eigenvalue . For any non-zero , we have for some , so
Thus it will suffice to prove that for all . Moreover,
So far we have not used the structure of , but now we must.
Observe that is a block diagonal matrix with two diagonal blocks of the form and . Therefore the three matrices , , and have the same non-zero eigenvalues. Moreover, is simply the matrix with those rows not indexed by deleted. The hypotheses on imply that the entries of satisfy So the Gershgorin disc theorem implies that each eigenvalue of and (hence) of satisfies .
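The structure used in this proof can be checked numerically for the DFT. The sketch below builds the concatenation of a unitary DFT matrix and the identity, draws random supports with `k` columns from the first block and `t` from the second, and measures the spectral norm of the restricted Gram matrix minus the identity. The comparison value `sqrt(k * t / n)` is an assumption derived here (the off-diagonal block is a `t x k` sub-matrix of the DFT whose Frobenius norm is exactly that value); it is not quoted from the theorem statement, and the names `rip_deviation`, `n`, `k`, `t` are illustrative.

```python
import numpy as np

def rip_deviation(n, k, t, rng):
    """Spectral norm of A_S^* A_S - I for A = [F, I] and a random (k, t) support."""
    idx = np.arange(n)
    F = np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)  # unitary DFT
    A = np.hstack([F, np.eye(n)])
    s1 = rng.choice(n, size=k, replace=False)      # columns taken from F
    s2 = n + rng.choice(n, size=t, replace=False)  # columns taken from I
    A_S = A[:, np.concatenate([s1, s2])]
    G = A_S.conj().T @ A_S - np.eye(k + t)         # zero diagonal blocks
    return np.linalg.norm(G, 2)

rng = np.random.default_rng(0)
n, k, t = 64, 4, 3
worst = max(rip_deviation(n, k, t, rng) for _ in range(200))
bound = np.sqrt(k * t / n)
print(worst, bound)
```

Random sampling only probes a few supports, so this is a sanity check of the block structure and not a substitute for the proof.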
5 Iterative Hard Thresholding
Now we utilize the result of Theorem 6 to prove recovery guarantees for the following Iterative Hard Thresholding algorithm.
Let be a matrix. Let be positive integers and suppose is a -RIP constant for and that is a -RIP constant for . Let , , , and . Letting , we have the approximation error bound
where and . In particular, if , then and ; the latter implies that the first term on the right goes to zero as goes to .
Theorem 7 is a modification of Theorem 6.18 of . More specifically, Theorem 6.18 of  considers , , and in place of and and and any dimension in place of . The proofs are very similar, so we omit the proof of Theorem 7.
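For intuition, here is a minimal sketch of an iterative hard thresholding loop for a measurement of the form "unitary DFT of a sparse coefficient vector plus sparse noise". Several choices are assumptions made for this illustration and are not taken from Algorithm 1: the unit step size, the zero initialization, the fixed iteration count, thresholding the coefficient block and the noise block separately, and all function and variable names. The DFT is applied through `np.fft` with `norm="ortho"` so the matrix is never formed explicitly.

```python
import numpy as np

def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and zero the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(-np.abs(v), kind="stable")[:k]
    out[keep] = v[keep]
    return out

def iht(y, k, t, iters=50):
    """IHT for y = F x_hat + e with A = [F, I]; returns estimates of (x_hat, e)."""
    n = y.size
    x_hat = np.zeros(n, dtype=complex)
    e = np.zeros(n, dtype=complex)
    for _ in range(iters):
        residual = y - (np.fft.fft(x_hat, norm="ortho") + e)  # y - A z
        # A^* residual has two blocks: F^* residual and residual itself.
        x_hat = hard_threshold(x_hat + np.fft.ifft(residual, norm="ortho"), k)
        e = hard_threshold(e + residual, t)
    return x_hat, e

rng = np.random.default_rng(0)
n, k, t = 1024, 3, 3
x_true = np.zeros(n, dtype=complex)
e_true = np.zeros(n, dtype=complex)
x_true[rng.choice(n, k, replace=False)] = rng.choice([-1.0, 1.0], k) * rng.uniform(1, 2, k)
e_true[rng.choice(n, t, replace=False)] = rng.choice([-1.0, 1.0], t) * rng.uniform(1, 2, t)
y = np.fft.fft(x_true, norm="ortho") + e_true

x_rec, e_rec = iht(y, k, t)
err = np.linalg.norm(x_rec - x_true)
print(err)
```

In this synthetic setting both the coefficients and the noise are exactly sparse, so the tail terms in the error bound vanish and the iterates should converge to the true vectors; for merely compressible signals one would expect a residual error governed by those tail terms.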
Proof of Theorem 1.
Now let , which gives . Noting that , we can use the same reasoning as used in . We first define which means and since , we have . Since the support of is at most and since , we can use the fact that for a -sparse vector , to get the bound:
6 Basis Pursuit
Next we introduce the Basis Pursuit algorithm and prove its recovery guarantees for $\ell_0$-norm and $\ell_2$-norm noise.
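For orientation, a standard quadratically constrained form of Basis Pursuit is written out here. It is an assumption that Algorithm 2 takes this shape; the form is chosen because it is consistent with the remark in Section 3.1 that the minimization can be posed as an SOCP. The symbols $A$ (measurement matrix), $y$ (observation), $\eta$ (noise level) and $N$ (number of columns of $A$) are notation adopted for this sketch.

```latex
\min_{z \in \mathbb{C}^{N}} \; \|z\|_{1}
\quad \text{subject to} \quad
\|Az - y\|_{2} \le \eta
```

The objective is convex and the constraint is a second-order cone constraint, which is why off-the-shelf conic solvers can handle problems of this type.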
We begin by stating some definitions that will be required in the proofs of the main theorems.
The matrix satisfies the robust null space property with constants , and norm if for every set with and for every we have
The matrix satisfies the robust null space property of order with constants , and norm if for every set with and for every we have
Note that if then this is simply the robust null space property.
The proof of Theorem 2 requires the following theorem (whose full proof is given in the cited work).
Theorem 10 (Theorem 4.33 in ).
Let be the columns of , let with largest absolute entries supported on , and let with . For with , assume that:
and that there exists a vector with such that
If , then a minimizer of subject to satisfies:
where and .
We will need another lemma before proving Theorem 2.
Let , if for all , then, , for any .
Let be given. Then for any , we have
We can re-write this as : . Noting that is Hermitian, we have:
Proof of Theorem 2.
First note that by Theorem 6, satisfies the - property with constant . Therefore, by Lemma 11, for any , we have . Since is a positive semi-definite matrix, it has only non-negative eigenvalues that lie in the range . Since by assumption, is injective. Thus, we can set: and get:
where and we have used the following facts: since , we get that
and that the largest singular value of is less than . Now let , then . Now we need to bound the value . Denoting row of by the vector , we see that it has at most non-zero entries and that for . Therefore, for any element , we have:
Defining and , we get and also observe that . Therefore, all the hypotheses of Theorem 10 have been satisfied. Note that , where . Therefore, setting , we use the fact combined with the bound in Theorem 10 to get (4):
where we write with . ∎
We note that since Algorithm 2 is not adapted to the structure of the matrix in the statement of Theorem 2, one can expect the guarantees to be weaker. We now focus on proving Theorem 3. In order to do so, we will need to state some lemmas that will be used in the main proof.
If a matrix satisfies the robust null space property for , with card, then it satisfies the robust null space property for with constants .
For any , . Then, using the fact that , we get: