1.1 Introduction to Compressed Sensing with Corruptions
Compressed sensing (CS) has been well studied in recent years [9, 19]. This novel theory asserts that a sparse or approximately sparse signal can be acquired by taking just a few non-adaptive linear measurements. This fact has numerous consequences, which are being explored in a number of fields of applied science and engineering. In CS, the acquisition procedure is often represented as $y = Ax$, where $A \in \mathbb{R}^{m \times n}$ is called the sensing matrix and $y \in \mathbb{R}^m$ is the vector of measurements or observations. It is now well established that the solution $\hat{x}$ to the optimization problem
$$\min_{\tilde{x}} \|\tilde{x}\|_1 \quad \text{subject to} \quad A\tilde{x} = y \qquad (1.1)$$
is guaranteed to be the original signal $x$ with high probability, provided $x$ is sufficiently sparse and $A$ obeys certain conditions. A typical result is this: if $A$ has iid Gaussian entries, then exact recovery occurs provided $\|x\|_0 \le C\, m/\log(n/m)$ [10, 18, 37] for some positive numerical constant $C$. As another example, if $A$ is a matrix with rows randomly selected from the DFT matrix, the condition becomes $\|x\|_0 \le C\, m/\log n$.
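To make the recovery procedure concrete, here is a small numerical sketch (not from the paper; the sizes, seed, and solver choice are illustrative assumptions): basis pursuit is recast as a linear program by splitting $x = u - v$ with $u, v \ge 0$, so that $\|x\|_1 = \sum_i (u_i + v_i)$.

```python
import numpy as np
from scipy.optimize import linprog

# Basis pursuit:  min ||x||_1  subject to  A x = y,
# written as an LP over (u, v) with x = u - v and u, v >= 0.
rng = np.random.default_rng(0)
n, m, k = 64, 32, 4                      # ambient dim, measurements, sparsity
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n)) / np.sqrt(m)   # iid Gaussian sensing matrix
y = A @ x0

c = np.ones(2 * n)                       # objective: sum(u) + sum(v) = ||x||_1
A_eq = np.hstack([A, -A])                # equality constraint A(u - v) = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * n))
x_hat = res.x[:n] - res.x[n:]            # recovered signal
```

With $k = 4$ nonzeros and $m = 32$ Gaussian measurements, the regime is well inside the recovery condition quoted above, so `x_hat` coincides with `x0` up to solver precision.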
This paper discusses a natural generalization of CS, which we shall refer to as compressed sensing with corruptions. We assume that some entries of the data vector $y$ are totally corrupted, but we have absolutely no idea which entries are unreliable. We still want to recover the original signal efficiently and accurately. Formally, we have the mathematical model
$$y = Ax + f, \qquad (1.2)$$
where $x \in \mathbb{R}^n$ and $f \in \mathbb{R}^m$. The number of nonzero coefficients in $x$ is $k = \|x\|_0$, and similarly $s = \|f\|_0$. As in the model above, $A$ is an $m \times n$ sensing matrix, usually sampled from a probability distribution. The problem of recovering $x$ (and hence $f$) from $y$ has recently been studied in the literature in connection with some interesting applications. We discuss a few of them.
Clipping. Signal clipping frequently appears because of nonlinearities in the acquisition device [27, 38]. Here, one typically measures $g(\langle a_i, x \rangle)$ rather than $\langle a_i, x \rangle$, where $g$ is a nonlinear map. Letting $f_i = g(\langle a_i, x \rangle) - \langle a_i, x \rangle$, we thus observe $y = Ax + f$. Nonlinearities usually occur at large amplitudes, so that for those components with small amplitudes we have $f_i = 0$. This means that $f$ is sparse and, therefore, our model is appropriate. Just as before, locating the portion of the data vector that has been clipped may be difficult because of additional noise.
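A tiny simulation of the clipping mechanism (sizes, seed, and the saturation level `T` are illustrative assumptions): only the large-amplitude measurements are altered, so the induced corruption vector is sparse.

```python
import numpy as np

# Clipping model: the device records a saturated version of <a_i, x>;
# the residual f = clip(Ax) - Ax is sparse because only large-amplitude
# measurements are affected.
rng = np.random.default_rng(1)
m, n, T = 100, 20, 1.5                   # measurements, dim, clipping level
A = rng.standard_normal((m, n)) / np.sqrt(n)
x = rng.standard_normal(n)
x *= np.sqrt(n) / np.linalg.norm(x)      # normalize so each <a_i, x> ~ N(0, 1)
z = A @ x                                # ideal measurements
y = np.clip(z, -T, T)                    # what the saturated device records
f = y - z                                # corruption: nonzero only where clipped
```

Here roughly 13% of the entries of `z` exceed the level `T = 1.5` in magnitude, so `f` has a small but nonzero fraction of nonzero coordinates, matching the sparse-corruption model.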
CS for networked data. In a sensor network, different sensors collect measurements of the same signal independently (they each measure $\langle a_i, x \rangle$) and send the outcome to a central hub for analysis [23, 30]. By setting the $a_i^*$ as the row vectors of $A$, this is just $y = Ax$. However, typically some sensors will fail to send the measurements correctly, and will sometimes report totally meaningless measurements. Therefore, we collect $y = Ax + f$, where $f$ models the recording errors.
1.2 Introduction to matrix completion with corruptions
Matrix completion (MC) bears some similarity to CS. Here, the goal is to recover a low-rank matrix from a small fraction of linear measurements. For simplicity, we suppose the matrix $M$ is square, say $n \times n$ (the general case is similar). The standard model is that we observe the entries $M_{ij}$, $(i,j) \in \Omega$, where $\Omega \subset [n] \times [n]$ is the set of observed locations and $[n] = \{1, \ldots, n\}$, for example. Here one minimizes the nuclear norm $\|X\|_*$ — the sum of all the singular values — to recover the original low-rank matrix. We discuss below an improved result due to Gross  (with a slight difference).
Define $\Omega \sim \mathrm{Ber}(p)$ for some $p \in (0, 1]$, meaning that the indicators $1_{\{(i,j) \in \Omega\}}$ are iid Bernoulli random variables with parameter $p$. Then the solution to
$$\min_X \|X\|_* \quad \text{subject to} \quad X_{ij} = M_{ij}, \ (i,j) \in \Omega$$
is guaranteed to be exactly $M$ with high probability, provided $p \ge C\, \mu r \log^2 n / n$.
Here, $C$ is a positive numerical constant, $r$ is the rank of $M$, and $\mu$ is an incoherence parameter introduced in  which depends only on $M$.
This paper is concerned with the situation in which some of the observed entries may have been corrupted. Therefore, our model is that we observe
$$Y = P_\Omega(M) + Z,$$
where $\Omega$ and $M$ are the same as before and $Z$ is supported on a set $S$ of corrupted locations. Just as in CS, this model has broad applicability. For example, Wu et al. used this model in photometric stereo . This problem has also been introduced in  and is related to recent work on separating a low-rank component from a sparse component [14, 4, 24, 13, 43]. A typical result is that the solution to a convex program combining the nuclear norm with an $\ell_1$ penalty on the corruptions recovers $M$ exactly under suitable conditions; our Theorem 1.3 below makes this precise.
1.3 Main results
This section introduces three models and three corresponding recovery results. The proofs of these results are deferred to Section 2 for Theorem 1.1, Section 3 for Theorem 1.2 and Section 4 for Theorem 1.3.
1.3.1 CS with iid matrices [Model 1]
satisfies $(\hat{x}, \hat{f}) = (x, f)$ with probability at least $1 - Ce^{-cm}$. This holds universally; that is to say, for all vectors $x$ and $f$ obeying $\|x\|_0 \le \alpha\, m/\log(n/m)$ and $\|f\|_0 \le \alpha' m$. Here $C$, $c$, $\alpha$ and $\alpha'$ are numerical constants.
In the above statement, the matrix $A$ is random; everything else is deterministic. The reader will notice that the allowed number of nonzero entries of $x$ is on the same order as that needed for recovery from clean data [10, 19, 3, 37], while the condition on $\|f\|_0$ implies that one can tolerate a constant fraction of possibly adversarial errors. Moreover, our convex optimization problem is related to LASSO  and Basis Pursuit .
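The corruption-aware program described above can be sketched numerically as follows (an illustrative assumption throughout: the sizes, the seed, and the choice $\lambda = 1$ are mine, not the theorem's prescribed value). As with basis pursuit, the problem becomes a linear program after splitting $x = u - v$ and $f = p - q$ into nonnegative parts.

```python
import numpy as np
from scipy.optimize import linprog

# Sketch of  min ||x||_1 + lam * ||f||_1  subject to  A x + f = y,
# recast as an LP over (u, v, p, q) >= 0 with x = u - v, f = p - q.
rng = np.random.default_rng(2)
n, m, k, s = 64, 40, 3, 4                # dim, measurements, sparsities of x, f
A = rng.standard_normal((m, n)) / np.sqrt(m)
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
f0 = np.zeros(m)
f0[rng.choice(m, s, replace=False)] = rng.standard_normal(s)
y = A @ x0 + f0                          # corrupted measurements

lam = 1.0                                # illustrative; not the paper's lambda
c = np.concatenate([np.ones(2 * n), lam * np.ones(2 * m)])
A_eq = np.hstack([A, -A, np.eye(m), -np.eye(m)])   # A(u-v) + (p-q) = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * n + 2 * m))
x_hat = res.x[:n] - res.x[n:2 * n]
f_hat = res.x[2 * n:2 * n + m] - res.x[2 * n + m:]
```

By construction the optimum's objective value is no larger than that of the true pair $(x_0, f_0)$, which is itself feasible; under the theorem's conditions the two in fact coincide.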
1.3.2 CS with general sensing matrices [Model 2]
In this model, the rows of $A$ are $a_1^*, \ldots, a_m^*$, where $a_1, \ldots, a_m$ are iid copies of a random vector $a$ whose distribution obeys the following two properties: 1) isotropy, $\mathbb{E}\, a a^* = I$; 2) incoherence, $\max_i |a(i)|^2 \le \mu$. This model has been introduced in  and includes many of the stochastic models used in the literature. Examples include partial DFT matrices, matrices with iid entries, certain random convolutions  and so on.

In this model, we assume that $x$ and $f$ in (1.2) have fixed supports, denoted by $T$ and $S$, with cardinalities $|T| = k$ and $|S| = s$. In the remainder of the paper, $x_T$ is the restriction of $x$ to indices in $T$ and $f_S$ is the restriction of $f$ to $S$. Our main assumption here concerns the sign sequences: the sign sequences of $x$ and $f$ are independent of each other, and each is a sequence of symmetric iid random variables.
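The two distributional properties are easy to check for the partial-DFT example mentioned above. The sketch below (sizes arbitrary) verifies isotropy by averaging $a a^*$ over all candidate rows, and reads off the incoherence parameter directly, since every DFT entry has unit modulus.

```python
import numpy as np

# Model 2 sanity check for DFT rows a_k(j) = exp(2*pi*i*k*j/n):
# isotropy E[a a*] = I (verified exactly by averaging over all n rows),
# and incoherence with mu = 1, since |a_k(j)|^2 = 1 for every entry.
n = 8
jj, kk = np.meshgrid(np.arange(n), np.arange(n))
F = np.exp(2j * np.pi * jj * kk / n)     # F[k] is the k-th candidate row

avg = sum(np.outer(F[i], F[i].conj()) for i in range(n)) / n
iso_ok = np.allclose(avg, np.eye(n))     # isotropy holds
mu = np.max(np.abs(F) ** 2)              # incoherence parameter (equals 1)
```

Because $\mu$ is a constant here, the DFT model sits at the most favorable end of the incoherence scale, which is why the discussion below singles it out.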
For the model above, the solution to (1.3), with the specified choice of $\lambda$, is exact with probability at least $1 - Cn^{-c}$, provided that $m \ge C_0\, \mu k \log^2 n$ and $s \le \gamma\, m/\mu$. Here $C$, $c$, $C_0$ and $\gamma$ are numerical constants.
Above, $x$ and $f$ have fixed supports and random signs. However, by a recent de-randomization technique first introduced in , exact recovery with random supports and fixed signs would also hold. We will explain this de-randomization technique in the proof of Theorem 1.3. In some specific models, such as rows sampled independently from the DFT matrix, $\mu$ is a numerical constant, which implies that the tolerable proportion of corruptions is also a constant. An open problem is whether Theorem 1.2 still holds in the case where $x$ and $f$ have both fixed supports and fixed signs. Another open problem is to know whether the result would hold under more general conditions on $\lambda$, as in , in the case where $f$ has both random support and random signs.
We emphasize that the sparsity condition $m \ge C_0\, \mu k \log^2 n$ is a little stronger than the optimal result available in the noise-free literature [9, 7], namely $m \ge C\, \mu k \log n$. The extra logarithmic factor appears to be important in the proof, as we will explain in Section 3, and a third open problem is whether or not it is possible to remove this factor.
Here we do not give a sensitivity analysis for the recovery procedure as we did for Model 1. In fact, by applying a method similar to that introduced in  to our argument in Section 3, a very good error bound can be obtained in the noisy case. However, this offers little technical novelty and would make the paper very long. We therefore decided to discuss only the noiseless case and to focus on the sampling rate and the corruption ratio.
1.3.3 MC from corrupted entries [Model 3]
We assume $M$ is of rank $r$ and write its reduced SVD as $M = U\Sigma V^*$, where $U, V \in \mathbb{R}^{n \times r}$ and $\Sigma \in \mathbb{R}^{r \times r}$. Let $\mu$ be the smallest quantity such that for all $i$,
$$\|U^* e_i\|^2 \le \frac{\mu r}{n}, \qquad \|V^* e_i\|^2 \le \frac{\mu r}{n}.$$
1. Fix an $n$ by $n$ matrix $B$, whose entries are either $1$ or $-1$.
2. Define $\Omega = \{(i,j) : \delta_{ij} = 1\}$ for a constant $p$ satisfying $0 < p \le 1$. Specifically speaking, the $\delta_{ij}$ are iid Bernoulli random variables with parameter $p$.
3. Conditioning on $\Omega$, assume that the events $\{(i,j) \in S\}$ for $(i,j) \in \Omega$ are independent with $\mathbb{P}((i,j) \in S \mid (i,j) \in \Omega) = \tau$. This implies that $S \subset \Omega$.
4. Define $\Omega' = \Omega \setminus S$. Then we have $\mathbb{P}((i,j) \in \Omega') = p(1 - \tau)$.
5. Let $Z$ be supported on $S$, with $\operatorname{sgn}(Z_{ij}) = B_{ij}$ for all $(i,j) \in S$.
Under Model 3.1, suppose $p \ge C_0\, \mu r \log^2 n / n$ and $\tau \le \tau_0$. Moreover, suppose $\lambda$ is as specified above and denote by $(\hat{M}, \hat{Z})$ the optimal solution to problem (1.6). Then we have $\hat{M} = M$ with probability at least $1 - Cn^{-c}$ for some numerical constant $C$, provided the numerical constant $\tau_0$ is sufficiently small and $C_0$ is sufficiently large.
In this model, $p$ is available while $\Omega$, $S$ and $\tau$ are not known explicitly from the observation $Y$. By the assumption that $\tau$ is small, we can use $p$ to approximate $p(1 - \tau)$. From the following proof we can see that this parameter is not required to be known exactly for exact recovery. The power of our result is that one can recover a low-rank matrix from a nearly minimal number of samples even when a constant proportion of these samples has been corrupted.
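For intuition about completion from Bernoulli-sampled entries, here is a minimal sketch (all names, sizes, and the iteration count are illustrative assumptions). It uses a simple rank-projection heuristic, sometimes called hard-impute, on clean data; it is not the convex program (1.6) of the theorem and handles no corruptions, but it shows that a low-rank matrix is determined by a fraction of its entries.

```python
import numpy as np

# Hard-impute sketch: alternately fill in the unobserved entries and
# project onto rank-r matrices. A cheap surrogate for matrix completion,
# not the nuclear-norm program analyzed in the text.
rng = np.random.default_rng(3)
n, r, p = 30, 2, 0.6                     # size, rank, sampling probability
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # rank-r truth
mask = rng.random((n, n)) < p            # Omega: Bernoulli(p) observed set

X = np.where(mask, M, 0.0)               # initialize with observed data
for _ in range(300):
    U, sv, Vt = np.linalg.svd(X, full_matrices=False)
    X = (U[:, :r] * sv[:r]) @ Vt[:r]     # best rank-r approximation
    X = np.where(mask, M, X)             # keep observed entries fixed
err = np.linalg.norm(X - M) / np.linalg.norm(M)
```

In this easy regime (rank 2, 60% of entries observed) the iteration drives the relative error on the unobserved entries close to zero.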
We only discuss the noiseless case for this model. In fact, by a method similar to , a suboptimal estimation error bound can be obtained by a slight modification of our argument. However, it is of little technical interest and falls short of the optimal result when the noise is large. There are other suboptimal results for matrix completion with noise, such as , but the error bound is not tight when the additional noise is small. We want to focus on the noiseless case in this paper and leave the problem with noise for future work.
The values of $\lambda$ above are chosen for the theoretical guarantees of exact recovery in Theorems 1.1, 1.2 and 1.3. In practice, $\lambda$ is usually chosen by cross-validation.
1.4 Comparison with existing results, related work and our contribution
In this section we will compare Theorems 1.1, 1.2 and 1.3
with existing results in the literature.
We begin with Model 1. In , Wright and Ma discussed a model where the sensing matrix has independent columns with a common mean and small normal perturbations. They proved exact recovery with high probability under their conditions, assuming the corruption has random signs. We note that since the authors of  considered a different model, which is motivated by , it may not be directly comparable with ours. However, for our motivation of CS with corruptions, we assume a symmetric distribution and obtain a better sampling rate.
A bit later, Laska et al.  and Li et al.  also studied this problem. By setting $\lambda = 1$, both papers establish that for Gaussian (or sub-Gaussian) sensing matrices $A$, if $k + s \le C\, m/\log(n/m)$, then the recovery is exact. This follows from the fact that $[A, I]$ obeys a restricted isometry property known to guarantee exact recovery of sparse vectors via $\ell_1$ minimization. Furthermore, the sparsity requirement on $x$ is the same as that found in the standard CS literature, namely, $k \le C\, m/\log(n/m)$. However, the result does not allow a positive fraction of corruptions: the tolerated fraction $s/m$ is at most on the order of $1/\log(n/m)$, which goes to zero as $m/n$ goes to zero.
As for Model 2, an interesting piece of work  (and later  on the noisy case) appeared during the preparation of this paper. These papers discuss models in which $A$ is formed by selecting rows from an orthogonal matrix $U$ with low incoherence parameter $\mu(U)$, which is the minimum value such that $|U_{ij}|^2 \le \mu(U)/n$ for all $i, j$. The main result states that selecting sufficiently many rows gives exact recovery under the following assumptions: 1) the rows of $A$ are chosen from the orthogonal matrix uniformly at random; 2) $x$ is a random signal with independent signs equally likely to be $1$ or $-1$; 3) the support of $f$ is chosen uniformly at random. (By the de-randomization technique introduced in  and used in , it would have been sufficient to assume that the signs of $x$ are independent and take on the values $\pm 1$ with equal probability.) Finally, the sparsity conditions required there are nearly optimal: the result is optimal up to an extra logarithmic factor, and the sparsity condition on $f$ is of course nearly optimal.
However, the model for $A$ does not include some models frequently discussed in the literature, such as subsampled tight or continuous frames. Against this background, a recent paper of Candès and Plan  considers a very general framework, which includes many common models in the literature. Theorem 1.2 in our paper is similar to Theorem 1 in . It assumes similar sparsity conditions, but is based on this much broader and more applicable model introduced in . Notice that we improve one of the conditions of  by a certain factor, while our result imposes another condition that is worse by the same factor. In , the parameter $\lambda$ depends upon $k$, while our $\lambda$ is only a function of $n$ and $m$. This is why the results differ, and we prefer a value of $\lambda$ that does not depend on $k$ because in some applications an accurate estimate of $k$ may be difficult to obtain. In addition, we use different techniques of proof, in which the clever golfing scheme of  is exploited.
Sparse approximation is another underdetermined linear-system problem, in which the dictionary matrix is typically assumed to be deterministic. Readers interested in this problem (which generally requires stronger sparsity conditions) may also want to study the recent paper  by Studer et al. There, the authors introduce a more general problem of the form $y = Ax + Bf$, and analyze the performance of $\ell_1$-recovery techniques using ideas which have been popularized under the name of generalized uncertainty principles in the basis pursuit and sparse approximation literature.
As for Model 3, Theorem 1.3 is a significant extension of the results presented in , in which the authors impose a much more stringent requirement on the corruptions. In a very recent and independent work , the authors consider a model where both $\Omega$ and $S$ are unions of stochastic and deterministic subsets, while we only assume the stochastic model. We refer interested readers to that paper for the details. However, considering only their results on stochastic $\Omega$ and $S$, a direct comparison shows that the number of samples we need is smaller than that in this reference, the difference being several logarithmic factors. In fact, the requirement on $p$ in our paper is optimal even for clean data in the MC literature. Finally, we want to emphasize that the random support assumption is essential in Theorem 1.3 when the rank is large. Examples can be found in .
We wish to close our introduction with a few words concerning the techniques of proof we shall use. The proof of Theorem 1.1 is based on the concept of restricted isometry, which is a standard technique in the literature of CS. However, our argument involves a generalization of the restricted isometry concept. The proofs of Theorems 1.2 and 1.3 are based on the golfing scheme, an elegant technique pioneered by David Gross , and later used in [32, 4, 7] to construct dual certificates. Our proof leverages results from . However, we contribute novel elements by finding an appropriate way to phrase sufficient optimality conditions, which are amenable to the golfing scheme. Details are presented in the following sections.
2 A Proof of Theorem 1.1
In the proof of Theorem 1.1, we will use the notation $P_T x$. Here $x$ is an $n$-dimensional vector, $T$ is a subset of $\{1, \ldots, n\}$, and we also use $T$ to represent the subspace of all $n$-dimensional vectors supported on $T$. Then $P_T x$ is the projection of $x$ onto the subspace $T$, which keeps the values of $x$ on the support $T$ and changes the other elements into zeros. In this section we use the floor-function notation $\lfloor \cdot \rfloor$ to represent the integer part of a real number.
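A minimal sketch of the projection notation (the function name is hypothetical, introduced only for illustration):

```python
import numpy as np

def project(T, x):
    """Return P_T x: keep the coordinates of x indexed by T, zero out the rest."""
    out = np.zeros_like(x)
    idx = sorted(T)
    out[idx] = x[idx]
    return out

x = np.array([3.0, -1.0, 4.0, 1.0, -5.0])
p = project({0, 2}, x)                   # equals (3, 0, 4, 0, 0)
```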
First we generalize the concept of the restricted isometry property (RIP)  for the convenience of proving our theorem. For any matrix $A$, define the RIP constant $\delta_{k,s}$ as the infimum value of $\delta$ such that
$$(1 - \delta)\left( \|x\|^2 + \|f\|^2 \right) \le \|Ax + f\|^2 \le (1 + \delta)\left( \|x\|^2 + \|f\|^2 \right)$$
holds for any $x$ with $\|x\|_0 \le k$ and $f$ with $\|f\|_0 \le s$.
For any $x_1, x_2$ and $f_1, f_2$ such that $\operatorname{supp}(x_1) \cap \operatorname{supp}(x_2) = \emptyset$, $\operatorname{supp}(f_1) \cap \operatorname{supp}(f_2) = \emptyset$, $\|x_i\|_0 \le k$ and $\|f_i\|_0 \le s$, we have
$$|\langle Ax_1 + f_1,\, Ax_2 + f_2 \rangle| \le \delta_{2k,2s}\, \sqrt{\|x_1\|^2 + \|f_1\|^2}\, \sqrt{\|x_2\|^2 + \|f_2\|^2}.$$
Proof First, we suppose $\|x_1\|^2 + \|f_1\|^2 = \|x_2\|^2 + \|f_2\|^2 = 1$. By the definition of $\delta_{2k,2s}$, and since the disjoint supports give $\|x_1 \pm x_2\|^2 + \|f_1 \pm f_2\|^2 = 2$, we have
$$2(1 - \delta_{2k,2s}) \le \|A(x_1 \pm x_2) + (f_1 \pm f_2)\|^2 \le 2(1 + \delta_{2k,2s}).$$
By the above inequalities and the polarization identity, we have $|\langle Ax_1 + f_1, Ax_2 + f_2 \rangle| \le \delta_{2k,2s}$, and hence by homogeneity, we have the claim without the norm assumption.
Suppose the RIP constant $\delta_{2k,2s}$ of $A$ is sufficiently small and $\lambda$ lies between the stated bounds. Then for any $x$ with $\|x\|_0 \le k$, any $f$ with $\|f\|_0 \le s$, and any noise $z$ with $\|z\| \le \varepsilon$, the solution to the optimization problem (1.7) satisfies an error bound proportional to $\varepsilon$.
Proof Suppose and . Then by (1.7) we have
It is easy to check that the original satisfies the inequality constraint in (1.7), so we have
Then it suffices to show .
Suppose with such that . Denote where and . Moreover, suppose contains the indices of the largest (in the sense of absolute value) coefficients of , contains the indices of the largest coefficients of , and so on. Similarly, define such that and , and divide in the same way. By this setup, we easily have
On the other hand, by the assumption and , we have,
By the definition of , the fact and Lemma 2.2, we have
Therefore, by , we have
We now cite a well-known result in the literature of CS, e.g. Theorem 5.2 of .
Suppose $A$ is a random matrix as defined in Model 1. Then for any $\delta > 0$, there exist numerical constants $C, c, C' > 0$ such that, with probability at least $1 - Ce^{-cm}$,
$$(1 - \delta)\|x\|^2 \le \|Ax\|^2 \le (1 + \delta)\|x\|^2$$
holds universally for any $x$ with $\|x\|_0 \le C'\, m/\log(n/m)$.
Let $G$ be an $m \times n$ matrix whose entries are independent standard normal random variables. Then for every $t > 0$, with probability at least $1 - 2e^{-t^2/2}$, one has $\sigma_{\max}(G) \le \sqrt{m} + \sqrt{n} + t$.
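The singular-value bound above is easy to check empirically (a sketch; the sizes, seed, and deviation level `t` are arbitrary choices of mine):

```python
import numpy as np

# Empirical check of the Gaussian operator-norm bound: for an m x n matrix
# with iid N(0, 1) entries, sigma_max <= sqrt(m) + sqrt(n) + t holds with
# probability at least 1 - 2 exp(-t^2 / 2).
rng = np.random.default_rng(4)
m, n, t = 200, 50, 5.0
G = rng.standard_normal((m, n))
sigma_max = np.linalg.svd(G, compute_uv=False)[0]   # largest singular value
bound = np.sqrt(m) + np.sqrt(n) + t
```

For these dimensions `sigma_max` concentrates near $\sqrt{m} + \sqrt{n} \approx 21.2$, comfortably below the bound of about $26.2$.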
We now prove Theorem 1.1:
Proof Suppose , are two constants independent of and , and their values will be specified later. Set and . We want to bound the RIP-constant for the matrix when is sufficiently small. For any with and with , and any with , any with , we have
By Lemma 2.4, assuming , with probability at least we have
holds universally for any such and .
Now we fix and , and we want to bound . By Lemma 2.5, we actually have
with probability at least .
Then with probability at least , inequality (2.8) holds universally for any satisfying and satisfying . By , we have , where only depends on and as , and hence
Similarly, because , we have , where only depends on and as , and hence . Therefore, inequality (2.8) holds universally for any such
and with probability at least .
Combined with (2.7), we have