1 Introduction
1.1 Introduction on Compressed Sensing with Corruptions
Compressed sensing (CS) has been well studied in recent years [9, 19]. This novel theory asserts that a sparse or approximately sparse signal can be acquired from just a few nonadaptive linear measurements. This fact has numerous consequences which are being explored in a number of fields of applied science and engineering. In CS, the acquisition procedure is often represented as $y = Ax$, where $A \in \mathbb{R}^{m \times n}$ is called the sensing matrix and $y \in \mathbb{R}^m$ is the vector of measurements or observations. It is now well established that the solution $\hat{x}$ to the optimization problem

(1.1) $\min_{\tilde{x}} \|\tilde{x}\|_1$ subject to $A\tilde{x} = y$

is guaranteed to be the original signal $x$ with high probability, provided $x$ is sufficiently sparse and $A$ obeys certain conditions. A typical result is this: if $A$ has iid Gaussian entries, then exact recovery occurs provided $m \ge C k \log(n/k)$ [10, 18, 37] for some positive numerical constant $C$, where $k$ denotes the number of nonzero entries of $x$. Here is another example: if $A$ is a matrix with rows randomly selected from the DFT matrix, the condition becomes $m \ge C k \log n$ [9].

This paper discusses a natural generalization of CS, which we shall refer to as compressed sensing with corruptions. We assume that some entries of the data vector are totally corrupted, but we have absolutely no idea which entries are unreliable. We still want to recover the original signal efficiently and accurately. Formally, we have the mathematical model
(1.2) $y = Ax + f$,
where $x \in \mathbb{R}^n$ and $f \in \mathbb{R}^m$. The number of nonzero coefficients in $x$ is $k$, and similarly $s$ for $f$. As in the above model, $A$ is an $m \times n$ sensing matrix, usually sampled from a probability distribution. To recover $(x, f)$, a natural convex program is

(1.3) $\min_{\tilde{x}, \tilde{f}} \|\tilde{x}\|_1 + \lambda \|\tilde{f}\|_1$ subject to $A\tilde{x} + \tilde{f} = y$.

The problem of recovering $x$ (and hence $f$) from $y$ has recently been studied in the literature in connection with some interesting applications. We discuss a few of them.
Clipping. Signal clipping frequently appears because of nonlinearities in the acquisition device [27, 38]. Here, one typically measures $g(Ax)$ rather than $Ax$, where $g$ is a nonlinear map. Letting $f = g(Ax) - Ax$, we thus observe $y = Ax + f$. Nonlinearities usually occur at large amplitudes, so for those components with small amplitudes we have $g((Ax)_i) = (Ax)_i$. This means that $f$ is sparse and, therefore, our model is appropriate. Just as before, locating the portion of the data vector that has been clipped may be difficult because of additional noise.
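To make this concrete, here is a small numpy sketch (all parameters, including the clip level, are hypothetical choices for illustration): clipping saturates only the largest-amplitude measurements, so the induced corruption vector $f = g(Ax) - Ax$ is sparse.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 200, 400, 5

# k-sparse signal and a Gaussian sensing matrix
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.normal(size=k)
A = rng.normal(size=(m, n)) / np.sqrt(m)

Ax = A @ x
c = 0.9 * np.abs(Ax).max()       # clip level: only the largest measurements saturate
y = np.clip(Ax, -c, c)           # nonlinear acquisition g(Ax)
f = y - Ax                       # induced corruption vector

print(np.count_nonzero(f), "of", m, "measurements corrupted")
```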

CS for networked data. In a sensor network, different sensors collect measurements of the same signal independently (sensor $i$ measures $\langle a_i, x \rangle$) and send the outcome to a central hub for analysis [23, 30]. Setting the $a_i^*$ as the row vectors of $A$, this is just $y = Ax$. However, typically some sensors will fail to send the measurements correctly, and will sometimes report totally meaningless measurements. Therefore, we collect $y = Ax + f$, where $f$ models recording errors.
1.2 Introduction on Matrix Completion with Corruptions
Matrix completion (MC) bears some similarity with CS. Here, the goal is to recover a low-rank matrix $M$ from a small fraction of its entries. For simplicity, we suppose the matrix is square, $M \in \mathbb{R}^{n \times n}$ (the general case is similar). The standard model is that we observe $P_\Omega(M)$, where $\Omega \subset [n] \times [n]$ and $P_\Omega$ is the operator that keeps the entries in $\Omega$ and sets the others to zero. The problem is to recover the original matrix $M$, and there have been many papers studying this problem in recent years; see [33, 8, 12, 26, 21], for example. Here one minimizes the nuclear norm (the sum of all the singular values [20]) to recover the original low-rank matrix. We discuss below an improved result due to Gross [21] (with a slight difference). Define $\Omega \sim \mathrm{Ber}(p)$ for some $p \in (0, 1]$, meaning that the indicators $\mathbf{1}_{\{(i,j) \in \Omega\}}$ are iid Bernoulli random variables with parameter $p$. Then the solution to

(1.4) $\min_{L} \|L\|_*$ subject to $P_\Omega(L) = P_\Omega(M)$

is guaranteed to be exactly $M$ with high probability, provided $p \ge C \mu r \log^2 n / n$. Here, $C$ is a positive numerical constant, $r$ is the rank of $M$, and $\mu$ is an incoherence parameter introduced in [8] which depends only on $M$.
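For intuition, the Bernoulli sampling model and the nuclear norm can be sketched in a few lines of numpy (dimensions and the sampling rate are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, p = 50, 2, 0.5

# A random rank-r matrix M = U V^T
U = rng.normal(size=(n, r))
V = rng.normal(size=(n, r))
M = U @ V.T

# Omega ~ Ber(p): each entry is observed independently with probability p
mask = rng.random((n, n)) < p
P_Omega_M = np.where(mask, M, 0.0)   # the observation P_Omega(M)

# Nuclear norm: the sum of the singular values
nuclear = np.linalg.svd(M, compute_uv=False).sum()
print("rank:", np.linalg.matrix_rank(M), " nuclear norm:", round(float(nuclear), 2))
```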
This paper is concerned with the situation in which some entries may
have been corrupted. Therefore, our model is that we observe
(1.5) $P_\Omega(M + S)$,
where $\Omega$ and $P_\Omega$ are the same as before and $S$ is supported on $\Omega$. Just as in CS, this model has broad applicability. For example, Wu et al. used this model in photometric stereo [42]. This problem has also been introduced in [4] and is related to recent work on separating a low-rank component from a sparse component [14, 4, 24, 13, 43]. A typical result is that the solution $(\hat{L}, \hat{S})$ to
(1.6) $\min_{L, S} \|L\|_* + \lambda \|S\|_1$ subject to $P_\Omega(L + S) = P_\Omega(M + S)$
is guaranteed to be the true pair $(M, S)$ with high probability under some assumptions on $M$, $S$ and $\Omega$ [4, 16]. We will compare these results with ours in Section 1.4.
1.3 Main results
This section introduces three models and three corresponding recovery results. The proofs of these results are deferred to Section 2 for Theorem 1.1, Section 3 for Theorem 1.2 and Section 4 for Theorem 1.3.
1.3.1 CS with iid matrices [Model 1]
Theorem 1.1
Suppose that $A$ is an $m \times n$ random matrix whose entries are iid Gaussian variables with mean $0$ and variance $1/m$, the signal to acquire is $x \in \mathbb{R}^n$, and our observation is $y = Ax + f + e$, where $f$ is the corruption vector and $\|e\|_2 \le \varepsilon$. Then, with a suitable numerical-constant choice of $\lambda$, the solution $(\hat{x}, \hat{f})$ to

(1.7) $\min_{\tilde{x}, \tilde{f}} \|\tilde{x}\|_1 + \lambda \|\tilde{f}\|_1$ subject to $\|A\tilde{x} + \tilde{f} - y\|_2 \le \varepsilon$

satisfies $\|\hat{x} - x\|_2 + \|\hat{f} - f\|_2 \le C\varepsilon$ with probability at least $1 - Ce^{-cm}$. This holds universally; that is to say, for all vectors $x$ and $f$ obeying $\|x\|_0 \le \alpha m / \log(n/m)$ and $\|f\|_0 \le \beta m$. Here $\alpha$, $\beta$, $C$ and $c$ are numerical constants.
In the above statement, the matrix $A$ is random; everything else is deterministic. The reader will notice that the allowed number of nonzero entries of $x$ is on the same order as that needed for recovery from clean data [10, 19, 3, 37], while the condition on $f$ implies that one can tolerate a constant fraction of possibly adversarial errors. Moreover, our convex program is related to the LASSO [35] and Basis Pursuit [15].
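To illustrate this connection, here is a plain proximal-gradient (ISTA) sketch for the LASSO-type relaxation $\min \tfrac{1}{2}\|Ax + f - y\|_2^2 + \mu(\|x\|_1 + \lambda\|f\|_1)$; this is only an illustrative stand-in for the constrained program in the theorem, and the parameters $\mu$, $\lambda$ and the iteration count are ad hoc choices:

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

rng = np.random.default_rng(2)
m, n, k, s = 80, 200, 4, 3

# k-sparse signal, s-sparse corruption, Gaussian sensing matrix
x0 = np.zeros(n); x0[rng.choice(n, k, replace=False)] = rng.choice([-1.0, 1.0], k)
f0 = np.zeros(m); f0[rng.choice(m, s, replace=False)] = rng.choice([-3.0, 3.0], s)
A = rng.normal(size=(m, n)) / np.sqrt(m)
y = A @ x0 + f0                       # corrupted, noiseless measurements

lam, mu = 1.0, 0.01
L = np.linalg.norm(A, 2) ** 2 + 1.0   # Lipschitz bound for the augmented system [A, I]
x, f = np.zeros(n), np.zeros(m)
for _ in range(3000):                 # plain ISTA on the pair (x, f)
    r = A @ x + f - y                 # residual
    x = soft(x - (A.T @ r) / L, mu / L)
    f = soft(f - r / L, mu * lam / L)

print("relative x-error:", np.linalg.norm(x - x0) / np.linalg.norm(x0))
```

The gradient of the smooth term with respect to $x$ is $A^\top r$ and with respect to $f$ is $r$, so each step is a gradient step followed by coordinate-wise soft-thresholding.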
1.3.2 CS with general sensing matrices [Model 2]
In this model, $y = Ax + f$ and $A = [a_1, \dots, a_m]^*$, where the $a_i$ are iid copies of a random vector $a$ whose distribution obeys the following two properties: 1) isotropy, $\mathbb{E}[a a^*] = I$; 2) incoherence, $\max_{1 \le i \le n} |a(i)|^2 \le \mu$. This model has been introduced in [7] and includes many of the stochastic models used in the literature. Examples include partial DFT matrices, matrices with iid entries, certain random convolutions [34] and so on.
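As a sanity check on the two properties for one concrete example: the rows of the $n \times n$ DFT matrix, scaled so every entry has unit modulus (our scaling convention for this sketch), are exactly isotropic when averaged uniformly, with incoherence parameter $\mu = 1$:

```python
import numpy as np

n = 8
idx = np.arange(n)
# DFT matrix with unit-modulus entries (no 1/sqrt(n) factor in this convention)
W = np.exp(-2j * np.pi * np.outer(idx, idx) / n)

# Isotropy: E[a a^*] over a uniformly chosen row a equals the identity,
# since (1/n) sum_r exp(-2*pi*i*r*(k-l)/n) = delta_{kl}
cov = sum(np.outer(W[r], W[r].conj()) for r in range(n)) / n

# Incoherence: every entry has squared modulus exactly 1, so mu = 1
mu = np.max(np.abs(W) ** 2)
print("isotropic:", np.allclose(cov, np.eye(n)), " mu =", mu)
```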
In this model, we assume that $x$ and $f$ in (1.2) have fixed supports, denoted by $T$ and $\Omega$, with cardinalities $k$ and $s$. In the remainder of the paper, $x_T$ is the restriction of $x$ to indices in $T$ and $f_\Omega$ is the restriction of $f$ to $\Omega$. Our main assumption here concerns the sign sequences: the sign sequences of $x$ and $f$ are independent of each other, and each is a sequence of symmetric iid variables.
Theorem 1.2
For the model above, the solution to (1.3), with a suitable choice of $\lambda$, is exact with probability at least $1 - Cn^{-c}$, provided that $m \ge C_0 \mu k \log^2 n$ and that the number of corruptions $s$ is at most a fraction of $m$ depending on $\mu$. Here $C$, $c$ and $C_0$ are numerical constants.
Above, $x$ and $f$ have fixed supports and random signs. However, by a recent derandomization technique first introduced in [4], exact recovery with random supports and fixed signs also holds. We will explain this derandomization technique in the proof of Theorem 1.3. In some specific models, such as independent rows from the DFT matrix, $\mu$ is a numerical constant, which implies that the tolerable proportion of corruptions is also a constant. An open problem is whether Theorem 1.2 still holds in the case where $x$ and $f$ have both fixed supports and fixed signs. Another open problem is whether the result would hold under more general conditions on $f$, as in [6], in the case where $f$ has both random support and random signs.
We emphasize that this sparsity condition is a little stronger than the optimal result available in the noise-free literature [9, 7], namely $m \ge C \mu k \log n$. The extra logarithmic factor appears to be important in the proof, which we will explain in Section 3, and a third open problem is whether or not it is possible to remove this factor.
Here we do not give a sensitivity analysis for the recovery procedure as in Model 1. Actually, by applying a method similar to that introduced in [7] to our argument in Section 3, a very good error bound could be obtained in the noisy case. However, there is little technical novelty in doing so, and it would make the paper very long. We therefore discuss only the noiseless case and focus on the sampling rate and corruption ratio.
1.3.3 MC from corrupted entries [Model 3]
We assume $M$ is of rank $r$ and write its reduced SVD as $M = U \Sigma V^*$, where $U, V \in \mathbb{R}^{n \times r}$ and $\Sigma \in \mathbb{R}^{r \times r}$. Let $\mu$ be the smallest quantity such that, for all $1 \le i \le n$,
$\|U^* e_i\|_2^2 \le \mu r / n$ and $\|V^* e_i\|_2^2 \le \mu r / n$.
This model is the same as that originally introduced in [8], and later used in [21, 32, 12, 4, 16]. We observe $P_{\Omega_0}(M + S)$, where $\Omega_0$ is the set of revealed entries and $S$ is supported on a subset $\Gamma \subset \Omega_0$. Here we assume that $\Omega_0$, $\Gamma$ and $S$ satisfy the following model:
Model 3.1:
1. Fix an $n$ by $n$ matrix $B$, whose entries are either $+1$ or $-1$.
2. Define $\Omega_0 \sim \mathrm{Ber}(p_0)$ for a constant $p_0$ satisfying $0 < p_0 \le 1$. Specifically speaking, the indicators $\mathbf{1}_{\{(i,j) \in \Omega_0\}}$ are iid Bernoulli random variables with parameter $p_0$.
3. Conditioning on $\Omega_0$, assume that the events that each revealed entry is corrupted are independent, each with probability $\tau$. This implies that the set of corrupted locations $\Gamma$ obeys $\Gamma \sim \mathrm{Ber}(p_0 \tau)$.
4. Define $\Omega = \Omega_0 \setminus \Gamma$. Then we have $\Omega \sim \mathrm{Ber}(p_0(1 - \tau))$.
5. Let $S$ be supported on $\Gamma$, with $\mathrm{sgn}(S_{ij}) = B_{ij}$ for $(i,j) \in \Gamma$.
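The sampling scheme of Model 3.1 can be simulated directly; the sketch below follows our notation ($\Omega_0$ for revealed entries, $\tau$ for the corruption probability), with arbitrary illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p0, tau = 100, 0.3, 0.1

# Step 2: Omega_0 ~ Ber(p0): each entry is revealed independently
omega0 = rng.random((n, n)) < p0

# Step 3: conditioned on Omega_0, each revealed entry is corrupted
# independently with probability tau
corrupted = omega0 & (rng.random((n, n)) < tau)

# Step 4: Omega = the clean revealed entries, so Omega ~ Ber(p0 * (1 - tau))
omega = omega0 & ~corrupted

print("revealed:", omega0.mean(), " clean:", omega.mean(), " corrupted:", corrupted.mean())
```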
Theorem 1.3
Under Model 3.1, suppose $p_0 \ge C_0 \mu r \log^2 n / n$ and $\tau \le \tau_0$. Moreover, with a suitable choice of $\lambda$, denote by $(\hat{L}, \hat{S})$ the optimal solution to problem (1.6). Then we have $\hat{L} = M$ with probability at least $1 - Cn^{-c}$ for some numerical constants $C$ and $c$, provided the numerical constant $\tau_0$ is sufficiently small and $C_0$ is sufficiently large.
In this model, $\Omega_0$ is available while $\Omega$, $\Gamma$ and $\tau$ are not known explicitly from the observation $P_{\Omega_0}(M + S)$. Since $\tau$ is assumed small, we can use $\Omega_0$ to approximate $\Omega$. From the following proof we can see that $\lambda$ is not required to take one exact value for exact recovery. The power of our result is that one can recover a low-rank matrix from a nearly minimal number of samples even when a constant proportion of these samples has been corrupted.
We only discuss the noiseless case for this model. Actually, by a method similar to [6], a suboptimal estimation error bound can be obtained by a slight modification of our argument. However, it is of little technical interest and falls short of the optimal result when the noise is large. There are other suboptimal results for matrix completion with noise, such as [1], but the error bound there is not tight when the additional noise is small. We want to focus on the noiseless case in this paper and leave the noisy problem for future work.

The values of $\lambda$ are chosen for the theoretical guarantee of exact recovery in Theorems 1.1, 1.2 and 1.3. In practice, $\lambda$ is usually chosen by cross validation.
1.4 Comparison with existing results, related work and our contribution
In this section we will compare Theorems 1.1, 1.2 and 1.3
with existing results in the literature.
We begin with Model 1. In [40], Wright and Ma discussed a model where the sensing matrix has independent columns with a common mean and small normal perturbations, the perturbation variance being much smaller than the magnitude of the mean. They proved that exact recovery holds with high probability under their assumptions, provided $f$ has random signs. We note that since the authors of [40] studied a different model, which is motivated by [41], it may not be directly comparable with ours. However, for our motivation of CS with corruptions, we assume the signs of $x$ and $f$ are symmetric and obtain a better sampling rate.
A bit later, Laska et al. [28] and Li et al. [29] also studied this problem. By setting $\lambda = 1$, both papers establish that for Gaussian (or sub-Gaussian) sensing matrices $A$, if $m \ge C(k + s)\log(n/(k + s))$, then the recovery is exact. This follows from the fact that $[A, I]$ obeys a restricted isometry property known to guarantee exact recovery of sparse vectors via $\ell_1$ minimization. Furthermore, the sparsity requirement on $x$ is the same as that found in the standard CS literature, namely $m \ge Ck\log(n/k)$. However, the result does not allow a positive fraction of corruptions: since the corruptions are counted together with the sparse signal, the tolerable ratio $s/m$ is at most of order $1/\log(n/m)$, which goes to zero as $m/n$ goes to zero.
As for Model 2, an interesting piece of work [30] (and later [31] on the noisy case) appeared during the preparation of this paper. These papers discuss models in which $A$ is formed by selecting rows from an orthogonal matrix $U$ with low incoherence parameter $\mu(U)$, the minimum value such that $n|U_{ij}|^2 \le \mu(U)$ for any $i, j$. The main result states that selecting $m \ge C\mu k \log^2 n$ rows gives exact recovery under the following assumptions: 1) the rows of $A$ are chosen from the orthogonal matrix uniformly at random; 2) $x$ is a random signal with independent signs, each equally likely to be $+1$ or $-1$; 3) the support of $f$ is chosen uniformly at random. (By the derandomization technique introduced in [4] and used in [30], it would have been sufficient to assume that the signs of $x$ are independent and take on the values $\pm 1$ with equal probability.) Finally, the sparsity conditions on $k$ and $s$ are nearly optimal: the condition on $k$ is optimal up to an extra logarithmic factor, and the condition on $s$ is of course nearly optimal.

However, the model for $A$ does not include some models frequently discussed in the literature, such as subsampled tight or continuous frames. Against this background, a recent paper of Candès and Plan [7] considers a very general framework, which includes many common models in the literature. Theorem 1.2 in our paper is similar to Theorem 1 in [30]: it assumes similar sparsity conditions, but it is based on the much broader and more applicable model introduced in [7]. Notice that our condition on the number of corruptions improves on the one in [30] by a factor of $\mu$, which is always at least $1$ and can be large; however, our sparsity condition on $k$ is worse than theirs by the same factor. In [30], the parameter $\lambda$ depends upon $s$, while ours does not. This is why the results differ, and we prefer to use a value of $\lambda$ that does not depend on $s$ because in some applications an accurate estimate of $s$ may be difficult to obtain. In addition, we use different techniques of proof, in which the clever golfing scheme of [21] is exploited.
Sparse approximation is another underdetermined linear-system problem, in which the dictionary matrix is usually assumed to be deterministic. Readers interested in this problem (which typically requires stronger sparsity conditions) may also want to study the recent paper [38] by Studer et al. There, the authors introduce a more general problem of the form $y = Ax + Bf$, and analyze the performance of recovery techniques using ideas which have been popularized under the name of generalized uncertainty principles in the basis pursuit and sparse approximation literature.
As for Model 3, Theorem 1.3 is a significant extension of the results presented in [4], in which the authors have the stringent requirement that the number of observed entries be at least a constant fraction of $n^2$. In a very recent and independent work [16], the authors consider a model where both the set of observed entries and the support of $S$ are unions of stochastic and deterministic subsets, while we only assume the stochastic model. We refer interested readers to that paper for the details. However, considering only their results on stochastic supports, a direct comparison shows that the number of samples we need is smaller than that in this reference; the difference is several logarithmic factors. Actually, the requirement on $p_0$ in our paper is optimal even for clean data in the literature of MC. Finally, we want to emphasize that the random support assumption is essential in Theorem 1.3 when the rank is large. Examples can be found in [24].
We wish to close our introduction with a few words concerning the techniques of proof we shall use. The proof of Theorem 1.1 is based on the concept of restricted isometry, a standard technique in the CS literature; however, our argument involves a generalization of the restricted isometry concept. The proofs of Theorems 1.2 and 1.3 are based on the golfing scheme, an elegant technique pioneered by David Gross [21] and later used in [32, 4, 7] to construct dual certificates. Our proof leverages results from [4]; however, we contribute novel elements by finding an appropriate way to phrase sufficient optimality conditions that are amenable to the golfing scheme. Details are presented in the following sections.
2 A Proof of Theorem 1.1
In the proof of Theorem 1.1, we will use the notation $P_S v$. Here $v$ is an $n$-dimensional vector and $S$ is a subset of $\{1, \dots, n\}$; we also use $S$ to represent the subspace of all $n$-dimensional vectors supported on $S$. Then $P_S v$ is the projection of $v$ onto the subspace $S$; that is, it keeps the values of $v$ on the support $S$ and changes the other elements into zeros. In this section we use the floor-function notation $\lfloor \cdot \rfloor$ to represent the integer part of a real number.
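In code, this projection is a one-liner (a trivial numpy sketch with a hypothetical helper name):

```python
import numpy as np

def P(S, v):
    """Projection P_S: keep the entries of v indexed by S, zero out the rest."""
    out = np.zeros_like(v)
    idx = list(S)
    out[idx] = v[idx]
    return out

v = np.array([3.0, -1.0, 4.0, -1.5, 5.0])
print(P({0, 2}, v))   # only coordinates 0 and 2 survive
```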
First we generalize the concept of the restricted isometry property (RIP) [11] for the convenience of proving our theorem:
Definition 2.1
For any $m \times n$ matrix $A$, define the RIP constant $\delta_{k,s}(A)$ as the infimum value of $\delta$ such that
$(1 - \delta)\big(\|x\|_2^2 + \|f\|_2^2\big) \le \|Ax + f\|_2^2 \le (1 + \delta)\big(\|x\|_2^2 + \|f\|_2^2\big)$
holds for any $x$ with $\|x\|_0 \le k$ and $f$ with $\|f\|_0 \le s$.
Lemma 2.2
For any $x, x'$ with disjoint supports satisfying $\|x\|_0 + \|x'\|_0 \le k$, and any $f, f'$ with disjoint supports satisfying $\|f\|_0 + \|f'\|_0 \le s$, we have
$|\langle Ax + f, Ax' + f' \rangle| \le \delta_{k,s}(A) \sqrt{\|x\|_2^2 + \|f\|_2^2} \sqrt{\|x'\|_2^2 + \|f'\|_2^2}.$
Proof First, we suppose $\|x\|_2^2 + \|f\|_2^2 = \|x'\|_2^2 + \|f'\|_2^2 = 1$. By the definition of $\delta_{k,s}(A)$, we have
$\|A(x + x') + (f + f')\|_2^2 \le 2(1 + \delta_{k,s}(A))$
and
$\|A(x - x') + (f - f')\|_2^2 \ge 2(1 - \delta_{k,s}(A)).$
By the above inequalities and the parallelogram identity, we have $\langle Ax + f, Ax' + f' \rangle \le \delta_{k,s}(A)$, and hence, by homogeneity, the conclusion holds without the norm assumption.
Lemma 2.3
Suppose $A$ has a sufficiently small RIP constant (at sparsity levels proportional to $k$ and $s$) and $\lambda$ lies in a suitable fixed range. Then for any $x$ with $\|x\|_0 \le k$, any $f$ with $\|f\|_0 \le s$, and any $e$ with $\|e\|_2 \le \varepsilon$, the solution $(\hat{x}, \hat{f})$ to the optimization problem (1.7) satisfies $\|\hat{x} - x\|_2 + \|\hat{f} - f\|_2 \le C\varepsilon$.
Proof Suppose $\hat{x} = x + h$ and $\hat{f} = f + g$. Then by (1.7) we have $\|A\hat{x} + \hat{f} - y\|_2 \le \varepsilon$.
It is easy to check that the original pair $(x, f)$ satisfies the inequality constraint in (1.7), so by optimality we have
(2.1) $\|x + h\|_1 + \lambda \|f + g\|_1 \le \|x\|_1 + \lambda \|f\|_1.$
Then it suffices to show $\|h\|_2 + \|g\|_2 \le C\varepsilon$.
Suppose $T_0 = \mathrm{supp}(x)$, so that $|T_0| \le k$. Denote $h = \sum_{j \ge 0} h_{T_j}$, where the $T_j$ are disjoint and each has size at most $k$. Moreover, suppose $T_1$ contains the indices of the $k$ largest (in the sense of absolute value) coefficients of $h_{T_0^c}$, $T_2$ contains the indices of the next $k$ largest coefficients, and so on. Similarly, define $\Omega_0 = \mathrm{supp}(f)$ with $|\Omega_0| \le s$, and divide $g$ in the same way into blocks of size $s$. By this setup, we easily have
(2.2) $\sum_{j \ge 2} \|h_{T_j}\|_2 \le \|h_{T_0^c}\|_1 / \sqrt{k}$
and
(2.3) $\sum_{j \ge 2} \|g_{\Omega_j}\|_2 \le \|g_{\Omega_0^c}\|_1 / \sqrt{s}.$
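The inequalities (2.2) and (2.3) rest on the standard "shelling" bound: every coefficient in a block is dominated by the average magnitude over the previous block, whence $\|h_{T_{j+1}}\|_2 \le \|h_{T_j}\|_1 / \sqrt{k}$. This elementary fact can be checked numerically (an illustrative numpy sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 1000, 25
h = rng.standard_normal(n)

# Sort indices by decreasing |h| and cut into consecutive blocks of size k
order = np.argsort(-np.abs(h))
blocks = [h[order[i:i + k]] for i in range(0, n, k)]

# Shelling bound: ||h_{T_{j+1}}||_2 <= ||h_{T_j}||_1 / sqrt(k), since every
# entry of the next block is at most the average magnitude over the current one
ok = all(np.linalg.norm(nxt) <= np.linalg.norm(cur, 1) / np.sqrt(k) + 1e-12
         for cur, nxt in zip(blocks, blocks[1:]))
print("shelling bound holds for all", len(blocks) - 1, "pairs:", ok)
```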
On the other hand, by the assumption and , we have,
(2.4) 
and similarly,
(2.5) 
By inequalities (2.1), (2.4) and (2.5), we have
(2.6) 
By the definition of , the fact and Lemma 2.2, we have
Moreover, since
we have
Therefore, by , we have
Since
we have
We now cite a well-known result from the CS literature, e.g. Theorem 5.2 of [3].
Lemma 2.4
Suppose $A$ is a random matrix as defined in Model 1. Then for any $\delta > 0$, there exist constants $c_1, c_2 > 0$ such that with probability at least $1 - 2e^{-c_2 m}$,
$(1 - \delta)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta)\|x\|_2^2$
holds universally for any $x$ with $\|x\|_0 \le c_1 m / \log(n/m)$.
Also, we cite a well-known result which bounds the largest singular value of a random matrix, e.g. [17] and [39].
Lemma 2.5
Let $B$ be an $m \times n$ matrix whose entries are independent standard normal random variables. Then for every $t \ge 0$, with probability at least $1 - e^{-t^2/2}$, one has $s_{\max}(B) \le \sqrt{m} + \sqrt{n} + t$.
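Lemma 2.5 is easy to probe empirically (a single random draw in numpy, for illustration only; the choice $t = 3$ makes the exceptional probability about $e^{-4.5} \approx 0.01$):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, t = 200, 300, 3.0

B = rng.standard_normal((m, n))
s_max = np.linalg.norm(B, 2)          # largest singular value

# Lemma 2.5: s_max <= sqrt(m) + sqrt(n) + t with probability >= 1 - exp(-t^2/2)
bound = np.sqrt(m) + np.sqrt(n) + t
print(f"s_max = {s_max:.2f}, bound = {bound:.2f}")
```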
We now prove Theorem 1.1:
Proof
Suppose , are two constants independent of and , and their values will be specified later. Set and . We
want to bound the RIPconstant for the
matrix when is sufficiently small. For any with and with , and
any with , any with , we have
By Lemma 2.4, assuming , with probability at least we have
(2.7) 
holds universally for any such and .
Now we fix and , and we want to bound . By
Lemma 2.5, we actually have
(2.8) 
with probability at least .
Then with probability at least , inequality (2.8) holds universally for any satisfying and satisfying . By , we have , where only depends on and as , and hence
.
Similarly, because , we have , where only depends on and as , and hence . Therefore, inequality (2.8) holds universally for any such
and with probability at least .
Combined with (2.7), we have