I Introduction
Let be a low-rank matrix [], and let be sparse (, counts the nonzero entries of its matrix argument). Given a compression matrix with , and observations
(1) 
the present paper deals with the recovery of . This task is of interest, e.g., to unveil anomalous flows in backbone networks [LCD04, MMG11, zggr05], to extract the time-varying foreground from a sequence of compressed video frames [Branuick_nips11], or to identify active brain regions from undersampled functional magnetic resonance imagery (fMRI) [Vaswani_Allerton_11]. In addition, this fundamental problem lies at the crossroads of compressive sampling (CS) and the timely low-rank-plus-sparse matrix decompositions.
In the absence of the low-rank component (), one is left with an underdetermined sparse signal recovery problem; see e.g., [CT05, rauhut] and the tutorial account [candes_tutorial]. When
, the formulation boils down to principal components pursuit (PCP), also referred to as robust principal component analysis (PCA)
[CLMW09, CSPW11, bayes_rpca]. For this idealized noise-free setting, sufficient conditions for exact recovery are available for both of the aforementioned special cases. However, the superposition of a low-rank and a compressed sparse matrix in (1) further challenges identifiability of . In the presence of ‘dense’ noise, stable reconstruction of the low-rank and sparse matrix components is possible via PCP [zlwcm10, Outlier_pursuit]. Earlier efforts dealing with the recovery of sparse vectors in noise led to similar performance guarantees; see e.g., [bickel09] and references therein. Even when is nonzero, one could envision a CS variant where the measurements are corrupted with correlated (low-rank) noise [Vaswani_Allerton_11]. Last but not least, when and is noisy, the recovery of subject to a rank constraint is nothing else than PCA, arguably the workhorse of high-dimensional data analysis
[J02]. The main contribution of this paper is to establish that, given and in (1), for small enough and one can exactly recover by solving the nonsmooth convex optimization problem
where is a tuning parameter; is the nuclear norm of ( stands for the
th singular value); and,
denotes the norm. The aforementioned norms are convex surrogates to the rank and the norm, respectively, which, albeit natural criteria, are NP-hard to optimize [l_0_NP_hard, rank_NP_Duro]. Recently, a greedy algorithm for recovering low-rank and sparse matrices from compressive measurements was put forth in [Branuick_nips11]. However, convergence of the algorithm and its error performance are only assessed via numerical simulations. A recursive algorithm capable of processing data in real time can be found in [Vaswani_Allerton_11], which attains good performance in practice but does not offer theoretical guarantees. A deterministic approach along the lines of [CSPW11] is adopted first to derive conditions under which (1) is locally identifiable (Section II). Introducing a notion of incoherence between the additive components and , and resorting to the restricted isometry constants of [CT05], sufficient conditions are obtained to ensure that (P1) succeeds in exactly recovering the unknowns (Section III-A). Intuitively, the results here assert that if and are sufficiently small, the nonzero entries of are sufficiently spread out, and subsets of columns of behave as isometries, then (P1) exactly recovers . As a byproduct, recovery results for PCP and CS are also obtained by specializing the aforesaid conditions accordingly (Section III-B). The proof of the main result builds on Lagrangian duality theory [Bers, Boyd] to first derive conditions under which is the unique optimal solution of (P1) (Section IV-A). In a nutshell, satisfaction of the optimality conditions is tantamount to the existence of a valid dual certificate. Stemming from the unique challenges introduced by , the dual certificate construction procedure of Section IV-B is markedly distinct from the direct sum approach in [CSPW11] and the (random) golfing scheme of [CLMW09].
Section V shows that lowrank, sparse, and compression matrices drawn from certain random ensembles satisfy the sufficient conditions for exact recovery with high probability.
Two iterative algorithms for solving (P1) are developed in Section VI, based on the accelerated proximal gradient (APG) method [nesterov83, nesterov05, fista, rpca_proximal] and the alternating-direction method of multipliers (ADMoM) [Bertsekas_Book_Distr, Boyd]. Numerical tests corroborate the exact recovery claims, and the effectiveness of (P1) in unveiling traffic volume anomalies from real network data (Section VII). Section VIII concludes the paper with a summary and a discussion of limitations, possible extensions, and interesting future directions. Technical details are deferred to the Appendix.
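As a flavor of such iterative solvers, the following is a minimal proximal-gradient sketch, not the paper's APG or ADMoM implementations. It assumes a model of the form Y = X0 + R A0 with X0 low rank and A0 sparse, and minimizes a penalized stand-in for (P1); all names (`lr_plus_cs`, `lam_star`, `lam_1`) are ours.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the prox operator of tau*||.||_*."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ (np.maximum(s - tau, 0.0)[:, None] * Vt)

def soft(M, tau):
    """Entrywise soft thresholding: the prox operator of tau*||.||_1."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def lr_plus_cs(Y, R, lam_star=1.0, lam_1=0.1, n_iter=500):
    """Proximal-gradient sketch for
       min_{X,A} lam_star*||X||_* + lam_1*||A||_1 + 0.5*||Y - X - R A||_F^2."""
    L, T = Y.shape
    F = R.shape[1]
    X, A = np.zeros((L, T)), np.zeros((F, T))
    eta = 1.0 / (1.0 + np.linalg.norm(R, 2) ** 2)  # step <= 1/Lipschitz
    for _ in range(n_iter):
        G = Y - X - R @ A                  # residual of the data-fit term
        X = svt(X + eta * G, eta * lam_star)
        A = soft(A + eta * (R.T @ G), eta * lam_1)
    return X, A
```

The two prox maps separate because the nonsmooth penalty is a sum of a nuclear-norm term in X and an ℓ1 term in A; APG would add a momentum step on top of the same updates.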
I-A Notational conventions
Bold uppercase (lowercase) letters will denote matrices (column vectors), and calligraphic letters will denote sets. Operators , , , , , , , and will denote transposition, matrix pseudoinverse, matrix trace, matrix vectorization, diagonal matrix, spectral radius, minimum singular value, and Kronecker product, respectively; will be used for the cardinality of a set and the magnitude of a scalar. The identity matrix will be represented by and its th column by ; while denotes the vector of all zeros, and . The norm of vector is for . For matrices, define the trace inner product . Also, recall that is the Frobenius norm, is the norm, is the norm, and is the nuclear norm. In addition, denotes the induced norm, and likewise for the induced norm . For the linear operator , define the operator norm , which subsumes the spectral norm . Define also the support set . The indicator function equals one when , and zero otherwise.
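As a quick numerical illustration of the matrix norms just recalled, consider the following sketch on a generic matrix (the example matrix is ours):

```python
import numpy as np

M = np.array([[3.0, 0.0],
              [4.0, 0.0]])          # singular values of M are {5, 0}

fro  = np.linalg.norm(M, 'fro')     # Frobenius norm: sqrt(sum of squares) = 5
ell1 = np.abs(M).sum()              # entrywise l1 norm = 7
spec = np.linalg.norm(M, 2)         # spectral norm: largest singular value = 5
nuc  = np.linalg.svd(M, compute_uv=False).sum()  # nuclear norm: sum of singular values = 5
```

For this rank-one example the Frobenius, spectral, and nuclear norms coincide; they differ as soon as the matrix has more than one nonzero singular value.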
II Local Identifiability
The first issue to address is model identifiability, meaning that there are unique low-rank and sparse matrices satisfying (1). If there exist multiple decompositions of into with low-rank and sparse , there is no hope of recovering from the data. For instance, if the null space of the fat matrix contains sparse matrices, there may exist a sparse perturbation such that is still sparse and is a legitimate solution. Another problematic case arises when there is a sparse perturbation such that is spanned by the row or column spaces of . Then, has the same rank as and may still be sparse. As a result, one may pick as another valid solution. Dealing with such identifiability issues is the subject of this section.
Let
denote the singular value decomposition (SVD) of
, and consider the subspaces: s1) of matrices in either the column or row space of ; s2) of matrices in with support contained in the support of ; and s3) . For notational brevity, s1)-s3) will be henceforth denoted as . Noteworthy properties of these subspaces are: i) both and , hence it is possible to directly compare elements from them; ii) and ; and iii) if is added to , then . For now, assume that the subspaces and are also known. This extra information aids identifiability of (1), because potentially troublesome solutions are limited to a restricted class. If or , that candidate solution is not admissible since it is known a priori that and . Under these assumptions, the following lemma puts forth the necessary and sufficient conditions guaranteeing unique decomposability of according to (1), a notion known as local identifiability [CLMW09].
Lemma 1: Matrix uniquely decomposes into if and only if , and .
Proof:
Since by definition and , one can represent every element in the subspaces and as and , respectively, where and . Assume that , and suppose by contradiction that there exist nonzero perturbations such that . Then, , meaning that and belong to the same subspace, which contradicts the assumption. Conversely, suppose there exists a nonzero . Clearly, is a feasible solution where and . This contradicts the uniqueness assumption. In addition, the condition ensures that only when for .
In words, (1) is locally identifiable if and only if the subspaces and intersect transversally, and the sparse matrices in are not annihilated by . This last condition is unique to the setting here, and is not present in [CLMW09] or [CSPW11].
Remark 1 (Projection operators)
Operator () denotes the orthogonal projection of onto the subspace (orthogonal complement ). It simply sets those elements of not in to zero. Likewise, () denotes the orthogonal projection of onto the subspace (orthogonal complement ). Let and denote, respectively, projection onto the column and row spaces of . It can be shown that , while the projection onto the complement subspace is . In addition, the following identities
(2) 
of orthogonal projection operators such as , will be invoked throughout the paper.
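The projection operators of Remark 1 admit a direct implementation. The sketch below assumes thin SVD factors U and V with orthonormal columns and a boolean support mask Omega; the function names are ours.

```python
import numpy as np

def proj_support(M, Omega):
    """P_Omega: keep the entries on the support mask, zero out the rest."""
    return np.where(Omega, M, 0.0)

def proj_phi(M, U, V):
    """Projection onto matrices sharing the column space of U or row space of V:
       P_U M + M P_V - P_U M P_V, with P_U = U U^T and P_V = V V^T."""
    PU, PV = U @ U.T, V @ V.T
    return PU @ M + M @ PV - PU @ M @ PV

def proj_phi_perp(M, U, V):
    """Complement projection: (I - P_U) M (I - P_V)."""
    L, T = M.shape
    return (np.eye(L) - U @ U.T) @ M @ (np.eye(T) - V @ V.T)
```

One can verify numerically that the two maps are idempotent and sum to the identity, which is exactly the decomposition invoked throughout the paper.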
II-A Incoherence measures
Building on Lemma 1, alternative sufficient conditions are derived here to ensure local identifiability. To quantify the overlap between and , consider the incoherence parameter
(3) 
for which it holds that . The lower bound is achieved when and are orthogonal, while the upper bound is attained when contains a nonzero element. Assuming , then represents the cosine of the angle between and [Deutsch]. From Lemma 1, it follows that guarantees . As will become clear later on, tighter conditions on will prove instrumental to guarantee exact recovery of by solving (P1).
To measure the incoherence among subsets of columns of , which is tightly related to the second condition in Lemma 1, the restricted isometry constants (RICs) come in handy [CT05]. The constant measures the extent to which a subset of columns of behaves like an isometry. It is defined as the smallest value satisfying
(4) 
for every with and for some positive normalization constant [CT05]. For later use, introduce , which measures ‘how orthogonal’ the subspaces generated by two disjoint column subsets of , with cardinalities and , are. Formally, is the smallest value that satisfies
(5) 
for every , where and . The normalization constant plays the same role as in . A wide family of matrices with small RICs has been introduced in, e.g., [CT05].
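For small instances, the RIC of (4) can be computed by brute force over all k-column submatrices. The sketch below is ours, not the paper's, and assumes unit-norm columns (i.e., normalization constant equal to one); the cost is combinatorial, so it is only viable for toy sizes.

```python
import numpy as np
from itertools import combinations

def ric(R, k):
    """Brute-force restricted isometry constant delta_k for a matrix with
       unit-norm columns: the smallest delta such that
       (1 - delta)||x||^2 <= ||R_S x||^2 <= (1 + delta)||x||^2
       holds for every k-column submatrix R_S."""
    n = R.shape[1]
    delta = 0.0
    for S in combinations(range(n), k):
        s = np.linalg.svd(R[:, S], compute_uv=False)
        delta = max(delta, s[0] ** 2 - 1.0, 1.0 - s[-1] ** 2)
    return delta
```

A matrix with orthonormal columns has delta_k = 0 for every admissible k, while delta_1 = 0 for any matrix whose columns have unit norm; both sanity checks are easy to confirm with this routine.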
All the elements are now in place to state this section’s main result.
Proposition 1: Assume that each column of contains at most nonzero elements. If and , then and .
Proof:
Suppose the intersection is nontrivial, meaning that there exist nonzero matrices and satisfying . Vectorizing the last equation and relying on the identity , one obtains the linear system of equations
(6) 
where . Define an matrix and the matrix . The corresponding coefficients are and . Then, (6) implies there exists a such that .
Consider two cases: i) , and ii) . Under i), , and thus for some nonzero with , where . Therefore, if , then implies , which is a contradiction. For ii), implies that there is no with and such that , since otherwise , which leads to .
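The vectorization identity invoked in the proof, vec(R A) = (I ⊗ R) vec(A) with column-major (column-stacking) vec, can be checked numerically; the symbol names below are ours.

```python
import numpy as np

rng = np.random.default_rng(1)
R = rng.standard_normal((4, 6))
A = rng.standard_normal((6, 3))

# vec stacks columns (Fortran order); vec(R A) = (I_T kron R) vec(A)
vec = lambda M: M.flatten(order='F')
lhs = vec(R @ A)
rhs = np.kron(np.eye(3), R) @ vec(A)
```

This is the special case of vec(R A B) = (B^T ⊗ R) vec(A) with B the identity, which is what turns the matrix equation in the proof into an ordinary linear system.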
III Exact Recovery via Convex Optimization
In addition to , there are other incoherence measures which play an important role in the conditions for exact recovery. Consider a feasible solution , where and thus . It may then happen that and , while , challenging identifiability when and are unknown. Similar complications will arise if has a sparse row space that could be confused with the row space of . These issues motivate defining
where . The maximum of is attained when is in the column [row] space of for some . Small values of and imply that the column and row spaces of do not contain the columns of and sparse vectors, respectively.
Another identifiability issue arises when for some sparse matrix . In this case, each column of is spanned by a few columns of . Consider the parameter
A small value of implies that each column of is spanned by sufficiently many columns of . To understand this property, suppose for simplicity that all nonzero singular values of are identical and equal to , say. The th column of is then , and its projection onto the th column of is
Since the energy of is distributed along the directions , if all the aforementioned projections can be made arbitrarily small, then sufficiently many nonzero terms in the expansion are needed to account for this energy.
III-A Main result
Theorem 1: Consider given matrices and obeying , with and . Assume that every row and column of has at most nonzero elements, and that has orthonormal rows. If the following conditions
 I)

; and
 II)

hold, where
then there exists for which the convex program (P1) exactly recovers .
Note that I) alone is already more stringent than the pair of conditions and needed for local identifiability (cf. Proposition 1). Satisfaction of the conditions in Theorem 1 hinges upon the values of the incoherence parameters , and the RICs and . In particular, are increasing functions of these parameters, and it is readily observed from I) and II) that the smaller are, the more likely the conditions are met. Furthermore, the incoherence parameters are increasing functions of the rank and sparsity level . The RIC is also an increasing function of , the maximum number of nonzero elements per row/column of . Therefore, for sufficiently small values of , the sufficient conditions of Theorem 1 can indeed be satisfied.
It is worth noting that not only , but also the position of the nonzero entries in plays an important role in satisfying I) and II). This is manifested through , for which a small value indicates the entries of are sufficiently spread out, i.e., most entries do not cluster along a few rows or columns of . Moreover, no restriction is placed on the magnitude of these entries, since as seen later on it is only the positions that affect optimal recovery via (P1).
Remark 2 (Row orthonormality of )
Assuming is equivalent to supposing that is full rank. This is because, for a full row-rank , one can premultiply both sides of (1) with to obtain with orthonormal rows.
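The reduction in Remark 2 can be sketched numerically: premultiplying by the inverse symmetric square root of the row Gram matrix (computed here via an eigendecomposition) produces a compression matrix with orthonormal rows and the same row space. The function name is ours; in the paper's setting, the observations would be premultiplied by the same factor.

```python
import numpy as np

def orthonormalize_rows(R):
    """Premultiply R by (R R^T)^{-1/2} so the result has orthonormal rows;
       valid whenever R has full row rank."""
    G = R @ R.T                       # row Gram matrix, symmetric PD
    w, Q = np.linalg.eigh(G)
    G_inv_sqrt = Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T
    return G_inv_sqrt @ R
```

Since the premultiplying factor is invertible, no information in the linear measurements is lost by this change of variables.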
III-B Induced recovery results for principal components pursuit and compressed sensing
Before delving into the proof of the main result, it is instructive to examine how the sufficient conditions in Theorem 1 simplify for the subsumed PCP and CS problems. In PCP one has , which implies and . To obtain sufficient conditions expressed only in terms of , one can borrow the coherence conditions of [CLMW09] and readily arrive at the following result.
Corollary 1: Consider given obeying , with and . Suppose the coherence conditions , , and hold for some positive constant . If is sufficiently small such that the following conditions
 )

; and
 )

hold, where
then there exists for which the convex program (P1) with exactly recovers .
In Section V, random matrices drawn from natural ensembles are shown to satisfy I) and II) with high probability. In this case, it is possible to arrive at simpler conditions (depending only on , , and the matrix dimensions) for exact recovery in the context of PCP; see Remark 6. Corollary 1, on the other hand, offers general conditions stemming from a purely deterministic approach.
In the CS setting one has , which implies . As a result, Theorem 1 simply boils down to an RIC-dependent sufficient condition for the exact recovery of , as stated next.
Corollary 2: Consider given matrices and obeying . Assume that the number of nonzero elements per column of does not exceed . If
(7) 
holds, then (P1) with exactly recovers .
To place in context, consider normalizing the rows of . For such a compression matrix it is known that ; see e.g., [rauhut]. Using this bound together with (7), one arrives at the stricter condition . This last condition is identical to the one reported in [donoho_elad_cs], which guarantees the success of norm minimization in recovering sparse solutions to underdetermined systems of linear equations. These conditions have been improved in recent works; see e.g., [rauhut] and references therein.
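The norm minimization referred to here (basis pursuit) can be recast as a linear program by splitting the unknown into positive and negative parts. A minimal sketch using SciPy, with our own names:

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(R, y):
    """Basis pursuit sketch: min ||a||_1  s.t.  R a = y,
       recast as an LP over the positive/negative parts of a = u - v."""
    L, F = R.shape
    c = np.ones(2 * F)                       # objective: sum(u) + sum(v)
    A_eq = np.hstack([R, -R])                # equality constraint R(u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * F))
    u, v = res.x[:F], res.x[F:]
    return u - v
```

By construction, any sparse vector generating the data is feasible for this LP, so the returned solution never has larger norm than the true one; exact recovery is what the RIC-type conditions above guarantee.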
IV Proof of the Main Result
In what follows, conditions are first derived under which is the unique optimal solution of (P1). In essence, these conditions are expressed in terms of certain dual certificates. Then, Section IV-B deals with the construction of a valid dual certificate.
IV-A Unique optimality conditions
Recall the nonsmooth optimization problem (P1), and its Lagrangian
(8) 
where is the matrix of dual variables (multipliers) associated with the constraint in (P1). From the characterization of the subdifferentials of the nuclear and norms (see e.g., [Boyd]), the subdifferential of the Lagrangian at is given by (recall that )
(9)  
(10) 
The optimality conditions for (P1) assert that is an optimal (not necessarily unique) solution if and only if
This can be shown equivalent to finding the pair that satisfies: i) ; ii) ; and iii) . In general, i)-iii) may hold for multiple solution pairs. However, the next lemma asserts that a slight tightening of the optimality conditions i)-iii) leads to a unique optimal solution for (P1). See Appendix A for a proof.
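The subdifferential characterization of the nuclear norm used above can be probed numerically: the product of the left and right singular-vector factors from the thin SVD is a subgradient, so the subgradient inequality must hold for arbitrary perturbations. A sketch (names ours):

```python
import numpy as np

rng = np.random.default_rng(4)
nuc = lambda M: np.linalg.svd(M, compute_uv=False).sum()

# rank-2 matrix and the U V^T factor of its thin SVD
X = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
UVt = U[:, :2] @ Vt[:2, :]        # a subgradient of ||.||_* at X

# subgradient inequality: ||X + D||_* >= ||X||_* + <U V^T, D> for all D
checks = []
for _ in range(20):
    D = rng.standard_normal(X.shape)
    checks.append(nuc(X + D) >= nuc(X) + np.sum(UVt * D) - 1e-9)
```

The analogous subgradient of the norm at a sparse matrix is its entrywise sign pattern on the support; both facts underpin conditions i)-iii).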
Lemma 2: Assume that each column of contains at most nonzero elements, as well as and . If there exists a dual certificate satisfying
 C1)

 C2)

 C3)

 C4)

then is the unique optimal solution of (P1).
The remainder of the proof deals with the construction of a dual certificate that meets C1)-C4). To this end, tighter conditions [I) and II) in Theorem 1] for the existence of are derived in terms of the incoherence parameters and the RICs. For the special case , the conditions in Lemma 2 boil down to those in [CSPW11, Prop. 2] for PCP. However, the dual certificate construction techniques used in [CSPW11] do not carry over to the setting considered here, where a compression matrix is present.
IV-B Dual certificate construction
Condition C1) in Lemma 2 implies that , for arbitrary (cf. Remark 1). Upon defining and , C1) and C2) are equivalent to .
To express in terms of the unrestricted matrix , first vectorize to obtain . Define and an matrix formed by those rows of associated with the elements in . Likewise, define , which collects the remaining rows of such that for a suitable row permutation matrix . Finally, let be the vector of length containing those elements of with indices in . With these definitions, C1) and C2) can be expressed as
To upper-bound the left-hand side of C3) in terms of , use the assumption to arrive at
Similarly, the left-hand side of C4) can be bounded as
In a nutshell, if one can find such that
 c1)

 c2)

 c3)

hold for some positive , then C1)-C4) would be satisfied as well.
The final steps of the proof entail: i) finding an appropriate candidate solution such that ) holds; and ii) deriving conditions in terms of the incoherence parameters and RICs that guarantee meets the required bounds in ) and ) for a range of values. The following lemma is instrumental in accomplishing i); its proof can be found in Appendix B.
Lemma 3: Assume that each column of contains at most nonzero elements, as well as and . Then matrix has full row rank, and its minimum singular value is bounded below as
According to Lemma 3, the least-norm (LN) solution exists, and is given by
(11) 
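The LN solution of an underdetermined, full-row-rank system has the familiar closed form via the row Gram matrix, which can be sketched as follows (the names are ours, standing in for the elided symbols of (11)):

```python
import numpy as np

def least_norm(B, b):
    """Minimum-norm solution of the underdetermined system B g = b:
       g = B^T (B B^T)^{-1} b, assuming B has full row rank."""
    return B.T @ np.linalg.solve(B @ B.T, b)
```

This agrees with the minimum-norm solution returned by `np.linalg.lstsq` for a fat full-row-rank system, which is a convenient sanity check.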
Remark 3 (Candidate dual certificate)
From the arguments at the beginning of this section, the candidate dual certificate is .
The LN solution is an attractive choice, since it facilitates satisfying ) and ), which require norms of to be small. Substituting the LN solution (11) into the left-hand side of ) yields (define for notational brevity)
(12) 
Moreover, substituting (11) into the left-hand side of ) results in
(13) 
Next, upper bounds are obtained for and ; see Appendix C for a proof.
Lemma 4: Assume that each column and row of contains at most nonzero elements. If and hold, then