1 Introduction
Exploiting the inherent structure of data for the recovery of relevant information is at the heart of data analysis. In this paper, we analyze a scenario where a data matrix $M$ arises as a superposition of a low-rank component $L$ and a dictionary sparse component, expressed here as $DS$. Here, $D$ is an a priori known dictionary with normalized columns, and $S$ is the unknown sparse coefficient matrix with at most $s$ total nonzeros. Specifically, we will study the following model,
$M = L + DS$,  (1)
and identify the conditions under which the components $L$ and $S$ can be successfully recovered, given $M$ and $D$.
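As a concrete illustration, the model in eq.(1) is easy to synthesize numerically. The sketch below (with hypothetical dimensions, since the paper leaves the sizes general) builds a rank-2 component, a column-normalized dictionary, and a sparse coefficient matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, d, r = 20, 30, 15, 2          # data size, dictionary size, rank (hypothetical)

# Rank-r component L as a product of two thin Gaussian factors.
L = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))

# Dictionary D with normalized columns, as assumed in the model.
D = rng.standard_normal((n, d))
D /= np.linalg.norm(D, axis=0)

# Sparse coefficient matrix S with 10 nonzeros placed uniformly at random.
S = np.zeros((d, m))
idx = rng.choice(d * m, size=10, replace=False)
S.flat[idx] = rng.standard_normal(10)

M = L + D @ S                       # the observed mixture of eq.(1)
```

The recovery question studied in this paper is the inverse of this construction: given only $M$ and $D$, separate $L$ from $DS$.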
A wide range of problems can be expressed in the form described above. Perhaps the most celebrated of these is principal component analysis (PCA)
[1], which can be viewed as a special case of eq.(1) with the sparse coefficient matrix $S$ set to zero. In the absence of the low-rank component $L$, the problem reduces to that of sparse recovery [2, 3, 4]; see [5] and references therein for an overview of related works. The popular framework of Robust PCA tackles the case when the dictionary $D$ is an identity matrix
[6, 7]; variants include [8, 9, 10, 11]. In addition, other variants of Robust PCA, such as Outlier Pursuit
[12], where $D = I$ and the sparse component is column-sparse, and randomized adaptive sensing approaches [13, 14, 15, 16, 17], have also been explored. Our work is most closely related to [18], which explores the application of the model shown in eq.(1) to detect traffic anomalies, and focuses on the case where the dictionary $D$ is overcomplete, i.e., fat. The model described therein is applicable to the case where the rows of $D$ are orthogonal and the coefficient matrix $S$ has at most $k$ nonzero elements per row and column. In this paper, we analyze a more general case, where we relax some of the aforementioned assumptions for the fat case and develop an analogous analysis for the thin case. Specifically, this paper makes the following contributions towards guaranteeing the recovery of $L$ and $S$ in eq.(1). First, we analyze the thin case, where we assume $D$ to be a frame [19], with a global sparsity of at most $s$; see [20] for a brief overview of frames. Next, for the fat case, we extend the analysis presented in [18] and assume that the dictionary satisfies the restricted isometry property (RIP), with a global sparsity of at most $s$ and a column sparsity of at most $k$. Consequently, we eliminate the sparsity constraint on the rows of the coefficient matrix and the orthogonality constraint on the rows of the dictionary required by [18].
The model shown in eq.(1) is propitious in a number of applications. For example, it can be used for target identification in hyperspectral imaging, and in topic modeling applications to identify documents with certain properties. Further, in source separation tasks, a variant of this model was used for singing voice separation in [21, 22]. We can also envision source separation tasks where the first component is not low-rank, but can in turn be modeled as being sparse in a known [23] or unknown [24] dictionary.
The rest of the paper is organized as follows. We formulate the problem, introduce the notation and describe various considerations on the structure of the component matrices in section 2. In section 3, we present our main result and a proof sketch, followed by numerical simulations in section 4. Finally, we conclude in section 5 with some insights on future work.
2 Problem Formulation
Our aim is to recover the low-rank component $L$ and the sparse coefficient matrix $S$, given the dictionary $D$ and samples $M$ generated according to the model described in eq.(1). Utilizing the assumed structure of the components $L$ and $DS$, we consider the following convex problem for a regularization parameter $\lambda \geq 0$,
$\min_{L,\, S} \; \|L\|_* + \lambda \|S\|_1 \quad \text{subject to} \quad M = L + DS$,  (2)
where $\|\cdot\|_*$ denotes the nuclear norm and $\|\cdot\|_1$ refers to the entrywise $\ell_1$ norm, which serve as convex relaxations of rank and sparsity (i.e., the $\ell_0$ norm), respectively. Depending upon the number of dictionary elements $d$ in $D$, we analyze the problem described above for two cases – a) when $d \leq n$, the thin case, and b) when $d > n$, the fat case. For the thin case, we assume that the rows of the dictionary $D$ comprise a frame, i.e., for any vector $v \in \mathbb{R}^d$, we have
$\alpha \|v\|_2^2 \leq \|Dv\|_2^2 \leq \beta \|v\|_2^2$,  (3)
where $\alpha$ and $\beta$ are the lower and upper frame bounds, respectively, with $0 < \alpha \leq \beta < \infty$. Next, for $d > n$, the fat case, we assume that $D$ obeys the restricted isometry property (RIP) of order $s$, i.e., for any $s$-sparse vector $v$, we have
$(1 - \delta_s) \|v\|_2^2 \leq \|Dv\|_2^2 \leq (1 + \delta_s) \|v\|_2^2$,  (4)
where $\delta_s$ is the restricted isometry constant (RIC).
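For intuition, the frame bounds in eq.(3) can be computed directly for a thin, full column-rank dictionary: the tightest choices are the extreme squared singular values of $D$. A minimal numerical check (an illustration, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 20                                  # thin case: d <= n
D = rng.standard_normal((n, d))
D /= np.linalg.norm(D, axis=0)                 # normalized columns

# Singular values give the tightest frame bounds for a full-rank thin D.
svals = np.linalg.svd(D, compute_uv=False)
alpha, beta = svals[-1] ** 2, svals[0] ** 2    # lower/upper frame bounds

# Any v satisfies alpha * ||v||^2 <= ||D v||^2 <= beta * ||v||^2, as in eq.(3).
v = rng.standard_normal(d)
ratio = np.linalg.norm(D @ v) ** 2 / np.linalg.norm(v) ** 2
```

Since $\alpha > 0$ whenever $D$ has full column rank, the frame condition here amounts to $D$ being well-conditioned rather than sparse-vector-specific, in contrast to the RIP used for the fat case.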
The aim of this paper is to answer the following question: given $D$, under what conditions can we recover $L$ and $S$ from the mixture $M$? We observe that there are a few ways we can run into trouble right away, namely a) the dictionary sparse part $DS$ is low-rank, and b) the low-rank part $L$ is sparse in the dictionary $D$. Indeed, these relationships take center stage in our analysis. We begin by defining a few relevant subspaces, similar to those used in [18], which will help us formalize the said relationships.
Let the pair $(L, S)$ be the solution to the problem shown in eq.(2). We define $T$ as the linear space of matrices spanning the row and column spaces of the low-rank component $L$. Specifically, let $L = U \Sigma V^\top$ denote the singular value decomposition of $L$; then the space $T$ is defined as
$T := \{U X^\top + Y V^\top : X \in \mathbb{R}^{m \times r},\; Y \in \mathbb{R}^{n \times r}\}$.
Next, let $\Omega$ be the space spanned by matrices that have the same support (location of nonzero elements) as $S$, i.e., $\Omega := \{X : \mathrm{supp}(X) \subseteq \mathrm{supp}(S)\}$.
In addition, we denote the corresponding complements of the spaces described above by appending '$\perp$'. Next, let the orthogonal projection operators onto the spaces defined above be $P_T$ and $P_\Omega$, respectively. Further, we will use $P_U$ and $P_V$ to denote the projection matrices corresponding to the column and row spaces of $L$, respectively, i.e., $P_U = U U^\top$ and $P_V = V V^\top$, implying the following for any matrix $X$,
$P_T(X) = P_U X + X P_V - P_U X P_V$.
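These projection operators admit short closed forms. A NumPy sketch (the support pattern and dimensions are hypothetical), using the standard Robust PCA expression $P_T(X) = P_U X + X P_V - P_U X P_V$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, r = 15, 12, 3
L = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))

U, _, Vt = np.linalg.svd(L, full_matrices=False)
U, V = U[:, :r], Vt[:r].T
P_U, P_V = U @ U.T, V @ V.T         # column/row-space projection matrices

def P_T(X):
    """Project onto the span of the row and column spaces of L."""
    return P_U @ X + X @ P_V - P_U @ X @ P_V

support = rng.random((n, m)) < 0.1  # hypothetical support pattern

def P_Omega(X):
    """Keep only the entries on the support."""
    return np.where(support, X, 0.0)
```

Both maps are idempotent, as orthogonal projections must be, and $P_T$ leaves $L$ itself unchanged; these are the properties the dual certificate construction below relies on.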
As alluded to previously, there are indeed situations under which we cannot hope to recover the matrices $L$ and $S$. To identify these scenarios, we will employ various notions of incoherence. We define the incoherence between the low-rank part $L$ and the dictionary sparse part $DS$ as,
where the measure is small when these components are incoherent (good for recovery). The next two measures of incoherence can be interpreted as a way to avoid the cases where, for $L = U \Sigma V^\top$, (a) $U$ resembles the dictionary $D$, and (b) $V$ resembles the sparse coefficient matrix $S$. In either case, the low-rank part essentially mimics the dictionary sparse component. To this end, similar to [18], we define respectively the following to measure these properties,
where . Also, we define .
3 Main Result
In this section, we present the conditions under which solving the problem stated in eq.(2) will successfully recover the true matrices and . As discussed in the previous section, the structure of the dictionary plays a crucial role in the analysis of the two paradigms, i.e. the thin case and the fat case. Consequently, we provide results corresponding to these cases separately. We begin by introducing a few definitions and assumptions applicable to both cases. To simplify the analysis we assume that . Specifically, we will assume that , where is a constant. In addition, our analysis is applicable to the case when .
Definition D.1.
We define
where, and are defined as,
and
.
Further, we define and as,
Definition D.2.
Assumption A.1.
Assumption A.2.
Let , then
Assumption A.3.
For as above,
Theorem 1.
Consider a superposition of a low-rank matrix $L$ of rank $r$ and a dictionary sparse component $DS$, wherein the sparse coefficient matrix $S$ has at most $s$ nonzeros, i.e., $\|S\|_0 \leq s$. Then, solving the formulation shown in eq.(2) will recover the matrices $L$ and $S$ if the following conditions hold for the quantities defined in D.1 and D.2, respectively.
Thm. 1 establishes sufficient conditions on the regularization parameter $\lambda$ to guarantee recovery of the pair $(L, S)$ for both the thin and the fat case. For both cases, we see that the conditions are closely related to the various incoherence measures between the low-rank part $L$, the dictionary $D$, and the sparse component $S$. Further, we observe that the theorem imposes an upper bound on the global sparsity $s$. This is similar to what was reported in [12], and appears to be a consequence of the deterministic analysis presented here. Further, the condition shown in assumption A.1 translates to a relationship between the rank $r$ and the sparsity $s$, namely,
(5)
for the thin case, and
(6)
for the fat case. These relationships are indeed what we observe in empirical evaluations; we revisit this in the next section. We now present a brief proof sketch of the results presented in this section.
3.1 Proof Sketch
We use a dual certificate construction procedure to prove the main result in Thm. 1 [25]. To this end, we start by constructing a dual certificate for the convex problem shown in eq.(2). In our analysis, we use $\|X\|$ for the spectral norm of a matrix $X$, i.e., its maximum singular value. The following lemma gives the conditions the dual certificate needs to satisfy.
Lemma 2 (from Lemma 2 in [18] and Thm. 3 in [12]).
If there exists a dual certificate satisfying
then the pair $(L, S)$ is the unique solution of eq.(2).
We will now proceed with the construction of a dual certificate satisfying the conditions C1–C4 outlined in Lemma 2. Using an analysis similar to [18] (Section V.B), we construct the dual certificate as
for arbitrary . The condition C2 then translates to
Let and , then we can write the equation above as,
Note that . Now, let , and let denote the rows of that correspond to support of , and correspond to the remaining rows of . Further, let be a length vector containing elements of corresponding to support of . Using these definitions and results, we conclude
This implies that . Now, we look at condition C3; this is where our analysis departs from [18]. We write
where we have used the fact that and . Now, as $D^{+}$ is the pseudoinverse of $D$, we have $\|D^{+}\| = 1/\sigma_{\min}(D)$, where $\sigma_{\min}(D)$ is the smallest singular value of $D$. Therefore, we have
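The pseudoinverse fact invoked above can be verified numerically. The following sketch checks, for a generic full column-rank matrix, that the spectral norm of the pseudoinverse equals the reciprocal of the smallest singular value:

```python
import numpy as np

rng = np.random.default_rng(4)
D = rng.standard_normal((40, 25))   # full column rank with probability 1
D_pinv = np.linalg.pinv(D)          # Moore-Penrose pseudoinverse

# Smallest singular value of D (numpy returns them in descending order).
sigma_min = np.linalg.svd(D, compute_uv=False)[-1]
```

For such a matrix the pseudoinverse is a left inverse, and its largest singular value is exactly $1/\sigma_{\min}(D)$, which is the identity used in the bound above.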
To obtain an upper bound on , we will now present the following lemmata.
Lemma 3.
The lower bound on is given by
Lemma 4.
Upper bound on is given by
Assembling the results of the lemmata to obtain the upper bound for C3, we arrive at the expression defined in D.2. Now, we move on to finding the conditions under which C4 is satisfied by our dual certificate. For this, we will bound the corresponding term. From eq.(16) in [18] we have,
(7) 
where . Our aim now is to bound each term for our case. For this, we present the following lemmata.
Lemma 5 (from eq.(17) in [18] ).
The upper bound on is given by .
Lemma 6.
The upper bound on is given by , where is as defined in D.1.
Substituting these in eq.(7) and C4, we have
The expression above is further upper bounded by due to C4. Here, and are as defined in D.1. Hence, we arrive at the following lower bound for ,
4 Simulations
[Fig. 1: Phase transition in rank and sparsity at the sparsity levels corresponding to our theoretical results. Panels (a)–(b): recovery of $L$; panels (c)–(d): recovery of $S$; panels (e)–(f): recovery of both $L$ and $S$.]
Our analysis in the previous section shows that, depending upon the size of the dictionary $D$, if the conditions of Thm. 1 are met, a convex program solving eq.(2) will recover the components $L$ and $S$. In this section, we empirically evaluate the claims of Thm. 1. To this end, we employ the accelerated proximal gradient algorithm outlined in Algorithm 1 of [18] to analyze the phase transition in rank and sparsity for different sizes of the dictionary $D$. Here, we generate the low-rank part $L$ as the product of two random matrices of sizes $n \times r$ and $r \times m$, with entries drawn from the standard normal distribution. In addition, the nonzero entries ($s$ in number) of the sparse component $S$ are drawn from the Rademacher distribution; the dictionary $D$ is also drawn from the standard normal distribution, and its columns are then normalized. Phase transitions in rank and sparsity over multiple trials for a thin and a fat dictionary, corresponding to our theoretical results and for all admissible levels of sparsity, are shown in Fig. 1 and Fig. 2, respectively.
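The accelerated proximal gradient scheme referenced above alternates two proximal steps, one for the nuclear norm and one for the $\ell_1$ norm. A minimal sketch of those two operators (not the full Algorithm 1 of [18]):

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: the prox of tau * (nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft_threshold(X, tau):
    """Entrywise shrinkage: the prox of tau * (l1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)
```

An accelerated scheme applies these shrinkage steps to momentum-extrapolated gradient updates of $L$ and $S$; the thresholds pull singular values and entries toward zero, which is what promotes low rank and sparsity in the iterates.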
[Fig. 2: Phase transition in rank and sparsity for all admissible levels of sparsity. Panels (a)–(b): recovery of $L$; panels (c)–(d): recovery of $S$.]
Fig. 1 shows the recovery of the low-rank part $L$ in panels (a)–(b), while panels (c)–(d) show the recovery of the sparse component $S$, for the thin and the fat dictionary, respectively. Next, panels (e)–(f) show the region of overlap between the low-rank recovery and sparse recovery plots for the thin and the fat case, respectively. This corresponds to the region in which both $L$ and $S$ are recovered successfully. Further, the red lines in panels (a) and (b) show the trend predicted by our analysis, i.e., eq.(5) and eq.(6), with the parameters hand-tuned for best fit. Indeed, the empirical relationship between rank and sparsity has the same trend as predicted by Thm. 1.
Similarly, Fig. 2 shows the recovery of the low-rank part $L$ in panels (a)–(b), while panels (c)–(d) show the recovery of the sparse component $S$, for the thin and the fat dictionary, respectively, over a much wider range of global sparsity. Indeed, these phase transition plots show that we can successfully recover the components for sparsity levels much greater than those put forth by the theorem. This can be attributed to the deterministic nature of the analysis presented here. Accordingly, we conjecture that a randomized analysis of the problem can potentially improve the results presented here.
5 Conclusions
We analyze a dictionary-based generalization of Robust PCA. Specifically, we extend the theoretical guarantees presented in [18] to a setting wherein the dictionary $D$ may have an arbitrary number of columns and the coefficient matrix $S$ has a global sparsity of at most $s$. We generalize the results by assuming $D$ to be a frame for the thin case and to obey the RIP for the fat case, and we eliminate the orthogonality constraint on the rows of the dictionary and the sparsity constraint on the rows of the coefficient matrix required by [18], rendering the results useful for a potentially wide range of applications. Further, we provide empirical evaluations via phase transition plots in rank and sparsity corresponding to our theoretical results. Motivated by the promising phase transitions beyond the sparsity level tolerated by our analysis, we propose a randomized analysis of the problem to improve the upper bound on the sparsity as future work.
References
 [1] I. Jolliffe, Principal component analysis, Wiley Online Library, 2002.
 [2] B. K. Natarajan, “Sparse approximate solutions to linear systems,” SIAM journal on computing, vol. 24, no. 2, pp. 227–234, 1995.
 [3] D. L. Donoho and X. Huo, “Uncertainty principles and ideal atomic decomposition,” IEEE Transactions on Information Theory, vol. 47, no. 7, pp. 2845–2862, 2001.

 [4] E. J. Candès and T. Tao, “Decoding by linear programming,” IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203–4215, 2005.
 [5] H. Rauhut, “Compressive sensing and structured random matrices,” Theoretical Foundations and Numerical Methods for Sparse Recovery, vol. 9, pp. 1–92, 2010.
 [6] E. J. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?,” Journal of the ACM (JACM), vol. 58, no. 3, pp. 11, 2011.
 [7] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky, “Rank-sparsity incoherence for matrix decomposition,” SIAM Journal on Optimization, vol. 21, no. 2, pp. 572–596, 2011.
 [8] Z. Zhou, X. Li, J. Wright, E. J. Candès, and Y. Ma, “Stable principal component pursuit,” in Information Theory Proceedings (ISIT), 2010 IEEE International Symposium on. IEEE, 2010, pp. 1518–1522.
 [9] X. Ding, L. He, and L. Carin, “Bayesian robust principal component analysis,” IEEE Transactions on Image Processing, vol. 20, no. 12, pp. 3419–3430, 2011.
 [10] J. Wright, A. Ganesh, K. Min, and Y. Ma, “Compressive principal component pursuit,” Information and Inference, vol. 2, no. 1, pp. 32–68, 2013.
 [11] Y. Chen, A. Jalali, S. Sanghavi, and C. Caramanis, “Low-rank matrix recovery from errors and erasures,” IEEE Transactions on Information Theory, vol. 59, no. 7, pp. 4324–4337, 2013.
 [12] H. Xu, C. Caramanis, and S. Sanghavi, “Robust PCA via outlier pursuit,” in Advances in Neural Information Processing Systems, 2010, pp. 2496–2504.
 [13] X. Li and J. Haupt, “Identifying outliers in large matrices via randomized adaptive compressive sampling,” Trans. Signal Processing, vol. 63, no. 7, pp. 1792–1807, 2015.
 [14] X. Li and J. Haupt, “Locating salient group-structured image features via adaptive compressive sensing,” in IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2015, pp. 393–397.
 [15] X. Li and J. Haupt, “Outlier identification via randomized adaptive compressive sampling,” in IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP), 2015, pp. 3302–3306.
 [16] M. Rahmani and G. Atia, “Randomized robust subspace recovery for high dimensional data matrices,” arXiv preprint arXiv:1505.05901, 2015.
 [17] X. Li and J. Haupt, “A refined analysis for the sample complexity of adaptive compressive outlier sensing,” in IEEE Statistical Signal Processing Workshop (SSP), June 2016, pp. 1–5.
 [18] M. Mardani, G. Mateos, and G. B. Giannakis, “Recovery of lowrank plus compressed sparse matrices with application to unveiling traffic anomalies,” IEEE Transactions on Information Theory, vol. 59, no. 8, pp. 5186–5205, 2013.
 [19] R. J. Duffin and A. C. Schaeffer, “A class of nonharmonic Fourier series,” Transactions of the American Mathematical Society, vol. 72, no. 2, pp. 341–366, 1952.
 [20] C. Heil, “What is … a frame?,” Notices of the American Mathematical Society, vol. 60, no. 6, June/July 2013.
 [21] P. S. Huang, S. D. Chen, P. Smaragdis, and M. J. Hasegawa, “Singing-voice separation from monaural recordings using robust principal component analysis,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012, pp. 57–60.
 [22] P. Sprechmann, A. M. Bronstein, and G. Sapiro, “Real-time online singing voice separation from monaural recordings using robust low-rank modeling,” in ISMIR, 2012, pp. 67–72.
 [23] J. L. Starck, Y. Moudden, J. Bobin, M. Elad, and D. L. Donoho, “Morphological component analysis,” in Optics & Photonics 2005. International Society for Optics and Photonics, 2005, pp. 59140Q–59140Q.
 [24] S. Rambhatla and J. Haupt, “Semiblind source separation via sparse representations and online dictionary learning,” in Signals, Systems and Computers, 2013 Asilomar Conference on. IEEE, 2013, pp. 1687–1691.
 [25] S. Rambhatla, X. Li, and J. Haupt, “A dictionary based generalization of robust PCA with applications,” (In preparation), 2016.