Recovery of Low-Rank Plus Compressed Sparse Matrices with Application to Unveiling Traffic Anomalies

04/30/2012
by Morteza Mardani, et al.
University of Minnesota

Given the superposition of a low-rank matrix plus the product of a known fat compression matrix times a sparse matrix, the goal of this paper is to establish deterministic conditions under which exact recovery of the low-rank and sparse components becomes possible. This fundamental identifiability issue arises with traffic anomaly detection in backbone networks, and subsumes compressed sensing as well as the timely low-rank plus sparse matrix recovery tasks encountered in matrix decomposition problems. Leveraging the ability of ℓ_1- and nuclear norms to recover sparse and low-rank matrices, a convex program is formulated to estimate the unknowns. Analysis and simulations confirm that the said convex program can recover the unknowns for sufficiently low-rank and sparse enough components, along with a compression matrix possessing an isometry property when restricted to operate on sparse vectors. When the low-rank, sparse, and compression matrices are drawn from certain random ensembles, it is established that exact recovery is possible with high probability. First-order algorithms are developed to solve the nonsmooth convex optimization problem with provable iteration complexity guarantees. Insightful tests with synthetic and real network data corroborate the effectiveness of the novel approach in unveiling traffic anomalies across flows and time, and its ability to outperform existing alternatives.


I Introduction

Let be a low-rank matrix [], and let be sparse (, counts the nonzero entries of its matrix argument). Given a compression matrix with , and observations

(1)

the present paper deals with the recovery of . This task is of interest, e.g., to unveil anomalous flows in backbone networks [LCD04, MMG11, zggr05], to extract the time-varying foreground from a sequence of compressed video frames [Branuick_nips11], or to identify active brain regions from undersampled functional magnetic resonance imagery (fMRI) [Vaswani_Allerton_11]. In addition, this fundamental problem lies at the crossroads of compressive sampling (CS) and the timely low-rank-plus-sparse matrix decompositions.

In the absence of the low-rank component (), one is left with an under-determined sparse signal recovery problem; see e.g., [CT05, rauhut] and the tutorial account [candes_tutorial]. When , the formulation boils down to principal components pursuit (PCP), also referred to as robust principal component analysis (PCA) [CLMW09, CSPW11, bayes_rpca]. For this idealized noise-free setting, sufficient conditions for exact recovery are available for both of the aforementioned special cases. However, the superposition of a low-rank and a compressed sparse matrix in (1) further challenges identifiability of . In the presence of ‘dense’ noise, stable reconstruction of the low-rank and sparse matrix components is possible via PCP [zlwcm10, Outlier_pursuit]. Earlier efforts dealing with the recovery of sparse vectors in noise led to similar performance guarantees; see e.g., [bickel09] and references therein. Even when is nonzero, one could envision a CS variant where the measurements are corrupted with correlated (low-rank) noise [Vaswani_Allerton_11]. Last but not least, when and is noisy, the recovery of subject to a rank constraint is nothing else than PCA – arguably, the workhorse of high-dimensional data analysis [J02].

The main contribution of this paper is to establish that given and in (1), for small enough and one can exactly recover by solving the nonsmooth convex optimization problem

(P1)

where is a tuning parameter; is the nuclear norm of ( stands for the -th singular value); and denotes the -norm. These norms are convex surrogates to the rank and the -norm, respectively, which, albeit natural as criteria, are NP-hard to optimize [l_0_NP_hard, rank_NP_Duro]. Recently, a greedy algorithm for recovering low-rank and sparse matrices from compressive measurements was put forth in [Branuick_nips11]; however, its convergence and error performance are only assessed via numerical simulations. A recursive algorithm capable of processing data in real time can be found in [Vaswani_Allerton_11], which attains good performance in practice but does not offer theoretical guarantees.
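As an illustration only, the sketch below solves a program of the form (P1), i.e., a nuclear norm plus a weighted ℓ₁-norm subject to the linear observation model of (1). Since the paper's inline symbols are not recoverable from this rendering, the names Y, X, A, C, the dimensions, and the tuning value lam are placeholders, not the authors' notation.

```python
# Illustrative sketch only: a convex program of the form (P1) solved with CVXPY.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
L, F, T, r, s = 20, 40, 30, 2, 40                 # illustrative sizes, rank, sparsity
C = rng.standard_normal((L, F)) / np.sqrt(L)      # known fat compression matrix
X0 = rng.standard_normal((L, r)) @ rng.standard_normal((r, T))   # low-rank component
A0 = np.zeros((F, T))
A0.flat[rng.choice(F * T, size=s, replace=False)] = rng.standard_normal(s)  # sparse component
Y = X0 + C @ A0                                   # observations as in (1)

X, A = cp.Variable((L, T)), cp.Variable((F, T))
lam = 0.1                                         # tuning parameter (illustrative value)
prob = cp.Problem(cp.Minimize(cp.normNuc(X) + lam * cp.norm1(A)),
                  [Y == X + C @ A])
prob.solve()
# relative error; small when the recovery conditions hold and lam is well chosen
print(np.linalg.norm(X.value - X0, "fro") / np.linalg.norm(X0, "fro"))
```

Generic solvers of this kind scale poorly with the matrix dimensions, which is precisely what motivates the first-order APG and AD-MoM schemes developed in Section VI.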

A deterministic approach along the lines of [CSPW11] is adopted first to derive conditions under which (1) is locally identifiable (Section II). Introducing a notion of incoherence between the additive components and , and resorting to the restricted isometry constants of  [CT05], sufficient conditions are obtained to ensure that (P1) succeeds in exactly recovering the unknowns (Section III-A). Intuitively, the results here assert that if and are sufficiently small, the nonzero entries of are sufficiently spread out, and subsets of columns of behave as isometries, then (P1) exactly recovers . As a byproduct, recovery results for PCP and CS are also obtained by specializing the aforesaid conditions accordingly (Section III-B). The proof of the main result builds on Lagrangian duality theory [Bers, Boyd], to first derive conditions under which is the unique optimal solution of (P1) (Section IV-A). In a nutshell, satisfaction of the optimality conditions is tantamount to the existence of a valid dual certificate. Stemming from the unique challenges introduced by , the dual certificate construction procedure of Section IV-B is markedly distinct from the direct sum approach in [CSPW11], and the (random) golfing scheme of [CLMW09]. Section V shows that low-rank, sparse, and compression matrices drawn from certain random ensembles satisfy the sufficient conditions for exact recovery with high probability.

Two iterative algorithms for solving (P1) are developed in Section VI, which are based on the accelerated proximal gradient (APG) method [nesterov83, nesterov05, fista, rpca_proximal], and the alternating-direction method of multipliers (AD-MoM) [Bertsekas_Book_Distr, Boyd]. Numerical tests corroborate the exact recovery claims, and the effectiveness of (P1) in unveiling traffic volume anomalies from real network data (Section VII). Section VIII concludes the paper with a summary and a discussion of limitations, possible extensions, and interesting future directions. Technical details are deferred to the Appendix.
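For concreteness, here is a minimal APG/FISTA-style sketch for a penalized surrogate of (P1), namely minimizing ||X||_* + lam*||A||_1 + (1/(2 mu))||Y - X - C A||_F^2, with singular-value thresholding as the proximal operator of the nuclear norm and entrywise soft-thresholding for the ℓ₁-norm. The penalty mu, the step size, and the iteration count are illustrative choices and do not reproduce the paper's exact parameter schedule.

```python
# Hedged sketch of an APG/FISTA iteration for a penalized surrogate of (P1);
# parameter choices are illustrative, not those of the paper's algorithm.
import numpy as np

def svt(M, tau):
    """Singular-value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    """Entrywise soft-thresholding: proximal operator of tau * l1-norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def apg(Y, C, lam=0.1, mu=1e-3, n_iter=300):
    """Minimize ||X||_* + lam*||A||_1 + (1/(2*mu))*||Y - X - C@A||_F^2."""
    (L, T), F = Y.shape, C.shape[1]
    X = Xp = np.zeros((L, T))
    A = Ap = np.zeros((F, T))
    t = tp = 1.0
    step = mu / (1.0 + np.linalg.norm(C, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        Zx = X + ((tp - 1.0) / t) * (X - Xp)        # Nesterov extrapolation
        Za = A + ((tp - 1.0) / t) * (A - Ap)
        R = Zx + C @ Za - Y                         # residual of the smooth term
        Xp, Ap = X, A
        X = svt(Zx - (step / mu) * R, step)         # proximal gradient step for X
        A = soft(Za - (step / mu) * (C.T @ R), step * lam)  # proximal gradient step for A
        tp, t = t, (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
    return X, A
```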

I-A Notational conventions

Bold uppercase (lowercase) letters will denote matrices (column vectors), and calligraphic letters will denote sets. Operators , , , , , , , and will denote transposition, matrix pseudo-inverse, matrix trace, matrix vectorization, diagonal matrix, spectral radius, minimum singular value, and Kronecker product, respectively; will be used for the cardinality of a set and the magnitude of a scalar. The identity matrix will be represented by and its -th column by ; while denotes the vector of all zeros, and . The -norm of vector is for . For matrices define the trace inner product . Also, recall that is the Frobenius norm, is the -norm, is the -norm, and is the nuclear norm. In addition, denotes the induced -norm, and likewise for the induced -norm . For the linear operator , define the operator norm , which subsumes the spectral norm . Define also the support set . The indicator function equals one when , and zero otherwise.
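For quick reference, a few of the quantities listed above can be evaluated with numpy as follows (the matrices are placeholders):

```python
# Numerical illustration of some norms and operators listed above.
import numpy as np

rng = np.random.default_rng(1)
M, N = rng.standard_normal((4, 6)), rng.standard_normal((4, 6))
trace_inner = np.trace(M @ N.T)                       # trace inner product <M, N>
fro = np.linalg.norm(M, "fro")                        # Frobenius norm
l1 = np.abs(M).sum()                                  # entrywise l1-norm
spectral = np.linalg.norm(M, 2)                       # spectral norm (largest singular value)
nuclear = np.linalg.svd(M, compute_uv=False).sum()    # nuclear norm (sum of singular values)
kron = np.kron(M, N)                                  # Kronecker product
```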

II Local Identifiability

The first issue to address is model identifiability, meaning that there are unique low-rank and sparse matrices satisfying (1). If there exist multiple decompositions of into with low-rank and sparse , there is no hope of recovering from the data. For instance, if the null space of the fat matrix contains sparse matrices, there may exist a sparse perturbation such that is still sparse and is a legitimate solution. Another problematic case arises when there is a sparse perturbation such that is spanned by the row or column spaces of . Then, has the same rank as and may still be sparse. As a result, one may pick as another valid solution. Dealing with such identifiability issues is the subject of this section.

Let denote the singular value decomposition (SVD) of , and consider the subspaces: s1) of matrices in either the column or row space of ; s2) of matrices in with support contained in the support of ; and s3) . For notational brevity, s1)-s3) will be henceforth denoted as . Noteworthy properties of these subspaces are: i) both and , hence it is possible to directly compare elements from them; ii) and ; and iii) if is added to , then .

For now, assume that the subspaces and are also known. This extra information helps identifiability of (1), because potentially troublesome solutions are limited to a restricted class. If or , that candidate solution is not admissible since it is known a priori that and . Under these assumptions, the following lemma puts forth the necessary and sufficient conditions guaranteeing unique decomposability of according to (1) – a notion known as local identifiability [CLMW09].

Lemma 1: Matrix uniquely decomposes into if and only if , and .

Proof:

Since by definition and , one can represent every element in the subspaces and as and , respectively, where and . Assume that , and suppose by contradiction that there exist nonzero perturbations such that . Then, , meaning that and belong to the same subspace, which contradicts the assumption. Conversely, suppose there exists a non-zero . Clearly, is a feasible solution where and . This contradicts the uniqueness assumption. In addition, the condition ensures that only when for .

In words, (1) is locally identifiable if and only if the subspaces and intersect transversally, and the sparse matrices in are not annihilated by . This last condition is unique to the setting here, and is not present in [CLMW09] or [CSPW11].

Remark 1 (Projection operators)

Operator () denotes the orthogonal projection of onto the subspace (orthogonal complement ). It simply sets those elements of not in to zero. Likewise, () denotes the orthogonal projection of onto the subspace (orthogonal complement ). Let and denote, respectively, projection onto the column and row spaces of . It can be shown that , while the projection onto the complement subspace is . In addition, the following identities

(2)

of orthogonal projection operators such as , will be invoked throughout the paper.
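As a minimal sketch of Remark 1, assuming the low-rank component has a thin SVD U S Vᵀ and the sparse component has support given by a boolean mask (all names here are placeholders), the projection operators can be implemented as:

```python
# Sketch of the projection operators in Remark 1 (placeholder names),
# assuming a thin SVD X0 = U S V^T and a boolean support mask.
import numpy as np

def proj_X(M, U, V):
    """Projection onto matrices in the column or row space of X0:
       P_X(M) = U U^T M + M V V^T - U U^T M V V^T."""
    PU, PV = U @ U.T, V @ V.T
    return PU @ M + M @ PV - PU @ M @ PV

def proj_X_perp(M, U, V):
    """Projection onto the orthogonal complement: (I - U U^T) M (I - V V^T)."""
    PU, PV = U @ U.T, V @ V.T
    return (np.eye(len(PU)) - PU) @ M @ (np.eye(len(PV)) - PV)

def proj_Omega(M, mask):
    """Zero out the entries of M outside the support mask."""
    return np.where(mask, M, 0.0)

# sanity check: the two complementary projections add up to the identity map
rng = np.random.default_rng(2)
X0 = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 10))   # rank-3 matrix
U, _, Vt = np.linalg.svd(X0, full_matrices=False)
U, V = U[:, :3], Vt[:3, :].T
M = rng.standard_normal((8, 10))
assert np.allclose(proj_X(M, U, V) + proj_X_perp(M, U, V), M)
```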

II-A Incoherence measures

Building on Lemma 1, alternative sufficient conditions are derived here to ensure local identifiability. To quantify the overlap between and , consider the incoherence parameter

(3)

for which it holds that . The lower bound is achieved when and are orthogonal, while the upper bound is attained when contains a nonzero element. Assuming , then represents the cosine of the angle between and  [Deutsch]. From Lemma 1, it follows that guarantees . As will become clear later on, tighter conditions on will prove instrumental in guaranteeing exact recovery of by solving (P1).

To measure the incoherence among subsets of columns of , which is tightly related to the second condition in Lemma 1, the restricted isometry constants (RICs) come in handy [CT05]. The constant measures the extent to which a -subset of columns of behaves like an isometry. It is defined as the smallest value satisfying

(4)

for every with and for some positive normalization constant  [CT05]. For later use, introduce , which measures ‘how orthogonal’ the subspaces generated by two disjoint column subsets of (with cardinalities and ) are. Formally, is the smallest value that satisfies

(5)

for every , where and . The normalization constant plays the same role as in . A wide family of matrices with small RICs has been introduced in, e.g., [CT05].
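Computing RICs exactly requires examining every column subset and is combinatorial; the sketch below only probes random subsets, yielding a lower bound on the constant in (4) under the assumption of a unit normalization constant (all names are placeholders):

```python
# Monte Carlo lower bound on the restricted isometry constant delta_k of C,
# following (4) with the normalization constant taken as 1; an exact value
# would require examining every k-subset of columns.
import numpy as np

def ric_lower_bound(C, k, n_trials=2000, seed=0):
    rng = np.random.default_rng(seed)
    F, delta = C.shape[1], 0.0
    for _ in range(n_trials):
        S = rng.choice(F, size=k, replace=False)        # random k-subset of columns
        eig = np.linalg.eigvalsh(C[:, S].T @ C[:, S])   # spectrum of the subset Gram matrix
        delta = max(delta, abs(eig[-1] - 1.0), abs(eig[0] - 1.0))
    return delta

C = np.random.default_rng(3).standard_normal((30, 60)) / np.sqrt(30)
print(ric_lower_bound(C, k=3))
```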

All the elements are now in place to state this section’s main result.

Proposition 1: Assume that each column of contains at most nonzero elements. If and , then and .

Proof:

Suppose the intersection is nontrivial, meaning that there exist nonzero matrices and satisfying . Vectorizing the last equation and relying on the identity , one obtains a linear system of equations

(6)

where . Define an matrix and the matrix . The corresponding coefficients are and . Then, (6) implies there exists a such that .

Consider two cases: i) , and ii) . Under i) , and thus for some nonzero with where . Therefore, if , implies that , which is a contradiction. For ii) implies that there is no with and such that , since otherwise which leads to .
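The vectorization step in this proof presumably invokes the standard identity vec(A X B) = (Bᵀ ⊗ A) vec(X); the inline formula is missing from this rendering, so the exact form is an assumption. A quick numerical check:

```python
# Numerical check of vec(A X B) = (B^T kron A) vec(X); numpy's column-stacking
# vec(.) corresponds to flattening in Fortran ('F') order.
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 5))
X = rng.standard_normal((5, 6))
B = rng.standard_normal((6, 3))
lhs = (A @ X @ B).flatten(order="F")
rhs = np.kron(B.T, A) @ X.flatten(order="F")
assert np.allclose(lhs, rhs)
```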

III Exact Recovery via Convex Optimization

In addition to , there are other incoherence measures which play an important role in the conditions for exact recovery. Consider a feasible solution , where and thus . It may then happen that and , while , challenging identifiability when and are unknown. Similar complications will arise if has a sparse row space that could be confused with the row space of . These issues motivate defining

where . The maximum of is attained when is in the column [row] space of for some . Small values of and imply that the column and row spaces of do not contain the columns of and sparse vectors, respectively.

Another identifiability issue arises when for some sparse matrix . In this case, each column of is spanned by a few columns of . Consider the parameter

A small value of implies that each column of is spanned by sufficiently many columns of . To understand this property, suppose for simplicity that all nonzero singular values of are identical and equal to , say. The -th column of is then , and its projection onto the -th column of is

Since the energy of is distributed along the directions , if all the aforementioned projections can be made arbitrarily small, then sufficiently many nonzero terms in the expansion are needed to account for this energy.

III-A Main result

Theorem 1: Consider given matrices and obeying , with and . Assume that every row and column of has at most nonzero elements, and that has orthonormal rows. If the following conditions

I)

; and

II)

hold, where

then there exists for which the convex program (P1) exactly recovers .

Note that I) alone is already more stringent than the pair of conditions and needed for local identifiability (cf. Proposition 1). Satisfaction of the conditions in Theorem 1 hinges upon the values of the incoherence parameters , and the RICs and . In particular, are increasing functions of these parameters, and it is readily observed from I) and II) that the smaller are, the more likely the conditions are met. Furthermore, the incoherence parameters are increasing functions of the rank and the sparsity level . The RIC is also an increasing function of , the maximum number of nonzero elements per row/column of . Therefore, for sufficiently small values of , the sufficient conditions of Theorem 1 can indeed be satisfied.

It is worth noting that not only , but also the positions of the nonzero entries in , play an important role in satisfying I) and II). This is manifested through , for which a small value indicates that the entries of are sufficiently spread out, i.e., most entries do not cluster along a few rows or columns of . Moreover, no restriction is placed on the magnitude of these entries, since, as seen later on, it is only their positions that affect exact recovery via (P1).

Remark 2 (Row orthonormality of )

Assuming is equivalent to supposing that is full-rank. This is because for a full row-rank , one can pre-multiply both sides of (1) with to obtain with orthonormal rows.

III-B Induced recovery results for principal components pursuit and compressed sensing

Before delving into the proof of the main result, it is instructive to examine how the sufficient conditions in Theorem III-A simplify for the subsumed PCP and CS problems. In PCP one has , which implies and . To obtain sufficient conditions expressed only in terms of , one can borrow the coherence conditions of [CLMW09] and readily arrive at the following result.

Corollary 1: Consider given obeying , with and . Suppose the coherence conditions , , and hold for some positive constant . If is sufficiently small such that the following conditions

)

; and

)

hold, where

then there exists for which the convex program (P1) with exactly recovers .

In Section V, random matrices drawn from natural ensembles are shown to satisfy I) and II) with high probability. In this case, it is possible to arrive at simpler conditions (depending only on , , and the matrix dimensions) for exact recovery in the context of PCP; see Remark 6. Corollary 1, on the other hand, offers general conditions stemming from a purely deterministic approach.

In the CS setting one has , which implies . As a result, Theorem 1 simply boils down to a RIC-dependent sufficient condition for the exact recovery of , as stated next.

Corollary 2: Consider given matrices and obeying . Assume that the number of nonzero elements per column of does not exceed . If

(7)

holds, then (P1) with exactly recovers .

To place in context, consider normalizing the rows of . For such a compression matrix it is known that ; see e.g., [rauhut]. Using this bound together with (7), one arrives at the stricter condition . This last condition is identical to the one reported in [donoho_elad_cs], which guarantees the success of -norm minimization in recovering sparse solutions to under-determined systems of linear equations. These conditions have been improved in recent works; see e.g., [rauhut] and references therein.

IV Proof of the Main Result

In what follows, conditions are first derived under which is the unique optimal solution of (P1). In essence, these conditions are expressed in terms of certain dual certificates. Then, Section IV-B deals with the construction of a valid dual certificate.

IV-A Unique optimality conditions

Recall the nonsmooth optimization problem (P1), and its Lagrangian

(8)

where is the matrix of dual variables (multipliers) associated with the constraint in (P1). From the characterization of the subdifferentials of the nuclear norm and the -norm (see e.g., [Boyd]), the subdifferential of the Lagrangian at is given by (recall that )

(9)
(10)

The optimality conditions for (P1) assert that is an optimal (not necessarily unique) solution if and only if

This can be shown to be equivalent to finding the pair that satisfies: i) ; ii) ; and iii) . In general, i)-iii) may hold for multiple solution pairs. However, the next lemma asserts that a slight tightening of the optimality conditions i)-iii) leads to a unique optimal solution of (P1). See Appendix A for a proof.

Lemma 2: Assume that each column of contains at most nonzero elements, as well as and . If there exists a dual certificate satisfying

C1)

C2)

C3)

C4)

then is the unique optimal solution of (P1).

The remainder of the proof deals with the construction of a dual certificate that meets C1)-C4). To this end, tighter conditions [I) and II) in Theorem 1] for the existence of are derived in terms of the incoherence parameters and the RICs. For the special case , the conditions in Lemma 2 boil down to those in [CSPW11, Prop. 2] for PCP. However, the dual certificate construction techniques used in [CSPW11] do not carry over to the setting considered here, where a compression matrix is present.

IV-B Dual certificate construction

Condition C1) in Lemma 2 implies that , for arbitrary (cf. Remark 1). Upon defining and , C1) and C2) are equivalent to .

To express in terms of the unrestricted matrix , first vectorize to obtain . Define and an matrix formed with those rows of associated with those elements in . Likewise, define which collects the remaining rows from such that for a suitable row permutation matrix . Finally, let be the vector of length containing those elements of with indices in . With these definitions, C1) and C2) can be expressed as

To upper-bound the left-hand side of C3) in terms of , use the assumption to arrive at

Similarly, the left-hand side of C4) can be bounded as

In a nutshell, if one can find such that

c1)

c2)

c3)

hold for some positive , then C1)-C4) are satisfied as well.

The final steps of the proof entail: i) finding an appropriate candidate solution such that c1) holds; and ii) deriving conditions in terms of the incoherence parameters and RICs that guarantee meets the required bounds in c2) and c3) for a range of values. The following lemma is instrumental in accomplishing i), and its proof can be found in Appendix B.

Lemma 3: Assume that each column of contains at most nonzero elements, as well as and . Then matrix has full row rank, and its minimum singular value is bounded below as

According to Lemma 3, the least-norm (LN) solution exists, and is given by

(11)
Remark 3 (Candidate dual certificate)

From the arguments at the beginning of this section, the candidate dual certificate is .
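For a full row rank system B γ = b, the least-norm solution referenced in (11) is γ = Bᵀ(B Bᵀ)⁻¹ b, which coincides with the pseudo-inverse solution; a small sketch with placeholder names:

```python
# Least-norm solution of an under-determined, full row rank system B gamma = b.
import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((5, 12))                      # fat matrix with full row rank
b = rng.standard_normal(5)
gamma_ln = B.T @ np.linalg.solve(B @ B.T, b)          # minimum l2-norm solution
assert np.allclose(B @ gamma_ln, b)                   # feasibility
assert np.allclose(gamma_ln, np.linalg.pinv(B) @ b)   # same as the pseudo-inverse solution
```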

The LN solution is an attractive choice, since it facilitates satisfying c2) and c3), which require norms of to be small. Substituting the LN solution (11) into the left-hand side of c2) yields (define for notational brevity)

(12)

Moreover, substituting (11) into the left-hand side of c3) results in

(13)

Next, upper bounds are obtained for and ; see Appendix C for a proof.

Lemma 4: Assume that each column and row of contains at most nonzero elements. If and hold, then