Recently, sparse recovery methods (e.g. see [donoho2006compressed, mallat2008wavelet]) have become very popular for compressing and processing high-dimensional data. In particular, they have found widespread applications in data acquisition [candes2008introduction], [elhamifar2009sparse, wright2009robust], medical imaging [lustig2006rapid, lustig2007sparse, candes2006robust, LiMiZoSe:15], and networking [coates2007compressed, haupt2008compressed, xu2011compressive].
The goal of sparse recovery is to reconstruct a signal $\bar{x} \in \mathbb{R}^n$ from linear measurements $y = A\bar{x} \in \mathbb{R}^m$, where $A \in \mathbb{R}^{m \times n}$ is the measurement matrix (also known as dictionary or sparsifying basis). In general, the recovery problem is ill-posed unless $\bar{x}$ is assumed to satisfy additional assumptions. For example, if we assume that $\bar{x}$ is $s$-sparse (i.e., at most $s$ of its entries are nonzero), where $s \ll n$, then mild conditions on the measurement matrix $A$ (see Lemma 1) allow for the recovery of $\bar{x}$ from $y$ using the $\ell_0$-minimization problem
$$\min_{x \in \mathbb{R}^n} \|x\|_0 \quad \text{s.t.} \quad Ax = y, \tag{1}$$
where $\|x\|_0$ denotes the number of nonzero entries in $x$. However, problem (1) is known to be NP-hard [michael1979computers]. To address this challenge, a common strategy is to solve a convex relaxation of (1) based on $\ell_1$-minimization given by
$$\min_{x \in \mathbb{R}^n} \|x\|_1 \quad \text{s.t.} \quad Ax = y, \tag{2}$$
which can be written equivalently as a linear program. It is known that the sparse signal can be recovered from (2) if the measurement matrix satisfies certain conditions. In general, these conditions are either computationally hard to verify, or too conservative so that false negative certificates are often encountered. For example, the Nullspace Property (NUP) [cohen2009compressed], which provides necessary and sufficient conditions for sparse recovery, and the Restricted Isometry Property (RIP) [candes2005decoding, baraniuk2008simple], which is only a sufficient condition for sparse recovery, are both NP-hard [TiPf:14] to check. On the other hand, the Mutual Coherence [donoho2001uncertainty] property is a sufficient condition that can be efficiently verified [TiPf:14], but is conservative in that it fails to certify that sparse recovery is possible for many measurement matrices seen in practice.
Another major limitation of these recovery guarantees and their associated computational complexity arises because they must allow for the worst case problem in their derivation. For example, the fact that the NUP and RIP are NP-hard to check does not prohibit efficient verification for particular subclasses of matrices. Similarly, even when the NUP is not satisfied for a sparsity level $s$, it may still be possible to recover certain subsets of $s$-sparse signals by exploiting additional knowledge about their support. These observations suggest studying the sparse recovery problem for subclasses of matrices and signals to obtain conditions that are easier to verify as well as specialized algorithms with improved recovery performance.
The goal of this paper is to study sparse recovery of signals that are defined over graphs when the measurement matrix is the graph’s incidence matrix. Our interest in incidence matrices stems from the fact that they are a fundamental representation of graphs, and thus a natural choice for the dictionary when analyzing network flows. In various application areas like communication networks, social networks, and transportation networks, the incidence matrices naturally appear when modeling the flow of information, disease, and goods (e.g., the detection of sparse structural network changes via observations at the nodes can be modeled as (1), where the incidence matrix serves as a measurement matrix).
The main contributions of this paper are as follows:
We derive a topological characterization of the NUP for incidence matrices. Specifically, we show that the NUP for these matrices is equivalent to a condition on the simple cycles of the graph, which form a finite subset of the nullspace of the incidence matrix. As a consequence, we show that for incidence matrices the sparse recovery guarantee depends only on the girth (the size of the smallest cycle) of the underlying graph. This overcomes NP-hardness, as the girth of a graph can be calculated in polynomial time.
Using the above topological characterization, we further derive necessary and sufficient conditions on the support of the sparse signal that enable its recovery. Specifically, for incidence matrices we show that all signals with a given support can be recovered via (2) if and only if the support consists of strictly less than half of the edges in every simple cycle of the graph. Since our conditions on $\bar{x}$ depend on its support, we will refer to them as support-dependent conditions.
We propose a specialized algorithm that utilizes the knowledge of the support of the measurements $y$ and the structure of incidence matrices to constrain the support of the signal $\bar{x}$, and consequently can guarantee sparse recovery of $\bar{x}$ under even weaker conditions.
The remainder of this paper is organized as follows: In Section II we review basic concepts from convex analysis, graph theory, and sparse recovery. In Section III we formulate the sparse recovery problem for incidence matrices, derive conditions for sparse and support-dependent recovery, and propose an efficient algorithm to solve the problem. In Section IV we present numerical experiments that illustrate our theoretical results, and in Section V we provide some conclusions.
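Given $x \in \mathbb{R}^n$, we let $\|x\|_p$ for $p \ge 1$ denote the $\ell_p$-norm, and denote the function that counts the number of nonzero entries in $x$ by $\|x\|_0$. Although the latter function is not a norm, we follow the common practice of calling it the $\ell_0$-norm. For $p \ge 1$, we define the unit $\ell_p$-sphere in $\mathbb{R}^n$ as $\mathbb{S}_p^{n-1} := \{x \in \mathbb{R}^n : \|x\|_p = 1\}$. Similarly, the unit $\ell_p$-ball in $\mathbb{R}^n$ is defined as
$$\mathbb{B}_p^n := \{x \in \mathbb{R}^n : \|x\|_p \le 1\}.$$
The nullspace of a matrix $A$ will be denoted by $\mathcal{N}(A)$. As usual, $|S|$ denotes the cardinality of a set $S$. For a vector $x \in \mathbb{R}^n$ and a nonempty index set $S \subseteq \{1, \dots, n\}$, we denote the subvector of $x$ that corresponds to $S$ by $x_S \in \mathbb{R}^{|S|}$, and the associated mapping by $\Pi_S$, i.e., $\Pi_S(x) = x_S$. The complement of $S$ in $\{1, \dots, n\}$ is denoted by $S^c$.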
II-B Convex analysis
Critical to our analysis will be the notion of extreme points of a convex set and of quasi-convex functions.
An extreme point of a convex set $C$ is a point in $C$ that cannot be written as a convex combination of two different points from $C$. The set of all extreme points of a convex set $C$ is denoted by $\operatorname{Ext}(C)$.
Let $C \subseteq \mathbb{R}^n$ be a convex set. A function $f : C \to \mathbb{R}$ is called quasi-convex if and only if for all $x, y \in C$ and $\lambda \in [0, 1]$ it holds that
$$f(\lambda x + (1 - \lambda) y) \le \max\{f(x), f(y)\}.$$
It is easy to show that a function $f$ is quasi-convex if and only if every sublevel set $\{x \in C : f(x) \le \alpha\}$, $\alpha \in \mathbb{R}$, is a convex set. Every convex function is quasi-convex. In particular, $\ell_p$-norm functions for $p \ge 1$ are quasi-convex. The next result on quasi-convex functions [FlBa:15] is included for completeness.
Proposition 1
Let $C \subset \mathbb{R}^n$ be a compact convex set and $f : C \to \mathbb{R}$ be a continuous quasi-convex function. Then $f$ attains its maximum value at an extreme point of $C$.
II-C Graph theory
A directed graph with vertex set $V$ and edge set $E \subseteq V \times V$ will be denoted by $G(V, E)$. When the edge and vertex sets are irrelevant to the discussion, we will drop them from the notation and denote the graph by $G$. Sometimes the vertex set and the edge set of $G$ will be denoted by $V(G)$ and $E(G)$ respectively. A graph is called simple if there is at most one edge connecting any two vertices and no edge starts and ends at the same vertex, i.e. $G$ has no self loops. Henceforth, $G$ will always denote a simple directed graph with a finite number of edges and vertices. Although we focus on directed graphs, our analysis only requires the undirected variants of the definitions for paths, cycles, and connectivity.
We say that two vertices $u, v \in V$ are adjacent if either $(u, v) \in E$ or $(v, u) \in E$. A sequence $(v_1, \dots, v_k)$ of distinct vertices of a graph $G$ such that $v_i$ and $v_{i+1}$ are adjacent for all $i = 1, \dots, k-1$ is called a path. A connected component $G'$ of $G$ is a subgraph of $G$ in which any two distinct vertices are connected to each other by a path and no edge exists between $V(G')$ and $V(G) \setminus V(G')$. We will say that the graph $G$ is connected (for directed graphs this is often referred to as weakly connected) if $G$ has a unique connected component. A sequence $(v_1, \dots, v_k, v_{k+1})$ of adjacent vertices of a graph $G$ is called a (simple) cycle if $k \ge 3$, $v_1 = v_{k+1}$, and $v_i \ne v_j$ whenever $1 \le i < j \le k$; the length of such a cycle is $k$. The length of the shortest simple cycle of a graph $G$ is called the girth of $G$. For an acyclic graph (i.e., a graph with no cycles), the girth is defined to be $+\infty$. Since acyclic graphs are not interesting for our purposes, we will assume that the girth is finite. That is, the graph has at least one simple cycle.
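Associated with a directed graph $G(V, E)$ with vertices $V = \{v_1, \dots, v_m\}$ and edges $E = \{e_1, \dots, e_n\}$, we can define the incidence matrix $A = A(G) \in \mathbb{R}^{m \times n}$ as
$$a_{ij} = \begin{cases} 1 & \text{if } e_j = (v_i, v_k) \text{ for some } v_k \in V, \\ -1 & \text{if } e_j = (v_k, v_i) \text{ for some } v_k \in V, \\ 0 & \text{otherwise.} \end{cases}$$
For a nonempty index set $S \subseteq \{1, \dots, n\}$, the subgraph of $G$ consisting of edges $\{e_j : j \in S\}$ is denoted by $G_S$. The incidence matrix of $G_S$ is denoted by $A_S$. Let $C$ be a simple cycle of a simple directed graph $G$. Then, $C$ can be associated with a vector $w(C) \in \mathbb{R}^n$, where each coordinate of $w(C)$ is defined as
$$w_j(C) = \begin{cases} 1 & \text{if } C \text{ traverses } e_j \text{ in its forward direction}, \\ -1 & \text{if } C \text{ traverses } e_j \text{ in its backward direction}, \\ 0 & \text{if } e_j \text{ does not belong to } C. \end{cases}$$
We now define the cycle space of $G$ as the subspace spanned by $\{w(C) : C \text{ is a simple cycle of } G\}$. Sometimes this subspace is called the flow space [GoRo:01].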
Remark 1
The cycle space of $G$ is exactly the nullspace of the incidence matrix $A$ [GoRo:01].
The dimension of the nullspace of $A$ is $n - m + k$, where $k$ is the number of connected components of $G$. Hence, the rank of $A$ is $m - k$ [GoRo:01].
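As an illustration of the two facts above, the following minimal Python sketch builds the incidence matrix of a small graph and checks that the signed indicator vector of a simple cycle lies in its nullspace. The example graph, the function names, and the sign convention (+1 at the tail of an edge, -1 at its head) are assumptions made for this illustration only.

```python
def incidence_matrix(num_vertices, edges):
    """Return the |V| x |E| incidence matrix as a list of rows.

    Assumed sign convention: +1 at the tail of an edge, -1 at its head.
    """
    A = [[0] * len(edges) for _ in range(num_vertices)]
    for j, (tail, head) in enumerate(edges):
        A[tail][j] = 1
        A[head][j] = -1
    return A


def cycle_vector(edges, cycle):
    """Signed indicator vector of a cycle given as a closed vertex sequence."""
    index = {e: j for j, e in enumerate(edges)}
    w = [0] * len(edges)
    for u, v in zip(cycle, cycle[1:]):
        if (u, v) in index:        # edge traversed along its direction
            w[index[(u, v)]] = 1
        else:                      # edge traversed against its direction
            w[index[(v, u)]] = -1
    return w


def matvec(A, x):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Hypothetical example: a triangle 0-1-2 with a pendant edge 2 -> 3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
A = incidence_matrix(4, edges)
w = cycle_vector(edges, [0, 1, 2, 0])
print(w)             # [1, 1, -1, 0]
print(matvec(A, w))  # [0, 0, 0, 0]
```

The pendant edge does not lie on any cycle, so its coordinate in the cycle vector is zero, and the product with the incidence matrix vanishes at every vertex.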
II-D Sparse Recovery
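It is known [FoRa:13] that every vector $\bar{x} \in \mathbb{R}^n$ with $\operatorname{supp}(\bar{x}) \subseteq S$ is the unique solution to (2) with $y = A\bar{x}$ if and only if
$$\|\eta_S\|_1 < \|\eta_{S^c}\|_1 \quad \text{for all } \eta \in \mathcal{N}(A) \setminus \{0\}.$$
This leads us to the definition of the Nullspace Property.
Definition 3 (Nullspace Property, NUP)
A matrix $A \in \mathbb{R}^{m \times n}$ is said to satisfy the nullspace property (NUP) of order $s$, if for any $\eta \in \mathcal{N}(A) \setminus \{0\}$, and any nonempty index set $S \subseteq \{1, \dots, n\}$ with $|S| \le s$, we have
$$\|\eta_S\|_1 < \|\eta_{S^c}\|_1.$$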
Another needed concept is the spark of a matrix.
Definition 4 (Spark)
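The spark of a matrix $A$ is the smallest number of linearly dependent columns in $A$. Formally, we have $\operatorname{spark}(A) := \min_{\eta} \|\eta\|_0$ s.t. $\eta \in \mathcal{N}(A) \setminus \{0\}$.
We note that the rank of a matrix may be used to bound its spark. Specifically, it holds that
$$\operatorname{spark}(A) \le \operatorname{rank}(A) + 1. \tag{7}$$
The spark may be used to provide a necessary and sufficient condition for uniqueness of sparse solutions [FoRa:13].
Lemma 1
For any matrix $A \in \mathbb{R}^{m \times n}$, the $s$-sparse signal $\bar{x} \in \mathbb{R}^n$ is the unique solution to the optimization problem
$$\min_{x \in \mathbb{R}^n} \|x\|_0 \quad \text{s.t.} \quad Ax = A\bar{x}$$
if and only if $s < \operatorname{spark}(A)/2$.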
III Main Results
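It is not difficult to prove that the spark of an incidence matrix $A$ is equal to the girth of the underlying graph $G$. By combining this fact with Lemma 1, we obtain the following.
Proposition 2
Let $A \in \mathbb{R}^{m \times n}$ be the incidence matrix of a simple, connected graph $G$ with girth $g$. Then, for every $s$-sparse vector $\bar{x} \in \mathbb{R}^n$, $\bar{x}$ is the unique solution to
$$\min_{x \in \mathbb{R}^n} \|x\|_0 \quad \text{s.t.} \quad Ax = A\bar{x}$$
if and only if $s < g/2$.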
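The identity between the spark of an incidence matrix and the girth of its graph can be checked by brute force on a small example. The sketch below is only illustrative: the graph, the helper names, and the sign convention (+1 at the tail of an edge, -1 at its head) are assumptions, and the exhaustive search over column subsets is exponential, so it is meant for tiny graphs only.

```python
from fractions import Fraction
from itertools import combinations


def rank(columns):
    """Rank of a list of column vectors via exact Gaussian elimination."""
    rows = [list(map(Fraction, r)) for r in zip(*columns)]
    rk = 0
    for c in range(len(columns)):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][c] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            f = rows[r][c] / rows[rk][c]
            rows[r] = [a - f * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def spark(columns):
    """Smallest number of linearly dependent columns (brute force)."""
    for k in range(1, len(columns) + 1):
        for subset in combinations(columns, k):
            if rank(list(subset)) < k:
                return k
    return float("inf")


def incidence_columns(num_vertices, edges):
    """Columns of the incidence matrix (+1 at tail, -1 at head: assumed)."""
    cols = []
    for tail, head in edges:
        col = [0] * num_vertices
        col[tail], col[head] = 1, -1
        cols.append(col)
    return cols


# Hypothetical graph: a 4-cycle 0-1-2-3 plus the chord 0-2, so the girth is 3.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(spark(incidence_columns(4, edges)))  # 3
```

For this graph the spark and the girth are both 3, so the uniqueness condition of Proposition 2 covers 1-sparse signals only.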
Even though Proposition 2 is useful as a uniqueness result, it does not come in handy when one would like to recover the original signal $\bar{x}$ from the measurements $y = A\bar{x}$. For this purpose, an $\ell_1$-relaxation of the optimization problem is preferable. Hence, one needs a theorem akin to Proposition 2 that addresses the solutions of the optimization problem
$$\min_{x \in \mathbb{R}^n} \|x\|_1 \quad \text{s.t.} \quad Ax = A\bar{x}. \tag{10}$$
This leads us to study the NUP for incidence matrices. Specifically, in this section we answer the following questions about the incidence matrix $A \in \mathbb{R}^{m \times n}$ of a simple connected graph $G$ with $n$ edges and $m$ vertices, and an $s$-sparse vector $\bar{x} \in \mathbb{R}^n$:
What are necessary and sufficient conditions for $A$ to satisfy the NUP of order $s$? Such conditions would guarantee sparse recovery, i.e., that any $s$-sparse signal $\bar{x}$ can be recovered as the unique solution to (10).
If traditional sparse recovery is not possible, can we characterize subclasses of sparse signals that are recoverable via (10) in terms of the support of the signal $\bar{x}$ and the topology of the graph $G$?
Can we use the support of the measurement $y$ and the structure of $A$ to derive constraints on the support of $\bar{x}$ that allow us to modify (10) and successfully recover the sparse signal $\bar{x}$?
III-A Topological Characterization of the Nullspace Property for the Class of Incidence Matrices
Before addressing the questions above in detail, we would like to build a simple framework to study them. We will start with a reformulation of the NUP.
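Lemma 2
A matrix $A \in \mathbb{R}^{m \times n}$ satisfies the NUP of order $s$ if and only if
$$\max_{|S| \le s} \; \max_{\eta \in \mathcal{N}(A) \cap \mathbb{B}_1^n} \|\eta_S\|_1 < \frac{1}{2}. \tag{11}$$
Proof. For $\eta \in \mathcal{N}(A) \setminus \{0\}$, using $\|\eta\|_1 = \|\eta_S\|_1 + \|\eta_{S^c}\|_1$, we get
$$\|\eta_S\|_1 < \|\eta_{S^c}\|_1 \iff \frac{\|\eta_S\|_1}{\|\eta\|_1} < \frac{1}{2}.$$
Using this inequality and Definition 3, it follows that $A$ satisfies NUP of order $s$ if and only if
$$\max_{|S| \le s} \; \max_{\eta \in \mathcal{N}(A) \cap \mathbb{S}_1^{n-1}} \|\eta_S\|_1 < \frac{1}{2},$$
which is equivalent to (11) because the maximum of $\|\eta_S\|_1$ over $\mathcal{N}(A) \cap \mathbb{B}_1^n$ is attained on the unit $\ell_1$-sphere.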
Remark 3
The value of the left hand side of (11) is called the nullspace constant in the literature [TiPf:14]. The calculation of the nullspace constant for an arbitrary matrix $A$ and sparsity $s$ is known to be NP-hard [TiPf:14].
The reformulation of the NUP in Lemma 2 has certain benefits. For a fixed index set $S$, it draws our attention to the optimization problem
$$\max_{\eta \in \mathcal{N}(A) \cap \mathbb{B}_1^n} \|\eta_S\|_1,$$
which is the maximization of a continuous convex function over a compact convex set $\mathcal{N}(A) \cap \mathbb{B}_1^n$. Thus, it follows from Proposition 1 that the maximum is attained at an extreme point of $\mathcal{N}(A) \cap \mathbb{B}_1^n$. This motivates us to understand the extreme points of this set, which can be a computationally involved task for arbitrary matrices $A$. Nonetheless, one can still set a bound on the sparsity of the extreme points of $\mathcal{N}(A) \cap \mathbb{B}_1^n$.
Lemma 3
If $A$ is an $m \times n$ matrix of rank $r$, then extreme points of $\mathcal{N}(A) \cap \mathbb{B}_1^n$ are at most $(r+1)$-sparse.
Proof. If $r = n$, then the result of the lemma holds trivially since $\mathcal{N}(A) \cap \mathbb{B}_1^n = \{0\}$.
The unit $\ell_1$-sphere in $\mathbb{R}^n$, namely $\mathbb{S}_1^{n-1}$, may be written as the union of $(n-1)$-dimensional simplices. Hence, if $r < n$, each extreme point is contained in an $(n-1)$-dimensional simplex.
In particular, if $r < n - 1$, we argue that no extreme point of $\mathcal{N}(A) \cap \mathbb{B}_1^n$ lies in the interior of these $(n-1)$-dimensional simplices. This is because in this case the intersection of $\mathcal{N}(A)$ and the interior of the $(n-1)$-dimensional simplex is either empty, or a non-singleton open convex subset of $\mathcal{N}(A)$, where every point is a convex combination of two distinct points. Hence, no extreme point can live there. So, the extreme points must lie in the boundary of $(n-1)$-dimensional simplices, which are $(n-2)$-dimensional simplices. Moreover, these $(n-2)$-dimensional simplices are where one coordinate becomes zero. Hence, at least one coordinate of an extreme point living in an $(n-2)$-dimensional simplex is zero.
As long as the sum of the dimension of the simplex, which contains the extreme point, and the dimension of $\mathcal{N}(A)$ is strictly larger than $n$, we could repeat the argument in the previous paragraph. This argument stops when the dimension of the simplex containing the extreme point is $r$. Thus, at least $n - r - 1$ coordinates of the extreme point have to be zero, so that each extreme point is at most $(r+1)$-sparse.
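Although Lemma 3 is analogous to (7), it is stronger in the sense that the statement bounds the sparsity of all the extreme points of the convex set $\mathcal{N}(A) \cap \mathbb{B}_1^n$, not just the sparsest vectors in $\mathcal{N}(A)$, as is implied by (7). Armed with Lemma 3, it turns out that for incidence matrices, these extreme points have a nice characterization in terms of the properties of the underlying graph.
Proposition 3
Let $A \in \mathbb{R}^{m \times n}$ be the incidence matrix of a simple connected graph $G$ that has at least one simple cycle, and let $\mathcal{W}$ denote the set of normalized simple cycles of $G$, i.e.
$$\mathcal{W} := \left\{ \frac{w(C)}{\|w(C)\|_1} : C \text{ is a simple cycle of } G \right\}. \tag{15}$$
Then, we have $\operatorname{Ext}(\mathcal{N}(A) \cap \mathbb{B}_1^n) \subseteq \mathcal{W}$.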
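Proof. Let $z$ be an extreme point of $\mathcal{N}(A) \cap \mathbb{B}_1^n$ with support $S$. We necessarily have $\|z\|_1 = 1$. Suppose that $G_S$ has $m_S$ vertices and $k_S$ connected components. Note that $z_S$ has only nonzero entries. We claim that $z_S$ is an extreme point of $\mathcal{N}(A_S) \cap \mathbb{B}_1^{|S|}$. For a proof by contradiction, suppose that $z_S$ could be written as a convex combination of two distinct vectors in $\mathcal{N}(A_S) \cap \mathbb{B}_1^{|S|}$. In this case, one could pad those vectors with zeros at the coordinates in $S^c$ to get two distinct vectors in $\mathcal{N}(A) \cap \mathbb{B}_1^n$, whose convex combination would give $z$. Since this would contradict the fact that $z$ is an extreme point, we must conclude that $z_S$ is an extreme point of $\mathcal{N}(A_S) \cap \mathbb{B}_1^{|S|}$. Since $z_S$ has $|S|$ nonzero entries, Lemma 3 applied to $A_S$ gives
$$|S| \le \operatorname{rank}(A_S) + 1 = m_S - k_S + 1. \tag{16}$$
Combining (16) with the rank-nullity theorem yields
$$\dim \mathcal{N}(A_S) = |S| - (m_S - k_S) \le 1.$$
This may be combined with $z_S \in \mathcal{N}(A_S) \setminus \{0\}$ to conclude that $\dim \mathcal{N}(A_S) = 1$; thus $\mathcal{N}(A_S)$ is spanned by $z_S$. It now follows from Remark 1 that $z_S$ corresponds to the unique simple cycle in $G_S$. In addition, since $z_S$ has no zero entries, the corresponding simple cycle includes every edge of $G_S$. It follows that $z$ corresponds to a simple cycle in $G$, which combined with $\|z\|_1 = 1$ proves $z \in \mathcal{W}$.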
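Theorem 1
Let $A \in \mathbb{R}^{m \times n}$ be the incidence matrix of a simple, connected graph $G$, and let $\mathcal{W}$ denote the set of normalized simple cycles of $G$ as in (15). Let $S \subseteq \{1, \dots, n\}$ be a nonempty index set. Then
$$\max_{\eta \in \mathcal{N}(A) \cap \mathbb{B}_1^n} \|\eta_S\|_1 = \max_{w \in \mathcal{W}} \|w_S\|_1. \tag{17}$$
Proof. Since $\mathcal{W} \subseteq \mathcal{N}(A) \cap \mathbb{B}_1^n$, we obviously have
$$\max_{w \in \mathcal{W}} \|w_S\|_1 \le \max_{\eta \in \mathcal{N}(A) \cap \mathbb{B}_1^n} \|\eta_S\|_1. \tag{18}$$
For the converse inequality we argue as follows: Since $\eta \mapsto \|\eta_S\|_1$ is a continuous convex function (thus also quasi-convex), and $\mathcal{N}(A) \cap \mathbb{B}_1^n$ is a compact convex set, the maximum in the left hand side of (17) will be attained at an extreme point of $\mathcal{N}(A) \cap \mathbb{B}_1^n$ (see Proposition 1). That is,
$$\max_{\eta \in \mathcal{N}(A) \cap \mathbb{B}_1^n} \|\eta_S\|_1 = \max_{\eta \in \operatorname{Ext}(\mathcal{N}(A) \cap \mathbb{B}_1^n)} \|\eta_S\|_1.$$
But $\operatorname{Ext}(\mathcal{N}(A) \cap \mathbb{B}_1^n) \subseteq \mathcal{W}$ by Proposition 3. Hence,
$$\max_{\eta \in \mathcal{N}(A) \cap \mathbb{B}_1^n} \|\eta_S\|_1 \le \max_{w \in \mathcal{W}} \|w_S\|_1.$$
Combining this with (18) we get the converse inequality, and hence the result.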
Theorem 1 builds a connection between the algebraic condition NUP and a topological property of the graph, namely its simple cycles. This connection is what we will primarily exploit in the rest of the paper.
Let us make a simple but important observation.
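Let $A$ and $\mathcal{W}$ be as in Theorem 1. Then, for any index set $S \subseteq \{1, \dots, n\}$ and $w \in \mathcal{W}$, it holds that
$$\|w_S\|_1 = \frac{|S \cap \operatorname{supp}(w)|}{\|w\|_0}.$$
Proof. For any simple cycle $C$, the entries of $w(C)$ are in the set $\{-1, 0, 1\}$. Hence, entries of any $w \in \mathcal{W}$ are from the set $\{-1/\|w\|_0, \, 0, \, 1/\|w\|_0\}$, from which the result follows.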
III-B Polynomial Time Guarantees for Sparse Recovery
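Theorem 2
Let $A \in \mathbb{R}^{m \times n}$ be the incidence matrix of a simple, connected graph $G$ with girth $g$. Then, every $s$-sparse vector $\bar{x} \in \mathbb{R}^n$ is the unique solution to
$$\min_{x \in \mathbb{R}^n} \|x\|_1 \quad \text{s.t.} \quad Ax = A\bar{x}$$
if and only if $s < g/2$.
Proof. By Lemma 2 and Theorem 1, together with the observation above, $A$ satisfies the NUP of order $s$ if and only if
$$\max_{|S| \le s} \; \max_{w \in \mathcal{W}} \frac{|S \cap \operatorname{supp}(w)|}{\|w\|_0} < \frac{1}{2}. \tag{20}$$
If $s \ge g$, then the left hand side of (20) is equal to $1$. If $s < g$, the maximum is attained at the smallest simple cycle when $S$ picks edges only from this cycle, and the maximum is equal to $s/g$. Hence, the left hand side of (20) equals $\min\{s/g, 1\}$, which is smaller than $1/2$ if and only if $s < g/2$.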
As was mentioned in Remark 3, calculating the nullspace constant is NP-hard in general. However, there are algorithms that can calculate the girth of a graph exactly in $O(mn)$ time or sufficiently accurately for our purposes in $O(m^2)$ time (see [ItaiR:78]). Therefore, Theorem 2 reveals that the nullspace constant can be calculated (and hence the NUP can be verified) for graph incidence matrices in polynomial time.
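To make the polynomial-time claim concrete, the following Python sketch computes the girth with one breadth-first search per vertex and converts it into the largest sparsity level covered by Theorem 2. This is a standard textbook approach rather than the specific algorithm of [ItaiR:78]; the function names and the example graphs (a ring, with and without chords to third neighbors) are assumptions made for illustration.

```python
from collections import deque
import math


def girth(num_vertices, edges):
    """Length of the shortest cycle of the underlying undirected simple graph.

    Runs one breadth-first search per vertex, O(|V| * |E|) time overall.
    Returns math.inf for an acyclic graph.
    """
    adj = [[] for _ in range(num_vertices)]
    for u, v in edges:            # edge directions are ignored
        adj[u].append(v)
        adj[v].append(u)

    best = math.inf
    for root in range(num_vertices):
        dist = [-1] * num_vertices
        parent = [-1] * num_vertices
        dist[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] == -1:             # tree edge
                    dist[v] = dist[u] + 1
                    parent[v] = u
                    queue.append(v)
                elif v != parent[u]:          # non-tree edge closes a cycle
                    best = min(best, dist[u] + dist[v] + 1)
    return best


def max_guaranteed_sparsity(g):
    """Largest integer s with s < g / 2 (the condition of Theorem 2)."""
    return math.inf if g == math.inf else (g - 1) // 2


# Hypothetical examples: a 20-node ring, then the same ring with chords
# joining every node to its third neighbor.
ring = [(i, (i + 1) % 20) for i in range(20)]
chords = [(i, (i + 3) % 20) for i in range(20)]
print(girth(20, ring), max_guaranteed_sparsity(girth(20, ring)))                    # 20 9
print(girth(20, ring + chords), max_guaranteed_sparsity(girth(20, ring + chords)))  # 4 1
```

Adding the chords lowers the girth from 20 to 4, so the uniform guarantee drops from 9-sparse signals to 1-sparse signals.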
III-C Guarantees for Support-dependent Sparse Recovery
The sparse recovery result given by Theorem 2 depends on the cardinality of the support of the unknown signal. For a graph with small girth $g$, this result establishes sparse recovery guarantees only for signals with low sparsity levels (specifically, $s < g/2$). It is natural, therefore, to ask the following question: Given a graph with relatively small girth, can we identify a class of signals with sparsity $s \ge g/2$ that can be recovered? This leads us to the study of support-dependent sparse recovery, where we not only focus on the cardinality of the support of the unknown signal $\bar{x}$, but also take the location of the support into account. More precisely, we have the following result.
Theorem 3
Let $A \in \mathbb{R}^{m \times n}$ be the incidence matrix of a simple, connected graph $G$, let $\mathcal{W}$ denote the set of normalized simple cycles of $G$ as in (15), and let $S \subseteq \{1, \dots, n\}$ be a nonempty index set. Then, every vector $\bar{x} \in \mathbb{R}^n$ with $\operatorname{supp}(\bar{x}) \subseteq S$ is the unique solution to the optimization problem
$$\min_{x \in \mathbb{R}^n} \|x\|_1 \quad \text{s.t.} \quad Ax = A\bar{x} \tag{22}$$
if and only if
$$\max_{w \in \mathcal{W}} \frac{|S \cap \operatorname{supp}(w)|}{\|w\|_0} < \frac{1}{2}.$$
In comparison to Theorem 2, Theorem 3 provides a deeper insight into the vectors that can be recovered from the observations $y = A\bar{x}$ via $\ell_1$-minimization. Specifically, any vector $\bar{x}$ whose support within each simple cycle has size strictly less than half the size of that cycle can be successfully recovered.
As an example consider the graph in Fig. 1, which has girth . Thus, it follows from Theorem 2 that any -sparse signal can be recovered. However, in reality, some signals that are supported on more than one edge can also be recovered. For instance, Theorem 3 guarantees that a signal supported on edges or can be recovered as the unique solution to the problem (22) since for each of the three simple cycles, its intersection with the support of the unknown vector is less than half of the length of the simple cycle. This reveals that different sparsity patterns of the same sparsity level can have different recovery performances.
To use Theorem 3 as a way of providing a support-dependent recovery guarantee, one needs to compute the intersection of the support of $\bar{x}$ with all simple cycles in the graph. A number of algorithms with a polynomial time complexity bound for cycle enumeration are known [mateti1976algorithms].
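The support-dependent test can be sketched in a few lines of Python for small graphs. The code below enumerates simple cycles by a brute-force depth-first search (it is not one of the algorithms surveyed in [mateti1976algorithms], and it is exponential in general) and then checks whether a candidate support occupies strictly less than half of the edges of every simple cycle. The graph and the function names are assumptions made for this illustration.

```python
def simple_cycles(num_vertices, edges):
    """Yield every simple cycle of the undirected graph as a set of edge indices.

    Brute-force depth-first search; exponential in general, so only meant
    for small graphs.
    """
    adj = [[] for _ in range(num_vertices)]
    for j, (u, v) in enumerate(edges):
        adj[u].append((v, j))
        adj[v].append((u, j))

    def extend(start, u, visited, used):
        for v, j in adj[u]:
            if v == start and len(used) >= 2 and j not in used:
                # Each cycle is found once per direction; keep one of them.
                if used[0] < j:
                    yield frozenset(used + [j])
            elif v > start and v not in visited:
                yield from extend(start, v, visited | {v}, used + [j])

    # Every cycle is rooted at its smallest vertex to avoid repetitions.
    for start in range(num_vertices):
        yield from extend(start, start, {start}, [])


def support_is_recoverable(num_vertices, edges, support):
    """Check |S ∩ C| < |C| / 2 for every simple cycle C (Theorem 3)."""
    support = set(support)
    return all(2 * len(support & cycle) < len(cycle)
               for cycle in simple_cycles(num_vertices, edges))


# Hypothetical graph: a triangle 0-1-2 glued to a 4-cycle 0-2-3-4 along edge 2.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 0)]
print(support_is_recoverable(5, edges, {0, 3}))  # True
print(support_is_recoverable(5, edges, {0, 1}))  # False
print(support_is_recoverable(5, edges, {3, 4}))  # False
```

This graph has girth 3, so Theorem 2 only covers 1-sparse signals, yet the 2-element support {0, 3} passes the cycle test, whereas {0, 1} covers two of the three triangle edges and {3, 4} covers exactly half of the 4-cycle.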
III-D Using Measurement Sparsity to Aid Recovery
When there is additional information about the support of the unknown signal, Theorem 3 gives a necessary and sufficient condition for exact recovery. In practice, this information is usually missing. However, the special structure of the incidence matrix and its connection to the graph can help us circumvent this difficulty. Notice that the columns of any incidence matrix are always $2$-sparse, which means that the measurement $y = A\bar{x}$ will be $2s$-sparse for any $s$-sparse signal $\bar{x}$. Therefore, one can seek to obtain a superset of $\operatorname{supp}(\bar{x})$ by observing $\operatorname{supp}(y)$, i.e. the vertices with nonzero measurements. Typically, this observation reduces the size of the problem, and gives rise to Algorithm 1.
There are cases where Algorithm 1 may fail. For instance, when the unknown nonzero signal $\bar{x}$ is in the nullspace of one of the rows of $A$, say $a_i^\top$, and $\operatorname{supp}(a_i) \cap \operatorname{supp}(\bar{x}) \neq \emptyset$ (i.e. when the non-zero values of the signal cancel each other at a vertex $v_i$), Algorithm 1 would simply ignore the corresponding vertex, and hence, the edges connected to it. This leads to a wrong subgraph and possibly an incorrect outcome $\hat{x}$. Fortunately, this case happens with zero probability when the signal has random support and random values.
In comparison to $\ell_1$-minimization on the original graph, Algorithm 1 may have improved support-dependent recovery performance because the formation of the subgraph may eliminate cycles; this is especially true for large-scale networks. For concreteness, let us give a simple example.
Consider the graph depicted in Fig. 1. Let be a random signal that is supported on the edges , so that (with probability one) there are nonzero measurements on nodes . Then, cannot be recovered using the incidence matrix of the original graph. However, the subgraph is acyclic, as shown in Fig. 2, which allows exact recovery via Algorithm 1.
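The support-restriction idea behind Algorithm 1 can be sketched as follows. Since the listing of Algorithm 1 is not reproduced here, this is only an interpretation of the description above: keep the edges whose two endpoints both carry a nonzero measurement, and test whether the resulting subgraph is acyclic. The function names, the tolerance `tol`, the sign convention (+1 at the tail, -1 at the head), and the ring example are assumptions, and the $\ell_1$-minimization step on the subgraph is omitted.

```python
def measure(num_vertices, edges, x):
    """y = A x for the incidence matrix (+1 at tail, -1 at head: assumed)."""
    y = [0.0] * num_vertices
    for (tail, head), value in zip(edges, x):
        y[tail] += value
        y[head] -= value
    return y


def candidate_edges(edges, y, tol=1e-12):
    """Indices of edges whose two endpoints both carry a nonzero measurement."""
    active = {v for v, value in enumerate(y) if abs(value) > tol}
    return [j for j, (u, v) in enumerate(edges) if u in active and v in active]


def is_acyclic(num_vertices, edges, edge_indices):
    """Union-find test: True if the selected edges form a forest."""
    root = list(range(num_vertices))

    def find(v):
        while root[v] != v:
            root[v] = root[root[v]]   # path halving
            v = root[v]
        return v

    for j in edge_indices:
        a, b = find(edges[j][0]), find(edges[j][1])
        if a == b:                    # this edge would close a cycle
            return False
        root[a] = b
    return True


# Hypothetical example: a 6-node ring with a signal on three consecutive edges.
edges = [(i, (i + 1) % 6) for i in range(6)]
x = [1.0, 2.0, -1.5, 0.0, 0.0, 0.0]
y = measure(6, edges, x)
kept = candidate_edges(edges, y)
print(kept)                        # [0, 1, 2]
print(is_acyclic(6, edges, kept))  # True
```

In this example the support covers half of the ring, so the cycle condition of Theorem 3 fails on the original graph, while the restricted subgraph is a path. The incidence matrix of a forest has a trivial nullspace, so the restricted linear system has a unique solution.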
In this section we provide numerical simulation results on the recovery performance associated with incidence matrices. In all experiments, the signal $\bar{x}$ has random support with each nonzero entry drawn i.i.d. from the standard normal distribution. We use the CVX software package [grant2008cvx] to solve the optimization problems. A vector is declared to be recovered if the -norm error is less than or equal to .
Fig. 4 shows the probability of exact recovery of signals via $\ell_1$-relaxation over a sequence of graphs containing two cycles (see Fig. 3). Each graph has a cycle of length $3$ and a larger cycle whose length varies across the sequence. For each sparsity level, 1000 trials were performed. According to Theorem 2, for all graphs we can recover 1-sparse signals since the girth is 3 for each graph in the sequence. When the sparsity level is increased, we expect that the probability of exact recovery will increase for graphs with larger cycles because it becomes less likely that the support of the random signal will contain at least half of the edges of one of the simple cycles in the graph. Note that this agrees with our observation in Fig. 4.
We now evaluate the performance of Algorithm 1 against the $\ell_1$-minimization method in (10) on two graphs with $20$ nodes (see Fig. 5). The first graph, $G_1$, is a simple cycle connecting node 1 to node 20 in order (blue edges in Fig. 5). The second graph, $G_2$, consists of both blue and red edges. The red edges connect each node to its third neighbor, i.e., node $i$ to node $i+3$ (modulo $20$). Fig. 6 and Fig. 7 show the performance of both algorithms on $G_1$ and $G_2$ respectively. For each sparsity level, the experiments are repeated 1000 times to compute recovery probabilities. For $G_1$, Algorithm 1 outperforms $\ell_1$-minimization since the reduced graph in Algorithm 1 will be acyclic in some cases. For $G_2$, Algorithm 1 will eliminate small cycles when forming the subgraph and lead to higher recovery probability for fixed $s$ due to a larger girth.
In this paper we studied sparse recovery for the class of graph incidence matrices. For such matrices we provided a characterization of the NUP in terms of the topological properties of the underlying graph. This characterization allows one to verify sparse recovery guarantees in polynomial time for the class of incidence matrices. Moreover, we showed that support-dependent recovery performance can also be analyzed in terms of the simple cycles of the graph. Finally, by exploiting the structure of incidence matrices and sparsity of the measurements, we proposed an efficient algorithm for recovery and evaluated its performance numerically.