On the construction of sparse matrices from expander graphs

04/24/2018
by Bubacarr Bah, et al.
University of Oxford

We revisit the asymptotic analysis of the probabilistic construction of adjacency matrices of expander graphs proposed in bah2013vanishingly. With better bounds, we derive a new, reduced sample complexity for the number of nonzeros per column of these matrices, precisely d = O(log_s(N/s)), as opposed to the standard d = O(log(N/s)). This gives insight into why small values of d performed well in numerical experiments involving such matrices. Furthermore, we derive quantitative sampling theorems for our constructions, which show our construction outperforming the existing state of the art. We also use our results to compare the performance of sparse recovery algorithms where these matrices are used for linear sketching.


1 Introduction

Sparse binary matrices A are widely used in applications including graph sketching [1, 21], network tomography [32, 14], data streaming [31, 25], breaking privacy of databases via aggregate queries [19], compressed imaging of intensity patterns [16], and, more generally, combinatorial compressed sensing [15, 33, 28, 8, 29, 30], linear sketching [25], and group testing [18, 22]. In all these areas we are interested in the case where A has many fewer rows than columns, in which case A is used as an efficient encoder of sparse signals with sparsity s, where it is known to preserve the ℓ1 distance between sparse vectors [7]. Conditions that guarantee that a given encoder A, also referred to as a sensing matrix in compressed sensing, performs well typically include the nullspace, coherence, and restricted isometry conditions; see [20] and references therein. The goal is for A to satisfy one or more of these conditions with the minimum possible number of measurements n. For uniform guarantees over all s-sparse signals, it has been established that n has to be Ω(s log(N/s)); but, with high probability on the draw of a random A, n can be O(s log(N/s)) for A with entries drawn from a sub-gaussian distribution; see [20] for a review of such results. Matrices with entries drawn from a Bernoulli distribution fall in the sub-gaussian family, but these are dense, as opposed to the sparse binary matrices considered here. For computational advantages, such as faster application and smaller storage, it is advantageous to use sparse A in applications [7, 4, 29].
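To make the computational point concrete, here is a minimal sketch (assuming Python with NumPy/SciPy; the sizes n, N, d are illustrative, not taken from the paper) of building a sparse binary matrix with d ones per column and applying it to a vector.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
n, N, d = 256, 2048, 8  # illustrative sizes; d ones per column

# Place d ones uniformly at random in each column, which is the standard
# probabilistic construction discussed later in this paper.
rows = np.concatenate([rng.choice(n, size=d, replace=False) for _ in range(N)])
cols = np.repeat(np.arange(N), d)
A = sparse.csc_matrix((np.ones(N * d), (rows, cols)), shape=(n, N))

x = rng.standard_normal(N)
y = A @ x  # O(dN) work, versus O(nN) for a dense sensing matrix
print(A.nnz, "stored nonzeros, versus", n * N, "entries for a dense matrix")
```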

Herein we consider the achievable n when A is the adjacency matrix of an expander graph [7]; expander graphs will be defined in the next section. Hence the construction of such matrices can be construed as either a linear algebra problem or, equivalently, a graph theory one (in this manuscript we focus more on the linear algebra discourse). There has been significant research on expander graphs in pure mathematics and theoretical computer science; see [24] and references therein. Both deterministic and probabilistic constructions of expander graphs have been suggested. The best known deterministic constructions fall short of the optimal parameters [23]. On the other hand, random constructions, first proven in [5], achieve the optimal n = O(s log(N/s)), with d = O(log(N/s)), where d is the left degree of the expander graph and also the number of ones in each column of A, to be defined in the next section. However, to the best of our knowledge, it was [4] that proposed a probabilistic construction that is not only optimal but also more suitable for making quantitative statements where such matrices are applied.

This work follows the probabilistic construction proposed in [4] but, with more careful computation of the bounds, is able to achieve n = O(s log(N/s)) with d = O(log_s(N/s)). We retain the complexity of n but obtain a smaller complexity for d, which is novel. Related results with a similar d were derived in [27, 2], but for structured sparse signals in the framework of model-based compressed sensing or sketching. In that framework, one has second-order information about the signal beyond simple sparsity, which is first-order information. It is thus expected, and established, that it is possible to get a smaller d and hence a smaller n. Arguably, such a small complexity for d justifies in hindsight fixing d to a small number in simulations with such matrices, as in [4, 2, 29], to mention a few.

The results derived here are asymptotic, though finite-dimensional bounds follow directly. We focus on the question: for what ratios of the problem dimensions (s, n, N) do these results hold? There is an almost standard way of interrogating such a question, namely phase transitions, probably introduced to the compressed sensing literature by [17]. In other words, we derive sampling theorems, numerically depicted by phase transition plots, over the space of problem sizes for which our construction holds. This is similar to what was done in [4], but for comparison purposes we include phase transition plots for the probabilistic constructions of [11, 6]. The plots show improvement over these earlier works. Furthermore, we show implications of our results for compressed sensing by using them, within the phase transition framework, to compare the performance of selected combinatorial compressed sensing algorithms, as is done in [4, 29].

The manuscript is organized as follows. Section 1 gives the introduction, while Section 2 sets the notation and defines some useful terms. The main results are stated in Section 3, and the details of the construction are given in Section 4. This is followed by a discussion in Section 5 of our results, comparing them to existing results and using them to compare the performance of some combinatorial compressed sensing algorithms. In Section 6 we state the remaining proofs of the theorems, lemmas, corollaries, and propositions used in this manuscript. After this section is the conclusion in Section 7. We include an appendix in Section 8, where we summarize key relevant material from [4] and show the derivation of some bounds used in the proofs.

2 Preliminaries

2.1 Notation

Scalars will be denoted by lowercase letters (e.g. s), vectors by lowercase boldface letters (e.g. x), sets by uppercase calligraphic letters (e.g. S), and matrices by uppercase boldface letters (e.g. A). The cardinality of a set S is denoted by |S|. Given S ⊆ [N], its complement is denoted by S^c, and x_S is the restriction of x to S, i.e. (x_S)_i = x_i if i ∈ S and 0 otherwise. For a matrix A, the restriction of A to the columns indexed by S is denoted by A_S. For a graph, Γ(S) denotes the set of neighbors of S, that is, the nodes that are connected to the nodes in S, and e_{ij} represents an edge connecting node i to node j. The ℓ_p norm of a vector x is defined as ‖x‖_p = (Σ_i |x_i|^p)^{1/p}.

2.2 Definitions

Below we give formal definitions that will be used in this manuscript.

Definition 2.1 (ℓp-norm restricted isometry property).

A matrix A satisfies the ℓp-norm restricted isometry property (RIP-p) of order s and constant δ_s ∈ (0, 1) if it satisfies the following inequality for all s-sparse vectors x.

(1)   (1 − δ_s)‖x‖_p^p ≤ ‖Ax‖_p^p ≤ (1 + δ_s)‖x‖_p^p

The most popular case is RIP-2, which was first proposed in [12]. Typically, when RIP is mentioned without qualification, it means p = 2. In the discourse of this work, though, RIP-1 is the most relevant. The RIP says that A is a near-isometry, and it is a sufficient condition to guarantee exact sparse recovery in the noiseless setting (i.e. y = Ax), or recovery up to some error bound, also referred to as the optimality condition, in the noisy setting (i.e. y = Ax + e, where e is a bounded noise vector). We define the optimality condition more precisely below.

Definition 2.2 (Optimality condition).

Given y = Ax + e and a reconstruction algorithm Δ, the optimal error guarantee is

(2)   ‖x − Δ(y)‖_1 ≤ C₁ σ_s(x)_1 + C₂ ‖e‖_1,

where C₁, C₂ > 0 depend only on the RIP constant (RIC), i.e. δ_s, and not on the problem size, and σ_s(x)_1 denotes the error of the best s-term approximation in the ℓ1 norm, that is

(3)   σ_s(x)_1 = min_{z : ‖z‖_0 ≤ s} ‖x − z‖_1.

Equation (2) is also referred to as the optimality condition (or error guarantee). Ideally we would like an ℓ2/ℓ2 guarantee, but the best provable is ℓ2/ℓ1 [12]; weaker than this is the ℓ1/ℓ1 guarantee [7], which is what is possible with the A considered in this work.
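As a small illustration of Definition 2.2, the following sketch (Python/NumPy; the helper name sigma_s and the toy vector are ours, not from the paper) computes the best s-term approximation error in (3).

```python
import numpy as np

def sigma_s(x, s, p=1):
    """Best s-term approximation error sigma_s(x)_p: the l_p norm of x
    with its s largest-magnitude entries removed."""
    idx = np.argsort(np.abs(x))[::-1]  # indices sorted by decreasing magnitude
    return np.linalg.norm(x[idx[s:]], ord=p)

x = np.array([5.0, -3.0, 0.2, 0.1, -0.05])
print(sigma_s(x, s=2))  # |0.2| + |0.1| + |-0.05| = 0.35 (up to rounding)
```

An ℓ1/ℓ1 guarantee of the form (2) then asks that the output of a decoder be within C₁ σ_s(x)_1 + C₂‖e‖_1 of x in the ℓ1 norm.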

To aid translation between the terminology of graph theory and that of linear algebra, we define the set of neighbors in both notations.

Definition 2.3 (Definition 1.4 in [4]).

Consider a bipartite graph G = ([N], [n], E), where E is the set of edges and e_{ij} ∈ E is the edge that connects vertex i to vertex j. For a given set of left vertices S ⊆ [N], its set of neighbors is Γ(S) = {j ∈ [n] : i ∈ S and e_{ij} ∈ E}. In terms of the adjacency matrix A of G, the set of neighbors of A_S for |S| = s, denoted by Γ(A_S), is the set of rows of A_S with at least one nonzero.

Definition 2.4 (Expander graph).

Let G = ([N], [n], E) be a left-regular bipartite graph with N left vertices, n right vertices, a set of edges E, and left degree d. If, for any ε ∈ (0, 1/2) and any S ⊆ [N] of size |S| ≤ s, we have |Γ(S)| ≥ (1 − ε)d|S|, then G is referred to as an (s, d, ε)-expander graph.

The ε is referred to as the expansion coefficient of the graph. An (s, d, ε)-expander graph, also called an unbalanced expander graph [7] or a lossless expander graph [13], is a highly connected bipartite graph. We denote the ensemble of n×N binary matrices with d ones per column by 𝓑, and the ensemble of adjacency matrices of (s, d, ε)-expander graphs by 𝓔.
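The translation between the two notations is easy to express in code. Below is a sketch (Python/NumPy; the function names are ours) of Γ(A_S) as in Definition 2.3, together with a brute-force check of the expansion condition in Definition 2.4; the check enumerates all sets S and is feasible only for small toy instances.

```python
import numpy as np
from itertools import combinations

def neighbors(A, S):
    """Gamma(A_S): the rows of A_S with at least one nonzero (Definition 2.3)."""
    return np.flatnonzero(A[:, list(S)].sum(axis=1) > 0)

def is_expander(A, s, d, eps):
    """Brute-force test of |Gamma(S)| >= (1 - eps) * d * |S| for all |S| <= s
    (Definition 2.4). Exponential cost: toy sizes only."""
    N = A.shape[1]
    return all(
        len(neighbors(A, S)) >= (1 - eps) * d * k
        for k in range(1, s + 1)
        for S in combinations(range(N), k)
    )
```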

3 Results

The main result of this work is formalized in Theorem 3.1, which is an asymptotic result, where the dimensions grow while their ratios remain bounded. This is also referred to as the proportional growth asymptotics [10, 3].

Theorem 3.1.

Consider ε ∈ (0, 1/2) and let A be a random draw of an n×N matrix from 𝓑, i.e. for each column of A uniformly assign d ones in d out of the n positions. As (s, n, N) → ∞ while n/N → δ ∈ (0, 1) and s/n → ρ ∈ (0, 1), with probability approaching 1 exponentially, the matrix A ∈ 𝓔 with

(4)   d = O(log_s(N/s)) and n = O(s log(N/s)).

The proof of this theorem is found in Section 6.1. It is worth emphasizing that the complexity of d is novel; it is the main contribution of this work.

Furthermore, in the proportional growth asymptotics, i.e. as (s, n, N) → ∞ while n/N → δ ∈ (0, 1) and s/n → ρ ∈ (0, 1), for completeness we derive a phase transition function (curve) ρ(δ) in (δ, ρ)-space, below which Theorem 3.1 is satisfied with high probability, and above which the reverse is true. This is formalized in the following lemma.

Lemma 3.1.

Fix ε ∈ (0, 1/2) and let d be as in Theorem 3.1. As (s, n, N) → ∞ while n/N → δ ∈ (0, 1) and s/n → ρ ∈ (0, 1), then for any ρ < ρ(δ), a random draw of A from 𝓑 implies A ∈ 𝓔 with probability approaching 1 exponentially.

The proof of this lemma is given in Section 6.2. The phase transition function turned out to be significantly higher than those derived from existing probabilistic constructions; hence our results are a significant improvement over earlier works. This will be graphically demonstrated with some numerical simulations in Section 5.

4 Construction

The standard probabilistic construction is, for each column of A, to uniformly assign d ones in d out of the n positions; while the standard approach to deriving the probability bounds is to randomly select s columns of A, indexed by S, compute the probability that |Γ(A_S)| ≤ a_s, and then take a union bound over all sets S of size s. Our work in [4] computed smaller bounds than previous works, based on a dyadic splitting of S, and derived the following bound. We have changed the notation and format of Theorem 1.6 in [4] slightly, to be consistent with the notation and format of this manuscript.
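The construction, and the failure probability that the dyadic-splitting argument bounds, can both be illustrated with a short Monte Carlo experiment (Python/NumPy; parameter values are illustrative and the helper names are ours; this is a sanity check, not the bound itself).

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_from_B(n, num_cols, d):
    """One draw from the ensemble: d ones per column, positions chosen
    uniformly at random without replacement."""
    A = np.zeros((n, num_cols), dtype=np.uint8)
    for j in range(num_cols):
        A[rng.choice(n, size=d, replace=False), j] = 1
    return A

def prob_small_neighborhood(n, d, s, eps, trials=2000):
    """Monte Carlo estimate of P(|Gamma(A_S)| <= a_s) with a_s = (1-eps)*d*s
    for a fixed S of size s; by column exchangeability we may take S to be
    the first s columns."""
    a_s = (1 - eps) * d * s
    hits = 0
    for _ in range(trials):
        A_S = draw_from_B(n, s, d)  # only the columns indexed by S matter
        if np.count_nonzero(A_S.sum(axis=1)) <= a_s:
            hits += 1
    return hits / trials

print(prob_small_neighborhood(n=100, d=8, s=4, eps=0.25))
```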

Theorem 4.1 (Theorem 1.6, [4]).

Consider ε ∈ (0, 1/2), fix S ⊂ [N] with |S| = s, and let an n×N matrix A be drawn from 𝓑; then

(5)

where the functions are defined as

(6)
(7)

where

(8)

and H(·) is the Shannon entropy in base-e logarithms, with the index set given by the levels of the dyadic splitting.

  • If no restriction is imposed on a_s, then the a_i for i in the index set take on their expected values, given by

    (9)
  • If a_s is restricted to be less than its expected value, then the a_i for i in the index set are the unique solutions to the following polynomial system

    (10)

    with one equation for each i.

In this work, based on the same approach as in [4], we derive new expressions for the two functions in Theorem 4.1, i.e. (6) and (7) respectively, and provide a simpler bound for the improved expression of the exponent.

Lemma 4.1.

Theorem 4.1 holds with the functions

(11)
(12)

The proof of the lemma is given in Section 6.3. Asymptotically, the argument of the exponential term in the probability bound (5) of Theorem 4.1, i.e. the exponent in (12), is more important than the polynomial in (11), since the exponential factor dominates the polynomial factor. The significance of the lemma is that the exponent in (12) is smaller than that in (7), because the leading term of (12) is asymptotically smaller than its counterpart in (7) in the regime considered here, where s is much smaller than n.

Recall that we are interested in computing the probability bound when a_s = (1 − ε)ds. This means having to solve the polynomial equation (10) to compute as small a bound as possible. We derive an asymptotic solution to (10) in this regime and use that solution to get the following bounds.

Theorem 4.2.

Consider ε ∈ (0, 1/2) and fix S ⊂ [N] with |S| = s. For a_s = (1 − ε)ds and for d, s, n in the regime above, let an n×N matrix A be drawn from 𝓑; then

(13)

where

(14)
(15)

The proof of this theorem is also found in Section 6.5. Since Theorem 4.2 holds for a fixed S of size at most s, if we want this to hold for all S of size at most s, we take a union bound over all such S. This leads to the following probability bound.

Theorem 4.3.

Consider ε ∈ (0, 1/2) and all S ⊂ [N] with |S| ≤ s. For d, s, n in the regime above, let an n×N matrix A be drawn from 𝓑; then

(16)

where

(17)
(18)
Proof.

Applying the union bound over all S of size at most s to (13) leads to the following.

(19)

Then we use the upper bound of (143) to bound the combinatorial term in (19). After some algebraic manipulations, we separate the polynomial term, given in (17), from the exponential terms, whose exponent is

(20)

We upper bound the exponent in (18) by bounding the relevant quantities with the upper bound in (15). The resulting correction term decays to zero as the dimensions grow; it is a result of dividing the polylogarithmic terms in (15). This concludes the proof. ∎
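For intuition on the size of the union-bound factor in (19), the following numeric sketch (Python; we assume (143) is the standard binomial estimate comb(N, s) ≤ (eN/s)^s, the usual bound at this step) compares the exact count with that bound.

```python
import math

def log_num_subsets(N, s):
    """log of the number of sets S with |S| <= s: the union-bound factor."""
    return math.log(sum(math.comb(N, k) for k in range(1, s + 1)))

def log_binom_upper(N, s):
    """log of the standard upper bound (e*N/s)**s on comb(N, s)."""
    return s * math.log(math.e * N / s)

print(log_num_subsets(10**4, 20))  # exact union-bound factor (log scale)
print(log_binom_upper(10**4, 20))  # slightly larger, as expected
```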

The next corollary follows easily from Theorem 4.3, and it is equivalent to Theorem 3.1. Its statement is that, if the conditions therein hold, then the probability that the cardinality of the set of neighbors of any S with |S| ≤ s is less than a_s goes to zero as the dimensions of A grow. Equivalently, the probability that the cardinality of the set of neighbors of any such S is greater than a_s goes to one as the dimensions of A grow, implying that A is the adjacency matrix of an (s, d, ε)-expander graph.

Corollary 4.1.

Given ε ∈ (0, 1/2) and a_s = (1 − ε)ds, for d and n as in Theorem 3.1, let an n×N matrix A be drawn from 𝓑. In the proportional growth asymptotics,

(21)
Proof.

It suffices to focus on the exponent of (16), more precisely on the bound of the exponent in (18), i.e.

(22)

We can ignore the vanishing term, as it goes to zero as the problem size grows, and show that the remaining sum is negative. The remaining sum is

(23)

Hence, we can further focus on the sum in the square brackets and find conditions on d that will make it negative. We require

(24)
(25)

Recall that a_s = (1 − ε)ds, so the right-hand side of (25) depends only on the problem dimensions; hence there exists a d for which (25) holds.

With regard to the complexity of d, we go back to the right-hand side (RHS) of (24) and substitute the relevant quantities into the RHS of (24) to get the following.

(26)

Now we assume d = O(log_s(N/s)) and substitute this in (26) to get the following.

(27)
(28)

Again, since the quantities involved depend only on the problem dimensions, there exists a d for which (28) holds. The bound on d in (28) agrees with our earlier assumption, thus concluding the proof. ∎

5 Discussion

5.1 Comparison to other constructions

In addition to being the first probabilistic construction of adjacency matrices of expander graphs with such a small degree, our results also compare favorably, quantitatively, to existing probabilistic constructions. We use the standard tool of phase transitions to compare our construction to the construction proposed in [6] and those proposed in [11]. The phase transition curve we derived in Lemma 3.1 is the ρ(δ) that solves the following equation.

(29)

where the exponent is as in (28). Equation (29) comes from taking the limit, in the proportional growth asymptotics, of the bound in (18), setting it to zero, and simplifying. Similarly, for any S with |S| ≤ s, Berinde in [6] derived the following bound on the set of neighbors of S, i.e. Γ(S).

(30)

We then express the bound in (30) as the product of a polynomial term and an exponential term. A bound on the exponent is carefully derived as in the derivations above. We set the limit, in the proportional growth asymptotics, of this bound to zero and simplify to get the following.

(31)

We refer to the ρ that solves (31) as the phase transition for the construction proposed by Berinde in [6]. Another probabilistic construction was proposed by Buhrman et al. in [11]. In conformity with the notation used in this manuscript, their bound is equivalent to the following, also stated in a similar form by Indyk and Razenshteyn in [27].

(32)

where the constants are as in [11]. We again express the bound in (32) as the product of a polynomial term and an exponential term. A bound on the exponent is carefully derived as in the derivations above. We set the limit, in the proportional growth asymptotics, of this bound to zero and simplify to get the following.

(33)

Similarly, we refer to the ρ that solves (33) as the phase transition for the construction proposed by Buhrman et al. in [11]. We compute numerical solutions to (29), (31), and (33) to derive the three phase transitions, plotted in the left panel of Figure 1. It is clear that our construction has a much higher phase transition than the others. Recall that the phase transition curves in these plots depict the construction of adjacency matrices of (s, d, ε)-expanders with high probability for ratios ρ = s/n and δ = n/N below the curve, and the failure to construct such adjacency matrices with high probability for ratios above the curve. Essentially, the larger the area under the curve, the better.
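Numerically, each phase transition is obtained by root-finding on the corresponding exponent. Since the explicit forms of (29), (31), and (33) are not reproduced here, the sketch below (Python/SciPy) uses a stand-in toy exponent purely to show the recipe: at each δ on a logarithmically spaced grid, bisect for the ρ at which the exponent changes sign.

```python
import numpy as np
from scipy.optimize import brentq

def phase_transition(psi, deltas, lo=1e-8, hi=1 - 1e-8):
    """For each delta, find the root rho of psi(rho, delta): the exponent is
    negative below the root (construction succeeds with high probability)
    and positive above it."""
    return np.array([brentq(lambda r, dl=dl: psi(r, dl), lo, hi) for dl in deltas])

# Toy exponent: a placeholder, NOT the paper's (29)/(31)/(33), chosen only
# so that a sign change exists on (0, 1) for every delta in (0, 1).
toy_psi = lambda rho, delta: np.log(rho) - rho * np.log(delta)

deltas = np.logspace(-3, -0.01, 100)  # logarithmically spaced grid of 100 points
rhos = phase_transition(toy_psi, deltas)
```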

Remark 5.1.

It is easy to see that the phase transition of [6] is a special case of that of [11], since the two coincide, or equivalently (31) and (33) become the same, for a particular choice of the constants. One could argue that Berinde's derivation in [6] suffers from over-counting.

Figure 1: Phase transition plots with fixed parameters, on a logarithmically spaced grid of 100 points. Left panel: a comparison of the phase transitions of our construction and those of the constructions in [6] and [11]. Right panel: a comparison of our new phase transition to our previous one in [4], for different values of s.

Given that this work improves on our work in [4] in terms of simplicity in computing the phase transition, for completeness we compare our new phase transition to our previous one in the right panel of Figure 1. Each computation of the previous phase transition requires the specification of s, which is not needed in the computation of the new one; hence the simplification. However, the simplification led to a lower phase transition, as expected, which is confirmed by the plots in the right panel of Figure 1.

Remark 5.2.

These simulations also inform us about the size of s. The plots of the two phase transitions show that the smaller the value of s, the higher the phase transition; but since the curve has to be a lower bound, for values of s that are too small the lower bound will fail to hold. This informed the choice used in the plot in the left panel of Figure 1.

5.2 Implications for combinatorial compressed sensing

When the sensing matrices are restricted to the sparse binary matrices considered in this manuscript, compressed sensing is usually referred to as combinatorial compressed sensing, a term introduced in [7] and used extensively in [29, 30]. In this setting, compressed sensing is more or less equivalent to linear sketching. The implications of our results for combinatorial compressed sensing are two-fold. One is on the ℓ1-norm RIP, which we denote RIP-1; the second is on the comparison of the performance of recovery algorithms for combinatorial compressed sensing.

5.2.1 RIP-1

As can be seen from (2), the recovery errors in compressed sensing depend on the RIC, i.e. δ_s. The following lemma, deduced from Theorem 1 of [7], shows that a matrix A drawn from 𝓔 satisfies, after scaling, RIP-1 with δ_s = 2ε.

Lemma 5.1.

Consider ε ∈ (0, 1/2) and let A be drawn from 𝓔; then A satisfies the following RIP-1 condition for all s-sparse vectors x.

(34)   (1 − 2ε) d ‖x‖_1 ≤ ‖Ax‖_1 ≤ d ‖x‖_1

The interested reader is referred to the proof of Theorem 1 in [7] for the proof of this lemma. Key to Lemma 5.1 holding is the existence of (s, d, ε)-expander graphs; hence one can draw corollaries from our results on this.
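A quick empirical probe of Lemma 5.1 (a sanity check, not a proof; Python/NumPy with illustrative sizes, and assuming the standard form of (34) from [7]): draw A with d ones per column and record ‖Ax‖₁/(d‖x‖₁) over random s-sparse x. The ratio never exceeds 1, and its minimum tracks the empirical (1 − 2ε).

```python
import numpy as np

rng = np.random.default_rng(2)
n, N, d, s = 200, 1000, 8, 5  # illustrative sizes

# Draw A with d ones per column, as in the construction above.
A = np.zeros((n, N))
for j in range(N):
    A[rng.choice(n, size=d, replace=False), j] = 1.0

ratios = []
for _ in range(1000):
    x = np.zeros(N)
    support = rng.choice(N, size=s, replace=False)
    x[support] = rng.standard_normal(s)
    ratios.append(np.linalg.norm(A @ x, 1) / (d * np.linalg.norm(x, 1)))

print(min(ratios), max(ratios))  # max <= 1 always; min ~ (1 - 2*eps) empirically
```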

Corollary 5.1.

Consider ε ∈ (0, 1/2) and let d be as in Theorem 3.1. In the proportional growth asymptotics, with a random draw of an n×N matrix A from 𝓑, the matrix A has RIP-1 with probability approaching 1 exponentially, if

(35)
Proof.

Note that the upper bound of (34) holds trivially for any s-sparse x when A has d ones per column, i.e. A ∈ 𝓑. But for the lower bound of (34) to hold for any s-sparse x, we need A to be an expander matrix, i.e. A ∈ 𝓔. Note that, for a fixed S with |S| = s, the event that the lower bound holds is equivalent to the expansion event on S. For A to be in 𝓔, we need expansion for all sets S with |S| ≤ s. The key thing to remember is that

(36)

The probability in (36) going to 1 exponentially in the proportional growth asymptotics, i.e. the existence of matrices in 𝓔 with parameters as given in (35), is what is stated in Theorem 3.1. Therefore, the rest of the proof follows from the proof of Theorem 3.1, hence concluding the proof of the corollary. ∎

Notably, Lemma 5.1 now holds with A having a much smaller number of nonzeros per column, due to our construction. Moreover, we can derive sampling theorems for which Lemma 5.1 holds, as follows.

Corollary 5.2.

Fix ε ∈ (0, 1/2) and let d be as in Lemma 3.1. In the proportional growth asymptotics, for any ρ < ρ(δ), a random draw of A from 𝓑 implies that A has RIP-1 with probability approaching 1 exponentially.

Proof.

The proof of this corollary follows from the proof of Corollary 5.1 above; it relates to the proof of Lemma 3.1 in the same way that the proof of Corollary 5.1 relates to the proof of Theorem 3.1. The details are thus skipped. ∎

5.2.2 Performance of algorithms

We wish to compare the performance of selected combinatorial compressed sensing algorithms in terms of the problem sizes for which these algorithms can reconstruct sparse or compressible signals up to their respective error guarantees. The comparison is typically done in the framework of phase transitions, which depict a boundary curve: ratios of problem sizes above this curve are recovered with probability approaching 0 exponentially, while problem sizes below the curve are recovered with probability approaching 1 exponentially. The list of combinatorial compressed sensing algorithms includes Expander Matching Pursuit (EMP) [26], Sparse Matching Pursuit (SMP) [9], Sequential Sparse Matching Pursuit (SSMP) [8], Left Degree Dependent Signal Recovery (LDDSR) [33], Expander Recovery (ER) [28], Expander Iterative Hard Thresholding (EIHT) [20, Section 13.4], and Expander ℓ0-decoding (ELD), with both serial and parallel versions [29]. For reasons similar to those used in [4, 29], we selected four of the algorithms from this list: SSMP, ER, EIHT, and ELD. Descriptions of these algorithms are skipped here, but the interested reader is referred to the original papers or to their summarized details in [4, 29]. We were also curious how the performance of ℓ1-minimization (ℓ1-min) compares to these selected combinatorial compressed sensing algorithms, since ℓ1-min can be used to solve the combinatorial problem solved by these algorithms; see [7, Theorem 3].

The phase transitions are based on conditions on the RIC of the sensing matrices used. As a consequence of Lemma 5.1, these become conditions on the expansion coefficient (i.e. ε) of the underlying (s, d, ε)-expander graphs of the sparse sensing matrices used. Where this condition on ε is not explicitly given, it is easily deducible from the recovery guarantees given for each algorithm. The conditions are summarized in the table below.

Algorithm | Condition (theoretical) | Sparsity (theoretical) | Condition (computational) | Sparsity (computational)
SSMP [8] | | | |
ER [28] | | | |
EIHT [20] | | | |
ELD [29] | | | |
ℓ1-min [7] | | | |
Table 1: Recovery conditions on the expansion coefficient ε

The theoretical values are those found in the references given in the table, while the computational values are those we used in our numerical experiments to compute the phase transition curves of the algorithms. The value of ε was chosen to make the relevant quantity as large as possible under the given condition. With these values we computed the phase transitions in Figure 2.

Figure 2: Phase transition plots of the algorithms, with ε fixed as in the fourth column of Table 1, on a logarithmically spaced grid of 100 points. The left and right panels correspond to two different sparsity values.

The two panels are the same except for the different sparsity value used. The performance of the algorithms in this framework is thus ranked as follows: ELD, ER, ℓ1-min, EIHT, and SSMP.

Remark 5.3.

We point out that there are many ways to compare the performance of algorithms; this is just one of them. For instance, one could compare runtime complexities or actual computational runtimes, as in [29]; or phase transitions at different probabilities (here the probability of recovery is 1, but it could be set to something else, like the 1/2 used in the simulations in [29]); one could also compare the number of iterations and the iteration cost, as was also done in [29].

6 Proofs

6.1 Theorem 3.1

The proof of the theorem follows directly from Corollary 4.1. Based on (21) of Corollary 4.1, we deduce that A ∈ 𝓔 with probability approaching 1 exponentially, with d = O(log_s(N/s)) and n = O(s log(N/s)), hence concluding the proof.

6.2 Lemma 3.1

The phase transition curve is based on the bound of the exponent of (16), which is

(37)

In the proportional growth asymptotics, (s, n, N) → ∞ while n/N → δ ∈ (0, 1) and s/n → ρ ∈ (0, 1). This implies that (37) becomes

(38)

where the exponent is as in (28). If (38) is negative, then as the problem size grows we have

(39)

Therefore, setting (38) to zero and solving for ρ gives us a critical ρ below which (38) is negative and above which it is positive. The critical ρ is the phase transition ρ(δ), parameterized as in the lemma. This concludes the proof.

6.3 Lemma 4.1

By the dyadic splitting proposed in [4], we split S into two disjoint subsets of nearly equal size, and therefore

(40)
(41)

In (41) we sum over all possible events, i.e. all possible sizes of the neighbor sets of the two halves of S. In line with the splitting technique, we simplify the probability into the product of the probabilities of the cardinalities of the neighbor sets of the two halves and of their intersection. Using the definition in Lemma 8.2 (Appendix 8.1) thus leads to the following.

(42)

In a slight abuse of notation, we write the repeated sum to denote applying the summation the required number of times. We also drop the limits of the summation indices henceforth. Now we use Lemma 8.1 in Appendix 8.1 to simplify (42) as follows.