Sparsity is a natural property of many real-world signals. For example, image and speech signals are sparse in the Fourier basis, which led to the theory of compressed sensing and, more broadly, sampling theory Landau (1967); Donoho (2006). In some important multivariate optimization problems with many optimal points, sparsity of the solution is also a measure of ‘simplicity’, and insisting on sparsity is a common method of regularization Tibshirani (1996). While recovering sparse vectors from linear measurements is a well-studied topic, technological advances and increasing data sizes raise new questions. These include quantized and nonlinear signal acquisition models, such as 1-bit compressed sensing Boufounos and Baraniuk (2008). In 1-bit compressed sensing, linear measurements of a sparse vector are quantized to only 1 bit, e.g. indicating whether the measurement outcome is positive or not, and the task is to recover the vector up to a prescribed Euclidean error with the minimum number of measurements. As in compressed sensing, the overwhelming majority of the literature, including this paper, focuses on the nonadaptive setting for the problem.
One of the ways to approximately recover a sparse vector from 1-bit measurements is to use a subset of all the measurements to identify the support of the vector. Next, the remainder of the measurements can be used to approximate the vector within the support. Note that this second set of measurements is also predefined, and therefore the entire scheme is still nonadaptive. Such a method appears in the context of ‘universal’ matrix designs in Gopi et al. (2013); Acharya et al. (2017). The resulting schemes are the best known, in some sense, but they still leave a large gap between the upper and lower bounds for approximate recovery of vectors.
In this paper we take steps to close these gaps by presenting a simple yet powerful idea. Instead of using a subset of the measurements to recover the support of the vector exactly, we propose using a (smaller) set of measurements to recover a superset of the support. The remainder of the measurements can then be used to better approximate the vector within the superset. It turns out that this idea, which we call the “superset technique”, leads to an optimal number of measurements for universal schemes for several important classes of sparse vectors (for example, nonnegative vectors). We also present theoretical results providing a characterization of matrices that would yield universal schemes for all sparse vectors.
While the compressed sensing framework was introduced in Donoho (2006), it was not until Boufounos and Baraniuk (2008) that 1-bit quantization of the measurements was considered, to address the fact that taking real-valued measurements to arbitrary precision may not be practical in applications. Initially, the focus was primarily on approximately reconstructing the direction of the signal (the quantization does not preserve any information about the magnitude of the signal, so all we can hope to reconstruct is the direction). However, in Haupt and Baraniuk (2011) the problem of support recovery, as opposed to approximate vector reconstruction, was first considered, and it was shown that measurements are sufficient to recover the support of a $k$-sparse signal in $\mathbb{R}^n$ with high probability. This was subsequently shown to be tight by the lower bound proven in Atia and Saligrama (2012).
All the above results assume that a new measurement matrix is constructed for each sparse signal, and success is defined as either approximately recovering the signal up to error $\epsilon$ in the $\ell_2$ norm (for the approximate vector recovery problem), or exactly recovering the support of the signal (for the support recovery problem), with high probability. Generating a new matrix for each instance is not practical in all applications. This has led to interest in the “universal” versions of the above two problems, where a single matrix must work for support recovery or approximate recovery of all $k$-sparse real signals, with high probability.
Plan and Vershynin showed in Plan and Vershynin (2013) that both and measurements suffice for universal approximate recovery. The dependence on $\epsilon$ was then improved significantly to in Gopi et al. (2013), who also considered the problem of universal support recovery and showed that, for that problem, measurements are sufficient. They also showed that if we restrict the entries of the signal to be nonnegative (which is the case for many real-world signals such as images), then is sufficient for universal support recovery. The constructions of their measurement matrices are based primarily on combinatorial objects, specifically expanders and Union Free Families (UFFs).
Most recently, Acharya et al. (2017) showed that a modified version of the UFFs used in Gopi et al. (2013), called “Robust UFFs” (RUFFs), can be used to improve the upper bound on universal support recovery to for all real-valued signals. This matches the previous upper bound for nonnegative signals, and they showed it is nearly tight with a lower bound of for real signals. They also showed that measurements suffice for universal approximate recovery.
In tandem with the development of these theoretical results providing necessary and sufficient numbers of measurements for support recovery and approximate vector recovery, there has been a significant body of work in other directions on 1-bit compressed sensing, such as heuristic algorithms that perform well empirically, and tradeoffs between different parameters. More specifically, Jacques et al. (2013) introduced a gradient-descent based algorithm called Binary Iterative Hard Thresholding (BIHT), which performs very well in practice. Later, Li (2016) gave another heuristic algorithm that performs comparably well or better and aims to allow very efficient decoding after the measurements are taken. Other papers, such as Slawski and Li (2015), have studied the tradeoff between the amount of quantization of the signal and the necessary number of measurements.
We focus primarily on upper bounds in the universal setting, aiming to give constructions that work with high probability for all sparse vectors. In Acharya et al. (2017), three major open questions are given regarding Universal 1-bit Compressed Sensing, which, paraphrasing, are as follows:
Open Question 1. How many measurements are necessary and sufficient for a matrix to be used to exactly recover all $k$-sparse binary vectors?
Open Question 2. What is the correct complexity (in terms of number of measurements) of universal $\epsilon$-approximate vector recovery for real signals?
Open Question 3. Can we obtain explicit (i.e. requiring time polynomial in $n$ and $k$) constructions of the Robust UFFs used for universal support recovery (yielding measurement matrices with rows)?
In this work we make progress towards solutions to all three Open Questions. Our primary contribution is the “superset technique” which relies on ideas from the closely related sparse recovery problem of group testing Du and Hwang (2000); in particular, we show in Theorem 6 that for a large class of signals including all nonnegative (and thus all binary) signals, we can improve the upper bound for approximate recovery by first recovering an -sized superset of the support rather than the exact support, then subsequently using Gaussian measurements. The previous best upper bound for binary signals from Jacques et al. (2013) was , which we improve to , and for nonnegative signals was , which we improve to .
Regarding Open Question 3, using results of Porat and Rothschild regarding weakly explicit constructions of Error-Correcting Codes (ECCs) on the Gilbert-Varshamov bound Porat and Rothschild (2011), we give a construction of Robust UFFs yielding measurement matrices for support recovery with rows in time that is polynomial in $n$ (though not in $k$) in Theorem 12. Based on a similar idea, we also give a weakly explicit construction for non-universal approximate recovery using only slightly more measurements than is optimal ( as opposed to ) in Section 4.2. To our knowledge, explicit constructions in the non-universal setting have not been studied previously. Furthermore, this result gives a single measurement matrix that works for almost all vectors, as opposed to typical non-universal results, which work with high probability for a particular vector and matrix pair.
In Appendix C, we give a sufficient condition, generalizing the notion of RUFFs, for a matrix to be used for universal recovery of a superset of the support for all real signals. While we do not provide constructions, this seems to be a promising direction for resolving Open Question 2.
|Universal Support Recovery ()||Acharya et al. (2017)||Acharya et al. (2017)|
|Universal $\epsilon$-approximate Recovery ()||–||Acharya et al. (2017)|
|Acharya et al. (2017), Jacques et al. (2013)|
|Universal $\epsilon$-approximate Recovery ()||–|
|Universal Exact Recovery ()||–|
|Non-Universal Support Recovery ()||Atia and Saligrama (2012)||Atia and Saligrama (2012)|
*Bound proved in this work.
The best known upper and lower bounds for the various compressed sensing problems considered in this work are presented in Table 1.
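We write $A_i$ for the $i$th row of the matrix $A$, and $A_{i,j}$ for the entry of $A$ in the $i$th row and $j$th column. We write vectors in boldface, and write $x_i$ for the $i$th component of the vector $\mathbf{x}$. The set $\{1, 2, \ldots, n\}$ will be denoted by $[n]$, and for any set $S$ we write $\mathcal{P}(S)$ for the power set of $S$ (i.e. the set of all subsets of $S$).
We write $\operatorname{supp}(\mathbf{x})$ to mean the set of indices of nonzero components of $\mathbf{x}$ (so $\operatorname{supp}(\mathbf{x}) \subseteq [n]$), and $\|\mathbf{x}\|_0$ to denote $|\operatorname{supp}(\mathbf{x})|$.
For a real number $y$, $\operatorname{sign}(y)$ returns $1$ if $y$ is strictly positive, $-1$ if $y$ is strictly negative, and $0$ if $y = 0$. For a vector, $\operatorname{sign}$ is applied entrywise. While this technically returns more than one bit of information, if we had instead defined $\operatorname{sign}(y)$ to be $1$ when $y \geq 0$ and $-1$ otherwise, we could still determine whether $y = 0$ by also looking at $\operatorname{sign}(-y)$. This affects the number of measurements by only a constant factor. We will not concern ourselves with the constants involved in any of our results, so we have chosen to use the more convenient definition instead.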
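The constant-factor argument above can be checked directly. The following is a minimal Python sketch. `one_bit` is a hypothetical strictly 1-bit quantizer (1 if $y \geq 0$, else $-1$), and `sign_from_two_bits` recovers the three-valued sign from the two bits `one_bit(y)` and `one_bit(-y)`. All names are ours.

```python
def sign(y: float) -> int:
    """Three-valued sign used in the text: 1, -1, or 0."""
    if y > 0:
        return 1
    if y < 0:
        return -1
    return 0


def one_bit(y: float) -> int:
    """Hypothetical strictly 1-bit quantizer: 1 if y >= 0, else -1."""
    return 1 if y >= 0 else -1


def sign_from_two_bits(y: float) -> int:
    """Recover sign(y) from the two 1-bit values one_bit(y) and one_bit(-y).

    Both bits equal 1 exactly when y == 0, so each three-valued measurement
    costs two 1-bit measurements, i.e. a constant factor of 2.
    """
    if one_bit(y) == 1 and one_bit(-y) == 1:
        return 0
    return one_bit(y)


if __name__ == "__main__":
    for y in (-2.5, 0.0, 3.0):
        print(y, sign(y), sign_from_two_bits(y))
```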
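We will sometimes refer to constructions from the closely related “group testing” problem in our results. To this end, we will use the symbol “$\odot$” to represent the group testing measurement between a measurement vector and a signal vector. Specifically, for a measurement vector $\mathbf{a}$ of length $n$ and a signal $\mathbf{x}$ of length $n$, $\mathbf{a} \odot \mathbf{x}$ is equal to $1$ if $\operatorname{supp}(\mathbf{a}) \cap \operatorname{supp}(\mathbf{x})$ is nonempty, and $0$ otherwise. For a matrix $A$, $A \odot \mathbf{x}$ denotes the vector of results for each row of $A$. We will also make use of the “list-disjunct” matrices used in some group testing constructions.
An $m \times n$ binary matrix $A$ is $(k, \ell)$-list disjunct if for any two disjoint sets $S, T \subseteq [n]$ with $|S| = k$ and $|T| = \ell$, there exists a row in $A$ in which some column from $T$ has a nonzero entry, but every column from $S$ has a zero.
The primary use of such matrices is in the group testing model: they can be used to recover a superset of size at most $k + \ell$ of the support of any $k$-sparse signal $\mathbf{x}$. This is done by applying a simple decoding to the measurement results $A \odot \mathbf{x}$, namely discarding every column that appears in a test with result $0$.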
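The following minimal Python sketch covers three pieces: the group testing measurement, the simple decoding that discards every column appearing in a negative test, and a brute-force check of the list-disjunct property. It is plain Python and only practical for small $n$. The function names are ours. `k` is the sparsity and `ell` is the second list-disjunct parameter.

```python
from itertools import combinations
from typing import List, Sequence, Set


def group_test(a: Sequence[float], x: Sequence[float]) -> int:
    """Group testing measurement: 1 if supp(a) and supp(x) intersect, else 0."""
    return int(any(ai != 0 and xi != 0 for ai, xi in zip(a, x)))


def simple_decode(A: List[List[int]], results: Sequence[int]) -> Set[int]:
    """Discard every column that appears in a test with result 0."""
    n = len(A[0])
    candidates = set(range(n))
    for row, r in zip(A, results):
        if r == 0:
            candidates -= {j for j in range(n) if row[j] != 0}
    return candidates


def is_list_disjunct(A: List[List[int]], k: int, ell: int) -> bool:
    """Brute-force check: for all disjoint S, T with |S| = k and |T| = ell,
    some row is nonzero on a column of T and zero on every column of S."""
    n = len(A[0])
    for S in combinations(range(n), k):
        rest = [j for j in range(n) if j not in S]
        for T in combinations(rest, ell):
            if not any(all(row[j] == 0 for j in S) and any(row[j] != 0 for j in T)
                       for row in A):
                return False
    return True


if __name__ == "__main__":
    I4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
    x = [0.0, 2.0, 0.0, -1.0]
    results = [group_test(row, x) for row in I4]
    print(results)                     # [0, 1, 0, 1]
    print(simple_decode(I4, results))  # {1, 3}
    print(is_list_disjunct(I4, 2, 1))  # True
```

The identity matrix is used here only because its list-disjunctness is easy to verify by hand. It needs $n$ rows, far more than the constructions cited in the text.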
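In the following definitions, we write $\mathcal{X}$ for a generic set that is the domain of the signal. In this paper we consider signals with domain $\mathbb{R}$, $\mathbb{R}_{\geq 0}$ (nonnegative reals), and $\{0, 1\}$.
An $m \times n$ measurement matrix $A$ can be used for Universal Support Recovery of $k$-sparse $\mathbf{x} \in \mathcal{X}^n$ (in $m$ measurements) if there exists a decoding function $f$, mapping sign vectors in $\{-1, 0, 1\}^m$ to subsets of $[n]$, such that $f(\operatorname{sign}(A\mathbf{x})) = \operatorname{supp}(\mathbf{x})$ for all $\mathbf{x} \in \mathcal{X}^n$ satisfying $\|\mathbf{x}\|_0 \leq k$.
An $m \times n$ measurement matrix $A$ can be used for Universal $\epsilon$-Approximate Recovery of $k$-sparse $\mathbf{x} \in \mathcal{X}^n$ (in $m$ measurements) if there exists a decoding function $f : \{-1, 0, 1\}^m \to \mathbb{R}^n$ such that
$$\left\| \frac{\mathbf{x}}{\|\mathbf{x}\|_2} - f(\operatorname{sign}(A\mathbf{x})) \right\|_2 \leq \epsilon$$
for all nonzero $\mathbf{x} \in \mathcal{X}^n$ with $\|\mathbf{x}\|_0 \leq k$.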
3 Upper Bounds for Universal Approximate Recovery
Here we present our main result, an upper bound on the number of measurements needed to perform universal -approximate recovery for a large class of real vectors that includes all binary vectors and all nonnegative vectors. The general technique will be to first use what are known as “list-disjunct” matrices from the group testing literature to recover a superset of the support of the signal, then use Gaussian measurements to approximate the signal within the superset. Because the measurements in the second part are Gaussian, we can perform the recovery within the (initially unknown) superset nonadaptively. When restricting to the class of binary or nonnegative signals, our upper bound improves on existing results and is close to known lower bounds.
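First, we need a lemma stating the necessary and sufficient conditions on a signal vector in order to be able to reconstruct the results of a single group testing measurement using sign measurements. To concisely state the condition, we introduce some notation: for a subset $S \subseteq [n]$ and vector $\mathbf{a}$ of length $n$, we write $\mathbf{a}|_S$ to mean the restriction of $\mathbf{a}$ to the indices of $S$.
Let $\mathbf{a} \in \mathbb{R}^n$ and $\mathbf{x} \in \mathbb{R}^n$. Define $S = \operatorname{supp}(\mathbf{a}) \cap \operatorname{supp}(\mathbf{x})$. Suppose either $S$ is empty, or $S$ is nonempty and $(\mathbf{a}|_S)^T \mathbf{x}|_S \neq 0$. Then we can reconstruct the result of the group testing measurement $\mathbf{a} \odot \mathbf{x}$ from the sign measurement $\operatorname{sign}(\mathbf{a}^T \mathbf{x})$.
We observe $\operatorname{sign}(\mathbf{a}^T \mathbf{x})$ and based on that must determine the value of $\mathbf{a} \odot \mathbf{x}$, or equivalently whether $S$ is empty or nonempty. If $\operatorname{sign}(\mathbf{a}^T \mathbf{x}) \neq 0$ then $\mathbf{a}^T \mathbf{x} \neq 0$, so $S$ is nonempty and $\mathbf{a} \odot \mathbf{x} = 1$. Otherwise we have $\operatorname{sign}(\mathbf{a}^T \mathbf{x}) = 0$, in which case we must have $\mathbf{a}^T \mathbf{x} = 0$. Every index outside $S$ contributes zero to this inner product. So if $S$ were nonempty, we would have $(\mathbf{a}|_S)^T \mathbf{x}|_S = \mathbf{a}^T \mathbf{x} = 0$, contradicting our assumption. Therefore in this case $S$ must be empty and $\mathbf{a} \odot \mathbf{x} = 0$. Hence for $\mathbf{x}$ satisfying the above condition, we can reconstruct the result of a group testing measurement. ∎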
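The decoding rule in this proof fits in a few lines. The following Python sketch implements it together with a checker for the condition of Lemma 1; all names are ours. The last example is a signal for which the condition fails: the rule reports a negative test even though the supports intersect.

```python
from typing import List, Sequence


def sign(y: float) -> int:
    """Three-valued sign: 1, -1 or 0."""
    return (y > 0) - (y < 0)


def group_test_from_sign(s: int) -> int:
    """Rule from the proof: a nonzero sign certifies intersecting supports
    (result 1); a zero sign is read as result 0."""
    return 1 if s != 0 else 0


def lemma1_condition(a: Sequence[float], x: Sequence[float]) -> bool:
    """True if S = supp(a) ∩ supp(x) is empty or (a|_S)^T x|_S != 0."""
    S: List[int] = [i for i, (ai, xi) in enumerate(zip(a, x)) if ai != 0 and xi != 0]
    return not S or sum(a[i] * x[i] for i in S) != 0


def decode_one(a: Sequence[float], x: Sequence[float]) -> int:
    """Simulate one sign measurement of x with a, then decode a ⊙ x from it."""
    return group_test_from_sign(sign(sum(ai * xi for ai, xi in zip(a, x))))


if __name__ == "__main__":
    print(decode_one([1, 2, 0], [1, -1, 0]))  # 1: supports meet, inner product -1
    print(decode_one([1, 0, 1], [0, 2, 0]))   # 0: supports are disjoint
    # Condition fails: the supports meet, but the inner product cancels to 0.
    print(lemma1_condition([1, 1, 0], [1, -1, 0]), decode_one([1, 1, 0], [1, -1, 0]))  # False 0
```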
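For convenience, we use the following property to mean that a signal $\mathbf{x}$ satisfies the condition of Lemma 1 with respect to every row of a matrix $A$.
Let $A$ be an $m \times n$ matrix, and $\mathbf{x}$ a signal of length $n$. Define $S_i = \operatorname{supp}(A_i) \cap \operatorname{supp}(\mathbf{x})$. Then for every row $A_i$ of $A$, either $S_i$ is empty, or $(A_i|_{S_i})^T \mathbf{x}|_{S_i} \neq 0$.
Let $A$ be a $(k, \ell)$-list disjunct matrix, and $\mathbf{x}$ be a $k$-sparse real signal. If Property 1 holds for $A$ and $\mathbf{x}$, then we can use the measurement matrix $A$ to recover a superset of size at most $k + \ell$ of the support of $\mathbf{x}$ using sign measurements. Indeed, by Lemma 1 applied to each row, $\operatorname{sign}(A\mathbf{x})$ determines $A \odot \mathbf{x}$, to which the list-disjunct decoding then applies.
Combining this corollary with results of De Bonis et al. (2005), there exist matrices with rows which we can use to recover an -sized superset of the support of $\mathbf{x}$ using sign measurements, provided $\mathbf{x}$ satisfies the above condition. Strongly explicit constructions of these matrices also exist, although they require rows Cheraghchi (2013).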
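To make Corollary 2 concrete, here is a minimal Python sketch of the decoding it describes (function name ours). Each sign measurement is read as a group test result, with a zero sign meaning a negative test. Then every column appearing in a negative test is discarded. The small matrix in the example is a toy chosen for illustration and is not claimed to be list disjunct. The second example shows a signed signal that violates Property 1, where the decoder loses part of the support.

```python
from typing import List, Sequence, Set


def sign(y: float) -> int:
    """Three-valued sign: 1, -1 or 0."""
    return (y > 0) - (y < 0)


def superset_from_signs(A: List[List[float]], x: Sequence[float]) -> Set[int]:
    """Take the sign measurements sign(A x), read a zero sign as a negative
    group test, and discard every column that appears in a negative test."""
    n = len(x)
    candidates = set(range(n))
    for row in A:
        s = sign(sum(aij * xj for aij, xj in zip(row, x)))
        if s == 0:  # negative test under the Lemma 1 decoding rule
            candidates -= {j for j in range(n) if row[j] != 0}
    return candidates


if __name__ == "__main__":
    A = [[1, 1, 0, 0, 0],
         [0, 0, 1, 1, 0],
         [1, 0, 1, 0, 1],
         [0, 1, 0, 0, 1]]
    x = [0.0, 0.6, 0.0, 0.8, 0.0]  # nonnegative, so Property 1 holds
    print(superset_from_signs(A, x))  # {1, 3}
    # A signed signal whose entries cancel violates Property 1:
    print(superset_from_signs([[1, 1, 0, 0, 0]], [1.0, -1.0, 0.0, 0.0, 0.0]))  # {2, 3, 4}
```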
The other result we need tells us how many Gaussian measurements suffice to approximately recover a real signal using maximum likelihood decoding. Similar results have appeared elsewhere, such as Jacques et al. (2013), but we include the proof for completeness.
There exists a measurement matrix $A$ for 1-bit compressed sensing such that for every pair of $k$-sparse $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ with $\|\mathbf{x}\|_2 = \|\mathbf{y}\|_2 = 1$, $\operatorname{sign}(A\mathbf{x}) \neq \operatorname{sign}(A\mathbf{y})$ whenever $\|\mathbf{x} - \mathbf{y}\|_2 > \epsilon$, provided that
We will make use of the following facts in the proof.
For all , .
For all , .
Proof of Lemma 3.
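Let $A$ be an $m \times n$ matrix with independent $\mathcal{N}(0,1)$ entries. For a measurement to separate $\mathbf{x}$ and $\mathbf{y}$, it is necessary that the hyperplane orthogonal to some row of $A$ lies between $\mathbf{x}$ and $\mathbf{y}$. Thus our goal here is to show that if we take $m$ to be large enough, all pairs of points at distance greater than $\epsilon$ will be separated with high probability. The rows of $A$ are chosen independently and have Gaussian entries, so they are spherically symmetric. Thus the probability that the random hyperplane lies between $\mathbf{x}$ and $\mathbf{y}$ is proportional to the angle between them. Specifically, it equals $\theta/\pi$, where $\theta = \arccos(\mathbf{x}^T\mathbf{y})$ is that angle. We start out by upper bounding the probability that no measurement separates a particular pair $\mathbf{x}$ and $\mathbf{y}$.
Before beginning, recall that for unit vectors $\mathbf{x}, \mathbf{y}$ at angle $\theta$, $\|\mathbf{x} - \mathbf{y}\|_2 = 2\sin(\theta/2) \leq \theta$. So given that $\|\mathbf{x} - \mathbf{y}\|_2 > \epsilon$, we have $\theta > \epsilon$.
As there are $m$ independent measurements, the probability that $\mathbf{x}$ and $\mathbf{y}$ are not separated by any of the measurements is at most
$$\left(1 - \frac{\theta}{\pi}\right)^m < \left(1 - \frac{\epsilon}{\pi}\right)^m \leq e^{-\epsilon m/\pi},$$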
so union bounding over all pairs of -sparse and , the total probability of error is strictly less than
This probability becomes less than 1 for , so with this number of measurements there exists a matrix that can perform -approximate recovery for all pairs of sparse vectors. ∎
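Two facts drive this proof: a Gaussian row separates unit vectors $\mathbf{x}, \mathbf{y}$ with probability $\theta/\pi$, and $\|\mathbf{x} - \mathbf{y}\|_2 \leq \theta$. Both are easy to check numerically. The following pure-Python sketch (names ours) estimates the separation probability by sampling Gaussian rows and compares it with $\theta/\pi$. The fixed seed and trial count are arbitrary choices.

```python
import math
import random
from typing import Sequence


def angle(x: Sequence[float], y: Sequence[float]) -> float:
    """Angle between two unit vectors."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    return math.acos(max(-1.0, min(1.0, dot)))


def separation_rate(x: Sequence[float], y: Sequence[float],
                    trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of P[sign(<g, x>) != sign(<g, y>)] for a standard
    Gaussian row g, i.e. of a random hyperplane separating x and y."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in x]
        gx = sum(gi * xi for gi, xi in zip(g, x))
        gy = sum(gi * yi for gi, yi in zip(g, y))
        hits += (gx > 0) != (gy > 0)
    return hits / trials


if __name__ == "__main__":
    x = (1.0, 0.0)
    y = (math.cos(0.6), math.sin(0.6))
    theta = angle(x, y)
    print(separation_rate(x, y), theta / math.pi)  # the two values should be close
    print(math.dist(x, y) <= theta)  # True: ||x - y|| = 2 sin(theta/2) <= theta
```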
Note that if we already have a superset of the support of size , the previous result tells us there exists a matrix with rows which can be used to perform $\epsilon$-approximate recovery within the superset. We can do this nonadaptively, because the rows of the matrix for approximate recovery are Gaussian. Combining this with Corollary 2 and the group testing constructions of De Bonis et al. (2005), we have the following theorem.
As special cases, we have improved upper bounds for nonnegative and binary signals. For ease of comparison with the other results, we assume the binary signal is rescaled to have unit norm, so has all entries either 0 or equal to .
Let where is a -list disjunct matrix with rows, and is a matrix with rows that can be used for -approximate recovery within the superset as in Lemma 3, so consists of rows. Let be a -sparse signal. If all entries of are nonnegative, then can be used for -approximate recovery of .
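One way to see why nonnegativity helps: for a binary matrix, Property 1 then holds automatically. The inner product of a binary row with a nonnegative signal is a sum of nonnegative terms, and it is positive exactly when the row meets the support. The following minimal Python check (helper names ours) compares sign measurements with group test results on random nonnegative instances. It also shows that a signed signal can break the correspondence.

```python
import random
from typing import List, Sequence


def sign(y: float) -> int:
    return (y > 0) - (y < 0)


def group_test(row: Sequence[float], x: Sequence[float]) -> int:
    return int(any(r != 0 and xi != 0 for r, xi in zip(row, x)))


def signs_match_group_tests(A: List[List[int]], x: Sequence[float]) -> bool:
    """Check that sign(<A_i, x>) equals the group test result for every row."""
    return all(sign(sum(r * xi for r, xi in zip(row, x))) == group_test(row, x)
               for row in A)


def random_instance(m: int, n: int, k: int, p: float, rng: random.Random):
    """Random Bernoulli(p) binary matrix and a random nonnegative k-sparse x."""
    A = [[1 if rng.random() < p else 0 for _ in range(n)] for _ in range(m)]
    x = [0.0] * n
    for j in rng.sample(range(n), k):
        x[j] = rng.random() + 1e-3  # strictly positive on the support
    return A, x


if __name__ == "__main__":
    rng = random.Random(0)
    ok = all(signs_match_group_tests(*random_instance(30, 20, 3, 0.2, rng))
             for _ in range(200))
    print(ok)  # True for nonnegative signals
    print(signs_match_group_tests([[1, 1]], [1.0, -1.0]))  # False
```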
Let where is a -list disjunct matrix with rows, and is a matrix with rows that can be used for -approximate recovery (with ) within the superset as in Lemma 3, so consists of rows. Let be the -sparse signal vector. If all nonzero entries of are equal, then can be used for exact recovery of .
Here we use the fact that if we perform -approximate recovery using then as the minimum possible distance between any two -sparse rescaled binary vectors is , we will recover the signal vector exactly. ∎
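The exact-recovery step relies on rescaled binary vectors being well separated. The brute-force Python sketch below (small $n$ only; function name ours) computes the minimum $\ell_2$ distance between distinct unit-norm rescaled binary vectors with at most $k$ nonzero entries, or exactly $k$ if `exact_k` is set. In the exactly-$k$ case, two distinct supports differ in at least two coordinates, each contributing $1/k$ to the squared distance, so the minimum is $\sqrt{2/k}$. Allowing smaller supports gives smaller distances, e.g. $\sqrt{2 - \sqrt{2}} \approx 0.765$ for $n = 4$, $k = 2$.

```python
import itertools
import math


def min_distance_rescaled_binary(n: int, k: int, exact_k: bool = False) -> float:
    """Minimum l2 distance between distinct unit-norm rescaled binary vectors
    in R^n with at most k (or, if exact_k, exactly k) nonzero entries."""
    sizes = [k] if exact_k else range(1, k + 1)
    vecs = []
    for s in sizes:
        for supp in itertools.combinations(range(n), s):
            v = [0.0] * n
            for i in supp:
                v[i] = 1.0 / math.sqrt(s)  # every nonzero entry equals 1/sqrt(s)
            vecs.append(v)
    return min(math.dist(u, v) for u, v in itertools.combinations(vecs, 2))


if __name__ == "__main__":
    print(min_distance_rescaled_binary(4, 2))                # about 0.765
    print(min_distance_rescaled_binary(4, 2, exact_k=True))  # about 1.0
```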
4 Explicit Constructions
4.1 Explicit Robust UFFs from Error-Correcting Codes
In this section we explain how to combine several existing results in order to explicitly construct Robust UFFs that can be used for support recovery of real vectors. This partially answers Open Question 3 from Acharya et al. (2017).
A family of sets with each is an -Robust-UFF if , and for every distinct , .
It is shown in Acharya et al. (2017) that nonexplicit -Robust UFFs exist with which can be used to exactly recover the support of any -sparse real vector of length in measurements.
The results we will need are the following, where the $q$-ary entropy function $H_q$ is defined as
$$H_q(x) = x \log_q(q-1) - x \log_q(x) - (1-x) \log_q(1-x).$$
Theorem 9 (Porat and Rothschild (2011) Thm. 2).
Let be a prime power, and positive integers, and . Then if , we can construct a -ary linear code with rate and relative distance in time .
Theorem 10 (Acharya et al. (2017) Prop. 17).
Given a -ary error correcting code with rate and relative distance , we can construct a -Robust-UFF.
Theorem 11 (Acharya et al. (2017) Prop. 15).
If is an -Robust-UFF, then is also an -Robust-UFF.
By combining the above three results, we have the following.
We can explicitly construct an -Robust UFF with and in time .
First, we instantiate Theorem 9 to obtain a -ary code of length with , relative distance , and rate in time .
While the time needed for this construction is not polynomial in (and therefore the construction is not strongly explicit) as asked for in Open Question 3 of Acharya et al. (2017), this at least demonstrates that there exist codes with sufficiently good parameters to yield Robust UFFs with .
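Theorem 9 is stated in terms of the $q$-ary entropy function $H_q(x) = x\log_q(q-1) - x\log_q x - (1-x)\log_q(1-x)$, so it can help to evaluate it numerically. The following minimal Python sketch implements this definition with the usual convention $0 \log_q 0 = 0$; the function name `q_ary_entropy` is ours. As a sanity check, $H_2$ is the binary entropy, and $H_q(1 - 1/q) = 1$ for every $q$.

```python
import math


def q_ary_entropy(x: float, q: int) -> float:
    """H_q(x) = x log_q(q-1) - x log_q(x) - (1-x) log_q(1-x), with 0 log 0 = 0."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if q < 2:
        raise ValueError("q must be at least 2")

    def xlogx(t: float) -> float:
        # t * log_q(t), extended continuously by 0 at t = 0
        return 0.0 if t == 0.0 else t * math.log(t, q)

    return x * math.log(q - 1, q) - xlogx(x) - xlogx(1.0 - x)


if __name__ == "__main__":
    print(q_ary_entropy(0.5, 2))   # binary entropy at 1/2, equal to 1
    print(q_ary_entropy(0.75, 4))  # approximately 1, since 0.75 = 1 - 1/4
```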
4.2 Non-Universal Approximate Recovery
Instead of requiring our measurement matrices to be able to recover all $k$-sparse signals simultaneously (i.e. to be universal), we can require only that they recover “most” $k$-sparse signals. Specifically, in this section we assume that the sparse signal is generated in the following way. First, a set of $k$ indices is chosen uniformly at random to be the support of the signal. Then, the signal is chosen to be a uniformly random vector from the unit sphere on those indices. We relax the requirement that the supports of all $k$-sparse signals can be recovered exactly (by some decoding) to the requirement that we can identify the support of a $k$-sparse signal with probability at least , where . Note that even when , this is a weaker condition than universality, as the space of possible $k$-sparse signals is infinite.
It is shown in Atia and Saligrama (2012) that a random matrix construction using measurements suffices to recover the support with error probability approaching 0 as $k$ and $n$ approach infinity. The following theorem shows that we can explicitly construct a matrix which works in this setting, at the cost of slightly more measurements (about ).
We can explicitly construct measurement matrices for Support Recovery (of real vectors) with rows that can exactly determine the support of a $k$-sparse signal with probability at least . Here the signals are generated by first choosing the size-$k$ support uniformly at random, then choosing the signal to be a uniformly random vector on the sphere on those coordinates.
To prove this theorem, we need a lemma which explains how we can use sign measurements to “simulate” group testing measurements with high probability. Both the result and proof are similar to Lemma 1, with the main difference being that given the distribution described above, the vectors violating the necessary condition in Lemma 1 occur with zero probability and so can be safely ignored. For this lemma, we do not need the further assumption made in Theorem 13 that the distribution over support sets is uniform. The proof is presented in Appendix A.
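Suppose we have a measurement vector $\mathbf{a} \in \mathbb{R}^n$, and a $k$-sparse signal $\mathbf{x} \in \mathbb{R}^n$. The signal $\mathbf{x}$ is generated randomly by first picking a subset of size $k$ from $[n]$ (using any distribution) to be the support, then taking $\mathbf{x}$ to be a uniformly random vector on the sphere on those coordinates. Then from $\operatorname{sign}(\mathbf{a}^T\mathbf{x})$, we can determine the value of $\mathbf{a} \odot \mathbf{x}$ with probability 1.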
As the above argument works with probability 1, we can easily extend it to an entire measurement matrix with any finite number of rows by a union bound, and recover all the group testing measurement results with probability 1 as well. This means we can leverage the following result from Mazumdar (2016):
Theorem 15 (Mazumdar (2016) Thm. 5).
When is drawn uniformly at random among all -sparse binary vectors, there exists an explicitly constructible group testing matrix with rows which can exactly identify from observing the measurement results with probability at least .
Combining this with the lemma above, we can use the matrix from Theorem 15 with rows (now representing sign measurements) to exactly determine the support of with probability at least ; we first use Lemma 14 to recover the results of the group testing tests with probability 1, and can then apply the above theorem using the results of the group testing measurements.
We can also use this construction for approximate recovery rather than support recovery using Lemma 3, by appending rows of Gaussian measurements to , first recovering the exact support, then doing approximate recovery within that support. This gives a matrix with about rows for non-universal approximate recovery of real signals, where the top portion is explicit.
Above, we have shown that in the non-universal setting, we can use constructions from group testing to recover the exact support with high probability, and then subsequently perform approximate recovery within that exact support. If we are interested only in performing approximate recovery, we can apply our superset technique here as well; Lemma 14 implies also that using a -list disjunct matrix we can with probability 1 recover an -sized superset of the support, and such matrices exist with rows. Following this, we can use more Gaussian measurements to recover the signal within the superset. This gives a non-universal matrix with rows for approximate recovery, the top part of which can be made strongly explicit with only slightly more measurements ( vs. ).
In this section, we present some empirical results relating to the use of our superset technique in approximate vector recovery for real-valued signals. We compare two approaches by their average error (in $\ell_2$ norm) of the reconstructed vector. The first uses an “all Gaussian” measurement matrix. The second first uses a small number of measurements to recover a superset of the support of the signal, then uses the remainder of the measurements to recover the signal within that superset via Gaussian measurements. We used the well-known BIHT algorithm of Jacques et al. (2013) to recover the vector, both with the all Gaussian matrix and within the superset. We emphasize, however, that the superset technique is highly general. It could just as easily be applied on top of other decoding algorithms that use only Gaussian measurements, such as the “QCoSaMP” algorithm of Shi et al. (2016).
To generate random signals $\mathbf{x}$, we first choose a size-$k$ support uniformly at random among the $\binom{n}{k}$ possibilities, then for each coordinate in the chosen support, generate a random value from . The vector is then rescaled so that $\|\mathbf{x}\|_2 = 1$.
For the dotted lines in Figure 1 labeled “all Gaussian,” we performed 500 trials for each value of $m$. In each trial we generated an $m \times n$ matrix with all entries drawn independently from $\mathcal{N}(0,1)$. We then used BIHT (run either until convergence or for 1000 iterations, as there is no convergence guarantee) to recover the signal from the measurement matrix and measurement outcomes.
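For readers who want to reproduce the “all Gaussian” baseline, here is a minimal pure-Python sketch of BIHT. It iterates $\mathbf{x} \leftarrow H_k(\mathbf{x} + \frac{\tau}{2} A^T(\mathbf{y} - \operatorname{sign}(A\mathbf{x})))$, where $H_k$ keeps the $k$ largest-magnitude entries, and normalizes at the end. It is not the implementation used for Figure 1. The following are all our own assumptions: the default step size $\tau = 1/m$, the zero initialization, and the stopping rule (stop once all signs are consistent, the iterate stops changing, or `max_iter` is reached).

```python
import math
import random
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def sign(y: float) -> int:
    return (y > 0) - (y < 0)


def hard_threshold(v: Sequence[float], k: int) -> List[float]:
    """H_k: keep the k largest-magnitude entries of v and zero out the rest."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k])
    return [v[i] if i in keep else 0.0 for i in range(len(v))]


def biht(A: Matrix, y: Sequence[int], k: int, max_iter: int = 1000,
         tau: Optional[float] = None) -> List[float]:
    """Binary Iterative Hard Thresholding (sketch).

    Iterates x <- H_k(x + (tau / 2) * A^T (y - sign(A x))) and returns x / ||x||_2.
    The default tau = 1/m is an assumed step size, not a tuned value.
    """
    m, n = len(A), len(A[0])
    step = 0.5 * (1.0 / m if tau is None else tau)
    x = [0.0] * n
    for _ in range(max_iter):
        residual = [yi - sign(sum(a * xj for a, xj in zip(row, x)))
                    for row, yi in zip(A, y)]
        if not any(residual):  # every sign measurement is already consistent
            break
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        x_new = hard_threshold([xj + step * g for xj, g in zip(x, grad)], k)
        if x_new == x:  # fixed point reached
            break
        x = x_new
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0 else x


if __name__ == "__main__":
    rng = random.Random(0)
    n, k, m = 20, 2, 200
    x = [0.0] * n
    for j in rng.sample(range(n), k):
        x[j] = rng.gauss(0.0, 1.0)
    nx = math.sqrt(sum(v * v for v in x))
    x = [v / nx for v in x]
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    y = [sign(sum(a * xj for a, xj in zip(row, x))) for row in A]
    x_hat = biht(A, y, k, max_iter=200)
    print("l2 error:", math.sqrt(sum((u - v) ** 2 for u, v in zip(x, x_hat))))
```

Pure Python keeps the sketch dependency-free. For the problem sizes in Figure 1, a vectorized implementation would be far faster.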
For the solid lines in Figure 1 labeled “ Superset,” we again performed 500 trials for each value of $m$. In each trial we generated a measurement matrix with $m$ rows in total. Each entry of the first block of rows is a Bernoulli random variable that takes value 1 with probability and value 0 with probability . There is evidence from the group testing literature Atia and Saligrama (2012); Aldridge et al. (2014) that this probability is near-optimal in some regimes, and it also appears to perform well in practice; see Appendix B for some empirical evidence. The entries of the second block are drawn from $\mathcal{N}(0,1)$. We use a standard group testing decoding (i.e., remove any coordinates that appear in a test with result 0) to determine a superset from the measurements of the first block. Then we use BIHT (again run either until convergence or for 1000 iterations) to reconstruct $\mathbf{x}$ within the superset using the measurement results of the second block. The number of rows in the first block is taken to be , based on the fact that with high probability rows for some constant should be sufficient to recover an -sized superset. The remainder of the measurements are used in the second block.
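The two-stage procedure just described can be sketched as follows. This is a minimal pure-Python illustration under our own assumptions, not the experimental code. The Bernoulli probability `p` and the block sizes are left as parameters; the specific values used in the experiments are not reproduced here. `decode` is a hypothetical placeholder for any 1-bit decoder that uses only Gaussian measurements, such as a BIHT implementation. The superset stage applies the standard group testing decoding to the sign measurements of the first block.

```python
import random
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


def sign(y: float) -> int:
    return (y > 0) - (y < 0)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def bernoulli_matrix(m: int, n: int, p: float, rng: random.Random) -> Matrix:
    """First block: i.i.d. Bernoulli(p) entries (p is a free parameter here)."""
    return [[1.0 if rng.random() < p else 0.0 for _ in range(n)] for _ in range(m)]


def gaussian_matrix(m: int, n: int, rng: random.Random) -> Matrix:
    """Second block: i.i.d. N(0, 1) entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]


def superset_stage(A1: Matrix, y1: Sequence[int]) -> List[int]:
    """Group testing decoding on sign measurements: remove every coordinate
    that appears in a test whose sign is 0."""
    keep = set(range(len(A1[0])))
    for row, s in zip(A1, y1):
        if s == 0:
            keep -= {j for j, v in enumerate(row) if v != 0}
    return sorted(keep)


def two_stage_recover(A1: Matrix, A2: Matrix, x: Sequence[float],
                      decode: Callable[[Matrix, List[int]], List[float]]
                      ) -> Tuple[List[int], List[float]]:
    """Measure x with both blocks, find the superset from the first block, then
    run `decode` on the columns of the second block inside the superset."""
    y1 = [sign(dot(row, x)) for row in A1]
    y2 = [sign(dot(row, x)) for row in A2]
    sup = superset_stage(A1, y1)
    A2_sub = [[row[j] for j in sup] for row in A2]
    x_sub = decode(A2_sub, y2)
    x_hat = [0.0] * len(x)
    for j, v in zip(sup, x_sub):
        x_hat[j] = v
    return sup, x_hat


if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 50, 3
    x = [0.0] * n
    for j in rng.sample(range(n), k):
        x[j] = rng.gauss(0.0, 1.0)

    def back_projection(A: Matrix, y: List[int]) -> List[float]:
        # Placeholder decoder (A^T y on the superset), standing in for BIHT.
        return [sum(row[j] * yi for row, yi in zip(A, y)) for j in range(len(A[0]))]

    sup, x_hat = two_stage_recover(bernoulli_matrix(40, n, 0.25, rng),
                                   gaussian_matrix(60, n, rng), x, back_projection)
    print("superset size:", len(sup))
```

With Gaussian-valued signal entries, every Bernoulli row that meets the support produces a nonzero sign with probability 1 (Lemma 14). So the recovered superset contains the true support, and the decoder only has to work on the superset's columns.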
We display data only for larger values of $m$, to ensure there are sufficiently many rows in both portions of the measurement matrix. Figure 1 shows that in this regime, using a small number of measurements to first recover a superset of the support provides a modest improvement in reconstruction error compared to the alternative. In the higher-error regime, where there are simply not enough measurements to obtain an accurate reconstruction, the two methods perform about the same, as can be seen on the left side of the graph in Figure 1(d). In the empirical setting, our superset technique can be viewed as a flexible, low-overhead way of extending existing 1-bit compressed sensing algorithms that use only Gaussian measurements, which are quite common.
This research is supported in part by NSF CCF awards 1618512, 1642658, and 1642550 and the UMass Center for Data Science.
- Acharya et al. (2017). Improved bounds for universal one-bit compressive sensing. In 2017 IEEE International Symposium on Information Theory (ISIT), pp. 2353–2357.
- Aldridge et al. (2014). Group testing algorithms: bounds and simulations. IEEE Transactions on Information Theory 60(6), pp. 3671–3687.
- Atia and Saligrama (2012). Boolean compressed sensing and noisy group testing. IEEE Transactions on Information Theory 58(3), pp. 1880–1901.
- Boufounos and Baraniuk (2008). 1-bit compressive sensing. In 42nd Annual Conference on Information Sciences and Systems (CISS 2008), Princeton, NJ, USA, 19–21 March 2008, pp. 16–21. IEEE.
- Cheraghchi (2013). Noise-resilient group testing: limitations and constructions. Discrete Applied Mathematics 161(1–2), pp. 81–95.
- De Bonis et al. (2005). Optimal two-stage algorithms for group testing problems. SIAM Journal on Computing 34(5), pp. 1253–1270.
- Donoho (2006). Compressed sensing. IEEE Transactions on Information Theory 52(4), pp. 1289–1306.
- Du and Hwang (2000). Combinatorial group testing and its applications. Applied Mathematics, World Scientific.
- Gopi et al. (2013). One-bit compressed sensing: provable support and vector recovery. In International Conference on Machine Learning, pp. 154–162.
- Haupt and Baraniuk (2011). Robust support recovery using sparse compressive sensing matrices. In 45th Annual Conference on Information Sciences and Systems (CISS 2011), The Johns Hopkins University, Baltimore, MD, USA, 23–25 March 2011, pp. 1–6. IEEE.
- Jacques et al. (2013). Robust 1-bit compressive sensing via binary stable embeddings of sparse vectors. IEEE Transactions on Information Theory 59(4), pp. 2082–2102.
- Landau (1967). Sampling, data transmission, and the Nyquist rate. Proceedings of the IEEE 55(10), pp. 1701–1706.
- Li (2016). One scan 1-bit compressed sensing. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS 2016), Cadiz, Spain, May 9–11, 2016, A. Gretton and C. C. Robert (Eds.), JMLR Workshop and Conference Proceedings, Vol. 51, pp. 1515–1523.
- Mazumdar (2016). Nonadaptive group testing with random set of defectives. IEEE Transactions on Information Theory 62(12), pp. 7522–7531.
- Plan and Vershynin (2013). Robust 1-bit compressed sensing and sparse logistic regression: a convex programming approach. IEEE Transactions on Information Theory 59(1), pp. 482–494.
- Porat and Rothschild (2011). Explicit nonadaptive combinatorial group testing schemes. IEEE Transactions on Information Theory 57(12), pp. 7982–7989.
- Shi et al. (2016). Methods for quantized compressed sensing. In 2016 Information Theory and Applications Workshop (ITA 2016), La Jolla, CA, USA, January 31 to February 5, 2016, pp. 1–9. IEEE.
- Slawski and Li (2015). b-bit marginal regression. In Advances in Neural Information Processing Systems 28 (NIPS 2015), Montreal, Quebec, Canada, December 7–12, 2015, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (Eds.), pp. 2062–2070.
- Tibshirani (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58(1), pp. 267–288.
Appendix A Proof of Lemma 14
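Lemma (Lemma 14).
Suppose we have a known measurement vector $\mathbf{a} \in \mathbb{R}^n$, and an unknown $k$-sparse signal $\mathbf{x} \in \mathbb{R}^n$. The signal $\mathbf{x}$ is generated randomly by first picking a subset of size $k$ from $[n]$ (using any distribution) to be the support, then taking $\mathbf{x}$ to be a uniformly random vector on the sphere on those coordinates. Then from $\operatorname{sign}(\mathbf{a}^T\mathbf{x})$, we can determine the value of $\mathbf{a} \odot \mathbf{x}$ with probability 1.
We assume without loss of generality that $\mathbf{x}$ is supported on the first $k$ coordinates. The remainder of the argument does not depend specifically on the choice of support, so this is purely for notational convenience. If $\operatorname{sign}(\mathbf{a}^T\mathbf{x}) \neq 0$, then immediately we must have $\mathbf{a} \odot \mathbf{x} = 1$, as $\mathbf{a}^T\mathbf{x} \neq 0$ requires $\operatorname{supp}(\mathbf{a}) \cap \operatorname{supp}(\mathbf{x}) \neq \emptyset$.
Otherwise, if $\operatorname{sign}(\mathbf{a}^T\mathbf{x}) = 0$, we must have $\mathbf{a}^T\mathbf{x} = 0$. This leaves two cases: either $\mathbf{a} \odot \mathbf{x} = 0$, or $\mathbf{x}$ is orthogonal to $\mathbf{a}$ and $\mathbf{a} \odot \mathbf{x} = 1$. In the latter case $\mathbf{x}$ satisfies the equation
$$\sum_{i=1}^{k} a_i x_i = 0.$$
Let $\mathbf{z}$ be a random vector formed as follows. First, determine its support using the same distribution as that used for the support of $\mathbf{x}$. Then, within that support, draw $k$ independent $\mathcal{N}(0,1)$ variables $g_1, \ldots, g_k$ to be the coordinates. Finally, rescale so that $\|\mathbf{z}\|_2 = 1$. It is well known that the distribution of such $\mathbf{z}$ is identical to the distribution of $\mathbf{x}$. Thus the probability that $\mathbf{x}$ is orthogonal to $\mathbf{a}$ is the same as the probability that $\mathbf{z}$ is orthogonal to $\mathbf{a}$. We proceed by showing that, when the supports of $\mathbf{z}$ and $\mathbf{a}$ intersect, the probability that $\mathbf{z}$ is orthogonal to $\mathbf{a}$ is 0.
If $\mathbf{z}$ is orthogonal to $\mathbf{a}$, then, since rescaling does not affect orthogonality, as above we must have
$$\sum_{i=1}^{k} a_i g_i = 0.$$
Because the supports intersect, there is some $j \leq k$ with $a_j \neq 0$. Thus in order for $\mathbf{z}$ to lie in the nullspace of $\mathbf{a}^T$, it is necessary that $g_j = -\frac{1}{a_j}\sum_{i \neq j} a_i g_i$, a specific value determined by the other $g_i$. As $g_j$ is drawn independently of the other $g_i$ and from a continuous distribution, this happens with probability 0. We conclude that the same is true for $\mathbf{x}$. Thus, whenever $\operatorname{sign}(\mathbf{a}^T\mathbf{x}) = 0$, concluding that $\mathbf{a} \odot \mathbf{x} = 0$ is correct with probability 1. ∎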
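A quick Monte Carlo experiment is consistent with Lemma 14, though it is of course no substitute for the proof. The following plain-Python sketch (all names ours) draws signals exactly as in the lemma: a uniformly random support and a normalized Gaussian vector on it. For a fixed real vector $\mathbf{a}$, it counts how often the rule “nonzero sign means 1, zero sign means 0” reproduces $\mathbf{a} \odot \mathbf{x}$. The lemma predicts agreement on every draw, up to floating-point coincidences.

```python
import random
from typing import List, Sequence


def sign(y: float) -> int:
    return (y > 0) - (y < 0)


def random_sparse_signal(n: int, k: int, rng: random.Random) -> List[float]:
    """Uniformly random size-k support, then a uniformly random unit vector on
    it, obtained by normalizing i.i.d. N(0, 1) draws."""
    x = [0.0] * n
    support = rng.sample(range(n), k)
    vals = [rng.gauss(0.0, 1.0) for _ in support]
    norm = sum(v * v for v in vals) ** 0.5
    for i, v in zip(support, vals):
        x[i] = v / norm
    return x


def lemma14_agreement(a: Sequence[float], n: int, k: int, trials: int,
                      seed: int = 0) -> float:
    """Fraction of random signals for which 'nonzero sign -> 1, zero sign -> 0'
    reproduces the group test result a ⊙ x."""
    rng = random.Random(seed)
    agree = 0
    for _ in range(trials):
        x = random_sparse_signal(n, k, rng)
        decoded = 1 if sign(sum(ai * xi for ai, xi in zip(a, x))) != 0 else 0
        truth = int(any(ai != 0 and xi != 0 for ai, xi in zip(a, x)))
        agree += int(decoded == truth)
    return agree / trials


if __name__ == "__main__":
    print(lemma14_agreement([1.0, -1.0, 0.0, 2.0, 0.0, 0.0], n=6, k=2, trials=2000))
```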
Appendix B Empirical Evidence for Experimental Choice of Bernoulli Probability
In this section, we provide some empirical evidence that the choice of for the Bernoulli probability of the experiments in Section 5 is reasonable.
Figure 2 shows the average size of the superset returned by a group testing decoding when the measurement matrix has Bernoulli entries (i.e., each entry is 1 with probability $p$ and 0 otherwise), as a function of $p$. Each curve corresponds to a different number of measurements (rows of the Bernoulli matrix), and each plot to a different sparsity level $k$. All vectors had length $n = 1000$ and were constructed randomly by first choosing a support set of size $k$ uniformly at random, then drawing a random value from for each coordinate in the support set, and finally normalizing to unit $\ell_2$ norm. We performed 1000 trials for each combination of parameter values.
The vertical line overlaid on the curves in Figure 2 indicates where the Bernoulli probability is equal to . For all three sparsity levels, this value appears to be very close to the probability that minimizes the superset size for a given number of measurements. Furthermore, the curves all have relatively wide basins around the minimum, which suggests that any probability close to the minimizer should perform fairly well.
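The experiment can be imitated at a much smaller scale. This sketch assumes the group testing decoding is the standard COMP rule (discard every column that appears in a test with a zero outcome); the decoder actually used for Figure 2, the Bernoulli probability it highlights, and the sizes below are not taken from the paper. Outcomes are simulated from the support alone, relying on Lemma 14.

```python
import random


def bernoulli_rows(m, n, p, rng):
    """m rows of a Bernoulli(p) 0/1 matrix, each stored as the set of columns holding a 1."""
    return [{j for j in range(n) if rng.random() < p} for _ in range(m)]


def comp_superset(rows, n, support):
    """COMP-style group testing decoding of zero/nonzero outcomes.

    By Lemma 14 a row's sign measurement is nonzero (with probability 1)
    exactly when the row meets supp(x), so outcomes are simulated from the
    support alone. Every column that appears in a zero test is ruled out;
    the remaining columns form a superset of supp(x).
    """
    support = set(support)
    candidates = set(range(n))
    for row in rows:
        if support.isdisjoint(row):  # zero test result
            candidates -= row
    return candidates


def average_superset_size(n, k, m, p, trials, seed=0):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        rows = bernoulli_rows(m, n, p, rng)
        support = rng.sample(range(n), k)
        superset = comp_superset(rows, n, support)
        assert set(support) <= superset  # COMP never discards a true support index
        total += len(superset)
    return total / trials


if __name__ == "__main__":
    # Illustrative sweep over p for small sizes (not the settings of Figure 2).
    for p in (0.02, 0.05, 0.1, 0.2):
        print(p, average_superset_size(n=200, k=5, m=60, p=p, trials=50))
```

With this decoder, a very small $p$ leaves many columns untested while a large $p$ makes almost every test nonzero, so the average superset size has a minimum in between.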
Appendix C Sufficient Condition for Universal Support Recovery of Real Vectors
The goal of this section is to give sufficient conditions on a measurement matrix under which a superset of the support of an unknown $k$-sparse signal can be recovered from 1-bit sign measurements, by generalizing the definition of “Robust UFF” given in .
In this section we will work primarily with matrix columns rather than rows, so for any matrix $M$ we let $M_i$ denote its $i$-th column. For any sets $R$ and $C$, let $M|_{R,C}$ denote the submatrix of $M$ restricted to rows indexed by $R$ and columns indexed by $C$. Let $\|x\|_0$ denote the size of the support of $x$, i.e. $\|x\|_0 = |\operatorname{supp}(x)|$. We say a vector $x$ has full support if all of its coordinates are nonzero, i.e. $\|x\|_0$ equals its length.
In order to recover a superset of the support of the signal from its sign measurements, we use the algorithm of  (Algorithm 1). For any subset of columns , , define . These are the columns outside the subset that have a large intersection with the union of the supports of the columns indexed by it.
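The set of confusable columns just described can be computed directly for a 0/1 matrix stored by its column supports. This is a minimal sketch: the function name and the column weight `d` are hypothetical, and we assume a "large" intersection means more than half of the column's support, matching the "more than half" event used in the probabilistic construction later in this section.

```python
def confusable_columns(column_supports, subset, d):
    """Columns outside `subset` with more than d/2 of their support rows inside
    the union of the supports of the columns in `subset`.

    column_supports: list of sets; column_supports[j] = rows where column j is nonzero.
    d: column weight; the strict d/2 threshold is an assumption of this sketch.
    """
    subset = set(subset)
    covered = set().union(*(column_supports[i] for i in subset))
    return {
        j for j, col in enumerate(column_supports)
        if j not in subset and len(col & covered) > d / 2
    }


cols = [{0, 1}, {2, 3}, {0, 2}, {1, 3}]      # toy 4 x 4 matrix, column weight d = 2
print(confusable_columns(cols, {0}, d=2))     # set(): no column shares both rows with column 0
print(confusable_columns(cols, {0, 1}, d=2))  # {2, 3}: both lie inside rows 0-3
```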
It is shown in  that if the measurement matrix is a Robust UFF with suitable parameters, then their algorithm recovers the exact support of the signal. Algorithm 1 computes, for each column, the intersection of its support with the support of the output vector, and includes the index of that column in the estimated support if the intersection is sufficiently large. The Robust UFF property ensures that the estimated support is exactly the support of the signal.
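A small sketch of how such a threshold decoder acts on a 0/1 matrix. The helper names and toy matrix are hypothetical, and since the exact threshold of Algorithm 1 is not restated here, it is exposed as a free parameter; the sign is three-valued so that zero outcomes are kept.

```python
def sign(v):
    # Three-valued sign, so that zero test results remain visible.
    return (v > 0) - (v < 0)


def measure(column_supports, num_rows, x):
    """1-bit measurements sign(M x) for the 0/1 matrix M with these column supports."""
    acc = [0.0] * num_rows
    for j, xj in enumerate(x):
        for t in column_supports[j]:
            acc[t] += xj
    return [sign(v) for v in acc]


def decode(column_supports, y, threshold):
    """Keep column j if at least `threshold` rows of its support have a nonzero outcome."""
    nonzero = {t for t, v in enumerate(y) if v != 0}
    return {j for j, col in enumerate(column_supports) if len(col & nonzero) >= threshold}


cols = [{0, 1}, {2, 3}, {0, 2}, {1, 3}]  # toy 4 x 4 matrix, column weight d = 2
x = [0.6, 0.0, 0.0, -0.8]               # supp(x) = {0, 3}
y = measure(cols, 4, x)
print(y)                                # [1, -1, 0, -1]
print(decode(cols, y, threshold=2))     # {0, 3}
print(decode(cols, y, threshold=1))     # {0, 1, 2, 3}, a superset of supp(x)
```

Lowering the threshold trades exactness for robustness to zero outcomes, which is the trade-off the list relaxation formalizes.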
We relax the definition of a Robust UFF to allow a few false positives, since we only require a superset of the support of the signal rather than its exact support. The allowable number of confusable columns controls the number of false positives. Note that allowing such confusable columns might induce some false negatives as well; to avoid this possibility, we need to ensure that no column in the support of the signal has too many zero test results. In general, zero test results can occur when the signal lies in the nullspace of many rows of the matrix that have a nonempty intersection with its support. We construct the matrix to avoid such situations.
For any subset , and any , define . These are the rows in the support of that intersect with the support of the columns of indexed by . In order to ensure that the algorithm does not introduce any false negatives, we want the output vector to have few zeros in the rows corresponding to . Let us define to be the matrix restricted to the rows in and columns of . Note that since , , therefore has at least rows. We now define a list-Robust UFF as follows:
Definition 5 (List-RUFF).
A real matrix is called an -list Robust UFF if for all , and for all subsets , , the following properties hold:
For any , and any with full support, .
The first condition ensures that Algorithm 1 introduces at most false positives. The second condition ensures that no $k$-sparse vector lies in the nullspace of too many rows of the matrix, and therefore Algorithm 1 will not yield any false negatives.
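For tiny matrices, the first property can be checked by brute force. This sketch assumes the first property bounds the number of confusable columns of every subset of at most $k$ columns (as the remark on false positives indicates), with a strict $d/2$ threshold for "large" intersection; the function names are hypothetical. The second property quantifies over all full-support real vectors and is not checked here.

```python
from itertools import combinations


def confusable(column_supports, subset, d):
    """Columns outside `subset` with more than d/2 support rows inside the
    union of the supports of `subset` (threshold is an assumption)."""
    covered = set().union(*(column_supports[i] for i in subset))
    return {j for j, col in enumerate(column_supports)
            if j not in subset and len(col & covered) > d / 2}


def max_false_positives(column_supports, k, d):
    """Largest number of confusable columns over all nonempty subsets of at
    most k columns. Exponential in k, so only for tiny sanity checks."""
    n = len(column_supports)
    worst = 0
    for size in range(1, k + 1):
        for subset in combinations(range(n), size):
            worst = max(worst, len(confusable(column_supports, set(subset), d)))
    return worst


cols = [{0, 1}, {2, 3}, {0, 2}, {1, 3}]
print(max_false_positives(cols, k=1, d=2))   # 0
print(max_false_positives(cols, k=2, d=2))   # 2
```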
Next we show that Algorithm 1 recovers a superset of size at most given a measurement matrix which is an -list RUFF.
Let $x$ be an unknown $k$-sparse vector with . If the measurement matrix is an -list RUFF, then Algorithm 1 returns a set such that .
We first show that . We in fact prove the contrapositive, i.e. if , then . Let . By definition of , we know that does not intersect in too many places, i.e. . Consider all the rows . Note that for all these rows, . Therefore,
From Algorithm 1, it then follows that .
To show that every is included in , we need to show that for every such , . This is equivalent to showing that there are not too many zeros in the rows of corresponding to rows in . Let be any column in the support of . Let us partition into two groups. Let . Define
Note that for all , since . Therefore, . We can assume without loss of generality that . Otherwise, by definition of it follows that , and Algorithm 1 includes .
We now show that for many . In particular, we show that is zero for at most indices in . This follows from the property of the list-RUFF. Consider the following submatrix of , . Since , , and therefore has at least rows, and at most columns.
From the definition of list-RUFF, we know that for any with full support, . Therefore, for that is supported on , for at least indices in .
Combining these observations, it follows that
Therefore the fact that follows from Algorithm 1.
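The partition of a column's support used in this proof can be made concrete on a toy 0/1 matrix. This is a hypothetical sketch: the helper names are ours, and we read the row set defined before Definition 5 as the rows of the column's support that are shared with some other column of the subset (excluding the column itself, which would otherwise cover all of them).

```python
def overlap_rows(column_supports, subset, j):
    """Rows of supp(M_j) that also lie in the support of another column of `subset`."""
    others = set().union(*(column_supports[i] for i in subset if i != j))
    return column_supports[j] & others


def restrict(column_supports, rows, cols):
    """Dense 0/1 submatrix M|_{rows, cols}, row by row."""
    return [[1 if r in column_supports[c] else 0 for c in cols] for r in rows]


def outcomes(column_supports, num_rows, x):
    """Three-valued signs of M x for the 0/1 matrix with these column supports."""
    acc = [0.0] * num_rows
    for j, xj in enumerate(x):
        for t in column_supports[j]:
            acc[t] += xj
    return [(v > 0) - (v < 0) for v in acc]


cols = [{0, 1, 2}, {2, 3, 4}, {0, 4, 5}]   # toy 6 x 3 matrix, column weight 3
S = [0, 1, 2]                              # support of x
R = sorted(overlap_rows(cols, S, 0))
print(R)                                   # [0, 2]
print(restrict(cols, R, S))                # [[1, 0, 1], [1, 1, 0]]
y = outcomes(cols, 6, [1.0, 1.0, -1.0])
print(y)                                   # [0, 1, 1, 1, 0, -1]
```

Here the zero outcome in column 0's support occurs at row 0, inside the overlap set, consistent with the proof: row 1 meets only column 0 among the support columns, so its outcome is the sign of $x_0$ and cannot vanish.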
In light of this, a possible direction for improving the current upper bound for universal approximate recovery of real vectors would be to show the existence of -list RUFFs with . This would immediately yield a measurement matrix with rows that could be used for universal -approximate recovery. We show below via a simple probabilistic construction that matrices satisfying the first property in Definition 5 with and exist, but leave open the question of whether rows also suffice for the second property, or whether rows are necessary.
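The probabilistic construction used in the proof below can be simulated for small, purely illustrative parameters. In this sketch each column is an independent uniformly random set of `d` rows, and we count by brute force, over all nonempty subsets of at most `k` columns, the largest number of other columns having more than half of their support inside the union. The parameter values and function names are hypothetical, and a single draw is evidence, not a proof.

```python
import random
from itertools import combinations


def random_constant_weight_matrix(m, n, d, rng):
    """Column supports: each column independently gets a uniformly random
    d-subset of the m rows (entry 1 on that subset, 0 elsewhere)."""
    return [set(rng.sample(range(m), d)) for _ in range(n)]


def worst_confusable_count(column_supports, k, d):
    """Max over nonempty subsets S of at most k columns of the number of other
    columns with more than d/2 of their support inside the union over S."""
    n = len(column_supports)
    worst = 0
    for size in range(1, k + 1):
        for subset in combinations(range(n), size):
            covered = set().union(*(column_supports[i] for i in subset))
            count = sum(1 for j in range(n)
                        if j not in subset and len(column_supports[j] & covered) > d / 2)
            worst = max(worst, count)
    return worst


rng = random.Random(1)
M = random_constant_weight_matrix(m=60, n=20, d=8, rng=rng)
print(worst_confusable_count(M, k=2, d=8))
```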
There exist matrices satisfying for all columns and for every subset of columns , , we have , under the assumptions that , , and .
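We construct the matrix column by column: for each column we draw, independently of the other columns, a set of size uniformly at random among all such sets, and we set an entry of that column to 1 if its row index lies in the chosen set and to 0 otherwise. It then suffices to show that, with positive probability, there does not exist any subset of at most columns with ; equivalently, the probability that such a subset exists is less than 1.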
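For intuition about the tail estimate used in the rest of this proof, the sketch below compares, for hypothetical sizes, the exact probability that more than half of a column's $d$ nonzero entries fall in a fixed row set $T$ of size $t$ with the multiplicative Chernoff bound. The count $X = |S \cap T|$ for a uniformly random $d$-subset $S$ is hypergeometric with mean $dt/m$, and by a classical result of Hoeffding, Chernoff bounds remain valid for sampling without replacement. All names and values are illustrative.

```python
import math


def exact_tail(m, t, d, a):
    """P(X >= a) for X = |S ∩ T|, where S is a uniformly random d-subset of
    m rows and T is a fixed set of t rows (X is hypergeometric)."""
    favourable = sum(
        math.comb(t, x) * math.comb(m - t, d - x)
        for x in range(a, min(d, t) + 1)
    )
    return favourable / math.comb(m, d)


def chernoff_bound(mu, a):
    """Multiplicative Chernoff bound P(X >= (1 + delta) mu) <=
    (e^delta / (1 + delta)^(1 + delta))^mu, written with a = (1 + delta) mu."""
    delta = a / mu - 1
    if delta <= 0:
        return 1.0  # the bound is only informative above the mean
    return math.exp(mu * (delta - (1 + delta) * math.log1p(delta)))


m, d, k = 200, 8, 3   # hypothetical: m rows, column weight d, k columns in the subset
t = k * d             # largest possible size of the union T of k column supports
mu = d * t / m        # E[X] by linearity of expectation: each row of T is hit w.p. d/m
a = d // 2 + 1        # "more than half" of the d nonzero entries lie in T
print(exact_tail(m, t, d, a), chernoff_bound(mu, a))
```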
Recall that by definition,
or in other words, is the set of “confusable” columns for the subset of columns of . The event that we wish to avoid is that there exists a set of “bad” columns for which the union of the supports of a subset of of those columns has a large intersection with the supports of all of the remaining columns. Since the columns of are all chosen independently, we have
Now we can assume we have a fixed set of columns and another fixed column , and we want to upper bound the probability that more than half the nonzero entries of lie in . Let be the binary random variable that is equal to 1 if and only if the th entry of is nonzero and lies in . Since every column has weight exactly , , thus for any Then by linearity of expectation we conclude that
While the are not independent, if some then it is less likely that a different random variable as there are fewer coordinates remaining in . Since the are negatively associated, as is always the case for indicators of membership in a uniformly random set of fixed size, we can apply a Chernoff bound: