# Derandomizing compressed sensing with combinatorial design

Compressed sensing is the art of reconstructing structured n-dimensional vectors from substantially fewer measurements than naively anticipated. A plethora of analytic reconstruction guarantees support this credo. The strongest among them are based on deep results from large-dimensional probability theory that require a considerable amount of randomness in the measurement design. Here, we demonstrate that derandomization techniques allow for considerably reducing the amount of randomness required by such proof strategies. More precisely, we establish uniform s-sparse reconstruction guarantees for C s log(n) measurements that are chosen independently from strength-four orthogonal arrays and maximal sets of mutually unbiased bases, respectively. These are highly structured families of C̃ n^2 vectors that imitate signed Bernoulli and standard Gaussian vectors in a (partially) derandomized fashion.


## I Introduction and main results

### I-A Motivation

Compressed sensing is the art of reconstructing structured signals from substantially fewer measurements than would naively be required for standard techniques like least squares. Although not entirely novel, rigorous treatments of this observation [1, 2] spurred considerable scientific attention from 2006 on, see e.g. [3, 4] and references therein. While deterministic results do exist, the strongest theoretical convergence guarantees still rely on randomness. Broadly, these can be grouped into two families:

1. Generic measurements, such as independent Gaussian or Bernoulli vectors. Such an abundance of randomness allows for establishing very strong results by following comparatively simple and instructive proof techniques. The downside is that concrete implementations do require a lot of randomness. In fact, they might be too random to be useful for certain applications.

2. Structured measurements, such as random rows of a Fourier or Hadamard matrix. In contrast to generic measurements, these feature a lot of structure that is geared towards applications. Moreover, sampling random rows from a fixed matrix requires very little randomness: log2(n) random bits suffice to sample a random DFT row, while an i.i.d. signed Bernoulli vector consumes n bits of randomness. Structure and comparatively little randomness have a downside, however. Theoretical convergence guarantees tend to be weaker than their generic counterparts. It should also not come as a surprise that the necessary proof techniques become considerably more involved.

Typically, results of type 1) precede results of type 2). Phase retrieval via PhaseLift is a concrete example of such a development: generic convergence guarantees [5, 6] preceded (partially) de-randomized results [7, 8]. Compressed sensing is special in this regard. The two seminal works [1, 2] from 2006 provided both types of results almost simultaneously. This had an interesting consequence: despite considerable effort, to this day there still seems to be a gap between the two proof techniques.

Here, we try to close this gap by applying a method that is very well established in theoretical computer science: partial derandomization. We start with a proof technique of type 1) and considerably limit the amount of randomness required for it to work. While doing so, we keep careful track of the “amount of randomness” that is still necessary. Finally, we replace the original (generic) random measurements with pseudo-random ones that mimic them in a sufficiently accurate fashion. Our results highlight that this technique almost allows for bridging the gap between existing proof techniques for generic and structured measurements: the results are still strong, but require slightly more randomness than choosing vectors uniformly from a bounded orthogonal system, such as Fourier or Hadamard vectors.

There is also a didactic angle to this work: within the realm of signal processing, partial-derandomization techniques have been successfully applied to matrix reconstruction [8, 9] and phase retrieval via PhaseLift [7, 10, 11]. Although similar in spirit, the more involved nature of these problems may obscure the key ideas, intuition and tricks behind such an approach. Moreover, the same techniques have not yet been applied to the original problem of compressed sensing. Here, we fill this gap and, in doing so, provide an introduction to partial derandomization techniques by example. To preserve this didactic angle, we try to keep the presentation as simple and self-contained as possible.

Finally, one may argue that compressed sensing has not fully lived up to the high expectations of the community yet, see e.g. [12]. Arguably, one of the most glaring problems for applications is the requirement of choosing individual measurements at random (existing deterministic constructions, see e.g. [13], do not (yet) yield comparable statements). While we are not able to fully overcome this drawback here, the methods described in this work do limit the amount of randomness required to generate individual structured measurements. We believe that this may help to reduce the discrepancy between “what can be proved” and “what can be done” in a variety of concrete applications.

### I-B Preliminaries on compressed sensing

Compressed sensing aims at reconstructing s-sparse vectors x ∈ C^n from m linear measurements:

 y = Ax ∈ C^m.

Since m < n, the matrix A is singular and there are infinitely many solutions to this equation. A convex penalizing function is used to promote sparsity among these solutions. Typically, this penalizing function is the ℓ1-norm ∥z∥ℓ1 = Σ_{i=1}^n |z_i|:

 minimize_{z ∈ C^n} ∥z∥ℓ1 subject to Az = y. (1)
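For intuition, the real-valued analogue of program (1) can be recast as a linear program and solved with off-the-shelf software. The following sketch is illustrative and not from the paper; the dimensions and the `basis_pursuit` helper are our own choices:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||z||_1 subject to Az = y (real-valued) as a linear program.

    Split z = p - q with p, q >= 0, so that ||z||_1 = sum(p) + sum(q) at the optimum.
    """
    m, n = A.shape
    c = np.ones(2 * n)                        # objective: sum(p) + sum(q)
    A_eq = np.hstack([A, -A])                 # encodes A(p - q) = y
    res = linprog(c, A_eq=A_eq, b_eq=y,
                  bounds=[(0, None)] * (2 * n), method="highs")
    return res.x[:n] - res.x[n:]

rng = np.random.default_rng(0)
n, m, s = 60, 40, 3                           # ambient dimension, measurements, sparsity
x = np.zeros(n)
x[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
A = rng.standard_normal((m, n))               # generic Gaussian measurement matrix
z = basis_pursuit(A, A @ x)
print(np.linalg.norm(z - x))                  # small: exact recovery with high probability
```

The convex program finds the feasible point of minimal ℓ1-norm; with generic Gaussian measurements and this many samples, that point coincides with the planted sparse vector with overwhelming probability.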

Mathematical proofs for convergence to the correct solution have been established for different measurement matrices A ∈ C^{m×n}. By and large, they require randomness in the sense that each row of A is an independent copy of a random vector a ∈ C^n. Prominent examples include

1. standard complex Gaussian measurements: a_g ∼ CN(0, I_n),

2. signed Bernoulli (Rademacher) measurements: [a_sb]_i = ε_i with independent uniform signs ε_i ∈ {±1},

3. random rows of a (re-scaled) DFT matrix,

4. for n = 2^k: random rows of a (re-scaled) Hadamard matrix.

A rigorous treatment of all these cases can be found in Ref. [3]. Here, and throughout this work, C denotes an absolute constant whose exact value depends on the context, but is always independent of the problem parameters n and s. It is instructive to compare the amount of randomness that is required to generate one instance of the random vectors in question. A random signed Bernoulli vector requires n random bits (one for each coordinate), while a total of log2(n) random bits suffice to select a random row of a Hadamard matrix. A comparison between complex standard Gaussian vectors and random Fourier vectors indicates a similar discrepancy. In summary: highly structured random vectors, like (3) and (4), require exponentially fewer random bits to generate than generic random vectors, like (1) and (2). Importantly, this transition from generic measurements to highly structured ones comes at a price. The number of measurements required in cases (3) and (4) scales poly-logarithmically in the dimension n. More sophisticated approaches allow for converting this offset into a polylogarithmic scaling in the sparsity s rather than n [14, 15]. Another, arguably even higher, price is hidden in the proof techniques behind these results: they are considerably more involved.

The following two subsections are devoted to introducing formalisms that allow for partially de-randomizing signed Bernoulli vectors and complex standard Gaussian vectors, respectively.

### I-C Partially de-randomizing signed Bernoulli vectors

Throughout this work, we endow C^n with the standard inner product ⟨x, y⟩ = Σ_{i=1}^n x̄_i y_i and denote the associated (Euclidean) norm by ∥·∥ℓ2. Let a_sb ∈ {±1}^n be a signed Bernoulli vector with coefficients [a_sb]_i = ε_i chosen independently at random (Rademacher random variables). Then,

 E[ε_i ε̄_j] = E[ε_i ε_j] = δ_ij, (2)

which is equivalent to demanding

 E[⟨y, a_sb⟩⟨a_sb, z⟩] = ⟨y, z⟩ for all y, z ∈ C^n. (3)

Independent sign entries are sufficient, but not necessary, for this feature. Indeed, suppose that n is a power of two. Then the rows h_1, …, h_n of a Sylvester Hadamard matrix H ∈ {±1}^{n×n} correspond to a particular subset of sign vectors. Let a_h be the random vector arising from choosing a Hadamard row uniformly at random. Then,

 E[⟨y, a_h⟩⟨a_h, z⟩] = (1/n) Σ_{i=1}^n ⟨y, h_i⟩⟨h_i, z⟩ = ⟨y, z⟩ for all y, z,

because the Hadamard rows h_i are proportional to an orthonormal basis and have norm √n. This in turn implies that the coordinates of a randomly selected Hadamard matrix row obey (2), despite not being independent instances of random signs. This feature is called pairwise independence and naturally generalizes to higher orders k:
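This observation is easy to check numerically. A small illustrative sketch (not part of the paper) that builds a Sylvester Hadamard matrix and verifies Eq. (2) for a uniformly random row:

```python
import numpy as np

def sylvester_hadamard(k):
    """Sylvester construction: H_{2n} = [[H_n, H_n], [H_n, -H_n]], starting from H_1 = (1)."""
    H = np.ones((1, 1))
    for _ in range(k):
        H = np.block([[H, H], [H, -H]])
    return H

H = sylvester_hadamard(3)                      # n = 8 sign vectors of length 8
n = H.shape[0]
# Averaging over a uniformly random row: E[a_i a_j] = (1/n) * (H^T H)_{ij}.
second_moments = (H.T @ H) / n
print(np.allclose(second_moments, np.eye(n)))  # True: Eq. (2) holds, pairwise independence
```

Since the rows of H are orthogonal with norm √n, the empirical second moments collapse to the identity, exactly as Eq. (2) demands.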

###### Definition 1 (k-wise independence).

Fix k ∈ N and let ε_1, …, ε_k denote independent instances of a signed Bernoulli random variable. We call a random sign vector a ∈ {±1}^n k-wise independent if its components obey

 E[a_{i_1} ⋯ a_{i_k}] = E[ε_1 ⋯ ε_k]

for all k-tuples of distinct indices i_1, …, i_k ∈ [n].

Explicit constructions of k-wise independent sign vectors are known for any k and n. In this work we focus on particular constructions that rely on generalizing the following instructive example. Fix n = 3 and consider the columns of the following matrix:

 ⎛ 1  1 −1 −1 ⎞
 ⎜ 1 −1  1 −1 ⎟
 ⎝ 1 −1 −1  1 ⎠

The first two rows summarize all possible length-two combinations of ±1. The coefficients of the third row correspond to their entry-wise product. Hence, it is completely characterized by the first two. The three row vectors are not mutually independent. Nonetheless, each subset of two rows does mimic independent behavior: all possible length-two combinations of ±1 occur exactly once. This ensures that a randomly selected column is pairwise independent in the sense that its coefficients obey Eq. (2).

This simple example may readily be generalized. A binary orthogonal array of strength k is a sign matrix A ∈ {±1}^{n×N} such that every selection of k rows, read along the N columns, contains all elements of {±1}^k an equal number of times.

Several different explicit constructions of orthogonal arrays are known. A simple counting argument reveals that the number of columns must obey N ≳ n^{⌊k/2⌋}. For fixed strength k, this number scales polynomially in n – a potentially exponential improvement over the “full” array that lists all 2^n elements of {±1}^n. In turn, selecting a random column of A only requires log2(N) random bits and produces a random vector that is k-wise independent according to Definition 1. We refer to Sec. IV and Ref. [16] for a more thorough treatment of this concept.
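The defining property can be verified directly in code. The following illustrative sketch checks that the small example above is an orthogonal array of strength two, but not three, and that a uniformly random column therefore obeys Eq. (2):

```python
import numpy as np
from itertools import combinations, product

# The 3 x 4 example array: rows are coordinates, columns are the candidate sign vectors.
oa = np.array([[1, 1, -1, -1],
               [1, -1, 1, -1],
               [1, -1, -1, 1]])

def has_strength(array, k):
    """Check: every selection of k rows shows every sign pattern in {+-1}^k equally often."""
    n, N = array.shape
    for rows in combinations(range(n), k):
        counts = {p: 0 for p in product((-1, 1), repeat=k)}
        for col in array[list(rows)].T:     # walk along the N columns
            counts[tuple(col)] += 1
        if len(set(counts.values())) != 1:  # not perfectly balanced
            return False
    return True

print(has_strength(oa, 2))   # True: strength two
print(has_strength(oa, 3))   # False: the three rows are not mutually independent
# Pairwise independence of a random column: E[a_i a_j] = delta_ij, i.e. Eq. (2).
print(np.allclose(oa @ oa.T / oa.shape[1], np.eye(3)))  # True
```

The strength-3 check fails for a good reason: four columns cannot exhaust the eight patterns in {±1}³, which illustrates why larger arrays are needed for higher strength.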

### I-D Partially derandomizing complex standard Gaussian vectors

Let us now discuss another general-purpose tool for (partial) de-randomization. Concentration of measure implies that n-dimensional standard complex Gaussian vectors concentrate sharply around the complex sphere of radius √n. Hence, they behave very similarly to vectors a_s chosen uniformly from this sphere. Such random vectors obey the following formula for any k ∈ N and any z ∈ C^n:

 E[|⟨z, a_s⟩|^{2k}] = n^k ∫_{w∈S^{n−1}} |⟨z, w⟩|^{2k} dw = n^k (n+k−1 choose k)^{−1} ∥z∥ℓ2^{2k}.

Here, dw denotes the uniform measure on the complex unit sphere S^{n−1} ⊂ C^n. This formula characterizes the even moments of this uniform distribution (for comparison, a complex standard Gaussian vector obeys E[|⟨z, a_g⟩|^{2k}] = k! ∥z∥ℓ2^{2k} instead). The concept of k-designs [17] uses this moment formula as a starting point for partial de-randomization. Roughly speaking, a k-design is a finite set of length-√n vectors such that the uniform distribution over these vectors reproduces the uniform measure on the sphere up to its 2k-th moments. More precisely:

###### Definition 2.

A set of vectors a^{(1)}, …, a^{(N)} ∈ C^n with length ∥a^{(t)}∥ℓ2 = √n is called a (complex projective) k-design if a randomly chosen element a^{(t)} obeys, for any 1 ≤ k′ ≤ k,

 E[|⟨z, a^{(t)}⟩|^{2k′}] = n^{k′} (n+k′−1 choose k′)^{−1} ∥z∥ℓ2^{2k′} for all z ∈ C^n.

(Spherical) k-designs were originally developed as cubature formulas for the real-valued unit sphere [17]. The concept has since been extended to other sets. A generalization to the complex projective space gives rise to Definition 2. Complex projective k-designs are known to exist for any k and any dimension n, see e.g. [18, 19, 20]. However, explicit constructions for k ≥ 3 are notoriously difficult to find. In contrast, several explicit families of 2-designs have been identified. Here, we will focus on one such family. Two orthonormal bases b_1, …, b_n and c_1, …, c_n of C^n are called mutually unbiased if

 |⟨b_i, c_j⟩|² = 1/n for all i, j ∈ [n] = {1, …, n}. (4)

A prominent example of such a basis pair is the standard basis together with the Fourier, or Hadamard, basis, respectively. One can show that at most n + 1 different orthonormal bases can have this property in a pairwise fashion [21, Theorem 3.5]. Such a set of bases is called a maximal set of mutually unbiased bases (MMUB). For instance, in C² the standard basis together with

 (1/√2)(1, 1)ᵀ, (1/√2)(1, −1)ᵀ, (1/√2)(1, i)ᵀ, (1/√2)(1, −i)ᵀ

forms a MMUB. Importantly, MMUBs are always (proportional to) 2-designs [22]. Explicit constructions exist for any prime power dimension n, and one can ensure that the standard basis is always one of them. Here we point out one construction that is particularly simple if the dimension n is an (odd) prime [23]: the standard basis vectors together with all vectors whose entry-wise coefficients correspond to

 [b_{α,λ}]_k = (1/√n) ω_n^{(k+α)³ + λ(k+α)} (5)

form a MMUB. Here ω_n = e^{2πi/n} is an n-th root of unity. The parameter α singles out one of the n different bases, while λ labels the corresponding basis vectors. Excluding the standard basis, this set of vectors corresponds to all time-frequency shifts of a discrete Alltop sequence [24].
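Both the unbiasedness condition (4) and the 2-design moment formula of Definition 2 can be checked numerically for this construction. An illustrative sketch for the prime n = 7 (variable names are our own):

```python
import numpy as np
from itertools import combinations

n = 7                                        # an (odd) prime >= 5
omega = np.exp(2j * np.pi / n)
k = np.arange(n)

# Alltop bases (5): [b_{alpha,lam}]_k = omega^{(k+alpha)^3 + lam*(k+alpha)} / sqrt(n)
bases = [np.stack([omega ** (((k + a) ** 3 + lam * (k + a)) % n) / np.sqrt(n)
                   for lam in range(n)]) for a in range(n)]
bases.append(np.eye(n, dtype=complex))       # add the standard basis -> n + 1 bases

# Each basis is orthonormal ...
assert all(np.allclose(B @ B.conj().T, np.eye(n)) for B in bases)
# ... and every pair of bases is mutually unbiased: |<b_i, c_j>|^2 = 1/n, Eq. (4).
for B, C in combinations(bases, 2):
    assert np.allclose(np.abs(B @ C.conj().T) ** 2, 1 / n)

# 2-design moment formula (Definition 2, k = 2) for vectors rescaled to length sqrt(n).
rng = np.random.default_rng(1)
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)
vecs = np.sqrt(n) * np.vstack(bases)         # n(n+1) vectors of length sqrt(n)
lhs = np.mean(np.abs(vecs.conj() @ z) ** 4)
rhs = n ** 2 / ((n + 1) * n / 2) * np.linalg.norm(z) ** 4
print(np.isclose(lhs, rhs))                  # True
```

The final check confirms, for a random test vector, that the full MMUB reproduces the fourth moment of the uniform sphere measure exactly, i.e. it is a (rescaled) complex projective 2-design.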

### I-E Main results

###### Theorem 1 (CS from orthogonal array measurements).

Suppose that a matrix A ∈ C^{m×n} contains m ≥ C s log(n) rows that are chosen independently from an orthogonal array with strength four. Then, with probability at least 1 − e^{−cm}, any s-sparse x ∈ C^n can be recovered from y = Ax by means of algorithm (1).

###### Theorem 2 (CS from time-frequency shifted Alltop sequences).

Let n be prime and suppose that A ∈ C^{m×n} contains m ≥ C s log(n) rows that correspond to random time-frequency shifts of the Alltop sequence (5) in dimension n. Then, with probability at least 1 − e^{−cm}, any s-sparse x ∈ C^n can be recovered from y = Ax by means of algorithm (1).

This result actually generalizes to measurements that are sampled from a maximal set of mutually unbiased bases (excluding the standard basis). Time-frequency shifts of the Alltop sequence are one concrete construction that applies to prime dimensions only.

Note that the cardinality of the set of all Alltop shifts is n². Hence, 2 log2(n) random bits suffice to select a random time-frequency shift. In turn, a total of

 2 log2(n) · m ≃ 2C s log2(n)² (6)

random bits are required for sampling a complete measurement matrix A. This number is exponentially smaller than the number of random bits required to generate a matrix with independent complex Gaussian entries. A similar comparison holds true for random signed Bernoulli matrices and columns sampled from a strength-4 orthogonal array.

Highly structured families of vectors – such as rows of a Fourier, or Hadamard, matrix – require even less randomness to sample from: only log2(n) bits are required to select such a row uniformly at random. However, existing convergence guarantees are weaker than the main results presented here. They require a poly-logarithmically larger number of random measurements to establish comparable results, so the total number of random bits required for such a procedure acquires additional logarithmic factors. Eq. (6) still establishes a logarithmic improvement in terms of sparsity.

The recovery guarantees in Theorem 1 and 2 can be readily extended to ensure stability with respect to noise corruption in the measurements and robustness with respect to violations of the model assumption of sparsity. We refer to Sec. III for details.

We also emphasize that there are results in the literature that establish compressed sensing guarantees with comparable, or even less, randomness. Obviously, deterministic constructions are the extreme case in this regard. Early results suffer from a “quadratic bottleneck”: the number of measurements must scale quadratically in the sparsity, m ≥ C s². Although this obstacle was overcome, existing progress is still comparatively mild. Refs. [25, 26, 27] establish deterministic convergence guarantees for m ≥ C s^{2−ε}, where ε > 0 is a (very) small constant.

Closer in spirit to this work is Ref. [28]. There, the authors employ the Legendre symbol – which is well known for its pseudorandom behavior – to partially derandomize a signed Bernoulli matrix. In doing so, they establish uniform s-sparse recovery from measurements that require considerably fewer random bits to generate. Compared to the main results presented here, this result gets by with less randomness, but requires more measurements. The proof technique is also very different.

To this date, the strongest de-randomized reconstruction guarantees hail from a close connection between s-sparse recovery and Johnson-Lindenstrauss embeddings [29, 30]. These have a wide range of applications in modern data science. Kane and Nelson [31] established a very strong partial de-randomization for such embeddings. This result may be used to establish uniform s-sparse recovery for measurements that require comparatively few random bits. This result surpasses the main results presented here in both sampling rate and randomness required.

However, this strong result follows from “reducing” the problem of s-sparse recovery to a (seemingly) very different problem: finding Johnson-Lindenstrauss embeddings. Such a reduction typically does not preserve problem-specific structure. In contrast, the approach presented here addresses the problem of sparse recovery directly and relies on tools from signal processing. In doing so, we maintain structural properties that are common in several applications of s-sparse recovery. Orthogonal array measurements, for instance, have ±1-entries. This is well-suited for the single pixel camera [32]. Alltop sequence constructions, on the other hand, have successfully been applied to stylized radar problems [33]. Both types of measurements also have the property that every entry has unit modulus. This is an important feature for applications in CDMA [34]. Having pointed out these high-level connections, we want to emphasize that careful, problem-specific adaptations may be required to rigorously exploit them. The framework developed here may serve as a guideline on how to achieve this goal in concrete scenarios.

## II Proofs

### II-A Textbook-worthy proof for real-valued compressed sensing with Gaussian measurements

This section is devoted to summarizing an elegant argument that is originally due to Rudelson and Vershynin [14], see also [35, 36, 37] for arguments that are similar in spirit. This argument only applies to s-sparse recovery of real-valued signals. We will generalize a similar idea to the complex case later on.

In this work we are concerned with uniform reconstruction guarantees: with high probability, a single realization of the measurement matrix A allows for reconstructing any s-sparse vector by means of ℓ1-regularization (1). A necessary pre-requisite for uniform recovery is the demand that no s-sparse vector is contained in the kernel, or nullspace, of A. This condition is captured by the nullspace property (NSP). Define

 T_s = {z ∈ S^{n−1} : ∥z∥ℓ1 ≥ 2σ_s(z)} ⊂ S^{n−1}, (7)

where σ_s(z) is the approximation error (measured in ℓ1-norm) one incurs when approximating z with a s-sparse vector. A matrix A obeys the NSP of order s if

 inf_{z∈T_s} ∥Az∥ℓ2 > 0. (8)

The set T_s is a subset of the unit sphere that contains all normalized s-sparse vectors. This justifies the informal definition of the NSP: no s-sparse vector is an element of the nullspace of A. Importantly, the NSP is not only necessary, but also sufficient for uniform recovery, see e.g. [3, Theorem 4.5]. Hence, universal recovery of s-sparse signals readily follows from establishing Rel. (8). The nullspace property and its relation to s-sparse recovery has long been somewhat folklore. We refer to Ref. [3] for a discussion of its origin.

The following powerful statement allows for exploiting generic randomness in order to establish nullspace properties. It is originally due to Gordon [38], but we utilize a more modern reformulation, see [3, Theorem 9.21].

###### Theorem 3 (Gordon’s escape through a mesh).

Let A ∈ R^{m×n} be a real-valued standard Gaussian matrix and let E be a subset of the real-valued unit sphere. Define the Gaussian width ℓ(E) = E sup_{z∈E} ⟨g, z⟩, where the expectation is over realizations of a standard Gaussian random vector g ∼ N(0, I_n). Then, for t > 0 the bound

 inf_{z∈E} ∥Az∥ℓ2 ≥ √(m−1) − ℓ(E) − t

is true with probability at least 1 − e^{−t²/2}.

This is a deep statement that connects random matrix theory to geometry: the Gaussian width ℓ(E) is a rough measure of the size of the set E. Setting E = T_s allows us to conclude that a matrix encompassing m independent Gaussian measurements is very likely to obey the s-NSP (8), provided that √(m−1) exceeds ℓ(T_s). In order to derive an upper bound on ℓ(T_s), we may use the following inclusion

 T_s ⊂ 2 conv(Σ^n_s),

see e.g. [35, Lemma 3] and [14, Lemma 4.5]. Here, Σ^n_s denotes the set of all s-sparse vectors with unit length and conv(·) denotes the convex hull. In turn,

 ℓ(T_s) ≤ 2 E sup_{z∈conv(Σ^n_s)} ⟨a_g, z⟩ = 2 E sup_{z∈Σ^n_s} ⟨a_g, z⟩, (9)

because the linear function z ↦ ⟨a_g, z⟩ achieves its maximum value at the boundary of the convex set conv(Σ^n_s). The right hand side of (9) is the expected supremum of a Gaussian process indexed by Σ^n_s. Dudley’s inequality [39], see also [3, Theorem 8.23], states

 E sup_{z∈Σ^n_s} ⟨a_g, z⟩ ≤ 4√2 ∫₀¹ √(ln N(Σ^n_s, ∥·∥ℓ2, u)) du,

where N(Σ^n_s, ∥·∥ℓ2, u) are covering numbers associated with the set Σ^n_s. They are defined as the smallest cardinality of a u-covering net with respect to the Euclidean distance. A volumetric counting argument yields N(Σ^n_s, ∥·∥ℓ2, u) ≤ (n choose s)(1 + 2/u)^s, and Dudley’s inequality therefore implies

 ℓ(T_s) ≤ c √(s log(en/s)),

where c is an absolute constant. This readily yields the following assertion.

###### Theorem 4 (NSP for Gaussian measurements).

A number of m ≥ C s log(en/s) independent real-valued Gaussian measurements obeys the (real-valued) s-NSP with high probability (at least 1 − e^{−cm}).

This argument is exemplary for generic proof techniques: strong results from probability theory allow for establishing close-to-optimal results in a relatively succinct fashion.
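The central geometric quantity is easy to probe empirically: for a standard Gaussian vector g, the supremum of ⟨g, z⟩ over unit-norm s-sparse vectors z equals the Euclidean norm of the s largest entries of g in modulus. An illustrative Monte Carlo sketch (the constants are our own choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
n, s, trials = 100, 5, 2000

# sup over s-sparse unit-norm z of <g, z> = l2-norm of the s largest |g_i|.
g = rng.standard_normal((trials, n))
top_s = np.sort(np.abs(g), axis=1)[:, -s:]           # s largest magnitudes per draw
width_estimate = np.mean(np.linalg.norm(top_s, axis=1))

bound = 2 * np.sqrt(2 * s * np.log(np.e * n / s))    # c * sqrt(s log(en/s)) with c = 2*sqrt(2)
print(width_estimate, bound)                         # the estimate sits below the bound
```

The empirical width stays comfortably below the c√(s log(en/s)) ceiling, consistent with the covering-number argument above.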

### II-B Extending the scope to subgaussian measurements

The extended arguments presented here are largely due to Dirksen, Lecué and Rauhut [36]. Again, we will focus on the real-valued case.

Gordon’s escape through a mesh is only valid for Gaussian random matrices A. Novel methods are required to extend this proof technique beyond this idealized case. Comparatively recently, Mendelson provided one by generalizing Gordon’s escape through a mesh [40, 41].

###### Theorem 5 (Mendelson’s small ball method, Tropp’s formulation [37]).

Suppose that A ∈ R^{m×n} is a random matrix whose rows correspond to independent realizations a_1, …, a_m of a random vector a ∈ R^n. Fix a set E ⊂ S^{n−1}, and define

 Q_ξ(a, E) = inf_{z∈E} Pr[|⟨z, a⟩| ≥ ξ] for ξ > 0,
 W_m(a, E) = E sup_{z∈E} ⟨z, h⟩, where h = (1/√m) Σ_{i=1}^m ε_i a_i ∈ R^n

is the empirical average over independent copies of a, weighted by uniformly random signs ε_1, …, ε_m. Then, for any t > 0,

 inf_{z∈E} ∥Az∥ℓ2 ≥ ξ√m Q_{2ξ}(a, E) − 2W_m(a, E) − ξt

with probability at least 1 − e^{−t²/2}.

It is worthwhile to point out that for real-valued Gaussian vectors this result recovers Theorem 3 up to constants. Fix ξ > 0 of appropriate (constant) size. Then, the Gaussian distribution of ⟨z, a_g⟩ ensures that Q_{2ξ}(a_g, E) is constant. Moreover, h is itself a standard Gaussian vector, so W_m(a_g, E) reduces to the usual Gaussian width ℓ(E).

Mendelson’s small ball method can be used to establish the nullspace property for independent random measurements that exhibit subgaussian behavior:

 E exp(θ⟨y, a⟩) ≤ exp((θ²/2)∥y∥ℓ2²) for all y ∈ R^n, θ > 0. (10)

Signed Bernoulli vectors are a concrete example: each coefficient [a_sb]_i = ε_i is an independent instance of a Rademacher random variable. Signed Bernoulli vectors obey

 E[⟨z, a_sb⟩²] = Σ_{i,j=1}^n E[ε_i ε_j] z_i z_j = ∥z∥ℓ2² for all z ∈ R^n. (11)

Direct computation also reveals

 E[⟨z, a_sb⟩⁴] = Σ_{i,j,k,l=1}^n E[ε_i ε_j ε_k ε_l] z_i z_j z_k z_l
  = Σ_{i=1}^n E[ε_i⁴] z_i⁴ + 3 Σ_{i≠j} E[ε_i²] E[ε_j²] z_i² z_j²
  = Σ_{i=1}^n z_i⁴ + 3 Σ_{i≠j} z_i² z_j² = 3∥z∥ℓ2⁴ − 2∥z∥ℓ4⁴
  ≤ 3∥z∥ℓ2⁴, (12)

because there are 3 possible pairings of four indices.
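The closed form in (12) can be confirmed exactly by enumerating all sign vectors in a small dimension. An illustrative sketch:

```python
import numpy as np
from itertools import product

n = 8
rng = np.random.default_rng(2)
z = rng.standard_normal(n)

# Exact expectation over all 2^n equally likely sign vectors.
signs = np.array(list(product((-1.0, 1.0), repeat=n)))
fourth_moment = np.mean((signs @ z) ** 4)

l2, l4 = np.linalg.norm(z, 2), np.linalg.norm(z, 4)
closed_form = 3 * l2 ** 4 - 2 * l4 ** 4
print(np.isclose(fourth_moment, closed_form))  # True, and both are <= 3 * l2**4
```

Since the expectation is taken exactly over all 2⁸ sign patterns, the agreement with 3∥z∥ℓ2⁴ − 2∥z∥ℓ4⁴ is up to floating-point precision, not a sampling artifact.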

Now, set E = T_s.

An application of the Paley-Zygmund inequality then allows for bounding the parameter Q_{2ξ} in Mendelson’s small ball method from below:

 Q_{2ξ}(a_sb, T_s) ≥ inf_{z∈S^{n−1}} Pr[|⟨z, a_sb⟩| ≥ 2ξ] ≥ inf_{z∈S^{n−1}} (1 − 4ξ²)² E[⟨z, a_sb⟩²]² / E[⟨z, a_sb⟩⁴] ≥ (1 − 4ξ²)²/3.

This lower bound is constant for any ξ < 1/2.

Next, note that z ↦ ⟨z, h⟩ is a stochastic process that is indexed by z ∈ T_s. This process is centered (E[⟨z, h⟩] = 0) and Eq. (10) implies that it is also subgaussian. Moreover, E[⟨z, h⟩²] = ∥z∥ℓ2² readily follows from (11). Unlike Gordon’s escape through a mesh, Dudley’s inequality does remain valid for such stochastic processes with subgaussian marginals. We can now repeat the width analysis from the previous section to obtain

 W_m(a_sb, T_s) ≤ 2 E sup_{z∈Σ^n_s} ⟨z, h⟩ ≤ c √(s log(en/s)).

Fixing ξ sufficiently small, setting t ∝ √m and inserting these bounds into Theorem 5 yields the following result.

###### Theorem 6 (NSP for signed Bernoulli measurements).

A matrix encompassing m ≥ C s log(en/s) random signed Bernoulli measurements obeys the real-valued s-NSP with probability at least 1 − e^{−cm}.

A similar result remains valid for other classes of independent measurements with subgaussian marginals (10).

### II-C Generalization to complex-valued signals and partial de-randomization

The nullspace property, as well as its connection to uniform s-sparse recovery, readily generalizes to complex-valued s-sparse vectors. A similar extension applies to Mendelson’s small ball method:

###### Theorem 7 (Mendelson’s small ball method for complex vector spaces).

Suppose that the rows of A ∈ C^{m×n} correspond to independent copies of a random vector a ∈ C^n. Fix a set E ⊂ S^{n−1} ⊂ C^n and define

 Q_ξ(a, E) = inf_{z∈E} Pr[|⟨z, a⟩| ≥ ξ] for ξ > 0,
 W_m(a, E) = E sup_{z∈E} |⟨z, h⟩|, where h = (1/√m) Σ_{i=1}^m ε_i a_i.

Then, for any t > 0,

 inf_{z∈E} ∥Az∥ℓ2 ≥ √2 ( (ξ√m/2) Q_{2^{3/2}ξ}(a, E) − 2W_m(a, E) − ξt )

with probability at least 1 − e^{−t²/2}.

Such a generalization was conjectured by Tropp [37], but we are not aware of any rigorous proof in the literature. We provide one in Subsection V-B and believe that such an extension may be of independent interest. This extension allows for generalizing the arguments from the previous subsection to the complex-valued case.

Let us now turn to the main scope of this work: partial de-randomization. Effectively, Mendelson’s small ball method reduces the task of establishing nullspace properties to bounding the two parameters Q_ξ and W_m in an appropriate fashion. A lower bound on the former readily follows from the Paley-Zygmund inequality, provided that the random vector a obeys

 E[|⟨a, z⟩|²] = ∥z∥ℓ2² for all z ∈ C^n (isotropy),
 E[|⟨a, z⟩|⁴] ≤ C₄ ∥z∥ℓ2⁴ (4th moment bound),

where C₄ is a constant:

 Q_{2^{3/2}ξ}(a, T_s) ≥ C₄^{−1} (1 − 8ξ²)² for any ξ > 0. (13)

In contrast, establishing an upper bound on W_m via Dudley’s inequality requires subgaussian marginals (10) (with constants that must not depend on the ambient dimension). This implicitly imposes stringent constraints on all moments simultaneously. An additional assumption allows to considerably weaken these demands:

 max_{1≤k≤n} |⟨e_k, a⟩|² = 1 almost surely (incoherence). (14)

Incoherence has long been identified as a key ingredient for developing s-sparse recovery guarantees. Here, we utilize it to establish an upper bound on W_m that does not rely on subgaussian marginals.

###### Lemma 1.

Let a ∈ C^n be a random vector that is isotropic and incoherent, and let T_s be the complex-valued generalization of the set defined in Eq. (7). Then,

 W_m(a, T_s) ≤ 4√(2 s log(2n)). (15)

This bound only requires an appropriate scaling of the first two moments (isotropy). However, this partial derandomization comes at a price: the bound scales logarithmically in n rather than in n/s. We defer a proof of this statement to Subsection V-A below. Inserting the bounds (13) and (15) into the assertion of Theorem 7 readily yields the main technical result of this work:

###### Theorem 8.

Suppose that a ∈ C^n is a random vector that obeys incoherence, isotropy and the 4th moment bound. Then, choosing

 m ≥ C s log(n)

instances of a uniformly at random results in a measurement matrix A that obeys the complex-valued nullspace property of order s with probability at least 1 − e^{−cm}.

In complete analogy to the real-valued case, the complex nullspace property ensures uniform recovery of s-sparse vectors from linear measurements of the form y = Ax via algorithm (1).

### II-D Recovery guarantee for strength-four orthogonal arrays

Suppose that a_oa ∈ {±1}^n is chosen uniformly from an orthogonal array with strength 4. By definition,

 ∥a_oa∥ℓ∞ = |±1| = 1,

which establishes incoherence. Moreover, the components ε_i = [a_oa]_i obey E[ε_i ε_j] = δ_ij, because 4-wise independence necessarily implies 2-wise independence. Isotropy readily follows:

 E[|⟨z, a_oa⟩|²] = Σ_{i,j} E[ε_i ε_j] z̄_i z_j = ⟨z, z⟩ for all z ∈ C^n.

Finally, 4-wise independence suffices to establish the 4th moment bound. By assumption, E[ε_i ε_j ε_k ε_l] vanishes unless the four indices pair up, and E[ε_i²] = E[ε_i⁴] = 1. We may thus infer

 E[|⟨z, a_oa⟩|⁴] = Σ_{i,j,k,l=1}^n E[ε_i ε_j ε_k ε_l] z̄_i z_j z̄_k z_l
  = Σ_{i=1}^n E[ε_i⁴] |z_i|⁴ + Σ_{i≠j} E[ε_i²] E[ε_j²] ( z̄_i² z_j² + 2|z_i|² |z_j|² )
  ≤ ∥z∥ℓ4⁴ + 3∥z∥ℓ2⁴ ≤ 4∥z∥ℓ2⁴.

Therefore a_oa meets all the requirements of Theorem 8 (with C₄ = 4). The first main result, Theorem 1, then readily follows from the fact that the complex nullspace property ensures uniform recovery of all s-sparse signals.

### II-E Recovery guarantee for mutually unbiased bases

Suppose that a_mub ∈ C^n is chosen uniformly from a maximal set of mutually unbiased bases (excluding the standard basis) whose elements are re-normalized to length √n. A random time-frequency shift of the Alltop sequence (5) is a concrete example of such a sampling procedure, provided that the dimension n is an (odd) prime.

The vector a_mub is chosen from a union of bases that are all mutually unbiased with respect to the standard basis, see Eq. (4). Together with super-normalization (∥a_mub∥ℓ2 = √n), this readily establishes incoherence: max_k |⟨e_k, a_mub⟩|² = n · (1/n) = 1 with probability one.

Next, by assumption a_mub is chosen uniformly from a union of n re-scaled orthonormal bases √n b^{(l)}_1, …, √n b^{(l)}_n with l ∈ [n]. Therefore, for any z ∈ C^n,

 E[|⟨a_mub, z⟩|²] = (1/n²) Σ_{l=1}^n Σ_{i=1}^n |√n ⟨b^{(l)}_i, z⟩|² = (1/n) Σ_{l=1}^n ∥z∥ℓ2² = ∥z∥ℓ2²,

which establishes isotropy.

Finally, a maximal set of mutually unbiased bases – including the standard basis, which we denote by e_1, …, e_n – forms a 2-design according to Definition 2. For any z ∈ C^n this property ensures

 E[|⟨a_mub, z⟩|⁴] = Σ_{l=1}^{n+1} Σ_{i=1}^n |⟨b^{(l)}_i, z⟩|⁴ − Σ_{k=1}^n |⟨e_k, z⟩|⁴ = 2∥z∥ℓ2⁴ − ∥z∥ℓ4⁴ ≤ 2∥z∥ℓ2⁴,

which implies the 4th moment bound (with C₄ = 2). In summary, the random vector a_mub meets the requirements of Theorem 8. Theorem 2 then readily follows from the implications of the nullspace property for s-sparse recovery.

## III Extension to noisy measurements

The nullspace property may be generalized to address two imperfections in s-sparse recovery simultaneously: (i) the vector x may only be approximately sparse in the sense that it is well-approximated by a s-sparse vector, (ii) the measurements may be corrupted by additive noise: y = Ax + e with ∥e∥ℓ2 ≤ η.

To state this generalization, we need some additional notation. For z ∈ C^n and s ≤ n, let z_s be the vector that only contains the s largest entries of z in modulus. All other entries are set to zero. Likewise, we write z_s̄ = z − z_s to denote the remainder. In particular, z = z_s + z_s̄. A matrix A obeys the robust nullspace property of order s with parameters ρ ∈ (0, 1) and τ > 0 if

 ∥z_s∥ℓ2 ≤ (ρ/√s) ∥z_s̄∥ℓ1 + τ ∥Az∥ℓ2 for all z ∈ C^n,

see e.g. [3, Definition 4.21]. This extension of the nullspace property is closely related to stable s-sparse recovery from noisy measurements via basis pursuit denoising:

 minimize_{z∈C^n} ∥z∥ℓ1 subject to ∥Az − y∥ℓ2 ≤ η. (16)

Here, $\eta$ denotes an upper bound on the strength of the noise corruption: $\|e\|_{\ell_2} \le \eta$. Indeed, [3, Theorem 4.22] draws the following connection: suppose that $A$ obeys the robust nullspace property with parameters $\rho$ and $\tau$. Then, the solution $z^\sharp$ to (16) is guaranteed to obey

$$\|z^\sharp - x\|_{\ell_2} \le \frac{D_1}{\sqrt{s}}\,\sigma_s(x) + D_2\,\eta, \tag{17}$$

where $\sigma_s(x) = \|\bar{x}_s\|_{\ell_1}$ denotes the best $s$-term approximation error and $D_1, D_2$ only depend on $\rho$ and $\tau$. The first term on the r.h.s. vanishes if $x$ is exactly $s$-sparse and remains small if $x$ is well approximated by an $s$-sparse vector. The second term scales linearly in the noise bound $\eta$ and vanishes in the absence of any noise corruption.
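In the noiseless limit $\eta = 0$, program (16) reduces to basis pursuit, which can be recast as a linear program through the standard split $z = u - v$ with $u, v \ge 0$. The following minimal sketch (generic Gaussian measurements and toy sizes of our choosing, not the paper's setup) illustrates this reformulation; for $\eta > 0$ one would instead call a conic solver:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||z||_1 s.t. Az = y via the LP split z = u - v, u, v >= 0."""
    m, n = A.shape
    c = np.ones(2 * n)            # objective: sum(u) + sum(v) = ||z||_1
    A_eq = np.hstack([A, -A])     # equality constraint A(u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
    u, v = res.x[:n], res.x[n:]
    return u - v

rng = np.random.default_rng(1)
n, m, s = 30, 15, 2
A = rng.standard_normal((m, n)) / np.sqrt(m)   # generic Gaussian measurements
x = np.zeros(n)
x[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
x_hat = basis_pursuit(A, A @ x)
# exact recovery up to solver precision for sufficiently many measurements
```

For these (well-conditioned) toy sizes the $2$-sparse vector is recovered exactly up to solver precision.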

In the previous section, we have established the classical nullspace property for measurements that are chosen independently from a vector distribution that is isotropic, incoherent and obeys a bound on the 4th moments. This argument may readily be extended to establish the robust nullspace property with relatively little extra effort. To this end, define the set

$$T_{\rho,s} = \Bigl\{z \in S^{n-1} : \|z_s\|_{\ell_2} > \tfrac{\rho}{\sqrt{s}}\|\bar{z}_s\|_{\ell_1}\Bigr\} \subseteq S^{n-1}.$$

A moment of thought reveals that the matrix $A$ obeys the robust nullspace property with parameters $\rho$ and $\tau$ if

$$\inf_{z \in T_{\rho,s}} \|Az\|_{\ell_2} \ge \frac{1}{\tau}. \tag{18}$$

What is more, the following inclusion formula is also valid:

$$T_{\rho,s} \subset \frac{3}{\rho}\,\operatorname{conv}\bigl(\Sigma_{s,n}\bigr),$$

see [35, Lemma 3] and [14, Lemma 4.5]. This ensures that the bounds on the parameters in Mendelson’s small ball method generalize in a rather straightforward fashion. Isotropy, incoherence and the 4th moment bound ensure

$$Q_{2\xi}(a, T_{\rho,s}) \ge \frac{(1-2\xi^2)^2}{C_4}, \qquad W_m(a, T_{\rho,s}) \le \frac{12}{\rho}\sqrt{2s\log(2n)}.$$

Now, suppose that $A$ subsumes $m$ independent copies of the random vector $a$, where $m$ is sufficiently large. Then, Theorem 7 readily asserts

$$\inf_{z \in T_{\rho,s}} \|Az\|_{\ell_2} \ge c_\rho\sqrt{m} \tag{19}$$

with probability at least $1-\mathrm{e}^{-2\xi^2 m}$. Previously, we employed Mendelson's small ball method simply to assert that a similar infimum is strictly positive. Eq. (19) provides a quantitative lower bound with comparable effort. Comparing this relation to Eq. (18) highlights that this is enough to establish the robust nullspace property with parameters $\rho$ and $\tau = \frac{1}{c_\rho\sqrt{m}}$ with high probability. In turn, a stable generalization of the main recovery guarantee follows from Eq. (17).
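Every $s$-sparse unit vector belongs to $T_{\rho,s}$, because its tail $\bar{z}_s$ vanishes. This permits a crude Monte Carlo probe of the lower bound (19): sample random sparse directions and record the smallest value of $\|Az\|_{\ell_2}$. The sketch below (generic Gaussian matrix, sizes of our choosing) only explores a subset of $T_{\rho,s}$ and therefore produces an upper proxy for the true infimum, not a certificate:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, s = 100, 40, 4
A = rng.standard_normal((m, n))  # one generic measurement matrix

def random_sparse_unit(rng, n, s):
    """Random s-sparse unit vector: it lies in T_{rho,s} for every rho,
    since its tail vanishes."""
    z = np.zeros(n)
    z[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
    return z / np.linalg.norm(z)

vals = [np.linalg.norm(A @ random_sparse_unit(rng, n, s)) for _ in range(1000)]
lower = min(vals)  # empirical proxy for the infimum in (18)/(19)
```

Empirically, `lower` remains bounded away from zero and scales like $\sqrt{m}$, in line with Eq. (19).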

###### Theorem 9.

Fix $n$ and $s \le n$. Suppose that we sample $m \ge C\,s\log(2n)$ independent copies of an isotropic, incoherent random vector $a$ that also obeys the 4th moment bound. Then, with probability at least $1-\mathrm{e}^{-cm}$, the resulting measurement matrix $A$ allows for stable, uniform recovery of (approximately) $s$-sparse vectors. More precisely, the solution $z^\sharp$ to (16) is guaranteed to obey

$$\|x - z^\sharp\|_{\ell_2} \le \frac{D_1}{\sqrt{s}}\,\sigma_s(x) + \frac{D_2\,\eta}{\sqrt{m}},$$

where the constants $D_1$ and $D_2$ depend only on the distribution of $a$.

## IV Numerical experiments

In this part we demonstrate the performance that can be achieved with our proposed derandomized constructions and compare it to generic measurement matrices (Gaussian, signed Bernoulli). Since the orthogonal array construction is more involved, we first provide additional details relevant for the numerical experiments.

### IV-A Details on orthogonal arrays

An orthogonal array of strength $t$, with $n$ factors and $\sigma$ levels is an $M \times n$ array of $\sigma$ different symbols such that in any $t$ columns every ordered $t$-tuple occurs in exactly $\lambda = M/\sigma^t$ rows. Arrays with $\lambda = 1$ are called simple. A comprehensive treatment can be found in the book [16]. Known arrays are listed in several libraries, for example http://neilsloane.com/oadir/ or http://pietereendebak.nl/oapage/. Often the symbol alphabet is not relevant, but we use the set $\{0,\dots,\sigma-1\}$ for concreteness. Such arrays can be represented as a matrix in $\{0,\dots,\sigma-1\}^{M\times n}$. For $\sigma = p$ with $p$ prime, a simple orthogonal array is linear if the rows of the matrix form a vector space over $\mathbb{F}_p$. The runs of an orthogonal array (the rows of the corresponding matrix) can also be interpreted as codewords of a code and vice versa. The array is linear if and only if the corresponding code is linear [16, Chapter 4]. This relationship allows one to employ classical code constructions to obtain orthogonal arrays.
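The defining property can be verified by brute force for small arrays. The helper below (`has_strength`, our name) checks every $t$-column projection; as a sanity check we use the full factorial design on five binary factors, which trivially has full strength:

```python
import numpy as np
from itertools import combinations, product

def has_strength(arr, t, levels=2):
    """Check the orthogonal-array property: in every choice of t columns,
    each of the levels**t ordered t-tuples occurs equally often."""
    M, n = arr.shape
    if M % levels**t != 0:
        return False
    lam = M // levels**t  # the index lambda = M / sigma^t
    for cols in combinations(range(n), t):
        counts = {}
        for row in map(tuple, arr[:, cols]):
            counts[row] = counts.get(row, 0) + 1
        if len(counts) != levels**t or sorted(counts.values()) != [lam] * levels**t:
            return False
    return True

# Full factorial design on 5 binary factors: an OA(32, 5, 2, t) for every t <= 5.
full = np.array(list(product([0, 1], repeat=5)))
```

For instance, `has_strength(full, 4)` holds, while restricting to the first 16 rows (which pin the first factor to zero) destroys the property.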

### IV-B Counting bits

In this work we propose to generate sampling matrices by selecting $m$ rows at random from an orthogonal array, then removing the bias (subtracting $1/2$ per component) and scaling appropriately. Intuitively, $m\log_2(M)$ bits are then required to specify such a matrix $A$. For strength $t=4$ and $\sigma = 2$ levels, a classical lower bound due to Rao [42] demands

$$M = \lambda\sigma^4 \ge 1 + n + \binom{n}{2} = \Omega(n^2). \tag{20}$$

Arrays that saturate this bound are called tight (or complete). In summary, an order of $m\log_2(M) = \mathcal{O}(m\log(n))$ bits are required to sample a matrix with $m$ rows according to this procedure.
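For concreteness, a quick back-of-the-envelope comparison (with sizes chosen by us for illustration) of the bit cost of the two sampling procedures:

```python
import math

n = 1024          # ambient dimension
m = 200           # number of measurements
M = n**2          # run size of a tight strength-4 array, M = Theta(n^2)

bits_oa = m * math.ceil(math.log2(M))   # select m rows out of M runs
bits_bernoulli = m * n                  # one sign bit per entry of a generic matrix
```

Here the orthogonal-array procedure needs $200 \cdot 20 = 4000$ bits, compared to $204\,800$ bits for a generic signed Bernoulli matrix of the same size.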

### IV-C Strength-4 Constructions

For compressed sensing applications we want arrays with a large number of factors $n$, since this corresponds to the ambient dimension of the sparse vectors to recover. On the other hand, the run size $M$ should scale "moderately", so that the random matrices can be described with only few bits. Most constructions use an existing orthogonal array as a seed to construct larger arrays. Several simple binary arrays of strength four are known and listed in the aforementioned libraries. Ref. [43] proposes an algorithm that uses a linear orthogonal array as a seed to construct a larger linear orthogonal array. This procedure may then be iterated.

### IV-D Numerical results for orthogonal arrays

Figure 1 summarizes the empirical performance of basis pursuit (1) from independent orthogonal array measurements. We consider real-valued signals and quantify the performance in terms of the normalized $\ell_2$-recovery error (NMSE). To construct the orthogonal array, the algorithm from Ref. [43] is applied twice.

The rows are uniformly sampled from this array, i.e. the sampling matrix has signed Bernoulli-type entries (mapping $0 \mapsto -1$ and $1 \mapsto +1$, followed by appropriate re-scaling). Note that, in the case of non-negative sparse vectors, the corresponding 0/1-matrices may be used instead to recover with non-negative least squares [44]. The sparsity $s$ of the unknown vector has been varied over the range shown in the figure. For each sparsity, many independent experiments are performed to compute the NMSE. In each run, the support of the unknown vector has been chosen uniformly at random and the values are independent instances of a standard Gaussian random variable. For comparison, we have also included the corresponding performance of a generic sampling matrix (signed Bernoulli) of the same size. Numerically, the partially derandomized orthogonal array construction achieves essentially the same performance as its generic counterpart.
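The row-sampling procedure itself is straightforward. The sketch below uses a full factorial array as a stand-in seed (the actual arrays produced via [43] have far more factors per run than this toy example) and records the error measure we assume behind the NMSE, namely the squared $\ell_2$ error normalized by the signal energy:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(3)

# Stand-in strength-4 array: the full factorial on 10 binary factors.
oa = np.array(list(product([0, 1], repeat=10)))   # shape (1024, 10)

m = 6
rows = oa[rng.choice(oa.shape[0], m, replace=False)]
A = (2 * rows - 1) / np.sqrt(m)   # remove bias (0/1 -> ±1) and re-scale

def nmse(x_hat, x):
    """Normalized mean squared recovery error, as used in Figure 1."""
    return np.linalg.norm(x_hat - x) ** 2 / np.linalg.norm(x) ** 2
```

Every entry of the resulting matrix has magnitude $1/\sqrt{m}$, so its columns have unit norm in expectation, mirroring the generic signed Bernoulli baseline.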

### IV-E Numerical results for the Alltop design

Figure 1 shows the NMSE achieved for measurement matrices based on subsampling from an Alltop design (5). The data is obtained in the same way as above, but the sparse vectors are generated with i.i.d. complex standard normal entries on the support. For comparison, the results for a (complex) standard Gaussian sampling matrix are included as well. Again, the performance of random Alltop-design measurements essentially matches its generic (Gaussian) counterpart.