1. Introduction
Since Shannon’s classical sampling theorem [50, 54], sampling theory has been a widely studied field in signal and image processing. Infinite-dimensional compressed sensing [4, 9, 36, 48, 49, 37, 18] is part of this rich theory and offers a method that allows infinite-dimensional signals to be recovered from undersampled linear measurements. This gives a nonlinear alternative to other methods like generalized sampling [5, 2, 6, 34, 32, 8, 41] and the Parametrized-Background Data-Weak (PBDW) method [13, 24, 14, 44, 43, 42], which reconstruct infinite-dimensional objects from linear measurements. However, these methods do not allow for subsampling, and hence are dependent on consecutive samples of, for example, the Fourier transform. Infinite-dimensional compressed sensing, on the other hand, is similar to generalized sampling and the PBDW-method, but utilises an optimisation problem that allows for subsampling.
Besides the typical flagship of modern compressed sensing, namely MRI [40, 31], there is a myriad of other applications, such as fluorescence microscopy [47, 51], single-pixel cameras [29], medical imaging devices like computed tomography [19], electron microscopy [38], lensless cameras [35], compressive holography [20] and laser-based failure analysis [52], among others. The applications fall into three different groups: those that are modelled by Fourier measurements, those that are based on the Radon transform, and those that are represented by binary measurements. For Fourier measurements there is a long history of research; for Radon and binary measurements, however, the theoretical results are scarce. In this paper we consider binary measurements and provide the first nonuniform recovery guarantees for infinite-dimensional compressed sensing.
The setup of infinite-dimensional compressed sensing is as follows. We consider an orthonormal basis $\{\varphi_j\}_{j \in \mathbb{N}}$ of a separable Hilbert space $\mathcal{H}$ and an element $f = \sum_{j \in \mathbb{N}} x_j \varphi_j \in \mathcal{H}$
to be recovered from linear measurements. That is, we have another orthonormal basis $\{s_i\}_{i \in \mathbb{N}}$ of $\mathcal{H}$ and we can access the linear measurements given by $\langle f, s_i \rangle$. Although the Hilbert space can be arbitrary, in applications we will mostly consider function spaces. Hence, we will often refer to the object $f$ as well as the basis elements as functions. We call the functions $s_i$, $i \in \mathbb{N}$, sampling functions and the space spanned by them the sampling space. Accordingly, the $\varphi_j$, $j \in \mathbb{N}$, are called reconstruction functions and span the reconstruction space. Generalized sampling [2, 3, 4] and the PBDW-method [44] use the change of basis matrix $U = \{u_{i,j}\}_{i,j \in \mathbb{N}}$ with $u_{i,j} = \langle \varphi_j, s_i \rangle$ to find a solution to the problem of reconstructing coefficients in the reconstruction space from measurements in the sampling space. This is also the case in infinite-dimensional compressed sensing. In particular, we consider the following reconstruction problem. Fix a signal $f$ with coefficients $x = (x_j)_{j \in \mathbb{N}}$ and the measurements $y = P_\Omega U x + \epsilon$, where $\Omega \subset \mathbb{N}$ is the subsampling set, $P_\Omega$ the orthogonal projection onto the elements indexed by $\Omega$, and $\epsilon$ some additional noise with $\|\epsilon\|_2 \le \delta$. The reconstruction problem is to find a minimiser of
(1.1) $\min_{z \in \ell^1(\mathbb{N})} \|z\|_1 \text{ subject to } \|P_\Omega U z - y\|_2 \le \delta.$
2. Preliminaries
2.1. Setting and Definitions
In this section we recall the settings from [9] that are needed to establish the main results. First, note that we will use $a \lesssim b$ to describe that $a$ is smaller than $b$ modulo a constant, i.e. there exists some $C > 0$ such that $a \le C b$. Moreover, for a set $\Omega \subset \mathbb{N}$ the orthogonal projection corresponding to the elements of the canonical basis of $\ell^2(\mathbb{N})$ with the indices of $\Omega$ is denoted by $P_\Omega$. Similarly, for $N \in \mathbb{N}$ the orthogonal projection onto the first $N$ elements of the canonical basis of $\ell^2(\mathbb{N})$ is represented by $P_N$. Finally, $P_{M_1}^{M_2}$
stands for the orthogonal projection onto the basis vectors related to the indices $\{M_1 + 1, \ldots, M_2\}$.
Note that (1.1) is an infinite-dimensional optimisation problem; in practice, (1.1) is replaced by the truncated problem $\min_{z \in \mathbb{C}^N} \|z\|_1$ subject to $\|P_\Omega U P_N z - y\|_2 \le \delta$ for a sufficiently large $N \in \mathbb{N}$.
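The truncated problem is a standard basis pursuit program and can be solved with off-the-shelf linear-programming solvers. The following sketch is not from the paper; the matrix, the dimensions and the solver choice are illustrative assumptions. It recovers a sparse coefficient vector from subsampled Gaussian measurements via the classical LP reformulation of $\ell^1$ minimisation:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||z||_1 subject to A z = y via the LP split z = u - v, u, v >= 0."""
    m, n = A.shape
    c = np.ones(2 * n)                       # objective: sum(u) + sum(v) = ||z||_1
    res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y,
                  bounds=[(0, None)] * (2 * n), method="highs")
    return res.x[:n] - res.x[n:]

# illustrative sizes: a 3-sparse vector, 20 Gaussian measurements, ambient dimension 32
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 32)) / np.sqrt(20)
x = np.zeros(32)
x[[3, 11, 27]] = [1.5, -2.0, 1.0]
x_hat = basis_pursuit(A, A @ x)
```

Here the noise level is $\delta = 0$, so the inequality constraint degenerates to an equality; for noisy data one would replace it by a second-order cone constraint and use a convex solver instead of a plain LP.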
X-lets such as wavelets [45], shearlets [22, 23] and curvelets [15, 17, 16] yield a specific sparsity structure according to their level structure. To describe this phenomenon the notion of sparsity in levels is introduced instead of pure sparsity.
Definition 2.1 ([9]).
Let $r \in \mathbb{N}$. For $s = (s_1, \ldots, s_r) \in \mathbb{N}^r$ let $M = (M_1, \ldots, M_r) \in \mathbb{N}^r$ with $1 \le M_1 < \ldots < M_r$ and $s_k \le M_k - M_{k-1}$, where $M_0 = 0$. We say that $x \in \ell^2(\mathbb{N})$ is $(s, M)$-sparse if, for each $k = 1, \ldots, r$,
(2.1) $\Delta_k := \operatorname{supp}(x) \cap \{M_{k-1} + 1, \ldots, M_k\}$
satisfies $|\Delta_k| \le s_k$. We denote the set of $(s, M)$-sparse vectors by $\Sigma_{s, M}$.
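As a small illustration (not part of [9]; the function name and the convention of passing the level boundaries $M$ directly are our own), sparsity in levels can be checked numerically by counting the nonzeros of a vector inside each band:

```python
import numpy as np

def is_sparse_in_levels(x, s, M):
    """Check (s, M)-sparsity: within each level {M[k-1]+1, ..., M[k]} (1-based),
    the support of x contains at most s[k] indices."""
    bounds = [0] + list(M)
    for k in range(len(M)):
        level = x[bounds[k]:bounds[k + 1]]
        if np.count_nonzero(level) > s[k]:
            return False
    return True

# two levels of sizes 4 and 4 (M = (4, 8)), with at most 1 resp. 2 nonzeros
x = np.array([0.0, 2.0, 0.0, 0.0, 1.0, 0.0, -3.0, 0.0])
```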
The majority of natural signals is not perfectly sparse but instead has a small tail in the representation system. Hence, in a large number of applications it is unrealistic to ask for sparsity; instead one asks for compressibility.
Definition 2.2 ([9]).
Let $f = \sum_{j \in \mathbb{N}} x_j \varphi_j$, where $x = (x_j)_{j \in \mathbb{N}} \in \ell^1(\mathbb{N})$. We say that $f$ is $(s, M)$-compressible with respect to $\{\varphi_j\}_{j \in \mathbb{N}}$ if $\sigma_{s, M}(f)$ is small, where
(2.2) $\sigma_{s, M}(f) = \min_{z \in \Sigma_{s, M}} \|x - z\|_{\ell^1}.$
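For a finite vector, the best $(s, M)$-sparse approximation in $\ell^1$ simply keeps the $s_k$ largest magnitudes in each level, so the tail can be computed directly. The following sketch is our own (function name and conventions are illustrative):

```python
import numpy as np

def sigma_sM(x, s, M):
    """l1-distance from x to its best (s, M)-sparse approximation:
    keep the s[k] largest-magnitude entries in each level, sum the rest."""
    bounds = [0] + list(M)
    tail = 0.0
    for k in range(len(M)):
        level = np.abs(x[bounds[k]:bounds[k + 1]])
        order = np.sort(level)[::-1]        # magnitudes in descending order
        tail += order[s[k]:].sum()          # everything beyond the s[k] largest
    return tail

x = np.array([3.0, 0.1, 0.0, 0.0, 1.0, 0.2, -0.2, 0.0])
# levels of sizes 4 and 4, keeping one entry per level: tail = 0.1 + 0.2 + 0.2
```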
With this more detailed description of the signal, instead of classical sparsity, it is possible to adapt the sampling scheme accordingly. Complete random sampling is substituted by the setting of multilevel random sampling. This allows us later to treat the different levels separately. Moreover, it represents sampling schemes that are used in practice.
Definition 2.3 ([9]).
Let $r \in \mathbb{N}$, $N = (N_1, \ldots, N_r) \in \mathbb{N}^r$ with $1 \le N_1 < \ldots < N_r$, and $m = (m_1, \ldots, m_r) \in \mathbb{N}^r$ with $m_k \le N_k - N_{k-1}$ for $k = 1, \ldots, r$, and suppose that
(2.3) $\Omega_k \subset \{N_{k-1} + 1, \ldots, N_k\}, \quad |\Omega_k| = m_k, \quad k = 1, \ldots, r,$
are chosen uniformly at random, where $N_0 = 0$. We refer to the set
(2.4) $\Omega = \Omega_{N, m} = \Omega_1 \cup \ldots \cup \Omega_r$
as an $(N, m)$-multilevel sampling scheme.
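A multilevel sampling scheme is straightforward to draw numerically. The following sketch (our own helper; the level sizes in the example are illustrative) picks $m_k$ indices uniformly at random without replacement from each sampling band:

```python
import numpy as np

def multilevel_scheme(N, m, rng=None):
    """Draw an (N, m)-multilevel sampling scheme: in each band
    {N[k-1]+1, ..., N[k]} choose m[k] indices uniformly at random."""
    rng = rng or np.random.default_rng()
    bounds = [0] + list(N)
    levels = []
    for k in range(len(N)):
        band = np.arange(bounds[k] + 1, bounds[k + 1] + 1)   # 1-based indices
        levels.append(rng.choice(band, size=m[k], replace=False))
    return levels

# e.g. N = (16, 64, 256): sample the first band fully, subsample the others
Omega = multilevel_scheme((16, 64, 256), (16, 24, 32), np.random.default_rng(1))
```

Sampling the lowest band fully and subsampling ever more aggressively in the higher bands mirrors the variable-density patterns used in practice.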
Remark 2.4.
To avoid pathological examples we assume, as in [9], that the total sparsity satisfies $s = s_1 + \ldots + s_r \ge 3$. This results in the fact that $\log(s) \ge 1$ and therefore also $\log(s) \ge \log(s_k)$ for all $k = 1, \ldots, r$.
3. Main results: Nonuniform recovery for the Walsh-wavelet case
In this paper we focus on the reconstruction from binary measurements. This arises naturally in examples like those mentioned in the introduction and in applications where the sampling is performed with an apparatus that has an “on-off” behaviour. We focus on the setting of recovering data in $L^2([0,1])$; however, the theory builds on general results for Hilbert spaces as presented in §2. Linear measurements are typically represented by inner products between sampling functions and the data of interest. Binary measurements can be represented with functions that take values in $\{0, 1\}$, or, after a well-known and convenient trick of subtracting constant-one measurements, with functions that take values in $\{1, -1\}$. For practical reasons it is sensible to consider functions that provide fast transforms. Additionally, the function system should correspond well to the reconstruction space. For the reconstruction with wavelets, Walsh functions have proven to be a sensible choice, and they are discussed in more detail in §3.2.1. Sampling from binary measurements has been analysed for the linear case in [33, 53, 12] and in the nonlinear case in [1, 46]. We extend these results to nonuniform recovery guarantees in the nonlinear case. By filling this gap we gain broad knowledge about linear and nonlinear reconstruction for two of the three main measurement systems: Fourier and binary.
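The subtraction trick mentioned above can be illustrated with a small discrete simulation (our own toy setup, not the paper's operator): a measurement against a $\{1, -1\}$-valued function equals twice the measurement against the corresponding $\{0, 1\}$ mask minus one constant-one measurement.

```python
import numpy as np

# Discrete simulation of the 0/1 -> {1,-1} trick: for a {1,-1}-valued test
# function w, the mask b = (w + 1)/2 is binary, and <f, w> = 2<f, b> - <f, 1>.
rng = np.random.default_rng(0)
f = rng.standard_normal(64)                  # signal samples on a grid
w = rng.choice([-1.0, 1.0], size=(10, 64))   # ten {1,-1}-valued test functions
b = (w + 1.0) / 2.0                          # the corresponding binary masks

pm_meas = w @ f                              # <f, w>, what the theory uses
binary_meas = b @ f                          # <f, b>, what an on-off device delivers
ones_meas = np.ones(64) @ f                  # one extra constant-one measurement
recovered = 2.0 * binary_meas - ones_meas    # <f, 2b - 1> = <f, w>
```

Only a single extra measurement (the constant-one row) is needed, which is why the two viewpoints are used interchangeably.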
Let
(3.1) $U = \{u_{i,j}\}_{i,j \in \mathbb{N}}, \qquad u_{i,j} = \langle \varphi_j, w_i \rangle,$
where $w_i$ denotes the Walsh functions on $[0,1]$ as described in §3.2.1, and $\varphi_j$ denotes the Daubechies boundary wavelets on $[0,1]$ described in §3.2.2. We are now able to state the recovery guarantees for the Walsh-wavelet case.
Theorem 3.1 (Main theorem).
Given the notation above, let $N = (N_1, \ldots, N_r)$ define the sampling levels as in (3.28) and let $M = (M_1, \ldots, M_r)$ represent the levels of the reconstruction space as in (3.27). Consider $U$ as in (3.1), and let $\Omega = \Omega_{N, m}$ be a multilevel sampling scheme such that the following holds:

Let $\epsilon \in (0, e^{-1}]$ and $s = (s_1, \ldots, s_r)$ be such that
(3.2) 
For each $k = 1, \ldots, r$,
(3.3)
Then, with probability exceeding $1 - s\epsilon$, where $s = s_1 + \ldots + s_r$, any minimizer $\xi$ of (1.1) satisfies
(3.4) 
for some constant $C$, where $K = \max_{k=1,\ldots,r} \{(N_k - N_{k-1})/m_k\}$. If $m_k = N_k - N_{k-1}$ for $1 \le k \le r$ then this holds with probability 1.
This result allows one to exploit the asymptotic sparsity structure of most natural images under the wavelet transform. It was observed in [9] that the ratio of nonzero coefficients per level decreases very fast with increasing level, while at the same time the level size increases. Hence, most images are not that sparse in the first levels, and sampling patterns should adapt to that. However, they are very sparse in the higher levels. Therefore, the number of measurements in a given level depends mainly on the sparsity in that level, and the influence of the sparsity in the other levels decays exponentially.
Remark 3.2.
With awareness of potential extensions of this work to higher dimensions or other reconstruction and sampling spaces, we kept the general factor in (3.3). For the Walsh-wavelet case in one dimension, however, this factor reduces when the corresponding values are inserted. Hence, Equation (3.3) can be further simplified to
(3.5) 
however, in general one needs the factor .
Remark 3.3.
We would like to highlight the differences to the Fourier-wavelet case, i.e. to Theorem 6.2 in [9]. The most striking difference is the squared factor in (3.2). In the Fourier-wavelet case this factor depends on the smoothness of the wavelet, i.e. on the decay rate of the wavelet under the Fourier transform. For very smooth wavelets this can be improved to
(3.6) 
Hence, for very smooth wavelets we get the optimal linear relation, up to log terms. However, for non-smooth wavelets like the Haar wavelet, we get a squared relation instead of a linear one. The reason why we do not observe a similar dependence on the smoothness in the sampling relation is that smoothness of a wavelet does not translate into a faster decay under the Walsh transform. This is also related to the fact that for Fourier measurements (3.3) becomes
(3.7) 
where the exponents are positive numbers depending on the number of vanishing moments of the wavelet. In particular, smoothness and vanishing moments of the wavelet do have an impact in the Fourier case, but not in the Walsh case. This is also confirmed in Figure 7, where we have plotted the absolute values of sections of $U$, where $U$ is the infinite matrix from (3.1). As can be seen in Figure 7, the matrix gets more block diagonal in the Fourier case with more vanishing moments, confirming the dependence on these exponents in (3.7). Note that for a completely block diagonal matrix the bound in (3.7) will only depend on the sparsity in the corresponding level and not on the sparsities in the other levels. In contrast, this effect is not visible in the Walsh situation, suggesting that the estimate in (3.3) captures the correct behaviour by not depending on smoothness or vanishing moments. The reason is that a function needs to be smooth in the dyadic sense to have a faster decay rate under the Walsh transform; this, however, is not related to classical smoothness. Finally, the numerical examples in §5 suggest that the squared relation in (3.2) is not sharp and that it is also possible to reconstruct images with a reduced relation between the maximal sample and the maximal reconstructed coefficient.
3.1. Connection to related work
Reconstruction methods are mainly divided into two major classes: linear and nonlinear methods. For the linear case, generalized sampling [2] and the PBDW-method [44] are prominent examples. Preceding the former, consistent sampling was investigated by Aldroubi, Eldar, Unser and others [11, 55, 25, 26, 27, 28]. Generalized sampling has since been studied by Adcock, Hansen, Hrycak, Gröchenig, Kutyniok, Ma, Poon and Shadrin in [5, 2, 6, 34, 32, 8, 41]. The PBDW-method evolved from the work of Maday, Patera, Penn and Yano in [43], first under the name
generalized empirical interpolation method
. This was then further analysed and extended to the PBDW-method by Binev, Cohen, Dahmen, DeVore, Petrova, and Wojtaszczyk [13, 24, 14, 44, 42]. The stability and accuracy of both methods is secured by the stable sampling rate (SSR), which controls the number of samples needed for a stable and accurate reconstruction of a certain number of coefficients in the reconstruction space. It was shown that the SSR is linear for the Fourier-wavelet [7], Fourier-shearlet [41] and Walsh-wavelet case [33]. However, this is not always the case, as in the Fourier-polynomial situation [34]. In the nonlinear setting the most prominent reconstruction technique is infinite-dimensional compressed sensing [18], as analysed in the Fourier case by Adcock, Hansen, Kutyniok, Lim, Poon and Roman [9, 37, 4, 49, 48]. There exists widespread knowledge in this area. For the Fourier-wavelet case we know uniform recovery guarantees [39] and nonuniform recovery guarantees [9, 10]. For Walsh measurements we have uniform recovery guarantees from Adcock, Antun and Hansen [1] and an analysis of variable and multilevel density sampling strategies for the Walsh-Haar case and finite-dimensional signals by Moshtaghpour, Dias and Jacques in [46]. In this paper we present the nonuniform results for the Walsh-wavelet case, as has been done for the Fourier case in [9, 10].
3.2. Sampling and Reconstruction space
3.2.1. Sampling Space
We start with the sampling space. To model binary measurements, Walsh functions have proven to be a good choice. They behave similarly to Fourier measurements, with the difference that they work in the dyadic rather than the decimal analysis. They also have an increasing number of zero crossings. This leads to the fact that the change of basis matrix gets a block diagonal structure, as can be seen in Figure 7. One can exploit the asymptotic sparsity and incoherence. However, the fact that they are defined in the dyadic analysis leads to some difficulties and specialities in the proof.
Let us now define the Walsh functions, which form the kernel of the Hadamard matrix. Then we proceed with their properties and the definition of the Walsh transform.
Definition 3.4 (Walsh function).
Let $n \in \mathbb{N}$ with $n = \sum_{i \ge 1} n_i 2^{i-1}$, $n_i \in \{0, 1\}$, be the dyadic expansion of $n$. Analogously, let $x \in [0, 1)$ with $x = \sum_{i \ge 1} x_i 2^{-i}$, $x_i \in \{0, 1\}$. The generalized Walsh functions in $L^2([0, 1))$ are given by
(3.8) $w_n(x) := (-1)^{\sum_{i \ge 1} (n_i + n_{i+1}) x_i}.$
This definition can also be extended to negative inputs. Walsh functions are one-periodic in the second input if the first one is an integer. Moreover, the definition is extended to arbitrary real inputs instead of the more classical definition for integer sequency. We would like to make the reader aware of different orderings of the Walsh functions. The one presented here is the Walsh-Kaczmarz ordering. It is ordered in terms of increasing numbers of zero crossings. This has the advantage that it relates nicely with the scaling of wavelets. Two other possible orderings are Walsh-Paley and Walsh-Kronecker. Both have the drawback that the number of zero crossings is not increasing. Therefore, we would not be able to get the block diagonal structure in the change of basis matrix. The Walsh-Kronecker ordering is also not often used in practice, because one has to predefine the largest input, and the ordering changes depending on this value.
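The Walsh-Kaczmarz ordering can be checked numerically. The following sketch (our own implementation of the closed-form definition above, evaluated on a dyadic grid) builds the sequency-ordered Walsh matrix and lets one verify that row $n$ has exactly $n$ sign changes, and that the multiplicative identity $w_n w_l = w_{n \oplus l}$ holds:

```python
import numpy as np

def walsh_matrix(m):
    """Sequency (Walsh-Kaczmarz) ordered Walsh matrix on 2**m dyadic points:
    W[n, k] = w_n(k / 2**m) = (-1)**sum_i (n_i + n_{i+1}) x_i, where n_i are
    the dyadic digits of n and x_i those of x = k / 2**m."""
    size = 2 ** m
    W = np.empty((size, size), dtype=int)
    for n in range(size):
        for k in range(size):
            e = 0
            for i in range(1, m + 1):
                n_i = (n >> (i - 1)) & 1          # digit of n = sum n_i 2**(i-1)
                n_ip1 = (n >> i) & 1
                x_i = (k >> (m - i)) & 1          # digit of x = sum x_i 2**(-i)
                e += (n_i + n_ip1) * x_i
            W[n, k] = (-1) ** e
    return W

W = walsh_matrix(3)
```

The dyadic addition $\oplus$ on integers corresponds to a bitwise XOR of their dyadic digits, which is what the test below exploits.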
For the sampling pattern we divide the sequency parameter into dyadic blocks. This results in an insightful relationship between the wavelets and the Walsh functions. Additionally, it is directly related to the block structure observed in numerical experiments.
After the small excursion on orderings we now define the sampling space in one dimension by
(3.9) $\mathcal{S} = \overline{\operatorname{span}}\{w_n : n \in \mathbb{N}\}.$
In general it is not possible to acquire or save an infinite number of samples. Therefore, we restrict ourselves to the sampling space spanned by the first $N$ Walsh functions, i.e.
(3.10) $\mathcal{S}_N = \operatorname{span}\{w_n : n = 0, \ldots, N - 1\}.$
The Walsh functions obey some interesting properties: the scaling property, i.e. $w_{2^j n}(x) = w_n(2^j x)$ for all $j \in \mathbb{N}$ and $x \in [0, 1)$, and the multiplicative identity, i.e. $w_n(x) w_m(x) = w_{n \oplus m}(x)$, where $\oplus$ denotes the dyadic addition. With the Walsh functions we are able to define the continuous Walsh transform almost everywhere:
(3.11) $\widehat{f}(s) := \int_{[0, 1)} f(x) w_s(x) \, dx.$
The properties of the Walsh functions are easily transferred to the Walsh transform. We now state some further results about the Walsh functions and the Walsh transform, which are necessary for the main proof.
Lemma 3.5 ([33]).
Let $j \in \mathbb{N}$, $x \in [0, 2^{-j})$ and $t = 2^{-j} k$ for some $k \in \mathbb{N}$; then for all $n \in \mathbb{N}$ the following holds:
(3.12) $w_n(x + t) = w_n(x) \, w_n(t).$
Remark that this only holds because the two inputs do not have nonzero elements in their dyadic representations at the same spot, and therefore the dyadic addition equals the decimal addition. Next, we consider Walsh polynomials and see how we can relate the sum of squares of the polynomial to the sum of squares of its coefficients.
Definition 3.6 ([33]).
Let $N \in \mathbb{N}$ and $\alpha_0, \ldots, \alpha_{N-1} \in \mathbb{C}$. Then the Walsh polynomial of order $N$ is defined by $\Phi(x) = \sum_{k=0}^{N-1} \alpha_k w_k(x)$. The set of all Walsh polynomials up to degree $N$ is given by
(3.13) $\Pi_N = \Big\{ \Phi : \Phi(x) = \sum_{k=0}^{N-1} \alpha_k w_k(x), \ \alpha_k \in \mathbb{C} \Big\}.$
Lemma 3.7 ([33]).
Let $N \in \mathbb{N}$ and consider the Walsh polynomial $\Phi(x) = \sum_{k=0}^{N-1} \alpha_k w_k(x)$. Then
(3.14) $\int_{[0,1)} |\Phi(x)|^2 \, dx = \sum_{k=0}^{N-1} |\alpha_k|^2.$
In the proof we will combine the shifts of the wavelet into a Walsh polynomial. With this lemma at hand, the resulting terms are then easily bounded.
3.2.2. Reconstruction Space
Next, we define the reconstruction space. As we are mainly interested in the reconstruction of images and audio signals, we use Daubechies wavelets. They provide good smoothness and support properties. Moreover, they obey a multiresolution analysis (MRA). The wavelet space is described by the wavelet $\psi$ at different levels $j$ and shifts $k$, i.e. $\psi_{j,k}(x) = 2^{j/2} \psi(2^j x - k)$, and we have the wavelet space at level $j$
(3.15) $W_j = \operatorname{span}\{\psi_{j,k} : k \in \mathbb{Z}\}.$
They build a representation system for $L^2(\mathbb{R})$, i.e. $L^2(\mathbb{R}) = \bigoplus_{j \in \mathbb{Z}} W_j$. For the MRA we also define the scaling function $\phi$ and the according scaling space
(3.16) $V_j = \operatorname{span}\{\phi_{j,k} : k \in \mathbb{Z}\},$
where $\phi_{j,k}(x) = 2^{j/2} \phi(2^j x - k)$. We then have that $V_j \subset V_{j+1}$ and $V_{j+1} = V_j \oplus W_j$. The Daubechies scaling function and wavelet have the same smoothness properties. This allows us to deal with them interchangeably, as we only need the decay rate under the Walsh transform for the analysis.
However, the classical definition of Daubechies wavelets has a large drawback for our setting. Normally, they are defined on the whole real line. Due to the fact that Walsh functions are defined on $[0, 1)$, it is necessary to restrict the wavelets to $[0, 1]$ as well. Otherwise there would be elements in the reconstruction space which are not in the sampling space, and therefore the solution could not be unique. Hence, we use boundary corrected wavelets ([21], §4). In [33] this problem of the relation between the reconstruction and sampling space is discussed at more length, and we refer the interested reader there.
For the definition of boundary wavelets we have to correct those that intersect with the boundary. We start with the definition of the scaling space and continue with the wavelet space. Let $\phi$ be the scaling function of order $p$ with support in $[-p + 1, p]$. First consider the lowest level $J_0$ such that the scaling functions at that level intersect with at most one of the boundaries $0$ or $1$. Then we can keep the interior scaling functions. The exterior ones are changed according to [21] and are denoted by
(3.17) 
The left and right boundary functions still have the same smoothness properties and staggered supports, so that the new system has the same properties as before. Additionally, it was proved in [21] that the space can be spanned by the scaling function, its translates and the reflected version, i.e.
(3.18) 
The new system still obeys the MRA; hence the boundary wavelets can be deduced from the boundary corrected scaling functions. Fortunately, we only need the smoothness properties of the wavelet, and these are kept after the boundary correction. Therefore we do not go into the details of the construction of the boundary wavelets and refer the interested reader to [21] for a detailed analysis.
With this information at hand we can now have a look at the reconstruction space. To analyse it, we represent the low frequency part by the scaling space at level $J_0$ and the higher frequencies by the wavelet spaces at levels $j \ge J_0$. Hence, the reconstruction space is given by
(3.19)  
(3.20) 
In the finite-dimensional setting we only consider wavelets up to a certain scale. We denote the corresponding finite-dimensional reconstruction space by
(3.21)  
(3.22) 
Remark 3.8.
We consider here only the case of Daubechies wavelets of a fixed order. The theory also holds for the other orders; nevertheless, we then get unpleasant exponents depending on the wavelet and different cases to consider. For the Haar wavelet we can get even better estimates, due to the perfect block structure of the change of basis matrix in that case. A detailed analysis of the relation between Haar wavelets and Walsh functions can be found in [53], and we discuss the recovery guarantees for this special case in §4.4.
For the further analysis we want to transfer the shift in the wavelets to the Walsh functions. For this we use Lemma 3.5. However, its assumptions require that the two inputs do not share nonzero dyadic digits. Due to the large support of the wavelets this does not hold true, i.e.
(3.23)  
(3.24) 
Therefore, we have to separate the wavelets into parts which are supported in a single dyadic interval. Remark that this is not a contradiction to the construction of the boundary wavelets: they are indeed supported in $[0, 1]$, however only from the first scaling level on, and not the mother scaling function. Therefore, we represent the mother scaling function as follows
(3.25) 
and
(3.26) 
This can also be done accordingly for the reflected version. More detailed information about this problem can be found in [33].
3.2.3. Ordering
We now discuss the ordering of the sampling and reconstruction space. We order the reconstruction space according to the levels, as in (3.19). With this we get the multilevel subsampling scheme with the same level structure. For this sake, we bring the scaling functions at level $J_0$ and the wavelets at level $J_0$ together into one block of size $2^{J_0 + 1}$. The next level consists of the wavelets at level $J_0 + 1$, of size $2^{J_0 + 1}$, and so forth. Therefore, we define
(3.27) 
to represent the level structure of the reconstruction space. For the sampling space we use the same partition, where oversampling is allowed by the choice of the sampling levels. Let
(3.28) 
For the reconstruction matrix $U$ in (3.1) we then get a corresponding block structure: the first block contains the inner products of the Walsh functions in the first sampling band with the scaling functions and wavelets at level $J_0$, and the $k$-th block those of the Walsh functions in the $k$-th band with the wavelets at the corresponding level.
The proof of the main theorem relies mainly on the analysis of the change of basis matrix. Numerical examples and rigorous mathematics [53] show that it is perfectly block diagonal for the Walsh-Haar case. It is also close to block diagonal for other Daubechies wavelets, as can be seen in Figure 7. An intuition about this phenomenon is given in Figure 11, where we plotted Haar wavelets at different scales with Walsh functions at different sequencies. In (a) the scaling of the Haar wavelet is higher than the sequency of the Walsh function. Therefore, the Walsh function does not change the wavelet on its support and hence the product integrates to zero. The next plot (b) shows a wavelet and a Walsh function at similar scale and sequency, which relates to parts of the change of basis matrix in the inner block. Here the two functions add up nicely to a nonzero output. Last, we have in (c) that the Walsh function oscillates faster than the wavelet, and hence the Walsh function is not disturbed by the wavelet and integrates to zero.
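This intuition can be reproduced in a small discrete experiment (our own discretisation; the grid size and normalisation are illustrative): a discretised Haar wavelet at level $j$ has Walsh coefficients supported exactly in the sequency block $[2^j, 2^{j+1})$, which is the perfect block diagonality mentioned above.

```python
import numpy as np

def walsh_matrix(m):
    """Sequency-ordered Walsh matrix on 2**m dyadic points (cf. Definition 3.4)."""
    size = 2 ** m
    W = np.empty((size, size), dtype=int)
    for n in range(size):
        for k in range(size):
            e = sum(((n >> (i - 1) & 1) + (n >> i & 1)) * (k >> (m - i) & 1)
                    for i in range(1, m + 1))
            W[n, k] = (-1) ** e
    return W

def haar_wavelet(m, j, k):
    """Discretised Haar wavelet psi_{j,k} on 2**m grid points, 0 <= k < 2**j."""
    psi = np.zeros(2 ** m)
    step = 2 ** (m - j)                       # grid points per support interval
    start = k * step
    psi[start:start + step // 2] = 1.0        # +1 on the first half-interval
    psi[start + step // 2:start + step] = -1.0
    return psi * 2 ** (j / 2)

# Walsh coefficients of psi_{j,k}: nonzero only in sequency block [2**j, 2**(j+1))
m = 4
W = walsh_matrix(m)
coeffs = W @ haar_wavelet(m, 2, 1) / 2 ** m
```

The reason is elementary: the wavelet is piecewise constant on dyadic intervals of length $2^{-(j+1)}$, so it lies in the span of the first $2^{j+1}$ Walsh functions, while orthogonality to all functions that are constant on intervals of length $2^{-j}$ removes the first $2^j$ of them.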
4. Proof of the Main Result
4.1. Preliminaries
It is important to make sure that the uneven finite sections of the change of basis matrix are close to an isometry. In the finite-dimensional setting this is ensured by the stable sampling rate. A detailed analysis of the stable sampling rate for Walsh functions can be found in [33]. The analysis for Fourier measurements is conducted in [41, 30, 7]. In the infinite-dimensional case the balancing property controls the relation between the number of samples and reconstructed coefficients, such that the matrix is close to an isometry.
Definition 4.1 ([4]).
Let $U \in \mathcal{B}(\ell^2(\mathbb{N}))$ be an isometry. Then $N \in \mathbb{N}$ and $K \ge 1$ satisfy the strong balancing property with respect to $U$, $M \in \mathbb{N}$ and $s \in \mathbb{N}$ if
(4.1)  
(4.2) 
where $\|\cdot\|_{\infty}$ is the norm on $\mathcal{B}(\ell^\infty)$.
In this setting we use the notation as in [9]. Let
(4.3) 
In the rest of the analysis we are interested in the number of samples needed for stable and accurate recovery. Besides known quantities, this number depends on the local coherence and the relative sparsity, which are defined next. We start with the (global) coherence.
Definition 4.2 ([9]).
Let $U \in \mathcal{B}(\ell^2(\mathbb{N}))$ be an isometry. The coherence of $U$ is
(4.4) $\mu(U) = \max_{i, j \in \mathbb{N}} |u_{i,j}|^2.$
With this it is possible to define the local coherence for every level band.
Definition 4.3 ([9]).
Let $U \in \mathcal{B}(\ell^2(\mathbb{N}))$ be an isometry. The local coherence of $U$ with respect to $N = (N_1, \ldots, N_r)$ and $M = (M_1, \ldots, M_r)$ is given by
(4.5) $\mu_{N,M}(k, l) = \sqrt{\mu\big(P_{N_{k-1}}^{N_k} U P_{M_{l-1}}^{M_l}\big) \, \mu\big(P_{N_{k-1}}^{N_k} U\big)}, \qquad k, l = 1, \ldots, r.$
We also define
(4.6) $\mu_{N,M}(k, \infty) = \sqrt{\mu\big(P_{N_{k-1}}^{N_k} U P_{M_{r-1}}^{\infty}\big) \, \mu\big(P_{N_{k-1}}^{N_k} U\big)}.$
Definition 4.4 ([9]).
Let $U \in \mathcal{B}(\ell^2(\mathbb{N}))$ be an isometry and $s = (s_1, \ldots, s_r)$, $M = (M_1, \ldots, M_r)$. Then the relative sparsity is given by
(4.7) $S_k = S_k(N, M, s) = \max_{\eta \in \Theta} \big\| P_{N_{k-1}}^{N_k} U \eta \big\|^2$
with
(4.8) $\Theta = \big\{ \eta \in \ell^2(\mathbb{N}) : \|\eta\|_\infty \le 1, \ |\operatorname{supp}(P_{M_{l-1}}^{M_l} \eta)| \le s_l, \ l = 1, \ldots, r \big\}.$
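Both the coherence and the local coherence are easy to compute for finite sections of a matrix. The sketch below (our own helpers; the Hadamard-type example matrix is illustrative) evaluates them for a perfectly incoherent matrix, for which both equal $1/N$:

```python
import numpy as np

def coherence(U):
    """mu(U) = max_{i,j} |u_ij|**2."""
    return np.max(np.abs(U)) ** 2

def local_coherence(U, Nlev, Mlev, k, l):
    """mu_{N,M}(k, l) = sqrt( mu(P^{N_k}_{N_{k-1}} U P^{M_l}_{M_{l-1}})
    * mu(P^{N_k}_{N_{k-1}} U) ) for a finite section of U, 1-based k, l."""
    Nb = [0] + list(Nlev)
    Mb = [0] + list(Mlev)
    rows = slice(Nb[k - 1], Nb[k])
    cols = slice(Mb[l - 1], Mb[l])
    return np.sqrt(coherence(U[rows, cols]) * coherence(U[rows, :]))

# example: a normalised 8x8 Hadamard-type matrix, all entries +-1/sqrt(8)
H = np.array([[(-1) ** bin(i & j).count("1") for j in range(8)] for i in range(8)])
U = H / np.sqrt(8)
```

For structured matrices such as the Walsh-wavelet matrix, the local coherence varies strongly across the level bands, which is exactly what the multilevel sampling scheme exploits.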
After clarifying the notation and the setting we are now able to state the main theorem from [9].
Theorem 4.5 ([9]).
Let $U \in \mathcal{B}(\ell^2(\mathbb{N}))$ be an isometry and $x \in \ell^1(\mathbb{N})$. Suppose that $\Omega = \Omega_{N,m}$ is a multilevel sampling scheme, where $N = (N_1, \ldots, N_r)$ and $m = (m_1, \ldots, m_r)$. Let $(s, M)$, where $M = (M_1, \ldots, M_r)$ and $s = (s_1, \ldots, s_r)$, be any pair such that the following holds:

The parameters
(4.9) satisfy the strong balancing property with respect to $U$, $M$ and $s$;

For and ,
(4.10) (with replaced by ) and , where is such that
(4.11) for all and all satisfying
(4.12)
Suppose that $\xi$ is a minimizer of (1.1). Then, with probability exceeding $1 - s\epsilon$, where $s = s_1 + \ldots + s_r$,
(4.13) 
for some constant $C$ and $K = \max_{k=1,\ldots,r} \{(N_k - N_{k-1})/m_k\}$. If $m_k = N_k - N_{k-1}$ for $1 \le k \le r$ then this holds with probability 1.
This theorem gives a mathematical justification for the use of structured sampling schemes, in contrast to the first compressed sensing results, which promoted the use of random sampling masks.
4.2. Key estimates
In this section we discuss the key estimates that are needed for the proof of Theorem 3.1. They are also interesting in their own right and allow a deeper understanding of the relation between Walsh functions and wavelets.
4.2.1. Local coherence estimate
We start by restating the results about the decay rate of wavelets under the Walsh transform.
Lemma 4.6 ([12]).
Let $f$ be a Hölder continuous function of order $\alpha > 0$. Then there exists a constant $C_f > 0$ such that
(4.14) 
This leads directly to the following estimate of the decay rate of wavelets under the Walsh transform.
Corollary 4.7.
Let $\phi$ be the mother scaling function and $\phi^\#$ its reflected version. Moreover, let $\psi$ be the corresponding mother wavelet. Then we have that
(4.15) 
We denote by $C$ the maximum of the constants involved.
Proof.
The corollary follows directly from the Hölder continuity of the wavelet. ∎
This decay rate is important in many of the following proofs. Next, we use it in the following estimate.
Lemma 4.8.
Proof.
Now we recall the result from [12] about the local coherence. Note that the local coherence is defined differently in [9] and [12].
Theorem 4.9 ([12]).
With these two theorems at hand we can now give an estimate for the local coherence.
Corollary 4.10.
Let be as in (4.5). Then,
(4.23) 
Proof.
First, we have that
(4.24) 
Let then
(4.25) 
and similarly for
(4.26) 
Combining this with the result in Lemma 4.8, we first get, for
(4.27) 