1. Introduction
Sampling theory is a core principle in image and signal processing as well as the mathematics of information, data science and inverse problems. Since the early results of Shannon
[25, 38, 40], many techniques have been developed, and there is now a myriad of methods available. Moreover, the many applications, such as Magnetic Resonance Imaging (MRI) [30, 21], electron tomography [27, 28], lensless cameras, fluorescence microscopy [39, 36], X-ray computed tomography [14, 35], surface scattering [26], as well as parametrised PDEs [7, 10, 17], make the field well connected to different areas of the sciences.

A standard sampling model is as follows. We have an element $f \in \mathcal{H}$, where $\mathcal{H}$ is a separable Hilbert space, and the goal is to reconstruct an approximation to $f$ from a finite number of linear samples of the form $l_i(f)$, $i = 1, \ldots, M$. In particular, given that the $l_i$ are linear functionals, we measure the scalar product between $f$ and some sampling element $s_i \in \mathcal{H}$, i.e. $l_i(f) = \langle f, s_i \rangle$. It is important to note that the $s_i$ cannot be chosen freely, but are dictated by the modality of the sampling device, say an MRI scanner providing Fourier samples or a fluorescence microscope giving binary measurements modelled by Walsh coefficients. A natural question that arises from this setting is: how many samples are needed for an accurate and stable reconstruction? This can be made explicit by the stable sampling rate (SSR).
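To make the abstract model concrete, the following sketch discretises the Hilbert space and computes the $M$ linear measurements together with the projection onto the sampling space. The choice of sampling vectors here is entirely hypothetical (in practice the $s_i$ are dictated by the device):

```python
import numpy as np

# Hypothetical finite-dimensional stand-in for the sampling model:
# f lives in R^n (a discretised Hilbert space) and we only observe
# the inner products <f, s_i> for i = 1, ..., M.
rng = np.random.default_rng(0)
n, M = 256, 32

x = np.linspace(0.0, 1.0, n, endpoint=False)
f = np.sin(2 * np.pi * x) + 0.5 * np.cos(6 * np.pi * x)

# An orthonormal family of sampling vectors (illustrative choice only).
S, _ = np.linalg.qr(rng.standard_normal((n, M)))

samples = S.T @ f      # the M linear measurements l_i(f) = <f, s_i>
P_S_f = S @ samples    # orthogonal projection of f onto span{s_1, ..., s_M}

# The projection is the best approximation to f inside the sampling space;
# reconstruction methods must infer the rest of f from model assumptions.
residual = np.linalg.norm(f - P_S_f)
```

The residual measures exactly the information about $f$ that the $M$ samples cannot see, which is what the reconstruction space is meant to supply.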
To define the SSR we first introduce the sampling space and the reconstruction space. We define the sampling space $\mathcal{S} = \overline{\operatorname{span}}\{s_i : i \in \mathbb{N}\}$, where the bar denotes the closure of the span. In practice, one can only acquire a finite number of samples; therefore, we denote by $\mathcal{S}_M = \operatorname{span}\{s_1, \ldots, s_M\}$ the sampling space of the first $M$ elements. Similarly, the reconstruction space, denoted by $\mathcal{R}$, is spanned by reconstruction functions $r_j$, $j \in \mathbb{N}$, i.e. $\mathcal{R} = \overline{\operatorname{span}}\{r_j : j \in \mathbb{N}\}$. As in the sampling case, one has to restrict to a finite reconstruction space, which is denoted by $\mathcal{R}_N = \operatorname{span}\{r_1, \ldots, r_N\}$.
The key ingredient in the definition of the SSR is the subspace angle $\omega(\mathcal{R}_N, \mathcal{S}_M)$ between the subspaces $\mathcal{R}_N$ and $\mathcal{S}_M$. In particular,

(1) $\cos(\omega(\mathcal{R}_N, \mathcal{S}_M)) = \inf_{r \in \mathcal{R}_N, \, \|r\| = 1} \|P_{\mathcal{S}_M} r\|.$
The orthogonal projection onto the sampling space $\mathcal{S}_M$ is denoted by $P_{\mathcal{S}_M}$. Mainly, one is interested in the reciprocal value

$\mu(\mathcal{R}_N, \mathcal{S}_M) = \frac{1}{\cos(\omega(\mathcal{R}_N, \mathcal{S}_M))},$

which, as we will see below, plays a key role in all the error estimates of the two linear algorithms discussed here. Due to the definition of the cosine, $\mu(\mathcal{R}_N, \mathcal{S}_M)$ takes values in $[1, \infty]$. We can now define the stable sampling rate

(2) $\Theta(N; \theta) = \min\{M \in \mathbb{N} : \mu(\mathcal{R}_N, \mathcal{S}_M) \leq \theta\}, \qquad \theta \in (1, \infty).$
In particular, the SSR determines how many samples $M$ are needed, given $N$ reconstruction vectors, in order to bound $\mu(\mathcal{R}_N, \mathcal{S}_M)$ by $\theta$. Although the SSR was developed mainly for linear methods, there is a strong connection to nonlinear techniques. For example, in infinite-dimensional compressed sensing a key condition is the so-called balancing property [5], which is very similar to the SSR. When the balancing property is satisfied, the truncated matrix is close to an isometry. Hence, it also ensures the stability of the reconstruction.
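In finite dimensions the quantity $\mu$ can be computed directly from a singular value decomposition: if the columns of two matrices are orthonormal bases of $\mathcal{R}_N$ and $\mathcal{S}_M$, then $\cos(\omega)$ is the smallest singular value of their cross-Gram matrix. The sketch below uses random subspaces, purely for illustration:

```python
import numpy as np

def mu(R, S):
    """Reciprocal of cos(omega(ran R, ran S)) for matrices with orthonormal
    columns: the smallest singular value of S^T R equals
    inf_{r in ran R, ||r|| = 1} ||P_S r||."""
    smin = np.linalg.svd(S.T @ R, compute_uv=False).min()
    return np.inf if smin < 1e-15 else 1.0 / smin

rng = np.random.default_rng(1)
n, N = 128, 10
R, _ = np.linalg.qr(rng.standard_normal((n, N)))   # reconstruction space R_N
B, _ = np.linalg.qr(rng.standard_normal((n, n)))   # sampling vectors s_1, s_2, ...

# Enlarging the sampling space can only shrink the angle, so mu decreases;
# once the sampling space is the whole space, the projection is the identity
# on R_N and mu = 1.
mus = [mu(R, B[:, :M]) for M in (10, 40, 128)]
```

This monotonicity is exactly why the SSR is well defined: for fixed $N$, increasing $M$ eventually pushes $\mu$ below any threshold $\theta > 1$.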
The stable sampling rate has already been analysed in different settings. A prominent one is the reconstruction from Fourier measurements, where the sampling functions are the complex exponentials. For the reconstruction with wavelets it was shown in [3] that the SSR is linear. The other main type of measurements is binary; combined with wavelets, these also yield a linear SSR [23].
The purpose of this paper is twofold:

First, we want to compare two linear reconstruction methods: generalized sampling [4] and the PBDW approach based on data assimilation [10]. We also show how both linear methods rely completely on the SSR in order to be accurate. Additionally, the nonlinear extension of generalized sampling presented in [34] is included in the comparison.

Second, we provide sharp results on the SSR for Walsh functions and Haar wavelets. This is possible by exploiting the common structure of the Walsh functions and the Haar wavelets. Although the SSR is linear for Walsh samples and wavelet reconstructions in general, sharp results on the SSR for arbitrary Daubechies wavelets are still open. The difficulty is that higher-order Daubechies wavelets share very little structural similarity with the Walsh functions.
2. Reconstruction Methods
In terms of reconstruction methods, there are three different properties that are often desired. The most important are obviously accuracy and stability. However, consistency, meaning that the reconstruction yields the same samples as the true solution, is also often considered an advantage. Below, we will see how the SSR is crucial for the first two properties.
2.1. Reconstruction and Sampling Space
Throughout the paper, the underlying Hilbert space is $\mathcal{H} = L^2([0,1]^d)$. Due to the fact that we are dealing with the $d$-dimensional case, we introduce multi-indices to make the notation more readable. Let $j = (j_1, \ldots, j_d) \in \mathbb{N}^d$ be a multi-index. In the context of multi-indices, a natural number $k \in \mathbb{N}$ is interpreted as the multi-index with the same entry in every component, i.e. $k = (k, \ldots, k)$. Then, we define the addition of two multi-indices $j, k \in \mathbb{N}^d$ pointwise, i.e. $j + k = (j_1 + k_1, \ldots, j_d + k_d)$, and the sum over a multi-index as $|j| = \sum_{i=1}^{d} j_i$. The multiplication of a multi-index with a real number is understood pointwise, as is the division by a multi-index.
We now discuss the sampling space $\mathcal{S}$. As pointed out before, we are interested in binary measurements, which arise in applications such as fluorescence microscopy or single-pixel cameras. Mathematically, they are modelled by scalar products of our function of interest $f$ with sampling functions that take only two values, say $0$ and $1$. Without loss of generality, we assume in the remainder of this paper that the sampling functions take values in $\{-1, +1\}$; measurements with functions taking values in $\{0, 1\}$ can be converted to this setting by taking an additional measurement with the constant function. For practical reasons it is desirable to have a fast transform for these measurements. Therefore, measurement matrices which consist of random entries are not desirable. Additionally, it has been shown that the reconstruction quality is better with more structured sampling spaces. Therefore, we use Walsh functions to represent the measurements. Walsh functions have the advantage that they correspond to a fast transform. Moreover, they are the kernels of the Hadamard transform and the analogue, or dyadic, version of the exponential functions. Hence, they are a natural choice to model binary measurements. We now define the Walsh functions; in higher dimensions they can be represented by tensor products of one-dimensional Walsh functions.
Definition 1 (Walsh function [19]).
Let $n \in \mathbb{N}$ with $n = \sum_{i=1}^{\infty} n_i 2^{i-1}$, $n_i \in \{0, 1\}$, be the dyadic expansion of $n$. Analogously, let $x \in [0, 1)$ with $x = \sum_{i=1}^{\infty} x_i 2^{-i}$, $x_i \in \{0, 1\}$. The generalized Walsh functions in $L^2([0,1])$ are given by

(3) $\operatorname{Wal}(n, x) = (-1)^{\sum_{i=1}^{\infty} (n_i + n_{i+1}) x_i}.$
We extend this to functions in $L^2([0,1]^d)$ by the tensor product: for $n \in \mathbb{N}^d$ and $x \in [0,1)^d$,

(4) $\operatorname{Wal}(n, x) = \prod_{i=1}^{d} \operatorname{Wal}(n_i, x_i).$
Notice that the parameter $n$ represents the number of zero crossings of the function. For this reason, it is often referred to as the sequency, which plays a role similar to the frequency in the case of exponential functions. The Walsh functions span the sampling space, i.e. we have

(5) $\mathcal{S} = \overline{\operatorname{span}}\{\operatorname{Wal}(n, \cdot) : n \in \mathbb{N}^d\}.$

Moreover, Walsh functions can be extended to negative inputs.
With the help of the Walsh functions we can define the continuous Walsh transform of a function $f \in L^2([0,1]^d)$ as in [19], almost everywhere, by

(6) $\hat{f}^{W}(s) = \langle f, \operatorname{Wal}(s, \cdot) \rangle = \int_{[0,1]^d} f(x) \operatorname{Wal}(s, x) \, dx.$
The Walsh functions have the following useful properties. First, they obey the scaling property, i.e. $\operatorname{Wal}(2^k n, x) = \operatorname{Wal}(n, 2^k x)$ for all $k \in \mathbb{N}$ and $x \in [0, 2^{-k})$. Second, the multiplicative identity holds, i.e. $\operatorname{Wal}(n, x) \operatorname{Wal}(n, y) = \operatorname{Wal}(n, x \oplus y)$, where $\oplus$ is the dyadic addition, i.e. the element-wise addition modulo two of the dyadic expansions. These properties also transfer easily to the Walsh transform. For further information on Walsh functions and transforms see [8, 37, 13].
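A minimal sketch of the one-dimensional Walsh functions, assuming the sequency-ordered definition of Definition 1; the bit fiddling assumes $x$ is a dyadic rational so that its expansion is exact:

```python
def walsh(n, x, bits=32):
    """Wal(n, x) = (-1)^(sum_i (n_i + n_{i+1}) x_i) on [0, 1),
    with n = sum n_i 2^(i-1) and x = sum x_i 2^(-i)."""
    xi = int(x * (1 << bits))          # x_i is bit (bits - i) of xi
    s = 0
    for i in range(1, bits + 1):
        s += ((n >> (i - 1) & 1) + (n >> i & 1)) * (xi >> (bits - i) & 1)
    return -1 if s % 2 else 1

def zero_crossings(n, m=512):
    """Number of sign changes of Wal(n, .) on an m-point dyadic grid."""
    vals = [walsh(n, k / m) for k in range(m)]
    return sum(a != b for a, b in zip(vals, vals[1:]))
```

The sequency interpretation can be checked directly: `Wal(n, .)` has exactly `n` sign changes. The multiplicative identity is also easy to verify for dyadic rationals, since dyadic addition $x \oplus y$ is a bitwise XOR of the expansions.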
Direct inversion from a finite number of samples, both in the Fourier and in the Walsh case, may lead to substantial artefacts, such as the Gibbs phenomenon in the Fourier case or the block artefacts known from Walsh functions. This can be seen in the numerical experiments in Figures (c), (c) and (b). Therefore, it is important to consider reconstruction spaces which represent the data in a better way, so that a finite and low-dimensional subspace leads to a good reconstruction. In many different applications, such as image and signal processing, but also representations of solution manifolds of PDEs [32], wavelets have become a highly popular alternative. The reconstruction space $\mathcal{R}$ is spanned by reconstruction functions $r_j$, $j \in \mathbb{N}$. As it is not possible to deal numerically with an infinite number of samples, it is also not possible to reconstruct an infinite number of coefficients. Therefore, we examine the finite reconstruction space $\mathcal{R}_N = \operatorname{span}\{r_1, \ldots, r_N\}$. When $\mathcal{R}_N$ is used as an approximation to the solution manifold of a PDE, we are also given the approximation error $\epsilon_N$.
In the following, we use wavelets as the reconstruction space due to their good time and frequency localisation. First, we study the one-dimensional case; then we move to higher dimensions. We use the common notation and denote the mother wavelet by $\psi$ and the corresponding scaling function by $\phi$. These functions are then scaled and translated. This results in the functions

(7) $\psi_{j,k}(x) = 2^{j/2} \psi(2^j x - k), \qquad \phi_{j,k}(x) = 2^{j/2} \phi(2^j x - k),$
where $j, k \in \mathbb{Z}$. The wavelet space at level $j$ is given by $W_j = \operatorname{span}\{\psi_{j,k} : k \in \mathbb{Z}\}$ and the scaling space by $V_j = \operatorname{span}\{\phi_{j,k} : k \in \mathbb{Z}\}$. Often one is interested in the representation of functions in $L^2([0,1])$ instead of $L^2(\mathbb{R})$. For this purpose, boundary corrected wavelets were introduced in [15]. We focus on the version presented there, because these wavelets still obey the same smoothness and vanishing-moment properties as the wavelets on the real line they are derived from. Additionally, they also retain the multiresolution analysis property. We now briefly recall the construction. The scaling space for boundary corrected wavelets is spanned by the interior translates of the original scaling function together with corrected functions at the two endpoints, built from the scaling function and its reflection, i.e.

(8) $V_j^b = \operatorname{span}\{\phi^b_{j,k} : k = 0, \ldots, 2^j - 1\}$
for Daubechies wavelets of order $p$. The boundary corrected Daubechies wavelets still obey a multiresolution analysis. Therefore, it is possible to represent the direct sum of the wavelet spaces up to a certain level by the scaling space at that level, i.e.

(9) $V_{J_0}^b \oplus \bigoplus_{j = J_0}^{R-1} W_j^b = V_R^b.$
Hence, it is not necessary for the analysis to take a deeper look at the ordering of the wavelets and their construction in the boundary corrected version, as long as the number of coefficients equals the number of elements up to that level, i.e. if $N = 2^R$ for some $R \in \mathbb{N}$. The reconstruction space is then given by

(10) $\mathcal{R}_N = V_R^b.$
The higher-dimensional scaling spaces are constructed by the tensor product of the one-dimensional ones, such that in $d$ dimensions we get

(11) $V_j^{b,d} = \operatorname{span}\{\phi^b_{j,k_1} \otimes \cdots \otimes \phi^b_{j,k_d} : k_1, \ldots, k_d = 0, \ldots, 2^j - 1\}$

for $j \in \mathbb{N}$.
Note that the corresponding wavelet space is not simply the tensor product of the one-dimensional wavelet spaces. Fortunately, this is often not a problem in the analysis, since Daubechies wavelets obey the multiresolution analysis. We will deal with the internal ordering in the one- and two-dimensional case for the Haar wavelets.
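The multiresolution structure used above can be checked numerically in the Haar case. The grid, level and setup below are illustrative; on a dyadic grid all Haar functions are represented exactly:

```python
import numpy as np

def haar_scaling(j, k, m):
    """phi_{j,k} = 2^(j/2) phi(2^j x - k) sampled on an m-point grid of [0, 1)."""
    t = 2**j * ((np.arange(m) + 0.5) / m) - k
    return np.where((t >= 0) & (t < 1), 2**(j / 2), 0.0)

def haar_wavelet(j, k, m):
    """psi_{j,k} with psi = 1 on [0, 1/2) and -1 on [1/2, 1)."""
    t = 2**j * ((np.arange(m) + 0.5) / m) - k
    return 2**(j / 2) * (np.where((t >= 0) & (t < 0.5), 1.0, 0.0)
                         - np.where((t >= 0.5) & (t < 1), 1.0, 0.0))

m, j = 256, 3
V_next = np.column_stack([haar_scaling(j + 1, k, m) for k in range(2**(j + 1))])
V = np.column_stack([haar_scaling(j, k, m) for k in range(2**j)])
W = np.column_stack([haar_wavelet(j, k, m) for k in range(2**j)])

# MRA refinement: every phi_{j,k} and psi_{j,k} lies in span(V_{j+1}),
# and together they exhaust it, i.e. V_j (+) W_j = V_{j+1}.
Q_next = np.linalg.qr(V_next)[0]   # orthonormal basis of the finer scaling space
```

Projecting the columns of `V` and `W` onto `span(V_next)` reproduces them exactly, and the combined rank equals $\dim V_{j+1} = 2^{j+1}$, which is the content of the refinement identity.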
2.2. Reconstruction Techniques
In the following, we present two different reconstruction methods, both of which are optimal in their respective settings. We highlight some advantages and disadvantages. In particular, we see that the performance of both methods depends strongly on the subspace angle between the sampling and the reconstruction space. This motivates the discussion in §3.1 of the question for which sampling and reconstruction spaces the stable sampling rate is linear.
2.2.1. PBDW-method
In [7, 10, 17] the PBDW-method from [31] is analysed. In contrast to the prominent applications in image and signal analysis, this method arises from applications involving PDEs. One tries to estimate a state $u$ of a physical system by solving a parametric family of PDEs

(12) $P(u; a) = 0,$

where $P$ is a differential operator depending on a parameter $a$. The parameter may not be known exactly. Therefore, the value of $u$ cannot be attained by simply solving the PDE. Hence, other information is necessary. In most applications, one has access to linear measurements $l_i(u)$, $i = 1, \ldots, M$, of the state $u$. This alone is not sufficient to estimate $u$ or, more ambitiously, even the parameter $a$. Fortunately, one also has information about the PDE, which can be used to analyse the solution manifold $\mathcal{M}$ of all states attainable for admissible parameters. The solution manifold is usually quite complicated and not given directly. Hence, approximations are used. The method used in this context is the approximation by a sequence of nested finite-dimensional subspaces
(13) $V_0 \subset V_1 \subset V_2 \subset \cdots \subset \mathcal{H},$

where the approximation error is known to satisfy $\operatorname{dist}(\mathcal{M}, V_K) \leq \epsilon_K$ for each subspace $V_K$. There are a lot of different methods that allow us to construct the spaces $V_K$, such as the reduced basis method [9, 12, 16, 41] and the use of wavelets [32].
The given setting leads to the goal of merging the data-driven and the model-based information, which leads to the concept of the PBDW-method. Let the measurement $w = P_{\mathcal{S}_M} u$ be given. The idea is to combine the information given by the measurements and by the PDE. This means we are searching for an approximation in the set of elements that are consistent with the data, i.e.

(14) $\mathcal{H}_w = \{ v \in \mathcal{H} : P_{\mathcal{S}_M} v = w \}.$

The intersection with the model information is then the space of possible solutions, i.e. $\mathcal{K}_w = \mathcal{H}_w \cap \mathcal{M}$. The aim is to reduce the distance between the approximation and the true solution $u$.
It was shown in [10] that the following approach, introduced in [31], is optimal for this task. First, the minimization problem

(15) $v^* = \operatorname{argmin}_{v \in \mathcal{R}_N} \| w - P_{\mathcal{S}_M} v \|$

is solved. The solution to the reconstruction problem is then given by the mapping $A$ defined by

(16) $A(w) = u^* = v^* + (w - P_{\mathcal{S}_M} v^*).$
For the performance analysis, we want to compare this reconstruction algorithm with arbitrary alternatives. In the following, let $B$ represent any alternative reconstruction algorithm which also depends only on the information of $u$ in the sampling space $\mathcal{S}_M$.
We first examine the instance optimality. This means we analyse the error for a given measurement $w$, i.e. bounds of the form

(17) $\| u - B(w) \| \leq C \operatorname{dist}(u, \mathcal{R}_N).$
The algorithm which leads to the smallest constant $C$ is called instance optimal. It is clear that the error scales with the distance of the element $u$ from the reconstruction space $\mathcal{R}_N$. Due to the fact that we normally do not know $u$ a priori, this estimate is not very helpful on its own. Hence, one is interested in the performance for any kind of input. Therefore, the performance of a recovery algorithm on any subset $\mathcal{K} \subset \mathcal{H}$ is given by

(18) $E(\mathcal{K}, B) = \sup_{u \in \mathcal{K}} \| u - B(P_{\mathcal{S}_M} u) \|.$
Taking the infimum over the whole class of reconstruction algorithms gives the class optimal performance on a given set $\mathcal{K}$, defined by

(19) $E^*(\mathcal{K}) = \inf_{B} E(\mathcal{K}, B).$
It is shown in [10] that the presented algorithm is both instance and class optimal and gives the estimate

(20) $\| u - u^* \| \leq \mu(\mathcal{R}_N, \mathcal{S}_M) \operatorname{dist}(u, \mathcal{R}_N).$
Hence, with this approach we do not get reasonable estimates if $\mu(\mathcal{R}_N, \mathcal{S}_M) = \infty$. Moreover, detailed knowledge about the stable sampling rate is necessary to get a method which can be used in practice: a reconstruction method is of little help if we do not have good error bounds or if the condition number is too high. We will see in the next chapter that the condition number of this reconstruction method is also bounded if the stable sampling rate is taken into account, see Equation (27). Next, it was shown in [31] that this estimate can even be improved to

(21) $\| u - u^* \| \leq \mu(\mathcal{R}_N, \mathcal{S}_M) \operatorname{dist}\big(u, \mathcal{R}_N \oplus (\mathcal{S}_M \cap \mathcal{R}_N^{\perp})\big).$

Note that (21) demonstrates how the PBDW-method depends on the SSR. Moreover, it was shown in [10] that the constant $\mu(\mathcal{R}_N, \mathcal{S}_M)$ cannot be improved. Thus, the estimate is sharp, which further demonstrates the importance of the SSR.
2.2.2. Generalized Sampling
After the examination of the PBDW-method, we now study a different reconstruction technique: generalized sampling. This approach is a stable improvement of concepts such as the finite section method [11, 20, 22, 29]. The main difference is that generalized sampling allows for different dimensions of $\mathcal{S}_M$ and $\mathcal{R}_N$, whereas in the finite section method they are always the same. Hence, the finite section method becomes a special case of generalized sampling when the dimensions of the sampling space and the reconstruction space are equal. We define the method now; afterwards, we discuss how the solution can be attained by solving an equivalent least-squares problem.
Definition 2 ([1]).
For $f \in \mathcal{H}$ and $N, M \in \mathbb{N}$ we define the reconstruction method of generalized sampling as the element $G_{N,M}(f) \in \mathcal{R}_N$ satisfying

(22) $\langle P_{\mathcal{S}_M} G_{N,M}(f), r_j \rangle = \langle P_{\mathcal{S}_M} f, r_j \rangle, \qquad j = 1, \ldots, N.$

We also refer to $G_{N,M}(f)$ as the generalized sampling reconstruction of $f$.
We stress at this point that generalized sampling is a linear reconstruction scheme. In particular, Equation (22) is equivalent to solving the following linear system for the coefficients $\alpha = (\alpha_1, \ldots, \alpha_N)$:

(23) $U^* U \alpha = U^* b,$

where

(24) $U = (u_{ij})_{i = 1, \ldots, M, \, j = 1, \ldots, N}, \qquad u_{ij} = \langle r_j, s_i \rangle,$

and $b = (\langle f, s_1 \rangle, \ldots, \langle f, s_M \rangle)^T$. The matrix $U$ can be seen in Figure 7 for different sampling and reconstruction spaces. The reconstruction is given by $G_{N,M}(f) = \sum_{j=1}^{N} \alpha_j r_j$. An interesting fact is that the square version of this matrix, i.e. $M = N$, is very ill-conditioned in most cases. As pointed out in [3], in the case of Fourier samples and wavelet reconstructions the condition number grows exponentially in the number of coefficients. This means that this approach is only feasible due to the above-mentioned allowance of a different number of samples and reconstructed coefficients. It can be shown that there exists a certain number of samples such that (22) admits a solution.
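The following finite-dimensional sketch mimics this least-squares formulation. The bases are illustrative stand-ins (a discrete Haar basis for reconstruction, orthonormal cosine vectors for sampling; these are not the Walsh functions of this paper). It shows how increasing $M$ for fixed $N$ raises the smallest singular value of $U$, i.e. the cosine of the subspace angle, and drives the reconstruction towards the best approximation in the reconstruction space:

```python
import numpy as np

def discrete_haar(n):
    """Orthogonal Haar matrix; columns are discrete Haar basis vectors."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        m = H.shape[0]
        H = np.vstack([np.kron(H, [1, 1]), np.kron(np.eye(m), [1, -1])]) / np.sqrt(2)
    return H.T

n, N = 128, 16
R = discrete_haar(n)[:, :N]            # reconstruction vectors r_1, ..., r_N

# Illustrative real orthonormal sampling basis (DCT-II vectors).
i = np.arange(n)
C = np.sqrt(2.0 / n) * np.cos(np.pi * np.outer(i + 0.5, i) / n)
C[:, 0] /= np.sqrt(2)

f = np.sqrt(np.linspace(0, 1, n))      # element to reconstruct
best = R @ (R.T @ f)                   # best approximation P_{R_N} f

sigmins, errs = [], []
for M in (16, 32, 128):
    S = C[:, :M]
    U = S.T @ R                        # the truncated matrix from Eq. (24)
    b = S.T @ f                        # the M measurements <f, s_i>
    a, *_ = np.linalg.lstsq(U, b, rcond=None)
    sigmins.append(np.linalg.svd(U, compute_uv=False).min())
    errs.append(np.linalg.norm(R @ a - best))
```

Since adding rows to $U$ can only increase its smallest singular value, oversampling ($M > N$) is exactly the mechanism by which generalized sampling regains stability over the square finite section.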
Theorem 1 ([2]).
Let $N \in \mathbb{N}$. Then, there exists $M_0 \in \mathbb{N}$ such that for every $M \geq M_0$, Equation (22) has a unique solution $G_{N,M}(f)$ for all $f \in \mathcal{H}$. Moreover, the smallest such $M_0$ is the least $M$ for which $\cos(\omega(\mathcal{R}_N, \mathcal{S}_M)) > 0$.
Just as for the PBDW-method, the performance bounds of generalized sampling have been studied. Here we observe again that the reconstruction quality depends strongly on the subspace angle and on the relation between the data and the reconstruction space.
Theorem 2 ([2]).
Retaining the definitions and notations from this chapter, for all $f \in \mathcal{H}$ we have

(25) $\| f - G_{N,M}(f) \| \leq \mu(\mathcal{R}_N, \mathcal{S}_M) \, \| f - P_{\mathcal{R}_N} f \|$

and

(26) $\| G_{N,M}(f) \| \leq \mu(\mathcal{R}_N, \mathcal{S}_M) \, \| f \|.$

In particular, these bounds are sharp.
Additionally, it was shown in [4] that the condition number of the reconstruction method can also be controlled by the SSR, i.e.

(27) $\kappa(G_{N,M}) \leq \mu(\mathcal{R}_N, \mathcal{S}_M).$
Hence, not only the accuracy but also the stability is controlled by the SSR.
We conclude that, among mappings into the reconstruction space $\mathcal{R}_N$, generalized sampling is optimal in terms of achieving a low condition number and high accuracy.
2.2.3. Nonlinear extension of generalized sampling
In [34] a consistent approach to generalized sampling was introduced. This means that, in contrast to generalized sampling, the output of the reconstruction method has the same measurements in the sampling space as the input. Let $f \in \mathcal{R}$; then it can be represented as $f = \sum_{j \in \mathbb{N}} \beta_j r_j$. The measurements can be written as $P_M U \beta$, where $P_M$ is the orthogonal projection onto the space spanned by the first $M$ elements of the canonical basis of $\ell^2(\mathbb{N})$, and $U$ is defined as in (24), now as an infinite matrix. The introduced method solves the nonlinear minimization problem
(28) 
The solution is given by
(29) 
The measurements of the input and of the reconstruction are equal by construction. Hence, this approach is consistent and maps into the reconstruction space. In contrast to generalized sampling, it is not necessary to decide the number of reconstructed coefficients a priori.
In the setting of arbitrary sampling and reconstruction spaces, it was shown that for every number of reconstructed coefficients there exists some number of samples such that the method reconstructs the data correctly up to these coefficients. Hence, this approach is convergent, as the remaining error is the tail of the coefficient expansion, which goes to zero. The speed of convergence then depends on the sampling and the reconstruction space. This speed was analysed in [34] for the case of Fourier measurements and wavelet reconstructions. Let the reconstruction space be given by wavelets and the sampling space be the space representing Fourier measurements. Then the following holds:

If, for some parameters, the Fourier transform of the scaling function decays with

(30)

then there exists some constant, independent of the truncation but dependent on these parameters, such that for sufficiently many samples any solution to (28) satisfies
(31) 
If, additionally, the Fourier transforms of the scaling function, the wavelet and their first two derivatives decay with

(32)

then there exists some constant, independent of the truncation but dependent on these parameters, such that for sufficiently many samples any solution to (28) satisfies
(33)
For Daubechies wavelets, among others, the assumptions of this result are satisfied by construction. The first assumption is fulfilled by all Daubechies wavelets, and the second one is fulfilled for Daubechies wavelets with sufficiently many vanishing moments. Numerical experiments suggest that even Daubechies wavelets with fewer vanishing moments might exhibit a linear relationship [34].
2.2.4. Comparison
In this chapter we point out the similarities between generalized sampling and the PBDW-method in terms of optimality and their dependence on the SSR. Additionally, we discuss the differences concerning the output space and consistency. We also include the nonlinear extension of generalized sampling in this comparison and examine the overall robustness.
In Equation (21) and Theorem 2 we see that the error bounds of generalized sampling and of the PBDW-method both depend on $\mu(\mathcal{R}_N, \mathcal{S}_M)$. Moreover, since both methods solve the same least-squares problem

(34) $v^* = \operatorname{argmin}_{v \in \mathcal{R}_N} \| w - P_{\mathcal{S}_M} v \|$

as part of the reconstruction, they also have the same condition number. In more detail, the PBDW-method first solves the generalized sampling problem with solution $v^*$, see Equation (15). Then, the solution is adjusted to be consistent, i.e.

(35) $u^* = v^* + (w - P_{\mathcal{S}_M} v^*).$
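In matrix form, the shared least-squares step and the consistency correction can be sketched as follows (random orthonormal bases, purely illustrative); the corrected solution reproduces the measurements exactly and is never further from the truth than the generalized sampling solution:

```python
import numpy as np

# Finite-dimensional sketch: S has orthonormal columns (sampling space),
# R has orthonormal columns (reconstruction space).
rng = np.random.default_rng(2)
n, N, M = 128, 8, 24
R, _ = np.linalg.qr(rng.standard_normal((n, N)))
S, _ = np.linalg.qr(rng.standard_normal((n, M)))
f = rng.standard_normal(n)

b = S.T @ f                                  # measurements, i.e. w = P_S f
U = S.T @ R                                  # cross-Gram matrix
a, *_ = np.linalg.lstsq(U, b, rcond=None)    # shared least-squares step
v_star = R @ a                               # generalized sampling output in R_N
u_star = v_star + S @ (b - U @ a)            # PBDW: add the consistency correction

# u_star reproduces the data exactly: P_S u_star = P_S f, while v_star
# generally does not.
```

The error improvement is immediate from Pythagoras: $f - u^*$ is the part of $f - v^*$ orthogonal to the sampling space, so its norm cannot exceed $\|f - v^*\|$.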
Moreover, both methods have been proven to be optimal in their setting. This means that linear methods cannot outperform them. Additionally, the bounds are sharp. This underlines the importance of the analysis of the stable sampling rate. Therefore, we dedicate §3 to the investigation of the relation between the dimensions of the sampling and the reconstruction space needed to bound the subspace angle.
Besides the similarities, the two methods take different aspects into consideration in the construction of the algorithms. Generalized sampling ensures that the output is contained in the reconstruction space. This is very desirable when the signal is sparse in the reconstruction basis, because the artefacts from the measurements do not enter the output signal. Nevertheless, the approach is not consistent: the projections onto the sampling space of the input and of the output may differ.
This is overcome with the nonlinear approach presented in [34]. For this gain in consistency one unfortunately has to compromise on robustness. Stability is considered in that setting, and the definition used there is equivalent to the existence of the condition number of the method. This means that the problem is well-conditioned in terms of solving a minimization problem, rather than in terms of reconstructing the function from its samples. In particular, this means that the problem is not necessarily robust against noise.
Finally, the PBDW-method is linear and consistent. This is possible due to the fact that the solution is not forced to stay in the reconstruction space $\mathcal{R}_N$. Remark that (21) shows that the error gets smaller with larger $M$, even though we keep the reconstruction space the same. Hence, with increasing $M$ we are leaving the reconstruction space, and we move further away from it as $M$ increases. This can be desirable if the artefacts from the sampling space are not too severe, as it allows a mix of properties from the sampling and the reconstruction space. Nevertheless, if the function is very sparse in the reconstruction space and has a lot of artefacts in the sampling space, this approach leads to less impressive reconstructions. The impact is further illustrated in §4, Numerical experiments.
3. Stable Sampling Rate
3.1. Linearity of the Stable Sampling Rate
In §2.2 we saw that the subspace angle between the sampling and the reconstruction space controls the reconstruction accuracy. Therefore, one is interested in the relation between the number of samples and the number of reconstructed coefficients such that the subspace angle stays bounded. In detail, we are interested in the stable sampling rate

(36) $\Theta(N; \theta) = \min\{M \in \mathbb{N} : \mu(\mathcal{R}_N, \mathcal{S}_M) \leq \theta\}.$
The stable sampling rate has been analysed in important cases which appear frequently in practice, i.e. the Fourier-wavelet case [3], the Fourier-polynomial case [24] and the Walsh-wavelet case [23]. For reconstructions with wavelets, one gets for both Fourier and Walsh sampling that the stable sampling rate is linear. This is the best relation one can hope for, and it means that the methods discussed above are, up to a constant, as good as if one could access the wavelet coefficients directly. In particular, for the Walsh case we have the following theorem.
Theorem 3 ([23]).
Let $\mathcal{S}$ and $\mathcal{R}$ be the sampling and reconstruction spaces spanned by the $d$-dimensional Walsh functions and separable boundary wavelets, respectively. Moreover, let $\theta \in (1, \infty)$ and $N = 2^{dR}$ with $R \in \mathbb{N}$. Then there exists $S_\theta > 0$ such that for all $M \geq S_\theta N$ we have $\mu(\mathcal{R}_N, \mathcal{S}_M) \leq \theta$. In particular, one gets $\Theta(N; \theta) \leq S_\theta N$. Hence, the relation $\Theta(N; \theta) = \mathcal{O}(N)$ holds for all $\theta \in (1, \infty)$.
A natural question which arises from this theorem is whether it is possible to give sharp bounds on the constant $S_\theta$. In Figure 7 we can see the stable sampling rate for different wavelets and the corresponding linear bound. The slope is unknown in most cases and very difficult to find. This comes from the fact that for the majority of wavelets the reconstruction matrix is not perfectly block diagonal, as in (d) and (f). Hence, one has to take the off-diagonal blocks into consideration. The numerics suggest that the slope is larger the further the reconstruction matrix is from block diagonal. Only in the case of Haar wavelets and Walsh functions is the reconstruction matrix perfectly block diagonal. This can be seen in (b). Note that from the numerical example one may deduce that $\Theta(N; \theta) = N$. This is indeed the case, and the analysis detailed below establishes this for all $\theta \in (1, \infty)$.
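The perfect block structure in the Haar-Walsh case can be reproduced numerically. Sampling the first $N$ sequency-ordered Walsh functions (using the definition sketched after Definition 1) and the first $N$ discrete Haar basis vectors on a dyadic grid, the cross-Gram matrix comes out exactly orthogonal, so its smallest singular value is $1$ and $\mu(\mathcal{R}_N, \mathcal{S}_N) = 1$ in this discretised model:

```python
import numpy as np

def walsh(n, x, bits=20):
    """Sequency-ordered Walsh function Wal(n, x); x must be a dyadic rational."""
    xi = int(x * (1 << bits))
    s = sum(((n >> (i - 1) & 1) + (n >> i & 1)) * (xi >> (bits - i) & 1)
            for i in range(1, bits + 1))
    return -1.0 if s % 2 else 1.0

def discrete_haar(m):
    """Orthogonal Haar matrix; columns are discrete Haar basis vectors."""
    H = np.array([[1.0]])
    while H.shape[0] < m:
        k = H.shape[0]
        H = np.vstack([np.kron(H, [1, 1]), np.kron(np.eye(k), [1, -1])]) / np.sqrt(2)
    return H.T

m, N = 64, 16
W = np.array([[walsh(n, k / m) for k in range(m)] for n in range(N)]) / np.sqrt(m)
H = discrete_haar(m)[:, :N]

G = W @ H    # cross-Gram of the first N Walsh vs the first N Haar vectors
# Both families are orthonormal bases of the same space of piecewise constant
# functions on N dyadic blocks, so G is orthogonal and all its singular
# values equal 1: the stable sampling rate is the identity here.
sigma = np.linalg.svd(G, compute_uv=False)
```

This is the discrete counterpart of the sharpness result proved below: no oversampling at all is needed in the Haar-Walsh case.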
3.2. Sharpness for the Haar wavelet-Walsh case
The sharp bound on the SSR can be summarised in the following theorem.
Theorem 4.
Let the sampling space $\mathcal{S}$ be spanned by the Walsh functions and the reconstruction space $\mathcal{R}$ by the Haar wavelets in $L^2([0,1]^d)$. If $N = 2^{dR}$ for some $R \in \mathbb{N}$, then for every $\theta \in (1, \infty)$ we have that the stable sampling rate is the identity, i.e. $\Theta(N; \theta) = N$.
For the proof we first study the behaviour of Haar wavelets in one dimension under the Walsh transform. This way we also get a theoretical explanation for the block structure observed in the numerical experiments.
Lemma 1.
Let $\psi$ be the Haar wavelet. Then, we have that

(37) $|\langle \psi_{j,k}, \operatorname{Wal}(n, \cdot) \rangle| = \begin{cases} 2^{-j/2} & 2^j \leq n < 2^{j+1}, \\ 0 & \text{otherwise}. \end{cases}$
Proof.
For the scalar product we have

(38) $\langle \psi_{j,k}, \operatorname{Wal}(n, \cdot) \rangle = \int_0^1 \psi_{j,k}(x) \operatorname{Wal}(n, x) \, dx$

(39) $= 2^{j/2} \left( \int_{2^{-j}k}^{2^{-j}(k + 1/2)} \operatorname{Wal}(n, x) \, dx - \int_{2^{-j}(k + 1/2)}^{2^{-j}(k + 1)} \operatorname{Wal}(n, x) \, dx \right),$

where $k = 0, \ldots, 2^j - 1$. We know from [6] that for $2^m \leq n < 2^{m+1}$ the function $\operatorname{Wal}(n, \cdot)$ is constant on dyadic intervals of length $2^{-m-1}$ and takes the value $+1$ on one half of each dyadic interval of length $2^{-m}$ and $-1$ on the other half.
Now, we consider three different cases.
Case 1: $n < 2^j$.
There exists $m < j$ such that $2^m \leq n < 2^{m+1}$ (for $n = 0$ the argument is the same, since $\operatorname{Wal}(0, \cdot) \equiv 1$). Then the function $\operatorname{Wal}(n, \cdot)$ is constant on dyadic intervals of length $2^{-m-1}$, and since $m + 1 \leq j$, it is in particular constant on the support $[2^{-j}k, 2^{-j}(k+1))$ of $\psi_{j,k}$. Hence, $\operatorname{Wal}(n, \cdot)$ takes the same value on $[2^{-j}k, 2^{-j}(k + 1/2))$ and on $[2^{-j}(k + 1/2), 2^{-j}(k+1))$. Therefore, the two integrals in (39) are equal and the scalar product vanishes.
Case 2: $2^j \leq n < 2^{j+1}$.
We have for such $n$ that $\operatorname{Wal}(n, \cdot)$ is equal to $+1$ on one of the two halves $[2^{-j}k, 2^{-j}(k + 1/2))$ and $[2^{-j}(k + 1/2), 2^{-j}(k+1))$ of the support and $-1$ on the other. Therefore, we have either

(45) $\langle \psi_{j,k}, \operatorname{Wal}(n, \cdot) \rangle = 2^{j/2} \big( 2^{-j-1} + 2^{-j-1} \big)$

(46) $= 2^{-j/2},$

or, in the other case analogously,

(47) $\langle \psi_{j,k}, \operatorname{Wal}(n, \cdot) \rangle = -2^{-j/2}.$
Now, we are left with the last case.
Case 3: $n \geq 2^{j+1}$.
There exists an integer $m \geq j + 1$ such that $2^m \leq n < 2^{m+1}$. This is similar to the first case. Moreover, we have for such $n$ that

(48) $\int_{I} \operatorname{Wal}(n, x) \, dx = 0 \quad \text{for every dyadic interval } I \text{ of length } 2^{-m}.$

With the fact that $\operatorname{Wal}(n, \cdot)$ takes the value $+1$ on one half of each such interval and $-1$ on the other, the function takes the values $+1$ and $-1$ each on half of the interval $[2^{-j}k, 2^{-j}(k + 1/2))$. Therefore, the integral vanishes. The same holds true for $[2^{-j}(k + 1/2), 2^{-j}(k+1))$, such that we get the desired result.
∎
Before we prove Theorem 4, we analyse the reconstruction matrix for the Haar wavelet-Walsh case in two dimensions. The number of functions which span the wavelet space in $d$ dimensions grows exponentially in $d$. Therefore, we restrict ourselves to two dimensions to illustrate the main idea. In Figure 11 we can see that the reconstruction matrix in two dimensions has an additional structure in each level. Similar to the one-dimensional case, we have a perfect block structure for the Haar case and a near block structure for the higher-order wavelets. For the analysis of this phenomenon in the Haar wavelet-Walsh case we examine the definition and ordering of the two-dimensional Haar wavelets. Note that this is necessary for the analysis of the reconstruction matrix rather than of the SSR. In two dimensions the wavelets are constructed by the tensor product. Due to the multiresolution analysis we have $V_{j+1} = V_j \oplus W_j$. Hence, the wavelets can lie in three different spaces: $V_j \otimes W_j$, $W_j \otimes V_j$ and $W_j \otimes W_j$. In the first space, wavelets are constructed by the tensor product of the one-dimensional scaling function and the one-dimensional wavelet. For the second one, the one-dimensional wavelet and the one-dimensional scaling function are combined by the tensor product. And finally, for the third space we take the tensor product of two one-dimensional wavelets. This results, for $j \in \mathbb{N}$ and $k = (k_1, k_2)$ with $k_1, k_2 = 0, \ldots, 2^j - 1$, in

(49) $\psi^{(1)}_{j,k} = \phi_{j,k_1} \otimes \psi_{j,k_2}, \qquad \psi^{(2)}_{j,k} = \psi_{j,k_1} \otimes \phi_{j,k_2}, \qquad \psi^{(3)}_{j,k} = \psi_{j,k_1} \otimes \psi_{j,k_2}.$

The scaling function is simply the tensor product of two one-dimensional scaling functions:

(50) $\phi_{j,k} = \phi_{j,k_1} \otimes \phi_{j,k_2}.$

For the order of the reconstruction matrix, we first take the first-level scaling function. Then, we increase by levels; within each level we let the translations run, such that we get for the order of the functions: $\phi_{j,k}$, $\psi^{(1)}_{j,k}$, $\psi^{(2)}_{j,k}$, $\psi^{(3)}_{j,k}$.
Due to the fact that the higher-dimensional wavelets are also constructed by means of the scaling functions, it is necessary to analyse the decay rate for the scaling function as well. Moreover, this is also a main ingredient in the proof of Theorem 4, as we represent the union of the wavelet spaces by the scaling space.
Lemma 2.
Let $\phi$ be the Haar scaling function. Then, we have that the Walsh transform obeys the following block and decay structure:

(51) $|\langle \phi_{j,k}, \operatorname{Wal}(n, \cdot) \rangle| = \begin{cases} 2^{-j/2} & n < 2^j, \\ 0 & n \geq 2^j. \end{cases}$
Proof.
The scalar product can be expressed as an integral over the interval $[2^{-j}k, 2^{-j}(k+1))$:

(52) $\langle \phi_{j,k}, \operatorname{Wal}(n, \cdot) \rangle = \int_0^1 \phi_{j,k}(x) \operatorname{Wal}(n, x) \, dx$

(53) $= 2^{j/2} \int_{2^{-j}k}^{2^{-j}(k+1)} \operatorname{Wal}(n, x) \, dx.$

We look at the two different cases.
Case 1: $n < 2^j$. Remember from Lemma 1 that $\operatorname{Wal}(n, \cdot)$ is constant, equal to $+1$ or $-1$, on the interval $[2^{-j}k, 2^{-j}(k+1))$ for $n < 2^j$. Hence, we get that

(54) $|\langle \phi_{j,k}, \operatorname{Wal}(n, \cdot) \rangle| = 2^{j/2} \cdot 2^{-j} = 2^{-j/2}.$

Case 2: $n \geq 2^j$. This follows as in Case 3 of Lemma 1, with the difference that we are looking at the integral over the interval $[2^{-j}k, 2^{-j}(k+1))$ instead of the two integrals over its halves. Nevertheless, the integral vanishes for the same reason. ∎
With this in hand we can now state the structure of the reconstruction matrix in two dimensions.
Corollary 1.
Let $\psi^{(1)}_{j,k}, \psi^{(2)}_{j,k}, \psi^{(3)}_{j,k}$ be the two-dimensional Haar wavelets defined as in (49). Then, the Walsh transform has the following property: for $n = (n_1, n_2) \in \mathbb{N}^2$,

(55) $\langle \psi^{(1)}_{j,k}, \operatorname{Wal}(n, \cdot) \rangle = 0 \quad \text{unless } n_1 < 2^j \text{ and } 2^j \leq n_2 < 2^{j+1},$

(56) $\langle \psi^{(2)}_{j,k}, \operatorname{Wal}(n, \cdot) \rangle = 0 \quad \text{unless } 2^j \leq n_1 < 2^{j+1} \text{ and } n_2 < 2^j,$

and for the third version

(57) $\langle \psi^{(3)}_{j,k}, \operatorname{Wal}(n, \cdot) \rangle = 0 \quad \text{unless } 2^j \leq n_1, n_2 < 2^{j+1}.$
After this detailed analysis of the behaviour of the wavelets and the scaling function under the Walsh transform, we are able to prove Theorem 4.
Proof of Theorem 4.
We want to analyse the subspace angle for $M = N = 2^{dR}$. In detail, we are interested in bounding $\mu(\mathcal{R}_N, \mathcal{S}_N)$ by $\theta$ for all $\theta \in (1, \infty)$. Hence, we try to show that $\cos(\omega(\mathcal{R}_N, \mathcal{S}_N)) = 1$, or equivalently $\| P_{\mathcal{S}_N} r \| = 1$ for all $r \in \mathcal{R}_N$ with $\| r \| = 1$. Due to the fact that the unit sphere is compact and the orthogonal projection is continuous, there exists $r^* \in \mathcal{R}_N$ with $\| r^* \| = 1$ such that we have

(58) $\cos(\omega(\mathcal{R}_N, \mathcal{S}_N)) = \| P_{\mathcal{S}_N} r^* \|.$

The minimal element $r^*$ can be represented by

(59) $r^* = \sum_{k \in \{0, \ldots, 2^R - 1\}^d} \alpha_k \phi_{R,k},$

where the multi-index notation is used. Then, we have that

(60) $\| P_{\mathcal{S}_N} r^* \|^2 = \sum_{n \in \{0, \ldots, 2^R - 1\}^d} |\langle r^*, \operatorname{Wal}(n, \cdot) \rangle|^2$

(61) $= \| r^* \|^2 - \sum_{n \notin \{0, \ldots, 2^R - 1\}^d} |\langle r^*, \operatorname{Wal}(n, \cdot) \rangle|^2.$

With (51) we get that this last sum vanishes. Hence,

(62) $\| P_{\mathcal{S}_N} r^* \| = \| r^* \| = 1,$

as desired. ∎
3.3. Approximation Rate
In this chapter we discuss the approximation rate for Walsh functions and wavelets. Approximation theory gives good insight into the representation properties of bases. For a given orthonormal basis $\{b_i\}_{i \in \mathbb{N}}$ of $\mathcal{H}$ we have for all $f \in \mathcal{H}$ that

$f = \sum_{i \in \mathbb{N}} \langle f, b_i \rangle b_i.$

Unfortunately, in most applications we cannot store or access $\langle f, b_i \rangle$ for all $i \in \mathbb{N}$, but only for $i = 1, \ldots, K$ for some $K \in \mathbb{N}$. Hence, instead of the true function $f$ we only have the approximation $f_K = \sum_{i=1}^{K} \langle f, b_i \rangle b_i$. In the field of approximation theory one studies how good this approximation is for different representation systems. Particularly, one is interested in the error

(63) $\| f - f_K \|.$

In general, representation systems with the property that this error decays very fast in $K$ are desirable, because we then get a good representation from only a small amount of information. This results in more efficient compression or less measurement time.
In [6] it is shown that for continuous functions in general, the approximation error with Walsh functions decays only slowly. The block artefacts are reflected in this poor approximation rate, and it underlines the need for reconstruction methods which change the basis to achieve better approximation rates.
Daubechies wavelets obey this property; this is analysed in detail in [33]. Consider the representation with Daubechies wavelets of order $p$, and let the data $f$ which we try to represent be in the Sobolev space $H^s$ for some $0 < s < p$. Then, we get an improved approximation rate of

(64) $\| f - f_K \| = \mathcal{O}(K^{-s}).$

Hence, for all Daubechies wavelets of order $p$ and all functions $f \in H^s$ for some $s > 0$, we get an improved decay rate in contrast to the representation with Walsh functions.

From Theorem 3 we know that the stable sampling rate is linear for Walsh functions and wavelets, and in the following chapter we will even see that the constant is reasonably low for Daubechies wavelets. This means that with only a constant-factor increase in the number of measurements, we move from the poor approximation rate of the Walsh system to the improved approximation rate of the wavelet system.
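As a rough numerical illustration of the Walsh-side rate, projecting a smooth function onto the span of the first $N$ Walsh functions (for $N = 2^r$ this span consists of the piecewise constant functions on $N$ dyadic blocks) gives an $L^2$ error decaying like $N^{-1}$. The grid and test function below are illustrative:

```python
import numpy as np

m = 4096
x = (np.arange(m) + 0.5) / m
f = np.sin(np.pi * x)

def proj_error(N):
    """L2 error of the projection of f onto piecewise constants on N blocks,
    which for N = 2^r equals the span of the first N Walsh functions."""
    blocks = f.reshape(N, m // N)
    approx = np.repeat(blocks.mean(axis=1), m // N)
    return np.sqrt(np.mean((f - approx) ** 2))

errs = [proj_error(N) for N in (8, 16, 32, 64)]
ratios = [e2 / e1 for e1, e2 in zip(errs, errs[1:])]   # each close to 1/2
```

Doubling $N$ roughly halves the error, i.e. a first-order rate; a higher-order wavelet system would instead gain a factor of about $2^{-s}$ per doubling for $f \in H^s$, which is the improvement the section describes.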