Sampling theory is a core principle in image and signal processing as well as the mathematics of information, data science and inverse problems. Since the early results of Shannon[25, 38, 40] many techniques have been developed, and there are now a myriad of methods available. Moreover, the many applications, such as Magnetic Resonance Imaging (MRI) [30, 21], electron tomography [27, 28], lensless cameras, fluorescence microscopy [39, 36], X-ray computed tomography [14, 35], surface scattering  as well as parametrised PDEs [7, 10, 17], make the field well connected to different areas of the sciences.
A standard sampling model is as follows. We have an element , where is a separable Hilbert space, and the goal is to reconstruct an approximation to from a finite number of linear samples of the form , . In particular, given that the s are linear functionals, we measure the scalar product between and some sampling element , , i.e. . It is important to note that the s cannot be chosen freely, but are dictated by the modality of the sampling device, say an MRI scanner providing Fourier samples or a fluorescence microscope giving binary measurements modelled by Walsh coefficients. A natural question that arises from this setting is, what is the number of samples that is needed for a accurate and stable reconstruction? This can be made explicit by the stable sampling rate (SSR).
To define the SSR we first introduce the sampling space and the reconstruction space. We define the sampling space , meaning the closure of the span. In practice, one can only acquire a finite number of samples, therefore, we denote by the sampling space of the first elements. Similarly, the reconstruction space denoted by is spanned by reconstruction functions , i.e. . As in the sampling case, one has to restrict to a finite reconstruction space, which is denoted by .
The key ingredient in the definition of the SSR is the subspace angle between the subspaces and . In particular,
The orthogonal projection onto the sampling space is denoted by . Mainly, one is interested in the reciprocal value
which, as we will see below, plays a key role in all the error estimates of the two linear algorithms discussed here. Due to the definition of cosine, takes values in . We can now define the stable sampling rate
In particular, the SSR determines how many samples are needed when given
reconstruction vectors in order to boundby .
Although the SSR is developed mainly for linear methods there is a strong connection to non-linear techniques. For example in infinite-dimensional compressed sensing a key condition is the so-called balancing property , which is very similar to the SSR. When the balancing property is satisfied, the truncated matrix is close to an isometry. Hence, it also ensures the stability of the reconstruction.
The stable sampling rate has already been analysed for different settings. A prominent one is the reconstruction from Fourier measurements. In this case the sampling functions are the complex exponentials. For the reconstruction of wavelets it was shown in  that the SSR is linear. The other main measurements are binary. Combined with wavelets they also have a linear SSR .
The purpose of this paper is twofold:
First, we want to compare two linear reconstruction methods: generalized sampling  and the PBDW approach based on data assimilation . We also show how both linear methods completely rely on the SSR in order to be accurate. Additionally, the non-linear extension of generalized sampling, which is presented in , is included in the comparison.
Second, we provide sharp results on the SSR when considering Walsh functions and Haar wavelets. This can be done by realising the common structure of the Walsh functions and the Haar wavelets. Although the SSR is linear when considering Walsh samples and wavelet reconstructions, sharp results on the SSR for arbitrary Daubechies wavelets are still open. The difficulty is that the higher order Daubechies wavelets share very little structural similarities with the Walsh functions.
2. Reconstruction Methods
In terms of reconstruction methods, there are three different properties that are often desired. The most important are obviously accuracy and stability. However, consistency, meaning that the reconstruction will yield the same samples as the true solution, is also often considered an advantage. Below, we will see how the SSR is crucial for the two former properties.
2.1. Reconstruction and Sampling Space
Throughout the paper . Due to the fact that we are dealing with the -dimensional case, we introduce multi-indices to make the notation more readable. Let , , be a multi-index. A natural number is in the context with a multi-index interpreted as a multi-index with the same entry, i.e. . Then, we define the addition of two multi-indices for by the point-wise addition, i.e. and the sum , where . The multiplication of a multi-index with a real number is understood point-wise as well as the division by a multi-index.
We now discuss the sampling space . As pointed out before, we are interested in binary measurements. Binary measurements come from applications such as fluorescence microscopy or single pixel cameras. Mathematically they are modelled by scalarproducts of our function of interest and sampling functions , , which take only the values or . Without loss of generality, we assume in the further scope of this paper that the functions take values
. The former can be attained by taking an additional measurement with the constant function. For practical reasons it is desirable to have a fast transform for these measurements. Therefore, measurement matrices which consist of random matrices are not desirable. Additionally, it has been shown that the reconstruction quality is better with more structured sampling spaces. Therefore, we use Walsh functions to represent the measurements. Walsh functions obey the advantage that they correspond to a fast transform. Moreover, they are the kernels of the Hadamard transform and the analogue or dyadic version of exponential functions. Hence, they are a natural choice to model binary measurements. Now, we define the Walsh functions. The Walsh functions in higher dimensions can be represented by the tensor product of one-dimensional Walsh functions.
Definition 1 (Walsh function ).
Let with be the dyadic expansion of . Analogously, let with . The generalized Walsh functions in are given by
We extend it to functions in by the tensor product for
Notice that the parameter represents the number of zero crossings of the function. For this reason, it is often referred to as sequency, which is similar to frequency in the case of exponential functions. The Walsh functions span the sampling space, i.e. for we have
Moreover, Walsh functions can be extended to negative inputs by .
With the help of the Walsh functions we can define the continuous Walsh transform of a function as in  almost everywhere by
The Walsh functions have the following useful properties. First, they obey the scaling property, i.e. for all and . Second, the multiplicative identity holds, this means , where is the dyadic addition, i.e. the element-wise addition modulo two. These properties are also easily transferred to the Walsh transform. For further information on Walsh functions and transforms see [8, 37, 13].
Direct inversion from a finite number of samples, both in the Fourier and Walsh case, may lead to substantial artefacts such as the Gibbs phenomenon in the Fourier case or block artefacts known from Walsh functions. This can be seen in the numerical experiments in Figures (c)c, (c)c and (b)b. Therefore, it is important to consider reconstruction spaces which represent the data in a better way so that a finite and low-dimensional subspace leads to a good reconstruction. In many different applications, such as image and signal processing and also representations of solution manifolds for PDEs , wavelets have become highly popular alternatives. The reconstruction space is spanned by reconstruction functions , . As it is not possible to deal numerically with an infinite number of samples, it also is not possible to reconstruct infinite number of coefficients. Therefore, we examine the reconstruction space . When is used as an approximation for the solution manifold of a PDE, we are also given the approximation error .
In the following, we use wavelets as the reconstruction space due to their good time and frequency localisation. First, we study the one-dimensional case. Then, we get to higher dimensions. We use the common notation and denote the mother wavelet with and the corresponding scaling function with . These functions are then scaled and translated. This results in the functions
where . The wavelet space at a certain level is given by and the scaling space is given by . Often one is interested in the representation of functions in instead of . For this sake boundary corrected wavelets were introduced in . We are focussing on the version presented in chapter of 
, because they still obey the same smoothness and vanishing moments properties as the wavelets they are derived from on the real line. Additionally, they also remain the multi-resolution analysis property. We now shortly repeat the construction. The scaling space for boundary corrected wavelets is spanned by the original scaling functionand reflections around of the scaling function , i.e.
for Daubechies wavelets of order . The boundary corrected Daubechies wavelets still obey the multi-resolution analysis. Therefore, it is possible to represent the union of the wavelet spaces up to a certain level by the scaling space at this level, i.e.
Hence, it is not necessary for the analysis to have a deeper look in the ordering of the wavelets and their construction for the boundary corrected version as long as we have that the number of coefficients equals the number of elements in that level, i.e. if for some . The reconstruction space is then given by
The higher-dimensional scaling spaces are constructed by the tensor product of the one-dimensional one, such that we get in dimensions
for . Note that the corresponding wavelet space is not simply the tensor product of the one-dimensional wavelets. Fortunately, this is not often a problem in the analysis, since Daubechies wavelets obey the multi-resolution analysis. We will deal with the internal ordering for the one- and two-dimensional case for the Haar wavelets.
2.2. Reconstruction Techniques
In the following, we present two different interesting reconstruction methods, which are both optimal in their setting. We highlight some advantages and disadvantages. Particularly, we see that the performance of both methods depends highly on the subspace angle between the sampling and the reconstruction space. This gives rise to the discussion in §3.1 about the question for which sampling and reconstruction spaces the stable sampling rate is linear.
In [7, 10, 17] the PBDW-method from  is analysed. In contrast to the prominent application for image and signal analysis this method arises from the application with PDEs. One tries to estimate a state of a physical system by solving a parametric family of PDEs
where is a differential operator. The parameter may not be known exactly. Therefore, the value of cannot be attained by simply solving the PDE. Hence, other information is necessary. In most applications, one has access to linear measurements , of the state . This alone is not sufficient to estimate or more ambitiously even the parameter . Fortunately, one also has information about the PDE, which can be used to analyse the solution manifold . The solution manifold is usually quite complicated and not given directly. Hence, approximations are used. The method used in this context is the approximation by a sequence of nested finite subspaces
where the approximation error is known to be for each subspace . There are a lot of different methods that allow us to construct the spaces , such as the reduced basis method [9, 12, 16, 41] and the use of wavelets .
The given setting leads to the goal of trying to merge the data driven and model based information, which leads to the concept of PBDW-method. Let the measurement be given. The idea is to combine the information given by the measurements and the PDE. This means we are searching for an approximation where
The intersection is then the space of possible solutions, i.e. . The aim is to reduce the distance between the approximation and the true solution .
is solved. The solution to the reconstruction problem is then given by the mapping defined by
For the performance analysis, we want to compare the reconstruction algorithm with all other mappings . In the following, let represent any alternative reconstruction algorithm which also only depends on the information of in the sampling space .
We first examine the instance optimality. This means we analyse the error for a given measurement , i.e.
The algorithm which leads to the smallest constant is called instance optimal. It is clear that the error scales with the distance of the element from the reconstruction space . Due to the fact that we normally do not know a priori, this estimate is not very helpful. Hence, one is interested in the performance for any kind of input . Therefore, the performance of a recovery algorithm on any subset is given by
Taking the infimum over the whole class gives the class optimal performance on a given set defined by
It is shown in  that the presented algorithm is both instance and class optimal and gives the estimate
Hence, with this approach we do not get reasonable estimates, if . Moreover, a detailed knowledge about the stable sampling rate is necessary to get a method which can be used in practice. A reconstruction method is of little help if we do not have good error bounds or if the condition number is too high. We will see in the next chapter that the condition number of this reconstruction method is also bounded if the stable sampling rate has been taken into account, see Equation (27). Next, it was shown in  that this estimate can even be improved to
Note that (21) demonstrates how the PBDW-method is dependent on the SSR. Moreover, it was shown in  that the constant cannot be improved. Thus, the estimate is sharp, which further demonstrates the importance of the SSR.
2.2.2. Generalized Sampling
After the examination of the concept of the PBDW-method, we want to study the different reconstruction technique generalized sampling. Here, the approach is a stable improvement of concepts as finite section methods [11, 20, 22, 29]. The main difference is that generalized sampling allows for different dimensions on and , whereas in the finite section method they are always the same. However, the finite section method becomes a special case of generalized sampling when the dimensions of the sampling space and reconstruction space are equal. The method is defined now. Afterwards, we discuss how the equation can be attained by solving the equivalent least square problem.
Definition 2 ().
For and we define the reconstruction method of generalized sampling by
We also refer to as the generalized sampling reconstruction of .
We stress at this point that generalized sampling is also a linear reconstruction scheme. In particular, Equation (22) is equivalent to solving the following linear equation for .
and . The matrix can be seen in Figure 7 for different sampling and reconstruction spaces. The reconstruction is given by . An interesting fact about this is that the matrix is very ill-conditioned for most cases. As pointed out in , in the case of Fourier samples and wavelet reconstructions, the condition number grows exponentially in . This means that this approach is only feasible due to the above mentioned allowance of a different number of samples and reconstructed coefficients. It can be shown that there exists a certain number of samples such that (22) obeys a solution.
Theorem 1 ().
Let . Then, there exists such that for every Equation (22) has a unique solution for all . Moreover, the smallest is the least number such that .
Just as for the PBDW-method, the performance bounds of generalized sampling were studied. Here we observe again that the reconstruction quality highly depends on the subspace angle and on the relation between the data and the reconstruction space.
Theorem 2 ().
Retaining the definitions and notations from this chapter, for all we have
In particular, these bounds are sharp.
Additionally, it was shown in  that the condition number of the reconstruction method can also be controlled by the SSR, i.e.
Hence, not only the accuracy but also the stability is controlled by the SSR.
We conclude that along with mappings that map into the reconstruction space , generalized sampling is also optimal in terms of achieving a low condition number and high accuracy.
2.2.3. Non-linear extension of generalized sampling
In  a consistent approach to generalized sampling was introduced. This means that, in contrast to generalized sampling, the output of the reconstruction method has the same measurements in the sampling space as the input. Let , then it can be represented as . The measurements can be written as , where is the orthogonal projection onto space spanned by the first elements of the canonical bases of , and is defined as in (24). The introduced method solves the non-linear minimization problem
The solution is given by
The measurements of and are naturally equal, because of . Hence, this approach is consistent and maps into the reconstruction space. In contrast to generalized sampling, it is not necessary to decide the number of reconstructed coefficients a priori.
In the setting of arbitrary sampling and reconstruction spaces, it was shown that for every number of reconstructed coefficients there exists some such that the method reconstructs the data correct up to its first coefficients. Hence, this approach is convergent as the error is given by , which goes to zero for . The speed of convergence is then dependent upon the sampling and reconstruction space. This speed was analysed in  for the case of Fourier measurements and wavelet reconstruction. Let the reconstruction space be given by wavelets and the sampling space be the space representing Fourier measurements. Then, for and the following holds:
If for , for some and , the Fourier transform of the scaling function , the wavelet and their first two derivatives decays with
then there exists some constant independent of (but dependent on , and ) such that for , any solution to (28) satisfies
For Dauchechies wavelets among other wavelets, the assumptions of this result are satisfied by their construction. The first assumption is fulfilled by all Daubechies wavelets and the second one is fulfilled for Daubechies wavelets of or more vanishing moments. Numerical experiments suggest that even Daubechies wavelets with less vanishing moments might have a linear relationship .
In this chapter we want to point out the similarity of generalized sampling and the PBDW-method in terms of optimality and the dependence on the SSR. Additionally, we want to discuss the differences concerning the output space and the consistency. We also include the non-linear extension of generalized sampling into this comparison and examine the overall robustness.
as part of the reconstruction, they also have the same condition number . In more detail, we have that the PBDW-method first solves the generalized sampling problem with a solution , see Equation (15). Then, the solution is tweaked to be consistent, i.e.
Moreover, both methods have been proven to be optimal in their setting. This means that linear methods cannot outperform them. Additionally, the bounds are sharp. This underlines the importance of the analysis of the stable sampling rate. Therefore, we dedicate §3 to the investigation of the relation between the dimension of the sampling and the reconstruction space to bound the subspace angle.
Besides the similarities both methods take different parts into consideration for the construction of the algorithms. Generalized sampling ensures that the output is contained in the reconstruction space. This is very desirable when the signal is sparse in the reconstruction bases, because we do not get the artefacts from the measurements in the output signal. Nevertheless, the approach is not consistent. This means that the projection onto the sampling space of the input and output may not be equal.
This is overcome with the non-linear approach presented in . For this gain in consistency one, unfortunately, has to compromise with the robustness. Stability is considered in the setting, and the used definition is equal to the existence of the condition number of the method. This means that the problem is well-conditioned in terms of solving an minimization problem rather than the reconstruction of from samples. Particularly, this means that the problem is not robust against noise.
Finally, the PBDW method is linear and consistent. This is possible due to the fact that the solution is not forced to stay in the reconstruction space . Remark that (21) shows that the error gets smaller with larger even though we keep the reconstruction space the same. Hence, with increasing we are leaving the reconstruction space and get further away as increases. This can be desirable, if the artefacts from the sampling space are not too severe, as it allows us a mix of properties from the sampling and the reconstruction space. Nevertheless, if the function is very sparse in the reconstruction space and has a lot of artefacts in the sampling space, this approach leads to less impressive reconstructions. The impact is further illustrated in §4 Numerical experiments.
3. Stable Sampling Rate
3.1. Linearity of the Stable Sampling Rate
In §2.2 we saw that the subspace angle between the sampling and the reconstruction space controls the reconstruction accuracy. Therefore, one is interested in the relation between the number of samples and the number of reconstructed coefficients, such that the subspace angle is bounded. In detail, we are interested in the stable sampling rate:
The stable sampling rate has been analysed for important cases which appear frequently in practice, i.e. for the Fourier-Wavelet , Fourier-Polynomial bases  and Walsh-Wavelet cases . For the reconstruction with wavelets we get for both Fourier and Walsh sampling that the stable sampling rate is linear. This is the best relation one can wish for and means that the methods discussed above are, up to a constant, as good as if one could access the wavelet coefficients directly. In particular, for the Walsh case we have the following theorem.
Theorem 3 ().
Let and be the sampling and reconstruction space spanned by the d-dimensional Walsh functions and separable boundary wavelets respectively. Moreover, let with . Then for all there exists such that for all we have . In particular one gets . Hence, the relation holds for all .
A natural question which arises from this theorem is whether it is possible to give sharp bounds on the constant . In Figure 7 we can see the stable sampling rate for different Wavelets and the bound . The slope is unknown in most cases and very difficult to find. This comes from the fact that for the majority of wavelets the reconstruction matrix is not perfectly block diagonal, as in (d)d and (f)f. Hence, one has to take the off diagonals into consideration. The numerics suggest that the slope is higher the further away the reconstruction matrix gets from block diagonal. Only for the case of Haar wavelets and Walsh functions we get that the reconstruction matrix is perfectly block diagonal. This can be seen in (b)b. Note, that from the numerical example one may deduce that . This is indeed the case, and the analysis detailed below establishes that for all
3.2. Sharpness for the Haar wavelet - Walsh case
The sharp bound on can be summarised in the following theorem.
Let the sampling space be spanned by the Walsh functions and the reconstruction space by the Haar wavelets in . If for some , then for every we have that the stable sampling rate is the identity, i.e. .
For the proof we first study the behaviour of Haar wavelets in one dimension under the Walsh transform. This way we also get a theoretical argument for the block structure that can be seen in the numerical implementation.
Let be the Haar wavelet. Then, we have that
For the scalarproduct we have
where . We know from  that the function for takes the value on the interval or and on the other one for .
Now, we consider three different cases.
Case 1: . There exist such that . Then, the function is constant on the interval for any . Note that for we have that the rounding error is bounded as follows
Then, we have the interval inclusion
Hence, takes the same value on and . Therefore, the two integrals are equal and the scalarproduct vanishes.
Case 2: . We have for that is equal to on or and on the other. Therefore, we have either
Or in the other case analogously
Now, we are left with the last case.
Case 3: . There exists an integer such that . This is similar to the first case. Moreover, we have for that
With the fact that takes the value on one of the invervals and and on the other, we have that the function takes the values and on half of the interval of . Therefore, the integral vanishes. The same holds true for such that we get the desired result.
Before we prove Theorem 4, we analyse the reconstruction matrix for the Haar wavelet - Walsh case in two dimensions. The number of functions which span the wavelet space in dimensions grows exponentially with . Therefore, we restrict ourselves to two dimensions to underline the main idea. In Figure 11 we can see that the reconstruction matrix in two dimensions has an additional structure in each level. Similar to the one-dimensional case we have perfect block structure for the Haar case and nearly block structure for the higher order wavelets. For the analysis of this phenomena in the Haar wavelet - Walsh case we examine the definition and order of the two dimensional Haar wavelet. Note that this is necessary for the analysis of the reconstruction matrix, instead of the SSR. In two dimensions the wavelets are constructed by the tensor product. Due to the multi-resolution analysis we have that . Hence, the wavelets can be in three different spaces, , and . In the first space wavelets are constructed by the tensor product of the one-dimensional scaling function and the one-dimensional wavelet. For the second one the one-dimensional wavelet and the one-dimensional scaling function are combined by the tensor product. And finally, for the third space we take the tensor products of two one-dimensional wavelets. This results for in
The scaling function is simply the tensor product of two one-dimensional scaling functions:
For the order of the reconstruction matrix, we first take the first level scaling function . Then, we increase by levels, in each level we let first go from and then . Finally, we let such that we get for the order of the wavelets: , , , .
Due to the fact that the higher dimensional wavelets are constructed also by means of the scaling functions, it is necessary to analyse the decay rate for the scaling function as well. Moreover, this is also a main ingredient for the proof of Theorem 4 as we represent the union of the wavelet spaces by the scaling space.
Let be the Haar scaling function. Then, we have that the Walsh transform obeys the following block and decay structure
The scalarproduct can be expressed as an integral over the interval .
We look at the two different cases
Case 1: Remember from Lemma 1 that is constant to or on the interval for . Hence, we get that
Case 2: This follows as in Case 3 of Lemma 1. With the difference that we are looking at the integral over the interval instead of the two integrals and . Nevertheless, they vanish for the same reason. ∎
With this in hand we can now state the structure of the reconstruction matrix in two dimensions.
Let be the Haar wavelet defined as in (49). Then, the Walsh transform has the following property for
and for the third version
After this detailed analysis of the behaviour of the wavelet and the scaling function under the Walsh transform, we are able to proof Theorem 4.
Proof of Theorem 4.
We want to analyse the subspace angle for . In detail, we are interested in bounding by for all . Hence, we try to show that or equally for . Due to the fact that the circle is compact, and the orthogonal projection is continuous there exists , such that we have
The minimal element can be represented by
where the multi-index notation is used. Then, we have that
With (51) we get that this sum vanishes. Hence,
as desired. ∎
3.3. Approximation Rate
In this chapter we discuss the approximation rate for Walsh functions and wavelets. With approximation theory we get a good insight in the representation properties of bases. For a given orthonormal basis for we have that for all it holds
Unfortunately, in most applications we cannot store or access for all but only for for some . Hence, instead of the true function we can only have an approximation . In the field of approximation theory one studies for different representation systems how good this estimate is. Particularly, one is interested in the error
In general, representation systems are desirable with the property that this error decays very fast in . The reason why is that we get a good representation from only a small amount of information. This results in more efficient compression or less measurement time.
In  it is shown that for all continuous functions in the approximation error decays with . The block artefacts are reflected in the poor approximation rate, and it underlines the need of reconstruction methods, which change the basis to achieve better approximation rates.
Daubechies wavelets obey this property. This is analysed in detail in . Consider the representation with Daubechies wavelets of order , and let the data , which we try to represent, be in the Sobolev space for some . Then, we get an improved approximation rate of
Hence, for all Daubechies wavelets of order and all functions for some we get an improved decay rate with Daubechies wavelets in contrast to the representation with Walsh functions.
From Theorem 3 we know that the stable sampling rate is linear for Walsh functions and wavelets and in the following chapter we will even see that the constant is reasonably low, i.e. for Daubechies wavelets. This means that with only a constant increase of measurements we get from a decay rate of to an approximation rate of .