# Sparse non-negative super-resolution - simplified and stabilised

The convolution of a discrete measure, $x=\sum_{i=1}^{k}a_i\,\delta_{t_i}$, with a local window function, $\phi(s-t)$, is a common model for a measurement device whose resolution is substantially lower than that of the objects being observed. Super-resolution concerns localising the point sources $\{a_i,t_i\}_{i=1}^{k}$ with an accuracy beyond the essential support of $\phi(s-t)$, typically from $m$ samples $y(s_j)=\sum_{i=1}^{k}a_i\,\phi(s_j-t_i)+\eta_j$, where $\eta_j$ indicates an inexactness in the sample value. We consider the setting of $x$ being non-negative and seek to characterise all non-negative measures approximately consistent with the samples. We first show that $x$ is the unique non-negative measure consistent with the samples provided the samples are exact, i.e. $\eta_j=0$, $m\ge 2k+1$ samples are available, and $\phi(s-t)$ generates a Chebyshev system. This holds independently of how close together the sample locations are and does not rely on any regulariser beyond non-negativity; as such, it extends and clarifies the work of Schiebinger et al. and De Castro et al., who achieve the same results but require a total variation regulariser, which we show is unnecessary. Moreover, we characterise non-negative solutions $\hat{x}$ consistent with the samples within the bound $\sum_{j=1}^{m}\eta_j^2<\delta^2$. Any such non-negative measure is within $O(\delta^{1/7})$ of the discrete measure $x$ generating the samples in the generalised Wasserstein distance, and the two converge to one another as $\delta$ approaches zero. We also show how to make these general results, valid for windows that form a Chebyshev system, precise for the case of $\phi(s-t)$ being a Gaussian window. The main innovation of these results is that non-negativity alone is sufficient to localise point sources beyond the essential sensor resolution.


## 1 Introduction

Super-resolution concerns recovering a resolution beyond the essential size of the point spread function of a sensor. For instance, a particularly stylised example concerns multiple point sources which, because of the finite resolution or bandwidth of the sensor, may not be visually distinguishable. Various instances of this problem exist in applications such as astronomy [3], imaging in chemistry, medicine and neuroscience [4, 5, 6, 7, 8, 9, 10, 11], spectral estimation [12, 13], geophysics [14], and system identification [15]. Often in these applications much is known about the point spread function of the sensor, or it can be estimated, and, given such model information, it is possible to identify point source locations with accuracy substantially below the essential width of the sensor point spread function. Recently there has been substantial interest from the mathematical community in posing algorithms and proving super-resolution guarantees in this setting, see for instance [16, 17, 18, 19, 20, 21, 22, 23]. Typically these approaches borrow notions from compressed sensing [24, 25, 26]. In particular, the aforementioned contributions to super-resolution consider what is known as Total Variation norm minimisation over measures which are consistent with the samples. In this manuscript we show first that, for suitable point spread functions, such as the Gaussian, any discrete non-negative measure composed of point sources is uniquely defined from its samples, and moreover that this uniqueness is independent of the separation between the point sources. We then show that by simply imposing non-negativity, which is typical in many applications, any non-negative measure suitably consistent with the samples is similarly close to the discrete non-negative measure which would generate the noise-free samples. These results substantially simplify results by [1, 2] and show that, while regularisers such as Total Variation may be particularly effective, in the setting of non-negative point sources such regularisers are not necessary to achieve stability.

### 1.1 Problem setup

Throughout this manuscript we consider non-negative measures in relation to discrete measures. To be concrete, let $x$ be a $k$-discrete non-negative Borel measure supported on the interval $I$, given by

$$x=\sum_{i=1}^{k}a_i\,\delta_{t_i} \quad \text{with } a_i>0 \text{ and } t_i\in\operatorname{int}(I) \text{ for all } i. \tag{1}$$

Consider also real-valued and continuous functions $\{\phi_j\}_{j=1}^{m}$ on $I$ and let $\{y_j\}_{j=1}^{m}$ be the possibly noisy measurements collected from $x$ by convolving against the sampling functions $\{\phi_j\}$:

$$y_j=\int_I \phi_j(t)\,x(\mathrm{d}t)+\eta_j=\sum_{i=1}^{k}a_i\,\phi_j(t_i)+\eta_j, \tag{2}$$

where $\eta=[\eta_1\cdots\eta_m]^T$ with $\|\eta\|_2\le\delta$ can represent additive noise. Organising the samples from (2) in matrix notation by letting

$$y:=[y_1\cdots y_m]^T\in\mathbb{R}^m, \qquad \Phi(t):=[\phi_1(t)\cdots\phi_m(t)]^T\in\mathbb{R}^m \tag{3}$$

allows us to state the program we investigate:

$$\text{find a non-negative Borel measure } \hat{x} \text{ on } I \text{ such that } \left\|y-\int_I\Phi(t)\,\hat{x}(\mathrm{d}t)\right\|_2\le\delta', \tag{4}$$

with $\delta'\ge\delta$. Herein we characterise non-negative measures consistent with the measurements (2) in relation to the discrete measure (1). That is, we consider any non-negative Borel measure $\hat{x}$ satisfying Program (4) (an equivalent formulation of Program (4) minimises the data misfit over all non-negative measures on $I$ without any constraints; in this context, however, we find it somewhat more intuitive to work with Program (4), particularly considering the importance of the case $\delta'=0$) and show that any such $\hat{x}$ is close to the measure $x$ given by (1) in an appropriate metric, see Theorems 4, 5, 11, 12 and 13. Moreover, we show that the $x$ from (1) is the unique solution to Program (4) when $\delta'=0$; e.g. in the setting of exact samples, $\eta_j=0$ for all $j$. Program (4) is particularly notable in that there is no regulariser of $\hat{x}$ beyond imposing non-negativity and, rather than specify an algorithm to select a $\hat{x}$ which satisfies Program (4), we consider all admissible solutions. The admissible solutions of Program (4) are determined by the source and sample locations, which we denote as

$$T=\{t_i\}_{i=1}^{k}\subset\operatorname{int}(I) \quad \text{and} \quad S=\{s_j\}_{j=1}^{m}\subseteq I \tag{5}$$

respectively, as well as the particular functions $\{\phi_j\}_{j=1}^{m}$ used to sample the $k$-sparse non-negative measure $x$ from (1). Lastly, we introduce the notions of minimum separation and sample proximity, which we use to characterise solutions of Program (4).

###### Definition 1.

(Minimum separation and sample proximity) For finite $T\subset\operatorname{int}(I)$, let $\Delta(T)$ be the minimum separation between the points in $T$ along with the endpoints of $I$, namely

$$\Delta(T)=\min_{T_i,T_j\in\widetilde{T},\,i\ne j}|T_i-T_j|, \tag{6}$$

where $\widetilde{T}$ collects the points of $T$ together with the endpoints of $I$.

We define the sample proximity to be the smallest number $\lambda$ such that, for each source location $t_i$, there exists a closest sample location $s_{l(i)}$ to $t_i$ with

$$|t_i-s_{l(i)}|\le\lambda\,\Delta(T). \tag{7}$$
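As a concrete illustration, both quantities in Definition 1 can be computed directly from the source and sample locations. The sketch below (in Python, with illustrative values; the helper names are ours, not the paper's) assumes $I=[0,1]$:

```python
import numpy as np

def minimum_separation(T, interval=(0.0, 1.0)):
    """Delta(T) of (6): smallest gap between the sources in T,
    with the endpoints of the interval included, as in Definition 1."""
    pts = np.sort(np.concatenate((np.atleast_1d(interval), T)))
    return np.min(np.diff(pts))

def sample_proximity(T, S, interval=(0.0, 1.0)):
    """lambda of (7): the largest distance from a source to its nearest
    sample, normalised by the minimum separation Delta(T)."""
    T, S = np.asarray(T), np.asarray(S)
    nearest = np.min(np.abs(T[:, None] - S[None, :]), axis=1)
    return np.max(nearest) / minimum_separation(T, interval)

T = [0.32, 0.5, 0.9]               # sources in int(I)
S = np.linspace(0.0, 1.0, 21)      # uniform samples on I
print(minimum_separation(T))       # 0.1: the gap between 0.9 and the endpoint 1
print(sample_proximity(T, S))      # 0.2: source 0.32 is 0.02 from sample 0.30
```

Note that $\Delta(T)$ here is $0.1$ because the endpoint $1$ is included in $\widetilde{T}$, not only the pairwise gaps between sources.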

We describe the nearness of solutions to Program (4) in terms of an additional parameter $\epsilon$ associated with intervals around the sources $t_i$; that is, we let $\epsilon>0$ and define intervals $T_{i,\epsilon}$ as:

$$T_{i,\epsilon}:=\{t:|t-t_i|\le\epsilon\}\cap I,\quad i\in[k],\qquad T_\epsilon:=\bigcup_{i=1}^{k}T_{i,\epsilon}, \tag{8}$$

where $[k]:=\{1,\dots,k\}$, and set $T_{i,\epsilon}^C$ and $T_\epsilon^C$ to be the complements of these sets with respect to $I$. In order to make the most general results of Theorems 11 and 12 more interpretable, we turn to presenting them in Section 1.2 for the case of the windows being shifted Gaussians.

### 1.2 Main results simplified to Gaussian window

In this section we consider $\{\phi_j\}_{j=1}^{m}$ to be shifted Gaussians with centres at the sample locations $s_j$, specifically

$$\phi_j(t)=g(t-s_j)=e^{-\frac{(t-s_j)^2}{\sigma^2}}. \tag{9}$$

We might interpret (9) as the "point spread function" of the sensing mechanism being a Gaussian window $g$ evaluated at the sample locations $s_j$, in the sense that

$$\int_I\phi_j(t)\,x(\mathrm{d}t)=\int_I g(t-s_j)\,x(\mathrm{d}t)=(g\star x)(s_j),\quad\forall j\in[m], \tag{10}$$

evaluates the "filtered" copy $g\star x$ of $x$ at the locations $s_j$, where $\star$ denotes convolution.

As an illustration, Figure 1 shows the discrete measure $x$ in blue, the continuous function $g\star x$ in red, and the noisy samples $y_j$ at the sample locations $s_j$, represented as black circles.
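The sampling model (2) with the Gaussian window (9) is straightforward to simulate. The following sketch (all values illustrative, not taken from the paper) builds the matrix of window evaluations $\phi_j(t_i)$ and the noisy samples $y_j$:

```python
import numpy as np

rng = np.random.default_rng(0)

# k-sparse non-negative measure (1): amplitudes a_i > 0 at sources t_i in (0, 1)
a = np.array([1.0, 0.6, 0.8])
t = np.array([0.32, 0.5, 0.9])

# sample locations s_j on I = [0, 1] and Gaussian width sigma, as in (9)
s = np.linspace(0.0, 1.0, 30)
sigma = 0.05

def gaussian_window(t, s, sigma):
    """phi_j(t) = g(t - s_j) = exp(-(t - s_j)^2 / sigma^2), eq. (9)."""
    return np.exp(-((t - s) ** 2) / sigma**2)

# y_j = sum_i a_i phi_j(t_i) + eta_j, eq. (2), with small additive noise
A = gaussian_window(t[None, :], s[:, None], sigma)  # m x k matrix of phi_j(t_i)
eta = 1e-3 * rng.standard_normal(s.size)
y = A @ a + eta
print(y.shape)  # (30,)
```

Plotting `y` against `s` reproduces the qualitative picture of Figure 1: three smoothed bumps whose peaks overlap when the sources are closer than $\sigma$.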

The conditions we impose to ensure stability of Program (4) for the Gaussian window (9) are as follows:

###### Conditions 2.

(Gaussian window conditions) When the window function is a Gaussian as in (9), we require its width $\sigma$ and the source and sampling locations from (5) to satisfy the following conditions:

1. Samples define the interval boundaries: the smallest and largest sample locations coincide with the endpoints of $I$.

2. Samples near sources: for every source $t_i$, there exists a pair of samples, one on each side of $t_i$ and each sufficiently close to $t_i$; this closeness is quantified in Lemma 24.

3. Sources away from the boundary: every source $t_i$ is sufficiently separated from both endpoints of $I$.

4. Minimum separation of sources: the minimum separation $\Delta(T)$ of the sources, defined in Definition 1, is sufficiently large relative to the width $\sigma$ of the Gaussian window.

The four properties in Conditions 2 can be interpreted as follows: Property 1 imposes that the sources are within the interval defined by the minimum and maximum sample; Property 2 ensures that there is a pair of samples near each source, which translates into a sampling density condition in relation to the minimum separation between sources and in particular requires sufficiently many samples; Property 3 is a technical condition ensuring sources are not overly near the sampling boundary; and Property 4 relates the minimum separation between the sources to the width of the Gaussian window.

We can now present our main results on the robustness of Program (4) as they apply to the Gaussian window; these are Theorem 4, which follows from Theorem 11, and Theorem 5, which follows from Theorem 12. However, before stating the stability results, it is important to note that, in the setting of exact samples, $\eta=0$, the solution of Program (4) with $\delta'=0$ is unique when $m\ge 2k+1$.

###### Proposition 3.

(Uniqueness of exactly sampled sparse non-negative measures for Gaussian) Let $x$ be a non-negative $k$-sparse discrete measure supported on $\operatorname{int}(I)$, see (1). If $m\ge 2k+1$, $\eta=0$, and $\{\phi_j\}_{j=1}^{m}$ are shifted Gaussians as in (9), then $x$ is the unique solution of Program (4) with $\delta'=0$.

Proposition 3 states that Program (4) successfully localises the impulses present in $x$ given only $m\ge 2k+1$ measurements when $\{\phi_j\}$ are shifted Gaussians whose centres are in $I$. Theorems 4 and 5 extend this uniqueness result to show that any solution to Program (4) with $\delta'>0$ is proportionally close to the unique solution obtained when $\delta'=0$.
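To see Proposition 3 in action, one can discretise Program (4) on a fine grid and, with exact data, look for any non-negative vector consistent with the samples. The sketch below (our construction, not the paper's algorithm) uses non-negative least squares as a convenient feasibility solver; with $m=2k+1$ exact Gaussian samples, the recovered mass concentrates at the true sources even though they are closer together than the window width:

```python
import numpy as np
from scipy.optimize import nnls

# two sources separated by less than the Gaussian width sigma
a = np.array([1.0, 0.7])
t = np.array([0.41, 0.47])
sigma = 0.1
m = 2 * a.size + 1                      # m = 2k + 1 samples
s = np.linspace(0.0, 1.0, m)

# exact samples y_j = sum_i a_i exp(-(t_i - s_j)^2 / sigma^2), eq. (2)
y = np.exp(-((t[None, :] - s[:, None]) ** 2) / sigma**2) @ a

# grid-discretised surrogate of Program (4): any x >= 0 with Phi x = y;
# non-negativity is the only constraint, no sparsity penalty
grid = np.linspace(0.0, 1.0, 2001)      # contains the true sources
Phi = np.exp(-((grid[None, :] - s[:, None]) ** 2) / sigma**2)
x_hat, misfit = nnls(Phi, y)

support = grid[x_hat > 1e-3]
print(misfit)    # essentially zero: the data are fit exactly
print(support)   # mass sits at, or immediately next to, 0.41 and 0.47
```

Here uniqueness does the work: since the only non-negative measure consistent with the exact samples is $x$ itself, any exact feasible point of the discretised program must place its mass at the true sources, up to the grid resolution.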

###### Theorem 4.

(Wasserstein stability of Program (4) for Gaussian) Let $I=[0,1]$ and consider a $k$-sparse non-negative measure $x$ supported on $\operatorname{int}(I)$. Consider also an arbitrary increasing sequence of sample locations $\{s_j\}_{j=1}^{m}\subseteq I$ and, for positive $\sigma$, let $\{\phi_j\}_{j=1}^{m}$ be defined as in (9), which form $\Phi$ according to (3). If $\|\eta\|_2\le\delta$ and Conditions 2 hold, then Program (4) with $\delta'=\delta$ is stable in the sense that

$$d_{GW}(x,\hat{x})\le F_1\cdot\delta+\|x\|_{TV}\cdot\epsilon \tag{11}$$

for all solutions $\hat{x}$, where $d_{GW}$ is the generalised Wasserstein distance as defined in (19) and the exact expression of $F_1$ is given in the proof (see (64) in Section 3.4.2). In particular, for suitable $\sigma$ and $\epsilon$, we have:

$$F_1=F_1\!\left(k,\Delta(T),\frac{1}{\sigma},\frac{1}{\epsilon},\eta\right) \tag{12}$$

if

$$\eta\le\min\left\{\frac{c_3\,\sigma^6(1-3\sigma^2)}{(k+1)^{3/2}},\ \frac{c_4\,\overline{C}_1^{6}\,\sigma^{2/3}}{(k+1)^{1/3}}\right\}, \tag{13}$$

where $c_3$ and $c_4$ are universal constants and $\overline{C}_1$ is given by (59) in Section 3.4.1.

The central feature of Theorem 4 is that the Wasserstein distance between any solution $\hat{x}$ to Program (4) and the unique solution for $\delta'=0$ is proportional to $\delta$ and $\epsilon$ as in (11). The particular form of $F_1$ is not believed to be sharp; in particular, the exponential dependence on $k$ in (12) follows from bounding the determinant of a matrix similar to $\Phi$ (see (128)) by a lower bound on its minimum eigenvalue raised to the appropriate power. The scaling with respect to $\sigma$ is a feature of $\Phi$ in Program (4) not being normalised with respect to $\sigma$ which, for the samples fixed, decays with $\sigma$ due to the increased localisation of the Gaussian. Note that the $1/\epsilon$ dependence is a feature of the proof, and the $\epsilon$ which minimises the bound in (11) is proportional to $\delta$ raised to some power as determined by $F_1$ from (12). Theorem 4 follows from the more general result of Theorem 11, whose proof is given in Section 3 and the appendices.

As an alternative to showing stability of Program (4) in the Wasserstein distance, we also prove in Theorem 5 that any solution to Program (4) is locally consistent with the discrete measure $x$ in terms of local averages over the intervals given in (8). Moreover, for Theorem 5, we make Property 2 of Conditions 2 more transparent by using the sample proximity from Definition 1; that is, the closeness of samples to sources required in Conditions 2 is expressed through the sample proximity $\lambda$ of Definition 1.

###### Theorem 5.

(Average stability of Program (4) for Gaussian: source proximity dependence) Let $I=[0,1]$ and consider a $k$-sparse non-negative measure $x$ supported on $\operatorname{int}(I)$ and sample locations as given in (5), and, for positive $\sigma$, let $\{\phi_j\}_{j=1}^{m}$ be as defined in (9). If Conditions 2 hold then, in the presence of additive noise, Program (4) is stable, and it holds that, for any solution $\hat{x}$ of Program (4) with $\delta'=\delta$:

$$\left|\int_{T_{i,\epsilon}}\hat{x}(\mathrm{d}t)-a_i\right|\le\left[(c_1+F_2)\cdot\delta+\frac{c_2\,\|\hat{x}\|_{TV}}{\sigma^2}\cdot\epsilon\right]F_3, \tag{14}$$

$$\int_{T_\epsilon^C}\hat{x}(\mathrm{d}t)\le F_2\cdot\delta, \tag{15}$$

where the exact expressions of $F_2$ and $F_3$ are given in the proof (see (70) in Section 3.4.3), provided that $\delta$, $\epsilon$ and $\sigma$ satisfy (27). In particular, for suitable $\sigma$ and $\epsilon$, we have:

$$F_2=F_2\!\left(k,\Delta(T),\frac{1}{\sigma},\frac{1}{\epsilon}\right) \tag{16}$$

Above, $c_1$ and $c_2$ are universal constants and the remaining quantity is given by (60) in Section 3.4.1.

The bounds in Theorems 4 and 5 are intentionally similar and, though their proofs make use of the same bounds, they have some fundamental differences. While both (11) and (14) have the same proportionality to $\delta$ and $\epsilon$, the role of $\epsilon$ in particular differs substantially in that Theorem 5 considers averages of $\hat{x}$ over the intervals $T_{i,\epsilon}$. Also different in form is the dependence on $\|x\|_{TV}$ and $\|\hat{x}\|_{TV}$ in Theorems 4 and 5 respectively. The presence of $\|\hat{x}\|_{TV}$ in Theorem 5 is a feature of the proof, which we expect can be removed and replaced with $\|x\|_{TV}$ by proving that any solution of Program (4) is necessarily bounded due to the sampling proximity condition of Definition 1. It is also worth noting that (14) avoids an unnatural dependence present in (11). Theorem 5 follows from the more general result of Theorem 12, whose proof is given in Section 3.4.3.

Lastly, we give a corollary of Theorems 4 and 5 where we show that, for $\delta>0$ but sufficiently small, one can equate the $\delta$ and $\epsilon$ dependent terms in Theorems 4 and 5 to show that their respective errors approach zero as $\delta$ goes to zero.

###### Corollary 6.

Under the conditions in Theorems 4 and 5, for $\delta>0$ sufficiently small, there exists a choice of $\epsilon$ such that:

$$d_{GW}(x,\hat{x})\le\overline{C}_1\cdot\delta^{\frac{1}{7}}, \tag{17}$$

$$\left|\int_{T_{i,\epsilon}}\hat{x}(\mathrm{d}t)-a_i\right|\le\overline{C}_2\cdot\delta^{\frac{1}{6}}, \tag{18}$$

for all solutions $\hat{x}$, where $\overline{C}_1$ and $\overline{C}_2$ are given in the proof in Section 3.4.4.

### 1.3 Organisation and summary of contributions

##### Organisation:

The majority of our contributions were presented in the context of Gaussian windows in Section 1.2. These are particular examples of a more general theory for windows that form a Chebyshev system, commonly abbreviated as a T-system, see Definition 7. A T-system is a collection of continuous functions that loosely behave like algebraic monomials. It is a general and widely-used concept in classical approximation theory [27, 28, 29] that has also found applications in modern signal processing [1, 2]. The framework we use for these more general results is presented in Section 2.1, the results themselves in Section 2.2, and their proofs are sketched in Section 3. Proofs of the lemmas used to develop the results are deferred to the appendices.

##### Summary of contributions:

We begin discussing results for general window functions with Proposition 8, which establishes that for exact samples, namely $\eta=0$, windows $\{\phi_j\}_{j=1}^{m}$ forming a T-system, and $m\ge 2k+1$ measurements, the unique solution to Program (4) with $\delta'=0$ is the $k$-sparse measure $x$ given in (1). In other words, we show that the measurement operator induced by $\Phi$ in (3) is an injective map from $k$-sparse non-negative measures on $I$ to $\mathbb{R}^m$ when $\{\phi_j\}$ form a T-system. No minimum separation between impulses is necessary here, and the $\{\phi_j\}$ need only be continuous. As detailed in Section 1.4, Proposition 8 is more general, and its derivation far simpler and more intuitive, than what the current literature offers. Most importantly, no explicit regularisation is needed in Program (4) to encourage sparsity: the solution is unique.

Our main contributions are given in Theorems 11 and 12, namely that solutions to Program (4) with $\delta'>0$ are proportionally close to the unique solution (1) obtained with $\delta'=0$; these theorems consider nearness in terms of the generalised Wasserstein distance and local averages respectively. Furthermore, Theorem 11 allows $x$ to be a general non-negative measure, and shows that solutions to Program (4) must be close in proportion to both how well $x$ might be approximated by a $k$-sparse measure with a prescribed minimum source separation and the distance between that approximation and solutions to Program (4). These theorems require $m\ge 2k+1$ and, loosely speaking, that the measurement apparatus forms a T*-system, which is an extension of a T-system to allow the inclusion of an additional function which may be discontinuous, enforcing certain properties of minors of the associated matrix. To derive the bounds in Theorems 4 and 5 we show that shifted Gaussians as given in (9), augmented with a particular piecewise constant function, form a T*-system.

Lastly, in Section 2.2.1, we consider an extension of Theorem 12 where the minimum separation between sources is smaller than $\epsilon$. We extend the intervals $T_{i,\epsilon}$ from (8) to those in (31), where intervals which overlap are combined. The resulting Theorem 13 establishes that, while sources closer than $\epsilon$ may not be identifiable individually by Program (4), the local averages over the combined intervals of both $x$ in (1) and any solution to Program (4) will be proportionally close to each other.

To summarise, the results and analysis in this work simplify, generalise and extend the existing results for grid-free and non-negative super-resolution. These extensions follow by virtue of the non-negativity constraint in Program (4), rather than the common approach based on the TV norm as a sparsifying penalty. We further put these results in the context of existing literature in Section 1.4.

### 1.4 Comparison with other techniques

We show in Proposition 8 that a non-negative $k$-sparse discrete measure can be exactly reconstructed from its samples (provided that the atoms form a T-system, a property satisfied by Gaussian windows for example) by solving a feasibility problem. This result is in contrast to earlier results in which a TV norm minimisation problem is solved. De Castro and Gamboa [2] proved exact reconstruction using TV norm minimisation, provided the atoms form a homogeneous T-system (one which includes the constant function). An analysis of TV norm minimisation based on T-systems was subsequently given by Schiebinger et al. in [1], where it was also shown that Gaussian windows satisfy the given conditions. We show in this paper that the TV norm can be entirely dispensed with in the case of non-negative super-resolution. Moreover, the analysis of Program (4) is substantially simpler than its alternatives. In particular, Proposition 8 for noise-free super-resolution follows immediately from standard results in the theory of T-systems. The fact that Gaussian windows form a T-system is likewise implied by well-known results in T-system theory, in contrast to the heavy calculations involved in [1].

While neither of the above works considers the noisy setting or model mismatch, Theorems 11 and 12 in our work show that solutions to the non-negative super-resolution problem which are stable both to measurement noise and to model inaccuracy can also be obtained by solving a feasibility program. The most closely related prior work is by Doukhan and Gamboa [30], in which the authors bound the maximum distance between a sparse measure and any other measure satisfying noise-corrupted versions of the same measurements. While [30] does not explicitly consider reconstruction using the TV norm, the problem is posed over probability measures, that is, those with TV norm equal to one. Accuracy is captured according to the Prokhorov metric. It is shown that, for sufficiently small noise, the Prokhorov distance between the measures is bounded by a power of the noise level, with a constant that depends upon properties of the window function. In contrast, we do not make any total variation restrictions on the underlying sparse measure, we extend to consider model inaccuracy, and we consider different error metrics (the generalised Wasserstein distance and the local averaged error).

More recent results on noisy non-negative super-resolution all assume that an optimisation problem involving the TV norm is solved. Denoyelle et al. [21] consider the non-negative super-resolution problem with a minimum separation between source locations. They analyse a TV norm-penalised least squares problem and show that a $k$-sparse discrete measure can be stably approximated provided the noise scales suitably with the minimum separation, showing that the minimum separation condition exhibits a certain stability to noise. In the gridded setting, stability results for noisy non-negative super-resolution were obtained in the case of Fourier convolution kernels in [31] under the assumption that the spike locations satisfy a Rayleigh regularity property, and these results were extended to the case of more general convolution kernels in [32].

Super-resolution in the more general setting of signed measures has been extensively studied. In this case, the story is rather different, and stable identification is only possible if sources satisfy some separation condition. The required minimum separation is dictated by the resolution of the sensing system, e.g., the Rayleigh limit of the optical system or the bandwidth of the radar receiver. Indeed, it is impossible to resolve extremely close sources with equal amplitudes of opposite signs; they nearly cancel out, contributing virtually nothing to the measurements. A non-exhaustive list of references is [33, 17, 18, 19, 20, 22, 23].

In Theorem 12 we give an explicit dependence of the error on the sampling locations. This result relies on local windows, hence it requires samples near each source, and we give a condition that this distance must satisfy. The condition that there are samples near each source in order to guarantee reconstruction also appears in a recent manuscript on sparse deconvolution [34]. However, this work relies on the minimum separation and differentiability of the convolution kernel, which we overcome in Theorem 12.

## 2 Stability of Program (4) to inexact samples for windows ϕj(t) forming a T-system

The main results stated in the introduction, Theorems 4 and 5, are for Gaussian windows, which allows those results to omit technical details of the more general results of Theorems 11-13. These more general results apply to windows that form Chebyshev systems, see Definition 7, and an extension to T*-systems, see Definition 9, which allows for explicit control of the stability of solutions to Program (4). These Chebyshev systems and the other technical notions needed are introduced in Section 2.1, and our most general contributions are presented using these properties in Section 2.2.

### 2.1 Chebyshev systems and sparse measures

Before establishing stability of Program (4) to inexact samples, we show that Program (4) with $\delta'=0$, that is with $y$ in (2) having $\eta=0$, has $x$ from (1) as its unique solution once $m\ge 2k+1$. This result relies on $\{\phi_j\}_{j=1}^{m}$ forming a Chebyshev system, commonly abbreviated a T-system [27].

###### Definition 7.

(Chebyshev, T-system [27]) Real-valued and continuous functions $\phi_1,\dots,\phi_m$ form a T-system on the interval $I$ if the $m\times m$ matrix $[\Phi(\tau_1)\cdots\Phi(\tau_m)]$ is nonsingular for any increasing sequence $\tau_1<\cdots<\tau_m$ in $I$.

Examples of T-systems include the monomials $\{1,t,\dots,t^{m-1}\}$ on any closed interval of the real line. In fact, T-systems generalise monomials and in many ways preserve their properties. For instance, any non-trivial "polynomial" of a T-system, namely a linear combination of $\phi_1,\dots,\phi_m$, has at most $m-1$ distinct zeros on $I$. Or, given $m$ distinct points on $I$ and values at those points, there exists a unique such polynomial that interpolates them. Note also that linear independence of $\phi_1,\dots,\phi_m$ is a necessary condition for forming a T-system, but not a sufficient one. Let us emphasise that the T-system is a broad and general concept with a range of applications in classical approximation theory and modern signal processing. In the context of super-resolution, for example, translated copies of the Gaussian window, as given in (9), and many other measurement windows form a T-system on any interval. We refer the interested reader to [27, 29] for the role of T-systems in classical approximation theory and to [35] for their relationship to totally positive kernels.
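The defining determinant condition of Definition 7 is easy to sanity-check numerically. The sketch below (illustrative parameters; a spot check on random increasing sequences, not a proof) verifies that shifted Gaussians give a nonsingular, in fact positive, determinant; strict positivity here reflects the strict total positivity of the Gaussian kernel:

```python
import numpy as np

sigma = 0.2
centres = np.linspace(0.0, 1.0, 5)       # m = 5 shifted Gaussians, as in (9)

def tsystem_matrix(taus):
    """The m x m matrix [phi_j(tau_l)] from Definition 7."""
    return np.exp(-((taus[None, :] - centres[:, None]) ** 2) / sigma**2)

rng = np.random.default_rng(1)
checked = 0
while checked < 100:
    taus = np.sort(rng.uniform(0.0, 1.0, size=centres.size))
    if np.min(np.diff(taus)) < 0.02:     # skip nearly-coincident points
        continue                          # (the determinant is then tiny)
    assert np.linalg.det(tsystem_matrix(taus)) > 0.0
    checked += 1
print("all determinants positive")
```

Such a check cannot replace the classical total-positivity arguments the text cites, but it is a quick way to catch a window family that fails the T-system property.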

#### 2.1.1 Sparse non-negative measure uniqueness from exact samples

Our analysis based on T-systems has been inspired by the work of Schiebinger et al. [1], where the authors use properties of T-systems to construct the dual certificate for the spike deconvolution problem and to show uniqueness of the solution to the TV norm minimisation problem without the need for a minimum separation. The theory of T-systems has also been used in the same context by De Castro and Gamboa in [2]. However, both [1] and [2] focus exclusively on the noise-free problem, while we will extend the T-system approach to the noisy case as well.

Our work, in part, simplifies the prior analysis considerably by using readily available results on T-systems, and we go one step further to show uniqueness of the solution of the feasibility problem, which removes the need for TV norm regularisation in the results of Schiebinger et al. [1]; this simplification in the presence of exact samples is given in Proposition 8.

###### Proposition 8.

(Uniqueness of exactly sampled sparse non-negative measures) Let $x$ be a non-negative $k$-sparse discrete measure supported on $\operatorname{int}(I)$ as given in (1). Let $\{\phi_j\}_{j=1}^{m}$ with $m\ge 2k+1$ form a T-system on $I$. Then, given exact measurements as in (2) with $\eta=0$, $x$ is the unique solution of Program (4) with $\delta'=0$.

Proposition 8 states that Program (4) successfully localises the impulses present in $x$ given only $m\ge 2k+1$ measurements when $\{\phi_j\}$ form a T-system on $I$. Note that the $\{\phi_j\}$ need only be continuous and no minimum separation is required between the impulses. Moreover, as discussed in Section 1.4, the noise-free analysis here is substantially simpler as it avoids the introduction of TV norm minimisation, and it is more insightful in that it shows that it is not the sparsifying property of TV minimisation which implies the result; rather, the result follows from the non-negativity constraint and the T-system property, see Section 3.1.

#### 2.1.2 T*-systems in terms of source and sample configuration

While Proposition 8 implies that T-systems ensure unique non-negative solutions, more is needed to ensure stability of these results to inexact samples, that is, $\delta>0$. This is to be expected, as T-systems imply invertibility of the linear system in (3) for any configuration of sources and samples as given in (5), but do not limit the condition number of such a system. We control the condition number by imposing further conditions on the source and sample configuration, such as those stated in Conditions 2, which is analogous to imposing that there exists a dual polynomial which is sufficiently bounded away from zero in regions away from the sources, see Section 2.2. In particular, we extend the notion of a T-system in Definition 7 to a T*-system, which includes conditions on samples at the boundary of the interval, additional conditions on the window function, and a condition ensuring that there exist samples sufficiently near sources, as given by the notation (8) but stated in terms of a new variable so as to highlight its different role here.

###### Definition 9.

(T*-system) For an even integer $m$, real-valued functions $\phi_1,\dots,\phi_m$, augmented with an additional function $F$, form a T*-system on $I$ if the following holds for every sufficiently small $\epsilon>0$. For any increasing sequence of points in $I$ such that

• the first and last points coincide with the endpoints of $I$,

• except exactly three points, namely the two endpoints and one additional point, all points belong to the neighbourhoods $T_\epsilon$ of the sources,

• every $T_{i,\epsilon}$ contains an even number of points,

we have that

1. the determinant of the matrix formed by evaluating $F$ and $\phi_1,\dots,\phi_m$ at the points of the sequence is positive, and

2. the magnitudes of all minors of this matrix along the row containing $F$ approach zero at the same rate² when $\epsilon\to 0$. (²A function approaches zero at the rate of another when their ratio remains bounded; see, for example, [36], page 44.)

Let us briefly discuss T*-systems as an alternative to T-systems in Definition 7. The key property of a T-system for our purpose is that an arbitrary polynomial of a T-system of $m$ functions on $I$ has at most $m-1$ zeros. Polynomials of a T*-system may not have such a property, as T-systems allow arbitrary configurations of points while T*-systems only ensure the determinant in condition 1 of Definition 9 is positive for configurations where the majority of points are paired within the neighbourhoods of the sources. However, as the analysis later shows, condition 1 in Definition 9 is designed for constructing dual certificates for Program (4). We will also see later that condition 2 in Definition 9 is meant to exclude trivial polynomials that do not qualify as dual certificates. Lastly, rather than any increasing sequence in $I$, Definition 9 only considers subsets that mainly cluster around the support $T$, whereas in our use all but one entry is taken from the set of samples $S$; this is only intended to simplify the burden of verifying whether a family of functions forms a T*-system. While the first and third bullet points in Definition 9 require that there be at least two samples per interval, as well as samples which define the interval endpoints, which gives a minimal sampling complexity, we typically require additional samples due to the locations of the sources being unknown. In fact, as the support $T$ is unknown, the third bullet point imposes a sampling density proportional to the inverse of the minimum separation of the sources $\Delta(T)$. The additional point is not taken from the set of samples $S$; it instead acts as a free parameter to be used in the dual certificate. In Figure 2, we show an example of points which satisfy the conditions in Definition 9.

We will state some of our more general stability results for solutions of Program (4) in terms of the generalised Wasserstein distance [37] between and , both non-negative measures supported on , defined as

$$d_{GW}(x_1,x_2)=\inf_{z_1,z_2}\big(\|x_1-z_1\|_{TV}+d_W(z_1,z_2)+\|x_2-z_2\|_{TV}\big), \tag{19}$$

where the infimum is over all non-negative Borel measures $z_1,z_2$ on $I$ with equal total mass. Here, $\|\cdot\|_{TV}$ is the total variation of a measure, akin to the $\ell_1$-norm in finite dimensions, and $d_W$ is the standard Wasserstein distance, namely

$$d_W(z_1,z_2)=\inf_\gamma\int_{I\times I}|\tau_1-\tau_2|\,\gamma(\mathrm{d}\tau_1,\mathrm{d}\tau_2), \tag{20}$$

where the infimum is over all measures $\gamma$ on $I\times I$ that produce $z_1$ and $z_2$ as marginals. In a sense, $d_{GW}$ extends $d_W$ to allow for calculating the distance between measures with different masses. (³In [37], the authors consider the $p$-Wasserstein distance, where popular choices of $p$ are $1$ and $2$. In our work, we only use the 1-Wasserstein distance.)
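For intuition, both distances are easy to evaluate for discrete measures. The sketch below (helper name ours) computes the 1-Wasserstein distance (20) with `scipy`, and an upper bound on the generalised Wasserstein distance (19) obtained by choosing $z_1,z_2$ as rescalings of $x_1,x_2$ to a common mass, paying the removed mass in total variation and transporting the rest:

```python
from scipy.stats import wasserstein_distance

def gw_upper_bound(t1, w1, t2, w2):
    """An upper bound on d_GW in (19): take z_i proportional to x_i with the
    smaller of the two total masses, so the TV terms pay for the excess mass
    and the remaining equal-mass parts are transported."""
    m1, m2 = sum(w1), sum(w2)
    m = min(m1, m2)
    tv = (m1 - m) + (m2 - m)
    # scipy normalises the weights, so rescale the unit-mass cost by m
    return tv + m * wasserstein_distance(t1, t2, u_weights=w1, v_weights=w2)

# equal masses: d_GW reduces to the plain transport cost (20)
t1, w1 = [0.3, 0.7], [0.5, 0.5]
t2, w2 = [0.31, 0.69], [0.5, 0.5]
print(wasserstein_distance(t1, t2, u_weights=w1, v_weights=w2))  # 0.01
print(gw_upper_bound(t1, w1, t2, w2))                            # 0.01

# unequal masses: the extra 0.2 units of mass are paid for in TV
print(gw_upper_bound(t1, w1, [0.3, 0.7], [0.6, 0.6]))            # 0.2
```

This particular choice of $z_1,z_2$ is only one candidate in the infimum (19), so the function above gives an upper bound on $d_{GW}$, not its exact value.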

Moreover, in some of our most general results we consider the extension where $x$ need not be a discrete measure, see Theorem 11. In that setting, we introduce an intermediate $k$-discrete measure which approximates $x$ in the $d_{GW}$ metric. That is, given an integer $k$ and positive $\epsilon$ and $\beta$, let $x_{k,\epsilon}$ be a $k$-sparse $\epsilon$-separated measure supported on $I$ such that

$$R(x,k,\epsilon):=d_{GW}(x,x_{k,\epsilon})\le\beta\,\inf_\chi d_{GW}(x,\chi), \tag{21}$$

where the infimum is over all $k$-sparse $\epsilon$-separated non-negative measures $\chi$ supported on $I$, and the parameter $\beta\ge 1$ allows for near projections of $x$ onto the space of $k$-sparse $\epsilon$-separated measures.

Lastly, we also assume that the measurement operator in (3) is Lipschitz continuous, namely that there exists $L>0$ such that

$$\left\|\int_I\Phi(t)\,\big(x_1(\mathrm{d}t)-x_2(\mathrm{d}t)\big)\right\|_2\le L\cdot d_{GW}(x_1,x_2), \tag{22}$$

for every pair of measures $x_1,x_2$ supported on $I$.

### 2.2 Stability of Program (4)

Equipped with the definitions of T- and T*-systems, Definitions 7 and 9 respectively, we are able to characterise any solution to Program (4) when the windows $\{\phi_j\}$ form a T-system and the source and sample configuration (5) is suitable. We control the stability to inexact measurements by introducing two auxiliary functions in Definition 10, which require the dual polynomials associated with Program (4) to be bounded away from the corresponding constraints for all points at least $\epsilon$ away from the sources.

###### Definition 10.

(Dual polynomial separators) Let f be a bounded function, let f_0, f_1 and f̄ be positive constants, and let the neighbourhoods T_{i,ϵ} be as defined in (8). We then define

 F(t) := \begin{cases} f_0, & t = 0, \\ f_1, & t = 1, \\ f(t - t_i), & \text{when there exists } i \in [k] \text{ such that } t \in T_{i,\epsilon}, \\ \bar{f}, & \text{elsewhere on } \mathrm{int}(I). \end{cases} (23)

Moreover, let π ∈ {±1}^k be an arbitrary sign pattern. We define F_π as

 F_\pi(t) := \begin{cases} f_0, & t = 0, \\ f_1, & t = 1, \\ \pm 1 - f(t - t_i), & \text{when there exists } i \in [k] \text{ such that } t \in T_{i,\epsilon} \text{ and } \pi_i = \pm 1, \\ -\bar{f}, & \text{elsewhere on } \mathrm{int}(I). \end{cases} (24)

We defer the introduction of the dual polynomials q and q_π and the precise role of the above dual polynomial separators to Section 3, but state here our most general results characterising the solutions to Program (4) in terms of these separators.

###### Theorem 11.

(Wasserstein stability of Program (4) for a T-system) Consider a non-negative measure x supported on I and assume that the measurement operator is L-Lipschitz, see (3) and (22). Consider a k-sparse non-negative discrete measure χ supported on the interior of I and fix ϵ > 0, see (6), and consider the functions F and F_π as defined in Definition 10. Suppose that

• {φ_j}_{j=1}^m form a T-system on I,

• {F, φ_1, …, φ_m} form a T*-system on I, and

• {F_π, φ_1, …, φ_m} form a T*-system on I for any sign pattern π ∈ {±1}^k.

Let x̂ be a solution of Program (4) with

 \delta' = \delta + L \cdot d_{GW}(x, \chi). (25)

Then there exist vectors of dual coefficients b and b_π such that

 d_{GW}(x, \hat{x}) \le \left( (6 + 2\bar{f}) \|b\|_2 + 6 \min_\pi \|b_\pi\|_2 \right) \delta' + \epsilon \|\chi\|_{TV} + d_{GW}(x, \chi), (26)

where the minimum is over all sign patterns π ∈ {±1}^k, and b and b_π are the vectors of coefficients of the dual polynomials q and q_π associated with Program (4); see Lemmas 16 and 17 in Section 3 for their precise definitions.

Theorem 4 follows from Theorem 11 by considering the Gaussian window as stated in (9), which is known to generate a T-system [27], and by introducing Conditions 2 on the source and sample configuration (5) under which the conditions of Theorem 11 can be proved and the dual coefficients b and b_π bounded; the details of these proofs and bounds are deferred to Section 3 and the appendices.

The particular form of F and F_π in Theorem 11, constant away from the support of χ, is purely to simplify the presentation and proofs. Note also that the error depends both on the noise level δ and on the residual d_GW(x, χ), not unlike the standard results in finite-dimensional sparse recovery and compressed sensing [24, 38]. In particular, as δ′ → 0, we approach the setting of Proposition 8, where we have uniqueness of k-sparse non-negative measures from exact samples.

Note that the noise level δ and the residual R(x, k, ϵ) are not independent; that is, δ specifies confidence in the samples and in the model for how the samples are taken, while R(x, k, ϵ) reflects nearness to the model of (k, ϵ)-discrete measures. Corollary 6 shows that, for shifted Gaussians, the parameter ϵ can be removed in the setting where x is itself discrete, that is x = χ, in which case the error is bounded in terms of δ alone.

The more general variant of Theorem 5 follows from Theorem 12 by introducing alternative conditions on the source and sample configuration and by omitting the need for the functions F_π, which are the cause of the unnatural dependence on δ in Theorem 4.

###### Theorem 12.

(Average stability for Program (4) for a T-system) Let x̂ be a solution of Program (4) and consider the function F as defined in Definition 10. Suppose that:

• {φ_j}_{j=1}^m form a T-system on I,

• {F, φ_1, …, φ_m} form a T*-system on I, and

• ϕ and the parameters Δ, λ from Definition 1 satisfy

 \phi(\lambda\Delta) \ge \phi(\Delta - \lambda\Delta) + \phi(\Delta + \lambda\Delta) + \frac{1}{\Delta} \int_{\Delta - \lambda\Delta}^{1/2 - \lambda\Delta} \phi(x) \,\mathrm{d}x + \frac{1}{\Delta} \int_{\Delta + \lambda\Delta}^{1/2 + \lambda\Delta} \phi(x) \,\mathrm{d}x. (27)

Then, for all i ∈ [k],

 \left| \int_{T_{i,\epsilon}} \hat{x}(\mathrm{d}t) - a_i \right| \le \left( 2\left(1 + \frac{\phi_\infty \|b\|_2}{\bar{f}}\right) \delta + L \|\hat{x}\|_{TV} \, \epsilon \right) \sum_{j=1}^{k} (A^{-1})_{ij}, (28)

 \int_{T^C_\epsilon} \hat{x}(\mathrm{d}t) \le \frac{2 \|b\|_2 \, \delta}{\bar{f}}, (29)

where:

• b is the same vector of coefficients of the dual certificate q as in Theorem 11, and f̄ is given in Definition 10 and is used to construct the dual certificate q, as described in Lemma 16 in Section 3,

• ϕ_∞ := ‖ϕ‖_∞ is the supremum of the window ϕ,

• L is the Lipschitz constant of ϕ,

• A ∈ ℝ^{k×k} is the matrix

 A = \begin{bmatrix} |\phi_1(t_1)| & -|\phi_1(t_2)| & \dots & -|\phi_1(t_k)| \\ -|\phi_2(t_1)| & |\phi_2(t_2)| & \dots & -|\phi_2(t_k)| \\ \vdots & \vdots & \ddots & \vdots \\ -|\phi_k(t_1)| & -|\phi_k(t_2)| & \dots & |\phi_k(t_k)| \end{bmatrix}, (30)

with the functions ϕ_i as defined in (7), evaluated at the source locations t_1, …, t_k.

Theorem 12 bounds the difference between the average of any solution x̂ to Program (4) over the interval T_{i,ϵ} and that of the discrete measure x, whose average is simply a_i. The condition that ϕ satisfies (27) is used to ensure that the matrix A from (30) is strictly diagonally dominant; it relies on the window ϕ being sufficiently localised about zero. Though Theorem 12 requires that the closest sample to each source lies within λΔ of it, this can be achieved without knowing the locations of the sources by placing the samples uniformly at interval λΔ, which gives a sampling complexity of order 1/(λΔ). Lastly, a similar bound on the integral of x̂ away from the neighbourhoods of the sources is given by Lemma 16 in Section 3.
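To make the diagonal dominance mechanism concrete, the following sketch builds the matrix A of (30) for a narrow Gaussian window and confirms it is strictly diagonally dominant. The source locations, sample offsets and Gaussian width are hypothetical, chosen only for illustration, and ϕ_i(t) = ϕ(s_i − t) is assumed to be the window centred at the sample nearest the i-th source:

```python
import numpy as np

# Illustrative check (not the paper's proof) that a well-localised Gaussian
# window yields a strictly diagonally dominant matrix A as in (30).
sigma = 0.02                    # assumed Gaussian width
t = np.array([0.2, 0.5, 0.8])  # assumed well-separated source locations
s = t + 0.005                  # nearest sample to each source, within lambda*Delta
phi = lambda x: np.exp(-x**2 / sigma**2)

# A has |phi_i(t_j)| on the diagonal and -|phi_i(t_j)| off it,
# with phi_i(t) = phi(s_i - t) the window centred at the i-th sample.
A = -np.abs(phi(s[:, None] - t[None, :]))
np.fill_diagonal(A, -A.diagonal())

# Strict diagonal dominance: each diagonal entry exceeds the sum of the
# absolute values of the remaining entries in its row.
off_row_sums = np.abs(A).sum(axis=1) - A.diagonal()
dominant = bool(np.all(A.diagonal() > off_row_sums))
```

Widening sigma toward the source separation destroys the dominance, which is exactly the localisation trade-off that condition (27) encodes.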

#### 2.2.1 Clustering of indistinguishable sources

Theorems 11 and 12 give uniform guarantees for all sources in terms of the minimum separation condition Δ, which measures the worst proximity of sources. One might imagine that, for example, if all but two sources are sufficiently well separated, then Theorem 12 might hold for the sources that are well separated; moreover, with ϵ fixed, if two sources t_i and t_{i+1} with magnitudes a_i and a_{i+1} are closer than 2ϵ, namely |t_{i+1} − t_i| < 2ϵ, then a variant of Theorem 12 might hold with sources i and i+1 approximated by a single source near t_i and t_{i+1} and with magnitude a_i + a_{i+1}.

In this section we extend Theorem 12 to this setting by considering ϵ fixed and alternative intervals T̃_{i,ϵ}, a partition of T_ϵ, such that each T̃_{i,ϵ} contains a group of k_i consecutive sources (with weights a_{i_1}, …, a_{i_{k_i}} respectively) which are within at most 2ϵ of each other. Define

 \tilde{T}_{i,\epsilon} = \bigcup_{l=1}^{k_i} T_{i_l,\epsilon}, \quad \text{where } t_{i_l} \in T_{i_l,\epsilon} \text{ and } |t_{i_{l+1}} - t_{i_l}| < 2\epsilon, \ \forall l \in [k_i - 1], (31)

for i ∈ [k̃], so that we have

 T_\epsilon = \bigcup_{i=1}^{\tilde{k}} \tilde{T}_{i,\epsilon} \quad \text{and} \quad \tilde{T}_{i,\epsilon} \cap \tilde{T}_{j,\epsilon} = \emptyset, \ \forall i \ne j. (32)
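The partition in (31) and (32) can be computed greedily from the sorted source locations: a new group starts whenever the gap to the previous source reaches 2ϵ. A minimal sketch, with a function name of our own choosing:

```python
def group_sources(t, eps):
    """Partition sorted source locations t into groups of consecutive sources
    chained together by gaps below 2*eps, mirroring the partition (31)."""
    groups, current = [], [t[0]]
    for prev, cur in zip(t, t[1:]):
        if cur - prev < 2 * eps:
            current.append(cur)   # gap below 2*eps: same group
        else:
            groups.append(current)
            current = [cur]       # gap of at least 2*eps starts a new group
    groups.append(current)
    return groups
```

Each returned group corresponds to one interval T̃_{i,ϵ}, and its length is the group size k_i.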
###### Theorem 13.

(Average stability for Program (4): grouped sources) Let x̂ be a solution of Program (4) and let T_ϵ be partitioned as described by (31). If the samples are placed uniformly at interval λΔ, where ϕ satisfies (27), then there exist ξ_1, …, ξ_{k̃} with ξ_i ∈ T̃_{i,ϵ} such that

 \left| \int_{\tilde{T}_{i,\epsilon}} \hat{x}(\mathrm{d}t) - \sum_{r=1}^{k_i} a_{i_r} \right| \le \left( 2\left(1 + \frac{\phi_\infty \|b\|_2}{\bar{f}}\right) \delta + (2k - 1) L \|\hat{x}\|_{TV} \, \epsilon \right) \sum_{j=1}^{\tilde{k}} (\tilde{A}^{-1})_{ij}, (33)

where the constants are the same as in (12) and the matrix Ã is

 \tilde{A} = \begin{bmatrix} |\phi_1(\xi_1)| & -|\phi_1(\xi_2)| & \dots & -|\phi_1(\xi_{\tilde{k}})| \\ -|\phi_2(\xi_1)| & |\phi_2(\xi_2)| & \dots & -|\phi_2(\xi_{\tilde{k}})| \\ \vdots & \vdots & \ddots & \vdots \\ -|\phi_{\tilde{k}}(\xi_1)| & -|\phi_{\tilde{k}}(\xi_2)| & \dots & |\phi_{\tilde{k}}(\xi_{\tilde{k}})| \end{bmatrix}.

Note that Lemma 16 still holds if we replace any group of sources from an interval T̃_{i,ϵ} with a single source at some ξ_i ∈ T̃_{i,ϵ}, so the bound from Lemma 16 remains valid without modification.

As an exemplar source configuration where Theorem 13 might be applied, consider the situation where the k source locations comprising T are drawn uniformly at random in I, for which we have (from [39, page 42, Exercise 22])

 P(\Delta(T) > \theta) = [1 - (k+1)\theta]^k, \quad \theta \in \left[0, \tfrac{1}{k+1}\right].

Then, the cumulative distribution function is

 F(\theta) = P(\Delta(T) \le \theta) = 1 - [1 - (k+1)\theta]^k,

and so the density of Δ(T) is

 f(\theta) = k(k+1)\,[1 - (k+1)\theta]^{k-1}, \quad \theta \in \left[0, \tfrac{1}{k+1}\right].
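The tail formula P(Δ(T) > θ) = [1 − (k+1)θ]^k treats Δ(T) as the minimum of the k+1 spacings that k uniform points induce on [0, 1], including the two boundary gaps. A quick Monte Carlo sanity check of this identity (our own sketch, with arbitrary values of k and θ):

```python
import numpy as np

# Empirically verify P(min spacing > theta) = (1 - (k+1)*theta)**k for k uniform
# points on [0,1], where the spacings include the gaps to the endpoints 0 and 1.
rng = np.random.default_rng(0)
k, theta, trials = 3, 0.05, 200_000
pts = np.sort(rng.random((trials, k)), axis=1)
# k+1 spacings per trial: [t_1 - 0, t_2 - t_1, ..., 1 - t_k]
gaps = np.diff(np.hstack([np.zeros((trials, 1)), pts, np.ones((trials, 1))]), axis=1)
empirical = np.mean(gaps.min(axis=1) > theta)
exact = (1 - (k + 1) * theta) ** k
```

With k = 3 and θ = 0.05 the exact tail probability is 0.8³ = 0.512, and the empirical frequency matches it to within sampling error.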