1 Introduction
1.1 Background on superresolution
Superresolution consists of estimating a signal , given blurred observations obtained after convolution with a point spread function which is assumed to represent the impulse response of the measurement system, such as, e.g., a microscope in high density single molecule imaging. Mathematically, is typically modeled as a superposition of Diracs, i.e., a sparse atomic measure of the form
while is a low-pass filter. Denoting
(1.1)   f := \mu \ast \varphi
to be the convolution of and , one is usually given information about either as samples of , or as the Fourier samples of . This problem has a number of important applications arising, for instance, in geophysics [27], astronomy [41], medical imaging [21], etc. The reader is referred to [10] for a more comprehensive list of applications. Superresolution can be seen as a highly nontrivial “off the grid” extension of the finite dimensional sparse estimation problem in compressed sensing [19], [17] and statistics [8]. In the new setting, instead of estimating a sparse vector in a finite dimensional space, the goal is to estimate a sparse measure over the real line endowed with its Borel σ-algebra. Recently, the problem of superresolution has received a great deal of interest in the signal processing community, triggered by the seminal work of Candès and Fernández-Granda [10, 9]. They considered the setup where one is given the first few Fourier coefficients of , i.e., for a cutoff frequency , we observe where
(1.2)   y_k = \int_0^1 e^{-i 2\pi k t}\, d\mu(t) = \sum_j a_j e^{-i 2\pi k t_j}, \qquad |k| \leq f_c
Note that this corresponds to taking in (1.1), and sampling the Fourier transform of on the regular grid . The authors consider the noiseless setting and propose solving an optimization problem over the space of measures, which involves minimizing the total variation (TV) norm amongst all measures consistent with the observations. The resulting minimization problem is an infinite dimensional convex program over Borel measures on . It was shown that the dual of this problem can be formulated as a semidefinite program (SDP) in finitely many variables, and thus can be solved efficiently. Since then, there have been numerous follow-up works, such as those by Schiebinger et al. [42], Duval and Peyré [14], Denoyelle et al. [13], Bendory et al. [3], Azaïs et al. [2] and many others. For instance, [42] considers the noiseless setting by taking real-valued samples of with a more general choice of (such as a Gaussian) and also assumes to be nonnegative. Their proposed approach again involves TV norm minimization with linear constraints. Bendory et al. [3] consider to be Gaussian or Cauchy, do not place sign assumptions on , and also analyze TV norm minimization with linear fidelity constraints for estimating from noiseless samples of . The approach adopted in [14, 13] is to solve a least-squares-type minimization procedure with a TV norm based penalty term (also referred to as the Beurling LASSO; see, e.g., [2]) for recovering from samples of . The approach in [15] considers a natural finite approximation on the grid to the continuous problem, and studies the limiting behaviour as the grid becomes finer; see also [16].
From a computational perspective, the aforementioned approaches all admit a finite dimensional dual problem with an infinite number of linear constraints; this is a semi-infinite program (SIP), for which there is an extensive literature [23]. For the particular case of nonnegative , Boyd et al. [5] proposed an improved Frank–Wolfe algorithm in the primal. In certain instances, e.g., with Fourier samples (such as in [10, 9]), this SIP can also be reformulated as an SDP. From a practical point of view, SDP solvers are notoriously slow for a moderately large number of variables. The algorithm of [5] is a first-order scheme with potential local correction steps, and is practically more viable.
From a statistical viewpoint, Candès and Fernández-Granda [10] showed that their approach exactly recovers in the noiseless case provided , where denotes the minimum separation between the spike locations. Similar results for other choices of were shown by Schiebinger et al. [42] (for positive measures and without any minimum separation condition), and by Bendory et al. [3] (for signed measures and with a separation condition). In the noisy setting, the state of affairs is radically different, since it is known (see, e.g., [10, Section 3.2], [36, Corollary 1.4]) that some separation between the spike locations is indeed necessary for stable recovery. When sufficient separation is assumed and the noise level is small enough, stable recovery of is possible (see, e.g., [18, 14, 13, 2]).
When one is given the first few Fourier samples of the spike train (i.e., (1.2)), then there exist other approaches that can be employed. Prony’s method [12] is a classical method that involves finding the roots of a polynomial, whose coefficients form a null vector of a certain Hankel matrix. Prony’s method and its variants have also been recently studied by Potts and Tasche (for eg., [39, 40]), Plonka and Tasche [38], and others. The matrix pencil (MP) method [24]
is another classical approach that involves computing the generalized eigenvalues of a suitable matrix pencil. Both these methods recover
exactly in the absence of noise provided (so samples), and are also computationally feasible. Recently, Moitra [36, Theorem 2.8] showed that the MP method is stable in the presence of noise provided the noise level is not too large, and . Moitra also showed [36, Corollary 1.4] that such a dependence is necessary, in the sense that if , then the noise level would have to be in order to stably recover . More recently, Tang et al. [49, 48] studied approaches based on atomic norm minimization, which can be formulated as an SDP. In [49, Theorem 1.1], the authors considered the signs of the amplitudes of to be generated randomly, with noiseless samples. It was shown that if , and if the Fourier samples are obtained at indices selected uniformly at random from , then is recovered exactly with high probability. In [48, Theorem 1], the authors considered Gaussian noise in the samples and showed that if , then the solution returned by the SDP estimates the vector of original Fourier samples (i.e., ) at a mean square rate , with denoting the variance of the noise. Moreover, they also show
[48, Theorem 2] that the spike locations and amplitude terms corresponding to the SDP solution are localized around the true values. Very similar in spirit to the MP method for the estimation of sums of exponentials are the Prony-like methods ESPRIT and MUSIC, which can also be used for spike train estimation using the same Fourier domain measurement trick as in [36]. These methods can be interpreted as model reduction methods based on low-rank approximation of Hankel matrices [35]. However, to the best of our knowledge, a precise, finite sample based analysis of ESPRIT, similar in spirit to the analysis of the MP method proposed in [36], is still lacking, although a first step in this direction (using Adamyan–Arov–Krein theory) can be found in [38].
1.2 Superresolution with multiple kernels
In this paper, we consider a generalization of the standard superresolution problem by assuming that the measurement process now involves a superposition of convolutions of several spike trains with different point spread functions. This problem, which we call the “multikernel unmixing superresolution problem” appears naturally in many practical applications such as single molecule spectroscopy [25]
, spike sorting in neuron activity analysis
[29, 6], DNA sequencing [30, 31], spike hunting in galaxy spectra [7, 33], etc. A problem of particular interest at the National Physical Laboratory, the UK’s national metrology institute, is isotope identification [34, 47], which is of paramount importance in nuclear security. Handheld radioisotope identifiers are known to suffer from poor performance [47], and new and precise algorithms have to be devised for this problem. While it is legitimately expected of Radio Isotope Identifier Devices to be accurate across all radioactive isotopes, the US Department of Homeland Security requires all future identification systems to be able to meet a minimum identification standard set forth in ANSI N42.34. Effective identification from low resolution information is known to be reliably achievable by trained spectroscopists, whereas automated identification using standard algorithms sometimes fails up to three fourths of the time [46]. For instance, the spectrum of is plotted in Figure 1, which is extracted from [46, p.9].
Isotope identification involves the estimation of a set of peak locations in the gamma spectrum where the signal is blurred by convolution with kernels of different window sizes. Mixtures of different spectra can be particularly difficult to analyse using traditional methods and a need for precise unmixing algorithms in such generalised superresolution problems may play an important role in future applications such as reliable isotope identification.
Another application of interest is DNA sequencing in the vein of [30]. Sequencing is usually performed using one of the variants of the enzymatic method invented by Frederick Sanger [1]. Sequencing is based on enzymatic reactions, electrophoresis, and some detection technique. Electrophoresis is a technique used to separate the DNA subfragments produced as the output of four specific chemical reactions, as described in more detail in [30]. DNA fragments are negatively charged in solution. An efficient color-coding strategy has been developed to permit sizing of all four kinds of DNA subfragments by electrophoresis in a single lane of a slab gel or in a capillary. In each of the four chemical reactions, the primers are labeled with one of four different fluorescent dyes. The dyes are then excited by a laser-based confocal fluorescent detection system in a region within the slab gel or capillary. In that process, fluorescence intensities are emitted in four wavelength bands, as shown with different color codes in Figure 2 below. However, quoting [4], “because the electrophoretic process often fails to separate peaks adequately, some form of deconvolution filter must be applied to the data to resolve overlapping events. This process is complicated by the variability of peak shapes, meaning that conventional deconvolution often fails.” The methods developed in the present paper aim at achieving accurate deconvolution with different peak shapes, and might therefore be useful for practical deconvolution problems in DNA sequencing.
We will now provide the mathematical formulation of the problem and describe the main idea of our approach along with our contributions, and discuss related work for this problem.
1.3 Problem formulation
Say we have groups of point sources, where and (with ) denote the locations and (complex-valued) amplitudes respectively of the sources in the group. Our signal of interest is defined as
Let be a positive definite function (recall from [51, Theorem 6.11] that a continuous function is positive definite if and only if it is bounded and its Fourier transform is nonnegative and non-vanishing) with its Fourier transform for . Consider distinct kernels , where . Let , where denotes the convolution operator. Let denote the Fourier transform of , i.e., . Denoting the Fourier transform of by , we get
(1.3) 
Assuming black box access to the complex-valued function , our aim is to recover estimates of and for each from a few possibly noisy samples of . Here, denotes the measurement noise at location .
Gaussian kernels.
For ease of exposition, we will from now on take to be a Gaussian, i.e., so that , . It is well known that is positive definite [51, Theorem 6.10]; moreover, . We emphasize that our restriction to Gaussian kernels is only to minimize the amount of tedious computation in the proofs. However, our proof technique can be extended to handle more general positive definite .
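Since the kernel's normalization is fixed by the paper's (elided) formulas, the following numerical sketch simply assumes the common convention g(t) = (1/(σ√(2π))) exp(−t²/(2σ²)), with Fourier transform ĝ(ω) = ∫ g(t) e^{−iωt} dt = exp(−σ²ω²/2); the snippet checks this closed form, and the positivity of ĝ, by direct quadrature.

```python
import numpy as np

sigma, omega = 0.5, 3.0

# Quadrature grid; the Gaussian is negligible beyond +/- 10 sigma.
t = np.linspace(-10 * sigma, 10 * sigma, 200001)
dt = t[1] - t[0]
g = np.exp(-t**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# hat{g}(omega) = int g(t) exp(-i omega t) dt, approximated by a Riemann sum.
g_hat = np.sum(g * np.exp(-1j * omega * t)) * dt

closed_form = np.exp(-sigma**2 * omega**2 / 2)
assert abs(g_hat - closed_form) < 1e-7  # matches exp(-sigma^2 omega^2 / 2)
assert g_hat.real > 0                   # nonnegative and non-vanishing, as required
```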
1.4 Main idea of our work: Fourier tail sampling
To explain the main idea, let us consider the noiseless setting . Our main algorithmic idea stems from observing (1.3), wherein we notice that for sufficiently large, is equal to plus a perturbation term arising from the tails of . Thus, is equal to (which is a weighted sum of complex exponentials) plus a perturbation term. We control this perturbation by choosing to be sufficiently large, and recover estimates (up to a permutation ) via the Modified Matrix Pencil (MMP) method of Moitra [36] (outlined as Algorithm 1). Given these, we form the estimate to where
Provided the estimates are good enough, we will have . Therefore, by applying the above procedure to , we can hope to recover estimates of ’s and ’s as well. By proceeding recursively, it is clear that we can perform the above procedure to recover estimates to each and for all . A delicate issue that needs to be addressed for each intermediate group is the following. While estimating the parameters for group , the samples that we obtain will have perturbation arising due to (a) the tails of and, (b) the estimates computed thus far. In particular, going “too deep” into the tail of would blow up the perturbation term in (b), while not going sufficiently deep would blow up the perturbation term in (a). Therefore, in order to obtain stable estimates to the parameters , we will need to control these perturbation terms by carefully choosing the locations, as well as the number of sampling points in the tail.
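The peeling scheme just described can be illustrated end to end on a toy example. Everything below (spike locations, kernel widths, the offset `w0`, the number of samples) is our own choice for illustration, not the paper's algorithm or constants: far enough in the Fourier tail the slowest-decaying kernel dominates, dividing it out leaves an approximate sum of complex exponentials that a basic matrix pencil can handle, and the estimated first group is then subtracted before the second group is estimated at lower frequencies.

```python
import numpy as np

def matrix_pencil(h, k):
    """Recover nodes z_j and weights b_j from h_m = sum_j b_j z_j^m, m = 0..2k-1."""
    H0 = np.array([[h[p + q] for q in range(k)] for p in range(k)])
    H1 = np.array([[h[p + q + 1] for q in range(k)] for p in range(k)])
    # Generalized eigenvalues of the pencil (H1, H0) are the nodes z_j.
    z = np.linalg.eigvals(np.linalg.solve(H0, H1))
    V = np.vander(z, 2 * k, increasing=True).T       # V[m, j] = z_j^m
    b = np.linalg.lstsq(V, h, rcond=None)[0]
    return z, b

# Ground truth: two groups of spikes, Gaussian widths sig1 < sig2,
# so ghat1 decays slowest and group 1 dominates the Fourier tail.
t1, a1 = np.array([0.2, 0.7]), np.array([1.0, 0.8])
t2, a2 = np.array([0.3, 0.6]), np.array([1.2, 0.5])
sig1, sig2 = 0.02, 0.05
ghat = lambda s, w: np.exp(-s**2 * w**2 / 2)
mu_hat = lambda t, a, w: (a * np.exp(-1j * np.outer(w, t))).sum(axis=1)
f_hat = lambda w: mu_hat(t1, a1, w) * ghat(sig1, w) + mu_hat(t2, a2, w) * ghat(sig2, w)

def estimate_group(fh, sig, w0, k):
    # Sample at w0 + 2*pi*m and divide out this group's kernel transform,
    # leaving (approximately) a sum of complex exponentials in m.
    w = w0 + 2 * np.pi * np.arange(2 * k)
    z, b = matrix_pencil(fh(w) / ghat(sig, w), k)
    t_est = np.mod(-np.angle(z) / (2 * np.pi), 1.0)
    a_est = b * np.exp(1j * w0 * t_est)              # undo the offset modulation
    order = np.argsort(t_est)
    return t_est[order], a_est[order]

# Stage 1: deep in the tail (w0 large), group 2's contribution is negligible.
t1_est, a1_est = estimate_group(f_hat, sig1, w0=115.0, k=2)

# Stage 2: peel off group 1's estimated contribution, then sample at low
# frequencies where ghat2 has not yet decayed.
f2_hat = lambda w: f_hat(w) - mu_hat(t1_est, a1_est, w) * ghat(sig1, w)
t2_est, a2_est = estimate_group(f2_hat, sig2, w0=0.0, k=2)
```

Note how the stage-1 errors propagate into stage 2 through the subtraction, which is exactly the trade-off in the choice of sampling offsets discussed above.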
Further remarks.
Our choice of the MMP method for estimating the spike locations and amplitudes at each stage is primarily dictated by two reasons. Firstly, it is extremely simple to implement in practice. Secondly, it comes with precise quantitative error bounds (see Theorem 3) for estimating the source parameters – especially in the setting of adversarial bounded noise – which fit seamlessly into the theoretical analysis of our method. Of course, in practice, other methods tailored to handle sums of complex exponentials (such as ESPRIT, MUSIC, etc.) could also be used. To our knowledge, error bounds of the form in Theorem 3 do not currently exist for these other methods.
1.5 Main results
Our algorithm based on the aforementioned idea is outlined as Algorithm 2, which we name KrUMMP (Kernel Unmixing via Modified Matrix Pencil). Our main result for the noiseless setting () is stated as Theorem 5 in its full generality. We state the following informal version, assuming the spike amplitudes in each group to be , and with used to hide positive constants. Note that is the usual wrap-around distance on (see (2.1)).
Theorem 1 (Noiseless case).
The conditions on imply that the estimation errors corresponding to group should be sufficiently smaller than that of group . This is because the estimation errors arising in stage percolate to stage , and hence need to be controlled for stable recovery of source parameters for the group. Note that the conditions on (sampling offset at stage ) involve upper and lower bounds for reasons stated in the previous section. If is close to then will have to be suitably large in order to distinguish between and , as one would expect intuitively. An interesting feature of the result is that it only depends on separation within a group (specified by ), and so spikes belonging to different groups are allowed to overlap.
2 Notation and Preliminaries
Notation.
Vectors and matrices are denoted by lower- and upper-case letters, respectively. For , we denote . The imaginary unit is denoted by . The () norm of a vector is denoted by (defined as ). In particular, . For a matrix
, we will denote its spectral norm (i.e., largest singular value) by
and its Frobenius norm by (defined as ). For positive numbers , we denote to mean that there exists an absolute constant such that . The wrap-around distance on is denoted by , where we recall that
(2.1)   d(t_1, t_2) := \min_{n \in \mathbb{Z}} |t_1 - t_2 + n|
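In concrete terms (with our own generic notation, since the symbols in (2.1) are set in the paper's typesetting), the wrap-around distance on [0, 1) can be computed as:

```python
def wrap_dist(x, y):
    """Wrap-around distance on [0, 1): the shorter of the two arcs between x and y."""
    d = abs(x - y) % 1.0
    return min(d, 1.0 - d)

assert abs(wrap_dist(0.1, 0.9) - 0.2) < 1e-12   # wrapping through 0 beats the 0.8 arc
assert abs(wrap_dist(0.25, 0.5) - 0.25) < 1e-12
```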
2.1 Matrix Pencil (MP) method
We now review the classical Matrix Pencil (MP) method for estimating positions of point sources from Fourier samples. Consider the signal where , are unknown. Let be the Fourier transform of so that . For any given offset , let where ; clearly
where . Choose to form the matrices
(2.2) 
and
(2.3) 
Denoting for , and the Vandermonde matrix
(2.4) 
clearly and . Here, and are diagonal matrices. One can readily verify that the non-zero generalized eigenvalues of () are equal to the ’s (recall from [26, Definition 2.1] that (where ) is a generalized eigenvalue of if it satisfies ; clearly, this is only satisfied if ; see also [24, Theorem 2.1]). Hence, by forming the matrices , we can recover the unknown ’s exactly from samples of . Once the ’s are recovered, we can recover the ’s exactly, as the solution of the linear system
Thereafter, the ’s are found as .
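A minimal numerical sketch of the noiseless pencil described above, in our own notation: samples h_m = Σ_j a_j z_j^m with nodes z_j = e^{−2πi t_j}, k×k Hankel matrices, and the nodes read off as generalized eigenvalues (computed here via the equivalent eigenvalues of H0⁻¹H1).

```python
import numpy as np

# Ground truth: k = 2 point sources on [0, 1) with complex amplitudes.
t_true = np.array([0.2, 0.7])
a_true = np.array([1.0, 0.8 + 0.3j])
k = len(t_true)

# 2k noiseless samples h_m = sum_j a_j z_j^m, with nodes z_j = exp(-2i pi t_j).
z_true = np.exp(-2j * np.pi * t_true)
m = np.arange(2 * k)
h = (a_true * z_true[None, :] ** m[:, None]).sum(axis=1)

# Hankel pencil: H0[p, q] = h[p+q], H1[p, q] = h[p+q+1].
H0 = np.array([[h[p + q] for q in range(k)] for p in range(k)])
H1 = np.array([[h[p + q + 1] for q in range(k)] for p in range(k)])

# The generalized eigenvalues of (H1, H0) -- equivalently the eigenvalues
# of H0^{-1} H1 -- are exactly the nodes z_j.
z_est = np.linalg.eigvals(np.linalg.solve(H0, H1))
t_est = np.sort(np.mod(-np.angle(z_est) / (2 * np.pi), 1.0))

# Amplitudes: least-squares solution of the Vandermonde linear system.
V = np.exp(-2j * np.pi * np.outer(m, t_est))
a_est = np.linalg.lstsq(V, h, rcond=None)[0]
```

Note that with exactly 2k noiseless samples, distinct nodes and nonzero amplitudes, both H0 and the Vandermonde system are invertible, so recovery is exact up to floating-point error.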
2.2 The Modified Matrix Pencil (MMP) method
We now consider the noisy version of the setup defined in the previous section. For a collection of point sources with parameters , , we are given noisy samples
(2.5) 
for and where denotes noise.
Let us choose for a given offset , and for a positive integer . Using , let us form the matrices as in (2.2),(2.3). We now have , where are as defined in (2.2),(2.3), and
represent the perturbation matrices. Algorithm 1, namely the Modified Matrix Pencil (MMP) method [36], outlines how we can recover for .
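The sketch below is not Algorithm 1 verbatim; it implements the closely related SVD-subspace (ESPRIT-style) estimator, which shares MMP's key ingredients of oversampling plus a rank-k truncation to suppress noise. All sizes, noise levels and symbols are our own choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# k = 2 sources; take N = 32 noisy samples h_m = sum_j a_j z_j^m + eps_m.
k, N = 2, 32
t_true, a_true = np.array([0.2, 0.7]), np.array([1.0, 0.8])
z_true = np.exp(-2j * np.pi * t_true)
m = np.arange(N)
noise = 1e-4 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
h = (a_true * z_true[None, :] ** m[:, None]).sum(axis=1) + noise

# Oversampled Hankel matrix; its column space approximates the signal subspace.
L = N // 2
H = np.array([[h[p + q] for q in range(N - L + 1)] for p in range(L)])
U = np.linalg.svd(H, full_matrices=False)[0][:, :k]    # rank-k truncation

# Shift invariance of the signal subspace yields the nodes (ESPRIT step):
# U[1:] ~= U[:-1] @ Psi, and the eigenvalues of Psi are the z_j.
Psi = np.linalg.lstsq(U[:-1], U[1:], rcond=None)[0]
t_est = np.sort(np.mod(-np.angle(np.linalg.eigvals(Psi)) / (2 * np.pi), 1.0))
```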
Before proceeding, we need to make some definitions. Let and . We denote the largest and smallest non-zero singular values of by respectively, and the condition number of by , where . Let . We define as the minimum separation between the locations of the point sources, where .
The following theorem is a more precise version of [36, Theorem 2.8], with the constants computed explicitly. Moreover, the result in [36, Theorem 2.8] was specifically for the case . We outline Moitra’s proof in Appendix A.1 for completeness and fill in additional details (using auxiliary results from Appendix B) to arrive at the statement of Theorem 3.
Theorem 3 ([36]).
For , say . Moreover, for , say
(2.6) 
Then, there exists a permutation such that the output of the MMP method satisfies for each
(2.7) 
where is formed by permuting the indices of w.r.t .
The following corollary of Theorem 3 simplifies the expression for the bound on in (2.7), and will be useful for our main results later on. The proof is deferred to Appendix A.2.
Corollary 1.
For where is a constant, say . Moreover, for , say
(2.8) 
Then, there exists a permutation such that the output of the MMP method satisfies for each
(2.9) 
where , and is formed by permuting the indices of w.r.t .
3 Unmixing Gaussians in Fourier domain: Noiseless case
We now turn our attention to the main focus of this paper, namely that of unmixing Gaussians in the Fourier domain. We will in general assume the Fourier samples to be noisy as in (2.5), with defined in (1.3). In this section, we will focus on the noiseless setting wherein for each . The noisy setting is analyzed in the next section.
Let us begin by noting that when , i.e., in the case of a single kernel, the problem is solved easily. Indeed, we have from (1.3) that . Clearly, one can exactly recover and via the MP method by first obtaining the samples , , …, , and then working with .
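A toy sketch of this single-kernel pipeline (Gaussian kernel, our own notation and sampling choices): divide the Fourier samples by the kernel transform, then run the pencil on the resulting exponential sum.

```python
import numpy as np

# Single Gaussian kernel of width sigma; k = 2 spikes on [0, 1).
t_true, a_true = np.array([0.2, 0.7]), np.array([1.0, 0.8])
sigma, k = 0.05, 2

w = 2 * np.pi * np.arange(2 * k)                 # sampling frequencies
g_hat = np.exp(-sigma**2 * w**2 / 2)             # Gaussian kernel transform
f_hat = (a_true * np.exp(-1j * np.outer(w, t_true))).sum(axis=1) * g_hat

# Dividing out the kernel leaves a pure sum of complex exponentials,
# on which the matrix pencil of Section 2.1 applies directly.
h = f_hat / g_hat
H0 = np.array([[h[p + q] for q in range(k)] for p in range(k)])
H1 = np.array([[h[p + q + 1] for q in range(k)] for p in range(k)])
z = np.linalg.eigvals(np.linalg.solve(H0, H1))
t_est = np.sort(np.mod(-np.angle(z) / (2 * np.pi), 1.0))
```

The amplitudes then follow from the Vandermonde least-squares system exactly as in Section 2.1.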
The situation for the case is however more delicate. Before proceeding, we need to make some definitions and assumptions.

We will denote and .

The sources in the group are assumed to have a minimum separation of
3.1 The case of two kernels
We first consider the case of two Gaussian kernels, as the analysis here is easier to digest than that of the general case. Note that is now of the form
where we recall ; . The following theorem provides sufficient conditions on the choice of the sampling parameters for approximate recovery of for each .