# Multi-kernel unmixing and super-resolution using the Modified Matrix Pencil method

Consider L groups of point sources or spike trains, with the l-th group represented by x_l(t). For a function g: R → R, let g_l(t) = g(t/μ_l) denote a point spread function with scale μ_l > 0, where μ_1 < … < μ_L. With y(t) = ∑_{l=1}^{L} (g_l ∗ x_l)(t), our goal is to recover the source parameters given samples of y, or given Fourier samples of y. This problem generalizes the usual super-resolution setup, wherein L = 1; we call it the multi-kernel unmixing super-resolution problem. Assuming access to Fourier samples of y, we derive an algorithm for estimating the source parameters of each group, along with precise non-asymptotic guarantees. Our approach estimates the group parameters sequentially, in order of increasing scale, i.e., from group 1 to group L. In particular, the estimation process at stage 1 ≤ l ≤ L involves (i) carefully sampling the tail of the Fourier transform of y, (ii) a deflation step, wherein we subtract the contribution of the groups processed thus far from the obtained Fourier samples, and (iii) applying Moitra's Modified Matrix Pencil method to a deconvolved version of the samples in (ii).


## 1 Introduction

### 1.1 Background on super-resolution

Super-resolution consists of estimating a signal x, given blurred observations obtained after convolution with a point spread function g, which is assumed to represent the impulse response of the measurement system, such as, e.g., a microscope in high-density single-molecule imaging. Mathematically, x is typically modeled as a superposition of Diracs, i.e., a sparse atomic measure of the form

 x(t) = \sum_{i=1}^{k} u_i \, \delta(t - t_i); \qquad u_i \in \mathbb{C}, \; t_i \in [0,1),

while g is a low-pass filter. Denoting

 y(t) = \sum_{i=1}^{k} u_i \, g(t - t_i)  (1.1)

to be the convolution of g and x, one is usually given information about y either as samples of y, or as Fourier samples of y. This problem has a number of important applications arising for instance in geophysics [27], astronomy [41], medical imaging [21], etc. The reader is referred to [10] for a more comprehensive list of applications. Super-resolution can be seen as a highly non-trivial "off the grid" extension of the finite dimensional sparse estimation problem in compressed sensing [19], [17] and statistics [8]. In the new setting, instead of estimating a sparse vector in a finite dimensional space, the goal is to estimate a sparse measure over the real line endowed with its Borel σ-algebra.

Recently, the problem of super-resolution has received a great deal of interest in the signal processing community, triggered by the seminal work of Candès and Fernandez-Granda [10, 9]. They considered the setup where one is given the first few Fourier coefficients of x, i.e., for a cut-off frequency m, we observe f(s) where

 f(s) = \int_0^1 e^{\iota 2\pi s t} \, x(dt) = \sum_{i=1}^{k} u_i e^{\iota 2\pi s t_i}; \qquad s \in \{-m, -m+1, \dots, m\}.  (1.2)

Note that this corresponds to taking g to be the Dirac delta in (1.1), and sampling the Fourier transform of y on the regular grid {−m, −m+1, …, m}. The authors consider the noiseless setting and propose solving an optimization problem over the space of measures which involves minimizing the total variation (TV) norm amongst all measures which are consistent with the observations. The resulting minimization problem is an infinite dimensional convex program over Borel measures on [0,1). It was shown that the dual of this problem can be formulated as a semi-definite program (SDP) in finitely many variables, and thus can be solved efficiently. Since then, there have been numerous follow-up works such as by Schiebinger et al. [42], Duval and Peyré [14], Denoyelle et al. [13], Bendory et al. [3], Azaïs et al. [2] and many others. For instance, [42] considers the noiseless setting with real-valued samples of y for a more general choice of g (such as a Gaussian), and also assumes x to be non-negative. Their proposed approach again involves TV norm minimization with linear constraints. Bendory et al. [3] consider g to be Gaussian or Cauchy, place no sign assumptions on x, and also analyze TV norm minimization with linear fidelity constraints for estimating x from noiseless samples of y. The approach adopted in [14, 13] is to solve a least-squares-type minimization procedure with a TV norm based penalty term (also referred to as the Beurling LASSO; see, e.g., [2]) for recovering x from samples of y. The approach in [15] considers a natural finite approximation on a grid to the continuous problem, and studies the limiting behaviour as the grid becomes finer; see also [16].

From a computational perspective, the aforementioned approaches all admit a finite dimensional dual problem with an infinite number of linear constraints; this is a semi-infinite program (SIP), for which there is an extensive literature [23]. For the particular case of non-negative x, Boyd et al. [5] proposed an improved Frank-Wolfe algorithm in the primal. In certain instances, e.g., with Fourier samples (such as in [10, 9]), this SIP can also be reformulated as an SDP. From a practical point of view, SDP solvers are notoriously slow for a moderately large number of variables. The algorithm of [5] is a first order scheme with potential local correction steps, and is practically more viable.

From a statistical viewpoint, Candès and Fernandez-Granda [10] showed that their approach exactly recovers x in the noiseless case provided the minimum separation △ between the spike locations is sufficiently large relative to 1/m. Similar results for other choices of g were shown by Schiebinger et al. [42] (for positive measures and without any minimum separation condition), and by Bendory et al. [3] (for signed measures and with a separation condition). In the noisy setting, the state of affairs is radically different, since it is known (see, e.g., [10, Section 3.2], [36, Corollary 1.4]) that some separation between the spike locations is indeed necessary for stable recovery. When sufficient separation is assumed, and provided the noise level is small enough, stable recovery of x is possible (see, e.g., [18, 14, 13, 2]).

When one is given the first few Fourier samples of the spike train (i.e., (1.2)), there exist other approaches that can be employed. Prony's method [12] is a classical method that involves finding the roots of a polynomial whose coefficients form a null vector of a certain Hankel matrix. Prony's method and its variants have also been recently studied by Potts and Tasche (e.g., [39, 40]), Plonka and Tasche [38], and others. The matrix pencil (MP) method [24] is another classical approach that involves computing the generalized eigenvalues of a suitable matrix pencil. Both of these methods recover x exactly in the absence of noise given on the order of k Fourier samples, and are also computationally feasible. Recently, Moitra [36, Theorem 2.8] showed that the MP method is stable in the presence of noise provided the noise level is not too large and the cut-off m exceeds 1/△. Moitra also showed [36, Corollary 1.4] that such a dependence is necessary, in the sense that if m falls below (1−ε)/△, then the noise level would have to be exponentially small in k to be able to stably recover x. More recently, Tang et al. [49, 48] studied approaches based on atomic norm minimization, which can be formulated as an SDP. In [49, Theorem 1.1], the authors considered the signs of the amplitudes of x to be generated randomly, with noiseless samples. It was shown that if the spikes are sufficiently separated, and if the Fourier samples are obtained at indices selected uniformly at random from {−m, …, m}, then x is recovered exactly with high probability. In [48, Theorem 1], the authors considered Gaussian noise in the samples and showed that, under a separation condition, the solution returned by the SDP estimates the vector of original Fourier samples at a mean square rate depending on the noise variance σ². Moreover, they also show [48, Theorem 2] that the spike locations and amplitude terms corresponding to the SDP solution are localized around the true values. Very similar in spirit to the MP method for estimating sums of exponentials are the Prony-like methods ESPRIT and MUSIC, which can also be used for spike train estimation using the same Fourier domain measurement trick as in [36]. These methods can be interpreted as model reduction methods based on low-rank approximation of Hankel matrices [35]. However, to the best of our knowledge, a precise, finite sample based analysis of ESPRIT, similar in spirit to the analysis of the MP method proposed in [36], is still lacking, although a first step in this direction (using Adamyan-Arov-Krein theory) can be found in [38].

### 1.2 Super-resolution with multiple kernels

In this paper, we consider a generalization of the standard super-resolution problem, by assuming that the measurement process now involves a superposition of convolutions of several spike trains with different point spread functions. This problem, which we call the "multi-kernel unmixing super-resolution problem", appears naturally in many practical applications such as single molecule spectroscopy [25], spike sorting in neuron activity analysis [29, 6], DNA sequencing [30, 31], spike hunting in galaxy spectra [7, 33], etc.

A problem of particular interest at the National Physical Laboratory, the UK's national metrology institute, is isotope identification [34, 47], which is of paramount importance in nuclear security. Hand-held radio-isotope identifiers are known to suffer from poor performance [47], and new and precise algorithms have to be devised for this problem. While it is legitimately expected for Radio Isotope Identifier Devices to be accurate across all radioactive isotopes, the US Department of Homeland Security requires all future identification systems to be able to meet a minimum identification standard set forth in ANSI N42.34. Effective identification from low resolution information is known to be reliably achievable by trained spectroscopists, whereas automated identification using standard algorithms sometimes fails up to three quarters of the time [46]. For instance, the spectrum of one such problematic isotope is plotted in Figure 1, which is extracted from [46, p.9].

Isotope identification involves the estimation of a set of peak locations in the gamma spectrum where the signal is blurred by convolution with kernels of different window sizes. Mixtures of different spectra can be particularly difficult to analyse using traditional methods and a need for precise unmixing algorithms in such generalised super-resolution problems may play an important role in future applications such as reliable isotope identification.

Another application of interest is DNA sequencing in the vein of [30]. Sequencing is usually performed using one of the variants of the enzymatic method invented by Frederick Sanger [1]. Sequencing is based on enzymatic reactions, electrophoresis, and some detection technique. Electrophoresis is a technique used to separate the DNA sub-fragments produced as the output of four specific chemical reactions, as described in more detail in [30]. DNA fragments are negatively charged in solution. An efficient color-coding strategy has been developed to permit sizing of all four kinds of DNA sub-fragments by electrophoresis in a single lane of a slab gel or in a capillary. In each of the four chemical reactions, the primers are labeled by one of four different fluorescent dyes. The dyes are then excited by a laser based confocal fluorescent detection system, in a region within the slab gel or capillary. In that process, fluorescence intensities are emitted in four wavelength bands, as shown with different color codes in Figure 2 below. However, quoting [4], "because the electrophoretic process often fails to separate peaks adequately, some form of deconvolution filter must be applied to the data to resolve overlapping events. This process is complicated by the variability of peak shapes, meaning that conventional deconvolution often fails." The methods developed in the present paper aim at achieving accurate deconvolution with different peak shapes, and might therefore be useful for practical deconvolution problems in DNA sequencing.

We will now provide the mathematical formulation of the problem and describe the main idea of our approach along with our contributions, and discuss related work for this problem.

### 1.3 Problem formulation

Say we have L groups of point sources, where t_{l,j} ∈ [0,1) and u_{l,j} ∈ C (with 1 ≤ j ≤ K) denote the locations and (complex-valued) amplitudes respectively of the sources in the l-th group. Our signal of interest x is defined as

 x(t) = \sum_{l=1}^{L} x_l(t) = \sum_{l=1}^{L} \left( \sum_{j=1}^{K} u_{l,j} \, \delta(t - t_{l,j}) \right).

Let g be a positive definite function (recall from [51, Theorem 6.11] that a continuous function g is positive definite if and only if g is bounded and its Fourier transform is nonnegative and non-vanishing), with Fourier transform \bar g(s) for s ∈ R. Consider L distinct kernels g_l(t) = g(t/μ_l), where 0 < μ_1 < ⋯ < μ_L. Let y = \sum_{l=1}^{L} g_l ∗ x_l, where ∗ denotes the convolution operator. Let \bar g_l denote the Fourier transform of g_l. Denoting the Fourier transform of y by f, we get

 f(s) = \sum_{l=1}^{L} \underbrace{\bar g_l(s) \left( \sum_{j=1}^{K} u_{l,j} \exp(\iota 2\pi s t_{l,j}) \right)}_{=: \, f_l(s)}.  (1.3)

Assuming black box access to the complex valued function \tilde f(s) = f(s) + η_s, our aim is to recover estimates of t_{l,j} and u_{l,j} for each l ∈ {1, …, L}, from few possibly noisy samples of f. Here, η_s denotes measurement noise at location s.

#### Gaussian kernels.

For ease of exposition, we will from now on consider g to be a Gaussian, i.e., g(t) = exp(−t²/2), so that g_l(t) = exp(−t²/(2μ_l²)) and \bar g_l(s) = μ_l \sqrt{2\pi} \exp(−2\pi^2 μ_l^2 s^2). It is well known that g is positive definite [51, Theorem 6.10]; moreover, \bar g_l(s) > 0 for all s ∈ R. We emphasize that our restriction to Gaussian kernels is only to minimize the amount of tedious computations in the proof. However, our proof technique can be extended to handle more general positive definite g.
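As a quick numerical sanity check of this specialization (a sketch; the closed form \bar g_l(s) = μ_l √(2π) e^{−2π² μ_l² s²} under the convention \bar g(s) = ∫ g(t) e^{−ι2πst} dt is our reading of the setup), the Fourier pair can be verified by direct quadrature:

```python
import numpy as np

def gaussian_ft_numeric(s, mu=1.0):
    """Riemann-sum approximation of the Fourier transform of
    g_l(t) = exp(-t^2 / (2 mu^2)), with convention
    g_bar(s) = integral of g(t) * exp(-2*pi*1j*s*t) dt."""
    t = np.linspace(-12.0 * mu, 12.0 * mu, 200001)  # wide grid: g decays fast
    dt = t[1] - t[0]
    return np.sum(np.exp(-(t / mu) ** 2 / 2.0) * np.exp(-2j * np.pi * s * t)) * dt

def gaussian_ft_closed(s, mu=1.0):
    """Claimed closed form: mu * sqrt(2 pi) * exp(-2 pi^2 mu^2 s^2)."""
    return mu * np.sqrt(2.0 * np.pi) * np.exp(-2.0 * np.pi ** 2 * mu ** 2 * s ** 2)

# the two agree to high accuracy for several scales and frequencies
for mu in (0.5, 1.0, 2.0):
    for s in (0.0, 0.25, 0.5):
        assert abs(gaussian_ft_numeric(s, mu) - gaussian_ft_closed(s, mu)) < 1e-8
```

Since the integrand is smooth and decays rapidly, the simple Riemann sum is accurate to near machine precision here.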

### 1.4 Main idea of our work: Fourier tail sampling

To explain the main idea, let us consider the noiseless setting η_s = 0. Our main algorithmic idea stems from observing (1.3): since μ_1 < μ_2 < ⋯ < μ_L, the transform \bar g_1 has the slowest decaying tail, so for s sufficiently large, f(s) is equal to f_1(s) plus a perturbation term arising from the tails of \bar g_2, …, \bar g_L. Thus, f(s)/\bar g_1(s) is equal to \sum_{j=1}^{K} u_{1,j} \exp(\iota 2\pi s t_{1,j}) (which is a weighted sum of complex exponentials) plus a perturbation term. We control this perturbation by choosing the sampling offset to be sufficiently large, and recover estimates \hat t_{1,j}, \hat u_{1,j} (up to a permutation ϕ_1) via the Modified Matrix Pencil (MMP) method of Moitra [36] (outlined as Algorithm 1). Given these, we form the estimate \hat f_1 to f_1 where

 \hat f_1(s) = \bar g_1(s) \sum_{j=1}^{K} \hat u_{1,\phi_1(j)} \exp(\iota 2\pi s \, \hat t_{1,\phi_1(j)}).

Provided the estimates are good enough, we will have f − \hat f_1 ≈ f_2 + ⋯ + f_L. Therefore, by applying the above procedure to f − \hat f_1, we can hope to recover estimates of the t_{2,j}'s and u_{2,j}'s as well. By proceeding recursively, it is clear that we can perform the above procedure to recover estimates of each t_{l,j} and u_{l,j}, for all l. A delicate issue that needs to be addressed for each intermediate group is the following. While estimating the parameters for group l, the samples that we obtain will have perturbation arising due to (a) the tails of \bar g_{l+1}, …, \bar g_L and, (b) the estimation errors of the groups processed thus far. In particular, going "too deep" into the tail of \bar g_l would blow up the perturbation term in (b), while not going sufficiently deep would blow up the perturbation term in (a). Therefore, in order to obtain stable estimates of the parameters, we will need to control these perturbation terms by carefully choosing the locations, as well as the number, of the sampling points in the tail.
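To make the tail-sampling step concrete, the following numerical sketch (hypothetical scales and unit amplitudes; the Gaussian closed form \bar g_l(s) = μ_l √(2π) e^{−2π² μ_l² s²} is the assumption from the Gaussian specialization above) checks that after deconvolving by \bar g_1, the contribution of group 2 decays rapidly with the sampling location s:

```python
import numpy as np

rng = np.random.default_rng(0)
mu1, mu2 = 0.3, 0.5                      # hypothetical scales, mu1 < mu2
K = 3
t1, t2 = rng.random(K), rng.random(K)    # spike locations of the two groups
u1, u2 = np.ones(K), np.ones(K)          # unit amplitudes

def g_bar(s, mu):
    # Gaussian Fourier transform (assumed closed form, Section 1.3)
    return mu * np.sqrt(2 * np.pi) * np.exp(-2 * np.pi ** 2 * mu ** 2 * s ** 2)

def f(s):
    # the mixture (1.3) with L = 2
    e1 = np.sum(u1 * np.exp(2j * np.pi * s * t1))
    e2 = np.sum(u2 * np.exp(2j * np.pi * s * t2))
    return g_bar(s, mu1) * e1 + g_bar(s, mu2) * e2

# After dividing by g_bar(s, mu1), group 2's contribution is damped by
# (mu2/mu1) * exp(-2*pi^2*(mu2^2 - mu1^2)*s^2): negligible deep in the tail.
for s in (2.0, 3.0):
    e1 = np.sum(u1 * np.exp(2j * np.pi * s * t1))
    residual = f(s) / g_bar(s, mu1) - e1
    damping = (mu2 / mu1) * np.exp(-2 * np.pi ** 2 * (mu2 ** 2 - mu1 ** 2) * s ** 2)
    assert abs(residual) <= K * damping
```

This is precisely the perturbation term (a) above; the trade-off with term (b) appears once deflation errors from earlier groups enter the residual.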

#### Further remarks.

Our choice of the MMP method for estimating the spike locations and amplitudes at each stage is primarily dictated by two reasons. Firstly, it is extremely simple to implement in practice. Secondly, it comes with precise quantitative error bounds (see Theorem 3) for estimating the source parameters, especially in the setting of adversarial bounded noise, which fit seamlessly into the theoretical analysis of our method. Of course, in practice, other methods tailored to handle sums of complex exponentials (such as ESPRIT, MUSIC, etc.) could also be used. To our knowledge, however, error bounds of the form in Theorem 3 do not currently exist for these other methods.

### 1.5 Main results

Our algorithm based on the aforementioned idea is outlined as Algorithm 2, which we name KrUMMP (Kernel Unmixing via Modified Matrix Pencil). Our main result for the noiseless setting (η_s = 0) is stated as Theorem 5 in its full generality. We state the following informal version of it, assuming the spike amplitudes in each group to be of order one, and with Θ(·) used to hide positive constants. Note that d_w is the usual wrap-around distance on [0,1) (see (2.1)).

###### Theorem 1 (Noiseless case).

Denote by ε_l ∈ (0,1) the target estimation accuracy for group l, for each 1 ≤ l ≤ L. Let the ε_l's satisfy, for each l, conditions requiring ε_l to be sufficiently small relative to ε_{l+1}, with thresholds depending on the problem parameters (see (3.26), (3.28)). Finally, for suitable choices of m_L and s_L, let

 m_l = \Theta\!\left(\frac{1}{\triangle_l}\right), \qquad s_l = \Theta\!\left( \frac{m_{l+1}}{(\mu_{l+1}^2 - \mu_l^2)^{1/2}} \log^{1/2}\!\left( \frac{K^{3/2}(L-l)\mu_L}{\varepsilon_l \mu_l} \right) \right); \qquad 1 \le l \le L-1.

Then, for each 1 ≤ l ≤ L, there exists a permutation ϕ_l: {1, …, K} → {1, …, K} such that

 d_w(\hat t_{l,\phi_l(j)}, t_{l,j}) \le \varepsilon_l, \qquad |\hat u_{l,\phi_l(j)} - u_{l,j}| \lesssim \varepsilon_l.

Here, \hat t_{l,\phi_l(j)} and \hat u_{l,\phi_l(j)} denote the estimates of t_{l,j} and u_{l,j}, for j = 1, …, K.

The conditions on the ε_l's imply that the estimation errors corresponding to group l should be sufficiently smaller than those of group l+1. This is because the estimation errors arising in stage l percolate to stage l+1, and hence need to be controlled for stable recovery of the source parameters of the next group. Note that the conditions on s_l (the sampling offset at stage l) involve upper and lower bounds, for reasons stated in the previous section. If μ_l is close to μ_{l+1}, then s_l will have to be suitably large in order to distinguish between the two groups, as one would expect intuitively. An interesting feature of the result is that it only depends on the separation within each group (specified by △_l), and so spikes belonging to different groups are allowed to overlap.

Our second result is for the noisy setting. Say at stage l of Algorithm 2, we observe \tilde f(s) = f(s) + η_s, where η_s denotes noise at location s. Our main result for this setting is Theorem 6. Denoting w_l to be the noise vector at stage l, we state its informal version below, assuming the spike amplitudes in each group to be of order one.

###### Theorem 2 (Noisy case).

Say at stage l of Algorithm 2, we observe \tilde f(s) = f(s) + η_s, where η_s denotes noise at location s. For each l, let m_l, s_l, ε_l be chosen as specified in Theorem 1. Say ∥w_1∥_∞ is sufficiently small, and also

 \|w_l\|_\infty \lesssim \frac{(\varepsilon_l \mu_l)^{1 + C(\mu_l, \mu_{l+1}, \triangle_l)}}{\sqrt{K}\,\left(K^{3/2} L \mu_L\right)^{C(\mu_l, \mu_{l+1}, \triangle_l)}}; \qquad 2 \le l \le L-1,

where C(μ_l, μ_{l+1}, △_l) > 0 depends only on μ_l, μ_{l+1}, △_l. Then, for each 1 ≤ l ≤ L, there exists a permutation ϕ_l: {1, …, K} → {1, …, K} such that

 d_w(\hat t_{l,\phi_l(j)}, t_{l,j}) \le \varepsilon_l, \qquad |\hat u_{l,\phi_l(j)} - u_{l,j}| \lesssim \varepsilon_l,

where \varepsilon_l is as in Theorem 1.

## 2 Notation and Preliminaries

#### Notation.

Vectors and matrices are denoted by lower and upper case letters respectively. For m ∈ N, we denote [m] := {1, …, m}. The imaginary unit is denoted by ι. The ℓ_p norm (p ≥ 1) of a vector x ∈ C^n is denoted by ∥x∥_p (defined as (∑_{i=1}^{n} |x_i|^p)^{1/p}). In particular, ∥x∥_∞ = max_i |x_i|. For a matrix A, we will denote its spectral norm (i.e., largest singular value) by ∥A∥, and its Frobenius norm by ∥A∥_F (defined as (∑_{i,j} |A_{ij}|²)^{1/2}). For positive numbers a, b, we write a ≲ b to mean that there exists an absolute constant C > 0 such that a ≤ C b. The wrap-around distance on [0,1) is denoted by d_w, where we recall that

 d_w(t_1, t_2) = \min\{|t_1 - t_2|, \, 1 - |t_1 - t_2|\}.  (2.1)
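For concreteness, the wrap-around distance (2.1) in code (a minimal sketch):

```python
def d_wrap(t1, t2):
    """Wrap-around distance on [0, 1), as in (2.1)."""
    d = abs(t1 - t2)
    return min(d, 1.0 - d)

# points near opposite ends of [0, 1) are close on the circle
assert d_wrap(0.25, 0.875) == 0.375
assert d_wrap(0.0, 0.75) == 0.25
```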

### 2.1 Matrix Pencil (MP) method

We now review the classical Matrix Pencil (MP) method for estimating the positions of point sources from Fourier samples. Consider the signal x(t) = \sum_{j=1}^{K} u_j \delta(t - t_j), where u_j ∈ C, t_j ∈ [0,1) are unknown. Let f be the Fourier transform of x, so that f(s) = \sum_{j=1}^{K} u_j \exp(\iota 2\pi s t_j). For any given offset s_0 and integer i, consider the samples f(s_0 + i); clearly

 f(s_0 + i) = \sum_{j=1}^{K} u_j \exp(\iota 2\pi (s_0 + i) t_j) = \sum_{j=1}^{K} u'_j \exp(\iota 2\pi i t_j),

where u'_j := u_j \exp(\iota 2\pi s_0 t_j). Choose an integer m > K and form the matrices

 H_0 = \begin{bmatrix} f(s_0) & f(s_0+1) & \cdots & f(s_0+m-1) \\ f(s_0-1) & f(s_0) & \cdots & f(s_0+m-2) \\ \vdots & \vdots & & \vdots \\ f(s_0-m+1) & f(s_0-m+2) & \cdots & f(s_0) \end{bmatrix}  (2.2)

and

 H_1 = \begin{bmatrix} f(s_0-1) & f(s_0) & \cdots & f(s_0+m-2) \\ f(s_0-2) & f(s_0-1) & \cdots & f(s_0+m-3) \\ \vdots & \vdots & & \vdots \\ f(s_0-m) & f(s_0-m+1) & \cdots & f(s_0-1) \end{bmatrix}.  (2.3)

Denoting \alpha_j = \exp(\iota 2\pi t_j) for j = 1, …, K, and V ∈ C^{m×K} the Vandermonde matrix

 V = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ \alpha_1 & \alpha_2 & \cdots & \alpha_K \\ \vdots & \vdots & & \vdots \\ \alpha_1^{m-1} & \alpha_2^{m-1} & \cdots & \alpha_K^{m-1} \end{bmatrix},  (2.4)

clearly H_0 = \bar V D_{u'} V^{\mathsf T} and H_1 = \bar V D_{u'} D_\alpha^{-1} V^{\mathsf T}, where \bar V denotes the entrywise complex conjugate of V. Here, D_{u'} = \mathrm{diag}(u'_1, …, u'_K) and D_\alpha = \mathrm{diag}(\alpha_1, …, \alpha_K) are diagonal matrices. One can readily verify (recall (see [26, Definition 2.1]) that λ ∈ C is a generalized eigenvalue of the pencil (H_0, H_1) if H_0 x = λ H_1 x for some x ≠ 0 with H_1 x ≠ 0; see also [24, Theorem 2.1]) that the non-zero generalized eigenvalues of (H_0, H_1) are equal to the α_j's. Hence, by forming the matrices H_0, H_1, we can recover the unknown α_j's exactly from the 2m samples f(s_0 − m), …, f(s_0 + m − 1). Once the α_j's are recovered, we can recover the u_j's exactly, as the solution of the linear system

 \begin{bmatrix} f(0) \\ f(1) \\ \vdots \\ f(m-1) \end{bmatrix} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ \alpha_1 & \alpha_2 & \cdots & \alpha_K \\ \vdots & \vdots & & \vdots \\ \alpha_1^{m-1} & \alpha_2^{m-1} & \cdots & \alpha_K^{m-1} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_K \end{bmatrix}.

Thereafter, the t_j's are found as t_j = \frac{1}{2\pi} \arg(\alpha_j), taking the argument in [0, 2\pi).
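The recipe above can be sketched in a few lines of NumPy (a sketch of the noiseless MP method, not the paper's Algorithm 1; the function name `mp_method` and the pseudo-inverse route to the generalized eigenvalues of the pencil (H_0, H_1) are our choices):

```python
import numpy as np

def mp_method(f, s0, m, K):
    """Noiseless Matrix Pencil recovery from the samples
    f(s0 - m), ..., f(s0 + m - 1), following Section 2.1."""
    # Toeplitz matrices (2.2), (2.3): entry (r, c) is f(s0 + c - r), resp. f(s0 - 1 + c - r)
    H0 = np.array([[f(s0 + c - r) for c in range(m)] for r in range(m)])
    H1 = np.array([[f(s0 - 1 + c - r) for c in range(m)] for r in range(m)])
    # non-zero generalized eigenvalues of (H0, H1) are alpha_j = exp(2*pi*1j*t_j)
    ev = np.linalg.eigvals(np.linalg.pinv(H1) @ H0)
    ev = ev[np.argsort(-np.abs(ev))[:K]]     # keep the K eigenvalues of largest modulus
    t_hat = np.sort(np.mod(np.angle(ev) / (2 * np.pi), 1.0))
    # least squares for the modulated amplitudes u'_j, then undo the offset s0
    V = np.exp(2j * np.pi * np.outer(np.arange(m), t_hat))
    rhs = np.array([f(s0 + i) for i in range(m)])
    u_prime = np.linalg.lstsq(V, rhs, rcond=None)[0]
    u_hat = u_prime * np.exp(-2j * np.pi * s0 * t_hat)
    return t_hat, u_hat

# Toy usage: three well-separated spikes, recovered up to round-off.
t_true = np.array([0.1, 0.4, 0.75])
u_true = np.array([1.0, 2.0, 1.5])
f = lambda s: np.sum(u_true * np.exp(2j * np.pi * s * t_true))
t_hat, u_hat = mp_method(f, s0=5, m=8, K=3)
assert np.allclose(t_hat, t_true, atol=1e-6)
assert np.allclose(u_hat, u_true, atol=1e-6)
```

In the noiseless setting with m > K, the non-zero eigenvalues of pinv(H_1) H_0 coincide with the generalized eigenvalues of the pencil, which is why the pseudo-inverse shortcut suffices here.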

### 2.2 The Modified Matrix Pencil (MMP) method

We now consider the noisy version of the setup defined in the previous section. For a collection of K point sources with parameters u_j ∈ C, t_j ∈ [0,1), j = 1, …, K, we are given noisy samples

 \tilde f(s) = f(s) + \eta_s,  (2.5)

for s ∈ {s_0 − m, …, s_0 + m − 1}, where η_s denotes noise.

Let us choose the offset s_0 and a positive integer m, and using the samples \tilde f(s), form the matrices \tilde H_0, \tilde H_1 as in (2.2), (2.3). We now have \tilde H_0 = H_0 + E_0 and \tilde H_1 = H_1 + E_1, where H_0, H_1 are as defined in (2.2), (2.3), and E_0, E_1 represent the perturbation matrices. Algorithm 1, namely the Modified Matrix Pencil (MMP) method [36], outlines how we can recover estimates \hat t_j, \hat u_j for j = 1, …, K.

Before proceeding, we need to make some definitions. Let u_max = max_j |u_j| and u_min = min_j |u_j|. We denote the largest and smallest non-zero singular values of V by σ_max and σ_min respectively, and the condition number of V by κ = σ_max/σ_min. Let η_max = max_s |η_s|. We will define △ := min_{i≠j} d_w(t_i, t_j) as the minimum separation between the locations of the point sources, where d_w is as in (2.1).

The following theorem is a more precise version of [36, Theorem 2.8], with the constants computed explicitly. Moreover, the result in [36, Theorem 2.8] was stated specifically for the case s_0 = 0. We outline Moitra's proof in Appendix A.1 for completeness, and fill in additional details (using auxiliary results from Appendix B) to arrive at the statement of Theorem 3.

###### Theorem 3 ([36]).

For ε ∈ (0, △/2), say m ≥ \frac{2}{\triangle - 2\varepsilon} + 1. Moreover, for a constant C > 0, say

 \eta_{\max} \le \frac{\varepsilon \, u_{\min} \sigma_{\min}^2}{2 m C \sqrt{K}} \left( 1 + 16 \kappa^2 \frac{u_{\max}}{u_{\min}} \right)^{-1}.  (2.6)

Then, there exists a permutation ϕ: [K] → [K] such that the output of the MMP method satisfies, for each i ∈ [K],

 d_w(\hat t_{\phi(i)}, t_i) \le \varepsilon, \qquad \|\hat u_\phi - u\|_\infty \le \left( \frac{2\pi m^{3/2} K u_{\max} + u_{\min}\sigma_{\min}^2}{2C\sqrt{mK}} \left( 1 + 16 \kappa^2 \frac{u_{\max}}{u_{\min}} \right)^{-1} \left( (m-1) \triangle^{-2} \varepsilon^{-1} \right)^{1/2} + 2\pi u_{\max} s_0 \right) \varepsilon  (2.7)

where \hat u_\phi is formed by permuting the indices of \hat u w.r.t. ϕ.

The following corollary of Theorem 3 simplifies the expression for the bound on ∥\hat u_\phi − u∥_∞ in (2.7), and will be useful for our main results later on. The proof is deferred to Appendix A.2.

###### Corollary 1.

For m ≥ c/△, where c > 1 is a constant, say ε ∈ (0, △/2). Moreover, for C as in Theorem 3, say

 \eta_{\max} \le \frac{\varepsilon \, u_{\min}}{5 C \sqrt{K}} \left( 1 + 48 \frac{u_{\max}}{u_{\min}} \right)^{-1}.  (2.8)

Then, there exists a permutation ϕ: [K] → [K] such that the output of the MMP method satisfies, for each i ∈ [K],

 d_w(\hat t_{\phi(i)}, t_i) \le \varepsilon, \qquad \|\hat u_\phi - u\|_\infty < \left( \tilde C + 2\pi u_{\max} s_0 \right) \varepsilon,  (2.9)

where \tilde C > 0 is a constant, and \hat u_\phi is formed by permuting the indices of \hat u w.r.t. ϕ.

## 3 Unmixing Gaussians in Fourier domain: Noiseless case

We now turn our attention to the main focus of this paper, namely that of unmixing Gaussians in the Fourier domain. We will in general assume the Fourier samples to be noisy as in (2.5), with f defined in (1.3). In this section, we will focus on the noiseless setting, wherein η_s = 0 for each s. The noisy setting is analyzed in the next section.

Let us begin by noting that when L = 1, i.e., in the case of a single kernel, the problem is solved easily. Indeed, we have from (1.3) that f(s) = \bar g_1(s) \sum_{j=1}^{K} u_{1,j} \exp(\iota 2\pi s t_{1,j}). Clearly, one can exactly recover the t_{1,j}'s and u_{1,j}'s via the MP method, by first obtaining the samples of f on the grid s_0 − m, …, s_0 + m − 1, and then working with the deconvolved samples f(s)/\bar g_1(s).

The situation for the case L ≥ 2 is however more delicate. Before proceeding, we need to make some definitions and assumptions.

• We will denote u_max = max_{l,j} |u_{l,j}| and u_min = min_{l,j} |u_{l,j}|.

• The sources in the l-th group are assumed to have a minimum separation of

 \triangle_l := \min_{i \neq j} d_w(t_{l,i}, t_{l,j}) > 0.
• Denoting \alpha_{l,j} = \exp(\iota 2\pi t_{l,j}), V_l will denote the Vandermonde matrix

 V_l = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ \alpha_{l,1} & \alpha_{l,2} & \cdots & \alpha_{l,K} \\ \vdots & \vdots & & \vdots \\ \alpha_{l,1}^{m_l-1} & \alpha_{l,2}^{m_l-1} & \cdots & \alpha_{l,K}^{m_l-1} \end{bmatrix}

for each l = 1, …, L, analogous to (2.4). σ_{l,max}, σ_{l,min} will denote its largest and smallest non-zero singular values, and κ_l = σ_{l,max}/σ_{l,min} its condition number. Recall from Theorem 8 that if m_l is sufficiently large compared to 1/△_l, then σ²_{l,min} and σ²_{l,max} are both of order m_l; in particular, κ_l is bounded by an absolute constant.
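The role of the separation △_l can be illustrated numerically (a sketch with hypothetical, equispaced nodes): once m_l is well above 1/△_l, the singular values of V_l concentrate around √m_l, so κ_l = O(1):

```python
import numpy as np

def vandermonde(t, m):
    """m x K Vandermonde matrix with nodes alpha_j = exp(2*pi*1j*t_j), as in (2.4)."""
    return np.exp(2j * np.pi * np.outer(np.arange(m), np.asarray(t)))

t = [0.0, 0.25, 0.5, 0.75]      # minimum wrap-around separation 0.25, so 1/sep = 4
for m in (16, 64, 256):         # m well above 1/separation
    sv = np.linalg.svd(vandermonde(t, m), compute_uv=False)
    # singular values concentrate around sqrt(m): a well-conditioned matrix
    assert 0.5 * m < sv[-1] ** 2 <= sv[0] ** 2 < 2.0 * m
```

For these equispaced nodes and m divisible by 4, the columns are in fact exactly orthogonal, so all squared singular values equal m; for generic well-separated nodes, they merely concentrate around m.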

### 3.1 The case of two kernels

We first consider the case of two Gaussian kernels, as the analysis here is relatively easier to digest compared to the general case. Note that f is now of the form

 f(s) = \bar g_1(s) \left( \sum_{j=1}^{K} u_{1,j} \exp(\iota 2\pi s t_{1,j}) \right) + \bar g_2(s) \left( \sum_{j=1}^{K} u_{2,j} \exp(\iota 2\pi s t_{2,j}) \right),

where we recall that \bar g_l(s) = μ_l \sqrt{2\pi} e^{-2\pi^2 \mu_l^2 s^2}; l = 1, 2. The following theorem provides sufficient conditions on the choice of the sampling parameters for approximate recovery of the source parameters of each of the two groups.

###### Theorem 4.

Let m_2, s_2 be chosen suitably, with s_2 ≤ M_{2,up} for a bound M_{2,up} depending on the problem parameters, and let ε_2 ∈ (0,1). For the constant C as in Theorem 3, let ε_1 also satisfy

 e^{2\pi^2 M_{2,\mathrm{up}}^2 (\mu_2^2 - \mu_1^2)} \left( 2\pi u_{\max} M_{2,\mathrm{up}} + \bar C_1 + \bar C_2 \log^{1/2}\!\left( \frac{\bar C_3}{\varepsilon_1} \right) \right) \varepsilon_1 \le \frac{\varepsilon_2 \, u_{\min} \mu_2}{5 C \mu_1 K^{3/2}} \left( 1 + 48 \frac{u_{\max}}{u_{\min}} \right)^{-1},  (3.1)

where \bar C_1, \bar C_2, \bar C_3 are constants (see (3.12), (3.13)) depending only on the problem parameters (see (3.2)), and c ∈ (0,1) is a constant. Say m_1, s_1 are chosen to satisfy

 \frac{2}{\triangle_1 - 2\varepsilon_1} + 1 \le m_1 < \frac{2}{\triangle_1 (1-c)} + 1 \; \left(= M_{1,\mathrm{up}}\right), \qquad S_1 \le s_1 \le \tilde c \, S_1,

where \tilde c > 1 is a constant and S_1 depends on the problem parameters. Then, there exist permutations ϕ_1, ϕ_2: [K] → [K] such that, for each j ∈ [K],

 d_w(\hat t_{1,\phi_1(j)}, t_{1,j}) \le \varepsilon_1, \qquad |\hat u_{1,\phi_1(j)} - u_{1,j}| <