# Stability of partial Fourier matrices with clustered nodes

We prove sharp lower bounds for the smallest singular value of a partial Fourier matrix with arbitrary "off the grid" nodes (equivalently, a rectangular Vandermonde matrix with the nodes on the unit circle), in the case when some of the nodes are separated by less than the inverse bandwidth. The bound is polynomial in the reciprocal of the so-called "super-resolution factor", while the exponent is controlled by the maximal number of nodes which are clustered together. This generalizes previously known results for the extreme cases when all of the nodes either form a single cluster, or are completely separated. We briefly discuss possible implications for the theory and practice of super-resolution under sparsity constraints.


## 1 Introduction

### 1.1 Problem definition

Consider the matrix

 G(x, Ω) := [ sin(Ω(t_i − t_j)) / (Ω(t_i − t_j)) ]_{1≤i,j≤s}, (1)

where x = (t_1, …, t_s) is a vector of distinct nodes with t_j ∈ (−π/2, π/2], and Ω is the normalized bandwidth. The scaling of the smallest eigenvalue λ_min = λ_min(G)¹ is of interest in applied harmonic analysis and in particular the theory of super-resolution, where this quantity controls the worst-case stability of recovering an atomic measure from bandlimited data (see sub:discussion below). Since

¹It is well-known that G is positive-definite – for instance, because the sinc function sin(t)/t is a positive-definite function.

 sin(Ωt)/(Ωt) = (1/(2Ω)) ∫_{−Ω}^{Ω} exp(ıωt) dω = lim_{N→∞} (1/(2N)) Σ_{k=−N}^{N} exp(ı(k/N)Ωt),

we see that G is the limit as N → ∞ of the matrix

 G_N := (1/(2N)) [ D_N(Ω(t_i − t_j)/N) ]_{i,j},

where D_N is the Dirichlet (periodic sinc) kernel

 D_N(ξ) := Σ_{k=−N}^{N} exp(ıkξ) = sin((N + 1/2)ξ) / sin(ξ/2). (2)

For each N ∈ ℕ, let

 V_N(x, Ω) := (1/√(2N)) [ exp(ık t_j Ω/N) ]_{k=−N,…,N; j=1,…,s} (3)

be the rectangular Vandermonde matrix with complex nodes ξ_j = exp(ı t_j Ω/N), where |ξ_j| = 1. Clearly G_N = V_N^H V_N, and so λ_min(G_N) = σ²_min(V_N).
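The relation λ_min(G_N) = σ²_min(V_N), and the convergence of G_N to the sinc-kernel matrix G, are easy to check numerically. The following sketch (Python with NumPy; the node values and bandwidth are illustrative choices of ours, not taken from the paper) builds both matrices:

```python
import numpy as np

def sinc_kernel_matrix(t, Omega):
    """G(x, Omega)_{ij} = sin(Omega(t_i - t_j)) / (Omega(t_i - t_j)), with 1 on the diagonal."""
    d = Omega * np.subtract.outer(t, t)
    return np.sinc(d / np.pi)            # np.sinc(u) = sin(pi u)/(pi u)

def partial_fourier(t, Omega, N):
    """V_N(x, Omega) of (3): rows indexed by k = -N..N, columns by j = 1..s."""
    k = np.arange(-N, N + 1)
    return np.exp(1j * np.outer(k, t) * Omega / N) / np.sqrt(2 * N)

t = np.array([0.0, 0.05, 0.1, 1.0])      # three clustered nodes plus one far node
Omega, N = 10.0, 4000
V = partial_fourier(t, Omega, N)
G = sinc_kernel_matrix(t, Omega)

lam_G = np.linalg.eigvalsh(G)[0]                  # lambda_min(G), eigvalsh sorts ascending
sig_V = np.linalg.svd(V, compute_uv=False)[-1]    # sigma_min(V_N), svd sorts descending
```

For large N, sig_V**2 approximates lam_G, since V_N^H V_N = G_N → G entrywise.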

The question of lower bounds for σ_min(V_N) (or, equivalently, for λ_min(G)) has received much attention in the literature; see e.g. [3, 7, 19, 20, 15, 18, 5, 11].

For t ∈ ℝ, we denote

 ∥t∥_T := |Arg exp(ıt)| = |t mod (−π, π]|,

where Arg z is the principal value of the argument of z, taking values in (−π, π].

Given x as above, we define the minimal separation (in the wrap-around sense) as

 Δ = Δ(x) := min_{i≠j} ∥t_i − t_j∥_T.
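The wrap-around distance and the minimal separation are one-liners in code. A minimal sketch (Python/NumPy; the helper names are ours):

```python
import numpy as np

def wrap_dist(t):
    """|| t ||_T = |Arg exp(i t)|: distance from t to the nearest multiple of 2*pi."""
    return np.abs(np.angle(np.exp(1j * np.asarray(t))))

def min_separation(x):
    """Delta(x) = min over i != j of || t_i - t_j ||_T."""
    x = np.asarray(x)
    d = wrap_dist(np.subtract.outer(x, x))
    return d[~np.eye(len(x), dtype=bool)].min()
```

For example, the nodes 0 and 2π − 0.05 are only 0.05 apart in the wrap-around sense, even though they are far apart on the real line.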

It is well-known that there are two very different scaling regimes for λ_min(G), depending on the quantity

 SRF := 1/(ΔΩ),

which is frequently called the “super-resolution factor” (see sub:discussion below).

If SRF < 1 and s is fixed, the matrix G is well-conditioned, and in fact it can be shown that in this case

 λ_min ≈ 1 − SRF. (4)

The case SRF ≫ 1 is somewhat more relevant to super-resolution applications; however, all known results provide sharp bounds only in the particular case when all the nodes are clustered together, or approximately equispaced. In this setting we have the fast decay

 λ_min(G) ≈ (ΩΔ)^{2(s−1)}, ΩΔ ≪ 1. (5)

For details on (4) and (5) see sec:known-bounds below.
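The decay rate in (5) can be observed directly. The following sketch (Python/NumPy; the particular values of s, Ω and Δ are illustrative choices of ours) estimates the empirical decay exponent of λ_min(G) for a single equispaced cluster by halving Δ once:

```python
import numpy as np

def lam_min_sinc(t, Omega):
    """lambda_min of G(x, Omega) = [sinc(Omega(t_i - t_j))]."""
    d = Omega * np.subtract.outer(t, t)
    return np.linalg.eigvalsh(np.sinc(d / np.pi))[0]

s, Omega = 3, 1.0
lams = []
for Delta in (0.2, 0.1):
    t = Delta * np.arange(s)          # a single equispaced cluster of s nodes
    lams.append(lam_min_sinc(t, Omega))

# empirical decay exponent of lambda_min as a function of Omega*Delta
slope = np.log2(lams[0] / lams[1])    # expected to be close to 2(s-1) = 4
```

The computed slope should be close to 2(s − 1), in agreement with (5).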

### 1.2 Main results

It turns out that the bound (5) is too pessimistic if only some of the nodes are known to be clustered. Consider for instance a configuration in which only ℓ < s of the nodes form a cluster while the remaining ones are well separated; then, as can be seen in fig:sigma.min.first.simulation, we have in fact λ_min ≈ (ΩΔ)^{2(ℓ−1)}, decaying much slower than (ΩΔ)^{2(s−1)} – which would be the bound given by (5).
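This gap is easy to see numerically. A small sketch (Python/NumPy; the specific nodes, bandwidth, and spacings are our own illustrative choices) estimates the decay exponent for a configuration with a cluster of 2 nodes plus two well-separated nodes (s = 4):

```python
import numpy as np

def lam_min_sinc(t, Omega):
    d = Omega * np.subtract.outer(t, t)
    return np.linalg.eigvalsh(np.sinc(d / np.pi))[0]

Omega = 10.0
lams = []
for Delta in (0.02, 0.01):
    # cluster of ell = 2 nodes at distance Delta, plus two far nodes (s = 4)
    t = np.array([0.0, Delta, 1.0, 2.0])
    lams.append(lam_min_sinc(t, Omega))

slope = np.log2(lams[0] / lams[1])    # expected close to 2(ell-1) = 2, far below 2(s-1) = 6
```

The observed exponent is governed by the cluster size ℓ, not by the total number of nodes s.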

In this paper we bridge this theoretical gap. We consider the partially clustered regime where at most ℓ neighboring nodes can form a cluster (there can be several such clusters), with two additional parameters controlling the distance between the clusters and the uniformity of the distribution of nodes within the clusters.

###### Definition 1.1.

The node vector x is said to form a (τ, ρ, ℓ)-clustered configuration for some τ ≥ 1, ρ > 0 and 1 ≤ ℓ ≤ s, if for each t_j ∈ x there exist at most ℓ distinct nodes

 x^{(j)} = {t_{j,k}}_{k=1,…,r_j} ⊂ x, 1 ≤ r_j ≤ ℓ, t_{j,1} ≡ t_j,

such that the following conditions are satisfied:

1. For any y ∈ x^{(j)} ∖ {t_j}, we have

 Δ ≤ ∥y − t_j∥_T ≤ τΔ.

2. For any y ∈ x ∖ x^{(j)}, we have

 ∥y − t_j∥_T ≥ ρ.

Our main result is the following generalization of (5) for clustered configurations.

###### Theorem 1.1.

There exists a constant C_main = C_main(s) > 0 such that for any ℓ ≤ s, any x forming a (τ, ρ, ℓ)-clustered configuration, and any Ω satisfying

 4πs/ρ ≤ Ω ≤ πs/(τΔ), (6)

we have

 σ_min(V_N(x, Ω)) ≥ C_main · (ΔΩ)^{ℓ−1}, whenever N > 2s³·⌈Ω/(4s)⌉; (7)

 λ_min(G(x, Ω)) ≥ C_main² · (ΔΩ)^{2(ℓ−1)}. (8)
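The bound (8) can be evaluated on a concrete example. In the sketch below (Python/NumPy), the configuration, the parameter values, and the explicit form of the constant — which we read off the proof in Section 3.3 as C_main = 1/(2(2π)^{s−1}s^{2s−1}) — are our own reconstruction, so treat them as assumptions rather than the paper's verbatim data:

```python
import numpy as np

# A sample configuration that is (tau, rho, ell) = (1, 1, 2)-clustered with s = 4:
# one pair of nodes at distance Delta, all other pairwise distances at least rho = 1.
s, ell, tau, rho = 4, 2, 1.0, 1.0
Delta, Omega = 0.01, 60.0     # chosen so that (6) holds: 4*pi*s/rho <= Omega <= pi*s/(tau*Delta)
t = np.array([0.0, Delta, 1.5, 3.0])

d = Omega * np.subtract.outer(t, t)
lam_min = np.linalg.eigvalsh(np.sinc(d / np.pi))[0]   # lambda_min(G(x, Omega))

# hypothetical explicit constant, as reconstructed from the proof of the theorem
C_main = 1.0 / (2 * (2 * np.pi) ** (s - 1) * s ** (2 * s - 1))
bound = C_main ** 2 * (Delta * Omega) ** (2 * (ell - 1))
```

The bound is far from tight for such small s (the constant is tiny), but it is dimensionally correct: both sides scale like (ΔΩ)^{2(ℓ−1)}.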

The proof of thm:main-theorem is presented in sub:theproof below. It is based on the “decimation” technique, previously used in the context of super-resolution in [1, 2, 4, 5, 6] and references therein.

###### Remark 1.1.

The same node vector x can be regarded as a clustered configuration with different choices of the parameters (τ, ρ, ℓ). For example, the vector from the beginning of this section (see also fig:sigma.min.first.simulation) admits more than one such representation. To obtain as tight a bound as possible, one should choose the minimal ℓ for which the condition (6) is satisfied with Ω in the range of interest. For instance, a small ℓ might be inadmissible if Ω is small enough; however, by choosing a larger ℓ one is able to increase Ω without bound. See fig:breakdown for a numerical example.

###### Remark 1.2.

The constant C_main is given explicitly in (30); it decays in s roughly like s^{−2s}. We do not know whether this rate can be substantially improved; however, it is plausible that the best possible bound would scale like c^{−s} for some absolute constant c > 1.

For the case of finite N, one might be interested to consider the rectangular Vandermonde matrix without any reference to Ω, i.e.

 V_N(ξ) := (1/√(2N)) [ exp(ık ξ_j) ]_{k=−N,…,N; j=1,…,s} (9)

for some node vector ξ = (ξ_1, …, ξ_s). Our next result is the analogue of (7) in this setting, albeit under an extra assumption that the nodes are restricted to the interval (−π/(2s²), π/(2s²)].

###### Corollary 1.1.

There exists a constant C_vand = C_vand(s) > 0 such that for any ℓ ≤ s, any ξ ⊂ (−π/(2s²), π/(2s²)] forming a (τ, ρ, ℓ)-clustered configuration, and any N satisfying

 max(4πs/ρ, 4s³) ≤ N ≤ πs/(τΔ), (10)

we have

 σ_min(V_N(ξ)) ≥ C_vand · (NΔ)^{ℓ−1}. (11)
###### Proof.

Let us choose Ω̃ := N/s², so that for all j = 1, …, s we have

 t̃_j := Nξ_j/Ω̃ = s²ξ_j ∈ (−π/2, π/2].

Further define x̃ := (t̃_1, …, t̃_s), Δ̃ := s²Δ and ρ̃ := s²ρ. We immediately obtain that the vector x̃ forms a (τ, ρ̃, ℓ)-clustered configuration according to def:partial-cluster, and the rectangular Vandermonde matrix in (9) is precisely V_N(x̃, Ω̃). Clearly Δ̃Ω̃ = NΔ, and also

 Ω̃s² = N ≥ 4s³ ⟹ Ω̃/(4s) ≥ 1 ⟹ 2·(Ω̃/(4s)) > ⌈Ω̃/(4s)⌉ ⟹ N = Ω̃s² > 2s³·⌈Ω̃/(4s)⌉. (12)

Using (10), we obtain precisely the conditions (6) with (Ω̃, Δ̃, ρ̃) in place of (Ω, Δ, ρ) respectively. Therefore the conditions of thm:main-theorem are satisfied for V_N(x̃, Ω̃), and so (11) follows immediately from (12) and (7), with C_vand = C_main. ∎

Returning to thm:main-theorem, it turns out that the bound (8) is asymptotically optimal.

###### Theorem 1.2.

There exists an absolute constant η > 0 and a constant C_up = C_up(s) such that for any ℓ ≤ s and any (Δ, Ω) satisfying ΔΩ < η, there exists a (τ, ρ, ℓ)-clustered configuration x_min with s nodes and certain (τ, ρ) depending only on s, for which

 λ_min(G(x_min, Ω)) ≤ C_up · (ΔΩ)^{2(ℓ−1)}, ΔΩ < η.

The proof of thm:optimality is presented in sub:optimality. Numerical experiments validating the above results are presented in sec:Numerical-evidence.

### 1.3 Related work and discussion

Our main result has direct implications for the problem of super-resolution under sparsity constraints. For simplicity suppose that the nodes must belong to a grid of step size Δ. As demonstrated in [11, 18] and several other works, the minimax error rate for recovery of sparse point measures from bandlimited and inexact measurements is directly proportional to the reciprocal of σ_min(V_N(x, Ω)), where x is any vector of s nodes on the grid. Moreover, it is established in those works that without any further constraints on the support of x, the bound (5) holds and it is the best possible.

It is fairly straightforward to extend the results of [18] and [11] to our setting: if the support of the measure is known to be partially clustered (as in def:partial-cluster), then the minimax error rate will satisfy

 minimax error ≍ SRF^{2ℓ−1} · ε, (13)

for any estimator of the measure (with ε the noise level, and the error measured in an appropriate norm), and it will be attained by the intractable sparse ℓ₀-minimization, with the additional restriction that the solutions should exhibit the appropriate clustered sparsity pattern instead of the unconstrained sparsity.

A different but closely related setting was considered in the seminal paper [12], where the measure was assumed to have an infinite number of spikes on a grid of size Δ, with one spike per unit of time on average, but whose local complexity was constrained: not more than R spikes may occupy any single interval of critical length. Here R is called the “Rayleigh index”, being the maximal number of spikes which can be clustered together (a related notion of Rayleigh regularity was introduced in [23]). It was shown in [12] that the minimax recovery rate for such measures essentially scales like (13) with ℓ replaced by R (the work [12] had a small gap in the exponents between the lower and upper bounds, which was later closed in [11] for the finite sparse case). Our partial cluster model can therefore be regarded as the finite-dimensional version of these “sparsely clumped” measures with finite Rayleigh index, showing the same scaling of the error – polynomial in SRF and exponential in the “local complexity” of the signal.

If the grid assumption is relaxed, then one might wish to measure the accuracy of recovery by comparing the locations of the recovered nodes with the true ones. In this case additional considerations are required to derive the minimax rate, and it is possible to do so under the partial clustering assumptions. See [2, 6] for details, where we prove (13) in this scenario, for a uniform bound ε on the noise. The extreme case of a single cluster has been treated recently in [4, 5].

In the case of well-separated spikes (i.e. clusters of size ℓ = 1), a recent line of work using convex optimization ([9, 8, 13, 10] and a great number of follow-up papers) has shown that the problem is stable and tractable.

Therefore, the partial clustering case is somewhat mid-way between the extremes ℓ = 1 and ℓ = s, and while our results in this paper (and also in [6]) show that it is much more stable than the unconstrained sparse case, it is an intriguing open question whether provably tractable solution algorithms exist.

Several candidate algorithms for sparse super-resolution are well-known – MUSIC, ESPRIT/matrix pencil, and variants; these have roots in parametric spectral estimation [27]. In recent years, the super-resolution properties of these algorithms have been a subject of ongoing interest, see e.g. [14, 19, 25] and references therein. The smallest singular values of the partial Fourier matrices V_N, for finite N, play a major role in these works, and therefore we hope that our results and techniques may be extended to analyze these algorithms as well.

## 2 Known bounds

### 2.1 Well-separated regime

Consider the well-separated case SRF < 1, and let V_N = V_N(x, Ω) be as defined in (3), i.e. a rectangular Vandermonde matrix with the nodes ξ_j = exp(ı t_j Ω/N) on the unit circle, so that G_N = V_N^H V_N.

Several more or less equivalent bounds on σ_min(V_N) are available in this case, using various results from analysis and number theory, such as the Ingham and Hilbert inequalities, large sieve inequalities and Selberg’s majorants [17, 20, 24, 3, 21, 22, 15, 7].

The tightest bound was obtained by Moitra in [20], where he showed that if the nodes have wrap-around separation Δ_N with Δ_N > 1/(N−1), then

 σ_min(√N · V_N) ≥ √(N − 1 − 1/Δ_N).

In our setting we have Δ_N = ΔΩ/N, and so as N → ∞ we obtain

 σ_min(V_N) ≥ √(1 − 1/N − 1/(NΔ_N)) → √(1 − 1/(ΩΔ)),

which is exactly (4).

### 2.2 Single clustered regime

Let us now assume SRF ≫ 1, i.e. ΩΔ ≪ 1 or, equivalently, Δ ≪ 1/Ω.

If all the nodes are equispaced, say t_j = jΔ for j = 1, …, s, then the matrix G is, up to scaling, the so-called prolate matrix, whose spectral properties are known exactly [28, 26]. Indeed, we have in this case

 G_{i,j} = sin(Ω(t_i − t_j)) / (Ω(t_i − t_j)) = sin(ΩΔ(i − j)) / (ΩΔ(i − j)) = (π/(ΩΔ)) · sin(2πW(i − j)) / (π(i − j)), W := ΩΔ/(2π),

and therefore G = (π/(ΩΔ)) · ρ(s, W), where ρ(s, W) is the matrix defined in [26, eq. (21)]. The smallest eigenvalue of ρ(s, W), denoted by λ_{s−1}(s, W) in the same paper, has exact asymptotics for W small, given in [26, eqs. (64, 65)]:

 λ_{s−1}(s, W) = C(s) · W^{2s−1} · (1 + O(W)), W ≪ 1, (14)

which gives

 λ_min(G) = C_slepian(s) · (ΩΔ)^{2s−2} · (1 + O(ΩΔ)), ΩΔ ≪ 1,

proving (5).

The same scaling was shown using Szegő’s theory of Toeplitz forms in [11] – see also sub:discussion. The authors showed that there exist N₀ ∈ ℕ and a constant C₁ > 0 such that for N ≥ N₀

 C₁ · (sin(2ΩΔ)/π)^{2s−2} ≤ λ_min(G) ≤ 16 · (sin(2ΩΔ)/π)^{2s−2}.

Essentially the same result was obtained in [18], where the authors considered partial discrete Fourier matrices

 Φ_{M,N,S} = [ exp(−2πımn/N) ]_{m,n},

obtained from the un-normalized Discrete Fourier Transform matrix of size N × N by taking the first M rows and an arbitrary set of S columns, with S ≤ M ≤ N. The authors showed that as M, N → ∞ with the ratio M/N fixed, we have the bound

 σ_min(Φ_{M,N,S}) ≈ √M · (M/N)^{S−1},

which is attained for the configuration of S consecutive columns. In our equispaced setting, it is easy to see that the matrix Φ_{M,N,S} for large N is, up to normalization, precisely V_N, with S = s consecutive columns and the ratio M/N proportional to ΩΔ. Therefore the above result reduces to

 σ_min(V_N) ≈ (ΔΩ)^{s−1},

which is the same as (5).

## 3 Proofs

### 3.1 Blowup

Here we introduce the uniform blowup λ·x of a node vector x by a positive parameter λ, and study the effect of such a blowup mapping on the minimal wrap-around distance between the mapped nodes.

###### Lemma 3.1.

Let x form a (τ, ρ, ℓ)-clustered configuration, and suppose that Ω satisfies (6). Then, for any ξ ∈ (0, 1) there exists a set A = A(ξ) ⊂ I := [Ω/(2s), Ω/s] of total measure at least ξ·Ω/(2s) such that for every λ ∈ A the following holds for every t_j ∈ x:

 ∥λy − λt_j∥_T ≥ λΔ ≥ ΔΩ/(2s), ∀y ∈ x^{(j)} ∖ {t_j}; (15)

 ∥λy − λt_j∥_T ≥ ((1 − ξ)/s²)·π, ∀y ∈ x ∖ x^{(j)}. (16)

Furthermore, the complement I ∖ A is a union of at most (s²/2)·⌈Ω/(4s)⌉ intervals.

###### Proof.

We begin with (15). Let y ∈ x^{(j)} ∖ {t_j}; then Δ ≤ ∥t_j − y∥_T ≤ τΔ, and since λτΔ ≤ (Ω/s)·τΔ ≤ π by (6), we immediately conclude that

 ∥λt_j − λy∥_T = λ∥t_j − y∥_T ≥ λΔ.

To show (16), let ν be the uniform probability measure on I = [Ω/(2s), Ω/s]. Let t_j ∈ x and y ∈ x ∖ x^{(j)} be fixed, and put δ := ∥t_j − y∥_T^{−1}. For λ ∈ I, let γ_{(t_j,y)} be the random variable on (I, ν) defined by

 γ_{(t_j,y)}(λ) := ∥λt_j − λy∥_T;

note that it is periodic in λ with period 2πδ. We now show that for any α ∈ (0, 1)

 ν{ γ_{(t_j,y)}(λ) ≤ απ } ≤ 2α. (17)

Since ∥t_j − y∥_T ≥ ρ, condition (6) gives 2πδ ≤ 2π/ρ ≤ Ω/(2s), so we can write Ω/(2s) = 2πδ(n + ζ), where n ≥ 1 is an integer and 0 ≤ ζ < 1. We break up the probability in (17) as follows:

 ν{γ ≤ απ} = Σ_{k=1}^{n} ν{ γ ≤ απ, λ − Ω/(2s) ∈ 2πδ·[k−1, k] } + ν{ γ ≤ απ, λ − Ω/(2s) ∈ 2πδ·[n, n+ζ] }. (18)

Now, consider the number exp(ıλ(t_j − y)). As λ − Ω/(2s) varies between 2πδ(k−1) and 2πδk, this number traverses the unit circle exactly once, and therefore the variable γ traverses the interval [0, π] exactly twice. Consequently,

 ν{ γ(λ) ≤ απ | λ − Ω/(2s) ∈ 2πδ·[k−1, k] } = 2απ/(2π) = α.

Similarly, when λ − Ω/(2s) varies between 2πδn and 2πδ(n + ζ), we have

 ν{ γ(λ) ≤ απ | λ − Ω/(2s) ∈ 2πδ·[n, n+ζ] } ≤ (2απ)/(2π)·(1/ζ) = α/ζ.

Overall,

 ν{γ(λ) ≤ απ} ≤ α·n/(n+ζ) + (α/ζ)·ζ/(n+ζ) = α·(n+1)/(n+ζ) ≤ 2α,

proving (17).

It is clear from the above that the set {λ ∈ I : γ_{(t_j,y)}(λ) ≤ απ} is a union of intervals, each of length 2απδ, repeating with the period 2πδ. Consequently this set is a union of at most n + 2 intervals. Since ∥t_j − y∥_T ≤ π we have 2πδ ≥ 2, and so the set is a union of at most ⌈Ω/(4s)⌉ + 2 intervals.

Now we put α₀ := (1 − ξ)/s² and apply (17) for every pair (t_j, y) where t_j ∈ x and y ∈ x ∖ x^{(j)}. By the union bound, we obtain

 ν{ ∃t_j ∃y ∈ x ∖ x^{(j)} : γ_{(t_j,y)}(λ) ≤ α₀π } ≤ Σ_{t_j,y} 2α₀ = 2·C(s,2)·(1 − ξ)/s² < 1 − ξ. (19)

Fixing A = A(ξ) as the complement of the above set within I, we have that A is of total measure greater than or equal to ξ·Ω/(2s), and for every λ ∈ A the estimate (16) holds. Clearly I ∖ A is a union of at most (s²/2)·⌈Ω/(4s)⌉ intervals. ∎
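The key estimate (17) — that the blown-up distance ∥λ(t_j − y)∥_T rarely falls below απ — can be checked empirically on a dense grid of λ values. A sketch (Python/NumPy; the values of Ω, s, the pair distance d, and α are illustrative choices of ours, with d ≥ ρ so that at least one full period fits in the interval):

```python
import numpy as np

# Fraction of lambda in I = [Omega/(2s), Omega/s] with ||lambda*(t_j - y)||_T <= alpha*pi.
Omega, s = 100.0, 3
d = 1.3                      # |t_j - y|, assumed >= rho
alpha = 0.1

lam = np.linspace(Omega / (2 * s), Omega / s, 200001)
gamma = np.abs(np.angle(np.exp(1j * lam * d)))   # wrap-around distance of lambda*d
frac = np.mean(gamma <= alpha * np.pi)           # should not exceed 2*alpha, by (17)
```

The empirical fraction stays below 2α, in line with the lemma.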

Fix ξ = 1/2 and consider the set A given by lem:blowup-lemma. Let us also fix a finite Ω and a positive integer N, and consider the set of equispaced points in [−Ω, Ω]:

 P_N := { kΩ/N }_{k=−N,…,N}.

###### Proposition 3.1.

If N > 2s³·⌈Ω/(4s)⌉, then P_N ∩ A ≠ ∅.

###### Proof.

By lem:blowup-lemma, the set B := I ∖ A consists of K ≤ (s²/2)·⌈Ω/(4s)⌉ intervals, and by (19) the total length of B is at most Ω/(4s). Denote the lengths of those intervals by d_1, …, d_K. The distance between neighboring points in P_N is Ω/N, and therefore each interval of B contains at most d_j·N/Ω + 1 points. Overall, B contains at most

 Σ_{j=1}^{K} (d_j·N/Ω + 1) ≤ (Ω/(4s))·(N/Ω) + K = N/(4s) + K

points from P_N, and since the total number of points of P_N in I is at least N/(2s), we have

 |P_N ∩ A| ≥ N/(2s) − N/(4s) − K ≥ N/(4s) − (s²/2)·⌈Ω/(4s)⌉ > 0. ∎

### 3.2 Square Vandermonde matrices

Let ξ = (ξ_1, …, ξ_s) be a vector of pairwise distinct complex numbers. Consider the square Vandermonde matrix

 V(ξ) := [ ξ_j^{k} ]_{k=0,…,s−1; j=1,…,s}, (20)

whose rows are (1, …, 1), (ξ_1, …, ξ_s), …, (ξ_1^{s−1}, …, ξ_s^{s−1}).
###### Theorem 3.1 (Gautschi, [16]).

For a matrix A = (a_{i,j}) ∈ ℂ^{m×n}, let ∥A∥_∞ denote the induced matrix norm

 ∥A∥_∞ := max_{1≤i≤m} Σ_{1≤j≤n} |a_{i,j}|.

Then we have

 ∥V^{−1}(ξ)∥_∞ ≤ max_{1≤i≤s} ∏_{j≠i} (1 + |ξ_j|) / |ξ_j − ξ_i|. (21)
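Gautschi's bound (21) is easy to verify numerically. A minimal sketch (Python/NumPy; the node values are arbitrary distinct complex numbers of our choosing):

```python
import numpy as np

def gautschi_bound(xi):
    """Right-hand side of (21)."""
    xi = np.asarray(xi, dtype=complex)
    s = len(xi)
    best = 0.0
    for i in range(s):
        prod = 1.0
        for j in range(s):
            if j != i:
                prod *= (1 + abs(xi[j])) / abs(xi[j] - xi[i])
        best = max(best, prod)
    return best

xi = np.array([0.5, -1.0, 2.0, 1.0j])
V = np.vander(xi, increasing=True).T                      # V[k, j] = xi_j^k, k = 0..s-1
inv_norm = np.abs(np.linalg.inv(V)).sum(axis=1).max()     # induced infinity-norm of V^{-1}
```

For the classical example ξ = (1, −1) the bound is attained with equality; for generic nodes it is an over-estimate.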
###### Proposition 3.2.

Suppose that ξ = (ξ_1, …, ξ_s) is a vector of pairwise distinct complex numbers with |ξ_j| = 1, j = 1, …, s, and let r ∈ ℤ be arbitrary. Let

 V(ξ, r) := [ ξ_j^{k} ]_{k=r,…,r+s−1; j=1,…,s}. (22)

For j ≠ k, denote by δ_{j,k} the angular distance between ξ_j and ξ_k:

 δ_{j,k} := |Arg(ξ_j/ξ_k)|.

Then

 σ_min(V(ξ, r)) ≥ (π^{1−s}/√s) · min_{1≤j≤s} ∏_{k≠j} δ_{j,k}. (23)
###### Proof.

Clearly, the matrix V(ξ, r) can be factorized as

 V(ξ, r) = V(ξ, 0) × diag{ξ_1^r, …, ξ_s^r}.

Since V(ξ, 0) = V(ξ) as in (20), and multiplying by the unimodular diagonal factor does not change the ∞-norm of the inverse, using (21) we immediately have

 ∥V^{−1}(ξ, r)∥_∞ ≤ 2^{s−1} · max_{1≤j≤s} ∏_{k≠j} |ξ_j − ξ_k|^{−1}. (24)

For any |θ| ≤ π/2 we have

 (2/π)·|θ| ≤ sin|θ| ≤ |θ|,

and since for any j ≠ k

 |ξ_j − ξ_k| = |1 − ξ_j/ξ_k| = 2·sin|(1/2)·Arg(ξ_j/ξ_k)| = 2·sin|δ_{j,k}/2|,

we therefore obtain

 (2/π)·δ_{j,k} ≤ |ξ_j − ξ_k| ≤ δ_{j,k}. (25)

Plugging (25) into (24) we have

 σ_max(V^{−1}(ξ, r)) ≤ √s·∥V^{−1}(ξ, r)∥_∞ ≤ √s·π^{s−1}·max_{1≤j≤s} ∏_{k≠j} δ_{j,k}^{−1},

which is precisely (23). ∎
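Proposition 3.2 can also be checked numerically. The sketch below (Python/NumPy; the angles and the starting power r are illustrative choices of ours) compares σ_min of a shifted square Vandermonde matrix with the right-hand side of (23):

```python
import numpy as np

def sigma_min_bound(theta):
    """Right-hand side of (23) for unimodular nodes xi_j = exp(i theta_j)."""
    s = len(theta)
    # angular distances delta_{j,k} via the wrap-around metric
    delta = np.abs(np.angle(np.exp(1j * np.subtract.outer(theta, theta))))
    prods = [np.prod(np.delete(delta[j], j)) for j in range(s)]
    return np.pi ** (1 - s) / np.sqrt(s) * min(prods)

theta = np.array([0.0, 0.7, 1.9, 3.1])    # angles of the unimodular nodes
xi = np.exp(1j * theta)
r = 5                                      # arbitrary starting power
k = np.arange(r, r + len(xi))
V = xi[None, :] ** k[:, None]              # V[k, j] = xi_j^k, k = r..r+s-1
sig_min = np.linalg.svd(V, compute_uv=False)[-1]
```

As expected, σ_min is well above the (rather crude) lower bound.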

### 3.3 Proof of thm:main-theorem

We shall bound σ_min(V_N), with V_N = V_N(x, Ω) defined as in (3), for sufficiently large N. For any subset R ⊂ {−N, …, N}, let V_{N,R} be the submatrix of V_N containing only the rows in R. By the Rayleigh characterization of singular values, it is immediately obvious that if {R_n}_{n=1}^{P} is any partition of the rows of V_N then

 σ²_min(V_N) ≥ Σ_{n=1}^{P} σ²_min(V_{N,R_n}). (26)
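Inequality (26) holds for any matrix and any row partition, since ∥Vv∥² = Σ_n ∥V_{R_n}v∥² for every vector v, and minimizing each summand separately can only decrease the sum. A quick sanity check on a random matrix (Python/NumPy; square blocks are used so that the reported smallest singular value equals min_{∥v∥=1}∥Bv∥):

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.standard_normal((12, 3)) + 1j * rng.standard_normal((12, 3))

sig_full = np.linalg.svd(V, compute_uv=False)[-1]
# partition the 12 rows into four square 3 x 3 blocks
blocks = [V[i:i + 3] for i in range(0, 12, 3)]
sig_blocks = [np.linalg.svd(B, compute_uv=False)[-1] for B in blocks]

lhs = sig_full ** 2
rhs = sum(sg ** 2 for sg in sig_blocks)   # never exceeds lhs, by (26)
```

The decimation argument below applies (26) to a carefully chosen interleaving partition, for which each block is a well-conditioned square Vandermonde matrix.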

Let A be the set from lem:blowup-lemma with ξ = 1/2. By prop:finite-n-blowup we have that, for all N > 2s³·⌈Ω/(4s)⌉, A will contain a rational multiple of Ω of the form λ_N = mΩ/N for some positive integer m.

Consider the ”new” nodes

 u_{j,N} := t_j·(Ω/N)·m = t_j·(Ω/N)·(λ_N·N/Ω) = λ_N·t_j, j = 1, …, s. (27)

Since λ_N ∈ A, we conclude by lem:blowup-lemma that for every t_j ∈ x

 ∥u_{j,N} − u_{k,N}∥_T ≥ (1/(2s))·(ΔΩ), ∀t_k ∈ x^{(j)} ∖ {t_j}; (28)

 ∥u_{j,N} − u_{k,N}∥_T ≥ π/(2s²), ∀t_k ∈ x ∖ x^{(j)}. (29)

Since λ_N ≤ Ω/s, it follows that m = λ_N·N/Ω ≤ N/s. Now consider the particular interleaving partition of the rows into 2m − 1 blocks of s rows each, with consecutive rows within each block separated by m rows (some rows might be left out):

 R_0 = {0, m, …, (s−1)m},
 R_1 = {1, m+1, …, (s−1)m+1},
 R_{−1} = {−1, −m−1, …, −(s−1)m−1},
 …
 R_{m−1} = {m−1, 2m−1, …, sm−1},
 R_{−m+1} = {−m+1, −2m+1, …, −sm+1}.

For |n| ≤ m − 1, each V_{N,R_n} is a square Vandermonde-type matrix as in (22),

 V_{N,R_n} = (1/√(2N))·V(ξ, n),

with node vector

 ξ = { e^{ı u_{j,N}} }_{j=1}^{s},

where the u_{j,N} are given by (27). We apply prop:vand-sing-estimate with the crude bound obtained from (28) and (29) above:

 min_{1≤j≤s} ∏_{k≠j} δ_{j,k} ≥ (1/(2^{s−1}·s^{2s−2}))·(ΔΩ)^{ℓ−1},

and obtain

 σ_min(V_{N,R_n}) ≥ (C_1(s)/√(2N))·(ΔΩ)^{ℓ−1}, C_1(s) := 1/((2π)^{s−1}·s^{2s−2}·√s).

Now we use (26) to aggregate the bounds over the 2m − 1 square matrices V_{N,R_n} and obtain

 λ_min(V_N^H·V_N) = σ²_min(V_N) ≥ ((2m−1)·C_1²/(2N))·(ΔΩ)^{2(ℓ−1)}.

Since m = λ_N·N/Ω and since by assumption λ_N ≥ Ω/(2s), we have m ≥ N/(2s), and so

 σ²_min(V_N) ≥ (C_1²/(4s))·(ΔΩ)^{2(ℓ−1)}.

This proves (7) and (8) with

 C_main := C_1(s)/(2√s) = 1/(2·(2π)^{s−1}·s^{2s−1}). (30) ∎