Blind super-resolution is the problem of estimating high-resolution information of a signal from its low-resolution measurements when the point spread functions (PSFs) are unknown. Such problem arises in a wide variety of applications, including seismic data analysis, nuclear magnetic resonance spectroscopy , multi-user communication system , and 3D single-molecule microscopy . In particular, when the knowledge of PSFs is available, blind super-resolution reduces to the super-resolution problem [5, 6].
Without any additional assumptions, blind super-resolution of point sources is an ill-posed problem. To alleviate this issue, it is common to assume that the PSFs belong to a known low-dimensional subspace. Under this assumption and utilizing the lift technique, blind super-resolution of point sources can be formulated as a matrix recovery problem. By exploiting low dimensional structures of the target matrix, a series of works [12, 29, 17, 9, 26] theoretically studied under which conditions the target matrix can be recovered. The author in  considered the setting where the PSF is shared among all point sources, and established the recovery guarantees for the atomic norm minimization method. Yang et al.  further studied the same method, but with multiple unknown PSFs. Li et al.  provided robust analysis of blind 1D super-resolution and later the work in  generalized  to 2D case. Recently, Chen et al.  proposed a nuclear norm minimization method based on the vectorized Hankel lift, and established the corresponding exact recovery guarantees. While strong theoretical guarantees have been built for blind super-resolution based on convex methods, these approaches are computational inefficient for the high dimensional setting. Therefore it is necessary to design efficient and provable algorithms to deal with the large-scale regime.
In the past few years, substantial progress has been made on designing and analyzing provable fast algorithms for applications from science and engineering via non-convex optimization [16, 11], including matrix completion [10, 33], phase retrieval , blind deconvolution , spectrally sparse signal recovery [3, 4] and so on. Inspired by [33, 3], we propose an efficient algorithm for blind super-resolution of point sources based on the Vectorized Hankel Lift framework [30, 32, 9]. More precisely, we parameter the vectorized Hankel matrix corresponding to a candidate solution in terms of the Burer-Monteiro factorization and develop a projected gradient descent method to directly recover the low rank factors. Different from , this paper utilizes the low rank structure of the vectorized Hankel matrix associated with the target matrix while the previous work studied the low rank Hankel matrix recovery problem in the context of spectrally sparse recovery. Moreover, the sensing operator in this work differs substantially from that in . Therefore, the theoretical guarantee for spectrally sparse recovery in  is not applicable for the blind super-resolution setting.
The main contributions are two folds. Firstly, we present a new non-convex algorithm called projected gradient descent via vectorized Hankel lift (PGD-VHL) for blind super-resolution. Numerical experiments show that PGD-VHL is competitive with convex recovery methods (VHL and ANM) in terms of recovery ability, but is much more efficient. Secondly, we establish the recovery performance of PGD-VHL. Our results show that PGD-VHL started from a spectral initialization converges linearly to the target matrix under the similar sample complexity as convex approaches.
The rest of this paper is organized as follows. Section 2 gives the problem setup of blind super-resolution of point sources. Section 3 presents the PGD-VHL algorithm whose the exact recovery guarantee is provided in Section 4. Numerical evaluations are presented to illustrate the performance of PGD-VHL in Section 5. All proofs are deferred to Section 6. Finally, we conclude this paper and propose some future work in Section 7.
Some notations used throughout this paper are presented as follows. Symbols for vectors, matrices and operators are in bold lowercase letters, bold uppercase letters and calligraphic letters, respectively. For a complex number , its real part is denoted by . The transpose, complex conjugate, complex transpose, spectral norm and Frobenius norm of matrix are denoted as , , , and , respectively. The inner product of two matrices and is defined as . We use to denote the -th entry of and to denote the th row of . Moreover, we use the MATLAB notation to denote a vector of length , with entries . The identity operator are denoted as . Let be the vectorized Hankel lift operator which maps a matrix into an matrix,
where is the -th column of and . We denote the adjoint of by , which is a linear mapping from to . In particular, for any matrix , the -th column of is given by
where is the -th block of such that . Letting , we have
where the scale is defined as
Moreover, we define . The adjoint of denoted is given by . Additionally, and satisfy
We use to denote the matrix defined by
Then one has
where denotes the Kronecker product between and .
Throughout this paper, denote absolute positive numerical constants whose values may vary from line to line. The notation means that there exists an absolute constant such that .
2 Problem formulation
The point source signal model can be represented as a superposition of spikes
where is the Dirac function, and are the amplitude and location of the -th point source, respectively. Let be the unknown point spread functions depending on the locations of point sources. The observation is a convolution between and , that is,
After taking the Fourier transform and sampling, we obtain the measurements as
Let be a vector corresponding to the -th unknown point spread function. The goal is to estimate as well as from (2.3).
Obviously, the problem of blind super-resolution is ill-posed without any additional assumptions, because the number of unknowns in (2.3) is , which is larger than the number of samples . To tackle this issue, we follow the same route as that in [1, 12, 29, 9] and assume that all the Fourier samples of the unknown PSFs belong to a known low-dimensional subspace spanned by the columns of with , i.e.,
where denotes the unknown directional vector of in the subspace. According to the subspace assumption (2.4) and using the lift trick , we can easily rewrite (2.3) as a set of linear measurements with respect to the target matrix ,
where , is the th column vector of , is the -th standard basis of . The measurement model (2.5) can be rewritten succinctly as
where is the linear operator. Let be the adjoint operator of which is given by . Furthermore, define . We have for any . The measurements can be reformulated as
Note that once the data matrix is reconstructed, the frequencies can be retrieved from it through spatial smoothing MUSIC [14, 13, 31, 9], and the amplitudes and coefficients can be estimated by solving an over-determined linear system. Therefore in this work we focus on the problem of recovering from its linear measurements (2.6).
It has been shown that is a rank- matrix  and thus the matrix admits low rank structure when . Equipped with the low rank structure of , it is natural to recover by solving the constrained least squares problem
Letting for any , it can be verified that . To eliminate the rank constrain in (2.8), we apply the Burer-Monteiro factorization  to parameterize as , where and are two rank- matrices. Therefore, the optimization problem (2.8) can be rewritten as
Before introducing our algorithm, we make an assumption that is -incoherent which is defined below.
Let be the singular value decomposition of
be the singular value decomposition of, where and . Denote , where is the -th block of for . The matrix is -incoherent if and obey that
for some positive constant .
Let and be two numerical constants such and , and be a convex set defined as follows
where is the -th block of . Define
Since is -incoherent, we have . Therefore, we consider a penalized version of (2.9) for recovering the factorized matrices:
3 Algorithm: projected gradient descent
The initialization involves two steps: (1) computes the best rank approximation of via one step hard thresholding , where is the adjoint of and is the best rank approximation of ; (2) projects the low rank factors of best rank- approximated matrix onto the convex feasible set . Given a matrix , the projection onto , denoted by , has a closed form solution:
for . Let be the current estimator. The algorithm updates along gradient descent direction with step size , followed by projection onto the set . The gradient of is computed with respect to Wirtinger calculus given by where
Indeed PGD-VHL algorithm can be efficiently implemented. To obtain the computational cost of , we first introduce some notations. Let be the Hankel operator which maps a vector into an matrix,
where is the -th entry of . The adjoint of , denoted by , is a linear mapping from to . It can be seen that can be computed via fast convolutions by noting that
In addition, we can compute by fast Hankel matrix–vector multiplications, that is,
where is a vector reversing the order of . Therefore the computational complexity of both and is flops. Moreover, the authors in  show that , where is a matrix constructed by stacking all on top of one another, and is a permutation matrix. Therefore we can compute and by using flops. Thus the implementation of our algorithm is very efficient and the main computational complexity in each step is .
4 Main result
In this section, we provide a theoretical analysis of PGD-VHL under a random subspace model.
The column vectors of are independently and identically drawn from a distribution which satisfies the following conditions
Assumption 4.1 is a standard assumption in blind super-resolution [12, 29, 17, 25, 9], and holds with by many common random ensembles, for instance, when the components of are Rademacher random variables taking the values with equal probability or when
are Rademacher random variables taking the values
with equal probability or whenis uniformly sampled from the rows of a Discrete Fourier Transform (DFT) matrix.
Now we present the main result, whose proofs are deferred to Section 6.
Compared with the sample complexity established in  for the nuclear norm minimization method, which is , Theorem 4.1 implies that PGD-VHL is sub-optimal in terms of and . We suspect that it is merely an artifact of our proof since the numerical experiments indicate that there approximately exists a linear relationship between and or and .
Theorem 4.1 implies that PGD-VHL converges to with a linear rate. Therefore, after iterations, we have . Given the iterates returned by PGD-VHL, we can estimate by .
5 Numerical simulations
In this section, we provide numerical results to illustrate the performance of PGD-VHL. The locations of the point source signal is randomly generated from and the amplitudes are selected to be , where is uniformly sampled from and is uniformly sampled from . The coefficients are i.i.d. sampled from standard Gaussian with normalization. The columns of are uniformly sampled from the DFT matrix. The stepsize of PGD-VHL is chosen via backtracking line search and PGD-VHL will be terminated if or a maximum number of iterations is reached. We repeat 20 random trials and record the probability of successful recovery in our tests. A trial is declared to be successful if .
The first experiment studies the recovery ability of PGD-VHL through the framework of phase transition and we compare it with two convex recovery methods: VHL and ANM . Both VHL and ANM are solved by CVX . The tests are conducted with and the varied and . Figure 1(a), 1(b) and 1(c) show the phase transitions of VHL, ANM and PGD-VHL when the locations of point sources are randomly generated, and Figure 1(d), 1(e) and 1(f) illustrate the phase transitions of VHL, ANM and PGD-VHL when the separation condition is imposed. In this figure, white color means successful recovery while black color indicates failure. It is interesting to observe that PGD-VHL has a higher phase transition curve than VHL whether the separation condition is satisﬁed or not. Moreover, by comparing Figure 1(b) with (c), we observe that PGD-VHL is less sensitive to the separation condition than ANM.
In the second experiment, we study the phase transition of PGD-VHL when one of and is ﬁxed. Note that in this test, the separation condition is not imposed for PGD-VHL. Figure 2(a) indicates a approximately linear relationship between and for the successful recovery when the number of point sources is ﬁxed to be . The same linear relationship between and can be observed when the dimension of the subspace is ﬁxed to be , see Figure 2(b). Therefore there exists a gap between our theory and empirical observation and we leave it as future work.
In the third simulation, we investigate the convergence rate of PGD-VHL for different large . The results are shown in Figure 3. The -axis denotes and the -axis represents the iteration number. It can be clearly seen that PGD-VHL converges linearly as shown in our main theorem. Also it is worth pointing out that PGD-VHL can be implemented in high dimensional regimes, where we take , respectively.
Finally, we conduct the tests to demonstrate the robustness of PGD-VHL to additive noise. More specifically, we collect the measurements corrupted by the noise vector
where is the uncontaminated observations, is the standard Gaussian vector with i.i.d. entries and denotes the noise level. In the tests, the noise level is taken from to
, corresponding to the signal-to-noise ratio (SNR) fromto dB. For each , 10 random trials are conducted with . As for the number of measurements, we choose and for comparison. PGD-VHL is set to be terminated when . In Figure 4, the average relative reconstruction error is plotted with SNR. It can be clearly seen that the relationship between the relative reconstruction error and the noise level is linear for PGD-VHL. Moreover, the relative reconstruction error decreases with the increase of the number of measurements.
6 Proof of Theorem 4.1
The proof follows a well established route that has been widely used in non-convex optimization for low rank matrix recovery [7, 33, 3]. In a nutshell, the initialization provided in Algorithm 1 will be shown to lie in a basin of attraction where the sequence returned by Algorithm 1 converges linearly to the true solution. Despite this, the proof details are quite involved and substantially different. We first list two useful lemmas, whose proofs are deferred to Section 6.1 and 6.2.
Suppose is -incoherent and . Then one has
with probability at least .
Let be the sequence returned by Algorithm 1. Denote , where . Then with probability , one has
6.1 Proof of Lemma 6.1
We begin our presentation of the proof with a useful lemma whose proof is provided in Section 6.3.
Suppose that is -incoherent. Then with probability at least , the matrix obeys
where and are absolute constants.
By Lemma 6.3 and the assumption , one has
occurs with probability at least , which implies that
By the definition of in (2.10), it can be seen that , where . Thus we have
To complete the proof, it suﬃces to control . A straightforward computation yields that
6.2 Proof of Lemma 6.2
We first establish a regularity condition in the following lemma whose proof is provided in Section 6.4.
Let , where . Then one has
happens with high probability for all such that , where , and .