 # Expectation Propagation Line Spectral Estimation

Line spectral estimation (LSE) is a fundamental problem in signal processing, arising in applications such as radar signal processing and communications. This paper develops an expectation propagation (EP) based LSE (EPLSE) method. The proposed method automatically estimates the model order and the noise variance, and can deal with nonlinear measurements. Numerical experiments show the excellent performance of EPLSE.


## I Introduction

Line spectral estimation (LSE), which aims to extract the frequencies of a mixture of sinusoids, is a fundamental problem in signal processing, as it arises in numerous disciplines in engineering, physics, and the natural sciences, such as radar signal processing and channel estimation [2, 3]. Classical methods include maximum likelihood (ML) and subspace methods such as MUSIC and ESPRIT [5, 6, 7]. The ML approach involves maximizing a nonconvex function which has a multimodal shape with a sharp global maximum, and thus requires accurate initialization. Compared to subspace-based approaches, ML performs better when the number of measurements or the SNR is small.

Compressed sensing (CS) based approaches, which exploit sparsity in the frequency domain, have been proposed. By discretizing the frequency axis into a number of grid points to construct a dictionary matrix, the original nonlinear frequency estimation problem is transformed into a sparse linear inverse problem. Since the frequency is continuous-valued, discretization incurs grid mismatch. As a result, off-grid methods have been proposed, which refine the grid gradually to overcome the grid mismatch effectively [10, 11, 12, 13, 14].

To work directly with continuously parameterized dictionaries, gridless methods have been proposed; they can be classified into two categories: atomic norm based approaches [15, 16, 17, 18, 19, 20, 21, 22] and variational LSE (VALSE) [23, 24]. The atomic norm based approaches allow working with an infinite, continuous dictionary and are proved to recover well separated frequencies perfectly in the noiseless case. VALSE, in contrast, treats the frequencies as random variables, and automatically estimates the model order, the noise variance, and the posterior probability density functions (PDFs) of the frequencies. As for computational complexity, the atomic norm based approaches involve solving a semidefinite program (SDP), which is usually very expensive. In contrast, VALSE can be implemented with much lower computational complexity.

From the Bayesian algorithm point of view, approximate message passing (AMP) and generalized approximate message passing (GAMP) have been proposed to deal with sparse signal recovery from linear and nonlinear measurements. It has been shown that both AMP and GAMP can be derived from expectation propagation (EP) [27, 28, 29, 30, 31, 32]. By treating the frequencies and amplitudes as random variables and formulating the factor graph, we derive EP LSE (EPLSE). We adopt a Bernoulli-Gaussian prior distribution to estimate the model order. In addition, expectation maximization (EM) is incorporated to iteratively learn the parameters of both the prior and the output distribution.

*Notation*: For a univariate function $f(x)$, let $f'(x)$ and $f''(x)$ denote the first and second derivative of $f(x)$, respectively. Let $\mathcal{CN}(\mathbf{x};\boldsymbol{\mu},\boldsymbol{\Sigma})$ denote the complex normal distribution of $\mathbf{x}$ with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$, and let $\mathrm{VM}(\theta;\mu,\kappa)$ denote the von Mises distribution of $\theta$ with mean direction $\mu$ and concentration parameter $\kappa$. Let $\Re\{\cdot\}$ return the real part. Let $\mathbf{I}_M$ denote the identity matrix of dimension $M$. For a distribution $q(x)$, let $\mathrm{E}[x|q(x)]$ and $\mathrm{Var}[x|q(x)]$ denote its mean and variance. For a set $\mathcal{S}$, let $|\mathcal{S}|$ denote its cardinality. For a complex vector $\mathbf{x}$, $|\mathbf{x}|$ returns its elementwise magnitude.

## II Problem Setup

Let $\mathbf{z}\in\mathbb{C}^M$ be a line spectrum signal consisting of $K$ complex sinusoids

$$\mathbf{z}=\sum_{k=1}^{K}\mathbf{a}(\tilde{\theta}_k)\tilde{x}_k, \qquad (1)$$

where $\tilde{x}_k$ is the complex amplitude of the $k$th frequency, $\tilde{\theta}_k$ is the $k$th frequency, and

$$\mathbf{a}(\theta)=[1,e^{j\theta},\cdots,e^{j(M-1)\theta}]^{T}. \qquad (2)$$
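As a quick illustration of the signal model (1)-(2), the following sketch synthesizes a line spectrum signal (the helper names are ours, not the paper's):

```python
import numpy as np

def steering(theta, M):
    """Steering vector a(theta) = [1, e^{j theta}, ..., e^{j(M-1)theta}]^T from (2)."""
    return np.exp(1j * theta * np.arange(M))

def line_spectrum(thetas, amps, M):
    """Superposition z = sum_k a(theta_k) x_k of K complex sinusoids, as in (1)."""
    return sum(steering(t, M) * x for t, x in zip(thetas, amps))

M = 64
thetas = [0.5, 1.2, -2.0]        # frequencies in radians
amps = [1.0 + 0.3j, 0.8, 0.5j]   # complex amplitudes
z = line_spectrum(thetas, amps, M)
```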

The line spectrum signal undergoes a componentwise (linear or nonlinear) transform, which is described by a conditional PDF

$$p(\mathbf{y}|\mathbf{z};\omega_z)=\prod_{m=1}^{M}p(y_m|z_m;\omega_z), \qquad (3)$$

where $\omega_z$ denotes the unknown nuisance parameters. In the following, we illustrate some examples of $p(y_m|z_m;\omega_z)$.

• The signal is corrupted by additive white Gaussian noise (AWGN), described by

$$\mathbf{y}=\mathbf{z}+\mathbf{w}, \qquad (4)$$

where $\mathbf{w}\sim\mathcal{CN}(\mathbf{w};\mathbf{0},\sigma_w^2\mathbf{I}_M)$ and $\sigma_w^2$ is the variance of the noise.

• Incomplete measurements, where only a subset of the measurements is observed.

• Quantized measurements: $\mathbf{y}=\mathrm{Q}(\mathbf{z}+\mathbf{w})$, where $\mathrm{Q}(\cdot)$ denotes a quantizer which maps continuous values into discrete numbers.

• Magnitude measurements: $\mathbf{y}=|\mathbf{z}|+\mathbf{w}$, or more generally $\mathbf{y}=|\mathbf{z}+\mathbf{w}_1|+\mathbf{w}$, where $\mathbf{w}_1$ is also AWGN.

• Quantized magnitude measurements [38, 39]: $\mathbf{y}=\mathrm{Q}(|\mathbf{z}|+\mathbf{w})$, which is a concatenation of the magnitude and quantization nonlinear operations.

• Impulsive noise: $\mathbf{y}=\mathbf{z}+\mathbf{w}$, where $\mathbf{w}$ is non-Gaussian; for example, $\mathbf{w}$ denotes impulsive noise and follows a Laplace distribution.

For all the listed cases, $\sigma_w^2$ refers to the variance of the noise.
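The measurement channels above are all componentwise maps of the clean signal; a sketch of two of them (the quantizer range and noise levels are illustrative choices of ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def awgn(z, sigma2):
    """Circular complex AWGN, model (4)."""
    w = np.sqrt(sigma2 / 2) * (rng.standard_normal(z.shape) + 1j * rng.standard_normal(z.shape))
    return z + w

def quantize(v, n_bits, lo=-3.0, hi=3.0):
    """Uniform quantizer Q(.) applied to real and imaginary parts separately."""
    levels = 2 ** n_bits
    step = (hi - lo) / levels
    q = lambda u: np.clip(np.floor((u - lo) / step), 0, levels - 1) * step + lo + step / 2
    return q(np.real(v)) + 1j * q(np.imag(v))

z = np.exp(1j * 0.7 * np.arange(16))
y_quant = quantize(awgn(z, 0.01), n_bits=3)         # quantized measurements
y_mag = np.abs(z) + 0.1 * rng.standard_normal(16)   # magnitude measurements
```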

Since the sparsity level $K$ is usually unknown, the line spectrum signal is modeled with $N$ complex sinusoids, where $N\geq K$:

$$\mathbf{z}=\sum_{i=1}^{N}x_i\mathbf{a}(\theta_i)\triangleq \mathbf{A}(\boldsymbol{\theta})\mathbf{x}, \qquad (5)$$

where $\mathbf{A}(\boldsymbol{\theta})=[\mathbf{a}(\theta_1),\cdots,\mathbf{a}(\theta_N)]$ and $\mathbf{x}=[x_1,\cdots,x_N]^T$. For the frequencies and coefficients, i.i.d. prior distributions are used, i.e.,

$$p(\boldsymbol{\theta})=\prod_{n=1}^{N}p(\theta_n),\quad p(\mathbf{x};\omega_x)=\prod_{n=1}^{N}p(x_n;\omega_x), \qquad (6)$$

where

$$p(x_n;\omega_x)=(1-\pi_n)\delta(x_n)+\pi_n\mathcal{CN}(x_n;\mu_0,\tau_0), \qquad (7)$$

and $\omega_x$ collects the unknown parameters. Note that the Bernoulli-Gaussian distribution (7) is imposed to enforce sparsity. For the prior distribution $p(\theta_n)$, a von Mises distribution can be encoded. For an uninformative prior, we assume $p(\theta_n)=1/(2\pi)$, i.e., a von Mises distribution with zero concentration.
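A sketch of drawing from the priors (6)-(7) under the uninformative frequency prior (function and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_bg(N, pi_n, mu0, tau0):
    """x_n ~ (1 - pi_n) delta(x_n) + pi_n CN(x_n; mu0, tau0), as in (7)."""
    active = rng.random(N) < pi_n
    cn = mu0 + np.sqrt(tau0 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    return np.where(active, cn, 0.0 + 0.0j)

N = 20
x = sample_bg(N, pi_n=0.2, mu0=0.0, tau0=1.0)
theta = rng.uniform(-np.pi, np.pi, N)   # uninformative prior p(theta_n) = 1/(2*pi)
```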

Let

$$\boldsymbol{\Theta}=\{\boldsymbol{\theta},\mathbf{x},\mathbf{z}\}, \qquad (8)$$
$$\boldsymbol{\omega}=\{\omega_x,\omega_z\} \qquad (9)$$

be the set of all random variables and the model parameters, respectively. According to Bayes' rule, the joint PDF is

$$p(\mathbf{y},\boldsymbol{\Theta};\boldsymbol{\omega})=p(\mathbf{y}|\mathbf{z})\delta(\mathbf{z}-\mathbf{A}(\boldsymbol{\theta})\mathbf{x})\prod_{n=1}^{N}p(\theta_n)p(x_n;\omega_x). \qquad (10)$$

Given the above joint PDF (10), the type II maximum likelihood (ML) estimate of the model parameters is

$$\hat{\boldsymbol{\omega}}_{\rm ML}=\underset{\boldsymbol{\omega}}{\mathrm{argmax}}\ p(\mathbf{y};\boldsymbol{\omega})=\underset{\boldsymbol{\omega}}{\mathrm{argmax}}\int p(\mathbf{y},\boldsymbol{\Theta};\boldsymbol{\omega})\,{\rm d}\boldsymbol{\Theta}. \qquad (11)$$

Then the minimum mean square error (MMSE) estimate of the random variables is

$$\hat{\boldsymbol{\Theta}}={\rm E}[\boldsymbol{\Theta}|\mathbf{y};\hat{\boldsymbol{\omega}}_{\rm ML}], \qquad (12)$$

where the expectation is taken with respect to

$$p(\boldsymbol{\Theta}|\mathbf{y};\hat{\boldsymbol{\omega}}_{\rm ML})=\frac{p(\mathbf{y},\boldsymbol{\Theta};\hat{\boldsymbol{\omega}}_{\rm ML})}{p(\mathbf{y};\hat{\boldsymbol{\omega}}_{\rm ML})}. \qquad (13)$$

Directly solving the ML estimate (11) or the MMSE estimate (12) is intractable. As a result, an iterative algorithm is designed in Section III.

## III EPLSE Algorithm

The factor graph is presented in Fig. 1, where a delta factor node is introduced. (Such a factor graph was originally introduced for the generalized linear model (GLM), where a unified Bayesian inference framework for GLM was proposed.) Before the derivation, the notation in Table I is used.

Fig. 1: The whole factor graph. Here we introduce a delta factor node δ(⋅), which simplifies the calculation as shown later.

First, we initialize the messages. Then we show how to update the messages to perform line spectral estimation.

### III-A Updating of mz→δ(z)

According to EP, $m_{\mathbf{z}\to\delta}(\mathbf{z})$ can be updated as

$$m_{\mathbf{z}\to\delta}(\mathbf{z})=\frac{\mathrm{Proj}\big[m_{\delta\to\mathbf{z}}(\mathbf{z})p(\mathbf{y}|\mathbf{z};\omega_z^{\rm old})\big]}{m_{\delta\to\mathbf{z}}(\mathbf{z})}\triangleq\frac{\mathrm{Proj}[q_B(\mathbf{z})]}{m_{\delta\to\mathbf{z}}(\mathbf{z})}. \qquad (14)$$

Then we calculate the componentwise posterior means and variances of $\mathbf{z}$ with respect to $q_B(\mathbf{z})$ as

$$\mathbf{z}_B^{\rm post}={\rm E}[\mathbf{z}|q_B(\mathbf{z})], \qquad (15)$$
$$\mathbf{v}_B^{\rm post}={\rm Var}[\mathbf{z}|q_B(\mathbf{z})]. \qquad (16)$$

Therefore $\mathrm{Proj}[q_B(\mathbf{z})]$ is

$$\mathrm{Proj}[q_B(\mathbf{z})]=\mathcal{CN}(\mathbf{z};\mathbf{z}_B^{\rm post},\mathrm{diag}(\mathbf{v}_B^{\rm post})). \qquad (17)$$

Note that for the linear measurement model (4), $q_B(\mathbf{z})$ is already Gaussian, i.e., the projection operation is unnecessary.

According to (14), $m_{\mathbf{z}\to\delta}(\mathbf{z})$ is calculated to be

$$m_{\mathbf{z}\to\delta}(\mathbf{z})=\mathcal{CN}(\mathbf{z};\mathbf{z}_B^{\rm ext},\mathrm{diag}(\mathbf{v}_B^{\rm ext})), \qquad (18)$$

where $\mathbf{v}_B^{\rm ext}$ and $\mathbf{z}_B^{\rm ext}$ are

$$v_B^{\rm ext}(t)=\left(\frac{1}{v_B^{\rm post}(t)}-\frac{1}{v_A^{\rm ext}(t)}\right)^{-1}, \qquad (19a)$$
$$z_B^{\rm ext}(t)=v_B^{\rm ext}(t)\left(\frac{z_B^{\rm post}(t)}{v_B^{\rm post}(t)}-\frac{z_A^{\rm ext}(t)}{v_A^{\rm ext}(t)}\right). \qquad (19b)$$
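The extrinsic computation (19) is a subtraction in precision (natural-parameter) form; a componentwise sketch with our own names (EP can produce negative extrinsic variances in practice, which this sketch does not handle):

```python
import numpy as np

def extrinsic(z_post, v_post, z_ext_A, v_ext_A):
    """(19a)-(19b): divide the Gaussian posterior by the incoming Gaussian message."""
    v_ext_B = 1.0 / (1.0 / np.asarray(v_post) - 1.0 / np.asarray(v_ext_A))
    z_ext_B = v_ext_B * (np.asarray(z_post) / v_post - np.asarray(z_ext_A) / v_ext_A)
    return z_ext_B, v_ext_B
```

Multiplying the resulting Gaussian back with the incoming message recovers the posterior moments, which is a convenient sanity check of (19).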

We also incorporate the expectation maximization (EM) algorithm to learn $\omega_z$. The posterior distribution of $\mathbf{z}$ is approximated as (17). We compute the expected complete log-likelihood function with respect to this approximate posterior, and drop the irrelevant terms to obtain

$$S(\omega_z;\omega_z^{\rm old})={\rm E}_{q(\mathbf{z}|\mathbf{y};\omega_z^{\rm old})}\big[\log p(\mathbf{y}|\mathbf{z};\omega_z)\big]. \qquad (20)$$

Then $\omega_z$ is updated as

$$\omega_z^{\rm new}=\underset{\omega_z}{\mathrm{argmax}}\ S(\omega_z;\omega_z^{\rm old}). \qquad (21)$$

For the AWGN model

$$\mathbf{y}=\mathbf{z}+\mathbf{w}, \qquad (22)$$

where $\mathbf{w}\sim\mathcal{CN}(\mathbf{w};\mathbf{0},\sigma_w^2\mathbf{I}_M)$, the approximate posterior PDF is $\mathcal{CN}(\mathbf{z};\mathbf{z}_B^{\rm post},\mathrm{diag}(\mathbf{v}_B^{\rm post}))$. Therefore we obtain

$$\sigma_w^2=\frac{\|\mathbf{y}-\mathbf{z}_B^{\rm post}\|^2+\mathbf{1}^T\mathbf{v}_B^{\rm post}}{M}. \qquad (23)$$

For arbitrary $p(\mathbf{y}|\mathbf{z};\omega_z)$, we also obtain an approximate update equation. From the definition of $\mathbf{z}_A^{\rm ext}$ and $\mathbf{v}_A^{\rm ext}$, we obtain a pseudo measurement model

$$\tilde{\mathbf{y}}=\mathbf{z}+\tilde{\mathbf{w}}, \qquad (24)$$

where $\tilde{\mathbf{y}}=\mathbf{z}_A^{\rm ext}$ and $\tilde{\mathbf{w}}\sim\mathcal{CN}(\tilde{\mathbf{w}};\mathbf{0},\mathrm{diag}(\mathbf{v}_A^{\rm ext}))$. The noise variance is updated as

$$\sigma_w^2=\frac{\|\tilde{\mathbf{y}}-\mathbf{z}_B^{\rm post}\|^2+\mathbf{1}^T\mathbf{v}_B^{\rm post}}{M}. \qquad (25)$$

Note that (25) includes (23) as a special case, since for the AWGN case $\tilde{\mathbf{y}}=\mathbf{y}$.
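The EM noise-variance update (25) is a one-line computation once the posterior moments are available; a sketch with our own names:

```python
import numpy as np

def em_noise_variance(y_tilde, z_post, v_post):
    """(25): sigma_w^2 = (||y_tilde - z_post||^2 + 1^T v_post) / M."""
    M = len(y_tilde)
    return (np.linalg.norm(y_tilde - z_post) ** 2 + np.sum(v_post)) / M
```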

### III-B Updating of ~mm→n(θn)

The subfactor graph is presented in Fig. 3. According to EP, $\tilde{m}_{m\to n}(\theta_n)$ is updated as

$$\tilde{m}_{m\to n}(\theta_n)\propto\frac{\mathrm{Proj}\Big[\int\prod_{i=1}^{N}\big(m_{i\to m}(\theta_i)m_{i\to m}(x_i)\big)\delta\Big(z_m-\sum_{l=1}^{N}e^{j(m-1)\theta_l}x_l\Big)m_{z_m\to\delta}(z_m)\prod_{i=1}^{N}{\rm d}x_i\prod_{l=1,l\neq n}^{N}{\rm d}\theta_l\,{\rm d}z_m\Big]}{m_{n\to m}(\theta_n)}=\frac{\mathrm{Proj}\big[e^{f_m(\theta_n)}\big]}{m_{n\to m}(\theta_n)}. \qquad (26)$$

Now we calculate the numerator term. First, the pseudo measurement model (24) implies

$$\tilde{y}_m=e^{j(m-1)\theta_1}x_1+\cdots+e^{j(m-1)\theta_N}x_N+\tilde{w}_m=e^{j(m-1)\theta_n}x_n+\sum_{l=1,l\neq n}^{N}e^{j(m-1)\theta_l}x_l+\tilde{w}_m, \qquad (27)$$

where $\tilde{w}_m\sim\mathcal{CN}(\tilde{w}_m;0,\tilde{\sigma}_m^2)$. Conditioned on $\theta_n$, we approximate $z_m$ as a Gaussian distribution by averaging over $x_n$ and $\{\theta_l,x_l\}_{l\neq n}$. Straightforward calculation yields

$$\mathrm{E}[z_m|\theta_n]=e^{j(m-1)\theta_n}\mathrm{E}[x_n]+\sum_{l\neq n}\mathrm{E}\big[e^{j(m-1)\theta_l}x_l\big]=e^{j(m-1)\theta_n}x_{n\to m}+\underbrace{\sum_{l=1,l\neq n}^{N}e^{j(m-1)\mu_{l\to m}}\frac{I_{m-1}(\kappa_{l\to m})}{I_0(\kappa_{l\to m})}x_{l\to m}}_{v^{\setminus n,m}}, \qquad (28)$$

$$\mathrm{Var}[z_m|\theta_n]=\sigma^2_{n\to m}+\sum_{l\neq n}\sigma^2_{l\to m}+\sum_{l\neq n}|x_{l\to m}|^2\left(1-\left(\frac{I_{m-1}(\kappa_{l\to m})}{I_0(\kappa_{l\to m})}\right)^2\right)=\sum_{l=1}^{N}\sigma^2_{l\to m}+\sum_{l\neq n}|x_{l\to m}|^2\left(1-\left(\frac{I_{m-1}(\kappa_{l\to m})}{I_0(\kappa_{l\to m})}\right)^2\right), \qquad (29)$$

which does not depend on $\theta_n$; we therefore write $\mathrm{Var}[z_m]$ in the following text. Now we perform the integration in (26) with respect to $z_m$. Conditioned on $\theta_n$, by viewing

$$z_m|\theta_n\sim\mathcal{CN}\big(z_m;\mathrm{E}[z_m|\theta_n],\mathrm{Var}[z_m]\big), \qquad (30)$$

and

$$\tilde{y}_m=z_m+\tilde{w}_m, \qquad (31)$$

as the prior and likelihood of $z_m$, respectively, we perform the integration over $z_m$ to yield

$$\tilde{y}_m|\theta_n\sim\mathcal{CN}\big(\tilde{y}_m;e^{j(m-1)\theta_n}x_{n\to m}+v^{\setminus n,m},\mathrm{Var}[z_m]+\tilde{\sigma}_m^2\big). \qquad (32)$$
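The Bessel ratios appearing in (28)-(29) are the circular moments of a von Mises variable; a sketch using SciPy's exponentially scaled Bessel function for numerical stability (the helper name is ours):

```python
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function I_v

def vm_moment(mu, kappa, k):
    """E[e^{j k theta}] for theta ~ VM(mu, kappa): e^{j k mu} I_k(kappa) / I_0(kappa)."""
    return np.exp(1j * k * mu) * ive(k, kappa) / ive(0, kappa)
```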

Then $f_m(\theta_n)$ is

$$f_m(\theta_n)=-\frac{\big|\tilde{y}_m-(e^{j(m-1)\theta_n}x_{n\to m}+v^{\setminus n,m})\big|^2}{\mathrm{Var}[z_m]+\tilde{\sigma}_m^2}+\ln m_{n\to m}(\theta_n)+{\rm const}$$
$$=2\Re\big\{y_r^{\setminus n,m}x^{*}_{n\to m}e^{-j(m-1)\theta_n}\big\}/\big(\tilde{\sigma}_m^2+\mathrm{Var}[z_m]\big)+\ln m_{n\to m}(\theta_n) \qquad (33)$$
$$=2\big|y_r^{\setminus n,m}x_{n\to m}\big|\cos\big((m-1)\theta_n+\angle x_{n\to m}-\angle y_r^{\setminus n,m}\big)/\big(\tilde{\sigma}_m^2+\mathrm{Var}[z_m]\big)+\kappa_{n\to m}\cos(\theta_n-\mu_{n\to m}), \qquad (34)$$

where

$$y_r^{\setminus n,m}=\tilde{y}_m-v^{\setminus n,m}. \qquad (35)$$

Then we project $e^{f_m(\theta_n)}$ as a von Mises distribution

$$\mathrm{Proj}\big[e^{f_m(\theta_n)}\big]=\mathrm{VM}(\theta_n;\mu_{m,n},\kappa_{m,n}). \qquad (36)$$

Note that the PDF proportional to $e^{f_m(\theta_n)}$ is a product of a wrapped normal distribution and a von Mises distribution. Numerically, it was found that the maximum of $f_m(\theta_n)$ is near the mean direction of the incoming message $m_{n\to m}(\theta_n)$. Thus we perform a single Newton refinement there and obtain the approximated von Mises distribution, as shown in Section VI-A.

As a result, using (26) we have

$$\tilde{\kappa}_{m\to n}e^{j\tilde{\mu}_{m\to n}}=\kappa_{m,n}e^{j\mu_{m,n}}-\kappa_{n\to m}e^{j\mu_{n\to m}}. \qquad (37)$$
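Because a product of von Mises densities is again von Mises, with resultant vector κe^{jμ} equal to the sum of the factors' vectors, the subtraction in (37) reduces to complex arithmetic; a sketch with our own names:

```python
import numpy as np

def vm_subtract(kappa_post, mu_post, kappa_in, mu_in):
    """(37): remove the incoming von Mises message from the projected posterior."""
    eta = kappa_post * np.exp(1j * mu_post) - kappa_in * np.exp(1j * mu_in)
    return np.abs(eta), np.angle(eta)  # (kappa_out, mu_out)
```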

### III-C Updating of mn→m(θn)

The subfactor graph is presented in Fig. 4. According to EP,

$$m_{n\to m}(\theta_n)\propto\frac{\mathrm{Proj}\big[\prod_{l=1}^{M}\tilde{m}_{l\to n}(\theta_n)p(\theta_n)\big]}{\tilde{m}_{m\to n}(\theta_n)}\propto\frac{\mathrm{Proj}\big[\prod_{l=1}^{M}\mathrm{VM}(\theta_n;\tilde{\mu}_{l\to n},\tilde{\kappa}_{l\to n})p(\theta_n)\big]}{\mathrm{VM}(\theta_n;\tilde{\mu}_{m\to n},\tilde{\kappa}_{m\to n})}\triangleq\frac{\mathrm{VM}(\theta_n;\tilde{\mu}_n,\tilde{\kappa}_n)}{\mathrm{VM}(\theta_n;\tilde{\mu}_{m\to n},\tilde{\kappa}_{m\to n})}, \qquad (38)$$

where

$$\kappa_{n\to m}e^{j\mu_{n\to m}}=\tilde{\kappa}_n e^{j\tilde{\mu}_n}-\tilde{\kappa}_{m\to n}e^{j\tilde{\mu}_{m\to n}}. \qquad (39)$$

Given that we do not have any knowledge of the prior distribution $p(\theta_n)$, $\kappa_{n\to m}e^{j\mu_{n\to m}}$ is calculated to be

$$\kappa_{n\to m}e^{j\mu_{n\to m}}=\sum_{l=1,l\neq m}^{M}\tilde{\kappa}_{l\to n}e^{j\tilde{\mu}_{l\to n}}. \qquad (40)$$
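The leave-one-out combination (40) likewise sums the messages' resultant vectors; a minimal sketch (names are ours):

```python
import numpy as np

def vm_leave_one_out(kappas, mus, m):
    """(40): sum_{l != m} kappa_l e^{j mu_l}, returned as (kappa, mu)."""
    eta = np.sum(np.delete(kappas * np.exp(1j * mus), m))
    return np.abs(eta), np.angle(eta)
```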

### III-D Updating of ~mm→n(xn)

The subfactor graph is presented in Fig. 5. $\tilde{m}_{m\to n}(x_n)$ can be updated as

$$\tilde{m}_{m\to n}(x_n)=\frac{\mathrm{Proj}\Big[\int\prod_{i=1}^{N}\big(m_{i\to m}(\theta_i)m_{i\to m}(x_i)\big)\delta\Big(z_m-\sum_{l=1}^{N}e^{j(m-1)\theta_l}x_l\Big)m_{z_m\to\delta}(z_m)\prod_{i=1}^{N}{\rm d}\theta_i\prod_{l=1,l\neq n}^{N}{\rm d}x_l\,{\rm d}z_m\Big]}{m_{n\to m}(x_n)} \qquad (41)$$
$$=\frac{\mathrm{Proj}\big[m_{n\to m}(x_n)e^{g_m(x_n)}\big]}{m_{n\to m}(x_n)}. \qquad (42)$$

Similar to (26), the numerator can be equivalently formulated as

$$\int\Big(\prod_{i=1}^{N}m_{i\to m}(\theta_i)\Big)\Big(\prod_{l=1,l\neq n}^{N}m_{l\to m}(x_l)\Big)p(\tilde{y}_m|z_m)\,\delta\Big(z_m-e^{j(m-1)\theta_n}x_n-\sum_{l\neq n}e^{j(m-1)\theta_l}x_l\Big)\prod_{i=1}^{N}{\rm d}\theta_i\prod_{l=1,l\neq n}^{N}{\rm d}x_l\,{\rm d}z_m. \qquad (43)$$

Conditioned on $x_n$, we approximate $z_m$ as a Gaussian distribution by averaging over $\theta_n$ and $\{\theta_l,x_l\}_{l\neq n}$. Straightforward calculation yields

$$\mathrm{E}[z_m|x_n]=\mathrm{E}\big[e^{j(m-1)\theta_n}\big]x_n+\sum_{l\neq n}\mathrm{E}\big[e^{j(m-1)\theta_l}x_l\big] \qquad (44)$$
$$=e^{j(m-1)\mu_{n\to m}}\frac{I_{m-1}(\kappa_{n\to m})}{I_0(\kappa_{n\to m})}x_n+\underbrace{\sum_{l\neq n}e^{j(m-1)\mu_{l\to m}}\frac{I_{m-1}(\kappa_{l\to m})}{I_0(\kappa_{l\to m})}x_{l\to m}}_{v^{\setminus n,m}} \qquad (45)$$
$$\triangleq\lambda_{n\to m}x_n+v^{\setminus n,m}, \qquad (46)$$

$$\mathrm{Var}[z_m|x_n]=|x_n|^2\left(1-\left(\frac{I_{m-1}(\kappa_{n\to m})}{I_0(\kappa_{n\to m})}\right)^2\right)+\sum_{l\neq n}\sigma^2_{l\to m}+\sum_{l\neq n}|x_{l\to m}|^2\left(1-\left(\frac{I_{m-1}(\kappa_{l\to m})}{I_0(\kappa_{l\to m})}\right)^2\right)\triangleq\delta^2_{n\to m}|x_n|^2+\nu^{\setminus n,m}. \qquad (47)$$

Thus

$$g_m(x_n)\approx-\frac{\big|\tilde{y}_m-v^{\setminus n,m}-\lambda_{n\to m}x_n\big|^2}{\mathrm{Var}[z_m|x_n]+\tilde{\sigma}_m^2}-\ln\big(\mathrm{Var}[z_m|x_n]+\tilde{\sigma}_m^2\big)+{\rm const}=-\frac{\big|\tilde{y}_m-v^{\setminus n,m}-\lambda_{n\to m}x_n\big|^2}{\delta^2_{n\to m}|x_n|^2+\nu^{\setminus n,m}+\tilde{\sigma}_m^2}-\ln\big(\delta^2_{n\to m}|x_n|^2+\nu^{\setminus n,m}+\tilde{\sigma}_m^2\big)+{\rm const}. \qquad (48)$$

Then we project $m_{n\to m}(x_n)e^{g_m(x_n)}$ as a Gaussian distribution. For simplicity, we first project $e^{g_m(x_n)}$ as a Gaussian distribution. Then we have

$$\tilde{m}_{m\to n}(x_n)=\mathcal{CN}(x_n;\tilde{x}_{m\to n},\tilde{\sigma}^2_{m\to n})=\mathrm{Proj}\big[e^{g_m(x_n)}\big]. \qquad (49)$$

According to (48), we approximate $\tilde{x}_{m\to n}$ and $\tilde{\sigma}^2_{m\to n}$ as

$$\tilde{x}_{m\to n}=\frac{\tilde{y}_m-v^{\setminus n,m}}{\lambda_{n\to m}}, \qquad (50a)$$
$$\tilde{\sigma}^2_{m\to n}=\left.\frac{\delta^2_{n\to m}|x_n|^2+\nu^{\setminus n,m}+\tilde{\sigma}_m^2}{|\lambda_{n\to m}|^2}\right|_{x_n=\tilde{x}_{m\to n}}=\frac{\delta^2_{n\to m}|\tilde{x}_{m\to n}|^2+\nu^{\setminus n,m}+\tilde{\sigma}_m^2}{|\lambda_{n\to m}|^2}. \qquad (50b)$$
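A sketch of the closed-form approximation (50) (our own function name; scalar inputs for clarity):

```python
import numpy as np

def x_message(y_tilde_m, v_nm, lam, delta2, nu_nm, sig2_m):
    """(50a)-(50b): Gaussian approximation of the message m_tilde_{m->n}(x_n)."""
    x_hat = (y_tilde_m - v_nm) / lam
    sig2 = (delta2 * np.abs(x_hat) ** 2 + nu_nm + sig2_m) / np.abs(lam) ** 2
    return x_hat, sig2
```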