I Introduction
The accuracy of channel estimation is a crucial factor determining the overall performance of wireless communication systems and networks, not only in terms of bit-error rate (BER) and throughput but also in terms of localization accuracy when these systems are equipped with positioning capabilities. When the underlying structure of the channel responses to be estimated is sparse, compressive sensing and sparse signal representation can be very powerful tools for the design of channel estimators.
Compressive sensing techniques have attracted considerable attention in recent years due to their applicability to a wide range of problems. Typically, the signal model considered reads
$$\mathbf{y} = \boldsymbol{\Phi}\boldsymbol{\alpha} + \mathbf{w}, \tag{1}$$
where $\mathbf{y} \in \mathbb{C}^{M}$ is the measurement vector and $\boldsymbol{\Phi} = [\boldsymbol{\phi}_1, \dots, \boldsymbol{\phi}_L] \in \mathbb{C}^{M \times L}$ is the known dictionary matrix with column vectors $\boldsymbol{\phi}_l$, $l = 1, \dots, L$. The vector $\mathbf{w} \in \mathbb{C}^{M}$ represents the samples of additive white Gaussian noise with covariance matrix $\lambda^{-1}\mathbf{I}$ and precision parameter $\lambda > 0$. Finally, $\boldsymbol{\alpha} \in \mathbb{C}^{L}$ is the vector of weights, whose entries are mostly zero. By obtaining a sparse estimate of $\boldsymbol{\alpha}$, we can accurately represent $\mathbf{y}$ with a minimal number of column vectors in $\boldsymbol{\Phi}$.
In the literature, many Bayesian and non-Bayesian methods have been proposed for sparse signal representation. The latter include the very popular convex-optimization-based methods for LASSO regression
[1, 2] and greedy constructive algorithms such as orthogonal matching pursuit (OMP) [3] and compressive sampling matching pursuit (CoSaMP) [4]. In sparse Bayesian learning (SBL) [5, 6], a prior probability density function (pdf) $p(\boldsymbol{\alpha})$ is specified so that a sparse estimate is obtained. A widely applied SBL algorithm is the relevance vector machine (RVM) [5], where a hierarchical representation of the Student-t pdf is used for the prior pdf $p(\boldsymbol{\alpha})$; the hierarchical representation involves specifying a conditional prior pdf $p(\boldsymbol{\alpha}|\boldsymbol{\gamma})$ and a hyperprior pdf $p(\boldsymbol{\gamma})$. An EM algorithm is then derived based on this prior model for the estimation of the weights. Similarly, [7] uses the EM algorithm based on a hierarchical representation of the Laplace pdf; this algorithm can be seen as the Bayesian version of the LASSO estimator. (Note that the hierarchical representation of the Laplace pdf used in [7] and [8] is only valid for real-valued variables. In [9], we extend this representation to cover complex-valued variables as well.) Though the sparse Bayesian inference algorithms proposed in [5] and [7] are guaranteed to converge, they are also known to suffer from high computational complexity and a low convergence rate: many iterations are needed before they terminate. To circumvent this, a fast Bayesian inference algorithm, known as FastRVM, is proposed in [10]. Following this approach, the FastLaplace algorithm is formulated in [8]. However, even though the algorithms in [10] and [8] converge faster than their EM counterparts in [5] and [7], they still converge slowly, especially in the low and moderate signal-to-noise ratio (SNR) regimes, as we show in this paper.
The estimation of the wireless channel is a practical example where compressive sensing techniques are utilized, because the response of the wireless channel typically holds a few dominant multipath components and is therefore sparse [11]. When sparse channel models are assumed, it seems natural to use tools from compressive sensing and sparse signal representation to estimate the parameters of these models. LASSO regression, OMP, and CoSaMP have been widely applied to the problem of pilot-assisted channel estimation in orthogonal frequency-division multiplexing (OFDM), cf. [12, 13, 14]. Bayesian methods have also been proposed for wireless communication systems.
Examples include the estimation of the dominant multipath components in the response of wireless channels [15] and joint channel estimation and decoding for clustered sparse channels [16]. In [17], we have proposed a variational Bayesian inference algorithm for the estimation of the wireless channel in OFDM. The resulting estimator, however, suffers from the same complexity and convergence rate issues as those in [5] and [7].
In this paper, we present a fast iterative sparse Bayesian estimation algorithm for pilot-assisted channel estimation in OFDM wireless receivers. We follow the fast inference framework outlined in [10], based on the hierarchical prior model of the Bessel K pdf for sparse estimation that we propose in [9, 17]. Compared to similar algorithms such as FastRVM and FastLaplace, our estimator drastically increases the convergence speed without any loss in performance. It also achieves favorable BER and mean-squared error (MSE) performance compared to both Bayesian and non-Bayesian state-of-the-art methods.
II System Description
II-A OFDM Signal Model
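We consider a single-input single-output OFDM system with $N$ subcarriers. A cyclic prefix (CP) is added to eliminate inter-symbol interference between consecutive OFDM blocks, and the channel response is assumed static during the transmission of each OFDM block. The received baseband signal for a given OFDM block reads
$$\mathbf{r} = \mathbf{X}\mathbf{h} + \mathbf{n}. \tag{2}$$
The diagonal matrix $\mathbf{X} \in \mathbb{C}^{N \times N}$ contains the complex-modulated symbols. The entries in $\mathbf{h} \in \mathbb{C}^{N}$ are the samples of the channel frequency response at all subcarriers. Finally, $\mathbf{n} \in \mathbb{C}^{N}$ is a zero-mean complex circularly-symmetric Gaussian random vector whose entries are independent with variance $\lambda^{-1}$.
Let the pilot pattern be characterized by the set $\mathcal{P} \subseteq \{1, \dots, N\}$ containing the indices of the $M$ subcarriers reserved for pilot transmission. The received signals observed at the pilot positions, collected in $\mathbf{r}_{\mathcal{P}}$, are then each divided by their corresponding pilot symbol in $\mathbf{X}_{\mathcal{P}}$ to produce the vector of observations
$$\mathbf{y} = \mathbf{X}_{\mathcal{P}}^{-1}\mathbf{r}_{\mathcal{P}} = \mathbf{h}_{\mathcal{P}} + \mathbf{w}, \qquad \mathbf{w} = \mathbf{X}_{\mathcal{P}}^{-1}\mathbf{n}_{\mathcal{P}}, \tag{3}$$
where $\mathbf{X}_{\mathcal{P}}$, $\mathbf{h}_{\mathcal{P}}$, and $\mathbf{n}_{\mathcal{P}}$ are defined analogously to $\mathbf{r}_{\mathcal{P}}$. We assume that all pilot symbols have unit power, so that the statistics of the noise term remain unchanged, i.e., $\mathbf{w}$ has covariance matrix $\lambda^{-1}\mathbf{I}$.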
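The observation model (2)-(3) can be illustrated with a few lines of NumPy. This is a minimal sketch, not the simulation code behind Section IV. The following are hypothetical choices:
- the pilot spacing of 12 subcarriers,
- the noise variance,
- the randomly drawn channel vector.

The pilots are assumed to be unit-modulus QPSK symbols. The sketch shows why unit-power pilots leave the noise statistics unchanged: dividing by a unit-modulus symbol only rotates each noise sample.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 1200                              # number of subcarriers (Table I)
pilot_idx = np.arange(0, N, 12)       # hypothetical evenly spaced pilot set P
noise_var = 1e-2                      # hypothetical noise variance 1/lambda

# Hypothetical channel frequency response h at all N subcarriers
h = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)

# Unit-modulus QPSK symbols: the diagonal of X in (2)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
x = rng.choice(qpsk, size=N)

# Circularly-symmetric complex Gaussian noise n with variance noise_var
n = np.sqrt(noise_var / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
r = x * h + n                         # (2): X is diagonal, so X h is elementwise

# (3): divide the pilot observations by their pilot symbols
y = r[pilot_idx] / x[pilot_idx]
w = y - h[pilot_idx]                  # effective noise at the pilots
```

Because every pilot has unit modulus, `w` has exactly the same magnitudes as the pilot entries of `n`.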
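We consider a frequency-selective, block-fading wireless channel with impulse response modeled as a sum of multipath components:
$$g(\tau) = \sum_{k=1}^{K}\beta_k\,\delta(\tau - \tau_k). \tag{4}$$
In this expression, $\beta_k$ and $\tau_k$ are respectively the complex weight and the (continuous) delay of the $k$th multipath component, $K$ is the total number of multipath components, and $\delta(\cdot)$ is the Dirac delta function. The channel parameters $K$, $\tau_k$, and $\beta_k$ are all random variables and may vary from the transmission of one OFDM block to the next. Additional details regarding the assumptions on the channel model are provided in Section IV.
By using the parametric model (4) of the channel, we can rewrite (3) as
$$\mathbf{y} = \boldsymbol{\Phi}(\boldsymbol{\tau})\boldsymbol{\beta} + \mathbf{w} \tag{5}$$
with $\boldsymbol{\tau} = [\tau_1, \dots, \tau_K]^{\mathrm{T}}$, $\boldsymbol{\beta} = [\beta_1, \dots, \beta_K]^{\mathrm{T}}$, $\boldsymbol{\Phi}(\boldsymbol{\tau}) = [\boldsymbol{\phi}(\tau_1), \dots, \boldsymbol{\phi}(\tau_K)]$, and $\boldsymbol{\phi}(\tau) \in \mathbb{C}^{M}$ with entries
$$[\boldsymbol{\phi}(\tau)]_m = \exp(-j2\pi f_m\tau), \quad m = 1, \dots, M, \tag{6}$$
where $f_m$ denotes the frequency of the $m$th pilot subcarrier.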
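To make (5) and (6) concrete, the following NumPy sketch builds $\boldsymbol{\Phi}(\boldsymbol{\tau})$ for a hypothetical set of $K = 5$ off-grid delays. It then checks that $\boldsymbol{\Phi}(\boldsymbol{\tau})\boldsymbol{\beta}$ coincides with the Fourier transform of the impulse response (4), evaluated at the pilot frequencies. The pilot set, delays, and weights are illustrative assumptions. The pilot frequencies are measured relative to the first subcarrier (also an assumption). This only multiplies each column by a constant phase, which can be absorbed into $\beta_k$.

```python
import numpy as np

def steering(delays, freqs):
    """Matrix with columns phi(tau) of (6): entries exp(-j 2 pi f_m tau)."""
    return np.exp(-2j * np.pi * np.outer(freqs, np.atleast_1d(delays)))

rng = np.random.default_rng(1)
delta_f = 15e3                          # subcarrier spacing (Table I)
pilot_idx = np.arange(0, 1200, 12)      # hypothetical evenly spaced pilot set
f_pilot = pilot_idx * delta_f           # f_m relative to the first subcarrier

K = 5                                   # hypothetical number of multipath components
tau = rng.uniform(0.0, 4.69e-6, K)      # delays within the CP length (assumption)
beta = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2 * K)

Phi_tau = steering(tau, f_pilot)        # M x K matrix Phi(tau) in (5)
h_pilot = Phi_tau @ beta                # noiseless part of (5)

# Direct evaluation of the Fourier transform of (4) at each pilot frequency
h_direct = np.array([np.sum(beta * np.exp(-2j * np.pi * f * tau)) for f in f_pilot])
```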
II-B Compressive Sensing Signal Model
In order to apply sparse representation methods for the estimation of $\mathbf{h}$ in (2), we must first recast the signal model in (5) into the form of (1). The main obstacle is that the delays in $\boldsymbol{\tau}$ are, a priori, unknown at the receiver. To circumvent this, we consider a grid of uniformly-spaced delay samples in the interval $[0, \tau_{\max}]$:
$$\boldsymbol{\tau}_{\mathrm{d}} = \left[0, \frac{T_s}{\zeta}, \frac{2T_s}{\zeta}, \dots, \tau_{\max}\right]^{\mathrm{T}} \tag{7}$$
with $\zeta > 0$ such that $\zeta\tau_{\max}/T_s$ is an integer. The symbols $\tau_{\max}$ and $T_s$ denote respectively the maximum excess delay of the channel and the sampling time. The dictionary matrix is now defined as $\boldsymbol{\Phi} = \boldsymbol{\Phi}(\boldsymbol{\tau}_{\mathrm{d}})$. Thus, the entries of $\boldsymbol{\Phi}$ are of the form (6), with the grid delays in $\boldsymbol{\tau}_{\mathrm{d}}$ as arguments. The number of columns $L$ in $\boldsymbol{\Phi}$ is thereby inversely proportional to the selected delay resolution $T_s/\zeta$, so the selection of $\zeta$ directly impacts the dimension of $\boldsymbol{\alpha}$. Because $\boldsymbol{\alpha}$ has many more entries than there are multipath components, we expect most of its entries to be zero. Therefore, we use compressive sensing techniques to obtain sparse estimates of $\boldsymbol{\alpha}$.
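Notice that the signal model (1) with the grid-based dictionary $\boldsymbol{\Phi}$ is only an approximation of the true signal model (5), since the true delays do not in general lie on the grid. The estimate of the channel vector at the pilot subcarriers is then $\hat{\mathbf{h}}_{\mathcal{P}} = \boldsymbol{\Phi}\hat{\boldsymbol{\alpha}}$. In order to estimate the full channel $\mathbf{h}$ in (2), the dictionary is expanded to include a row corresponding to each of the $N$ subcarrier frequencies. Thus, $\hat{\mathbf{h}} = \tilde{\boldsymbol{\Phi}}\hat{\boldsymbol{\alpha}}$, where the columns of $\tilde{\boldsymbol{\Phi}}$ are the vectors $\tilde{\boldsymbol{\phi}}(\tau)$, evaluated at the grid delays, with entries
$$[\tilde{\boldsymbol{\phi}}(\tau)]_n = \exp(-j2\pi f_n\tau), \quad n = 1, \dots, N, \tag{8}$$
where $f_n$ denotes the frequency of the $n$th subcarrier.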
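A sketch of the delay grid and of the pilot and full-band dictionaries follows. It uses a grid spacing $T_s/\zeta$, with $\zeta$ an oversampling factor. The remaining choices are assumptions:
- $\tau_{\max}$ is set equal to the CP length of Table I,
- the pilot spacing of 12 subcarriers is hypothetical,
- the sparse estimate $\hat{\boldsymbol{\alpha}}$ is made up.

It shows two things. Halving the grid spacing roughly doubles the number of columns. The expanded dictionary reproduces $\boldsymbol{\Phi}\hat{\boldsymbol{\alpha}}$ on the pilot subcarriers while also providing estimates between them.

```python
import numpy as np

def dictionary(freqs, delays):
    """Columns exp(-j 2 pi f tau) of (6)/(8), one per grid delay."""
    return np.exp(-2j * np.pi * np.outer(freqs, delays))

Ts = 32.55e-9                 # sampling time (Table I)
tau_max = 144 * Ts            # assumption: maximum excess delay = CP length
delta_f = 15e3                # subcarrier spacing (Table I)
N = 1200
pilot_idx = np.arange(0, N, 12)   # hypothetical pilot set

def delay_grid(zeta):
    """Grid 0, Ts/zeta, 2 Ts/zeta, ..., tau_max (zeta * tau_max / Ts is an integer)."""
    n_steps = int(round(zeta * tau_max / Ts))
    return np.arange(n_steps + 1) * Ts / zeta

# Finer delay resolution Ts/zeta -> more columns L
sizes = {zeta: delay_grid(zeta).size for zeta in (1, 2, 4)}

grid = delay_grid(2)
Phi = dictionary(pilot_idx * delta_f, grid)           # M x L pilot dictionary
Phi_full = dictionary(np.arange(N) * delta_f, grid)   # N x L expanded dictionary (8)

# Hypothetical sparse estimate alpha_hat, e.g. the output of a sparse solver
alpha_hat = np.zeros(grid.size, dtype=complex)
alpha_hat[[3, 40, 101]] = [1.0, 0.5j, -0.25]
h_pilot_hat = Phi @ alpha_hat      # estimate at the pilot subcarriers
h_full_hat = Phi_full @ alpha_hat  # estimate at all N subcarriers
```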
III Bayesian Inference Learning
We now present the iterative sparse Bayesian inference algorithm for channel estimation proposed in this paper. First, we detail the hierarchical prior model leading to the Bessel K pdf for each entry of $\boldsymbol{\alpha}$. Based on this model, we apply a fast Bayesian algorithm to estimate the unknown model parameters. Finally, we briefly comment on the relationship between our algorithm and other similar state-of-the-art approaches.
III-A The Probabilistic Model
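Instead of working directly with the prior pdf $p(\boldsymbol{\alpha})$, the SBL framework usually models $\boldsymbol{\alpha}$ using a two-layer hierarchical prior model. This model involves a conditional prior pdf $p(\boldsymbol{\alpha}|\boldsymbol{\gamma})$ and a hyperprior pdf $p(\boldsymbol{\gamma})$. With this design, the resulting probabilistic model for signal model (1) is given by
$$p(\mathbf{y}, \boldsymbol{\alpha}, \boldsymbol{\gamma}, \lambda) = p(\mathbf{y}|\boldsymbol{\alpha}, \lambda)\, p(\boldsymbol{\alpha}|\boldsymbol{\gamma})\, p(\boldsymbol{\gamma})\, p(\lambda). \tag{9}$$
Due to (1), $p(\mathbf{y}|\boldsymbol{\alpha}, \lambda)$ is multivariate Gaussian: $p(\mathbf{y}|\boldsymbol{\alpha}, \lambda) = \mathrm{CN}(\mathbf{y}|\boldsymbol{\Phi}\boldsymbol{\alpha}, \lambda^{-1}\mathbf{I})$. (Here, $\mathrm{CN}(\cdot|\mathbf{a}, \mathbf{B})$ denotes a complex Gaussian pdf with mean vector $\mathbf{a}$ and covariance matrix $\mathbf{B}$. We shall also make use of $\mathrm{Ga}(\cdot|a, b)$, which denotes a gamma pdf with shape parameter $a$ and rate parameter $b$.) For the noise precision $\lambda$, we select a constant prior, i.e., $p(\lambda) \propto 1$.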
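The design of the factors $p(\alpha_i|\gamma_i)$ and $p(\gamma_i)$ for each weight $\alpha_i$ heavily influences the sparsity-inducing property of the prior model. We adopt the hierarchical structure of the Bessel K pdf. Its first layer is defined as $p(\boldsymbol{\alpha}|\boldsymbol{\gamma}) = \prod_{i=1}^{L}\mathrm{CN}(\alpha_i|0, \gamma_i)$, and its second layer is selected to be $p(\boldsymbol{\gamma}) = \prod_{i=1}^{L}\mathrm{Ga}(\gamma_i|\epsilon, \eta)$. With these choices, we compute the marginal pdf
$$p(\alpha_i) = \int_0^{\infty}\mathrm{CN}(\alpha_i|0, \gamma_i)\,\mathrm{Ga}(\gamma_i|\epsilon, \eta)\,\mathrm{d}\gamma_i = \frac{2\eta^{(\epsilon+1)/2}}{\pi\Gamma(\epsilon)}|\alpha_i|^{\epsilon-1}K_{\epsilon-1}\!\left(2\sqrt{\eta}\,|\alpha_i|\right). \tag{10}$$
In this expression, $K_\nu(\cdot)$ is the modified Bessel function of the second kind and order $\nu \in \mathbb{R}$, and $\Gamma(\cdot)$ is the gamma function. The parameter $\epsilon$ determines the sparsity-inducing property of the Bessel K pdf [9]. The selection $\epsilon < 1$ strongly enforces sparseness on the estimate, as more probability mass concentrates around the origin. As a consequence, the mode of the resulting posterior pdf is more likely to be found close to the axes. However, selecting $\epsilon$ too high may lead to overfitting and thereby non-sparse results. Thus, this parameter has a similar functionality to the parameter $p$ in the FOCUSS algorithm [18].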
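As a numerical sanity check of (10), the NumPy sketch below compares two quantities:
- the closed form $p(\alpha_i) = \frac{2\eta^{(\epsilon+1)/2}}{\pi\Gamma(\epsilon)}|\alpha_i|^{\epsilon-1}K_{\epsilon-1}(2\sqrt{\eta}|\alpha_i|)$,
- a direct quadrature of the hierarchical integral $\int_0^\infty \mathrm{CN}(\alpha_i|0,\gamma_i)\,\mathrm{Ga}(\gamma_i|\epsilon,\eta)\,\mathrm{d}\gamma_i$.

To avoid dependencies beyond NumPy, $K_\nu$ is evaluated from the integral representation $K_\nu(x) = \int_0^\infty e^{-x\cosh t}\cosh(\nu t)\,\mathrm{d}t$ with a plain trapezoidal rule. The grids and truncation limits are our own choices for moderate arguments, not a general-purpose Bessel routine. The last line evaluates the density near the origin for several values of $\epsilon$.

```python
import math
import numpy as np

def trapezoid(f, x):
    """Composite trapezoidal rule."""
    return float(np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2)

def bessel_k(nu, x):
    """K_nu(x) for x > 0 via K_nu(x) = int_0^inf exp(-x cosh t) cosh(nu t) dt."""
    t = np.linspace(0.0, 12.0, 40001)
    return trapezoid(np.exp(-x * np.cosh(t)) * np.cosh(nu * t), t)

def besselk_pdf(a_abs, eps, eta):
    """Closed form (10) evaluated at |alpha_i| = a_abs > 0."""
    c = 2 * eta ** ((eps + 1) / 2) / (math.pi * math.gamma(eps))
    return c * a_abs ** (eps - 1) * bessel_k(eps - 1, 2 * math.sqrt(eta) * a_abs)

def hierarchical_pdf(a_abs, eps, eta):
    """Quadrature of int_0^inf CN(alpha|0,gamma) Ga(gamma|eps,eta) dgamma, with gamma = e^u."""
    u = np.linspace(-40.0, 10.0, 50001)
    g = np.exp(u)
    cond = np.exp(-a_abs ** 2 / g) / (math.pi * g)                          # CN(alpha | 0, gamma)
    hyper = eta ** eps / math.gamma(eps) * g ** (eps - 1) * np.exp(-eta * g)  # Ga(gamma | eps, eta)
    return trapezoid(cond * hyper * g, u)                                   # d gamma = gamma du

# Density close to the origin for decreasing shape parameter eps (eta = 1)
values = {eps: besselk_pdf(0.05, eps, 1.0) for eps in (0.5, 1.0, 2.0)}
```

The values in `values` grow as $\epsilon$ decreases. This is consistent with more probability mass concentrating near the origin for small $\epsilon$.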
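III-B Fast Iterative Bayesian Inference
Given fixed estimates $\hat{\boldsymbol{\gamma}}$ and $\hat{\lambda}$, the posterior pdf of $\boldsymbol{\alpha}$ is multivariate Gaussian, i.e., $p(\boldsymbol{\alpha}|\mathbf{y}, \hat{\boldsymbol{\gamma}}, \hat{\lambda}) = \mathrm{CN}(\boldsymbol{\alpha}|\boldsymbol{\mu}, \boldsymbol{\Sigma})$ with
$$\boldsymbol{\Sigma} = \left(\hat{\lambda}\boldsymbol{\Phi}^{\mathrm{H}}\boldsymbol{\Phi} + \hat{\boldsymbol{\Gamma}}^{-1}\right)^{-1}, \tag{11}$$
$$\boldsymbol{\mu} = \hat{\lambda}\boldsymbol{\Sigma}\boldsymbol{\Phi}^{\mathrm{H}}\mathbf{y}, \tag{12}$$
where $\hat{\boldsymbol{\Gamma}} = \mathrm{diag}(\hat{\boldsymbol{\gamma}})$. The hyperparameters $\boldsymbol{\gamma}$ and $\lambda$ are estimated by maximizing [5, 6]
$$\mathcal{L}(\boldsymbol{\gamma}, \lambda) = \log p(\mathbf{y}, \boldsymbol{\gamma}, \lambda) = \log p(\mathbf{y}|\boldsymbol{\gamma}, \lambda) + \log p(\boldsymbol{\gamma}) + \log p(\lambda). \tag{13}$$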
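For small problems, the following quantities can be computed directly:
- the posterior covariance $\boldsymbol{\Sigma} = (\hat{\lambda}\boldsymbol{\Phi}^{\mathrm{H}}\boldsymbol{\Phi} + \hat{\boldsymbol{\Gamma}}^{-1})^{-1}$,
- the posterior mean $\boldsymbol{\mu} = \hat{\lambda}\boldsymbol{\Sigma}\boldsymbol{\Phi}^{\mathrm{H}}\mathbf{y}$,
- the log-likelihood term $\log p(\mathbf{y}|\boldsymbol{\gamma},\lambda)$ of (13).

The NumPy sketch below does this for a random dictionary and random data with hypothetical sizes. The last line computes the mean a second way, via the identity $\hat{\lambda}\boldsymbol{\Sigma}\boldsymbol{\Phi}^{\mathrm{H}}\mathbf{y} = \hat{\boldsymbol{\Gamma}}\boldsymbol{\Phi}^{\mathrm{H}}\mathbf{C}^{-1}\mathbf{y}$ with $\mathbf{C} = \hat{\lambda}^{-1}\mathbf{I} + \boldsymbol{\Phi}\hat{\boldsymbol{\Gamma}}\boldsymbol{\Phi}^{\mathrm{H}}$. This form needs only an $M \times M$ solve, which is cheaper when the dictionary has more columns than rows.

```python
import numpy as np

def posterior(Phi, y, gamma, lam):
    """Posterior covariance Sigma and mean mu of alpha for fixed gamma and lam."""
    Sigma = np.linalg.inv(lam * Phi.conj().T @ Phi + np.diag(1.0 / gamma))
    mu = lam * Sigma @ Phi.conj().T @ y
    return mu, Sigma

def log_evidence(Phi, y, gamma, lam):
    """log p(y | gamma, lam) with y ~ CN(0, C), C = I/lam + Phi diag(gamma) Phi^H."""
    M = Phi.shape[0]
    C = np.eye(M) / lam + (Phi * gamma) @ Phi.conj().T
    _, logdet = np.linalg.slogdet(C)
    quad = np.real(y.conj() @ np.linalg.solve(C, y))
    return float(-M * np.log(np.pi) - logdet - quad)

rng = np.random.default_rng(2)
M, L, lam = 8, 12, 5.0                         # hypothetical sizes and noise precision
Phi = (rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))) / np.sqrt(2)
y = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
gamma = rng.uniform(0.1, 2.0, L)

mu, Sigma = posterior(Phi, y, gamma, lam)
C = np.eye(M) / lam + (Phi * gamma) @ Phi.conj().T
mu_alt = gamma * (Phi.conj().T @ np.linalg.solve(C, y))   # Gamma Phi^H C^{-1} y
```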
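The cost function (13) can be iteratively maximized using the EM algorithm by treating $\boldsymbol{\alpha}$ as a hidden variable, i.e., by noting that $\mathbf{y}$ and $\boldsymbol{\alpha}$ form the complete data for $\boldsymbol{\gamma}$ and $\lambda$. Following the classical EM formulation, the E-step equivalently computes (11) and (12), and the M-step computes
$$\hat{\gamma}_i^{\mathrm{new}} = \arg\max_{\gamma_i}\left\{\langle\log p(\alpha_i|\gamma_i)\rangle + \log p(\gamma_i)\right\} = \frac{2\langle|\alpha_i|^2\rangle}{2 - \epsilon + \sqrt{(2 - \epsilon)^2 + 4\eta\langle|\alpha_i|^2\rangle}}, \tag{14}$$
$$\hat{\lambda}^{\mathrm{new}} = \arg\max_{\lambda}\langle\log p(\mathbf{y}|\boldsymbol{\alpha}, \lambda)\rangle = \frac{M}{\|\mathbf{y} - \boldsymbol{\Phi}\boldsymbol{\mu}\|^2 + \mathrm{tr}(\boldsymbol{\Phi}\boldsymbol{\Sigma}\boldsymbol{\Phi}^{\mathrm{H}})}, \tag{15}$$
with $\langle|\alpha_i|^2\rangle = |\mu_i|^2 + \Sigma_{ii}$. The expectations $\langle\cdot\rangle$ in the above expressions are evaluated with respect to the posterior pdf $p(\boldsymbol{\alpha}|\mathbf{y}, \hat{\boldsymbol{\gamma}}^{\mathrm{old}}, \hat{\lambda}^{\mathrm{old}})$. Here, $\hat{\boldsymbol{\gamma}}^{\mathrm{old}}$ and $\hat{\lambda}^{\mathrm{old}}$ are the estimates computed in the previous iteration. After an initialization procedure, the individual quantities in (11), (12), (14), and (15) are iteratively updated until convergence.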
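The EM iteration can be sketched as follows for small problems. With $p(\alpha_i|\gamma_i) = \mathrm{CN}(\alpha_i|0,\gamma_i)$ and $p(\gamma_i) = \mathrm{Ga}(\gamma_i|\epsilon,\eta)$, setting the derivative of the M-step objective for $\gamma_i$ to zero gives $\eta\gamma_i^2 + (2-\epsilon)\gamma_i - \langle|\alpha_i|^2\rangle = 0$, where $\langle|\alpha_i|^2\rangle = |\mu_i|^2 + \Sigma_{ii}$. The code uses a cancellation-free form of its positive root. The noise precision is updated as $\hat{\lambda} = M/\langle\|\mathbf{y}-\boldsymbol{\Phi}\boldsymbol{\alpha}\|^2\rangle$. The test problem and the values $\epsilon = 0.5$, $\eta = 1$ are hypothetical.

Since EM never decreases its objective, the recorded values of $\mathcal{L}(\boldsymbol{\gamma},\lambda)$ (up to a constant) should be non-decreasing. Note that the irrelevant $\gamma_i$ are only divided by roughly $2-\epsilon$ per iteration. This slow shrinkage illustrates the low convergence rate of the EM approach.

```python
import numpy as np

def em_step(Phi, y, gamma, lam, eps, eta):
    """One EM iteration for the Bessel K model: E-step, then closed-form M-step."""
    # E-step: posterior covariance and mean of alpha
    Sigma = np.linalg.inv(lam * Phi.conj().T @ Phi + np.diag(1.0 / gamma))
    mu = lam * Sigma @ Phi.conj().T @ y
    second = np.abs(mu) ** 2 + np.real(np.diag(Sigma))        # <|alpha_i|^2>
    # M-step for gamma: positive root of eta g^2 + (2 - eps) g - second = 0
    b = 2.0 - eps
    gamma_new = 2 * second / (b + np.sqrt(b ** 2 + 4 * eta * second))
    # M-step for lambda: M / <||y - Phi alpha||^2>
    resid = y - Phi @ mu
    expected_sq = np.real(resid.conj() @ resid) + np.real(np.trace(Phi @ Sigma @ Phi.conj().T))
    lam_new = Phi.shape[0] / expected_sq
    return gamma_new, lam_new, mu

def objective(Phi, y, gamma, lam, eps, eta):
    """L(gamma, lam) up to an additive constant (the prior on lam is constant)."""
    M = Phi.shape[0]
    C = np.eye(M) / lam + (Phi * gamma) @ Phi.conj().T
    _, logdet = np.linalg.slogdet(C)
    quad = np.real(y.conj() @ np.linalg.solve(C, y))
    return float(-logdet - quad + np.sum((eps - 1) * np.log(gamma) - eta * gamma))

rng = np.random.default_rng(3)
M, L = 20, 40                                  # hypothetical problem size
Phi = (rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))) / np.sqrt(2)
alpha_true = np.zeros(L, dtype=complex)
alpha_true[[4, 17, 31]] = [1.0, -0.8j, 0.6]
y = Phi @ alpha_true + 0.05 * (rng.standard_normal(M) + 1j * rng.standard_normal(M))

eps, eta = 0.5, 1.0                            # hypothetical hyperparameter values
gamma, lam = np.ones(L), 1.0
history = [objective(Phi, y, gamma, lam, eps, eta)]
for _ in range(30):
    gamma, lam, mu = em_step(Phi, y, gamma, lam, eps, eta)
    history.append(objective(Phi, y, gamma, lam, eps, eta))
```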
The above EM algorithm suffers from two main disadvantages: the high computational complexity of the update (11) and a low rate of convergence. To overcome the first drawback, a greedy procedure as in [10] can be adopted. Since most of the entries in $\boldsymbol{\alpha}$ are expected to be zero, one may start out with an “empty” dictionary matrix and incrementally fill the dictionary by adding column vectors. To circumvent the drawback of the low convergence rate, we compute the stationary points of the EM update in (14). For this, we fix $\gamma_k$, $k \neq i$, at their current estimates. We then compute a sequence of estimates $\gamma_i^{[1]}, \gamma_i^{[2]}, \dots$ by repeatedly applying (14) to $\gamma_i$. (Notice that $\langle|\alpha_i|^2\rangle$ in (14) is a function of $\gamma_i$, as seen from (11) and (12).) In this way, we update the estimates of the components in $\boldsymbol{\gamma}$ sequentially instead of jointly; the generalized EM framework justifies this modification. As shown in [9], the limit $\gamma_i^{[\infty]}$ in fact corresponds to a (local) extremum of
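$$\ell(\gamma_i) = -\log(1 + \gamma_i s_i) + \frac{|q_i|^2}{\gamma_i^{-1} + s_i} + (\epsilon - 1)\log\gamma_i - \eta\gamma_i + c \tag{16}$$
with $c$ being a constant encompassing the terms independent of $\gamma_i$. The remaining quantities are defined as $\mathbf{C}_{-i} = \lambda^{-1}\mathbf{I} + \sum_{k \neq i}\gamma_k\boldsymbol{\phi}_k\boldsymbol{\phi}_k^{\mathrm{H}}$, $s_i = \boldsymbol{\phi}_i^{\mathrm{H}}\mathbf{C}_{-i}^{-1}\boldsymbol{\phi}_i$, and $q_i = \boldsymbol{\phi}_i^{\mathrm{H}}\mathbf{C}_{-i}^{-1}\mathbf{y}$. (For the derivation of $\ell(\gamma_i)$, we exploit that $p(\mathbf{y}|\boldsymbol{\gamma}, \lambda)$ is Gaussian with mean zero and covariance matrix $\mathbf{C} = \lambda^{-1}\mathbf{I} + \boldsymbol{\Phi}\boldsymbol{\Gamma}\boldsymbol{\Phi}^{\mathrm{H}}$, where $\boldsymbol{\Gamma} = \mathrm{diag}(\boldsymbol{\gamma})$. Note that $\ell(\gamma_i)$ is defined for $\gamma_i > 0$.) Now, we take the derivative of $\ell(\gamma_i)$ with respect to $\gamma_i$, equate the result to zero, and multiply by $-\gamma_i(1 + \gamma_i s_i)^2$. This yields the cubic equation
$$\eta s_i^2\gamma_i^3 + \left[(2 - \epsilon)s_i^2 + 2\eta s_i\right]\gamma_i^2 + \left[(3 - 2\epsilon)s_i - |q_i|^2 + \eta\right]\gamma_i + (1 - \epsilon) = 0. \tag{17}$$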
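The decomposition underlying (16) uses $\mathbf{C}_{-i}$, the covariance of $\mathbf{y}$ without the contribution of column $i$, together with $s_i = \boldsymbol{\phi}_i^{\mathrm{H}}\mathbf{C}_{-i}^{-1}\boldsymbol{\phi}_i$ and $q_i = \boldsymbol{\phi}_i^{\mathrm{H}}\mathbf{C}_{-i}^{-1}\mathbf{y}$. Applying the matrix determinant lemma and the Sherman-Morrison formula to $\mathbf{C} = \mathbf{C}_{-i} + \gamma_i\boldsymbol{\phi}_i\boldsymbol{\phi}_i^{\mathrm{H}}$ gives:
- $\log|\mathbf{C}| = \log|\mathbf{C}_{-i}| + \log(1+\gamma_i s_i)$,
- $\mathbf{y}^{\mathrm{H}}\mathbf{C}^{-1}\mathbf{y} = \mathbf{y}^{\mathrm{H}}\mathbf{C}_{-i}^{-1}\mathbf{y} - |q_i|^2\gamma_i/(1+\gamma_i s_i)$.

The NumPy check below verifies this split of $\log p(\mathbf{y}|\boldsymbol{\gamma},\lambda)$ on hypothetical random data. Adding the hyperprior terms $(\epsilon-1)\log\gamma_i - \eta\gamma_i$ then gives (16).

```python
import numpy as np

def log_evidence(C, y):
    """log p(y | gamma, lam) for y ~ CN(0, C), dropping the -M log(pi) constant."""
    _, logdet = np.linalg.slogdet(C)
    return float(-logdet - np.real(y.conj() @ np.linalg.solve(C, y)))

rng = np.random.default_rng(4)
M, L, lam, i = 10, 15, 20.0, 6            # hypothetical sizes, precision, and index i
Phi = (rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))) / np.sqrt(2)
y = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
gamma = rng.uniform(0.0, 1.0, L)

others = np.delete(np.arange(L), i)
C_minus_i = np.eye(M) / lam + (Phi[:, others] * gamma[others]) @ Phi[:, others].conj().T
v = np.linalg.solve(C_minus_i, Phi[:, i])          # C_{-i}^{-1} phi_i
s_i = float(np.real(Phi[:, i].conj() @ v))         # phi_i^H C_{-i}^{-1} phi_i
q_i = v.conj() @ y                                 # phi_i^H C_{-i}^{-1} y (C_{-i} is Hermitian)

g = gamma[i]
C = C_minus_i + g * np.outer(Phi[:, i], Phi[:, i].conj())
lhs = log_evidence(C, y)
rhs = log_evidence(C_minus_i, y) - np.log1p(g * s_i) + abs(q_i) ** 2 * g / (1 + g * s_i)
```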
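In general, (17) has three solutions when $\gamma_i$ ranges through $\mathbb{C}$. These can be determined analytically, with a feasible solution for $\gamma_i$ constrained to be positive. The analysis of the sparsity-inducing property of the Bessel K pdf in [9] shows that we should select $\epsilon$ small. When $\epsilon < 1$ (and $\eta > 0$), (17) has at least one negative solution. This follows because its left-hand side equals $1 - \epsilon > 0$ at $\gamma_i = 0$ and tends to $-\infty$ as $\gamma_i \to -\infty$. Since the left-hand side also tends to $+\infty$ as $\gamma_i \to +\infty$, the number of positive solutions is even. Therefore, (17) has either no real positive solution or two real positive solutions $\gamma_a$ and $\gamma_b$. In the former case, no feasible solution for $\gamma_i$ exists and the corresponding column vector $\boldsymbol{\phi}_i$ is not added to the dictionary. In the latter case, we simply select $\hat{\gamma}_i = \gamma_a$ if $\ell(\gamma_a) > \ell(\gamma_b)$ and $\hat{\gamma}_i = \gamma_b$ otherwise.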
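The root selection just described can be sketched in NumPy as follows. The coefficients are those of the cubic (17), listed explicitly in the code. Here `np.roots` stands in for the analytic cubic formula. The tolerance used to decide whether a numerically computed root is real is our own assumption. As a cross-check, with $\epsilon = 1$ and $\eta = 0$ the function returns $(|q_i|^2 - s_i)/s_i^2$ whenever $|q_i|^2 > s_i$. This is the FastRVM update of [10] written for the variance $\gamma_i$.

```python
import numpy as np

def ell(g, s, q_abs2, eps, eta):
    """gamma_i-dependent part of (16); g / (1 + g s) equals 1 / (1/g + s)."""
    return (-np.log1p(g * s) + q_abs2 * g / (1 + g * s)
            + (eps - 1) * np.log(g) - eta * g)

def cubic_coefficients(s, q_abs2, eps, eta):
    """Coefficients of the cubic (17) in gamma_i, highest power first."""
    return [eta * s ** 2,
            (2 - eps) * s ** 2 + 2 * eta * s,
            (3 - 2 * eps) * s - q_abs2 + eta,
            1 - eps]

def update_gamma(s, q_abs2, eps, eta, tol=1e-9):
    """Positive real root of (17) with the largest ell, or 0.0 if there is none."""
    roots = np.roots(cubic_coefficients(s, q_abs2, eps, eta))
    positive = [r.real for r in roots
                if abs(r.imag) <= tol * max(1.0, abs(r)) and r.real > 0]
    if not positive:
        return 0.0      # no feasible gamma_i: column phi_i stays out of the dictionary
    return float(max(positive, key=lambda g: ell(g, s, q_abs2, eps, eta)))
```

For instance, `update_gamma(2.0, 9.0, 1.0, 0.0)` returns 1.75, which equals (9 - 2)/4.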
We follow the approach in [10] and realize the proposed fast iterative Bayesian inference algorithm by computing each $\hat{\gamma}_i$, $i = 1, \dots, L$. We then select the one that gives rise to the greatest increase in $\mathcal{L}(\boldsymbol{\gamma}, \lambda)$ between two consecutive iterations. Depending on the new value $\hat{\gamma}_i$, we may then add, delete, or keep (re-estimate) the corresponding column vector $\boldsymbol{\phi}_i$ in the dictionary. The quantities $\boldsymbol{\Sigma}$, $\boldsymbol{\mu}$, and $\hat{\lambda}$ are updated using (11), (12), and (15), together with the computation of $s_i$ and $q_i$, $i = 1, \dots, L$. The computational complexity of each iteration grows with the number $\hat{K}$ of nonzero components in $\hat{\boldsymbol{\gamma}}$, since (11) only involves the columns currently in the dictionary. If $\hat{\lambda}$ is not updated between two consecutive iterations, $\boldsymbol{\Sigma}$, $\boldsymbol{\mu}$, $s_i$, and $q_i$ can be updated efficiently according to the update procedures in [10], which further lowers the cost of the iteration. We refer to the proposed algorithm as FastBesselK.
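A compact sketch of the resulting estimator is given below. To keep it short and easy to verify, it departs from the greedy schedule of [10] described above in three ways:
- it sweeps cyclically over all columns instead of selecting the single most beneficial update,
- it recomputes $\mathbf{C}_{-i}$ directly instead of using the efficient updates of [10],
- it treats $\lambda$ as known.

It should therefore be read as a coordinate-wise illustration of the update rule, not as the FastBesselK implementation. The orthogonal test dictionary, the noise level, and $\epsilon = 0.5$, $\eta = 1$ are hypothetical.

```python
import numpy as np

def update_gamma(s, q_abs2, eps, eta, tol=1e-9):
    """Positive root of the cubic (17) maximizing ell, or 0.0 if none exists."""
    coeffs = [eta * s ** 2, (2 - eps) * s ** 2 + 2 * eta * s,
              (3 - 2 * eps) * s - q_abs2 + eta, 1 - eps]
    cand = [r.real for r in np.roots(coeffs)
            if abs(r.imag) <= tol * max(1.0, abs(r)) and r.real > 0]
    if not cand:
        return 0.0
    def ell(g):  # gamma_i-dependent part of (16)
        return -np.log1p(g * s) + q_abs2 * g / (1 + g * s) + (eps - 1) * np.log(g) - eta * g
    return float(max(cand, key=ell))

def sweep_estimator(Phi, y, lam, eps, eta, n_sweeps=5):
    """Cyclic coordinate updates of gamma via (17); lam is assumed known."""
    M, L = Phi.shape
    gamma = np.zeros(L)                        # start from an "empty" dictionary
    for _ in range(n_sweeps):
        for i in range(L):
            act = np.flatnonzero(gamma > 0)
            act = act[act != i]                # columns in the model, excluding i
            A = Phi[:, act]
            C_minus_i = np.eye(M) / lam + (A * gamma[act]) @ A.conj().T
            v = np.linalg.solve(C_minus_i, Phi[:, i])
            s = float(np.real(Phi[:, i].conj() @ v))
            q = v.conj() @ y
            gamma[i] = update_gamma(s, abs(q) ** 2, eps, eta)   # 0.0 removes column i
    mu = np.zeros(L, dtype=complex)
    act = np.flatnonzero(gamma > 0)
    if act.size:
        A = Phi[:, act]
        Sigma = np.linalg.inv(lam * A.conj().T @ A + np.diag(1.0 / gamma[act]))  # (11)
        mu[act] = lam * Sigma @ A.conj().T @ y                                    # (12)
    return gamma, mu

rng = np.random.default_rng(5)
M = 32
m = np.arange(M)
Phi = np.exp(-2j * np.pi * np.outer(m, m) / M)       # hypothetical orthogonal delay grid
alpha_true = np.zeros(M, dtype=complex)
alpha_true[[3, 10, 20]] = [1.0, 0.7j, -0.5]
noise_var = 1e-4
y = Phi @ alpha_true + np.sqrt(noise_var / 2) * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
gamma, mu = sweep_estimator(Phi, y, 1 / noise_var, eps=0.5, eta=1.0)
```

Columns whose cubic has no positive root end each sweep with $\gamma_i = 0$, which is how the sketch prunes the dictionary.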
III-C FastRVM and FastLaplace
The FastBesselK algorithm described in Section III-B is parametrized by $\epsilon$ and $\eta$. In the following, we show how, by appropriately setting these parameters, FastRVM [10] and FastLaplace [8] are obtained as particular instances of FastBesselK. For FastRVM, the estimation of $\boldsymbol{\gamma}$ relies on the maximization of the likelihood $p(\mathbf{y}|\boldsymbol{\gamma}, \lambda)$, i.e., a constant hyperprior $p(\boldsymbol{\gamma}) \propto 1$ is assumed. Hence, by selecting $\epsilon = 1$ and $\eta = 0$, we obtain the cost function $\ell(\gamma_i)$ used in [10]. In the case of FastLaplace [8], the exponential pdf is selected for $p(\gamma_i)$. The gamma pdf reduces to the exponential pdf when its shape parameter is set to $\epsilon = 1$. From this choice we therefore obtain the $\ell(\gamma_i)$ used in [8].
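Both special cases can be checked directly on the cubic (17). This is our own algebra, offered as a consistency check rather than as the derivations of [8] and [10]. With $\epsilon = 1$, the constant term of (17) vanishes. For $\gamma_i > 0$, dividing by $\gamma_i$ leaves a quadratic whose unique positive root gives the two updates:

```latex
\begin{aligned}
&\eta s_i^2\gamma_i^2 + \left(s_i^2 + 2\eta s_i\right)\gamma_i + \left(s_i + \eta - |q_i|^2\right) = 0,\\
&\text{FastLaplace } (\eta > 0):\quad
  \hat{\gamma}_i = \frac{-(s_i + 2\eta) + \sqrt{s_i^2 + 4\eta|q_i|^2}}{2\eta s_i}
  \quad \text{if } |q_i|^2 > s_i + \eta,\\
&\text{FastRVM } (\eta = 0):\quad
  \hat{\gamma}_i = \frac{|q_i|^2 - s_i}{s_i^2}
  \quad \text{if } |q_i|^2 > s_i,
\end{aligned}
```

and $\hat{\gamma}_i = 0$ otherwise. The FastRVM expression is the reciprocal of the precision update $s_i^2/(|q_i|^2 - s_i)$ of [10], which is stated there for real-valued models.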
IV Numerical Results
Table I
Parameter | Value
Sampling time, $T_s$ | 32.55 ns
CP length | 4.69 μs (144 samples)
Subcarrier spacing | 15 kHz
Pilot pattern | Evenly spaced, QPSK
Modulation | QPSK
Subcarriers, $N$ | 1200
OFDM symbols | 1
Information bits | 1091
Channel interleaver | Random
Convolutional code | Rate 1/3, punctured
Decoder | BCJR algorithm [19]
We perform Monte Carlo simulations to evaluate the performance of the FastBesselK algorithm derived in Section III. We consider a scenario inspired by the 3GPP LTE standard [20] with the settings specified in Table I. In all investigations, we keep the spectral efficiency, measured in information bits per subcarrier, fixed. To this end, we employ a rate-1/3 convolutional code and use puncturing to increase the spectral efficiency to the required value. Unless otherwise specified, evenly-spaced pilot symbols are used.
The multipath channel (4) is based on the model used in [21]. For each realization of the channel, the total number of multipath components $K$ is Poisson distributed with mean $\mathrm{E}[K]$. The delays $\tau_k$, $k = 1, \dots, K$, are independent and uniformly distributed random variables drawn from the continuous interval $[0, \tau_{\max}]$. Conditioned on $\tau_k$, $k = 1, \dots, K$, the weights $\beta_k$, $k = 1, \dots, K$, are independent. Weight $\beta_k$ has a zero-mean complex circularly-symmetric Gaussian distribution whose variance depends on the delay $\tau_k$ through the parameters of the model in [21]. In this way, the pairs $(\tau_k, \beta_k)$ form a marked Poisson process.
For FastBesselK, we use the same values of $\epsilon$ and $\eta$ in all investigations. We empirically observed that this is a proper selection of parameters for channel models with both few and numerous multipath components. FastBesselK is compared to two Bayesian methods, FastRVM [10] (software available at http://people.ee.duke.edu/~lcarin/BCS.html) and FastLaplace [8] (software available at http://ivpl.eecs.northwestern.edu/). For these three algorithms, the noise precision $\lambda$ is estimated at every third iteration, with the initialization of [10]. The stopping criterion is based on the change in the objective function between two consecutive iterations [22]. Two non-Bayesian methods, LASSO and OMP, are also included for comparison. For LASSO, we use the sparse reconstruction by separable approximation (SpaRSA) algorithm [23] (software available at http://www.lx.it.pt/~mtf/SpaRSA/). The required regularization parameter is chosen according to [24], which has been empirically observed to provide satisfactory results. For OMP, an a priori estimate of the sparsity of $\boldsymbol{\alpha}$ needs to be set; we use the same value in all investigations. Finally, the commonly employed robustly designed Wiener filter (RWF) [25] for OFDM channel estimation is used as a reference.
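For reference, the OMP baseline can be sketched as below. This is a textbook version of OMP [3] with a fixed a priori sparsity level, not the specific implementation used for the results. The orthogonal test dictionary and the sparsity value are hypothetical. With a noiseless measurement and an orthogonal dictionary, the three active columns are recovered exactly.

```python
import numpy as np

def omp(Phi, y, sparsity):
    """Orthogonal matching pursuit with a fixed a priori sparsity level."""
    residual = y.astype(complex)
    support = []
    coef = np.zeros(0, dtype=complex)
    norms = np.linalg.norm(Phi, axis=0)
    for _ in range(sparsity):
        corr = np.abs(Phi.conj().T @ residual) / norms   # normalized correlations
        if support:
            corr[support] = 0.0                           # never reselect a column
        support.append(int(np.argmax(corr)))
        A = Phi[:, support]
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)     # least squares on the support
        residual = y - A @ coef
    alpha = np.zeros(Phi.shape[1], dtype=complex)
    alpha[support] = coef
    return alpha, support

M = 32
m = np.arange(M)
Phi = np.exp(-2j * np.pi * np.outer(m, m) / M)           # hypothetical orthogonal dictionary
alpha_true = np.zeros(M, dtype=complex)
alpha_true[[3, 10, 20]] = [1.0, 0.7j, -0.5]
y = Phi @ alpha_true                                       # noiseless, for illustration
alpha_hat, support = omp(Phi, y, sparsity=3)
```

Unlike the Bayesian methods, OMP never adds more columns than the preset sparsity, regardless of the size of the dictionary.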
Unless otherwise specified, the number of rows of $\boldsymbol{\Phi}$ equals the number $M$ of pilot subcarriers. The number of columns of $\boldsymbol{\Phi}$, and hence the delay resolution, is kept fixed. The performance versus SNR is compared in Figs. 1(a) and 1(b). From Fig. 1(a), we see that FastBesselK and FastLaplace outperform the other algorithms in terms of BER across the whole SNR range considered. Specifically, at a BER of 1% the gain is approximately 1 dB over FastRVM, LASSO, and OMP, and 2 dB over RWF. Fig. 1(b) shows that FastBesselK yields a lower MSE than the other algorithms. Surprisingly, the improved MSE performance achieved by FastBesselK does not lead to a better BER performance when compared to FastLaplace.
The convergence speed of the Bayesian iterative algorithms is shown in Fig. 1(c). Here, FastBesselK achieves a remarkable improvement compared to FastRVM and FastLaplace, with the MSE converging in about 10 to 30 iterations. As Fig. 1(c) shows, there is no guarantee that the MSE is reduced at each iteration, since the algorithms maximize the objective function (13) rather than directly minimizing the MSE. FastRVM and FastLaplace suffer a significant increase in MSE after a certain number of iterations; this drawback is largely mitigated in the case of FastBesselK. The superior convergence speed of FastBesselK can be explained by observing Figs. 2(a) and 2(b). Fig. 2(b) shows that the improvement in convergence rate stems from the Bessel K prior's ability to better handle channels with few multipath components (i.e., it yields a lower MSE for such channels). In contrast, the other methods tend to add more column vectors to the dictionary matrix. This increases the number of add, delete, and re-estimate iterations, as seen from Fig. 2(a).
Fig. 2(c) shows the MSE versus the number of pilots $M$. We observe that, for a given MSE performance, FastBesselK significantly reduces the required pilot overhead. In particular, FastBesselK achieves an MSE on par with LASSO, OMP, and RWF using less than half the number of pilots. Finally, in Fig. 2(d) we evaluate the MSE performance versus the available delay resolution, which is determined by the number of columns $L$ in $\boldsymbol{\Phi}$ (cf. Section II). (Naturally, RWF does not require a dictionary matrix to be specified, and its performance is thereby independent of $L$.) Several observations are worth noting. Unlike the other algorithms, FastBesselK achieves a noticeable MSE gain as the delay resolution improves. In fact, it appears that, besides FastBesselK, only OMP is able to exploit the improved delay resolution. The reason is that LASSO, FastRVM, and FastLaplace produce solutions with an increasing number of nonzero components in $\hat{\boldsymbol{\alpha}}$ as $L$ increases (there are simply more column vectors in $\boldsymbol{\Phi}$ to be added or deleted). Thus, unlike FastBesselK, these algorithms also require an increasing number of iterations (results not shown).
V Conclusion
In this work, we presented a fast iterative Bayesian inference channel estimation algorithm based on the hierarchical Bayesian prior model of the Bessel K probability density function. Following the framework for fast Bayesian inference in [10], we proposed an algorithm that significantly lowers the number of required iterations compared to state-of-the-art Bayesian inference methods, without any loss in performance. This improvement in convergence rate is directly related to the Bessel K prior's ability to handle channels with few multipath components better than other commonly employed prior models. Furthermore, our algorithm shows improved performance when compared to both Bayesian and non-Bayesian state-of-the-art methods.
Acknowledgment
This work was supported in part by the 4GMCT cooperative research project, funded by Intel Mobile Communications, Agilent Technologies, Aalborg University and the Danish National Advanced Technology Foundation, and by the project ICT-248894 Wireless Hybrid Enhanced Mobile Radio Estimators (WHERE2).
References
 [1] R. Tibshirani, “Regression shrinkage and selection via the LASSO,” J. R. Statist. Soc. B, vol. 58, pp. 267–288, 1996.
 [2] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM Journal on Scientific Computing, vol. 20, pp. 33–61, 1998.
 [3] J. A. Tropp, “Greed is good: Algorithmic results for sparse approximation,” IEEE Trans. on Inf. Theory, vol. 50, pp. 2231–2242, 2004.
 [4] D. Needell and J. Tropp, “CoSaMP: Iterative signal recovery from incomplete and inaccurate samples,” Applied and Computational Harmonic Analysis, vol. 26, no. 3, pp. 301–321, 2009.

 [5] M. Tipping, “Sparse Bayesian learning and the relevance vector machine,” Journal of Machine Learning Research, vol. 1, pp. 211–244, 2001.
 [6] D. Wipf and B. Rao, “Sparse Bayesian learning for basis selection,” IEEE Trans. on Signal Proc., vol. 52, no. 8, pp. 2153–2164, 2004.

 [7] M. Figueiredo, “Adaptive sparseness for supervised learning,” IEEE Trans. on Pattern Analysis and Machine Intel., vol. 25, no. 9, pp. 1150–1159, 2003.
 [8] S. D. Babacan, R. Molina, and A. K. Katsaggelos, “Bayesian compressive sensing using Laplace priors,” IEEE Trans. on Image Proc., vol. 19, no. 1, pp. 53–63, 2010.
 [9] N. L. Pedersen, D. Shutin, C. N. Manchón, and B. H. Fleury, “Sparse estimation using Bayesian hierarchical prior modeling for real and complex models,” in preparation, 2013.

 [10] M. E. Tipping and A. C. Faul, “Fast marginal likelihood maximisation for sparse Bayesian models,” in Proc. 9th International Workshop on Artificial Intelligence and Statistics, Key West, FL, 2003.
 [11] W. Bajwa, J. Haupt, A. Sayeed, and R. Nowak, “Compressed channel sensing: A new approach to estimating sparse multipath channels,” Proceedings of the IEEE, vol. 98, no. 6, pp. 1058–1076, 2010.
 [12] C. R. Berger, S. Zhou, J. C. Preisig, and P. Willett, “Sparse channel estimation for multicarrier underwater acoustic communication: From subspace methods to compressed sensing,” IEEE Trans. on Signal Proc., vol. 58, no. 3, pp. 1708–1721, 2010.
 [13] J. Huang, C. R. Berger, S. Zhou, and J. Huang, “Comparison of basis pursuit algorithms for sparse channel estimation in underwater acoustic OFDM,” in Proc. OCEANS 2010 IEEE, Sydney, pp. 1–6, 2010.
 [14] G. Tauböck, F. Hlawatsch, D. Eiwen, and H. Rauhut, “Compressive estimation of doubly selective channels in multicarrier systems: leakage effects and sparsity-enhancing processing,” IEEE Journal of Selected Topics in Signal Proc., vol. 4, no. 2, pp. 255–271, 2010.
 [15] D. Shutin and B. H. Fleury, “Sparse variational Bayesian SAGE algorithm with application to the estimation of multipath wireless channels,” IEEE Trans. on Signal Proc., vol. 59, pp. 3609–3623, 2011.
 [16] P. Schniter, “A message-passing receiver for BICM-OFDM over unknown clustered-sparse channels,” IEEE Journal of Selected Topics in Signal Proc., vol. 5, no. 8, pp. 1462–1474, 2011.
 [17] N. L. Pedersen, C. N. Manchón, D. Shutin, and B. H. Fleury, “Application of Bayesian hierarchical prior modeling to sparse channel estimation,” in Proc. IEEE Int. Communications Conf. (ICC), pp. 3487–3492, 2012.
 [18] I. F. Gorodnitsky and B. D. Rao, “Sparse signal reconstruction from limited data using FOCUSS: a re-weighted minimum norm algorithm,” IEEE Trans. on Signal Proc., vol. 45, no. 3, pp. 600–616, 1997.
 [19] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Trans. on Inf. Theory, vol. 20, no. 2, pp. 284–287, 1974.
 [20] 3rd Generation Partnership Project (3GPP) Technical Specification, “Evolved universal terrestrial radio access (E-UTRA); base station (BS) radio transmission and reception,” TS 36.104 V8.4.0, Tech. Rep., 2008.
 [21] M. L. Jakobsen, K. Laugesen, C. Navarro Manchón, G. E. Kirkelund, C. Rom, and B. Fleury, “Parametric modeling and pilot-aided estimation of the wireless multipath channel in OFDM systems,” in Proc. IEEE Int. Communications Conf. (ICC), pp. 1–6, 2010.
 [22] S. Ji, Y. Xue, and L. Carin, “Bayesian compressive sensing,” IEEE Trans. on Signal Proc., vol. 56, no. 6, pp. 2346–2356, 2008.
 [23] S. J. Wright, R. D. Nowak, and M. A. T. Figueiredo, “Sparse reconstruction by separable approximation,” IEEE Trans. on Sig. Proc., vol. 57, no. 7, pp. 2479–2493, 2009.
 [24] Z. Ben-Haim and Y. C. Eldar, “The Cramér-Rao bound for sparse estimation,” 2009, arXiv:0905.4378v4.

 [25] O. Edfors, M. Sandell, J.-J. van de Beek, S. K. Wilson, and P. O. Börjesson, “OFDM channel estimation by singular value decomposition,” IEEE Trans. on Communications, vol. 46, no. 7, pp. 931–939, 1998.