A Fast Iterative Bayesian Inference Algorithm for Sparse Channel Estimation

03/06/2013 ∙ by Niels Lovmand Pedersen, et al. ∙ Aalborg University

In this paper, we present a Bayesian channel estimation algorithm for multicarrier receivers based on pilot symbol observations. The inherent sparse nature of wireless multipath channels is exploited by modeling the prior distribution of multipath components' gains with a hierarchical representation of the Bessel K probability density function; a highly efficient, fast iterative Bayesian inference method is then applied to the proposed model. The resulting estimator outperforms other state-of-the-art Bayesian and non-Bayesian estimators, either by yielding lower mean squared estimation error or by attaining the same accuracy with improved convergence rate, as shown in our numerical evaluation.




I Introduction

The accuracy of channel estimation is a crucial factor determining the overall performance in wireless communication systems and networks, in terms of bit-error-rate (BER) and throughput but also of location accuracy when these systems are equipped with positioning capabilities. When the underlying structure of the channel responses to be estimated is sparse, compressive sensing and sparse signal representation can be very powerful tools for the design of channel estimators.

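Compressive sensing techniques have attracted considerable attention in recent years owing to their applicability to a wide range of problems. Typically, the signal model considered reads

$$\mathbf{y} = \boldsymbol{\Phi}\boldsymbol{\alpha} + \mathbf{w}, \tag{1}$$

where $\mathbf{y} \in \mathbb{C}^{M}$ is the measurement vector and $\boldsymbol{\Phi} = [\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_L] \in \mathbb{C}^{M \times L}$ is the known dictionary matrix with column vectors $\boldsymbol{\phi}_l$, $l = 1, \ldots, L$. The vector $\mathbf{w} \in \mathbb{C}^{M}$ represents the samples of additive white Gaussian noise with covariance matrix $\lambda^{-1}\mathbf{I}$ and precision parameter $\lambda > 0$. Finally, $\boldsymbol{\alpha} = [\alpha_1, \ldots, \alpha_L]^T \in \mathbb{C}^{L}$ is the vector of weights, whose entries are mostly zero. By obtaining a sparse estimate of $\boldsymbol{\alpha}$, we can accurately represent $\mathbf{y}$ with a minimal number of column vectors in $\boldsymbol{\Phi}$.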

In the literature, many Bayesian and non-Bayesian methods have been proposed for sparse signal representation. The latter include the very popular convex-optimization-based methods for LASSO regression [1, 2] and greedy constructive algorithms such as orthogonal matching pursuit (OMP) [3] and compressive sampling MP (CoSaMP) [4]. In sparse Bayesian learning (SBL) [5, 6], a prior probability density function (pdf) $p(\boldsymbol{\alpha})$ is specified so that a sparse estimate is obtained. A widely applied SBL algorithm is the relevance vector machine (RVM) [5]. In the RVM, a hierarchical representation¹ of the Student-t pdf is used for the prior pdf $p(\boldsymbol{\alpha})$. An EM algorithm is then derived from this prior model to estimate the weights. Similarly, [7] applies the EM algorithm to a hierarchical representation of the Laplace pdf.² This algorithm can be seen as the Bayesian version of the LASSO estimator. The sparse Bayesian inference algorithms proposed in [5] and [7] are guaranteed to converge. However, they are also known to suffer from high computational complexity and a low convergence rate, since many iterations are needed before they terminate. To circumvent this, a fast Bayesian inference algorithm known as Fast-RVM is proposed in [10]. Following this approach, the Fast-Laplace algorithm is formulated in [8]. The algorithms in [10] and [8] do converge faster than their EM counterparts in [5] and [7]. As we show in this paper, however, they still suffer from slow convergence, especially in the low and moderate signal-to-noise ratio (SNR) regimes.

¹ The hierarchical representation involves specifying a conditional prior pdf $p(\boldsymbol{\alpha}|\boldsymbol{\gamma})$ and a hyperprior pdf $p(\boldsymbol{\gamma})$.
² Note that the hierarchical representation of the Laplace pdf used in [7] and [8] is only valid for real-valued variables. In [9], we extend this representation to cover complex-valued variables as well.

The estimation of the wireless channel is a practical example where compressive sensing techniques are utilized. The reason is that the response of the wireless channel typically holds a few dominant multipath components and is therefore sparse [11]. When sparse channel models are assumed, it seems natural to use tools from compressive sensing and sparse signal representation to estimate the parameters of such channel models. LASSO regression, OMP, and CoSaMP have been widely applied to pilot-assisted channel estimation in orthogonal frequency-division multiplexing (OFDM) systems, cf. [12, 13, 14]. Bayesian methods have also been proposed for wireless communication systems. Examples include the estimation of the dominant multipath components in the response of wireless channels [15] and joint channel estimation and decoding for clustered sparse channels [16]. In [17], we have proposed a variational Bayesian inference algorithm for the estimation of the wireless channel in OFDM. The resulting estimator, however, suffers from the same complexity and convergence rate issues as those in [5] and [7].

In this paper, we present a fast iterative sparse Bayesian estimation algorithm for pilot-assisted channel estimation in OFDM wireless receivers. We follow the fast inference framework outlined in [10]. The algorithm is built on the hierarchical prior model of the Bessel K pdf for sparse estimation that we proposed in [9, 17]. Our estimator drastically increases the convergence speed compared to similar algorithms such as Fast-RVM and Fast-Laplace, without any penalty in performance. It also achieves favorable BER and mean-squared error (MSE) performance compared to both Bayesian and non-Bayesian state-of-the-art methods.

II System Description

II-A OFDM Signal Model

We consider a single-input single-output OFDM system with $N$ subcarriers. A cyclic prefix (CP) is added to eliminate inter-symbol interference between consecutive OFDM blocks. The channel response is assumed static during the transmission of each OFDM block. The received baseband signal for a given OFDM block reads

$$\mathbf{r} = \mathbf{X}\mathbf{h} + \mathbf{n}. \tag{2}$$

The diagonal matrix $\mathbf{X} \in \mathbb{C}^{N \times N}$ contains the complex-modulated symbols. The entries of $\mathbf{h} \in \mathbb{C}^{N}$ are the samples of the channel frequency response at all $N$ subcarriers. Finally, $\mathbf{n} \in \mathbb{C}^{N}$ is a zero-mean circularly symmetric complex Gaussian random vector whose entries are independent with variance $\lambda^{-1}$.


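Let the pilot pattern be characterized by the set $\mathcal{P} \subseteq \{1, \ldots, N\}$, which contains the indices of the $M = |\mathcal{P}|$ subcarriers reserved for pilot transmission. Each received signal observed at a pilot position is then divided by its corresponding pilot symbol in $\mathbf{X}$ to produce the vector of observations

$$\mathbf{y} = \mathbf{X}_{\mathcal{P}}^{-1}\mathbf{r}_{\mathcal{P}} = \mathbf{h}_{\mathcal{P}} + \mathbf{X}_{\mathcal{P}}^{-1}\mathbf{n}_{\mathcal{P}}, \tag{3}$$

where $\mathbf{r}_{\mathcal{P}}$, $\mathbf{h}_{\mathcal{P}}$, and $\mathbf{n}_{\mathcal{P}}$ collect the entries of $\mathbf{r}$, $\mathbf{h}$, and $\mathbf{n}$ indexed by $\mathcal{P}$, and $\mathbf{X}_{\mathcal{P}}$ is the corresponding diagonal submatrix of $\mathbf{X}$. We assume that all pilot symbols have unit power, so that the statistics of the noise term remain unchanged. Because $\mathbf{n}$ is circularly symmetric, $\mathbf{X}_{\mathcal{P}}^{-1}\mathbf{n}_{\mathcal{P}}$ again has independent entries with variance $\lambda^{-1}$.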
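The following minimal sketch of (2) and (3) is an illustration, not the authors' simulator. It forms r = Xh + n on a toy set of subcarriers and divides the pilot observations by their pilot symbols. The pilot spacing, channel and noise level are hypothetical. The pilots are unit-power QPSK symbols, so the division only rotates each noise sample, which is why the noise statistics are unchanged.

```python
import cmath
import random

def pilot_observations(r, x, pilot_idx):
    """y = X_P^{-1} r_P: each pilot-position sample divided by its pilot symbol."""
    return [r[n] / x[n] for n in pilot_idx]

# Hypothetical toy setup: N = 16 subcarriers, a pilot on every 4th one.
N = 16
pilot_idx = list(range(0, N, 4))
rng = random.Random(0)
qpsk = [cmath.exp(1j * cmath.pi * (2 * k + 1) / 4) for k in range(4)]  # unit-power QPSK
x = [rng.choice(qpsk) for _ in range(N)]                     # diagonal of X
h = [cmath.exp(-2j * cmath.pi * n / N) for n in range(N)]    # illustrative channel
noise = [complex(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(N)]
r = [x[n] * h[n] + noise[n] for n in range(N)]               # r = X h + n, as in (2)

y = pilot_observations(r, x, pilot_idx)
# Unit-modulus pilots only rotate the noise, so its magnitude is preserved.
for m, n in enumerate(pilot_idx):
    assert abs(abs(y[m] - h[n]) - abs(noise[n])) < 1e-12
```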

We consider a frequency-selective, block-fading wireless channel whose impulse response is modeled as a sum of multipath components:

$$g(\tau) = \sum_{k=1}^{K} \beta_k\,\delta(\tau - \tau_k). \tag{4}$$

In this expression, $\beta_k$ and $\tau_k$ are respectively the complex weight and the (continuous) delay of the $k$th multipath component, $K$ is the total number of multipath components, and $\delta(\cdot)$ is the Dirac delta function. The channel parameters $K$, $\beta_k$, and $\tau_k$, $k = 1, \ldots, K$, are all random variables and may vary from the transmission of one OFDM block to the next. Additional details regarding the assumptions on the channel model are provided in Section IV.


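By using the parametric model (4) of the channel, we can rewrite (3) as

$$\mathbf{y} = \mathbf{T}(\boldsymbol{\tau})\boldsymbol{\beta} + \mathbf{X}_{\mathcal{P}}^{-1}\mathbf{n}_{\mathcal{P}}, \tag{5}$$

with $\boldsymbol{\beta} = [\beta_1, \ldots, \beta_K]^T$, $\boldsymbol{\tau} = [\tau_1, \ldots, \tau_K]^T$, $\mathbf{T}(\boldsymbol{\tau}) = [\mathbf{t}(\tau_1), \ldots, \mathbf{t}(\tau_K)] \in \mathbb{C}^{M \times K}$, and $\mathbf{t}(\tau) \in \mathbb{C}^{M}$ with entries

$$[\mathbf{t}(\tau)]_m = \exp(-j2\pi f^{\mathrm{p}}_m \tau), \quad m = 1, \ldots, M, \tag{6}$$

where $f^{\mathrm{p}}_m$ denotes the frequency of the $m$th pilot subcarrier.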
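The mapping from the multipath parameters to the noise-free pilot observations in (5) can be sketched as follows. The sketch assumes column entries exp(-j2π f τ) as in (6), with pilot frequencies measured relative to the first subcarrier. Only the 15 kHz subcarrier spacing comes from Table I. The two-path channel and the pilot spacing of 12 subcarriers are hypothetical values chosen for illustration.

```python
import cmath

def t_vector(tau, pilot_freqs):
    """Column t(tau): entries exp(-j 2 pi f tau) for each pilot frequency f, as in (6)."""
    return [cmath.exp(-2j * cmath.pi * f * tau) for f in pilot_freqs]

def pilot_response(betas, taus, pilot_freqs):
    """Noise-free part of (5): T(tau) beta = sum over k of beta_k * t(tau_k)."""
    h = [0j] * len(pilot_freqs)
    for beta, tau in zip(betas, taus):
        for m, v in enumerate(t_vector(tau, pilot_freqs)):
            h[m] += beta * v
    return h

# Hypothetical example: 15 kHz spacing (Table I), pilots on every 12th subcarrier,
# and a two-path channel with delays 0 and 1 microsecond.
delta_f = 15e3
pilot_freqs = [12 * m * delta_f for m in range(8)]
betas = [0.8, 0.3 - 0.2j]
taus = [0.0, 1.0e-6]
h_p = pilot_response(betas, taus, pilot_freqs)
```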

II-B Compressive Sensing Signal Model

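In order to apply sparse representation methods to the estimation of $\mathbf{h}$ in (2), we must first recast the signal model (5) into the form of (1). The main obstacle is that the delays in $\boldsymbol{\tau}$ are, a priori, unknown at the receiver. To circumvent this, we consider a grid of $L$ uniformly spaced delay samples in the interval $[0, \tau_{\max}]$:

$$\boldsymbol{\tau}^{\mathrm{d}} = \left[0, \tfrac{T_s}{\zeta}, \tfrac{2T_s}{\zeta}, \ldots, \tau_{\max}\right]^T, \tag{7}$$

with $\zeta$ chosen such that $\zeta\tau_{\max}/T_s$ is an integer, which gives $L = \zeta\tau_{\max}/T_s + 1$ grid points. The symbols $\tau_{\max}$ and $T_s$ denote respectively the maximum excess delay of the channel and the sampling time. The dictionary matrix is now defined as $\boldsymbol{\Phi} = \mathbf{T}(\boldsymbol{\tau}^{\mathrm{d}})$. Thus, the entries of $\boldsymbol{\Phi}$ are of the form (6), evaluated at the delays in $\boldsymbol{\tau}^{\mathrm{d}}$. The number of columns in $\boldsymbol{\Phi}$ is thereby (approximately) inversely proportional to the selected delay resolution $T_s/\zeta$, so this choice determines the dimension of $\boldsymbol{\alpha}$. Since $\boldsymbol{\alpha}$ has many more entries than there are multipath components, we expect most of its entries to be zero. Therefore, we use compressive sensing techniques to obtain sparse estimates of $\boldsymbol{\alpha}$.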
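A minimal sketch of the uniform delay grid and the resulting dictionary follows. It rests on several assumptions. The grid-refinement factor `zeta` is a hypothetical name for the factor that sets the grid spacing T_s/ζ. The maximum excess delay is taken equal to the CP length of Table I. The pilot layout of 100 pilots on every 12th subcarrier is also assumed. Dictionary entries follow exp(-j2π f τ) as in (6).

```python
import cmath

def delay_grid(tau_max, Ts, zeta):
    """Uniform delay grid on [0, tau_max] with spacing Ts/zeta, as in (7)."""
    steps = zeta * tau_max / Ts
    if abs(steps - round(steps)) > 1e-9:
        raise ValueError("zeta * tau_max / Ts must be an integer")
    L = int(round(steps)) + 1
    return [l * Ts / zeta for l in range(L)]

def dictionary(freqs, grid):
    """Phi[m][l] = exp(-j 2 pi f_m tau_l): one row per frequency, one column per delay."""
    return [[cmath.exp(-2j * cmath.pi * f * tau) for tau in grid] for f in freqs]

Ts = 32.55e-9                  # sampling time (Table I)
tau_max = 144 * Ts             # assumption: maximum excess delay equal to the CP length
grid = delay_grid(tau_max, Ts, zeta=2)
pilot_freqs = [12 * m * 15e3 for m in range(100)]   # hypothetical: 100 pilots, every 12th subcarrier
Phi = dictionary(pilot_freqs, grid)
print(len(Phi), len(grid))     # M rows, L = 2 * 144 + 1 columns
```

Doubling `zeta` roughly doubles the number of columns, which is the trade-off between delay resolution and dictionary size described above.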

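Notice that the signal model (1) with $\boldsymbol{\Phi} = \mathbf{T}(\boldsymbol{\tau}^{\mathrm{d}})$ is only an approximation of the true signal model (5), since the true delays need not lie on the grid. The estimate of the channel vector at the pilot subcarriers is then $\hat{\mathbf{h}}_{\mathcal{P}} = \boldsymbol{\Phi}\hat{\boldsymbol{\alpha}}$. In order to estimate the full channel $\mathbf{h}$ in (2), the dictionary is expanded to include a row for each of the $N$ subcarrier frequencies. Thus, $\hat{\mathbf{h}} = \tilde{\boldsymbol{\Phi}}\hat{\boldsymbol{\alpha}}$ with

$$[\tilde{\boldsymbol{\Phi}}]_{n,l} = \exp(-j2\pi f_n \tau^{\mathrm{d}}_l), \quad n = 1, \ldots, N,\; l = 1, \ldots, L, \tag{8}$$

where $f_n$ denotes the frequency of the $n$th subcarrier and $\tau^{\mathrm{d}}_l$ is the $l$th entry of $\boldsymbol{\tau}^{\mathrm{d}}$.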
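Once a sparse estimate α̂ is available, the full-channel estimate amounts to evaluating the expanded dictionary at all N subcarrier frequencies. The sketch below uses hypothetical helper names and a toy α̂ with two nonzero entries. It loops only over the nonzero weights, which is where sparsity reduces the work. Subcarrier frequencies are assumed to be measured relative to the first subcarrier.

```python
import cmath

def dictionary_column(freqs, tau):
    """Hypothetical helper: one column exp(-j 2 pi f tau) evaluated at the given frequencies."""
    return [cmath.exp(-2j * cmath.pi * f * tau) for f in freqs]

def full_channel_estimate(alpha_hat, grid, all_freqs):
    """h_hat = Phi_tilde alpha_hat, accumulating only the nonzero weights."""
    h_hat = [0j] * len(all_freqs)
    for a, tau in zip(alpha_hat, grid):
        if a != 0:
            for n, v in enumerate(dictionary_column(all_freqs, tau)):
                h_hat[n] += a * v
    return h_hat

# Toy values: N = 1200 subcarriers at 15 kHz spacing (Table I), grid spacing Ts/2,
# and a sparse estimate with two nonzero entries (hypothetical).
Ts, N = 32.55e-9, 1200
all_freqs = [n * 15e3 for n in range(N)]
grid = [l * Ts / 2 for l in range(289)]
alpha_hat = [0j] * len(grid)
alpha_hat[3], alpha_hat[40] = 0.9, -0.4j
h_hat = full_channel_estimate(alpha_hat, grid, all_freqs)
```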

III Bayesian Inference Learning

We now present the iterative sparse Bayesian inference algorithm for channel estimation proposed in this paper. First, we detail the hierarchical prior model leading to the Bessel K pdf for each entry of $\boldsymbol{\alpha}$. Based on this model, we apply a fast Bayesian algorithm to estimate the unknown model parameters. Finally, we briefly comment on the relationship between our algorithm and other similar state-of-the-art approaches.

III-A The Probabilistic Model

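Instead of working directly with the prior pdf $p(\boldsymbol{\alpha})$, the SBL framework usually models $\boldsymbol{\alpha}$ with a two-layer hierarchical prior. This prior involves a conditional prior pdf $p(\boldsymbol{\alpha}|\boldsymbol{\gamma})$ and a hyperprior pdf $p(\boldsymbol{\gamma})$. With this design, the resulting probabilistic model for the signal model (1) is given by

$$p(\mathbf{y}, \boldsymbol{\alpha}, \boldsymbol{\gamma}, \lambda) = p(\mathbf{y}|\boldsymbol{\alpha}, \lambda)\, p(\boldsymbol{\alpha}|\boldsymbol{\gamma})\, p(\boldsymbol{\gamma})\, p(\lambda). \tag{9}$$

Due to (1), $p(\mathbf{y}|\boldsymbol{\alpha}, \lambda)$ is multivariate Gaussian: $p(\mathbf{y}|\boldsymbol{\alpha}, \lambda) = \mathrm{CN}(\mathbf{y}|\boldsymbol{\Phi}\boldsymbol{\alpha}, \lambda^{-1}\mathbf{I})$.³ For the noise precision $\lambda$, we select a constant prior, i.e., $p(\lambda) \propto 1$.

³ Here, $\mathrm{CN}(\cdot|\mathbf{a}, \mathbf{B})$ denotes a complex Gaussian pdf with mean vector $\mathbf{a}$ and covariance matrix $\mathbf{B}$. We shall also make use of $\mathrm{Ga}(\cdot|a, b)$, which denotes a gamma pdf with shape parameter $a$ and rate parameter $b$.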

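The design of the factors $p(\alpha_l|\gamma_l)$ and $p(\gamma_l)$ for each weight $\alpha_l$ heavily influences the sparsity-inducing property of the prior model. We adopt the hierarchical structure of the Bessel K pdf. The first layer is defined as $p(\boldsymbol{\alpha}|\boldsymbol{\gamma}) = \prod_{l=1}^{L} \mathrm{CN}(\alpha_l|0, \gamma_l)$, and the second layer is selected to be $p(\boldsymbol{\gamma}; \epsilon, \eta) = \prod_{l=1}^{L} \mathrm{Ga}(\gamma_l|\epsilon, \eta)$. With these choices, we compute the marginal pdf

$$p(\alpha_l; \epsilon, \eta) = \int_0^{\infty} \mathrm{CN}(\alpha_l|0, \gamma_l)\,\mathrm{Ga}(\gamma_l|\epsilon, \eta)\,d\gamma_l = \frac{2\,\eta^{(\epsilon+1)/2}\,|\alpha_l|^{\epsilon-1}}{\pi\,\Gamma(\epsilon)}\,K_{\epsilon-1}\!\left(2\sqrt{\eta}\,|\alpha_l|\right). \tag{10}$$

In this expression, $K_{\epsilon-1}(\cdot)$ is the modified Bessel function of the second kind and order $\epsilon - 1$. The parameter $\epsilon$ determines the sparsity-inducing property of the Bessel K pdf [9]. Selecting a small value of $\epsilon$ strongly enforces sparsity on the estimate, as more probability mass concentrates around the origin. As a consequence, the mode of the resulting posterior pdf is more likely to be found close to the axes. However, selecting $\epsilon$ too high may lead to over-fitting and thereby to non-sparse results. Thus, $\epsilon$ plays a role similar to that of the sparsity-controlling parameter in the FOCUSS algorithm [18].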
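The closed form p(α_l) = 2η^{(ε+1)/2}|α_l|^{ε-1} K_{ε-1}(2√η|α_l|)/(πΓ(ε)) can be checked numerically against a direct integration of the two-layer model CN(α_l|0, γ_l)Ga(γ_l|ε, η) over γ_l. The sketch below uses only the standard library and assumes ε > 0 and η > 0. K_ν is evaluated through the integral representation K_ν(x) = ½∫exp(-x cosh t)cosh(νt)dt over the real line. The mixture integral is computed on a logarithmic grid γ = e^u. Step sizes and truncation limits are illustrative choices.

```python
import math

def bessel_k(nu, x, h=0.01, t_max=20.0):
    """Modified Bessel function of the second kind, K_nu(x) for x > 0, via
    K_nu(x) = 0.5 * integral over the real line of exp(-x cosh t) cosh(nu t) dt.
    The integrand is smooth and even, so the trapezoidal rule is very accurate."""
    n = int(round(t_max / h))
    total = sum(math.exp(-x * math.cosh(k * h)) * math.cosh(nu * k * h)
                for k in range(-n, n + 1))
    return 0.5 * h * total

def bessel_k_pdf(a_abs, eps, eta):
    """Closed-form Bessel K marginal evaluated at |alpha_l| = a_abs (eps > 0, eta > 0)."""
    return (2 * eta ** ((eps + 1) / 2) * a_abs ** (eps - 1)
            / (math.pi * math.gamma(eps))
            * bessel_k(eps - 1, 2 * math.sqrt(eta) * a_abs))

def mixture_pdf(a_abs, eps, eta, h=0.01, u_min=-40.0, u_max=10.0):
    """Integral over gamma of CN(alpha | 0, gamma) * Ga(gamma | eps, eta),
    computed with the substitution gamma = exp(u) and the trapezoidal rule."""
    n = int(round((u_max - u_min) / h))
    total = 0.0
    for k in range(n + 1):
        g = math.exp(u_min + k * h)
        cn = math.exp(-a_abs ** 2 / g) / (math.pi * g)
        ga = eta ** eps / math.gamma(eps) * g ** (eps - 1) * math.exp(-eta * g)
        total += cn * ga * g                 # d(gamma) = gamma du
    return h * total

for a in (0.1, 0.7, 2.0):
    print(a, bessel_k_pdf(a, 0.5, 1.0), mixture_pdf(a, 0.5, 1.0))
```

For ε = 1/2 and η = 1, the closed form simplifies to exp(-2|α_l|)/(π|α_l|), which gives an independent check.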

III-B Fast Iterative Bayesian Inference

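Given fixed estimates $\hat{\boldsymbol{\gamma}}$ and $\hat{\lambda}$, the posterior pdf $p(\boldsymbol{\alpha}|\mathbf{y}, \hat{\boldsymbol{\gamma}}, \hat{\lambda})$ is multivariate Gaussian, i.e., $\mathrm{CN}(\boldsymbol{\alpha}|\boldsymbol{\mu}, \boldsymbol{\Sigma})$, with

$$\boldsymbol{\mu} = \hat{\lambda}\boldsymbol{\Sigma}\boldsymbol{\Phi}^H\mathbf{y}, \tag{11}$$
$$\boldsymbol{\Sigma} = \left(\hat{\lambda}\boldsymbol{\Phi}^H\boldsymbol{\Phi} + \hat{\boldsymbol{\Gamma}}^{-1}\right)^{-1}, \tag{12}$$

and $\hat{\boldsymbol{\Gamma}} = \mathrm{diag}(\hat{\boldsymbol{\gamma}})$. The hyperparameters $\boldsymbol{\gamma}$ and $\lambda$ are estimated by maximizing [5, 6]

$$\mathcal{L}(\boldsymbol{\gamma}, \lambda) = \log p(\mathbf{y}|\boldsymbol{\gamma}, \lambda) + \log p(\boldsymbol{\gamma}; \epsilon, \eta) + \log p(\lambda). \tag{13}$$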
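For a handful of active columns, the posterior moments can be computed directly. The following pure-Python sketch implements Σ = (λΦ^HΦ + Γ^{-1})^{-1} and μ = λΣΦ^H y for columns with γ_l > 0. The function names are hypothetical, and a plain Gauss-Jordan inverse stands in for a proper linear-algebra library.

```python
def inverse(A):
    """Gauss-Jordan inversion with partial pivoting for a small dense complex matrix."""
    n = len(A)
    aug = [[complex(v) for v in row] + [1 + 0j if i == j else 0j for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def posterior(Phi, y, gamma, lam):
    """Posterior mean and covariance of alpha for fixed gamma (all > 0) and lam.

    Phi is a list of M rows with L entries each; y and gamma are lists."""
    M, L = len(Phi), len(Phi[0])
    # Sigma^{-1} = lam * Phi^H Phi + Gamma^{-1}
    prec = [[lam * sum(Phi[m][i].conjugate() * Phi[m][j] for m in range(M))
             + (1.0 / gamma[i] if i == j else 0.0)
             for j in range(L)] for i in range(L)]
    Sigma = inverse(prec)
    PhiHy = [sum(Phi[m][i].conjugate() * y[m] for m in range(M)) for i in range(L)]
    mu = [lam * sum(Sigma[i][j] * PhiHy[j] for j in range(L)) for i in range(L)]
    return mu, Sigma


# Single-column sanity check: Sigma = 1 / (lam*||phi||^2 + 1/gamma), mu = lam*Sigma*phi^H y.
mu, Sigma = posterior([[1], [1j], [-1]], [1, 0.5j, 0], gamma=[2.0], lam=4.0)
print(mu[0], Sigma[0][0])   # expected: 0.48 and 0.08, up to rounding
```

In practice one would use a Cholesky-based solver. The explicit inverse here only keeps the sketch dependency-free.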


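The cost function (13) can be iteratively maximized using the EM algorithm by noting that $\boldsymbol{\alpha}$ and $\mathbf{y}$ constitute the complete data for $\boldsymbol{\gamma}$ and $\lambda$. Following the classical EM formulation, the E-step equivalently computes (11)–(12), and the M-step computes

$$\hat{\gamma}_l = \frac{\epsilon - 2 + \sqrt{(\epsilon - 2)^2 + 4\eta\langle|\alpha_l|^2\rangle}}{2\eta}, \quad l = 1, \ldots, L, \tag{14}$$
$$\hat{\lambda} = \frac{M}{\langle\|\mathbf{y} - \boldsymbol{\Phi}\boldsymbol{\alpha}\|^2\rangle}. \tag{15}$$

The expectations $\langle\cdot\rangle$ in the above expressions are evaluated with respect to the posterior pdf $p(\boldsymbol{\alpha}|\mathbf{y}, \hat{\boldsymbol{\gamma}}, \hat{\lambda})$, where $\hat{\boldsymbol{\gamma}}$ and $\hat{\lambda}$ are the estimates computed in the previous iteration. In particular, $\langle|\alpha_l|^2\rangle = |\mu_l|^2 + \Sigma_{ll}$ and $\langle\|\mathbf{y} - \boldsymbol{\Phi}\boldsymbol{\alpha}\|^2\rangle = \|\mathbf{y} - \boldsymbol{\Phi}\boldsymbol{\mu}\|^2 + \mathrm{tr}(\boldsymbol{\Phi}\boldsymbol{\Sigma}\boldsymbol{\Phi}^H)$. After an initialization procedure, the individual quantities in (11)–(12) and (14)–(15) are iteratively updated until convergence.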
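Given the posterior moments, the M-step updates reduce to scalar formulas. The sketch below assumes the gamma hyperprior Ga(γ_l|ε, η) with η > 0 and a constant prior on λ. Setting the derivative of ⟨log CN(α_l|0, γ_l)⟩ + log Ga(γ_l|ε, η) to zero gives the quadratic ηγ_l² + (2 - ε)γ_l - ⟨|α_l|²⟩ = 0, and its positive root is returned. Function names are illustrative.

```python
import math

def gamma_update(mu_l, sigma_ll, eps, eta):
    """M-step for one weight: positive root of
    eta*g**2 + (2 - eps)*g - <|alpha_l|^2> = 0 (requires eta > 0)."""
    second_moment = abs(mu_l) ** 2 + sigma_ll.real   # <|alpha_l|^2> = |mu_l|^2 + Sigma_ll
    return (eps - 2 + math.sqrt((eps - 2) ** 2 + 4 * eta * second_moment)) / (2 * eta)

def lambda_update(residual_sq_norm, trace_term, M):
    """M-step for the noise precision: M / <||y - Phi alpha||^2>, where the
    expectation equals ||y - Phi mu||^2 + tr(Phi Sigma Phi^H)."""
    return M / (residual_sq_norm + trace_term)

g = gamma_update(1.0, 1.0, eps=0.0, eta=1.0)   # solves g**2 + 2g - 2 = 0
lam = lambda_update(3.0, 1.0, M=8)
```

As η → 0 with ε = 1, the γ update approaches ⟨|α_l|²⟩, the familiar RVM EM update. The closed form then loses accuracy because of cancellation in the numerator.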

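The above EM algorithm suffers from two main disadvantages: the high computational complexity of the update (11) and a low rate of convergence. To overcome the first drawback, a greedy procedure as in [10] can be adopted. As most of the entries in $\boldsymbol{\alpha}$ are zero, one may start out with an "empty" dictionary matrix and incrementally fill it by adding column vectors. To circumvent the low convergence rate, we compute the stationary points of the EM update in (14). For this, we fix $\gamma_k$, $k \neq i$, and $\lambda$ at their current estimates, while computing a sequence of estimates of $\gamma_i$ according to (14).⁴ In this way, we update the components of $\boldsymbol{\gamma}$ sequentially instead of jointly, a modification justified by the generalized EM framework. As shown in [9], the limit of this sequence in fact corresponds to a (local) extremum of

$$\ell(\gamma_i) = -\log(1 + \gamma_i s_i) + \frac{|q_i|^2\gamma_i}{1 + \gamma_i s_i} + (\epsilon - 1)\log\gamma_i - \eta\gamma_i + c. \tag{16}$$

Here, $c$ is a constant encompassing the terms independent of $\gamma_i$, and we use the definitions $s_i = \boldsymbol{\phi}_i^H\mathbf{C}_{-i}^{-1}\boldsymbol{\phi}_i$, $q_i = \boldsymbol{\phi}_i^H\mathbf{C}_{-i}^{-1}\mathbf{y}$, and $\mathbf{C}_{-i} = \lambda^{-1}\mathbf{I} + \sum_{k \neq i}\gamma_k\boldsymbol{\phi}_k\boldsymbol{\phi}_k^H$.⁵ Note that the definition domain of $\ell(\gamma_i)$ is $\gamma_i \in (0, \infty)$. Now, taking the derivative of $\ell(\gamma_i)$ with respect to $\gamma_i$ and equating the result to zero yields the cubic equation

$$\eta s_i^2\gamma_i^3 + \left(2\eta s_i + (2 - \epsilon)s_i^2\right)\gamma_i^2 + \left(\eta + (3 - 2\epsilon)s_i - |q_i|^2\right)\gamma_i + 1 - \epsilon = 0. \tag{17}$$

In general, (17) has three (possibly complex) solutions. These can be determined analytically, and a feasible solution for $\gamma_i$ must be real and positive. The analysis of the sparsity-inducing property of the Bessel K pdf in [9] shows that $\epsilon$ should be selected small. When $\epsilon < 1$, (17) has at least one negative solution. This is because the left-hand side of (17) equals $1 - \epsilon > 0$ at $\gamma_i = 0$ and tends to $-\infty$ as $\gamma_i \to -\infty$. For $\eta > 0$, the product of the three roots, $-(1 - \epsilon)/(\eta s_i^2)$, is negative. Hence (17) has either no positive real solution or two positive real solutions $\gamma_i^{(1)}$ and $\gamma_i^{(2)}$. In the former case, no feasible solution for $\gamma_i$ exists and the corresponding column vector is not added to the dictionary. In the latter case, we simply select the solution that yields the larger value of $\ell(\gamma_i)$.

⁴ Notice that $\langle|\alpha_i|^2\rangle$ in (14) is a function of $\gamma_i$, as seen from (11) and (12).
⁵ For the derivation of $\ell(\gamma_i)$, we exploit that $p(\mathbf{y}|\boldsymbol{\gamma}, \lambda)$ is Gaussian with mean zero and covariance matrix $\mathbf{C} = \lambda^{-1}\mathbf{I} + \boldsymbol{\Phi}\boldsymbol{\Gamma}\boldsymbol{\Phi}^H$.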
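The root selection can be sketched as follows. The sketch assumes that ℓ(γ_i) = -log(1 + γ_i s_i) + |q_i|²γ_i/(1 + γ_i s_i) + (ε - 1)log γ_i - ηγ_i + c. Differentiating this expression gives the cubic ηs_i²γ³ + (2ηs_i + (2 - ε)s_i²)γ² + (η + (3 - 2ε)s_i - |q_i|²)γ + 1 - ε = 0. The text solves this cubic analytically. Purely for illustration, this sketch instead finds all roots numerically with the Durand-Kerner iteration, keeps the real positive ones, and returns the one with the largest ℓ(γ_i), or 0 when none exists. Tolerances and iteration counts are arbitrary choices.

```python
import math

def poly_roots(coeffs, iters=500):
    """All complex roots of a polynomial (coefficients, highest degree first)
    via the Durand-Kerner simultaneous iteration."""
    coeffs = list(coeffs)
    while len(coeffs) > 1 and coeffs[0] == 0:
        coeffs.pop(0)                        # e.g. eta = 0 lowers the degree
    n = len(coeffs) - 1
    if n < 1:
        return []
    c = [a / coeffs[0] for a in coeffs]      # monic form

    def p(z):
        acc = 0j
        for a in c:                          # Horner evaluation
            acc = acc * z + a
        return acc

    roots = [(0.4 + 0.9j) ** k for k in range(n)]
    for _ in range(iters):
        updated = []
        for i, z in enumerate(roots):
            denom = 1 + 0j
            for j, w in enumerate(roots):
                if j != i:
                    denom *= z - w
            updated.append(z - p(z) / denom)
        roots = updated
    return roots

def ell(g, s, q2, eps, eta):
    """l(gamma_i) without the constant c; q2 = |q_i|^2 and g > 0."""
    return (-math.log(1 + g * s) + q2 * g / (1 + g * s)
            + (eps - 1) * math.log(g) - eta * g)

def update_gamma_i(s, q2, eps, eta, tol=1e-9):
    """Real positive root of the cubic with the largest l(gamma_i); 0.0 if none exists."""
    coeffs = [eta * s * s,
              2 * eta * s + (2 - eps) * s * s,
              eta + (3 - 2 * eps) * s - q2,
              1 - eps]
    feasible = [z.real for z in poly_roots(coeffs)
                if abs(z.imag) <= tol * max(1.0, abs(z.real)) and z.real > tol]
    if not feasible:
        return 0.0                           # column is not (or no longer) in the dictionary
    return max(feasible, key=lambda g: ell(g, s, q2, eps, eta))

print(update_gamma_i(s=1.0, q2=50.0, eps=0.0, eta=1.0))
```

Setting ε = 1 and η = 0 in `update_gamma_i` recovers the Fast-RVM update (|q_i|² - s_i)/s_i², which provides a quick sanity check.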

Fig. 1: Performance comparison of the different algorithms. In (c), the SNR is fixed at 5 dB, 10 dB, and 15 dB.

We follow the approach in [10] to realize the proposed fast iterative Bayesian inference algorithm. In each iteration, we compute every $\gamma_i$, $i = 1, \ldots, L$, and select the one that gives rise to the greatest increase in $\mathcal{L}(\boldsymbol{\gamma}, \lambda)$ between two consecutive iterations. Depending on the new value of $\gamma_i$, we then add, delete, or keep (and re-estimate) the corresponding column vector in the dictionary. The quantities $\boldsymbol{\mu}$, $\boldsymbol{\Sigma}$, and $\lambda$ are updated using (11), (12), and (15), together with the computation of $s_i$ and $q_i$, $i = 1, \ldots, L$. The computational complexity of each iteration grows with $M$, $L$, and the number $\hat{K}$ of nonzero components in $\hat{\boldsymbol{\alpha}}$. If $\lambda$ is not updated between two consecutive iterations, $\boldsymbol{\mu}$, $\boldsymbol{\Sigma}$, $s_i$, and $q_i$ can be updated efficiently according to the procedures in [10], which lowers the cost further. We refer to the proposed algorithm as Fast-BesselK.
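The sketch below makes the roles of s_i and q_i concrete. It computes them directly from C_{-i} = λ^{-1}I + Σ_{k≠i}γ_kφ_kφ_k^H with a dense solve. This costs O(M³) per index and is not the efficient recursive update of [10]. The sketch then checks numerically that removing column i changes log p(y|γ, λ) by exactly -log(1 + γ_i s_i) + |q_i|²γ_i/(1 + γ_i s_i), the likelihood part of ℓ(γ_i). The dictionary, the data and the function names are hypothetical.

```python
import math

def solve_with_logdet(A, rhs):
    """Solve A x = b for each b in rhs (A Hermitian positive definite) by Gaussian
    elimination with partial pivoting; also return log|det A| from the pivots."""
    n, k = len(A), len(rhs)
    aug = [[complex(A[i][j]) for j in range(n)] + [complex(b[i]) for b in rhs]
           for i in range(n)]
    logdet = 0.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        logdet += math.log(abs(aug[c][c]))
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    sols = []
    for t in range(k):                       # back substitution per right-hand side
        x = [0j] * n
        for i in range(n - 1, -1, -1):
            acc = aug[i][n + t] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
            x[i] = acc / aug[i][i]
        sols.append(x)
    return sols, logdet

def covariance(Phi, gamma, lam, skip=None):
    """C = lam^{-1} I + sum_{k != skip} gamma_k phi_k phi_k^H (skip=i gives C_{-i})."""
    M, L = len(Phi), len(Phi[0])
    return [[(1.0 / lam if a == b else 0.0)
             + sum(gamma[k] * Phi[a][k] * Phi[b][k].conjugate()
                   for k in range(L) if k != skip)
             for b in range(M)] for a in range(M)]

def s_and_q(Phi, y, gamma, lam, i):
    """s_i = phi_i^H C_{-i}^{-1} phi_i and q_i = phi_i^H C_{-i}^{-1} y, computed directly."""
    phi = [row[i] for row in Phi]
    (c_phi, c_y), _ = solve_with_logdet(covariance(Phi, gamma, lam, skip=i), [phi, y])
    s = sum(v.conjugate() * w for v, w in zip(phi, c_phi)).real
    q = sum(v.conjugate() * w for v, w in zip(phi, c_y))
    return s, q

def log_evidence(Phi, y, gamma, lam):
    """log p(y | gamma, lam) without the constant -M*log(pi): -log|C| - y^H C^{-1} y."""
    (c_y,), logdet = solve_with_logdet(covariance(Phi, gamma, lam), [y])
    return -logdet - sum(v.conjugate() * w for v, w in zip(y, c_y)).real

Phi = [[1, 0.5j, -0.3], [1j, 1, 0.2 + 0.1j], [-1, 0.3, 1j], [0.5, -0.2j, 1]]
y = [0.2 + 1j, -0.5, 1.5j, 0.7]
gamma, lam = [1.5, 0.7, 0.4], 3.0
for i in range(3):
    s, q = s_and_q(Phi, y, gamma, lam, i)
    without_i = gamma[:i] + [0.0] + gamma[i + 1:]
    gain = log_evidence(Phi, y, gamma, lam) - log_evidence(Phi, y, without_i, lam)
    print(i, gain, -math.log(1 + gamma[i] * s) + abs(q) ** 2 * gamma[i] / (1 + gamma[i] * s))
```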

III-C Fast-RVM and Fast-Laplace

The Fast-BesselK algorithm described in Section III-B is parametrized by $\epsilon$ and $\eta$. In the following, we show that appropriate settings of these parameters yield Fast-RVM [10] and Fast-Laplace [8] as particular instances of Fast-BesselK. For Fast-RVM, the estimation of $\boldsymbol{\gamma}$ relies on the maximization of the likelihood $p(\mathbf{y}|\boldsymbol{\gamma}, \lambda)$, i.e., a constant hyperprior $p(\boldsymbol{\gamma}) \propto 1$ is assumed. Hence, selecting $\epsilon = 1$ and $\eta = 0$ yields the cost function used in [10]. In the case of Fast-Laplace [8], the exponential pdf is selected for $p(\gamma_l)$. The gamma pdf reduces to the exponential pdf when its shape parameter is $\epsilon = 1$, so this choice yields the cost function used in [8].
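The reduction to the two special cases can be verified directly on the cubic. The derivation below assumes the cubic takes the form ηs_i²γ_i³ + (2ηs_i + (2 - ε)s_i²)γ_i² + (η + (3 - 2ε)s_i - |q_i|²)γ_i + 1 - ε = 0, obtained by differentiating ℓ(γ_i).

```latex
% With \epsilon = 1 the constant term vanishes and \gamma_i = 0 factors out:
\gamma_i\left[\eta s_i^2\gamma_i^2 + \left(2\eta s_i + s_i^2\right)\gamma_i + \eta + s_i - |q_i|^2\right] = 0
% Fast-RVM (\epsilon = 1, \eta = 0): the bracket is linear in \gamma_i,
\gamma_i = \frac{|q_i|^2 - s_i}{s_i^2}, \qquad \text{feasible iff } |q_i|^2 > s_i
% Fast-Laplace (\epsilon = 1, \eta > 0): the bracket is quadratic with discriminant
% (2\eta s_i + s_i^2)^2 - 4\eta s_i^2(\eta + s_i - |q_i|^2) = s_i^2\left(s_i^2 + 4\eta|q_i|^2\right),
\gamma_i = \frac{-(s_i + 2\eta) + \sqrt{s_i^2 + 4\eta|q_i|^2}}{2\eta s_i}, \qquad \text{feasible iff } |q_i|^2 > s_i + \eta
```

The Fast-RVM expression is the variance form of the update in [10]. In that update, the precision 1/γ_i equals s_i²/(|q_i|² - s_i).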

IV Numerical Results

Fig. 2: Performance comparison of the different algorithms. In (b)-(d), the SNR is 15 dB.
Parameter                  Value
Sampling time, $T_s$       32.55 ns
CP length                  4.69 µs (144 samples)
Subcarrier spacing         15 kHz
Pilot pattern              Evenly spaced, QPSK
Modulation                 QPSK
Subcarriers, $N$           1200
OFDM symbols               1
Information bits           1091
Channel interleaver        Random
Convolutional code         Rate 1/3, punctured
Decoder                    BCJR algorithm [19]
TABLE I: Parameter settings for the simulations.

We perform Monte Carlo simulations to evaluate the performance of the Fast-BesselK algorithm derived in Section III. We consider a scenario inspired by the 3GPP LTE standard [20], with the settings specified in Table I. In all investigations, we keep the spectral efficiency fixed, measured in information bits per subcarrier, which in turn fixes the required code rate. We employ a rate-1/3 convolutional code and use puncturing to increase the spectral efficiency. Unless otherwise specified, the pilot symbols are evenly spaced.
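The sketch below illustrates how fixing the payload ties the effective code rate to the pilot overhead. It uses the Table I values of 1091 information bits, N = 1200 subcarriers and QPSK. It assumes that every non-pilot subcarrier carries one QPSK data symbol and ignores tail bits and any other overhead. The pilot counts are hypothetical.

```python
def effective_code_rate(info_bits, n_subcarriers, n_pilots, bits_per_symbol=2):
    """Code rate needed to carry info_bits on the data subcarriers of one OFDM symbol.
    Assumes one QPSK (2-bit) data symbol on every non-pilot subcarrier."""
    coded_bits = bits_per_symbol * (n_subcarriers - n_pilots)
    return info_bits / coded_bits

for n_pilots in (50, 100, 200):        # hypothetical pilot counts
    print(n_pilots, round(effective_code_rate(1091, 1200, n_pilots), 3))
```

Under these assumptions, more pilots leave fewer coded bits for the same payload, so the punctured code must operate at a higher rate.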

The multipath channel (4) is based on the model used in [21]. For each realization of the channel, the total number of multipath components $K$ is Poisson distributed. The delays $\tau_k$, $k = 1, \ldots, K$, are independent and uniformly distributed random variables drawn from a continuous interval. Conditioned on $K$ and $\tau_k$, $k = 1, \ldots, K$, the weights $\beta_k$, $k = 1, \ldots, K$, are independent. Each weight $\beta_k$ has a zero-mean circularly symmetric complex Gaussian distribution whose variance depends on the delay $\tau_k$ [21]. In this way, the pairs $(\tau_k, \beta_k)$, $k = 1, \ldots, K$, form a marked Poisson process.
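A realization of this channel model can be drawn as sketched below. The structure follows the description above: a Poisson number of paths, i.i.d. uniform delays, and conditionally independent circularly symmetric complex Gaussian weights. The remaining details are hypothetical placeholders, not the parameters of [21]. These are the mean number of paths, the delay interval [0, τ_max], and the exponentially decaying variance profile.

```python
import math
import random

def poisson(rng, mean):
    """Knuth's multiplication method for a Poisson variate (fine for moderate means)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def draw_channel(rng, mean_paths, tau_max, variance_of):
    """One realization: K ~ Poisson, delays i.i.d. uniform on [0, tau_max], and,
    given the delays, independent circularly symmetric complex Gaussian weights."""
    K = poisson(rng, mean_paths)
    taus = [rng.uniform(0.0, tau_max) for _ in range(K)]
    betas = []
    for tau in taus:
        sd = math.sqrt(variance_of(tau) / 2)     # per real/imaginary part
        betas.append(complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd)))
    return taus, betas

# Hypothetical settings: mean of 20 paths, delays on [0, 4.69 us], and an
# exponentially decaying power profile chosen only for illustration.
rng = random.Random(1)
taus, betas = draw_channel(rng, 20, 4.69e-6, lambda tau: math.exp(-tau / 1e-6))
```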

For Fast-BesselK, we keep $\epsilon$ and $\eta$ fixed in all investigations. We empirically observed that the chosen values are appropriate for channel models with both few and numerous multipath components. Fast-BesselK is compared to two Bayesian methods, Fast-RVM [10]⁷ and Fast-Laplace [8].⁸ For these three algorithms, the noise precision $\lambda$ is estimated at every third iteration, using the initialization of [10]. As in [22], the stopping criterion is based on the difference between two consecutive iterations. Two non-Bayesian methods, LASSO and OMP, are also included for comparison. For LASSO, we use the sparse reconstruction by separable approximation (SpaRSA) algorithm [23].⁹ The required regularization parameter is chosen following [24], which has been empirically observed to provide satisfactory results. For OMP, an a priori estimate of the sparsity of $\boldsymbol{\alpha}$ needs to be set, and we use the same value in all investigations. Finally, the commonly employed robustly designed Wiener filter (RWF) [25] for OFDM channel estimation is used as a reference.

⁷ The software is available at http://people.ee.duke.edu/~lcarin/BCS.html.
⁸ The software is available at http://ivpl.eecs.northwestern.edu/.
⁹ The software is available online at http://www.lx.it.pt/~mtf/SpaRSA/.
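For reference, a textbook OMP with a preset sparsity level can be sketched as follows. This is not the implementation used in the simulations. The least-squares step uses the normal equations for brevity, although a QR-based solve would be numerically safer. Correlations are not normalized, because all dictionary columns built from exp(-j2π f τ) have equal norm. The toy dictionary is an 8 × 8 grid with orthogonal columns, chosen so that exact recovery is easy to verify.

```python
import cmath

def lstsq(A_cols, y):
    """Least squares via the normal equations (A^H A) x = A^H y; A is a list of columns."""
    k = len(A_cols)
    G = [[sum(a.conjugate() * b for a, b in zip(A_cols[i], A_cols[j])) for j in range(k)]
         + [sum(a.conjugate() * v for a, v in zip(A_cols[i], y))] for i in range(k)]
    for c in range(k):                                  # Gaussian elimination
        p = max(range(c, k), key=lambda r: abs(G[r][c]))
        G[c], G[p] = G[p], G[c]
        for r in range(c + 1, k):
            f = G[r][c] / G[c][c]
            G[r] = [a - f * b for a, b in zip(G[r], G[c])]
    x = [0j] * k
    for i in range(k - 1, -1, -1):                      # back substitution
        x[i] = (G[i][k] - sum(G[i][j] * x[j] for j in range(i + 1, k))) / G[i][i]
    return x

def omp(Phi_cols, y, sparsity):
    """Orthogonal matching pursuit with a preset number of nonzero weights."""
    support, coef, residual = [], [], list(y)
    for _ in range(sparsity):
        corr = [abs(sum(a.conjugate() * r for a, r in zip(col, residual))) for col in Phi_cols]
        for l in support:
            corr[l] = -1.0                              # never pick a column twice
        support.append(max(range(len(Phi_cols)), key=corr.__getitem__))
        coef = lstsq([Phi_cols[l] for l in support], y)
        fit = [sum(c * Phi_cols[l][m] for c, l in zip(coef, support)) for m in range(len(y))]
        residual = [v - f for v, f in zip(y, fit)]
    alpha = [0j] * len(Phi_cols)
    for c, l in zip(coef, support):
        alpha[l] = c
    return alpha

# Toy check: 8 normalized pilot frequencies, 8 grid delays, two active paths.
M = L = 8
Phi_cols = [[cmath.exp(-2j * cmath.pi * m * l / M) for m in range(M)] for l in range(L)]
alpha_true = [0j] * L
alpha_true[1], alpha_true[5] = 1 + 0.5j, -0.7j
y = [sum(alpha_true[l] * Phi_cols[l][m] for l in range(L)) for m in range(M)]
alpha_omp = omp(Phi_cols, y, sparsity=2)
```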

Unless otherwise specified, the number of rows of $\boldsymbol{\Phi}$ (i.e., the number $M$ of pilot subcarriers) is kept fixed. So is its number of columns $L$, which sets the delay resolution. The performance versus SNR is compared in Figs. 1(a)-1(b). Fig. 1(a) shows that Fast-BesselK and Fast-Laplace outperform the other algorithms in terms of BER across the entire SNR range considered. Specifically, at a fixed BER, the SNR gain is approximately 1 dB over Fast-RVM, LASSO, and OMP and 2 dB over RWF. Fig. 1(b) shows that Fast-BesselK yields a lower MSE than the other algorithms. Surprisingly, the improved MSE performance of Fast-BesselK does not translate into better BER performance compared to Fast-Laplace.

The convergence speed of the Bayesian iterative algorithms is shown in Fig. 1(c). Here, Fast-BesselK achieves a remarkable improvement compared to Fast-RVM and Fast-Laplace, with MSE values converging in about 10–30 iterations. As Fig. 1(c) shows, there is no guarantee that the MSE is reduced at each iteration, because the algorithms maximize the objective function (13) rather than minimize the MSE. Fast-RVM and Fast-Laplace suffer a significant increase in MSE after a certain number of iterations, and this drawback is largely mitigated in the case of Fast-BesselK. The superior convergence speed of Fast-BesselK can be explained by observing Figs. 2(a)-2(b). Fig. 2(b) shows that the faster convergence arises because the Bessel K prior handles channels with few multipath components better, i.e., it yields a lower MSE for such channels. The other methods, in contrast, tend to add more column vectors to the dictionary matrix. As Fig. 2(a) shows, this increases the number of add, delete, and re-estimate iterations.

Fig. 2(c) shows the MSE versus the number of pilots $M$. We observe that, for a given MSE performance, Fast-BesselK significantly reduces the required pilot overhead. In particular, Fast-BesselK achieves an MSE on par with LASSO, OMP, and RWF using less than half the number of pilots. Finally, in Fig. 2(d), we evaluate the MSE performance versus the available delay resolution, determined by the number of columns $L$ in $\boldsymbol{\Phi}$ (cf. Section II).¹⁰ Several observations are worth noting. Unlike the other algorithms, Fast-BesselK achieves a noticeable MSE gain as the delay resolution improves. In fact, besides Fast-BesselK, only OMP appears able to exploit the improved delay resolution. The reason is that LASSO, Fast-RVM, and Fast-Laplace produce solutions with an increasing number of nonzero components in $\hat{\boldsymbol{\alpha}}$ as $L$ increases, since there are simply more column vectors in $\boldsymbol{\Phi}$ to be added or deleted. Thus, unlike Fast-BesselK, these algorithms also require an increasing number of iterations (results not shown).

¹⁰ Naturally, RWF does not require a dictionary matrix to be specified, and its performance is thereby independent of $L$.

V Conclusion

In this work, we presented a fast iterative Bayesian inference channel estimation algorithm based on the hierarchical Bayesian prior model of the Bessel K probability density function. Following the framework for fast Bayesian inference in [10], we proposed an algorithm that significantly lowers the number of required iterations compared to state-of-the-art Bayesian inference methods, without any loss in performance. This improvement in convergence rate is directly related to the Bessel K prior's ability to handle channels with few multipath components better than other commonly employed prior models. Furthermore, our algorithm shows improved performance compared to both Bayesian and non-Bayesian state-of-the-art methods.


This work was supported in part by the 4GMCT cooperative research project, funded by Intel Mobile Communications, Agilent Technologies, Aalborg University and the Danish National Advanced Technology Foundation, and by the project ICT-248894 Wireless Hybrid Enhanced Mobile Radio Estimators (WHERE2).


  • [1] R. Tibshirani, “Regression shrinkage and selection via the LASSO,” J. R. Statist. Soc. B, vol. 58, pp. 267–288, 1996.
  • [2] S. S. Chen, D. L. Donoho, and M. A. Saunders, “Atomic decomposition by basis pursuit,” SIAM Journal on Scientific Computing, vol. 20, pp. 33–61, 1998.
  • [3] J. A. Tropp, “Greed is good: Algorithmic results for sparse approximation,” IEEE Trans. on Inf. Theory, vol. 50, pp. 2231–2242, 2004.
  • [4] D. Needell and J. Tropp, “CoSaMP: Iterative signal recovery from incomplete and inaccurate samples,” Applied and Computational Harmonic Analysis, vol. 26, no. 3, pp. 301–321, 2009.
  • [5] M. Tipping, “Sparse Bayesian learning and the relevance vector machine,” Journal of Machine Learning Research, vol. 1, pp. 211–244, 2001.
  • [6] D. Wipf and B. Rao, “Sparse Bayesian learning for basis selection,” IEEE Trans. on Signal Proc., vol. 52, no. 8, pp. 2153–2164, 2004.
  • [7] M. Figueiredo, “Adaptive sparseness for supervised learning,” IEEE Trans. on Pattern Analysis and Machine Intel., vol. 25, no. 9, pp. 1150–1159, 2003.
  • [8] S. D. Babacan, R. Molina, and A. K. Katsaggelos, “Bayesian compressive sensing using Laplace priors,” IEEE Trans. on Image Proc., vol. 19, no. 1, pp. 53–63, 2010.
  • [9] N. L. Pedersen, D. Shutin, C. N. Manchón, and B. H. Fleury, “Sparse estimation using Bayesian hierarchical prior modeling for real and complex models,” in preparation, 2013.
  • [10] M. E. Tipping and A. C. Faul, “Fast marginal likelihood maximisation for sparse Bayesian models,” in Proc. 9th International Workshop on Artificial Intelligence and Statistics, Key West, FL, 2003.
  • [11] W. Bajwa, J. Haupt, A. Sayeed, and R. Nowak, “Compressed channel sensing: A new approach to estimating sparse multipath channels,” Proceedings of the IEEE, vol. 98, no. 6, pp. 1058–1076, 2010.
  • [12] C. R. Berger, S. Zhou, J. C. Preisig, and P. Willett, “Sparse channel estimation for multicarrier underwater acoustic communication: From subspace methods to compressed sensing,” IEEE Trans. on Signal Proc., vol. 58, no. 3, pp. 1708–1721, 2010.
  • [13] J. Huang, C. R. Berger, S. Zhou, and J. Huang, “Comparison of basis pursuit algorithms for sparse channel estimation in underwater acoustic OFDM,” in Proc. OCEANS 2010 IEEE - Sydney, pp. 1–6, 2010.
  • [14] G. Tauböck, F. Hlawatsch, D. Eiwen, and H. Rauhut, “Compressive estimation of doubly selective channels in multicarrier systems: leakage effects and sparsity-enhancing processing,” IEEE Journal of Selected Topics in Signal Proc., vol. 4, no. 2, pp. 255–271, 2010.
  • [15] D. Shutin and B. H. Fleury, “Sparse variational Bayesian SAGE algorithm with application to the estimation of multipath wireless channels,” IEEE Trans. on Signal Proc., vol. 59, pp. 3609–3623, 2011.
  • [16] P. Schniter, “A message-passing receiver for BICM-OFDM over unknown clustered-sparse channels,” IEEE Journal of Selected Topics in Signal Proc., vol. 5, no. 8, pp. 1462–1474, 2011.
  • [17] N. L. Pedersen, C. N. Manchón, D. Shutin, and B. H. Fleury, “Application of Bayesian hierarchical prior modeling to sparse channel estimation,” in Proc. IEEE Int. Communications Conf. (ICC), pp. 3487–3492, 2012.
  • [18] I. F. Gorodnitsky and B. D. Rao, “Sparse signal reconstruction from limited data using FOCUSS: a re-weighted minimum norm algorithm,” IEEE Trans. on Signal Proc., vol. 45, no. 3, pp. 600–616, 1997.
  • [19] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Trans. on Inf. Theory, vol. 20, no. 2, pp. 284–287, 1974.
  • [20] 3rd Generation Partnership Project (3GPP) Technical Specification, “Evolved universal terrestrial radio access (E-UTRA); base station (BS) radio transmission and reception,” TS 36.104 V8.4.0, Tech. Rep., 2008.
  • [21] M. L. Jakobsen, K. Laugesen, C. Navarro Manchón, G. E. Kirkelund, C. Rom, and B. Fleury, “Parametric modeling and pilot-aided estimation of the wireless multipath channel in OFDM systems,” in Proc. IEEE Int. Communications Conf. (ICC), pp. 1–6, 2010.
  • [22] S. Ji, Y. Xue, and L. Carin, “Bayesian compressive sensing,” IEEE Trans. on Signal Proc., vol. 56, no. 6, pp. 2346–2356, 2008.
  • [23] S. J. Wright, R. D. Nowak, and M. A. T. Figueiredo, “Sparse reconstruction by separable approximation,” IEEE Trans. on Sig. Proc., vol. 57, no. 7, pp. 2479–2493, 2009.
  • [24] Z. Ben-Haim and Y. C. Eldar, “The Cramér-Rao bound for sparse estimation,” 2009, arXiv:0905.4378v4.
  • [25] O. Edfors, M. Sandell, J.-J. van de Beek, S. K. Wilson, and P. O. Börjesson, “OFDM channel estimation by singular value decomposition,” IEEE Trans. on Communications, vol. 46, no. 7, pp. 931–939, 1998.