# Limited Feedback Channel Estimation in Massive MIMO with Non-uniform Directional Dictionaries

Channel state information (CSI) at the base station (BS) is crucial to achieve beamforming and multiplexing gains in multiple-input multiple-output (MIMO) systems. State-of-the-art limited feedback schemes require feedback overhead that scales linearly with the number of BS antennas, which is prohibitive for 5G massive MIMO. This work proposes novel limited feedback algorithms that lift this burden by exploiting the inherent sparsity in double directional (DD) MIMO channel representation using overcomplete dictionaries. These dictionaries are associated with angle of arrival (AoA) and angle of departure (AoD) and specifically account for antenna directivity patterns at both ends of the link. The proposed algorithms achieve satisfactory channel estimation accuracy using a small number of feedback bits, even when the number of transmit antennas at the BS is large, which makes them well suited to 5G massive MIMO. Judicious simulations reveal that they outperform a number of popular feedback schemes, and underscore the importance of using angle dictionaries matching the given antenna directivity patterns, as opposed to uniform dictionaries. The proposed algorithms are lightweight in terms of computation, especially on the user equipment side, making them attractive for actual deployment in 5G systems.


## I Introduction

The idea of harnessing a large number of antennas at the base station (BS), possibly many more than the number of user equipment (UE) terminals in the cell, has recently attracted a lot of interest in massive multiple-input multiple-output (MIMO) research. The key technical reasons are that massive MIMO can enable leaps in spectral efficiency [1] and can help mitigate intercell interference through simple linear precoding and combining, while offering immunity to small-scale fading, an effect known as channel hardening [2, 3]. Massive MIMO systems also have the advantage of being energy-efficient, since every antenna may operate at a low energy level [4].

The largest portion of the feedback-based channel estimation literature explores various quantization techniques; see [10] for a well-rounded exposition. Many of these methods utilize a vector quantization (VQ) codebook that is known to both the BS and the UE. After estimating the instantaneous downlink CSI, the UE sends through a limited feedback channel the index of the codeword that best matches the estimated channel, in the sense of minimizing the outage probability [11], maximizing link capacity [12], or maximizing the beamforming gain [13, 14]. Codebooks for spatially correlated channels based on generalizations of the Lloyd algorithm are given in [15], while codebooks designed for temporally correlated channels are provided in [16]. Codebook-free feedback for channel tracking was considered in [17] for spatio-temporally correlated channels with imperfect CSI at the UE. Many limited feedback approaches in MIMO systems consider a Rayleigh fading channel model [18, 13, 14, 19]. Under this channel model, the number of VQ feedback bits required to guarantee reasonable performance is linear in the number of transmit antennas at the BS [5], which is costly in the case of massive MIMO. Yet the designer is not limited to VQ-based approaches, and massive MIMO channels can be far from Rayleigh.

In this work, we consider an approach that differs quite sharply from the prevailing limited feedback methodologies. Our approach specifically targets FDD massive MIMO in the sublinear feedback regime. We adopt the double directional (DD) MIMO channel model [20] (see also [21]) instead of the Rayleigh fading model. The DD channel model parameterizes each channel path using the angle of departure (AoD) at the BS, small- and large-scale propagation coefficients, and the angle of arrival (AoA) at the UE, a parametrization that is well accepted and advocated by 3GPP [22, 23]. We exploit a ‘virtual sparse representation’ of the downlink channel under the double directional MIMO model [20]. By quantizing the AoA and AoD, it is possible to design overcomplete dictionaries whose steering vectors approximate those associated with the true angles of arrival and departure. Building upon [20], such a representation has been exploited to design receiver-side millimeter wave (mmWave) channel estimation algorithms using high-resolution [24] or low-resolution (coarsely quantized) analog-to-digital converters (ADCs) [25, 26].

In contrast, we focus on transmitter-side (BS) downlink channel acquisition using only limited receiver-side (UE) computation and feedback to the BS [27]. We propose novel optimization formulations and algorithms for downlink channel estimation at the BS using single-bit judiciously-compressed measurements. In this way, we shift the channel computation burden from the UE to the BS, while keeping the feedback overhead low. Using the overcomplete parametrization of the DD model, three new limited feedback setups are proposed:

• In the first setup, the UE applies dictionary-based sparse channel estimation and support identification to estimate the 2D angular support and the corresponding coefficients of the sparse channel. Then, the UE feeds back the support of the sparse channel estimate, plus a coarsely quantized version of the corresponding non-zero coefficients, assuming that the quantizer thresholds are known at the BS. This is the proposed UE-based limited feedback baseline method for the DD model.

• In the second setup, the UE compresses the received measurements and sends back only the signs of the compressed measurements to the BS. Upon receiving these sign bits, the BS estimates the channel using single-bit DD dictionary-based sparse estimation algorithms.

• The third setup, called hybrid limited feedback, combines the first and the second: the UE estimates and sends the support of the sparse channel estimate on top of the compressed sign feedback used in the second setup. Upon receiving this augmented feedback, the BS can then apply the algorithms of setup 2 on a significantly reduced problem dimension.

For sparse estimation and support identification, the orthogonal matching pursuit (OMP) algorithm [28] is utilized as it offers the best possible computational complexity among all sparse estimation algorithms [29], which is highly desired for resource-constrained UE terminals.

Contributions:

A new limited feedback channel estimation framework is proposed that exploits the sparse nature of the DD model (setup 2). Two formulations are proposed, based on single-bit sparse maximum-likelihood estimation (MLE) and single-bit compressed sensing. For MLE, a first-order proximal method that is optimal in terms of iteration complexity [30] is designed, with adaptive restart to further speed up convergence [31]. The proposed compressed sensing (CS) formulation can fortuitously be handled by invoking the recent single-bit CS literature: the underlying convex optimization problem has a simple closed-form solution, which is ideal for practical implementation. The proposed framework shifts the computational burden towards the BS side, since the UE only carries out matrix-vector multiplications and takes signs. This is sharply different from most limited feedback schemes in the literature, where the UE does the ‘heavy lifting’ [6, 10]. More importantly, under our design, a small number of feedback bits achieves very satisfactory channel estimation accuracy even when the number of BS antennas is very large, as long as the number of paths is reasonably small, which is usually the case in practice [20]; thus, the proposed framework is well suited to massive MIMO 5G cellular networks.

In addition to the above contributions, a new angle dictionary construction methodology is proposed to enhance performance, based on a companding quantization technique [32]. The idea is to create dictionaries that concentrate the angle density in a non-uniform manner, around the angles where directivity patterns attain higher values. The baseline 3GPP antenna directivity pattern is considered for this, and the end-to-end results are contrasted with those obtained using uniform quantization, to showcase this important point. Judicious simulations reveal that the proposed dictionaries outperform uniform dictionaries.

Last but not least, to further reduce computational complexity at the BS and enhance beamforming and ergodic rate performance, a new hybrid implementation is proposed (setup 3). This setup is very effective when the UE is capable of carrying out simple estimation algorithms, such as OMP. At the relatively small cost of communicating extra support information that slightly increases feedback communication overhead, the BS applies the single-bit MLE and single-bit CS algorithms on a dramatically reduced problem dimension. Simulations reveal that the performance of the two algorithms under setup 3 is always better than under setup 2. As in setup 2, the feedback overhead is tightly controlled by the system designer and the desired level of channel estimation accuracy is attained with very small feedback rate, even in the massive MIMO regime.

Comprehensive simulations over a range of pragmatic scenarios, based on the 3GPP DD channel model [33], compare the proposed methods with baseline least-squares (LS) scalar and vector quantization (VQ) feedback strategies in terms of normalized root mean-squared error (NRMSE), beamforming gain, and multi-user capacity under zero-forcing (ZF) beamforming. Unlike VQ, which requires the number of feedback bits to grow at least linearly with the number of BS antennas to maintain a given level of estimation performance, the number of feedback bits of the proposed algorithms is controlled by the system designer, and substantial feedback overhead reductions are observed while achieving better performance than VQ methods. It is also shown that when the sparse DD model is valid, the proposed methods not only outperform LS schemes, but may also offer performance very close to that of perfect CSI in some cases.

Relative to the conference precursor [34] of this work, this journal version includes the following additional contributions: the UE-based limited feedback scheme under setup 1; the novel channel estimation algorithm based on the sparse MLE formulation; the new hybrid schemes under setup 3; and comprehensive (vs. illustrative) simulations of all schemes considered. The rest of this paper is organized as follows. Section II presents the adopted wireless system model, and Section III derives the proposed non-uniform directional dictionaries. Sections IV, V, and VI develop the proposed UE-based, BS-based, and hybrid limited feedback algorithms, respectively. Section VII presents simulation results, and Section VIII summarizes conclusions.

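Notation: Boldface lowercase and uppercase letters denote column vectors and matrices, respectively; $(\cdot)^*$, $(\cdot)^\top$, and $(\cdot)^H$ denote the conjugate, transpose, and Hermitian operators, respectively. $\|\cdot\|_p$, $\mathcal{R}(\cdot)$, $\mathcal{I}(\cdot)$, and $|\cdot|$ denote the $\ell_p$-norm (with $p \in \{0, 1, 2, \infty\}$, where $\|\cdot\|_0$ counts non-zero elements), the real part, the imaginary part, and the absolute value or set cardinality operator, respectively. $\mathrm{diag}(\mathbf{x})$ is the diagonal matrix formed by vector $\mathbf{x}$, $\mathbf{0}$ is the all-zero vector whose size is understood from the context, and $\mathbf{I}_M$ is the $M \times M$ identity matrix. Symbol $\otimes$ denotes the Kronecker product, and $\mathbb{E}[\cdot]$ is the expectation operator. $\mathcal{CN}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ denotes the proper complex Gaussian distribution with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$. Matrix (vector) $\mathbf{A}_{\mathcal{S}}$ ($\mathbf{x}_{\mathcal{S}}$) comprises the columns of matrix $\mathbf{A}$ (elements of $\mathbf{x}$) indexed by set $\mathcal{S}$. Function $(x)_+ = x$ for $x \geq 0$ and zero otherwise; abusing notation a bit, we also apply it to vectors, element-wise. Function $\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt$ is the error function, $j$ is the imaginary unit, and $Q(\cdot)$ is the Gaussian $Q$-function.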

## II System Model

We consider an FDD cellular system consisting of a BS serving multiple active UE terminals, where the downlink channel is estimated at the BS through feedback from each UE. For brevity of exposition, we focus on a single UE. The proposed algorithms can be easily generalized to multiple users, as the downlink channel estimation process can be performed separately for each UE. The BS is equipped with $M_T$ antennas and the UE is equipped with $M_R$ antennas. The channel is assumed static over a coherence block whose length, in complex orthogonal frequency division multiplexing (OFDM) symbols, is the product of the coherence bandwidth (in Hz), the coherence time (in seconds), and the fraction of useful symbol time (which accounts for the cyclic prefix overhead relative to the OFDM symbol duration).¹

¹ In LTE, time-frequency resources are structured so that the coherence block occupies some resource blocks; each resource block consists of 7 contiguous OFDM symbols in time by 12 contiguous subcarriers in frequency (normal cyclic prefix). A subframe of duration 1 msec consists of two resource blocks contiguous in time, yielding 168 symbols, over which the channel can be considered constant [35].

In downlink transmission, the BS has to acquire CSI through feedback from the active UE terminals, and then design the transmit signals accordingly. At the training phase, the BS employs $N_{tr}$ training symbols for channel estimation. The narrowband (over time-frequency) discrete model over a period of $N_{tr}$ training symbols is given by

$$\mathbf{y}_n = \mathbf{H}\mathbf{s}_n + \mathbf{n}_n, \quad n = 1, 2, \ldots, N_{tr}, \tag{1}$$

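where $n$ is the training index, $\mathbf{s}_n \in \mathbb{C}^{M_T}$ is the transmitted training signal, $\mathbf{y}_n \in \mathbb{C}^{M_R}$ is the received vector, $\mathbf{H} \in \mathbb{C}^{M_R \times M_T}$ denotes the complex baseband equivalent channel matrix, and $\mathbf{n}_n$ is circularly-symmetric complex Gaussian noise with independent, identically distributed entries. All quantities in the right-hand side of (1) are independent of each other, and every training signal $\mathbf{s}_n$ has the same average total transmit power. The signal-to-noise ratio (SNR) is defined as the ratio of the average total transmit power to the noise variance.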

To estimate $\mathbf{H}$, we can use linear least-squares (LS) [18], or, if the channel covariance is known, the linear minimum mean-squared error (LMMSE) approach [6]. These linear approaches need at least $M_T$ training symbols to establish identifiability of the channel (to ‘over-determine’ the problem), which is rather costly in massive MIMO scenarios.

A more practical approach to downlink channel acquisition at the BS of massive MIMO systems is to shift the computational burden to the BS, relying on relatively lightweight computations at the UE and assuming that only low-rate feedback is available. The motivation for this is clear: the BS is connected to the communication backbone, plugged into the power grid, and may even have access to cloud computing, so it is far more capable of performing intensive computations. The challenge, of course, is how to control the feedback overhead: without a limit on the feedback rate, the UE could simply relay the signals it receives back to the BS, but such an approach is clearly wasteful and impractical. The ultimate goal is to achieve accurate channel estimation with low feedback overhead, i.e., to estimate $\mathbf{H}$ using just a few feedback bits.

Towards this end, our starting idea is to employ a finite scatterer (also known as discrete multipath, or double directional) channel model comprising $L$ paths, which can be parameterized using a virtual sparse representation. The inherent sparsity of the DD parameterization in the angle-delay domain can also be exploited at the UE side to estimate the downlink channel using compressed sensing techniques with reduced pilot sequence overhead [20]. This sparse representation leads to a feedback scheme that is parsimonious in terms of both overhead and computational complexity. The narrowband downlink channel matrix can be written as

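$$\mathbf{H} = \sqrt{\frac{M_T M_R}{L}} \sum_{l=1}^{L} \alpha_l\, c_T(\phi'_l)\, c_R(\phi_l)\, \mathbf{a}_R(\phi_l)\, \mathbf{a}_T^H(\phi'_l)\, e^{j\varphi_l}, \tag{2}$$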

where $\alpha_l$ is the complex gain of the $l$-th path incorporating path-losses and small- and large-scale fading effects; $\phi_l$ and $\phi'_l$ are the azimuth angle of arrival (AoA) and angle of departure (AoD) of the $l$-th path, respectively; and $\mathbf{a}_T(\cdot)$ and $\mathbf{a}_R(\cdot)$ represent the transmit and receive array steering vectors, respectively, which depend on the antenna array geometry. The random phase $\varphi_l$ is associated with the delay of the $l$-th path. Functions $c_T(\cdot)$ and $c_R(\cdot)$ represent the BS and UE antenna element directivity patterns, respectively (all transmit antenna elements are assumed to have the same directivity pattern, and the same holds for the receive antenna elements). One example is the uniform directivity pattern over a sector $[a, b)$, under which $c_T(\phi)$ is a positive constant when $\phi \in [a, b)$ and zero otherwise, and likewise for $c_R(\phi)$. Another baseline directivity pattern $c(\phi)$ is advocated by 3GPP [36]:

$$c(\phi) = 10^{\frac{G_{dB} + A(\phi)}{20}}, \tag{3}$$

with $A(\phi) = -\min\left\{12\left(\frac{\phi}{\phi_{3dB}}\right)^2, A_m\right\}$, where $G_{dB}$ is the maximum directional gain of the radiating element in dBi, $A_m$ is the front-to-back ratio in dB, and $\phi_{3dB}$ is the 3 dB beamwidth. A common antenna array architecture is the uniform linear array (ULA) along the $y$ axis, using only the azimuth angle; in this case the BS steering vector (and similarly the UE one) is given by

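$$\mathbf{a}_T(\phi) = \sqrt{\frac{1}{M_T}}\left[1,\ e^{-j2\pi\frac{d_y}{\lambda}\sin(\phi)},\ \ldots,\ e^{-j2\pi\frac{d_y(M_T-1)}{\lambda}\sin(\phi)}\right]^\top, \tag{4}$$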

where $\lambda$ is the carrier wavelength and $d_y$ is the distance between antenna elements along the $y$ axis (usually $d_y = \lambda/2$).

The channel in (2) can be written more compactly as

$$\mathbf{H} = \mathbf{A}_R\, \mathrm{diag}(\boldsymbol{\alpha})\, \mathbf{A}_T^H, \tag{5}$$

with matrices $\mathbf{A}_T$ and $\mathbf{A}_R$ collecting all transmit and receive steering vectors, respectively, while vector $\boldsymbol{\alpha}$ collects the path-loss and phase-shift coefficients. Starting from the model in (5), one can obtain a sparse representation of the channel [20]. First, the AoA and AoD space is quantized. Let us denote the resulting dictionaries by $\mathcal{P}_T$ for AoDs and $\mathcal{P}_R$ for AoAs. Dictionary $\mathcal{P}_T$ contains $G_T$ members, while $\mathcal{P}_R$ contains $G_R$ members. One simple way of constructing these dictionaries is to use a uniform grid of phases in an angular sector $[a, b)$; in that case, $\mathcal{P}_T$ and $\mathcal{P}_R$ consist of $G_T$ and $G_R$ equally spaced angles in $[a, b)$, respectively. For given dictionaries $\mathcal{P}_R$ and $\mathcal{P}_T$, the dictionary matrices are defined as

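$$\tilde{\mathbf{A}}_R \triangleq \left\{c_R(\phi)\,\mathbf{a}_R(\phi) : \phi \in \mathcal{P}_R\right\} \in \mathbb{C}^{M_R \times G_R}, \tag{6}$$

$$\tilde{\mathbf{A}}_T \triangleq \left\{c_T(\phi)\,\mathbf{a}_T(\phi) : \phi \in \mathcal{P}_T\right\} \in \mathbb{C}^{M_T \times G_T}, \tag{7}$$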

which stand for overcomplete quantized approximations of the matrices $\mathbf{A}_R$ and $\mathbf{A}_T$, respectively. Hence, the channel matrix in the left-hand side of (5) can be written, up to some quantization error, as

$$\mathbf{H} \approx \tilde{\mathbf{A}}_R\, \mathbf{G}\, \tilde{\mathbf{A}}_T^H, \tag{8}$$

where $\mathbf{G} \in \mathbb{C}^{G_R \times G_T}$ is an interaction matrix whose $(i, j)$th element is associated with the $i$th column of $\tilde{\mathbf{A}}_R$ and the $j$th column of $\tilde{\mathbf{A}}_T$: if $[\mathbf{G}]_{i,j} \neq 0$, a propagation path associated with the $i$th angle in $\mathcal{P}_R$ and the $j$th angle in $\mathcal{P}_T$ is active. In practice, the number of active paths is typically very small compared to the number of elements of $\mathbf{G}$ (i.e., $L \ll G_R G_T$). Thus, the matrix $\mathbf{G}$ is in most cases very sparse [20].

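Stacking the vectors $\mathbf{y}_n$ in (1) column by column, we form the matrix $\mathbf{Y} = [\mathbf{y}_1, \ldots, \mathbf{y}_{N_{tr}}] \in \mathbb{C}^{M_R \times N_{tr}}$. Denoting by $\mathbf{S} = [\mathbf{s}_1, \ldots, \mathbf{s}_{N_{tr}}] \in \mathbb{C}^{M_T \times N_{tr}}$ the transmitted training symbol sequence and by $\mathbf{N} = [\mathbf{n}_1, \ldots, \mathbf{n}_{N_{tr}}]$ the noise, and using the channel matrix approximation in (8), the baseband signal in (1) can be written in compact matrix form as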

$$\mathbf{Y} = \tilde{\mathbf{A}}_R\, \mathbf{G}\, \tilde{\mathbf{A}}_T^H\, \mathbf{S} + \mathbf{N}. \tag{9}$$

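Applying the vectorization property $\mathrm{vec}(\mathbf{U}\mathbf{V}\mathbf{W}) = (\mathbf{W}^\top \otimes \mathbf{U})\,\mathrm{vec}(\mathbf{V})$ to Eq. (9), the baseband received signal is given by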

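$$\mathbf{y} = \left((\mathbf{S}^\top \tilde{\mathbf{A}}_T^*) \otimes \tilde{\mathbf{A}}_R\right)\mathbf{g} + \mathbf{n} = \mathbf{Q}\mathbf{g} + \mathbf{n}, \tag{10}$$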

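where $\mathbf{y} = \mathrm{vec}(\mathbf{Y})$, $\mathbf{g} = \mathrm{vec}(\mathbf{G})$, $\mathbf{n} = \mathrm{vec}(\mathbf{N})$, and $\mathbf{Q} = (\mathbf{S}^\top \tilde{\mathbf{A}}_T^*) \otimes \tilde{\mathbf{A}}_R \in \mathbb{C}^{M_R N_{tr} \times G}$. We define $G = G_R G_T$ as the joint (product) dictionary size. This quantity plays a pivotal role in the performance of the algorithms considered, since it determines the angle granularity of the dictionaries, which in turn determines the ultimate estimation error performance. Fig. 1 provides a high-level overview of the system model.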
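The dictionary matrices in (6) and (7) and the measurement matrix $\mathbf{Q}$ in (10) can be checked numerically. The following NumPy sketch is illustrative only: it assumes half-wavelength spacing ($d_y = \lambda/2$), a hypothetical sector $[-\pi/3, \pi/3)$ with a flat (uniform) pattern, small made-up sizes, and a random training matrix; the helper names `ula_steering` and `dictionary` are hypothetical. It verifies that $\mathbf{Q}\mathbf{g}$ reproduces the column-stacked noiseless part of (9).

```python
import numpy as np

def ula_steering(phi, M, d_over_lambda=0.5):
    """ULA steering vector of Eq. (4); d_y / lambda = 0.5 is an assumption."""
    m = np.arange(M)
    return np.exp(-2j * np.pi * d_over_lambda * m * np.sin(phi)) / np.sqrt(M)

def dictionary(angles, M, pattern):
    """Stack c(phi) * a(phi) column-wise for every phi in the angle set, Eqs. (6)-(7)."""
    return np.column_stack([pattern(phi) * ula_steering(phi, M) for phi in angles])

rng = np.random.default_rng(0)
M_T, M_R, N_tr = 16, 4, 8          # hypothetical array sizes and training length
G_T, G_R, L = 32, 8, 3              # dictionary sizes and number of active paths
flat = lambda phi: 1.0              # hypothetical uniform directivity pattern

a, b = -np.pi / 3, np.pi / 3        # hypothetical angular sector [a, b)
A_T = dictionary(np.linspace(a, b, G_T, endpoint=False), M_T, flat)   # M_T x G_T
A_R = dictionary(np.linspace(a, b, G_R, endpoint=False), M_R, flat)   # M_R x G_R

# Sparse vec(G): L active (AoA, AoD) pairs with complex gains.
g = np.zeros(G_R * G_T, dtype=complex)
active = rng.choice(G_R * G_T, size=L, replace=False)
g[active] = rng.standard_normal(L) + 1j * rng.standard_normal(L)
G_mat = g.reshape(G_R, G_T, order="F")   # vec() stacks columns (column-major)

S = (rng.standard_normal((M_T, N_tr)) + 1j * rng.standard_normal((M_T, N_tr))) / np.sqrt(2)
Y_clean = A_R @ G_mat @ A_T.conj().T @ S              # noiseless part of Eq. (9)
Q = np.kron(S.T @ A_T.conj(), A_R)                    # Eq. (10)

print(Q.shape)                                                  # (32, 256)
print(np.allclose(Q @ g, Y_clean.reshape(-1, order="F")))       # True
```

Column-major reshaping is what makes `np.kron` line up with the $\mathrm{vec}(\cdot)$ identity; a row-major reshape would silently permute the entries of $\mathbf{g}$.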

## III Angle Dictionary Construction Accounting for Antenna Directivity Patterns

Before introducing the proposed feedback schemes, let us consider the practical issue of quantizing the angular space. Prior art on channel estimation employs the sparse representation in (5) using uniformly discretized angles as dictionaries [20]. However, a more appealing angle dictionary should take into consideration the antenna directivity patterns, since the channel itself naturally reflects the directivity pattern. In this work we propose the following: pack more angles around the peaks of the antenna directivity pattern, because the dominant paths will likely fall in those regions, and this is where we need higher angular resolution. Denser discretization within high-antenna-power regions can reduce quantization errors more effectively compared to a uniform quantization that ignores the directivity pattern.

To explain our approach, let $c(\phi)$ be a given antenna directivity pattern function, which is assumed continuous over $[a, b)$, and suppose that we want to represent it using $N$ quantization points; see Fig. 2 for the 3GPP directivity pattern. We define the cumulative function of $c$ as $G(\phi) = \int_a^\phi c(\tau)\,d\tau$. Since $c$ takes positive values, $G$ is continuous and strictly increasing. Thus, the set

$$\mathcal{C}_q \triangleq \left\{G(a) + \frac{n\,\big(G(b) - G(a)\big)}{N+1}\right\}_{n=1}^{N} \tag{11}$$

partitions the range of $G$ into $N + 1$ intervals of equal size. By the definition of $G$, the set in (11) equivalently partitions the area under $c$ into $N + 1$ equal-area intervals. Having the elements of $\mathcal{C}_q$, we can find the phases at which $c$ is partitioned into equal-area intervals, which achieves our goal of placing denser grid points in the angular regions where $c$ has higher intensity. These phases can be found as

$$\mathcal{F}_q \triangleq \left\{G^{-1}(y)\right\}_{y \in \mathcal{C}_q}, \tag{12}$$

where $G^{-1}$ is the inverse (with respect to composition) of $G$. Observe that $G^{-1}$ is a continuous, monotone increasing function, since $G$ is itself continuous and monotone increasing. The discrete set $\mathcal{F}_q$ is a subset of $[a, b)$ and concentrates more elements at points where $c$ takes larger values.
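The companding construction in (11) and (12) applies to any positive pattern, even without a closed-form $G^{-1}$. A minimal sketch, assuming NumPy, replaces the exact integral with a trapezoidal cumulative sum on a fine grid and inverts the increasing function $G$ by linear interpolation. The function names and the 3GPP parameter values ($G_{dB} = 8$ dBi, $A_m = 30$ dB, $\phi_{3dB} = 65^\circ$) are illustrative assumptions, not values taken from this work.

```python
import numpy as np

def pattern_3gpp(phi, G_dB=8.0, A_m=30.0, phi_3dB=np.deg2rad(65.0)):
    """Amplitude pattern c(phi) of Eq. (3); parameter values are assumptions."""
    A = -np.minimum(12.0 * (phi / phi_3dB) ** 2, A_m)
    return 10.0 ** ((G_dB + A) / 20.0)

def companding_angles(pattern, a, b, N, grid=20001):
    """N angles splitting the area under the pattern on [a, b) into N + 1 equal parts."""
    phi = np.linspace(a, b, grid)
    c = pattern(phi)
    # Cumulative trapezoidal approximation of G(phi) = int_a^phi c, with G(a) = 0.
    G = np.concatenate(([0.0], np.cumsum(0.5 * (c[1:] + c[:-1]) * np.diff(phi))))
    levels = G[-1] * np.arange(1, N + 1) / (N + 1)   # the set C_q of Eq. (11)
    return np.interp(levels, G, phi)                   # G^{-1} on C_q, Eq. (12)

F_q = companding_angles(pattern_3gpp, -np.pi, np.pi, N=64)
spacing = np.diff(F_q)
print(spacing[31] < spacing[0])   # is the grid denser at boresight than in the back lobe?
```

Because `np.interp` requires increasing sample points, this relies on $c > 0$ everywhere on the grid, which is exactly the condition that makes $G$ strictly increasing in the text.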

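Let us exemplify the procedure of constructing the angle dictionaries using the 3GPP antenna directivity pattern. Define $\phi_0 = \phi_{3dB}\sqrt{A_m/12}$, the angle at which $12(\phi/\phi_{3dB})^2 = A_m$ in (3). As the most general case [36], we assume $a < -\phi_0$ and $b > \phi_0$, so the domain of $c$ can be partitioned into 3 disjoint intervals as $[a, b) = [a, -\phi_0) \cup [-\phi_0, \phi_0) \cup [\phi_0, b)$. Using $c$ in Eq. (3), applying the definition of the cumulative function $G$, and using its continuity, we obtain [34]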

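$$G(\phi) = \begin{cases} (\phi - a)\, 10^{\frac{G_{dB} - A_m}{20}}, & \phi \in [a, -\phi_0), \\ G(-\phi_0) + 10^{\frac{G_{dB}}{20}} \sqrt{\dfrac{\pi\phi_{3dB}^2}{2.4\ln(10)}} \left(\mathrm{erf}\left(\sqrt{0.6\ln(10)\dfrac{A_m}{12}}\right) + \mathrm{sign}(\phi)\,\mathrm{erf}\left(\sqrt{\dfrac{0.6\ln(10)}{\phi_{3dB}^2}}\,|\phi|\right)\right), & \phi \in [-\phi_0, \phi_0), \\ G(\phi_0) + (\phi - \phi_0)\, 10^{\frac{G_{dB} - A_m}{20}}, & \phi \in [\phi_0, b), \end{cases} \tag{13}$$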

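where the odd symmetry $\mathrm{erf}(-x) = -\mathrm{erf}(x)$ was utilized. Upon defining $y_- = G(-\phi_0)$, $y_0 = G(0)$, and $y_+ = G(\phi_0)$, the inverse of $G$ can be calculated from Eq. (13) in closed form as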

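$$G^{-1}(y) = \begin{cases} y\, 10^{\frac{A_m - G_{dB}}{20}} + a, & y \in [0, y_-), \\ -\dfrac{\phi_{3dB}}{\sqrt{0.6\ln(10)}}\,\mathrm{erf}^{-1}\left(\dfrac{2\sqrt{0.6\ln(10)}}{\phi_{3dB}\sqrt{\pi}}\,(y_0 - y)\,10^{-\frac{G_{dB}}{20}}\right), & y \in [y_-, y_0), \\ \dfrac{\phi_{3dB}}{\sqrt{0.6\ln(10)}}\,\mathrm{erf}^{-1}\left(\dfrac{2\sqrt{0.6\ln(10)}}{\phi_{3dB}\sqrt{\pi}}\,(y - y_0)\,10^{-\frac{G_{dB}}{20}}\right), & y \in [y_0, y_+), \\ \phi_0 + (y - y_+)\, 10^{\frac{A_m - G_{dB}}{20}}, & y \in [y_+, G(b)), \end{cases} \tag{14}$$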

where $\mathrm{erf}^{-1}$ is the inverse (with respect to composition) of $\mathrm{erf}$, which is available in several software packages, such as Matlab. The definition of the inverse function in (14) for an interval $[a, b)$ such that $a < -\phi_0$ and $b > \phi_0$ covers the most general case. As one can see in Fig. 2, the point density of this quantization of the angular space indeed reflects the selectivity of the antenna directivity pattern, as desired.
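The closed forms (13) and (14) can be sanity-checked with the Python standard library alone. In the sketch below, the pattern parameters default to illustrative values ($G_{dB} = 8$ dBi, $A_m = 30$ dB, $\phi_{3dB} = 65^\circ$) that are assumptions rather than values from this work. The two middle branches of (14) are merged into one using the odd symmetry of $\mathrm{erf}^{-1}$. Because Python's `math` module provides `erf` but no inverse, $\mathrm{erf}^{-1}$ is obtained from the standard normal quantile via $\mathrm{erf}^{-1}(x) = \Phi^{-1}((1+x)/2)/\sqrt{2}$. The class name `Pattern3GPP` is hypothetical.

```python
import math
from statistics import NormalDist

def erfinv(x):
    """Inverse error function via the standard normal quantile function."""
    return NormalDist().inv_cdf((1.0 + x) / 2.0) / math.sqrt(2.0)

class Pattern3GPP:
    """Cumulative function G of Eq. (13) and its inverse, Eq. (14), on [a, b)."""

    def __init__(self, a, b, G_dB=8.0, A_m=30.0, phi_3dB=math.radians(65.0)):
        self.a, self.b = a, b
        self.phi0 = phi_3dB * math.sqrt(A_m / 12.0)
        if not (a < -self.phi0 and self.phi0 < b):
            raise ValueError("expects a < -phi0 and b > phi0 (general case)")
        self.floor = 10.0 ** ((G_dB - A_m) / 20.0)           # c(phi) outside the main lobe
        self.k = math.sqrt(0.6 * math.log(10.0)) / phi_3dB   # c = 10^(G_dB/20) exp(-(k phi)^2)
        # 10^(G_dB/20) * sqrt(pi * phi_3dB^2 / (2.4 ln 10)): the erf prefactor in (13)
        self.scale = 10.0 ** (G_dB / 20.0) * math.sqrt(math.pi) / (2.0 * self.k)
        self.y_minus = (-self.phi0 - a) * self.floor                        # G(-phi0)
        self.y0 = self.y_minus + self.scale * math.erf(self.k * self.phi0)  # G(0)
        self.y_plus = self.y0 + self.scale * math.erf(self.k * self.phi0)   # G(phi0)

    def G(self, phi):
        if phi < -self.phi0:
            return (phi - self.a) * self.floor
        if phi < self.phi0:
            return self.y0 + self.scale * math.erf(self.k * phi)
        return self.y_plus + (phi - self.phi0) * self.floor

    def G_inv(self, y):
        if y < self.y_minus:
            return self.a + y / self.floor
        if y < self.y_plus:
            return erfinv((y - self.y0) / self.scale) / self.k
        return self.phi0 + (y - self.y_plus) / self.floor

pat = Pattern3GPP(-math.pi, math.pi)
N = 16
G_b = pat.G(pat.b)
F_q = [pat.G_inv(n * G_b / (N + 1)) for n in range(1, N + 1)]   # Eqs. (11)-(12), G(a) = 0
```

For a symmetric sector the resulting grid is symmetric about boresight, and a round trip `pat.G_inv(pat.G(phi))` should return `phi` up to floating-point error.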

## IV UE-based Baseline Limited Feedback Sparse Channel Estimation

This section presents a baseline limited feedback setup where the UE estimates the sparse channel $\mathbf{g}$ and sends back its support along with coarsely quantized versions of the non-zero elements of the estimate $\hat{\mathbf{g}}$.

### IV-A Channel Estimation and Support Identification at UE

The inherent sparsity of $\mathbf{g}$ in (10) suggests the following formulation to recover it at the UE:

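$$\min_{\mathbf{g} \in \mathbb{C}^{G}:\ \|\mathbf{g}\|_0 \leq L} \left\{\frac{1}{2}\|\mathbf{y} - \mathbf{Q}\mathbf{g}\|_2^2\right\}. \tag{15}$$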

The optimization problem in (15) is a non-convex combinatorial problem. Prior art in the compressed sensing (CS) literature has attempted to solve (15) using approximation algorithms, such as orthogonal matching pursuit (OMP) [28], iterative hard thresholding (IHT) [37], and many others; see [29] and references therein. OMP-based algorithms are preferable for sparse channel estimation due to their favorable performance-complexity trade-off [29]. OMP admits simple and even real-time implementation, and its run-time complexity can be further reduced by caching and updating the QR factorization of the submatrix formed by the columns of $\mathbf{Q}$ selected so far [28]. For completeness, the pseudo code for OMP is provided in Algorithm 1. For a detailed discussion of the implementation details and performance guarantees of the OMP algorithm, the reader is referred to [38].
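As a complement to Algorithm 1, which is not reproduced here, a textbook OMP sketch for (15) in NumPy is shown below. It is an illustration under stated assumptions, not the implementation used in this work. It re-solves the least-squares problem on the current support at every iteration instead of caching a QR factorization, and its stopping rule (an iteration cap `max_iter` plus a relative residual tolerance `tol`) is an assumed choice.

```python
import numpy as np

def omp(Q, y, max_iter, tol=1e-6):
    """Orthogonal matching pursuit for problem (15)."""
    col_norms = np.linalg.norm(Q, axis=0)
    support, r = [], y.copy()
    coef = np.zeros(0, dtype=complex)
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(y):
            break
        # Select the column most correlated with the current residual.
        corr = np.abs(Q.conj().T @ r) / col_norms
        if support:
            corr[support] = 0.0            # never reselect an atom
        support.append(int(np.argmax(corr)))
        # Re-fit all selected coefficients by least squares (orthogonal projection).
        coef, *_ = np.linalg.lstsq(Q[:, support], y, rcond=None)
        r = y - Q[:, support] @ coef
    g_hat = np.zeros(Q.shape[1], dtype=complex)
    if support:
        g_hat[support] = coef
    return g_hat, support

rng = np.random.default_rng(3)
Q = (rng.standard_normal((96, 256)) + 1j * rng.standard_normal((96, 256))) / np.sqrt(2)
g = np.zeros(256, dtype=complex)
g[[10, 77, 200]] = [1.0, -0.8j, 0.6 + 0.6j]          # hypothetical 3-sparse channel
g_hat, support = omp(Q, Q @ g, max_iter=3)
print(sorted(support))
```

The returned `support` is exactly the index set that setup 1 feeds back, and `g_hat[support]` holds the coefficients that are subsequently scalar-quantized.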

### IV-B Scalar Quantization and Limited Feedback

After estimating the sparse vector $\hat{\mathbf{g}}$, associated with an estimate $\hat{\mathbf{G}}$ of the interaction matrix, a simple feedback technique is to send coarsely quantized non-zero elements of $\hat{\mathbf{g}}$, along with the corresponding indices. In this work we use Lloyd's scalar quantizer to quantize the non-zero elements of $\hat{\mathbf{g}}$, and we denote the scalar quantization operation by $\mathrm{SQ}(\cdot)$. Upon receiving the bits associated with the non-zero indices $\mathcal{S}$ and elements $\mathrm{SQ}(\hat{\mathbf{g}}_{\mathcal{S}})$ of $\hat{\mathbf{g}}$, the BS reconstructs the channel matrix via (8), provided it has perfect knowledge of the SQ threshold values. As the channel model in (10) has a sparse structure with $L$ non-zero elements, for a suitably designed training sequence $\mathbf{S}$ and a sufficient number of training symbols, this approach tends to yield a channel estimate with $L$ non-zero elements.

Using a $B$-bit real scalar quantizer, each non-zero element of the complex vector $\hat{\mathbf{g}}$ can be represented using $\lceil\log_2 G\rceil + 2B$ bits, where the first term accounts for index coding and the second for coding the real and imaginary parts. Hence, the total number of feedback bits needed to estimate the interaction matrix at the BS scales as $L(\lceil\log_2 G\rceil + 2B)$. In the worst case, OMP runs for its maximum number of iterations, and the worst-case feedback overhead grows in proportion to that number instead of $L$. Note that the number of feedback bits of the proposed UE-based baseline limited feedback algorithm is independent of $M_T$.
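The bit budget above and a Lloyd scalar quantizer can be sketched as follows. This is a hypothetical illustration. Training the quantizer on standard Gaussian samples, using one codebook for both real and imaginary parts, and the helper names are all assumptions, not the configuration used in this work.

```python
import math
import numpy as np

def lloyd_quantizer(samples, bits, iters=100):
    """Lloyd's algorithm for a 2**bits-level scalar quantizer (1-D k-means)."""
    K = 2 ** bits
    levels = np.quantile(samples, (np.arange(K) + 0.5) / K)   # initial codebook
    for _ in range(iters):
        thresholds = 0.5 * (levels[1:] + levels[:-1])          # nearest-neighbor cells
        cells = np.digitize(samples, thresholds)
        for k in range(K):
            members = samples[cells == k]
            if members.size:
                levels[k] = members.mean()                      # centroid condition
        levels.sort()
    return levels, 0.5 * (levels[1:] + levels[:-1])

def sq(values, levels, thresholds):
    """Scalar quantization SQ(.) of real values."""
    return levels[np.digitize(values, thresholds)]

def feedback_bits(num_nonzeros, G, bits):
    """Index bits ceil(log2 G) plus 2*bits for real and imaginary parts, per non-zero."""
    return num_nonzeros * (math.ceil(math.log2(G)) + 2 * bits)

rng = np.random.default_rng(1)
levels, thr = lloyd_quantizer(rng.standard_normal(100_000), bits=1)
g_nz = np.array([0.3 - 1.2j, -0.8 + 0.1j])            # hypothetical non-zero entries
g_sq = sq(g_nz.real, levels, thr) + 1j * sq(g_nz.imag, levels, thr)
print(feedback_bits(num_nonzeros=3, G=1024, bits=3))  # 3 * (10 + 6) = 48
```

For a 1-bit quantizer on Gaussian data, Lloyd's iterations converge to levels near $\pm\sqrt{2/\pi} \approx \pm 0.80$, and only the thresholds (here a single one near zero) need to be known at the BS.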

## V BS-based Limited Feedback Sparse Channel Estimation

In order to reduce the feedback overhead without irrevocably sacrificing our ability to recover accurate CSI at the BS, we propose to apply a pseudo-random dimensionality-reducing linear operator $\mathbf{P}^H$ to $\mathbf{y}$. The outcome is quantized with a very simple sign quantizer, whose output is fed back to the BS through a low-rate channel. More precisely, the BS receives

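$$\mathrm{sign}\left(\mathcal{R}(\mathbf{P}^H\mathbf{y})\right) + j\,\mathrm{sign}\left(\mathcal{I}(\mathbf{P}^H\mathbf{y})\right), \tag{16}$$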

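where $\mathbf{P} \in \mathbb{C}^{M_R N_{tr} \times N_{fb}}$, with $N_{fb} < M_R N_{tr}$, and $\mathrm{sign}(\cdot)$ is applied element-wise.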

To facilitate operating in the more convenient real domain, consider the following definitions

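$$\mathbf{C}_R^\top \triangleq \left[\mathcal{R}(\mathbf{Q}^H\mathbf{P})^\top \;\; \mathcal{I}(\mathbf{Q}^H\mathbf{P})^\top\right], \tag{17a}$$

$$\mathbf{C}_I^\top \triangleq \left[-\mathcal{I}(\mathbf{Q}^H\mathbf{P})^\top \;\; \mathcal{R}(\mathbf{Q}^H\mathbf{P})^\top\right], \tag{17b}$$

$$\mathbf{C} \triangleq \left[\mathbf{C}_R \;\; \mathbf{C}_I\right] = \left[\mathbf{c}_1\ \mathbf{c}_2\ \cdots\ \mathbf{c}_{2N_{fb}}\right] \in \mathbb{R}^{2G \times 2N_{fb}}, \tag{17c}$$

$$\mathbf{x} \triangleq \left[\mathcal{R}(\mathbf{g})^\top \;\; \mathcal{I}(\mathbf{g})^\top\right]^\top \in \mathbb{R}^{2G}, \tag{17d}$$

$$\mathbf{b} \triangleq \left[\mathbf{b}_R^\top \;\; \mathbf{b}_I^\top\right]^\top = \left[b_1\ b_2\ \cdots\ b_{2N_{fb}}\right]^\top \in \mathbb{R}^{2N_{fb}}, \tag{17e}$$

$$\mathbf{z} \triangleq \left[\mathbf{z}_R^\top \;\; \mathbf{z}_I^\top\right]^\top = \left[z_1\ z_2\ \cdots\ z_{2N_{fb}}\right]^\top \in \mathbb{R}^{2N_{fb}}, \tag{17f}$$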

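with $\mathbf{b}_R = \mathrm{sign}(\mathcal{R}(\mathbf{P}^H\mathbf{y}))$, $\mathbf{b}_I = \mathrm{sign}(\mathcal{I}(\mathbf{P}^H\mathbf{y}))$, $\mathbf{z}_R = \mathcal{R}(\mathbf{P}^H\mathbf{n})$, and $\mathbf{z}_I = \mathcal{I}(\mathbf{P}^H\mathbf{n})$. Using the above, along with (16), the received feedback bits at the BS are given by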

$$b_i = \mathrm{sign}\left(\mathbf{c}_i^\top\mathbf{x} + z_i\right), \quad i = 1, 2, \ldots, 2N_{fb}. \tag{18}$$
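The step from the UE-side sign feedback (16) to the real-valued model (17) and (18) can be verified numerically. In this hypothetical NumPy sketch, the sizes, the sparse $\mathbf{g}$, the noise level, and the choice of $\mathbf{P}$ as a random matrix with orthonormal columns (obtained by QR) are illustrative assumptions. The UE side consists of one matrix-vector product followed by taking signs.

```python
import numpy as np

rng = np.random.default_rng(2)
m, G, N_fb = 40, 64, 12          # m = M_R * N_tr; hypothetical sizes
Q = (rng.standard_normal((m, G)) + 1j * rng.standard_normal((m, G))) / np.sqrt(2)
g = np.zeros(G, dtype=complex)
g[[3, 17]] = [1.0 + 0.5j, -0.7j]
n = 0.05 * (rng.standard_normal(m) + 1j * rng.standard_normal(m))
y = Q @ g + n

# Pseudo-random P with orthonormal columns, assumed known at both UE and BS.
P, _ = np.linalg.qr(rng.standard_normal((m, N_fb)) + 1j * rng.standard_normal((m, N_fb)))

# UE side, Eq. (16): one matrix-vector product, then 2 * N_fb sign bits.
w = P.conj().T @ y
b = np.concatenate([np.sign(w.real), np.sign(w.imag)])

# BS side, Eq. (17): equivalent real-valued model.
M = Q.conj().T @ P                          # Q^H P, G x N_fb
C_R = np.vstack([M.real, M.imag])           # transpose of (17a), 2G x N_fb
C_I = np.vstack([-M.imag, M.real])          # transpose of (17b)
C = np.hstack([C_R, C_I])                   # (17c), 2G x 2N_fb
x = np.concatenate([g.real, g.imag])        # (17d)
v = P.conj().T @ n
z = np.concatenate([v.real, v.imag])        # (17f)

print(np.allclose(C.T @ x + z, np.concatenate([w.real, w.imag])))  # True
print(np.array_equal(np.sign(C.T @ x + z), b))                      # Eq. (18)
```

The BS never needs $\mathbf{y}$ itself: given $\mathbf{P}$ and the dictionaries, it can form $\mathbf{C}$ locally and work only with the $2N_{fb}$ received bits.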

The objective at the BS is to estimate $\mathbf{x}$, given $\mathbf{b}$ and $\mathbf{C}$. If the complex vector $\mathbf{g}$ has $L$ non-zero elements, then the real vector $\mathbf{x}$ has up to $2L$ non-zero elements. More precisely, $\mathbf{x}$ has $L$ active (real, imaginary) element pairs, i.e., it exhibits group sparsity of order $L$, where the groups are predefined pairs. In our experiments, we noticed that this distinction hardly makes a difference in practice; in the sequel, we therefore drop group sparsity in favor of simple sparsity.

Note that the number of feedback bits, $2N_{fb}$, is controlled by the dimension of $\mathbf{P}$, which the designer chooses to balance channel estimation accuracy against the feedback rate. Since $\|\mathbf{x}\|_0 \leq 2L$, compressive sensing theory indicates that the number of measurements needed to recover $\mathbf{x}$ grows at least on the order of $L\log(G/L)$ [38]. In practice, depending on the examined cellular setting, it is usually easy to have a rough idea of $L$ [23].

### V-A Single-Bit Compressed Sensing Formulation

Single-bit compressed sensing (CS) has attracted significant attention in the compressed sensing literature [39, 40, 41, 42], where the goal is to reconstruct a sparse signal from single-bit measurements. Since noiseless sign measurements are invariant to positive scaling of the signal, existing single-bit CS algorithms make explicit assumptions on the norm of $\mathbf{x}$ [39, 40, 42]. Thus, the solution of single-bit CS problems is always a sparse vector on a unit hypersphere. In our context, we seek a sparse $\mathbf{x}$ that yields maximal agreement between the observed and the reconstructed signs. This suggests the following formulation

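$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x} \in \mathbb{R}^{2G}} \left\{-\sum_{i=1}^{2N_{fb}} \mathrm{sign}(\mathbf{c}_i^\top\mathbf{x})\, b_i + \zeta\|\mathbf{x}\|_0\right\}, \tag{19}$$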

where $\zeta > 0$ is a regularization parameter that controls the sparsity of the optimal solution. Unfortunately, the optimization problem in (19) is non-convex and requires exponential complexity to be solved to global optimality. In addition, the scaling of $\mathbf{x}$ cannot be determined from (19): if $\hat{\mathbf{x}}$ is an optimal solution, so is $\kappa\hat{\mathbf{x}}$ for any $\kappa > 0$. Therefore, the following convex surrogate of problem (19) is considered

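$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x} \in \mathbb{R}^{2G}:\ \|\mathbf{x}\|_2 \leq R_2} \left\{-\mathbf{x}^\top\mathbf{C}\mathbf{b} + \zeta\|\mathbf{x}\|_1\right\}, \tag{20}$$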

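where $R_2$ is an upper bound on the norm of $\mathbf{x}$, which also prevents meaningless scaling up of $\mathbf{x}$ when $\zeta$ is small. We found that setting $R_2$ on the same order of magnitude as a quantity expressing the aggregated power of the wireless channel gain coefficients in Eq. (2) works very well. Since $-\mathbf{x}^\top\mathbf{C}\mathbf{b} = -\sum_{i} b_i\,\mathbf{c}_i^\top\mathbf{x}$, problem (20) replaces the sign agreement in (19) by a linear correlation and the $\ell_0$ penalty by an $\ell_1$ penalty. The cost function in (20) is known to be an effective surrogate of the one in (19), both in theory and in practice. If the elements of $\mathbf{C}$ are drawn from a Gaussian distribution, the formulation in (20) recovers a sparse $\mathbf{x}$ on the unit hypersphere (i.e., $\|\mathbf{x}\|_2 = 1$) with $\epsilon$-accuracy using a number of measurements that grows with the sparsity level and only logarithmically with the dimension of $\mathbf{x}$ [42].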

Interestingly, problem (20) admits a closed-form solution, given by [42]

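$$\hat{\mathbf{x}} = \begin{cases} \mathbf{0}, & \|\mathbf{C}\mathbf{b}\|_\infty \leq \zeta, \\ R_2\,\dfrac{T(\zeta; \mathbf{C}\mathbf{b})}{\|T(\zeta; \mathbf{C}\mathbf{b})\|_2}, & \text{otherwise}, \end{cases} \tag{21}$$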

where, for $v \geq 0$, $T(v; \mathbf{x})$ denotes the shrinkage-thresholding operator, given by

$$[T(v; \mathbf{x})]_i = (|x_i| - v)_+\,\mathrm{sign}(x_i), \quad i = 1, 2, \ldots, 2G. \tag{22}$$

The overall computational cost of computing (21) is $\mathcal{O}(G N_{fb})$, dominated by the product $\mathbf{C}\mathbf{b}$. A key advantage of the adopted CS method is that it is a closed-form expression, and thus it is easily implementable in real time.
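Because (21) is a closed-form expression, the BS-side single-bit CS estimator takes only a few lines. The NumPy sketch below is illustrative. The function names, the Gaussian $\mathbf{C}$, the noiseless sign model, and the choice $\zeta = 3\sqrt{2N_{fb}}$ in the demo are assumptions made for the example, not tuned values from this work.

```python
import numpy as np

def shrink(v, x):
    """Shrinkage-thresholding operator T(v; x) of Eq. (22)."""
    return np.maximum(np.abs(x) - v, 0.0) * np.sign(x)

def one_bit_cs(C, b, zeta, R2):
    """Closed-form solution (21) of the convex surrogate (20)."""
    v = C @ b                                # O(G * N_fb) work
    if np.max(np.abs(v)) <= zeta:
        return np.zeros_like(v)
    t = shrink(zeta, v)
    return R2 * t / np.linalg.norm(t)

rng = np.random.default_rng(4)
n_x, n_b = 200, 400                          # 2G and 2N_fb (hypothetical)
x = np.zeros(n_x)
x[[5, 50, 120, 180]] = [0.5, -0.5, 0.5, -0.5] # unit-norm sparse vector
C = rng.standard_normal((n_x, n_b))
b = np.sign(C.T @ x)                         # noiseless sign measurements
x_hat = one_bit_cs(C, b, zeta=3 * np.sqrt(n_b), R2=1.0)
print(float(x_hat @ x))                      # cosine similarity (both vectors have unit norm)
```

Here $\zeta$ is set at a few standard deviations of the entries of $\mathbf{C}\mathbf{b}$ off the support (each of which has variance $2N_{fb}$ for a Gaussian $\mathbf{C}$), so that the thresholding mostly keeps the true support.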

### V-B Sparse Maximum-Likelihood Formulation

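Let $\mathbf{P}$ be a semi-unitary matrix, i.e., $\mathbf{P}^H\mathbf{P} = \mathbf{I}_{N_{fb}}$. Because $\mathbf{n}$ is a circularly-symmetric complex Gaussian vector with white covariance, the noise vector $\mathbf{z}$ satisfies $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \sigma_z^2\mathbf{I}_{2N_{fb}})$, where $\sigma_z^2$ equals half the variance of each entry of $\mathbf{n}$. Hence, each $b_i$ is a $\pm 1$-valued random variable (RV) with $\Pr(b_i = 1) = Q\left(-\mathbf{c}_i^\top\mathbf{x}/\sigma_z\right)$. In addition, since $\mathbf{z}$ is Gaussian with a diagonal covariance matrix, all $b_i$ are independent of each other.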

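In the proposed sparse maximum-likelihood (ML) formulation, the sparse channel parameter vector $\mathbf{x}$ is estimated by maximizing the regularized log-likelihood of the sign observations $\mathbf{b}$, given $\mathbf{C}$. Using the independence of the $b_i$, and noting that $\Pr(b_i \mid \mathbf{x}) = Q\left(-b_i\mathbf{c}_i^\top\mathbf{x}/\sigma_z\right)$ for $b_i \in \{\pm 1\}$, the sparse ML problem can be formulated as [43]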

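$$\inf_{\mathbf{x} \in \mathbb{R}^{2G}} \left\{-\sum_{i=1}^{2N_{fb}} \ln Q\left(-\frac{b_i\,\mathbf{c}_i^\top\mathbf{x}}{\sigma_z}\right) + \zeta\|\mathbf{x}\|_1\right\}, \tag{23}$$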

where $\zeta > 0$ is a tuning regularization parameter that controls the sparsity of the solution. Let us denote $f(\mathbf{x}) = -\sum_{i=1}^{2N_{fb}} \ln Q\left(-b_i\mathbf{c}_i^\top\mathbf{x}/\sigma_z\right)$ and $F(\mathbf{x}) = f(\mathbf{x}) + \zeta\|\mathbf{x}\|_1$. The above is a convex optimization problem, since the $Q$-function is log-concave [44, p. 104]. By the Weierstrass theorem, the minimum in (23) always exists, since the objective $F$ is a coercive function, meaning that $F(\mathbf{x}^{(k)}) \to \infty$ for any sequence $\{\mathbf{x}^{(k)}\}$ such that $\|\mathbf{x}^{(k)}\| \to \infty$ [45, p. 495]. A choice of $\zeta$ that guarantees that the all-zero vector is not a solution of (23) is $\zeta < \|\nabla f(\mathbf{0})\|_\infty$ (the proof of this claim relies on a simple application of optimality conditions using subdifferential calculus [45]), where the gradient of $f$ is given by [17]

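$$\nabla f(\mathbf{x}) = -\sum_{i=1}^{2N_{\rm fb}} \frac{b_i\, e^{-\frac{(\mathbf{c}_i^\top\mathbf{x})^2}{2\sigma_z^2}}}{\sqrt{2\pi}\,\sigma_z\, Q\!\left(-\frac{b_i\mathbf{c}_i^\top\mathbf{x}}{\sigma_z}\right)}\,\mathbf{c}_i. \tag{24}$$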
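Evaluating (24) at $\mathbf{x} = \mathbf{0}$, where every $Q$-term equals $1/2$, gives $\nabla f(\mathbf{0}) = -\sqrt{2/\pi}\,\mathbf{C}\mathbf{b}/\sigma_z$; by the $\ell_1$ optimality condition, $\mathbf{0}$ minimizes (23) exactly when $\zeta \ge \|\nabla f(\mathbf{0})\|_\infty$. The sketch below implements $f$ and (24), checks the gradient against finite differences, and computes that threshold. The list-of-vectors layout `cols` $= [\mathbf{c}_1, \ldots, \mathbf{c}_{2N_{\rm fb}}]$ and all function names are assumptions for illustration.

```python
import math


def qfunc(u):
    """Gaussian tail probability Q(u)."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))


def dot(a, c):
    return sum(ai * ci for ai, ci in zip(a, c))


def f_value(x, cols, b, sigma_z):
    """Data term of (23): f(x) = -sum_i ln Q(-b_i c_i^T x / sigma_z).
    math.log fails if Q underflows to 0, i.e. for extremely large margins."""
    return -sum(math.log(qfunc(-bi * dot(c, x) / sigma_z)) for c, bi in zip(cols, b))


def f_grad(x, cols, b, sigma_z):
    """Gradient (24); cols holds the vectors c_i, each of length 2G."""
    g = [0.0] * len(x)
    for c, bi in zip(cols, b):
        cx = dot(c, x)
        w = bi * math.exp(-cx * cx / (2.0 * sigma_z ** 2)) / (
            math.sqrt(2.0 * math.pi) * sigma_z * qfunc(-bi * cx / sigma_z))
        for j, cj in enumerate(c):
            g[j] -= w * cj
    return g


def zero_threshold(cols, b, sigma_z):
    """||grad f(0)||_inf: x = 0 solves (23) if and only if zeta >= this value."""
    return max(abs(v) for v in f_grad([0.0] * len(cols[0]), cols, b, sigma_z))
```

In practice, `zero_threshold` gives an upper limit for sweeping $\zeta$: any value strictly below it yields a non-trivial estimate.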

It is worth noting that the minimizer of problem (23) can also be viewed as the maximum a-posteriori probability (MAP) estimate of $\mathbf{x}$, under the assumption that the elements of $\mathbf{x}$ are independent of each other and follow a Laplacian distribution.
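The MAP interpretation can be verified in one line. Assuming, for illustration, i.i.d. Laplacian entries with rate $\zeta$ (which ties the prior scale to the regularization weight), and using the independence of the sign observations, $p(\mathbf{b}\mid\mathbf{x}) = \prod_i Q(-b_i\mathbf{c}_i^\top\mathbf{x}/\sigma_z)$ and

$$p(\mathbf{x}) = \prod_{j=1}^{2G}\frac{\zeta}{2}e^{-\zeta|x_j|}
\;\Longrightarrow\;
-\ln p(\mathbf{b}\mid\mathbf{x}) - \ln p(\mathbf{x}) = -\sum_{i=1}^{2N_{\rm fb}}\ln Q\!\left(-\frac{b_i\mathbf{c}_i^\top\mathbf{x}}{\sigma_z}\right) + \zeta\|\mathbf{x}\|_1 - 2G\ln\frac{\zeta}{2}.$$

The last term does not depend on $\mathbf{x}$, so maximizing $p(\mathbf{x}\mid\mathbf{b}) \propto p(\mathbf{b}\mid\mathbf{x})\,p(\mathbf{x})$ is equivalent to solving (23).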

The Hessian of $f$ is given by [17]

$$\nabla^2 f(\mathbf{x}) = \mathbf{C}\,\mathrm{diag}(\mathbf{m}(\mathbf{x}))\,\mathbf{C}^\top, \tag{25}$$

where the elements of vector $\mathbf{m}(\mathbf{x}) \in \mathbb{R}^{2N_{\rm fb}}$ are given by

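$$m_i(\mathbf{x}) = \frac{e^{-\frac{(\mathbf{c}_i^\top\mathbf{x})^2}{\sigma_z^2}}}{2\pi\sigma_z^2\left[Q\!\left(-\frac{b_i\mathbf{c}_i^\top\mathbf{x}}{\sigma_z}\right)\right]^2} + \frac{b_i\,(\mathbf{c}_i^\top\mathbf{x})\, e^{-\frac{(\mathbf{c}_i^\top\mathbf{x})^2}{2\sigma_z^2}}}{\sqrt{2\pi}\,\sigma_z^3\, Q\!\left(-\frac{b_i\mathbf{c}_i^\top\mathbf{x}}{\sigma_z}\right)}, \tag{26}$$

for $i = 1,2,\ldots,2N_{\rm fb}$. Having calculated the Hessian, the submultiplicativity of the spectral norm (a Cauchy-Schwarz-type inequality for matrix norms) yields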

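$$\|\nabla^2 f(\mathbf{x})\|_2 \le \|\mathbf{C}\|_2\,\|\mathrm{diag}(\mathbf{m}(\mathbf{x}))\|_2\,\|\mathbf{C}^\top\|_2 = \|\mathbf{C}\|_2^2\,\|\mathbf{m}(\mathbf{x})\|_\infty \triangleq L(\mathbf{x}), \quad \forall \mathbf{x}\in\mathbb{R}^{2G}. \tag{27}$$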

Note that $L(\mathbf{x})$ is bounded whenever $\mathbf{x}$ is bounded.
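A quick numerical sanity check of (26)-(27) is sketched below, working with the scalar $\mathbf{c}_i^\top\mathbf{x}$ (called `cx`). It compares $m_i$ with a finite-difference derivative of the scalar gradient weight from (24). It also checks a standard property that makes the boundedness concrete: writing $u_i = -b_i\mathbf{c}_i^\top\mathbf{x}/\sigma_z$, one can verify that $\sigma_z^2 m_i$ equals the derivative of the Gaussian inverse Mills ratio at $u_i$, which lies in $(0, 1)$, so $0 < m_i(\mathbf{x}) < 1/\sigma_z^2$. The function names are hypothetical, and $\|\mathbf{C}\|_2^2$ is assumed to be supplied by the caller.

```python
import math


def qfunc(u):
    return 0.5 * math.erfc(u / math.sqrt(2.0))


def grad_term(cx, bi, sigma_z):
    """d/d(cx) of -ln Q(-bi*cx/sigma_z): the scalar weight multiplying c_i in (24)."""
    q = qfunc(-bi * cx / sigma_z)
    return -bi * math.exp(-cx * cx / (2.0 * sigma_z ** 2)) / (math.sqrt(2.0 * math.pi) * sigma_z * q)


def hessian_weight(cx, bi, sigma_z):
    """m_i(x) of (26), written in terms of the scalar cx = c_i^T x."""
    q = qfunc(-bi * cx / sigma_z)
    first = math.exp(-cx * cx / sigma_z ** 2) / (2.0 * math.pi * sigma_z ** 2 * q * q)
    second = bi * cx * math.exp(-cx * cx / (2.0 * sigma_z ** 2)) / (
        math.sqrt(2.0 * math.pi) * sigma_z ** 3 * q)
    return first + second


def curvature_bound(C_spec_norm_sq, weights):
    """L(x) of (27): ||C||_2^2 * ||m(x)||_inf."""
    return C_spec_norm_sq * max(abs(w) for w in weights)
```

At $\mathbf{c}_i^\top\mathbf{x} = 0$, for example, (26) reduces to $m_i = 2/(\pi\sigma_z^2)$.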

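An accelerated gradient method for the $\ell_1$-regularized problem in (23) is utilized, where the sequences $\{\mathbf{x}^{(t)}\}$ and $\{\mathbf{u}^{(t)}\}$, together with the scalar sequence $\{\beta^{(t)}\}$, are generated according to [46]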

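$$\begin{aligned}
\mathbf{x}^{(t+1)} &= \mathcal{T}\!\left(\frac{\zeta}{L(\mathbf{u}^{(t)})};\ \mathbf{u}^{(t)} - \frac{1}{L(\mathbf{u}^{(t)})}\nabla f(\mathbf{u}^{(t)})\right), && \text{(28a)}\\
\beta^{(t+1)} &= \frac{1 + \sqrt{1 + 4(\beta^{(t)})^2}}{2}, && \text{(28b)}\\
\mathbf{u}^{(t+1)} &= \mathbf{x}^{(t+1)} + \frac{\beta^{(t)} - 1}{\beta^{(t+1)}}\left(\mathbf{x}^{(t+1)} - \mathbf{x}^{(t)}\right). && \text{(28c)}
\end{aligned}$$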

For bounded $L(\cdot)$, which holds in our case, the sequence generated by the updates in (28) converges to an $\epsilon$-optimal solution (i.e., a point whose objective value is within $\epsilon$ of the optimal value) using at most $\mathcal{O}(1/\sqrt{\epsilon})$ iterations [46].
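The $\mathcal{O}(1/\sqrt{\epsilon})$ iteration count of accelerated schemes is driven by the scalar recursion (28b): since $\beta^{(t+1)} \ge \beta^{(t)} + 1/2$, the sequence grows at least linearly, which yields the familiar $\mathcal{O}(1/t^2)$ decay of the objective gap. The sketch below generates $\beta^{(t)}$ and the extrapolation weights of (28c); the starting value $\beta^{(0)} = 1$ is the usual choice but is an assumption here, and the function names are illustrative.

```python
import math


def beta_sequence(num_iters, beta0=1.0):
    """Momentum scalars generated by (28b); beta0 = 1 is an assumed starting value."""
    betas = [beta0]
    for _ in range(num_iters):
        b = betas[-1]
        betas.append((1.0 + math.sqrt(1.0 + 4.0 * b * b)) / 2.0)
    return betas


def extrapolation_weights(betas):
    """Coefficients (beta^(t) - 1) / beta^(t+1) multiplying x^(t+1) - x^(t) in (28c)."""
    return [(betas[t] - 1.0) / betas[t + 1] for t in range(len(betas) - 1)]
```

The first weight is zero (a plain proximal gradient step), and the weights then approach one, so the extrapolation becomes increasingly aggressive; this is what an adaptive restart later resets.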

Algorithm 2 summarizes the proposed first-order $\ell_1$-regularized algorithm, which incorporates Nesterov's extrapolation. In addition, an adaptive restart mechanism [31] is utilized to further speed up convergence; in our experiments, it works remarkably well. At line 1, a quantity that remains fixed across iterations is precomputed. The per-iteration complexity of the proposed algorithm is $\mathcal{O}(GN_{\rm fb})$, due to the evaluation of $\nabla f(\mathbf{u}^{(t)})$ and $L(\mathbf{u}^{(t)})$ at lines 4 and 5, respectively. In the worst case, the MLE-reg algorithm iterates $\mathcal{O}(1/\sqrt{\epsilon})$ times, for a total computational cost of $\mathcal{O}(GN_{\rm fb}/\sqrt{\epsilon})$. This complexity is linear in the problem dimensions, and is thus affordable at a typical BS.

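To reconstruct an estimate of the downlink channel, the BS obtains the complex-valued channel parameter vector from $\hat{\mathbf{x}}$ (see Eq. (17d)) and forms an estimate of the interaction matrix by applying the inverse of the vectorization operation. With this estimate available, the downlink channel matrix is then estimated by combining the estimated interaction matrix with the AoA and AoD dictionaries.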
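The two bookkeeping steps above (real-to-complex conversion and inverse vectorization) can be sketched as follows. Because the exact stacking in Eq. (17d) is not reproduced in this section, the sketch assumes, hypothetically, that $\hat{\mathbf{x}}$ stacks all real parts first and then all imaginary parts, and that vectorization is column-major; both function names are illustrative.

```python
def real_to_complex(x):
    """Hypothetical convention: x = [Re(v); Im(v)], real parts first, then imaginary parts."""
    G = len(x) // 2
    return [complex(x[k], x[G + k]) for k in range(G)]


def unvec(v, rows, cols):
    """Inverse of column-major vectorization: entry (r, c) is stored at index c * rows + r."""
    if len(v) != rows * cols:
        raise ValueError("length of v must equal rows * cols")
    return [[v[c * rows + r] for c in range(cols)] for r in range(rows)]
```

For example, the real vector $(1,2,3,4,5,6,7,8)$ maps to the complex vector $(1+5j, 2+6j, 3+7j, 4+8j)$, which `unvec(..., 2, 2)` arranges column by column into a $2\times 2$ matrix.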

## VI Hybrid Limited Feedback Sparse Channel Estimation With Reduced Computational Cost

The last setup proposed in this work is a hybrid between the setups presented in Sections IV and V. This third setup is better suited to cases in which the UE can afford to run simple channel estimation algorithms, such as OMP. The UE-based support identification algorithm presented in Algorithm 1 is combined with the BS-based limited feedback schemes of Section V, resulting in an algorithm that can significantly reduce the computational cost at the BS, and possibly even the overall feedback overhead for a given accuracy.

The UE first estimates the support of the downlink channel vector using Algorithm 1, which yields a sparse channel estimate. (Having the support of the complex-valued vector, the support of $\mathbf{x}$ can also be inferred easily through Eq. (17d).) As feedback, the UE sends the indices associated with the non-zero elements of this estimate, using $\lceil\log_2 G\rceil$ bits per index, along with the $2N_{\rm fb}$ sign-quantized bits $\mathbf{b}$ associated with the received signal. Upon receiving $\mathbf{b}$ and an estimate $\mathcal{S}_{\hat{\mathbf{x}}}$ of the support of $\mathbf{x}$, the BS exploits the fact that the elements of $\mathbf{x}$ are zero in the complement of $\mathcal{S}_{\hat{\mathbf{x}}}$, implying that

$$b_i = \mathrm{sign}\Big(\sum_{j\in\mathcal{S}_{\hat{\mathbf{x}}}} c_{i,j}\,x_j + z_i\Big), \quad i = 1,2,\ldots,2N_{\rm fb}, \tag{29}$$

and applies either of the two limited feedback channel estimation algorithms presented in Sections V-A and V-B, this time restricted to the reduced support $\mathcal{S}_{\hat{\mathbf{x}}}$, to obtain the final estimate. The whole procedure is listed in Algorithm 3.
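As an illustration of (29), the sketch below restricts every $\mathbf{c}_i$ to the fed-back support, runs the closed-form estimate (21) on the reduced problem, and embeds the result back into a $2G$-dimensional vector. Using (21) rather than the ML variant is an arbitrary choice for brevity, and the names `restrict`, `closed_form`, and `hybrid_estimate` are hypothetical. Only $|\mathcal{S}_{\hat{\mathbf{x}}}|$ of the $2G$ coordinates are ever touched.

```python
import math


def shrink(v, x):
    """Shrinkage-thresholding operator of (22)."""
    return [math.copysign(max(abs(xi) - v, 0.0), xi) for xi in x]


def restrict(cols, support):
    """Keep, for every c_i, only the entries indexed by the fed-back support, as in (29)."""
    return [[c[j] for j in support] for c in cols]


def closed_form(cols, b, zeta, R2):
    """Closed-form estimate (21) for measurement vectors cols = [c_1, ..., c_{2N_fb}]."""
    n = len(cols[0])
    Cb = [sum(bi * c[j] for c, bi in zip(cols, b)) for j in range(n)]
    if max(abs(v) for v in Cb) <= zeta:
        return [0.0] * n
    t = shrink(zeta, Cb)
    norm = math.sqrt(sum(v * v for v in t))
    return [R2 * v / norm for v in t]


def hybrid_estimate(cols, b, support, zeta, R2):
    """Solve (21) on the reduced support only, then embed the result into the full vector."""
    reduced = closed_form(restrict(cols, support), b, zeta, R2)
    x = [0.0] * len(cols[0])
    for j, v in zip(support, reduced):
        x[j] = v
    return x
```

Entries outside the support are zero by construction, which mirrors the structural prior the BS exploits in (29).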

At the BS, the computational complexity of the hybrid limited feedback sparse estimation algorithms invoked in Algorithm 3 is reduced, compared to the pure BS-based counterparts of Section V, by a factor equal to the ratio between the full dimension $2G$ and the support size $|\mathcal{S}_{\hat{\mathbf{x}}}|$. Since the support size is typically small, the extra feedback bits spent on the support indices make the computational cost of the BS reconstruction algorithms depend only on the support size and $N_{\rm fb}$, independently of the joint dictionary size $G$. Numerical results show that not only does the complexity diminish, but the estimation error can also be further reduced compared to the case of not sending the support information. This can in turn be used to reduce $N_{\rm fb}$, if so desired.
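The trade-off can be made concrete with a toy calculation. The sketch assumes, hypothetically, that the UE sends $s$ complex support indices out of $G$ atoms at $\lceil\log_2 G\rceil$ bits each, that the real support then has $2s$ entries, and that BS cost scales with the number of entries of $\mathbf{C}$ touched per pass. All inputs are illustrative values, not the paper's simulation parameters.

```python
import math


def hybrid_bookkeeping(G, s, n_fb):
    """Toy accounting for the hybrid scheme; all inputs are hypothetical.

    G:    number of complex dictionary atoms (x has 2G real entries).
    s:    number of complex support indices fed back (2s real entries of x).
    n_fb: N_fb, so that 2 * n_fb sign bits are fed back.
    """
    index_bits = s * math.ceil(math.log2(G))   # one index out of G per support element
    sign_bits = 2 * n_fb
    full_cost = (2 * G) * (2 * n_fb)            # entries of C touched over all 2G rows
    reduced_cost = (2 * s) * (2 * n_fb)         # entries touched on the reduced support
    return index_bits, sign_bits, full_cost / reduced_cost
```

With $G = 1024$, $s = 8$, and $N_{\rm fb} = 64$, the support costs $80$ extra bits on top of $128$ sign bits, while the per-pass BS cost drops by a factor of $128$.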

## VII Numerical Results

The double directional channel model in Eq. (2) is used, with a uniform antenna directivity pattern at the UE and either a uniform or the 3GPP antenna directivity pattern at the BS. Both the BS and the UE are equipped with uniform linear arrays (ULAs). A variety of performance metrics are examined, such as the normalized root mean-squared error (NRMSE), beamforming gain, and multiuser sum-capacity. The uplink feedback channel is considered error-free. The following algorithms are compared:

• LS channel estimation at the UE, followed by quantization of the elements of the LS estimate using a scalar quantizer with a fixed number of bits per real number. The total feedback is therefore exactly this number of bits times the number of real values fed back. This scheme is abbreviated LS-SQ.

• For the case of , we also include in the comparisons a VQ technique that applies (a) LS channel estimation at the UE, followed by (b) VQ of the LS estimate, and (c) feedback of the VQ index. The VQ strategy of [13] based on a -PSK codebook