I Introduction
Millimeterwave (mmWave) communication with the aid of massive multipleinput multipleoutput (MIMO) is an enabling technology for nextgeneration mobile communications, since the abundant spectrum resource at mmWave frequency band can boost the throughput by orders of magnitude [1, 2]. To mitigate the severe path loss for mmWave signal, massive MIMO is usually integrated into mmWave communications to form beams for directional signal transmission [3, 4]. However, the powerful fullydigital MIMO architecture, which requires a radio frequency (RF) chain for each antenna, is unaffordable for mmWave massive MIMO, due to the prohibitive hardware cost and power consumption of RF chains required [5]. The hybrid MIMO architecture with a much smaller number of RF chains than that of antennas offers a practical solution by using hybrid analog/digital beamforming [6]. Nonetheless, for such a hybrid MIMO system, it is challenging to estimate the highdimensional mmWave channel from the lowdimensional effective measurements observed from the limited number of RF chains, since the training overhead for channel estimation (CE) can be excessively high [2]. Moreover, the low signaltonoise ratio (SNR) before beamforming can further degrade the performance of channel state information (CSI) acquisition [7].
Ia Related Work
Several approaches were proposed in the literature to acquire CSI for narrowband mmWave communications, including codebookbased beam training [8, 9, 10, 11] and compressed sensing (CS)based CE [12, 13, 14]. The beam training approaches were initially adopted in analog beamforming, e.g., IEEE standards 802.11ad [8] and 802.15.3c [2], where the transceiver exhaustively searches for the optimal beam pair from a predefined codebook to maximize the received SNR for improved transmission performance. To reduce the search dimension of codebooks for achieving lower training overhead, the multistage overlapped beam patterns were designed in [9], where the beam patterns can become narrow as the training stage increases. However, these schemes only consider the analog beamforming with singlestream transmission. For hybrid beamforming with multistream transmission, beam training solutions with hierarchical multibeam codebooks were proposed in [10, 11], where the optimal multibeam pairs can be acquired after hierarchical beam search with gradually finer and narrower beams. However, the training overhead of a beam training scheme is usually proportional to the dimension of codebook, which is very large for fulldimensional (FD) MIMO with a large number of antennas. By exploiting the inherent angledomain sparsity of mmWave MIMO channels, several CSbased CE schemes were proposed to reduce the CE overhead [12, 13]. In [12], the orthogonal matching pursuit (OMP) algorithm was considered to estimate sparse mmWave channels by formulating the CSI acquisition problem as a sparse signal recovery problem, where a redundant dictionary with nonuniformly quantized angledomain girds was designed for improved performance. Furthermore, a Bayesian CSbased CE scheme was proposed in [13] by considering the impact of transceiver hardware impairments. Besides, by leveraging the lowrank property of mmWave channels, a CANDECOMP/PARAFAC decompositionbased CE scheme [14] was proposed with further improved performance.
The aforementioned solutions [9, 10, 11, 12, 13, 14] only consider frequencyflat mmWave channels but practical mmWave channels can be frequency selective. A distributed grid matching pursuit (DGMP) algorithm was proposed in [15] to estimate timedispersive channels, where orthogonal frequency division multiplexing (OFDM) is considered. An adaptive grid matching pursuit (AGMP) algorithm developed from the DGMP was proposed to reduce power leakage by using adaptive grid matching solution [16]. In [17], the sparse mmWave channels at different subcarriers were estimated separately by utilizing the OMP, but the computational complexity is high as the number of subcarriers is typically large. To reduce complexity, a simultaneous weighted (SW)OMP based scheme was proposed in [18], which exploits the angledomain common sparsity of channels at different subcarriers to improve performance. By leveraging the common sparsity of delaydomain channels among transceiver antenna pairs, a block CSbased CE solution was proposed for mmWave fullydigital MIMO system [19]
, where the training sequences are designed to improve CE performance. Based on the lowrank property of wideband mmWave channels, the training signal received can be formulated as a highorder tensor with the lowrank CANDECOMP/PARAFAC decomposition to estimate the dominated channel parameters, including angleofarrivals/angleofdepartures (AoAs/AoDs) and delays
[20]. However, most CSbased CE schemes for wideband mmWave MIMO usually adopt discrete AoAs/AoDs grids in CS dictionary, but the practical AoAs/AoDs of multipath components (MPCs) are continuously distributed. This mismatch may degrade CE performance. Moreover, the stateoftheart works [9, 10, 11, 12, 13, 14, 15, 16, 19, 17, 18, 20, 21] usually focus on the ideal uniform linear array (ULA) while seldom investigate the practical uniform planar array (UPA). Compared to the ULA, the UPA offers more compact array with threedimensional (3D) beamforming in both horizontal and vertical directions [22, 23], leading to the FDMIMO. Although mmWave FDMIMO CE has been investigated in [22] and [23], they only consider either fullydigital MIMO or frequencyflat channels.IB Our Contributions
We propose a closedloop sparse CE scheme for multiuser wideband mmWave FDMIMO systems by exploiting the sparsity of MPCs in both angle and delay domains. To illustrate this sparsity, we consider the mmWave FDMIMO based aerial base station (BS), as shown in Fig. 1, which has flexible deployment capacity to serve user devices (UDs) in hotspot areas [24]. In such systems, the mmWave airground channels exhibit inherent sparsity in both angle and delay domains due to the limited significant scatterers. By carefully designing the transmit beamforming or precoding and receive combining in the training stage, our solution is capable of acquiring the superresolution estimates of AoAs, AoDs, and delays^{1}^{1}1By contrast, the stateoftheart CSbased solutions [15, 16, 17, 18] only focus on the angledomain sparsity and design the limited resolution CS dictionary with quantized angledomain grids. This quantization error limits the achievable CE performance. based on spectrum estimation theory [25, 26] with low training overhead and computational complexity. To clearly show the novelty and new contribution of our proposed solution as well as to contrast it with the existing solutions, below we conceptually explain our proposed closedloop sparse CE scheme.
Our closedloop solution includes the downlink CE stage followed by the uplink CE stage as illustrated in Fig. 2, where the channel reciprocity in time division duplex (TDD) based systems is exploited [27, 28]. In TDD based systems, downlink AoAs (AoDs) are uplink AoDs (AoAs). The frame structure of our solution is further depicted in Fig. 3. As shown in Figs. 2 and 3, at the downlink CE stage, the horizontal/vertical AoAs of sparse MPCs are first estimated at each UD and they are fed back to the BS with limited quantization accuracy through the feedback link. At this stage, we design a common random transmit precoder at the BS to transmit the training signal for omnidirectional channel sounding and we design the receive combiners at each UD to visualize the highdimensional hybrid array as a lowdimensional digital array, which facilitates the use of the multidimensional unitary ESPRIT (MDUESPRIT) algorithm to estimate channel parameters. Similarly, at the uplink CE stage, the horizontal/vertical AoDs and delays associated with different UDs are successively estimated by using the MDUESPRIT algorithm at the BS. Owing to the channel reciprocity, the AoAs estimated at UD side can be utilized as a priori to design the multibeam transmit precoder to improve the receive SNR for uplink CE. A maximum likelihood (ML) approach is adopted at the BS to pair the channel parameters acquired at these two stages and, consequently, the associated path gains can readily be obtained using the least squares (LS) estimator. Finally, the mmWave channel associated with each UD can be separately reconstructed based on the dominant channel parameters estimated above.
In contrast to the existing solutions, our main contributions are summarized as follows:

We propose a closedloop sparse CE solution with the common downlink CE stage for all UDs and the dedicated uplink CE stage for each UD. At the downlink CE stage, the BS fully exploits its large transmit power to allow multiple UDs simultaneously perform CE for saving training overhead, and only horizontal/vertical AoAs estimation with low computational complexity is required at UDs. At the uplink CE stage, the multibeam transmit precoder is designed at UDs to enhance CE accuracy, and the BS with high computational capacity can jointly estimate horizontal/vertical AoDs and delays. By contrast, the stateoftheart CSbased solutions [15, 16, 17, 18] are all based on openloop approach, which imposes high computational complexity and storage space requirements on the receiver^{2}^{2}2More specifically, downlink openloop CE imposes prohibitive complexity and storage requirements on UDs, while uplink openloop CE suffers from the low receive SNR and thus poor CE performance due to the limited transmit power of UDs.. It is worth emphasizing that by exploiting the horizontal/vertical AoAs estimated at the first stage, the multibeam transmit precoder designed at UDs significantly improves SNR, which works even for the case that the number of MPCs is larger than that of RF chains, i.e., the number of beams can be larger than that of RF chains.

We design the receive combiners at UDs and BS for visualizing the hybrid array as a digital array, to enable the application of spectrum estimation techniques. For hybrid MIMO, it is challenging to directly apply spectrum estimation techniques to estimate channel parameters, since the shiftinvariance structure of array response matrix observed from the digital baseband domain does not hold [26, 29]. Our solution sheds light on how to apply spectrum estimation techniques, e.g., ESPRITtype algorithms, to hybrid MIMO, so that the superresolution estimation of channel parameters can be acquired with low training overhead and computational complexity. By contrast, to achieve the highresolution estimation of channel parameters, the existing CSbased solutions [15, 16, 17, 18] usually rely on the redundant dictionary, which imposes the excessively high computational complexity and storage requirements for FDMIMO systems with largescale antenna arrays.

The double sparsity of MPCs in both angle and delay domains is harnessed in our proposed CE scheme. By leveraging the double sparsity, our scheme formulates the CE problem as a multidimensional spectrum estimation problem, where the superresolution estimations of horizontal/vertical AoAs/AoDs and delays can be obtained simultaneously. By comparison, the existing CE solutions [15, 16, 18] only consider the angledomain sparsity of mmWave channels. Moreover, the delaydomain CE approaches of [17, 19] have to estimate the effective delaydomain channel impulse response (CIR), which includes the timedomain pulse shaping filter (PSF) that can weaken the delaydomain channel sparsity. By contrast, the superresolution estimation of delays in our solution is immune to PSF.
Throughout this paper, boldface lower and uppercase symbols denote column vectors and matrices, respectively.
, , , , and denote the conjugate, transpose, Hermitian transpose, matrix inversion, integer ceiling and integer floor operators, respectively. and are norm and norm of , respectively, while is Frobenius norm of , and is the cardinality of the set . The Kronecker and KhatriRao product operations are denoted by and , respectively. denotes the identity matrix and is the null matrix of size , while () denotes the vector of size with all the elements being (). is the diagonal matrix with the elements of at its diagonal entries, denotes the vector consisting of the main diagonal elements of , and denotes the block diagonal matrix with as its block diagonal entries. The expectation and determinant operators are denoted by and , respectively. The modulo operation returns the remainder of dividing by , and returns the set containing of the ordered set . The operator returns the set containing the indices of nonzero elements of , and converts the vector of size into the matrix of size by successively selecting every elements of as its columns. The operator stacks the columns of on top of each another, denotes the throw and thcolumn element of , and is the vector consisting the th to th elements of , while is the submatrix containing the th to th columns of . denotes the submatrix containing the rows of indexed in the order set , and is the th column of . Finally, and denote the real part and imaginary part of the argument, respectively.Ii Downlink Channel Estimation Stage
Consider the mmWave FDMIMO system with hybrid beamforming, where the BS and UDs are all equipped with UPA, and OFDM with subcarriers is adopted, while independent signal streams are transmitted on each subcarrier [15]. The BS (UD) employs () antennas and () RF chains, where () and () are the numbers of antennas in horizontal and vertical directions at the BS (UD), respectively.
Iia Downlink Channel Estimation Signal Model
The downlink CE stage lasts time slots and each time slot contains OFDM symbols. The signal received by the th UD at the th subcarrier of the th OFDM symbol in the th time slot can be expressed as
(1) 
for , , and . In (1), the UD’s receive combining matrix in which and are the analog and digital receive combining matrices, and the BS’s transmit precoding matrix in which and are the analog and digital transmit precoding matrices, respectively, while is the corresponding downlink channel matrix, is the training signal with , and is the complex additive white Gaussian noise (AWGN) vector with the covariance matrix , i.e., . Due to the constant modulus of the phase shift network (PSN), and with , and is the quantized phase set of the PSN with the resolution , given by
(2) 
Also to guarantee the constraint on the total transmit power [5].
According to the typical mmWave channel model [20, 15, 16, 19], the downlink delaydomain continuous channel matrix with MPCs can be expressed as
(3) 
where is the normalization factor, is the delay of the th MPC, and denotes the equivalent PSF, while the complex gain matrix is given by
(4) 
where is the associated complex path gain, () and () denote the horizontally and vertically spatial frequencies with halfwavelength antenna spacing at the th UD (the BS), respectively. Here, () and () are the downlink horizontal and vertical AoAs (AoDs) of the th MPC associated with the UPA, respectively. The array response vector at UD is given by [23, 25], in which
(5)  
(6) 
are the steering vectors associated with the horizontal and vertical directions, respectively. Similarly, the array response vector at BS is given by , where the horizontal and vertical direction steering vectors and are given respectively by substituting and with and in (5) as well as by substituting and with and in (6).
The frequencydomain channel matrix at the th subcarrier can then be expressed as
(7) 
where denotes the system bandwidth, and is the sampling period. The derivation of the first equation in (7) is shown in Appendix. Observe that does not depend on the PSF, and it exhibits the sparsity in delay domain due to small but large normalized delay spread. Recall that the existing CSbased solutions of [17, 19] have to estimate the effective delaydomain CIRs that include the PSF, and this PSF will destroy the delaydomain sparsity of mmWave channels when the order of PSF is large. in (7) can be rewritten as
(8) 
where is the diagonal matrix in which with and , and the array response matrix associated with the AoAs of the th UD can be expressed as in which and are the steering matrices corresponding to the horizontally and vertically spatial frequencies, respectively, while is the array response matrix associated with the AoDs in which the steering matrices and have the similar form as and , respectively.
IiB Obtain Horizontal/Vertical AoAs at UD
The downlink CE corresponds to Step 1 to Step 4 of Fig. 2, where the horizontal and vertical AoAs are estimated. The same training signal is transmitted at every subcarrier, i.e., , and the same digital transmit precoding/receive combining matrices are adopted at every subcarrier, i.e., and , for . The number of independent signal streams associated with each subcarrier in each OFDM symbol is . We can visualize a lowdimensional digital subUPA, in which and are the numbers of antennas in horizontal and vertical directions, from the highdimensional hybrid analog/digital UPA. Given , the BS only requires time slots to broadcast training signals, with each time slot containing OFDM symbols. The choice of , and trades off estimation accuracy with training overhead, because larger , and lead to better estimation accuracy but higher training overhead, and vice versa. Since the signals received by all UDs have the same form, we can focus on the th UD and the user index can be omitted from , , , , , , and other relavent variables for clarity.
By collecting the received signals of (1) associated with the th subcarrier over all the OFDM symbols of the th time slot into the signal matrix , we have
(9) 
where and . Since the BS transmits the common random signal , the transmit precoding matrix
should be a random matrix. This is achieved by designing
as with randomly and uniformly selected from , and designing as with randomly and uniformly selected from the interval , i.e., . The BS can use the same transmit precoding matrix to send the same sounding signal for every time slot. By stacking the received signal matrices of (9) over the time slots into , we have(10) 
where contains the downlink receive combining matrices used in the time slots, and , while is the corresponding noise matrix.
Multiplying with and aggregating the resulting signals over all the subcarriers lead to the signal matrix
(11) 
where , with , and . Observe from (10) and (11) that we cannot directly apply powerful spectrum estimation techniques [25, 30] to estimate the horizontal/vertical AoAs from , since the shiftinvariance structure of the array response matrix does not hold in hybrid receive array [26, 29]. We propose to visualize the highdimensional hybrid array as a lowdimensional digital array by designing appropriate receive combining matrices so that the shiftinvariance structure of array response can be reconstructed and therefore superresolution CE based on spectrum estimation techniques can be harnessed.
IiC Design Receive Combining Matrices at UD
Without loss of generality, we consider
independent signal streams. First, we utilize a unitary matrix
to design the digital receive combining matrix of the th time slot’s receive combining matrix for . Specifically, . To design the analog receive combining matrix , we construct the matrix as(12) 
where . Then we take the submatrix and define to construct the ordered index set with . Next we perform the modulo operation on with to get the ordered index set with . The rows of whose indices correspond to are determined by as , while the rest rows of consist of the identical . The phase value of arbitrary element in the designed , denoted by , is then quantized to by minimizing the Euclidean distance according to . Thus, the th receive combining matrix can be obtained as for .
The proposed design for is summarized in Algorithm 1. Since the number of RF chains is usually the power of 2, we can adopt Hadamard matrix for
or discrete Fourier transform matrix for
to construct . Clearly, our design can be used for the PSN with arbitrary , even the extremely low resolution PSN with . With the designed , we have(13) 
Clearly, () is the matrix containing the first () rows of (). Thus, maintains the double shiftinvariance structure of the original array response matrix for both horizontal and vertical AoAs [29], and the design can be used to visualize the highdimensional hybrid analog/digital array as a lowdimensional digital array. Therefore, we can utilize the MDUESPRIT algorithm detailed in Section IV to obtain the superresolution estimates of horizontal/vertical AoAs at UD. Since the ESPRITtype algorithms [25, 26, 29, 30] require the knowledge of the number of MPCs, we next turn to the task of acquiring the number of MPCs at the receiver, i.e., Step 2 of Fig. 2.
IiD EVDBased Estimate for Number of MPCs
In OFDM systems, the channels of multiple adjacent subcarriers within coherence bandwidth are highly correlated [31]. If the maximum delay spread is with delay taps, the channel coherence bandwidth is . Then we can jointly use the measurements of adjacent subcarriers to estimate the number of MPCs, where is the subcarrier’s bandwidth. Specifically, by dividing signal matrices into groups, we can obtain the th measurement matrix , as the average of the measurements in the th group
(14) 
The average measurements are collected as , and the covariance matrix of is
. According to the eigenvalue decomposition (EVD), we obtain
Comments
There are no comments yet.