Massive multiple-input multiple-output (MIMO) systems rely on a large number of base station (BS) antennas for simultaneously serving a few dozens of users, while striking an attractive spectral efficiency (SE) vs. energy efficiency (EE) trade-off [1, 2, 3]. However, a large number of antennas inevitably lead to an excessive radio-frequency (RF) hardware cost and energy consumption. Hence, a hardware-efficient hybrid analog-digital (HAD) beamforming architecture has been proposed as an alternative to the fully digital beamformer for practical implementation. Explicitly, this architecture relies on an analog beamformer in the RF domain combined with a low-dimensional digital beamformer in the baseband, hence allowing a significant reduction in the number of RF chains required. This in turn provides increased design flexibility for striking an attractive performance vs. complexity trade-off. The receiver’s power dissipation is dominated by that of the analog-to-digital converters (ADCs), since it increases exponentially with the number of quantization bits ,. Hence, low-resolution ADCs (LADCs) have been advocated for the RF chains of HAD receivers.
I-a Related Work
Extensive research efforts have been invested in the design and performance evaluation of hybrid beamformers [7, 8, 9, 10, 11, 12, 13, 14]. Basically, the existing hybrid beamforming architectures can be mainly divided into the static partially-connected structure [8, 9, 10], the static fully-connected structure [8, 9, 11], and the dynamic partially-connected or fully-connected structure [12, 13, 14]. The authors of  considered a single-user millimeter wave (mmWave) MIMO system and treated the hybrid beamforming weight design as a matrix factorization problem. Efficient alternating optimization algorithms were developed for both static partially-connected and static fully-connected structures. As a further exploration, the authors of 
proposed heuristic algorithms for the design of static partially-connected and fully-connected HAD beamformers, permitting to maximize the overall SE of a broadband orthogonal frequency-division multiplexing (OFDM)-based system. Besides, the proposed algorithm in for the fully-connected structure can achieve SE close to that of the optimal fully-digital solution with much less number of RF chains. Moreover, to dynamically adapt to the spatial channel covariance matrix and improve the system performance, [12, 13, 14] proposed to design the dynamic-connected HAD beamforming architecture to realize a flexible analog beamforming matrix. In , a dynamic sub-array approach was considered for OFDM systems, and a greedy algorithm to optimize the array partition based on the long-term channel characteristics was suggested. Different from the partially-adaptive-connected structure in , the authors in  proposed to implement the hybrid precoder with a fully-adaptive-connected structure. A joint optimization of switch-controlled connections and the hybrid precoders was formulated as a large-scale mixed-integer nonconvex problem with high dimensional power constraints. By modifying the on-off states of switch-controlled connections, this fully-adaptive-connected structure can realize a fully-connected structure or any possible sub-connected structure. Furthermore, the HAD beamforming strategy has also been investigated in the context of novel relay-aided systems [15, 16] and in Terahertz communications . Since employing LADCs in the massive MIMO regime has become indispensable for reducing the power consumption and hardware cost, it has catalyzed substantial interest in the recent literature. The authors of 
analyzed the performance for transmission over flat-fading MIMO channel using single-bit ADC and derived the capacity upper-bound both at infinite and finite signal-to-noise ratios (SNR). The impact of the spatial correlation of antennas on the rate loss caused by the coarse quantization of LADCs was further studied in, where the authors concluded that LADCs can achieve a sum rate performance much closer to the case of ideal ADCs under spatially correlated channels.
On account of the benefits provided by HAD beamformers and LADCs, a number of studies have been proposed to characterize the performance of massive MIMO systems relying on the HAD beamforming architecture using LADCs. The pioneering contribution of  proposed a generalized hybrid architecture using LADCs and verified that the achievable rate is comparable to that obtained by high-precision ADC based receivers at low and medium SNRs, which provides valuable insights for future research. Intensive research efforts have also been dedicated to analog/digital beamforming design [21, 22], to SE/EE optimization [23, 24, 25], to channel estimation  and to signal detection .
However, previous research on HAD beamforming using LADCs has mainly considered uniform quantizers having a fixed, predetermined number of bits, which limited the performance of these systems due to coarse quantization. As a further advance, it was shown that a variable-resolution ADC or adaptive-resolution ADC (RADC) architecture is preferable [28, 29]. In , the authors investigated a mixed-ADC structure designed for cloud radio access networks (C-RAN). In particular, they developed an ADC-resolution selection algorithm for maximizing either the SE or EE based on an approximation of the generalized mutual information in the low-SNR regime. In , the authors developed a pair of ADC bit allocation strategies for minimizing the quantization error effects under a total ADC power constraint, thereby achieving an improved performance. However, the separate design of the ADC quantization bit allocation and hybrid beamforming matrices tends to suffer from performance degradation. Hence, the authors of  jointly optimized both the on/off modes of the RF processing chains and the number of ADC quantization bits.
As a further development, the authors of  aimed for jointly optimizing the sampling resolution of ADCs and the hybrid beamforming matrices, which results in energy efficient solutions for point-to-point mmWave MIMO systems. In light of , the authors of  extended the joint design to multiuser systems, hence achieving a significantly improved EE compared to the existing schemes. The potential advantages of RADCs in the context of various practical systems have also been reported in [33, 34, 35]. The authors of  focused their attention on the uplink of mmWave systems using RADCs and investigated the associated joint resource allocation and user scheduling problem. In  the design of the reconfigurable intelligent surface (RIS) aided mmWave uplink system relying on RADCs was investigated, demonstrating that an RIS is capable of mitigating the performance erosion imposed by RADCs.
Nevertheless, the aforementioned studies are mainly based on perfect instantaneous channel state information (CSI), which is assumed to be known at the BS. In practice, the acquisition of perfect CSI cannot be achieved in massive MIMO systems due to the inevitable channel estimation errors . Therefore, how to efficiently design the pilot signals for improving the precision of channel estimation is of paramount importance. In practice a codebook-based pilot scheme is preferred, where orthogonal pilot sequences are chosen from a given codebook as a benefit of its low-complexity implementation and low feedback overhead [37, 38]. However, allocating mutually orthogonal pilot sequences to a large number of users for avoiding interference during channel estimation would require excessive pilot lengths and their orthogonality would still be destroyed upon convolution with the dispersive channel impulse response (CIR). For this reason, the carefully constructed reuse of a limited set of orthogonal pilot sequences for different users for example is of paramount importance for high-precision channel estimation. Further design alternatives were proposed for massive MIMO systems for example in [38, 39], which dispense with a codebook, hence they may be termed as codebook-free solutions. However, they tend to require a higher feedback overhead for attaining a high channel estimation accuracy.
I-B Main Contributions
Despite the above advances, there is a paucity of research contributions on jointly optimizing the pilot sequences, the number of ADC quantization bits, and the HAD combiner for achieving high-precision channel estimation in the uplink of a multiuser massive MIMO system employing RADCs. Hence our inspiration is to fill this knowledge-gap. In particular, both codebook-based and codebook-free pilot schemes are investigated, where for each scheme, we aim to minimize the mean square error (MSE) of the channel estimate subject to a transmit power constraint, to the constant-modulus constraint imposed on the elements of the analog combining matrix, and to the additional constraints on the number of quantization bits. In a nutshell, the main contributions of this paper over the existing literature lie in the following:
We focus on jointly optimizing the pilot sequences, the HAD combiners and the RADC bit allocation in the presence of a correlated Rayleigh fading channel model. The channel estimation mean square error (MSE) minimization problem is formulated, which only requires the knowledge of channel statistics under practical operating conditions.
We first transform the highly nonconvex optimization problem into an equivalent but more tractable form by introducing auxiliary variables and employing fractional programming (FP) techniques. Then, we develop a new block coordinate descent (BCD) based algorithm for a codebook-free channel estimation scheme and a penalty dual decomposition (PDD) based algorithm for a codebook-based channel estimation scheme. Both of these iterative algorithms ensure convergence to the set of stationary solutions of the original optimization problem. Furthermore, the computational complexity of the proposed algorithms is analyzed.
A simplified low-complexity algorithm is also presented for the codebook-based channel estimation scheme, which solves the MSE minimization problem suboptimally but efficiently.
To characterize the benefits of our proposed algorithms, we provide exhaustive simulation results in terms of the MSE, sum rate and feedback overhead for a range of pertinent system settings. We demonstrate that through the coordinated allocation of bits to the RADCs, the proposed algorithms can beneficially exploit the knowledge of channel statistics to accomplish the pilot design, while minimizing interference and improving the channel estimation accuracy.
I-C Organization and Notation
This paper is organized as follows. In Section II and III, we introduce the investigated system model and formulate the optimization problem for the constrained channel estimation, respectively. In Section IV and V, we propose efficient algorithms by solving the formulated problems for the codebook-free and codebook-based pilot schemes, respectively. Section VI provides simulation results to appraise the performance of the proposed algorithms. The paper is concluded in Section VII, whilst proofs and detailed derivations appear in the Appendices.
Notations: For a matrix , , , , and
denote its transpose, conjugate, conjugate transpose and vectorization, respectively.denotes the element at the intersection of row and column . For a square matrix , , , and represents its trace, inverse and Frobenius norm, respectively. denotes a diagonal matrix consisting of the diagonal elements of .
denotes an identity matrix. For a vector, denotes a diagonal matrix with along its main diagonal and denotes the Euclidean norm of vector . The symbol denotes the Kronecker product. and respectively denote the real and magnitude parts of a complex number. denotes the largest integer less than or equal to and denotes the smallest integer greater than or equal to . We let () denote complex (real) space. denotes the expectation and .
Ii System Model
As shown in Fig. 1, we consider a multiuser massive MIMO uplink system that adopts a static fully-connected hybrid AD combining structure with RADCs at the BS. The BS which is equipped with antennas and RF chains, serves single-antenna users simultaneously. The baseband output of each RF chain is fed to a dedicated RADC that employs variable bit resolution to quantize the real and imaginary parts of each analog signal. Moreover, we assume that the BS and users are fully time-synchronized.
Ii-a Channel Model
Without loss of generality, we consider a narrowband correlated channel model. Let represent the uplink channel from user to the BS. Then, the channel vector can be expressed as
where is a vector with independent and identically distributed (i.i.d.) elements distributed as , and denotes the channel covariance matrix for user . describes the spatial correlation properties of the channel due to macroscopic effects of propagation, including path-loss and shadowing. When the users are quasi-stationary, the path-loss and shadowing can be readily obtained based on the distance between the BS and user and stored at the BS as a priori ,.
Ii-B Pilot Sequences
In this work, we focus on two types of pilot sequences, i.e., the codebook-free and codebook-based pilots. For ease of exposition, we denote as as the pilot sequence transmitted by user , where is the length of the pilot sequence during each coherence interval. It is noteworthy that is predetermined based on the coherence budget.
where denotes the transmit power budget for user .
2) Codebook-based pilots: We denote the available codebook as , where denotes the -th () potential pilot sequence. It is assumed that the different pilot sequences meet the orthonormality conditions, i.e., , and . Then, the pilot sequence of user is constructed as
where denotes the transmit power of user and denotes the codebook sequence allocated to user .
Ii-C Uplink Training
In the uplink training phase, the BS estimates the uplink channels based on the pilot sequences simultaneously transmitted from the users. Focusing on the -th user, the received pilot signal at the BS can be expressed as
where the first term represents the desired contribution from user , the second term represents multiuser interference, and denotes the additive complex Gaussian noise matrix with i.i.d. entries following the distribution .
An analog combining matrix is employed to process the received signal at the BS with the goal of suppressing interference from the other users. In the HAD architecture, the analog combiner is typically implemented using phase shifters , which imposes constant-modulus constraints on the elements of the matrix . The output of the analog combiner is given by
We employ RADCs to quantize as shown in Fig. 1, which enables flexible quantization bit allocation for each baseband channel according to the radio propagation characteristics. Such a refined design can efficiently mitigate the quantization errors and greatly improve the system performance with reduced hardware cost and power consumption. Let integer denote the number of available quantization bits of RADC . Assuming that the gain of automatic gain control is appropriately set, the additive quantization noise model (AQNM) can be employed to reformulate the quantized signal . Then, based on the AQNM, the quantized output is specialized to
where is the element-wise quantization function, is a diagonal gain matrix. Here, the quantization gain is a function of the number of quantization bit and defined as , where is a normalized quantization error. For , can be expressed exactly in terms of , while for , they can be approximated by . is the additive quantization noise which is independent of . To facilitate analytical derivations, we vectorize and obtain . obeys the complex Gaussian distribution with zero mean and covariance matrix
where and .
Finally, we leverage the digital processing techniques for quantization loss mitigation and interference cancellation. Specifically, the retrieved signal of user at the output of the digital combiner is expressed as
Iii Problem Statement
In this section, we first introduce the minimum MSE (MMSE)-based estimator and then formulate the problem under investigation.
Iii-a MMSE Channel Estimation
The BS aims to estimate the channel based on the received pilot signal . To facilitate analytical derivations and achieve the estimation of the desired channel using the MMSE estimator , we vectorize in (8) and obtain shown at the top of this page.
Defining as the MMSE estimate of the channel , we have
where we define , , and We note that matrix is positive definite and therefore invertible. The corresponding MSE of user is given by
The detailed derivations of and are shown in Appendix A. Then, the total MSE for the estimation of all the user channels can be expressed as
Iii-B Problem Formulation
We note that the effectiveness of hybrid beamforming mainly depends on the accuracy of CSI. In this work, we concentrate on the joint design of the pilot sequence, HAD combiner, and the allocation of ADC quantization bits, aiming to minimize the total MSE (12) for the given system model. To simplify notations, we introduce , , and . Since the matrices in (12) do not depend on the optimization variables , and , minimizing the is equivalent to maximizing . Hence, we consider the following optimization problem
where and respectively denote the minimum and maximum number of quantization bits (), denotes the average number of quantization bits and is provided as the total budget of quantization bits at the BS. Constraint (13c) is imposed to enforce constant-modulus on the elements of analog combining matrix . Constraint (13d) limits the range of the quantization bits for each RADC, while constraint (13e) gives a threshold on the total ADC quantization bits at the BS.
It should be emphasized that problem (13) is extremely difficult to solve due to the constant-modulus constraints, the nonconvex mixed-integer feasible set, and the highly nonconvex objective function with matrix ratio term . To be specific, in the matrix fractional structure of the objective function, continuous variables and discrete variable appear in both the denominator and the numerator, which makes the problem intractable.
Iv Proposed BCD-Based Algorithm for the Codebook-Free Pilot Scheme
In this section, we focus on the codebook-free channel estimation where the pilot sequences meet condition (2). We first convert problem (13) into an equivalent and mathematically tractable one based on the FP method. Then, an efficient BCD-based joint design algorithm is proposed to solve the resulting problem, where a series of subproblems can be tackled via alternating optimization.
Iv-a Problem Transformation
With the aid of the advanced matrix FP techniques , we employ the ratio-decoupling approach to transform problem (13) into a more tractable yet equivalent form. To this end, we first introduce the auxiliary variable for each ratio term . Then problem (13) can be converted into the equivalent problem
It is observed that the constraint regarding to in problem (14) are separable with respect to the other variables, i.e., .111Actually, matrix is not involved in the constraints. When these variables are fixed, each auxiliary variable can be optimally determined as follows:
The detailed proofs of the equivalence between problem (13) and problem (14), as well as of the optimal solution (15) for are deferred to Appendix B. Using the matrix quadratic transformation, where the cost function in (13a) is replaced by that in (14a), we effectively decouple the numerator and denominator in each term and avoid the difficulties posed by the nonconvex fractional objective function.
Iv-B Proposed BCD-Based Algorithm
Note that the constraints (13b)-(13e) in problem (14) are uncoupled with respect to the variables , i.e., each one of the constraints involve only one of these variables at a time. Hence, to reach a solution, we can decompose (14) into several independent subproblems each involving a single variable and solve problem (14) by means of the BCD algorithm. The corresponding developments are elaborated in further details below.
1) Optimization of : In order to cope with the difficulties posed by the discrete integer variables , we first relax into a continuous value , solve the resulting problem for , and finally round each optimal continuous value to the nearest integer . To determine the best integer quantization bits and efficiently control quantization error, we employ the following criterion for :
where is properly chosen so that is satisfied. Notice that simple rounding to with can always satisfy constraint (13e) and reduce the power consumption while increasing the MSE and quantization errors. On the other hand, rounding to with may violate the constraint (13e), i.e., . Hence, considering (16), we apply the bisection method to find the optimal value of  and consequently, determine the optimal allocation of the quantization bits, which greatly achieves the quantization error control.
We denote as the vector of continuous variables after relaxation, to be used in (14) in place of . With the remaining variables being fixed, the subproblem for is formulated as
where the objective function is expressed as
It is rather challenging to globally solve a nonconcave optimization problem such as (17), due to the nature of the objective function. To address this difficulty, we therefore resort to successively solving a sequence of strongly concave approximate problems. Specifically, by virtue of the successive concave approximation (SCA) method , we construct a concave surrogate function in lieu of the objective function (IV-B) at each iteration. The surrogate function used at iteration takes the form
where is the gradient of with respect to at the current point
(which is calculated based on the chain rule),is a positive constant, and the term is used to ensure the strong concavity of . Therefore, at the -th iteration of the SCA algorithm, we need to solve the following linearly constrained quadratic surrogate problem to update :
whose solution can be efficiently obtained using the off-the-shelf CVX solver . Accordingly, we can find the optimal in an iterative fashion, and finally apply the procedure in (16) to obtain the desired integer solution .
2) Optimization of : By keeping the other variables fixed, the subproblem for becomes a quadratic optimization problem with constant-modulus constraints, which is given by
To handle the nontrivial constant-modulus constraints resulting from the analog combiner, we use the one-iteration BCD-type algorithm  to recursively solve problem (21). The detailed derivation is shown in Appendix C.
3) Optimization of : Similarly, the corresponding unconstrained subproblem for is given by
where each term in the sum only involve the corresponding vector . Therefore, problem (22) can be further decomposed into a sequence of simple per-user cases, each one being a quadratic optimization problem. These per-user subproblems can be efficiently solved by the first order condition . Specifically, the optimal value of can be derived as
4) Optimization of : We now turn to the optimization of the pilot matrix while fixing the other variables . In this case, the key step is to rewrite the objective function (14a) in a quadratic form with respect to , which results into
with being the th row vector of the matrix , and being the th vector on the column of , and is a constant term independent of variable . Therefore, the subproblem for the codebook-free pilot matrix is given by
which can be solved by the Lagrange multiplier method. By associating the Lagrange multiplier to the corresponding power budget constraint , the Lagrange function for (26) is given by
By examing the first order optimality condition for , the optimal value of can be derived as
where should be optimally determined as
The corresponding BCD-based algorithm is summarized in Algorithm 1.
Iv-C Convergence Analysis and Computational Complexity
This subsection establishes the local convergence of Algorithm 1 to stationary solutions and presents its detailed computational complexity analysis. First, we introduce a key lemma, which can be readily proved according to .
In the following, we analyze the computational complexity of Algorithm 1. We use the number of multiplications as a measure of complexity and assume that . Updating the auxiliary variables in (15) involves the calculation of , and the inverse of based on Gauss-Jordan elimination with an overall complexity . According to the proposed one-iteration BCD type algorithm , updating all the entries of once has a complexity of . Furthermore, the overall computational complexity of is on the order of . The computational complexity of optimizing is dominated by computing the Jacobian matrix of with respect to . Thus, the complexity for updating is , where denotes the number of iterations of the SCA method. As for the pilot matrix , using the bisection method to search each Lagrangian parameter requires iterations to achieve a desired accuracy, where is the initial interval size and is the tolerance. Hence, the overall computational complexity of updating over all the users is . By retaining dominant terms, the overall complexity of the proposed Algorithm 1 is , where denotes the number of iterations.
V Proposed PDD-Based Algorithm for the Codebook-Based Pilot Scheme
In this section, we focus on the codebook-based channel estimation where pilot sequences are chosen from the codebook . Under this setup, we first recast the corresponding problem (14) from Section IV into a resource allocation problem with discrete binary codeword indicator variables. Subsequently, we introduce a set of auxiliary variables and propose an innovative PDD-based algorithm to solve the optimization problem.
V-a Proposed PDD-Based Algorithm
Since is herein structured as where is chosen from the codebook , the codebook-based pilot design can be regarded as a pilot resource allocation problem. Consequently, how to allocate the limited orthogonal pilot in to the different users to reduce the channel estimation error is of great importance. We introduce as the allocation indicator, where signifies that the best orthogonal pilot is assigned to user ; otherwise, we have . Hence, using we can write and .
With each expressed in terms of , problem (14) can be equivalently converted to the following problem:
where with represents the search variables, and constraint (31c) guarantees that each user is associated with a single pilot sequence.
where and is the -th column of identity matric . Then problem (31) is then equivalent to
where importantly, the variables are no longer limited to binary values. To solve problem (34), we next introduce the proposed PDD-based algorithm which exhibits a double-loop structure to solve problem (34). Based on the PDD framework [47, 48], we first add a penalized version of the equality constraints in (33) to the objective function (34a), thereby obtaining the following augmented Lagrangian (AL) problem
where , , denote the Lagrange multipliers and denotes the penalty coefficient. We note that problems (34) and (35) are equivalent in the limit , which is at the hearth of the PDD method. Specifically, in the PDD-based algorithm, the inner loop solves the AL problem with fixed AL multipliers and penalty coefficient, while the outer loop aims to update the dual variables while reducing the penalty coefficient in light of the constraint violation.
Since the constraints in problem (35) are separable, we can address the AL problem (35) in the inner loop with the BCD method. Particularly, the subproblems for , and are the same as problems (17), (21) and (22) discussed in Section IV-B, respectively, and can therefore be solved using the same methods. The optimization of the remaining variables in the inner loop, i.e. , and , is explained in further detail below.
1) Optimization of : We optimize in parallel for , with the remaining variables being fixed. The subproblem of optimizing can be simplified as the unconstrained problem in (V) shown at the top of this page, where . By examining the first-order optimality condition of (V), we derive the closed-form solution of shown in (V) at the top of this page. Finally, we use the one-iteration BCD method to update based on (V).
2) Optimization of : We optimize in parallel with the other variables fixed. The corresponding subproblem for can be expressed as