I Introduction
Massive multiple-input multiple-output (MIMO) has become a key enabling technology for the fifth-generation (5G) and future wireless communication systems [1]-[6]. In the downlink transmission of a massive MIMO system, existing nonlinear precoding methods such as Tomlinson-Harashima precoding (THP) [7] or vector perturbation (VP) precoding [8]-[11] are not preferred, due to their prohibitive computational complexity when the number of antennas is large. Instead, it has been shown in [12] that low-complexity linear precoding approaches such as zero-forcing (ZF) [13] and regularized ZF (RZF) [14] can achieve near-optimal performance.
The near optimality of linear precoding in massive MIMO is achieved assuming that fully-digital processing and high-resolution digital-to-analog converters (DACs) are employed at the base station (BS). However, this fully-digital processing requires a dedicated radio frequency (RF) chain and a pair of high-resolution DACs for each antenna element, which results in a significant increase in the hardware complexity and cost when the number of transmit antennas scales up. Moreover, the resulting power consumption of the large number of hardware components will also be prohibitive for practical implementation. All of the above drawbacks make fully-digital processing highly undesirable for a massive MIMO BS. Accordingly, there have been several emerging techniques that aim to reduce the hardware complexity and the power consumption for a massive MIMO BS, including hybrid analog-digital (AD) precoding [15]-[22], constant-envelope (CE) precoding [23]-[26], and low-resolution DACs.
Hybrid AD precoding reduces the hardware complexity and cost by reducing the number of RF chains, where precoding is divided into the analog domain and the low-dimension fully-digital domain [15]. CE precoding reduces the hardware complexity by transmitting CE signals, which allows the use of the most power-efficient and cheapest RF amplifiers for each RF chain [25]. In addition to the above two techniques, the use of low-resolution DACs, which is the focus of this paper, can reduce the hardware cost and power consumption per RF chain by reducing the resolution of the DACs. Since the power consumption of DACs grows exponentially with the resolution and linearly with the bandwidth [27], [28], adopting low-resolution DACs instead of high-resolution ones can greatly reduce the power consumption at the BS, especially in the case of massive MIMO where a large number of DACs are required. Among low-resolution DACs, the most extreme case, i.e., 1-bit DACs, has received particular research interest, not only because it allows the most significant power savings, but also because the output signals of 1-bit DACs are CE signals, which further enables the use of the most power-efficient RF amplifiers, as is the case for CE precoding.
In the existing literature, there have already been some works that consider the precoding designs in the presence of 1-bit DACs [29]-[31]. In [29], the traditional ZF precoding was applied to the case of 1-bit DACs, where the 1-bit quantization was directly performed upon the ZF precoded signals, and an error floor is observed as the transmit signal-to-noise ratio (SNR) increases. The significant performance loss is as expected for this naive precoding method. In [30], a 1-bit quantized linear precoding method was proposed based on the minimum mean squared error (MMSE) metric, which achieves an improved performance over the quantized ZF precoding approach. In [31], the 1-bit precoding algorithm was proposed via an iterative gradient projection process based on the MMSE metric. However, error floors can still be observed for the 1-bit precoding schemes proposed in [30] and [31], which result from the fact that linear precoding is still considered, i.e., the precoded signals before quantization are linear transformations of the data symbols. To further improve the error rate performance, nonlinear 1-bit precoding designs, which directly map the data symbols into the 1-bit transmit signals through a symbol-level operation, were further proposed in [32]-[41]. In [32] and [33], nonlinear 1-bit precoding schemes were proposed via the gradient projection algorithm based on the minimum bit error rate (BER) metric and MMSE metric, respectively. Both proposed 1-bit algorithms outperform [29]-[31] significantly, especially in the medium-to-high SNR regime. [34] proposed a 1-bit precoding design via a biconvex relaxation procedure, while [35] extended the work in [34] and proposed several 1-bit precoding schemes based on semidefinite relaxation (SDR), norm relaxation, and sphere precoding, respectively. [37] improves the performance of the schemes proposed in [35] through an alternating optimization framework, when a high-order QAM modulation is adopted at the BS.
Nevertheless, it should be noted that these MMSE-based precoding methods may be suboptimal since they ignore that multi-user interference can be constructive and further benefit the performance, when symbol-level precoding is employed. Considering a PSK constellation as an example, if the received signal is forced to locate deeper within the decision region and further away from the detection boundaries, a more reliable decoding performance can be obtained, though the MSE in this case will increase. This observation has already been exploited in [10] and [42]-[45] by constructive interference (CI) precoding to achieve an improved BER performance in a traditional small-scale MIMO system. Following this concept, [38] and [39] have extended the idea of interference exploitation to 1-bit precoding designs, and the resulting BER performance is shown to be promising. Moreover, while not explicitly shown, [40] also adopts the formulation of CI-based 1-bit precoding, where a branch-and-bound (BB)-based algorithm that obtains the optimal solution is presented. More recently, the BB framework has been extended to the case of QAM modulations in [41] based on the QR decomposition. However, the above two 1-bit designs based on the fully-BB (FBB) process are still not practically useful in massive MIMO systems due to their unfavorable complexity.
In this paper, we focus on designing a near-optimal 1-bit precoding algorithm as well as its low-complexity variant for massive MIMO systems, where both PSK and QAM modulations are considered. We exploit the concept of CI to formulate the optimization problem, which aims to maximize the CI effect subject to the 1-bit output signal requirement. The proposed near-optimal 1-bit precoding solution is achieved via a judicious partial BB (PBB) procedure, while its low-complexity counterpart is implemented through a greedy algorithm. For clarity, we summarize the main contributions of this paper below:

For both PSK and QAM signaling, by constructing the Lagrangian function of the relaxed optimization problem and formulating the corresponding Karush-Kuhn-Tucker (KKT) conditions, we mathematically prove by contradiction that the majority of the output signals obtained from solving the relaxed problem already satisfy the 1-bit constraint, and only a small portion of the entries need to be further quantized to obtain a feasible 1-bit solution, where the quantization losses are incurred.

Building on this observation, we propose a 1-bit precoding algorithm through a PBB process to further improve the performance of the conventional CI-based 1-bit precoding method in [39], where the BB process is only performed for the part of the output signals that do not comply with the 1-bit requirement, and we adopt the adaptive subdivision rule to guarantee a faster convergence rate. For PSK signaling, we use the ‘max-min’ criterion to design the PBB algorithm, while the MSE criterion and the alternating optimization framework are employed when QAM signaling is considered at the BS. Compared to the conventional FBB method whose complexity becomes prohibitive in massive MIMO scenarios, our proposed PBB approach enables the use of the BB framework in massive MIMO systems and allows a significant gain in terms of computational cost, while still exhibiting a near-optimal error rate performance.

We further design an alternative 1-bit precoding scheme through an ‘Ordered Partial Sequential Update’ (OPSU) process, where we only consider the effect of a single entry at a time on the objective function, while keeping other entries in the output signals fixed. The proposed OPSU method allows an additional complexity reduction compared to the PBB approach, and is particularly appealing when the PBB process needs to search the entire subspace.

Compared to the conventional CI-based approach and other existing 1-bit precoding methods in the literature, numerical results demonstrate an SNR gain of more than 7 dB for the proposed 1-bit precoding schemes in terms of BER, which also remove the error floors that are commonly observed in conventional 1-bit precoding techniques, especially when higher-order modulations are adopted at the BS.
The remainder of this paper is organized as follows. Section II introduces the basic system model and the concept of CI. Section III includes the proposed 1-bit precoding approaches for PSK signaling, and Section IV extends the proposed 1-bit precoding schemes to QAM signaling. Numerical results are shown in Section V, and Section VI concludes our paper.
Notations: , , and denote scalar, column vector and matrix, respectively. and denote transposition and conjugate transposition of a matrix, respectively. denotes the cardinality of a set, is the sign function, and denotes the imaginary unit. denotes the modulus of a complex number or the absolute value of a real number, and denotes the norm. and represent an matrix in the complex and real set, respectively. and denote the real and imaginary part of a complex number, respectively. returns the rank of a matrix, and represents a identity matrix.
II System Model and Constructive Interference
II-A System Model
We consider a massive MIMO system in the downlink, as depicted in Fig. 1, where a BS with transmit antennas communicates with a total number of single-antenna users simultaneously in the same time-frequency resource, where . As we focus on the precoding design at the BS, ideal ADCs are employed for each user, and we assume that perfect CSI is available [30]-[37]. We denote the data symbol vector as , which can be drawn from a unit-norm PSK or a normalized QAM constellation. We denote as the flat-fading Rayleigh channel matrix between the BS and the users, with each entry following a standard complex Gaussian distribution . The corresponding transmit signal vector before quantization can then be expressed as
(1) 
which is a function of the symbol vector as well as the channel matrix . represents a general precoding strategy that forms the desired unquantized signal vector , which can be a linear transformation of as in [30]-[32] or a nonlinear mapping as in [33]-[41]. When 1-bit DACs are adopted at the BS, the output signal vector on the antenna elements is given by
(2) 
where is the element-wise 1-bit quantization on both the real and imaginary part of . For simplicity, we normalize such that , which leads to
(3) 
where is the th entry in , , and . Accordingly, the received signal vector can be expressed as
(4) 
where is the additive Gaussian noise at the receiver side and .
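As a rough illustration of the transmit model described above, the following Python sketch quantizes an unquantized precoded vector element-wise on its real and imaginary parts and then forms the received signal. The function names, the example sizes `n_t` and `k`, and the scaling 1/sqrt(2 n_t) (chosen here so that the quantized vector has unit norm) are assumptions made for illustration only; the exact normalization is the one stated in (3).

```python
import math
import random


def one_bit_quantize(x):
    """Quantize real and imaginary parts of each entry to one bit.

    Assumed normalization: every output entry is (+/-1 +/- 1j) / sqrt(2 * n_t),
    so that the quantized vector has unit Euclidean norm.
    """
    n_t = len(x)
    scale = 1.0 / math.sqrt(2 * n_t)

    def sign(v):
        # sgn(0) is mapped to +1 so the output is always a valid 1-bit level.
        return 1.0 if v >= 0 else -1.0

    return [scale * complex(sign(v.real), sign(v.imag)) for v in x]


def received_signal(h, x_t, noise_std, rng):
    """Return y = H x_T + n with circularly symmetric complex Gaussian noise."""
    y = []
    for row in h:
        noiseless = sum(h_kn * x_n for h_kn, x_n in zip(row, x_t))
        noise = noise_std / math.sqrt(2) * complex(rng.gauss(0, 1), rng.gauss(0, 1))
        y.append(noiseless + noise)
    return y


rng = random.Random(0)
n_t, k = 8, 2  # hypothetical sizes for the example
# Flat-fading Rayleigh channel with i.i.d. CN(0, 1) entries.
h = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2) for _ in range(n_t)]
     for _ in range(k)]
x_unquantized = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_t)]
x_t = one_bit_quantize(x_unquantized)
y = received_signal(h, x_t, noise_std=0.1, rng=rng)
```

Whatever the unquantized vector is, the output of `one_bit_quantize` takes only four possible values per antenna, which is what makes the transmit signal constant-envelope.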
II-B Constructive Interference
CI is defined as the interference that leads to an increased distance to all the detection thresholds for a specific constellation point, as discussed in [42]-[44]. Closed-form CI precoding was first considered for PSK signaling in small-scale MIMO systems to improve the performance of the linear ZF precoding in [46]-[48]. The optimization-based CI approach first appeared in [10], and has more recently been extensively studied in [49]-[53], where the constructive area is introduced. It is shown that, as long as the received signal is located within the constructive area, the corresponding interfering signals are beneficial, which further improves the error rate performance. CI precoding has further been extended to QAM constellations in [54], [55]. Compared to PSK modulations where all the constellation points can exploit CI, only part of the constellation points for QAM modulations can exploit CI, since all the interference for the inner constellation points of QAM is regarded as destructive, as discussed in [55].
III 1-Bit Precoding for PSK Signaling
III-A CI Condition and Problem Formulation
Before presenting the 1-bit precoding designs, we first briefly introduce the mathematical formulation of the CI condition for PSK modulations based on the ‘symbol-scaling’ metric, as depicted in Fig. 2, where we adopt one quarter of an 8-PSK constellation as the example [55]. Without loss of generality, we express
(5) 
to denote a unit-norm constellation point, where we have further decomposed the constellation point into and that are parallel to the two detection boundaries of . The detailed expressions for and can be found in the appendix of [39] for a general PSK modulation, and are omitted here for brevity. denotes the received signal for user excluding noise, which is similarly decomposed into
(6) 
where is the th row of . and are two introduced real auxiliary variables that fully represent the effect of interference and 1-bit quantization on . Following [10] and [55], the ‘symbol-scaling’ CI condition for PSK signaling can be expressed as
(7) 
where . Accordingly, the 1-bit precoding design that exploits CI and maximizes its effect can be formulated as
(8)  
is a non-convex optimization problem due to the 1-bit constraint , , and it is therefore difficult to directly obtain the optimal solution. Nevertheless, by relaxing this non-convex constraint, can readily be transformed into a convex problem:
(9)  
where is the th entry in . With the relaxed signal vector obtained by solving , a feasible solution to the original 1-bit precoding problem can be obtained by enforcing an element-wise normalization, given by
(10) 
For notational simplicity, we denote the final quantized signal vector and the 1-bit precoding scheme based on the above relaxation-normalization procedure as and ‘CI 1-Bit’, respectively.
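To make the ‘symbol-scaling’ decomposition above more concrete, the sketch below splits a unit-norm M-PSK symbol into two components parallel to its two detection boundaries and then solves for the two real scaling coefficients of a noiseless received signal. The basis construction in `psk_basis` (boundaries at the symbol phase plus and minus pi/M) is one plausible choice assumed here for illustration; the exact expressions are those in the appendix of [39]. All function names are hypothetical, and M of at least 4 is assumed.

```python
import cmath
import math


def psk_basis(s, m):
    """Split a unit-norm M-PSK symbol s into s_a + s_b, with s_a and s_b
    parallel to the two detection boundaries of s (assumed construction)."""
    a = 1.0 / (2.0 * math.cos(math.pi / m))
    s_a = a * s * cmath.exp(-1j * math.pi / m)
    s_b = a * s * cmath.exp(1j * math.pi / m)
    return s_a, s_b


def symbol_scaling(y, s, m):
    """Solve y = alpha_a * s_a + alpha_b * s_b for real alpha_a, alpha_b
    (a 2x2 real linear system, solved by Cramer's rule)."""
    s_a, s_b = psk_basis(s, m)
    det = s_a.real * s_b.imag - s_a.imag * s_b.real
    alpha_a = (y.real * s_b.imag - y.imag * s_b.real) / det
    alpha_b = (s_a.real * y.imag - s_a.imag * y.real) / det
    return alpha_a, alpha_b


def ci_margin(received, symbols, m):
    """Smallest scaling coefficient over all users; the design maximizes it."""
    return min(min(symbol_scaling(y, s, m)) for y, s in zip(received, symbols))


s = cmath.exp(1j * math.pi / 4)  # one 8-PSK constellation point
inside = symbol_scaling(1.5 * s, s, 8)  # pushed deeper into the decision region
outside = symbol_scaling(s * cmath.exp(1j * math.pi / 4), s, 8)  # neighbouring symbol
```

A received signal that is a scaled-up copy of the symbol yields two equal positive coefficients, while a signal that has crossed a detection boundary yields a negative coefficient, so the minimum coefficient acts as a signed distance to the boundaries.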
III-B Analytical Study of 1-Bit CI Precoding for PSK
It has been shown in [39] that the error rate performance of ‘CI 1-Bit’ is promising, which outperforms many of the existing 1-bit precoding designs in the literature for PSK signaling [30]-[34]. In fact, it is numerically observed in [39] that most of the entries in obtained by solving already satisfy the 1-bit constraint, before the element-wise normalization is performed. This is the main reason why the performance of ‘CI 1-Bit’ is promising, since only a small part of the entries in need to be further quantized, which leads to an insignificant quantization loss. Nevertheless, [39] does not explain this observation from a mathematical point of view.
In this section, we further elaborate on this observation, and propose a 1-bit precoding method via the PBB method based on this observation, which further improves the performance of ‘CI 1-Bit’ and achieves a close-to-optimal error rate performance. To begin with, we first transform the relaxed optimization problem into a simpler form for ease of analysis. By comparing the real and imaginary part of both sides of (6), we can express as a function of and , given by
(11)  
By defining
(12) 
and
(13) 
(11) can be expressed in a compact matrix form as
(14) 
where is given by
(15) 
Based on the construction of shown above, the following rank property is observed.
Lemma 1:
with probability 1.
Proof: See Appendix A.
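The step of comparing real and imaginary parts relies on the standard real-valued expansion of a complex linear model. The sketch below shows only that generic expansion and checks it numerically; the stacking order (real parts first, then imaginary parts) is an assumption, and the matrix defined in (15) additionally folds in the symbol decomposition, which is not reproduced here. All names are hypothetical.

```python
import random


def real_vector(x):
    """Stack real parts on top of imaginary parts: [Re(x); Im(x)]."""
    return [v.real for v in x] + [v.imag for v in x]


def real_matrix(h):
    """Real expansion [[Re(H), -Im(H)], [Im(H), Re(H)]] of a complex matrix."""
    top = [[v.real for v in row] + [-v.imag for v in row] for row in h]
    bottom = [[v.imag for v in row] + [v.real for v in row] for row in h]
    return top + bottom


def matvec(a, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


rng = random.Random(0)
k, n_t = 2, 4  # hypothetical sizes for the example
h = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_t)] for _ in range(k)]
x = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(n_t)]

y_complex = matvec(h, x)
y_real = matvec(real_matrix(h), real_vector(x))  # equals [Re(Hx); Im(Hx)]
```

In this real-valued form each 1-bit transmit entry contributes two real unknowns that each take one of two values, which is the form in which the box relaxation and the entry counting of the following results are stated.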
With the matrix formulation in (14), the relaxed optimization problem is equivalent to
(16)  
where is the th row of , is the th entry of , , and .
Based on the formulation of , the following important proposition is obtained, which builds the foundation of the proposed 1-bit precoding algorithms through PBB in the following.
Proposition 1: For obtained by solving , there are at least entries that already satisfy the 1-bit constraint.
Proof: See Appendix B.
Lemma 2: The results of Proposition 1 directly extend to rank-deficient channels, where in this case there are at least entries in obtained by solving that already satisfy the 1-bit constraint.
Proof: The proof for this lemma follows the proof for Proposition 1, and is therefore omitted for brevity.
Proposition 1 mathematically explains the observation in [39] and the reason why the performance of ‘CI 1-Bit’ is promising. In the case of a massive MIMO system where , is close to , i.e., the majority of the entries in obtained by solving already satisfy the 1-bit constraint, and the performance loss incurred from the subsequent quantization on the residual (or even fewer) entries in becomes insignificant. Moreover, the performance loss to the optimal solution is expected to become even smaller for rank-deficient channels, where , as shown by Lemma 2.
III-C 1-Bit Precoding Design via Partial Branch-and-Bound
Building upon the observation in Proposition 1, we introduce the 1-bit precoding method based on PBB in this section. Essentially, as opposed to the FBB method in [40] that searches the entire space , our proposed PBB scheme only focuses on part of the space, i.e., , which corresponds to the entries in that do not comply with the 1-bit constraint, where . Therefore, compared to the FBB scheme in [40] whose complexity is proportional to the number of transmit antennas, which thus only works in small-scale MIMO systems, the PBB approach introduced in this paper, whose complexity is only proportional to the number of users, enables a significant reduction in the computational cost of the BB-based method and allows the BB framework to be applicable in massive MIMO systems.
To be more specific, we first rearrange the rows of obtained from solving to arrive at , such that can be decomposed into
(17) 
where consists of that already satisfy the 1-bit constraint, and we obtain following Proposition 1. We further express that consists of the residual entries in whose amplitudes are strictly smaller than , where we have and . We further denote the matrix with the corresponding column rearrangement as (), which is decomposed into
(18) 
where and . The resulting optimization problem on is then given by
(19)  
The proposed PBB algorithm aims to update via the BB process to obtain the optimal solution of , while is kept fixed throughout the algorithm.
III-C1 Initialization
We select the solution obtained from the ‘CI 1-Bit’ scheme in Section III-A as the starting point of the PBB algorithm, and we initialize the upper bound by substituting into (14), where represents the real representation of . Accordingly, is given by
(20) 
III-C2 Branching
In the branching process, we select an entry in and allocate its value. To guarantee a fast convergence speed, we adopt the adaptive subdivision rule to choose within each branching process [56], [57], where satisfies:
(21) 
Subsequently, we update and by removing from and including it in , where , , and in (18) are also updated accordingly. By relaxing the 1-bit constraint, the convex optimization problem to obtain the lower bound can be formulated as
(22)  
The value of the lower bound is equal to the objective of with the optimal , i.e.,
(23) 
and the corresponding upper bound is obtained by enforcing a 1-bit quantization on the resulting , given by
(24) 
It should be noted that needs to be solved twice in each branching operation, since can take the value of either (left child) or (right child). For notational convenience, we denote () and () as the corresponding obtained lower bound (upper bound) for the left child and right child, respectively.
III-C3 Bounding
In the bounding process, we update the upper bound and remove suboptimal branches. To be more specific, is updated as
(25) 
and we denote as the 1-bit signal vector that returns . Importantly, if the value of the lower bound ( or ) is smaller than this updated upper bound, the corresponding obtained is a valid branch. Otherwise, if the value of the lower bound ( or ) is larger than this updated upper bound, the corresponding obtained signal vector and all its subsequent branches are suboptimal and can be excluded from the algorithm, which makes the BB process more efficient than the exhaustive search method.
III-C4 Algorithm
We repeat the above branching and bounding process until all the entries in have been included in , as illustrated in Fig. 3, and the final solution of the proposed PBB approach is obtained as the signal vector that returns the optimal upper bound value . For clarity, we summarize the above procedure in Algorithm 1 below, where transforms a real vector into its complex equivalent.
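The branching and bounding steps can be sketched in a few lines of Python. This is a simplified skeleton under stated assumptions rather than a transcription of Algorithm 1: the problem is written as minimizing the negative of the smallest entry of a real matrix-vector product, entries are branched in index order instead of by the adaptive subdivision rule, and the convex relaxation used for the lower bound is replaced by a cheaper row-wise box bound so that the example needs no solver. The names `lam`, `free_idx`, and `c` (the 1-bit amplitude) are hypothetical.

```python
import itertools
import random


def objective(lam, x):
    """f(x) = -min_k (lam x)_k; minimizing f maximizes the worst-case margin."""
    return -min(sum(a * v for a, v in zip(row, x)) for row in lam)


def lower_bound(lam, x, free, c):
    """Optimistic bound: each row takes its best value over the box [-c, c]
    on the free entries (stand-in for the relaxed convex subproblem)."""
    best_rows = []
    for row in lam:
        fixed = sum(row[j] * x[j] for j in range(len(x)) if j not in free)
        best_rows.append(fixed + c * sum(abs(row[j]) for j in free))
    return -min(best_rows)


def partial_bb(lam, x_init, free_idx, c):
    """Branch-and-bound over the entries in free_idx only; all other entries
    of x_init stay fixed. x_init must already be a feasible +/-c vector."""
    best_x, best_ub = list(x_init), objective(lam, x_init)
    stack = [(list(x_init), tuple(free_idx))]
    while stack:
        x, free = stack.pop()
        j, rest = free[0], free[1:]
        for value in (-c, c):  # left child, right child
            child = list(x)
            child[j] = value
            ub = objective(lam, child)  # child is itself a feasible 1-bit point
            if ub < best_ub:
                best_x, best_ub = child, ub
            # Keep the branch only if its lower bound can still beat best_ub.
            if rest and lower_bound(lam, child, set(rest), c) < best_ub:
                stack.append((child, rest))
    return best_x, best_ub


def brute_force(lam, x_init, free_idx, c):
    """Exhaustive search over the free entries, for checking small cases."""
    best = objective(lam, x_init)
    for values in itertools.product((-c, c), repeat=len(free_idx)):
        x = list(x_init)
        for j, v in zip(free_idx, values):
            x[j] = v
        best = min(best, objective(lam, x))
    return best


rng = random.Random(1)
c = 1.0
lam = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
x_init = [rng.choice((-c, c)) for _ in range(8)]
free_idx = [4, 5, 6, 7]  # entries assumed not to satisfy the 1-bit constraint
x_pbb, ub_pbb = partial_bb(lam, x_init, free_idx, c)
```

Because the search is restricted to `free_idx`, the tree has at most two to the power of the number of free entries leaves, independent of how many entries are already fixed, which is the source of the complexity saving over a full BB search.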
III-D A Low-Complexity Alternative via OPSU
While the proposed PBB algorithm exhibits a significant complexity reduction compared to the FBB method, it may still need to search the entire subspace in the worst case, which may not be favorable when the number of users is large. Therefore, in this section we further introduce a low-complexity alternative approach based on an ‘ordered partial sequential update’ (OPSU) process, which is essentially a greedy algorithm. First, it is observed in Algorithm 1 that, when updating , the PBB approach considers the effects of all the residual entries in on the resulting by solving . To pursue a more computationally efficient approach, we propose a suboptimal procedure by only considering the effect of a single entry in at a time on the objective function. Because of this design, the order in which we select each time may further have an effect on the solution of and lead to different local optima.
To be more specific, we first rewrite as
(26) 
where represents the th column in , and . For the proposed ‘OPSU’ approach, in each iteration we aim to choose the value for that can increase the value of the minimum entry in , while keeping other entries in fixed. Meanwhile, we note that the amplitudes of the entries in the corresponding also have an effect on the resulting . Therefore, by denoting
(27) 
we propose to first allocate values for whose corresponding has the most significant impact on , i.e., that has the largest value of . We repeat the above process until all the entries in have been visited, and this iterative process is summarized in Algorithm 2, where is the sort function following a descending order.
Compared to the PBB method proposed in the previous section where the cardinality of the set in Algorithm 1 may keep increasing after each iteration, the major complexity gain for the low-complexity ‘OPSU’ method proposed in this section comes from the fact that we only consider one feasible solution and update its entries in an iterative manner. In this case, and therefore the inner iterative process in Algorithm 1 is no longer required. Another complexity reduction comes from the fact that the proposed ‘OPSU’ method avoids the need to solve the optimization problem within each iteration. Both of the above make the ‘OPSU’ method more computationally efficient than the PBB approach.
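A minimal sketch of such a greedy ordered sequential update is given below. It keeps a single candidate vector, visits the free entries once in a fixed order, and for each one keeps the sign that gives the larger minimum entry of the matrix-vector product. The ordering metric used here (the absolute column sum) is an assumption standing in for the quantity defined in (27), and the names `lam`, `free_idx`, and `c` are hypothetical.

```python
import random


def min_margin(lam, x):
    """Smallest entry of lam x, i.e. the worst-case margin to be increased."""
    return min(sum(a * v for a, v in zip(row, x)) for row in lam)


def opsu(lam, x_init, free_idx, c):
    """Greedy ordered partial sequential update over the entries in free_idx.

    x_init must already be a feasible +/-c vector, so that each update can
    only keep or improve the current margin.
    """
    x = list(x_init)
    # Visit first the entries whose columns have the largest absolute sum
    # (assumed ordering metric), in descending order.
    order = sorted(free_idx,
                   key=lambda j: sum(abs(row[j]) for row in lam),
                   reverse=True)
    for j in order:
        candidates = []
        for value in (-c, c):
            x[j] = value
            candidates.append((min_margin(lam, x), value))
        x[j] = max(candidates)[1]  # keep the sign with the larger margin
    return x


rng = random.Random(2)
c = 1.0
lam = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
x_init = [rng.choice((-c, c)) for _ in range(8)]
free_idx = [4, 5, 6, 7]
x_opsu = opsu(lam, x_init, free_idx, c)
```

Each free entry costs two objective evaluations and no optimization problem is solved, so the cost grows linearly with the number of free entries, at the price of possibly stopping at a local optimum.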
IV 1-Bit Precoding for QAM Signaling
In this section, we focus on CI-based 1-bit precoding approaches when QAM signaling is considered at the BS. In this case, the received signal vector needs to be further rescaled for correct demodulation, expressed as
(28) 
where is the received symbol vector for demodulation, and is the precoding factor that can be obtained by minimizing the MSE between and , given by [37]
(29) 
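For a fixed transmit vector, the MSE-minimizing rescaling factor has a simple closed form, illustrated below. The sketch assumes a real-valued factor and an MSE of the form ||s - beta H x||^2 + beta^2 K sigma^2; the noise term and the names `optimal_beta`, `mse`, and `noise_var` are assumptions for illustration, and the authoritative expression is the one in (29).

```python
import math
import random


def matvec(h, x):
    """Plain complex matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in h]


def mse(beta, h, x, s, noise_var):
    """Assumed MSE: ||s - beta * H x||^2 + beta^2 * K * noise_var."""
    hx = matvec(h, x)
    err = sum(abs(s_k - beta * y_k) ** 2 for s_k, y_k in zip(s, hx))
    return err + beta ** 2 * len(s) * noise_var


def optimal_beta(h, x, s, noise_var):
    """Real beta that sets the derivative of the assumed MSE to zero:
    beta = Re(s^H H x) / (||H x||^2 + K * noise_var)."""
    hx = matvec(h, x)
    num = sum((s_k.conjugate() * y_k).real for s_k, y_k in zip(s, hx))
    den = sum(abs(y_k) ** 2 for y_k in hx) + len(s) * noise_var
    return num / den


rng = random.Random(3)
k, n_t, noise_var = 2, 8, 0.1  # hypothetical example values
h = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2) for _ in range(n_t)]
     for _ in range(k)]
levels = [v / math.sqrt(10) for v in (-3, -1, 1, 3)]  # normalized 16-QAM
s = [complex(rng.choice(levels), rng.choice(levels)) for _ in range(k)]
amp = 1.0 / math.sqrt(2 * n_t)
x = [complex(rng.choice((-amp, amp)), rng.choice((-amp, amp))) for _ in range(n_t)]

beta = optimal_beta(h, x, s, noise_var)
```

Since the assumed MSE is a convex quadratic in the scalar factor, the stationary point returned by `optimal_beta` is its global minimizer for the given transmit vector.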
IV-A CI Condition and Problem Formulation
Similar to Section III, we begin by considering the CI-based 1-bit precoding design for QAM, where we still decompose the symbol and noiseless received signal following (5) and (6). In the case where QAM constellations are considered, the expressions for and can be simplified into
(30) 
For the mathematical CI condition for QAM constellations, we follow [55] and consider the multi-user interference on the inner constellation points as only destructive, as illustrated in Fig. 4, where a 16-QAM constellation is depicted. Accordingly, we divide the real scalars and , into two groups and , where the entries in correspond to the real or imaginary part of the symbols that can exploit CI, i.e., both the real and imaginary part of the constellation point ‘D’, the real part of ‘B’ and the imaginary part of ‘C’, as shown in Fig. 4, while consists of the residual entries corresponding to the symbols that cannot benefit from CI. We can then obtain
(31) 
and
(32) 
Subsequently, the CI condition for QAM constellation points can be expressed as
(33) 
and the corresponding 1-bit precoding problem that exploits CI can be formulated as
(34)  
Remark: is a non-convex optimization problem. More importantly, since the equality constraints and the 1-bit constraints cannot in general be satisfied at the same time, the original optimization problem for QAM constellations is infeasible by nature, as opposed to formulated for PSK which is always feasible. This infeasibility also makes the PBB algorithm designed for PSK signaling not directly applicable.
To obtain a feasible 1-bit solution for QAM signaling, we can relax the 1-bit constraint in by following similar steps as in (9) and (10), where the relaxed optimization problem can be formulated as
(35)  
The error rate of ‘CI 1-Bit’ following this relaxation-normalization procedure serves as an upper bound on the error rate of the proposed 1-bit precoding methods based on PBB and OPSU introduced in the following. For notational convenience, we denote the obtained quantized signal vector based on this conventional CI approach as .
IV-B Analytical Study of 1-Bit CI Precoding for QAM
In this section, we show that the results revealed in Proposition 1 for PSK signaling directly extend to QAM signaling, while the problem formulations are different. To begin with, we expand (28) into its real representation, given by
(36) 
where , and are the real representations of , and , respectively, similar to as shown in (12). in (29) can then be equivalently expressed as
(37) 
where . Following the steps in Section III-B, the relaxed optimization problem can be equivalently transformed into
(38)  
Based on the formulation of , the following proposition is presented.
Proposition 2: Similar to the case of PSK, for obtained by solving , there are at least entries that already satisfy the 1-bit constraint.
IV-C 1-Bit Precoding Design via Partial Branch-and-Bound
In this section, we propose the 1-bit precoding scheme via PBB for QAM modulations. Before we proceed, we note that due to the inherent infeasibility of discussed in the Remark, the PBB algorithm designed for PSK constellations cannot be directly extended to the case of QAM modulations, since the subproblem included in the BB process will be infeasible when the number of entries in that are to be optimized is smaller than . To circumvent this issue, when we design the PBB algorithm for QAM signaling after obtaining , we consider the MSE criterion as the objective function instead, which is defined as
(39) 
Based on the expression for MSE as shown in (39), we note another distinct feature when QAM signaling is considered: compared to PSK signaling, in which case the objective function only includes , the objective function for QAM signaling also includes the precoding factor , which is a function of the transmit signal vector that is to be optimized. The nonlinear relationship between and , as observed in (37), makes the direct minimization of the MSE difficult to solve. Nevertheless, noting that and are uncoupled in the expression for MSE, the alternating optimization framework can be adopted as an effective method [58]. To be more specific, the alternating optimization selects as the starting point, and iteratively updates and until convergence, where the update for follows (37), and the updated is obtained by minimizing the MSE in (39), given by
(40)  
where we note that the term is constant when is fixed, which is therefore omitted.
For clarity, we first summarize the main steps of the alternating optimization framework in Algorithm 3 before proceeding, where is a predefined threshold for convergence.
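The structure of such an alternating optimization can be sketched as follows. This is an illustrative loop under stated assumptions, not a transcription of Algorithm 3: the MSE is taken as ||s - beta H x||^2 + beta^2 K sigma^2 with a real factor, and the transmit-vector update is a simple entry-by-entry search over the four 1-bit levels, standing in for the PBB step. All function and variable names are hypothetical.

```python
import math
import random


def matvec(h, x):
    return [sum(a * b for a, b in zip(row, x)) for row in h]


def mse(beta, h, x, s, noise_var):
    """Assumed MSE: ||s - beta * H x||^2 + beta^2 * K * noise_var."""
    hx = matvec(h, x)
    err = sum(abs(s_k - beta * y_k) ** 2 for s_k, y_k in zip(s, hx))
    return err + beta ** 2 * len(s) * noise_var


def update_beta(h, x, s, noise_var):
    """Closed-form real beta minimizing the assumed MSE for fixed x."""
    hx = matvec(h, x)
    num = sum((s_k.conjugate() * y_k).real for s_k, y_k in zip(s, hx))
    den = sum(abs(y_k) ** 2 for y_k in hx) + len(s) * noise_var
    return num / den


def update_x(h, x, s, beta, noise_var):
    """Entry-wise search over the four 1-bit levels for fixed beta
    (simple stand-in for the PBB update of the transmit vector)."""
    amp = 1.0 / math.sqrt(2 * len(x))
    levels = [complex(a * amp, b * amp) for a in (-1, 1) for b in (-1, 1)]
    x = list(x)
    for n in range(len(x)):
        x[n] = min(levels,
                   key=lambda v: mse(beta, h, x[:n] + [v] + x[n + 1:], s, noise_var))
    return x


def alternating_optimization(h, s, x0, noise_var, tol=1e-9, max_iter=50):
    """Alternate between the beta update and the x update until the MSE
    decrease falls below tol. Returns x, beta, and the MSE history."""
    x = list(x0)
    beta = update_beta(h, x, s, noise_var)
    history = [mse(beta, h, x, s, noise_var)]
    for _ in range(max_iter):
        x = update_x(h, x, s, beta, noise_var)
        beta = update_beta(h, x, s, noise_var)
        history.append(mse(beta, h, x, s, noise_var))
        if history[-2] - history[-1] < tol:
            break
    return x, beta, history


rng = random.Random(4)
k, n_t, noise_var = 2, 8, 0.1  # hypothetical example values
h = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2) for _ in range(n_t)]
     for _ in range(k)]
qam = [v / math.sqrt(10) for v in (-3, -1, 1, 3)]  # normalized 16-QAM levels
s = [complex(rng.choice(qam), rng.choice(qam)) for _ in range(k)]
amp = 1.0 / math.sqrt(2 * n_t)
x0 = [complex(rng.choice((-amp, amp)), rng.choice((-amp, amp))) for _ in range(n_t)]
x_opt, beta_opt, history = alternating_optimization(h, s, x0, noise_var)
```

Each half-step minimizes the same objective over one block of variables while the other is held fixed, so the recorded MSE cannot increase from one iteration to the next and the loop terminates once the improvement drops below the threshold.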
In the following, we briefly describe the PBB process within the alternating optimization framework for QAM signaling, which generally follows the PBB process for PSK in Section III-C. The major difference lies in the formulated optimization problems for obtaining and the corresponding calculation of the lower bounds and upper bounds, since the criterion has switched to MSE minimization.
IV-C1 Initialization
The initial upper bound can be obtained based on the expression for MSE in (39) as
(41) 
where we note that is fixed in this BB process due to the alternating optimization approach.
IV-C2 Branching
Similar to the case for PSK, we rearrange obtained from solving into such that can be decomposed as in (17), where and are similarly defined. To proceed, we select an entry in and allocate its value following the adaptive subdivision rule. The resulting optimization problem on that minimizes the MSE can then be constructed as
(42)  
where denotes with the corresponding column rearrangement. By decomposing into , the objective function of can be simplified into
(43)  
where we introduce that is fixed within the PBB process for a given . To obtain the lower bound, we relax the 1-bit constraint in to arrive at a convex least-squares (LS) problem as
(44)  