Massive multiple-input multiple-output (MIMO) has become a key enabling technology for fifth-generation (5G) and future wireless communication systems. In the downlink transmission of a massive MIMO system, existing non-linear precoding methods such as Tomlinson-Harashima precoding (THP) or vector perturbation (VP) precoding are not preferred, due to their prohibitive computational complexity when the number of antennas is large. Instead, it has been shown that low-complexity linear precoding approaches such as zero-forcing (ZF) and regularized ZF (RZF) can achieve near-optimal performance.
The near optimality of linear precoding in massive MIMO is achieved assuming that fully-digital processing and high-resolution digital-to-analog converters (DACs) are employed at the base station (BS). However, this fully-digital processing requires a dedicated radio frequency (RF) chain and a pair of high-resolution DACs for each antenna element, which results in a significant increase in the hardware complexity and cost when the number of transmit antennas scales up. Moreover, the resulting power consumption of the large number of hardware components will also be prohibitive for practical implementation. All of the above drawbacks make fully-digital processing highly undesirable for a massive MIMO BS. Accordingly, several techniques have emerged that aim to reduce the hardware complexity and the power consumption of a massive MIMO BS, including hybrid analog-digital (AD) precoding, constant-envelope (CE) precoding, and low-resolution DACs.
Hybrid AD precoding reduces the hardware complexity and cost by reducing the number of RF chains, where precoding is divided into the analog domain and the low-dimension fully-digital domain. CE precoding reduces the hardware complexity by transmitting CE signals, which allows the use of the most power-efficient and cheapest RF amplifiers for each RF chain. In addition to the above two techniques, the use of low-resolution DACs, which is the focus of this paper, can reduce the hardware cost and power consumption per RF chain by reducing the resolution of the DACs. Since the power consumption of DACs grows exponentially with the resolution and linearly with the bandwidth, adopting low-resolution DACs instead of high-resolution ones can greatly reduce the power consumption at the BS, especially in the case of massive MIMO where a large number of DACs are required. Among low-resolution DACs, the most extreme case, i.e., 1-bit DACs, has received particular research interest, not only because it allows the most significant power savings, but also because the output signals of 1-bit DACs are CE signals, which further enables the use of the most power-efficient RF amplifiers, as in the case of CE precoding.
In the existing literature, several works have already considered precoding designs in the presence of 1-bit DACs. In the simplest approach, traditional ZF precoding is applied to the case of 1-bit DACs, where the 1-bit quantization is performed directly on the ZF precoded signals, and an error floor is observed as the transmit signal-to-noise ratio (SNR) increases. This significant performance loss is to be expected for such a naive precoding method. A 1-bit quantized linear precoding method based on the minimum mean squared error (MMSE) metric has also been proposed, which improves upon the quantized ZF precoding approach, as has a 1-bit precoding algorithm that uses an iterative gradient projection process based on the MMSE metric. However, error floors can still be observed for these 1-bit precoding schemes, which result from the fact that linear precoding is still considered, i.e., the precoded signals before quantization are linear transformations of the data symbols. To further improve the error rate performance, non-linear 1-bit precoding designs, which directly map the data symbols into the 1-bit transmit signals through a symbol-level operation, were subsequently proposed. These include non-linear 1-bit precoding schemes obtained via the gradient projection algorithm based on the minimum bit error rate (BER) metric and the MMSE metric, respectively, both of which significantly outperform the earlier designs, especially in the medium-to-high SNR regime. Other works proposed a 1-bit precoding design via a biconvex relaxation procedure, as well as several 1-bit precoding schemes based on semidefinite relaxation (SDR), -norm relaxation, and sphere precoding, respectively. The performance of such schemes was later improved through an alternating optimization framework for the case where high-order QAM modulation is adopted at the BS.
Nevertheless, it should be noted that these MMSE-based precoding methods may be sub-optimal, since they ignore the fact that multi-user interference can be constructive and further benefit the performance when symbol-level precoding is employed. Considering a PSK constellation as an example, if the received signal is forced to lie deeper within the decision region and further away from the detection boundaries, a more reliable decoding performance can be obtained, even though the MSE in this case will increase. This observation has already been exploited by constructive interference (CI) precoding to achieve an improved BER performance in a traditional small-scale MIMO system. Following this concept, the idea of interference exploitation has been extended to 1-bit precoding designs, and the resulting BER performance is shown to be promising. Moreover, although not stated explicitly, the formulation of CI-based 1-bit precoding has also been adopted in a design where a branch-and-bound (BB)-based algorithm that obtains the optimal solution is presented. More recently, the BB framework has been extended to the case of QAM modulations based on the QR decomposition. However, the above two 1-bit designs based on the fully-BB (F-BB) process are still not practically useful in massive MIMO systems due to their unfavorable complexity.
In this paper, we focus on designing a near-optimal 1-bit precoding algorithm as well as its low-complexity variation for massive MIMO systems, where both PSK and QAM modulations are considered. We exploit the concept of CI to formulate the optimization problem, which aims to maximize the CI effect subject to the 1-bit output signal requirement. The proposed near-optimal 1-bit precoding solution is achieved via a judicious partial BB (P-BB) procedure, while its low-complexity counterpart is implemented through a greedy algorithm. For clarity, we summarize the main contributions of this paper below:
For both PSK and QAM signaling, by constructing the Lagrangian function of the relaxed optimization problem and formulating the corresponding Karush-Kuhn-Tucker (KKT) conditions, we mathematically prove by contradiction that the majority of the output signals obtained from solving the relaxed problem already satisfy the 1-bit constraint, and only a small portion of the entries need to be further quantized to obtain a feasible 1-bit solution, where the quantization losses are incurred.
Building on this important and interesting observation, we propose a 1-bit precoding algorithm through a P-BB process to further improve the performance of the conventional CI-based 1-bit precoding method in , where the BB process is only performed for part of the output signals that do not comply with the 1-bit requirement, and we adopt the adaptive subdivision rule to guarantee a faster convergence rate. For PSK signaling, we use the ‘max-min’ criterion to design the P-BB algorithm, while the MSE criterion and the alternating optimization framework are employed when QAM signaling is considered at the BS. Compared to the conventional F-BB method whose complexity becomes prohibitive in massive MIMO scenarios, our proposed P-BB approach enables the use of the BB framework in massive MIMO systems and allows a significant gain in terms of computational cost, while still exhibiting a near-optimal error rate performance.
We further design an alternative 1-bit precoding scheme through an ‘Ordered Partial Sequential Update’ (OPSU) process, where we only consider the effect of a single entry at a time on the objective function, while keeping other entries in the output signals fixed. The proposed OPSU method further allows an additional complexity reduction compared to the P-BB approach, and is particularly appealing when the P-BB process needs to search the entire subspace.
Compared to the conventional CI-based approach and other existing 1-bit precoding methods in the literature, numerical results demonstrate an SNR gain of more than 7 dB in terms of BER for the proposed 1-bit precoding schemes, which also remove the error floors that are commonly observed in conventional 1-bit precoding techniques, especially when higher-order modulations are adopted at the BS.
The remainder of this paper is organized as follows. Section II introduces the basic system model and concept of CI. Section III includes the proposed 1-bit precoding approaches for PSK signaling, and Section IV extends the proposed 1-bit precoding schemes to QAM signaling. Numerical results are shown in Section V, and Section VI concludes our paper.
Notations: , , and denote scalar, column vector and matrix, respectively. and denote transposition and conjugate transposition of a matrix, respectively. denotes the cardinality of a set, is the sign function, and denotes the imaginary unit. denotes the modulus of a complex number or the absolute value of a real number, and denotes the -norm. and represent an matrix in the complex and real set, respectively. and denote the real and imaginary part of a complex number, respectively. returns the rank of a matrix, and represents a identity matrix.
II System Model and Constructive Interference
II-A System Model
We consider a massive MIMO system in the downlink, as depicted in Fig. 1, where a BS with transmit antennas communicates with a total number of single-antenna users simultaneously in the same time-frequency resource, where . As we focus on the precoding design at the BS, ideal ADCs are employed for each user, and we assume that perfect channel state information (CSI) is available. We denote the data symbol vector as , which can be drawn from a unit-norm PSK or a normalized QAM constellation. We denote
as the flat-fading Rayleigh channel matrix between the BS and the users, with each entry following a standard complex Gaussian distribution. The corresponding transmit signal vector before quantization can then be expressed as
which is a function of the symbol vector as well as the channel matrix . represents a general precoding strategy that forms the desired unquantized signal vector , which can be a linear transformation of as in - or a non-linear mapping as in -. When 1-bit DACs are adopted at the BS, the output signal vector on the antenna elements is given by
where is the element-wise 1-bit quantization on both real and imaginary part of . For simplicity, we normalize such that , which leads to
where is the -th entry in , , and . Accordingly, the received signal vector can be expressed as
where is the additive Gaussian noise at the receiver side and .
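As a minimal illustration of the 1-bit quantization and the received signal model described above, the following sketch uses only the Python standard library. The function names `quantize_1bit` and `received` are hypothetical, and the scaling to unit total transmit power (each entry equal to (±1 ± j)/sqrt(2 N_t)) is an assumption, since the normalization constant is not legible in the text above.

```python
import math
import random

def quantize_1bit(x_tilde):
    """Element-wise 1-bit quantization of the real and imaginary parts.

    Assumption: the output is scaled so that sum(|x_n|^2) = 1, i.e. every
    entry equals (+-1 +- 1j) / sqrt(2 * N_t) for N_t transmit antennas.
    """
    n_t = len(x_tilde)
    scale = 1.0 / math.sqrt(2.0 * n_t)
    sgn = lambda v: 1.0 if v >= 0 else -1.0  # sign(0) is mapped to +1 by convention
    return [scale * complex(sgn(v.real), sgn(v.imag)) for v in x_tilde]

def received(H, x, noise_std=0.0, rng=random):
    """Received vector y = H x + n, one entry per single-antenna user.

    H is a list of rows (one row per user); noise_std**2 is the variance of
    the circularly symmetric complex Gaussian noise at each user.
    """
    y = []
    for row in H:
        clean = sum(h * xn for h, xn in zip(row, x))
        noise = complex(rng.gauss(0, 1), rng.gauss(0, 1)) * noise_std / math.sqrt(2)
        y.append(clean + noise)
    return y
```

For example, `quantize_1bit([0.3 - 2j, -1 + 0.1j])` returns two entries of magnitude 1/sqrt(2), so the total transmit power is one regardless of the unquantized amplitudes.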
II-B Constructive Interference
CI is defined as the interference that leads to an increased distance to all the detection thresholds for a specific constellation point. Closed-form CI precoding was first considered for PSK signaling in small-scale MIMO systems to improve the performance of linear ZF precoding. The optimization-based CI approach appeared subsequently and has more recently been studied extensively, where the constructive area is introduced. It is shown that, as long as the received signal is located within the constructive area, the corresponding interfering signals are beneficial and further improve the error rate performance. CI precoding has further been extended to QAM constellations. Compared to PSK modulations, where all the constellation points can exploit CI, only part of the constellation points for QAM modulations can exploit CI, since all the interference for the inner constellation points of QAM is regarded as destructive.
III 1-Bit Precoding for PSK Signaling
III-A CI Condition and Problem Formulation
Before presenting the 1-bit precoding designs, we first briefly introduce the mathematical formulation of the CI condition for PSK modulations based on the ‘symbol-scaling’ metric, as depicted in Fig. 2, where we adopt one quarter of an 8PSK constellation as the example . Without loss of generality, we express
to denote a unit-norm constellation point, where we have further decomposed the constellation point into and that are parallel to the two detection boundaries of . The detailed expressions for and can be found in the appendix of  for a general -PSK modulation, and are omitted here for brevity. denotes the received signal for user excluding noise, which is similarly decomposed into
where is the -th row of . and are two introduced real auxiliary variables that fully represent the effect of interference and 1-bit quantization on . Following  and , the ‘symbol-scaling’ CI condition for PSK signaling can be expressed as
where . Accordingly, the 1-bit precoding design that exploits CI and maximizes its effect can be formulated as
is a non-convex optimization problem due to the 1-bit constraint , , and it is therefore difficult to directly obtain the optimal solution. Nevertheless, by relaxing this non-convex constraint, can readily be transformed into a convex problem:
where is the -th entry in . With the relaxed signal vector obtained by solving , a feasible solution to the original 1-bit precoding problem can be obtained by enforcing an element-wise normalization, given by
For notational simplicity, we denote the final quantized signal vector and the 1-bit precoding scheme based on the above relaxation-normalization procedure as and ‘CI 1-Bit’, respectively.
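The ‘symbol-scaling’ decomposition used in this subsection can be sketched numerically as follows. Since the detailed expressions for the two bases are omitted in the text, the sketch derives them from geometry under a stated assumption: for a unit-norm M-PSK point at angle θ (with M ≥ 4), the two detection boundaries lie at angles θ ± π/M, so by symmetry each basis has length 1/(2 cos(π/M)). The names `psk_bases`, `symbol_scaling` and `ci_metric` are hypothetical.

```python
import cmath
import math

def psk_bases(s, M):
    """Split a unit-norm M-PSK point s into s_a + s_b, each parallel to one
    detection boundary of s (assumed to lie at angles theta -+ pi/M)."""
    theta = cmath.phase(s)
    length = 1.0 / (2.0 * math.cos(math.pi / M))  # requires M >= 4
    s_a = length * cmath.exp(1j * (theta - math.pi / M))
    s_b = length * cmath.exp(1j * (theta + math.pi / M))
    return s_a, s_b

def symbol_scaling(r, s, M):
    """Real coefficients (alpha_a, alpha_b) with r = alpha_a*s_a + alpha_b*s_b,
    obtained by solving the 2x2 real system with Cramer's rule."""
    s_a, s_b = psk_bases(s, M)
    det = s_a.real * s_b.imag - s_b.real * s_a.imag
    alpha_a = (r.real * s_b.imag - s_b.real * r.imag) / det
    alpha_b = (s_a.real * r.imag - s_a.imag * r.real) / det
    return alpha_a, alpha_b

def ci_metric(received, symbols, M):
    """Smallest scaling coefficient over all users; a larger value means the
    noiseless received signals lie deeper inside their decision regions."""
    return min(min(symbol_scaling(r, s, M)) for r, s in zip(received, symbols))
```

With this convention, a noiseless received signal equal to the intended symbol gives both coefficients equal to 1, a signal scaled by 2 gives both equal to 2, and a signal lying exactly on a detection boundary gives one coefficient equal to 0.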
III-B Analytical Study of 1-Bit CI Precoding for PSK
It has been shown that the error rate performance of ‘CI 1-Bit’ is promising, outperforming many of the existing 1-bit precoding designs in the literature for PSK signaling. In fact, it has been numerically observed that most of the entries in obtained by solving already satisfy the 1-bit constraint before the element-wise normalization is performed. This is the main reason why the performance of ‘CI 1-Bit’ is promising, since only a small part of the entries in need to be further quantized, which leads to an insignificant quantization loss. Nevertheless, this observation has not previously been explained from a mathematical point of view.
In this section, we further elaborate on this observation, and propose a 1-bit precoding method via the P-BB method based on this observation, which further improves the performance of ‘CI 1-Bit’ and achieves a close-to-optimal error rate performance. To begin with, we first transform the relaxed optimization problem into a simpler form for ease of our analysis. By comparing the real and imaginary part of both sides of (6), we can express as a function of and , given by
(11) can be expressed in a compact matrix form as
where is given by
Based on the construction of shown above, the following rank property is observed.
with probability 1.
Proof: See Appendix A.
With the matrix formulation in (14), the relaxed optimization problem is equivalent to
where is the -th row of , is the -th entry of , , and .
Based on the formulation of , the following important proposition is obtained, which builds the foundation of the proposed 1-bit precoding algorithms through P-BB in the following.
Proposition 1: For obtained by solving , there are at least entries that already satisfy the 1-bit constraint.
Proof: See Appendix B.
Lemma 2: The results of Proposition 1 directly extend to rank-deficient channels, where in this case there are at least entries in obtained by solving that already satisfy the 1-bit constraint.
Proof: The proof for this lemma follows the proof for Proposition 1, and is therefore omitted for brevity.
Proposition 1 mathematically explains the observation in  and the reason why the performance of ‘CI 1-Bit’ is promising. In the case of a massive MIMO system where , is close to , i.e., the majority of the entries in obtained by solving already satisfy the 1-bit constraint, and the performance loss incurred from the subsequent quantization on the residual (or even smaller) entries in becomes insignificant. Moreover, the performance loss to the optimal solution is expected to become even less for rank-deficient channels, where , as shown by Lemma 2.
III-C 1-Bit Precoding Design via Partial Branch-and-Bound
Building upon the important observation in Proposition 1, we introduce the 1-bit precoding method based on P-BB in this section. Essentially, as opposed to the F-BB method in  that searches the entire space , our proposed P-BB scheme only focuses on part of the space, i.e., , which corresponds to the entries in that do not comply with the 1-bit constraint, where . Therefore, compared to the F-BB scheme in  whose complexity is proportional to the number of transmit antennas, which thus only works in small-scale MIMO systems, the P-BB approach introduced in this paper, whose complexity is only proportional to the number of users, enables a significant reduction in the computational cost of the BB-based method and allows the BB framework to be applicable in massive MIMO systems.
To be more specific, we first conduct some row rearrangements for obtained from solving to arrive at , such that can be decomposed into
where consists of that already satisfy the 1-bit constraint, and we obtain following Proposition 1. We further express that consists of the residual entries in whose amplitudes are strictly smaller than , where we have and . We further denote the matrix with the corresponding column rearrangement as (), which is decomposed into
where and . The resulting optimization problem on is then given by
The proposed P-BB algorithm aims to update via the BB process to obtain the optimal solution of , while is kept fixed throughout the algorithm.
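The split of the relaxed solution into entries that already meet the 1-bit constraint and residual entries can be sketched as below. This is only an illustration: the function name `split_entries` is hypothetical, the relaxed solution is assumed to be given in its real-valued representation, and the amplitude bound is passed in as a parameter (it would be 1/sqrt(2 N_t) under a unit total power normalization, which is an assumption here).

```python
def split_entries(x_real, bound, tol=1e-9):
    """Partition the indices of a relaxed real-valued solution.

    fixed    : entries whose amplitude already equals `bound` (1-bit feasible),
               which are kept unchanged throughout the partial BB process
    residual : entries with amplitude strictly below `bound`, which are the
               only ones the partial BB process needs to branch on
    """
    fixed, residual = [], []
    for idx, value in enumerate(x_real):
        if abs(abs(value) - bound) <= tol:
            fixed.append(idx)
        else:
            residual.append(idx)
    return fixed, residual
```

A numerical tolerance is needed because a convex solver returns the saturated entries only up to its own accuracy; the value of `tol` should be matched to that accuracy.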
We select the solution obtained from the ‘CI 1-Bit’ scheme in Section III-A as the starting point of the P-BB algorithm, and we initialize the upper bound by substituting into (14), where represents the real representation of . Accordingly, is given by
In the branching process, we select an entry in and allocate its value. To guarantee a fast convergence speed, we adopt the adaptive subdivision rule to choose within each branching process , , where satisfies:
Subsequently, we update and by removing from and including it in , where , , and in (18) are also updated accordingly. By relaxing the 1-bit constraint, the convex optimization problem to obtain the lower bound can be formulated as
The value of the lower bound is equal to the objective of with the optimal , i.e.,
and the corresponding upper bound is obtained by enforcing a 1-bit quantization on the resulting , given by
It should be noted that needs to be solved twice in each branching operation, since can take the value of either (left child) or (right child). For notational convenience, we denote () and () as the corresponding obtained lower bound (upper bound) for the left child and right child, respectively.
In the bounding process, we update the upper bound and remove sub-optimal branches. To be more specific, is updated as
and we denote as the 1-bit signal vector that returns . Importantly, if the value of the lower bound ( or ) is smaller than this updated upper bound, the corresponding obtained is a valid branch. Otherwise, if the value of the lower bound ( or ) is larger than this updated upper bound, the corresponding obtained signal vector and all its subsequent branches are sub-optimal and can be excluded from the algorithm, which makes the BB process more efficient than the exhaustive search method.
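The bounding step described above can be sketched as follows, in the minimization convention used in the text (a smaller objective is better). The `Child` container and the function `bounding_step` are hypothetical names, and the lower and upper bounds of each child are assumed to have been computed beforehand by solving the relaxed problem and quantizing its solution. Pruning a child whose lower bound equals the current upper bound is an assumption, since the text leaves the case of equality open.

```python
from typing import List, NamedTuple, Tuple

class Child(NamedTuple):
    lower: float                # objective of the relaxed (convex) sub-problem
    upper: float                # objective after 1-bit quantization
    x_1bit: Tuple[float, ...]   # the quantized, feasible signal vector

def bounding_step(best_upper: float, best_x: Tuple[float, ...],
                  children: List[Child]):
    """Update the incumbent upper bound, then discard sub-optimal children."""
    # 1) The incumbent is the smallest upper bound seen so far.
    for child in children:
        if child.upper < best_upper:
            best_upper, best_x = child.upper, child.x_1bit
    # 2) A child survives only if its lower bound can still beat the incumbent.
    survivors = [c for c in children if c.lower < best_upper]
    return best_upper, best_x, survivors
```

In this sketch a pruned child is simply dropped, which also removes all of its subsequent branches from the search.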
We repeat the above branching and bounding process until all the entries in have been included in , as illustrated in Fig. 3, and the final solution of the proposed P-BB approach is obtained as the signal vector that returns the optimal upper bound value . For clarity, we summarize the above procedure in Algorithm 1 below, where transforms a real vector into its complex equivalence.
III-D A Low-Complexity Alternative via OPSU
While the proposed P-BB algorithm exhibits a significant complexity reduction compared to the F-BB method, it may still need to search the entire subspace in the worst case, which may not be favorable when the number of users is large. Therefore, in this section we further introduce a low-complexity alternative approach based on an ‘ordered partial sequential update’ (OPSU) process, which is essentially a greedy algorithm. Firstly, it is observed in Algorithm 1 that, when updating , the P-BB approach considers the effects of all the residual entries in on the resulting by solving . To pursue a more computationally efficient approach, we propose a sub-optimal procedure that only considers the effect of a single entry in at a time on the objective function. Because of this design, the order in which we select each time may further have an effect on the solution of and lead to different local optima.
To be more specific, we first rewrite as
where represents the -th column in , and . For the proposed ‘OPSU’ approach, in each iteration we aim to choose the value for that can increase the value of the minimum entry in , while keeping other entries in fixed. Meanwhile, we note that the amplitudes of the entries in the corresponding also have an effect on the resulting . Therefore, by denoting
we propose to first allocate values for whose corresponding has the most significant impact on , i.e., that has the largest value of . We repeat the above process until all the entries in have been visited, and this iterative process is summarized in Algorithm 2, where is the sort function following a descending order.
Compared to the P-BB method proposed in the previous section, where the cardinality of the set in Algorithm 1 may keep increasing after each iteration, the major complexity gain for the low-complexity ‘OPSU’ method proposed in this section comes from the fact that we only consider one feasible solution and update its entries in an iterative manner. In this case, and therefore the inner iterative process in Algorithm 1 is no longer required. Another complexity reduction comes from the fact that the proposed ‘OPSU’ method avoids the need to solve the optimization problem within each iteration. Both of the above make the ‘OPSU’ method more computationally efficient than the P-BB approach.
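A greedy update of this kind can be sketched as follows. This is an illustrative reading of the OPSU idea, not a transcription of Algorithm 2: the function name `opsu` is hypothetical, the objective is taken to be the minimum entry of the matrix-vector product of the real-valued matrix and the 1-bit vector, and the ordering weight is assumed to be the sum of absolute values of the corresponding column, because the exact ordering metric is not legible in the text above.

```python
def opsu(M, x_init, residual):
    """Greedy ordered sequential update of the residual entries.

    M        : real matrix as a list of rows; the objective is min(M @ x)
    x_init   : initial 1-bit vector (every entry is +bound or -bound)
    residual : indices of the entries that may be updated
    """
    x = list(x_init)
    lam = [sum(m * v for m, v in zip(row, x)) for row in M]

    # Assumption: columns with larger absolute sum influence the objective
    # most, so their entries are visited first (descending order).
    def weight(k):
        return sum(abs(row[k]) for row in M)

    for k in sorted(residual, key=weight, reverse=True):
        # Effect of flipping the sign of x[k] while all other entries stay fixed.
        flipped = [l - 2.0 * row[k] * x[k] for l, row in zip(lam, M)]
        if min(flipped) > min(lam):  # keep the flip only if the minimum improves
            x[k] = -x[k]
            lam = flipped
    return x, min(lam)
```

Each entry is visited once and each visit costs one column update, with no optimization problem to solve, which mirrors the complexity argument made above. Because a flip is accepted only when it strictly improves the minimum entry, the objective never decreases relative to the starting point.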
IV 1-Bit Precoding for QAM Signaling
In this section, we focus on CI-based 1-bit precoding approaches when QAM signaling is considered at the BS. In this case, the received signal vector needs to be further re-scaled for correct demodulation, expressed as
where is the received symbol vector for demodulation, and is the precoding factor that can be obtained by minimizing the MSE between and , given by 
IV-A CI Condition and Problem Formulation
Similar to Section III, we begin by considering the CI-based 1-bit precoding design for QAM, where we still decompose the symbol and noiseless received signal following (5) and (6). In the case where QAM constellations are considered, the expressions for and can be simplified into
For the mathematical CI condition for QAM constellations, we follow  and consider the multi-user interference on the inner constellation points as only destructive, as illustrated in Fig. 4, where a 16QAM constellation is depicted. Accordingly, we divide the real scalars and , into two groups and , where the entries in correspond to the real or imaginary part of the symbols that can exploit CI, i.e., both the real and imaginary part of the constellation point ‘D’, the real part of ‘B’ and the imaginary part of ‘C’, as shown in Fig. 4, while consists of the residual entries corresponding to the symbols that cannot benefit from CI. We can then obtain
Subsequently, the CI condition for QAM constellation points can be expressed as
and the corresponding 1-bit precoding problem that exploits CI can be formulated as
Remark: is a non-convex optimization problem. More importantly, due to the fact that the equality constraints and the 1-bit constraints cannot both be satisfied at the same time in general, we note that the original optimization problem for QAM constellations is an infeasible problem in nature, as opposed to formulated for PSK which is always feasible. This infeasibility also makes the P-BB algorithm designed for PSK signaling not directly applicable.
The error rate performance for ‘CI 1-Bit’ following this relaxation-normalization procedure serves as an upper bound of the proposed 1-bit precoding methods based on P-BB and OPSU introduced in the following. For notational convenience, we denote the obtained quantized signal vector based on this conventional CI approach as .
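The division of the real and imaginary parts of QAM symbols into a group that can exploit CI and a group that cannot may be illustrated as follows. The sketch assumes a square M-QAM constellation normalized to unit average power, so that a real or imaginary part can exploit CI exactly when its amplitude equals the outermost level; the names `qam_peak` and `ci_groups` are hypothetical.

```python
import math

def qam_peak(M):
    """Outermost per-dimension amplitude of a unit-average-power square M-QAM."""
    side = math.isqrt(M)
    if side * side != M:
        raise ValueError("M must be a perfect square (4, 16, 64, ...)")
    # Levels are +-1, +-3, ..., +-(side - 1), scaled by sqrt(2 (M - 1) / 3).
    return (side - 1) / math.sqrt(2.0 * (M - 1) / 3.0)

def ci_groups(symbols, M, tol=1e-9):
    """Return (outer, inner) lists of (user_index, part), part in {'re', 'im'}.

    outer : parts on the outermost level, which can exploit CI
    inner : remaining parts, for which all interference is treated as destructive
    """
    peak = qam_peak(M)
    outer, inner = [], []
    for k, s in enumerate(symbols):
        for part, value in (("re", s.real), ("im", s.imag)):
            if abs(abs(value) - peak) <= tol:
                outer.append((k, part))
            else:
                inner.append((k, part))
    return outer, inner
```

For 16QAM, a corner point contributes both of its parts to the first group, an edge point contributes only one part, and an inner point contributes none, which matches the description of the constellation points in Fig. 4.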
IV-B Analytical Study of 1-Bit CI Precoding for QAM
In this section, we show that the results revealed in Proposition 1 for PSK signaling directly extend to QAM signaling, while the problem formulations are different. To begin with, we expand (28) into its real representation, given by
where . Following the steps in Section III-A, the relaxed optimization problem can be equivalently transformed into
Based on the formulation of , the following proposition is presented.
Proposition 2: Similar to the case of PSK, for obtained by solving , there are at least entries that already satisfy the 1-bit constraint.
IV-C 1-Bit Precoding Design via Partial Branch-and-Bound
In this section, we propose the 1-bit precoding scheme via P-BB for QAM modulations. Before we proceed, we note that, due to the inherent infeasibility of discussed in the Remark, the P-BB algorithm designed for PSK constellations cannot be directly extended to the case of QAM modulations, since the sub-problem included in the BB process will be infeasible when the number of entries in that are to be optimized is smaller than . To circumvent this issue, when we design the P-BB algorithm for QAM signaling after obtaining , we consider the MSE criterion as the objective function instead, which is defined as
Based on the expression for MSE as shown in (39), we note another distinct feature when QAM signaling is considered: compared to PSK signaling, in which case the objective function only includes , the objective function for QAM signaling also includes the precoding factor , which is a function of the transmit signal vector that is to be optimized. The non-linear relationship between and , as observed in (37), makes the direct minimization of the MSE difficult to solve. Nevertheless, noting that and are uncoupled in the expression for MSE, the alternating optimization framework can be adopted as an effective method. To be more specific, the alternating optimization selects as the starting point, and iteratively updates and until convergence, where the update for follows (37), and the updated is obtained by minimizing the MSE in (39), given by
where we note that the term is constant when is fixed, which is therefore omitted.
For clarity, we first summarize the main steps of the alternating optimization framework in Algorithm 3 before proceeding, where is a pre-defined threshold for convergence.
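The alternating optimization framework can be sketched as below, under several assumptions that are not taken from the text. The MSE is written as ‖s − β H x‖² + β² K σ², and the factor β is obtained by setting its derivative to zero, giving β = Re(sᴴ H x) / (‖H x‖² + K σ²); this may differ in form from (37) and (39). The update of the transmit vector for a fixed β is performed here by an exhaustive search over all 1-bit vectors (`exhaustive_x`), which is only a stand-in for the P-BB step and is practical for toy sizes only. Convergence is declared when the MSE decrease falls below a threshold. All function names are hypothetical.

```python
import itertools
import math

def matvec(H, x):
    return [sum(h * v for h, v in zip(row, x)) for row in H]

def mse(s, H, x, beta, sigma2):
    """Assumed MSE: ||s - beta*H*x||^2 + beta^2 * K * sigma2."""
    r = matvec(H, x)
    return (sum(abs(sk - beta * rk) ** 2 for sk, rk in zip(s, r))
            + beta ** 2 * len(s) * sigma2)

def optimal_beta(s, H, x, sigma2):
    """Real beta minimizing the assumed MSE for a fixed x (denominator must be nonzero)."""
    r = matvec(H, x)
    num = sum((sk.conjugate() * rk).real for sk, rk in zip(s, r))
    den = sum(abs(rk) ** 2 for rk in r) + len(s) * sigma2
    return num / den

def exhaustive_x(s, H, beta, sigma2):
    """Stand-in for the P-BB step: best 1-bit vector for a fixed beta."""
    n_t = len(H[0])
    scale = 1.0 / math.sqrt(2.0 * n_t)
    points = [scale * complex(a, b) for a in (-1, 1) for b in (-1, 1)]
    candidates = (list(c) for c in itertools.product(points, repeat=n_t))
    return min(candidates, key=lambda x: mse(s, H, x, beta, sigma2))

def alternating(s, H, x0, sigma2, update_x=exhaustive_x, eps=1e-9, max_iter=50):
    """Alternate between updating x (beta fixed) and beta (x fixed)."""
    x = list(x0)
    beta = optimal_beta(s, H, x, sigma2)
    current = mse(s, H, x, beta, sigma2)
    for _ in range(max_iter):
        x_new = update_x(s, H, beta, sigma2)
        beta_new = optimal_beta(s, H, x_new, sigma2)
        new = mse(s, H, x_new, beta_new, sigma2)
        if current - new < eps:  # no meaningful improvement: stop
            break
        x, beta, current = x_new, beta_new, new
    return x, beta, current
```

Because each of the two updates minimizes the same objective with the other variable held fixed, the MSE is non-increasing across iterations, which is why a threshold on the MSE decrease is a natural stopping rule in this sketch.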
In the following, we briefly describe the P-BB process within the alternating optimization framework for QAM signaling, which generally follows the P-BB process for PSK in Section III-C. The major difference lies in the formulated optimization problems for obtaining and the corresponding calculation of the lower bounds and upper bounds, since the criterion has switched to MSE minimization.
The initial upper bound can be obtained based on the expression for MSE in (39) as
where we note that is fixed in this BB process due to the alternating optimization approach.
Similar to the case for PSK, we rearrange obtained from solving into such that can be decomposed as in (17), where and are similarly defined. To proceed, we select an entry in and allocate its value following the adaptive subdivision rule. The resulting optimization problem on that minimizes the MSE can then be constructed as
where denotes with the corresponding column rearrangement. By decomposing into , the objective function of can be simplified into
where we introduce that is fixed within the P-BB process for a given . To obtain the lower bound, we relax the 1-bit constraint in to arrive at a convex least-squares (LS) problem as