I Introduction
Massive multiple-input multiple-output (MIMO), which deploys tens to hundreds of antennas at the base station (BS), is a key and effective technology for significantly improving the spectrum and energy efficiency of 5G and beyond wireless communication systems [2, 3, 4]. Classical full-digital processing, however, is not suitable for massive MIMO systems, as the number of radio-frequency (RF) chains and analog-to-digital converters (ADCs)/digital-to-analog converters (DACs) needs to scale with the number of antennas, which leads to high hardware complexity and power consumption. This issue is particularly acute when high-resolution ADCs/DACs are adopted, since the power consumption of ADCs/DACs grows exponentially with the resolution [5]. Effective ways of reducing the hardware cost and the power consumption of a practical massive MIMO system (without too much performance loss) include employing hybrid analog-digital (AD) precoding [6, 7, 8] (in which a large antenna array is driven by only a limited number of RF chains) and using low-resolution ADCs/DACs. In particular, the cheapest one-bit ADCs/DACs have attracted a lot of recent research interest [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]. In addition to reducing the resolution cost, one-bit ADCs/DACs also cut down the energy consumption and hardware complexity associated with RF power amplifiers (PAs), because the constant-envelope signals they transmit allow for the most power-efficient and cheapest PAs. This paper focuses on the one-bit precoding problem in the multi-user downlink massive MIMO system with phase-shift keying (PSK) modulation and proposes efficient approaches for solving it.
I-A Related Works
Early works on downlink transmission with one-bit DACs have mainly focused on linear-quantized precoding schemes, in which the precoders are obtained by simply quantizing classical linear precoders [12, 11, 13]. Despite their low computational complexity, such linear precoders usually suffer from high symbol error floors, especially in the high-SNR regime. To overcome the error-floor issue, there have been emerging works on analyzing and designing nonlinear precoders for one-bit downlink transmission.
Nonlinear precoding schemes based on the minimum mean square error (MMSE) criterion have been considered in [13, 14, 15]. More specifically, the corresponding MMSE model has been formulated and several nonlinear MMSE precoders have been proposed in [13], including the semidefinite relaxation (SDR) precoder and the more computationally efficient squared-infinity-norm Douglas-Rachford splitting (SQUID) precoder. To further reduce the computational cost of SQUID, two precoders named C1PO and C2PO, which are based on biconvex relaxation, have been proposed in [14]. In [15], the authors have designed an algorithm based on the alternating direction method of multipliers (ADMM) framework, which is guaranteed to converge under some mild conditions.
Note that the nonlinear precoding schemes shift the precoding design from the traditional block level to the symbol level. Through a symbol-by-symbol design, multi-user interference can be made constructive to the useful signal power. Therefore, it is helpful to take the constructive interference (CI) into consideration when designing (symbol-level) precoders. The MMSE metric, however, views all interference as destructive and is thus suboptimal for the nonlinear transmission schemes.
The concept of CI has been well studied for symbol-level precoding [24, 25, 26, 27]. Roughly speaking, the CI effect is measured by the distance from the noise-free received signal to the boundary of its decision region. Recently, the idea of CI has been incorporated into one-bit precoding design [16, 17, 18, 19, 20]. More specifically, the CI model for one-bit precoding, which maximizes the CI effect subject to the one-bit constraint, was formulated for the first time in [16]. Later, the authors of [17] proposed an alternative CI-based model, called the symbol scaling model, which admits a simpler formulation. In [28], the authors showed that the two CI formulations are equivalent. There are also works that directly consider the symbol error probability (SEP) criterion [21, 22, 23]. In fact, the CI criterion is closely related to the SEP criterion; in particular, it has been shown in [22] that for PSK signaling, maximizing the CI effect can be seen as minimizing an upper-bound approximation of the SEP.

Various algorithms have been proposed for solving the CI-based formulations, most of which are based on the symbol scaling model [17, 18, 19]. In [17], a low-complexity 3-stage heuristic algorithm was proposed, which achieves acceptable performance in small-scale systems but suffers from an error floor in large-scale systems. To further improve the performance, two algorithms based on the linear programming (LP) relaxation of the symbol scaling model were proposed in [18]. More specifically, the authors first proved that most entries of the solution of the LP relaxation already satisfy the one-bit constraint. Building upon this observation, they proposed the partial branch-and-bound (PBB) algorithm, where the BB procedure is performed only for the entries that do not comply with the one-bit requirement, thus greatly reducing the complexity compared to the full-BB algorithm [19]. A greedy approach named ordered partial sequential update (OPSU) was also proposed in [18], where the values of the elements that do not satisfy the one-bit constraint are determined sequentially according to a simple criterion. It has been shown that the OPSU precoder achieves significantly better performance than the maximum safety margin (MSM) precoder [16][17] obtained by directly quantizing the solution of the LP relaxation, and that OPSU is much more computationally efficient than the PBB algorithm with only a small performance loss.

In short summary, compared to the MMSE-based approaches, the CI-based approaches generally enjoy significantly better BER performance. However, either their performance degrades in large-scale systems with high-order modulation (e.g., OPSU) or their computational costs are prohibitively high (e.g., PBB). Please see Table I for a summary of models and/or algorithms for one-bit precoding design.
I-B Our Contributions
This paper focuses on the CI-based symbol scaling model for one-bit downlink transmission with PSK signaling [17]. The main contribution of this paper is an efficient negative $\ell_1$ penalty (NL1P) approach for solving the considered problem. Two key features of the proposed approach are as follows. First, our approach is based on a novel penalty model, which is shown to be equivalent to the original problem when the penalty parameter is sufficiently large. This is in sharp contrast to the LP relaxation model considered in previous works [17, 18, 19]. Second, the dominant cost of the proposed approach at each iteration is two matrix-vector multiplications and one projection onto the simplex, which makes it particularly suitable for solving large-scale one-bit precoding problems.
We summarize the contributions of the paper as follows.

Complexity Analysis: We characterize the complexity status of the considered one-bit precoding problem. Specifically, we show that the considered problem is NP-hard even in the single-user case and strongly NP-hard in the general case. These complexity results fill a theoretical gap, as the complexity status of the problem was previously unknown (in spite of the existence of various heuristic approaches for solving it).

Novel Penalty Model: We propose a novel negative $\ell_1$ penalty model for the considered problem, in which the one-bit constraint is penalized into the objective with a negative $\ell_1$-norm term. We show that when the penalty parameter is sufficiently large, the penalty model is an exact reformulation of the original problem, in the sense that the two problems share the same global and local solutions.

Efficient Algorithms: To solve the penalty model, we further transform it into an equivalent min-max problem. We propose an efficient alternating optimization (AO) algorithm for solving a class of nonsmooth nonconvex-concave min-max problems (which includes our problem as a special case) and prove its convergence. We also propose a low-complexity implementation of the proposed AO algorithm when applied to the penalty problem of interest. Simulation results show that both the proposed algorithm and its low-complexity implementation generally outperform the state-of-the-art CI-based algorithms in terms of both BER performance and computational efficiency.
Algorithm | Design principle | Optimization model and/or technique | Complexity | Error rate
ZF [13] | Block-level ZF | Direct quantization | Low | Poor
WF [13] | Block-level MMSE | Direct quantization | Low | Poor
C1PO [14] | Symbol-level MMSE | Biconvex relaxation and alternating minimization | Moderate | Good in easy cases but failed in difficult cases
C2PO [14] | Symbol-level MMSE | Biconvex relaxation and forward-backward splitting | Moderate | Good in easy cases but failed in difficult cases
SQUID [13] | Symbol-level MMSE | Douglas-Rachford splitting | Moderate | Good in easy cases but failed in difficult cases
MSM [16][17] | Symbol-level CI | LP relaxation and direct quantization | Moderate, higher than MMSE methods | Satisfactory in general but degraded in difficult cases
OPSU [18] | Symbol-level CI | LP relaxation and greedy search | Moderate, higher than MMSE methods | Satisfactory in general but degraded in difficult cases
PBB [18] | Symbol-level CI | LP relaxation and branch-and-bound | Prohibitively high in large systems | Satisfactory
NL1P (this paper) | Symbol-level CI | Negative $\ell_1$ penalty and min-max optimization | Moderate, higher than MMSE methods | Satisfactory
ANL1P (this paper) | Symbol-level CI | Negative $\ell_1$ penalty and min-max optimization | Moderate, higher than MMSE methods | Satisfactory
TABLE I: A summary of models and algorithms for one-bit precoding design.
I-C Organization and Notations
The remaining parts of the paper are organized as follows. Section II introduces the system model and the CIbased symbol scaling model for onebit precoding design. Section III establishes the complexity status of the considered problem. A framework of the proposed negative penalty approach is given in Section IV, after which an efficient algorithm for solving the penalty model is developed in Section V. Simulation results are shown in Section VI and the paper is concluded in Section VII.
Throughout this paper, we use $\mathbb{R}$ and $\mathbb{C}$ to represent the real and complex spaces, respectively. We use $x$, $\mathbf{x}$, $\mathbf{X}$, and $\mathcal{X}$ to denote a scalar, a column vector, a matrix, and a set, respectively. The symbols $\mathbf{0}$ and $\mathbf{1}$ are column vectors with all elements being $0$ and $1$, respectively. For a vector $\mathbf{x}$, $x_i$ refers to its $i$-th entry, where $[\mathbf{x}]_i$ is also used if it does not cause any ambiguity; $\mathbf{x} \geq \mathbf{0}$ ($\mathbf{x} > \mathbf{0}$) means that each element of $\mathbf{x}$ is nonnegative (positive). For a matrix $\mathbf{X}$, $X_{ij}$ refers to its $(i,j)$-th element; $\mathrm{mean}(\mathbf{X})$ returns the mean value of the elements of $\mathbf{X}$. For a set $\mathcal{X}$, $\mathrm{int}\,\mathcal{X}$ refers to its interior; $\mathcal{P}_{\mathcal{X}}(\cdot)$ is the projection operator onto the set $\mathcal{X}$. $\mathrm{sgn}(\cdot)$ represents the sign of a real number, which returns $1$ if the number is nonnegative and returns $-1$ otherwise. $\|\cdot\|_p$ denotes the $\ell_p$ norm of the corresponding matrix or vector. $(\cdot)^{\mathsf{T}}$, $\mathrm{Re}(\cdot)$, $\mathrm{Im}(\cdot)$, and $|\cdot|$ return the transpose, the real part, the imaginary part, and the modulus of their corresponding argument, respectively. The subdifferential of a convex function $f$ is denoted by $\partial f$. $\mathrm{dom}\,f$ refers to the domain of the function $f$, i.e., $\mathrm{dom}\,f = \{\mathbf{x} : f(\mathbf{x}) < +\infty\}$. $\mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I})$ represents the zero-mean circularly symmetric complex Gaussian distribution with covariance matrix $\sigma^2\mathbf{I}$, where $\mathbf{I}$ denotes the identity matrix. $\mathcal{B}(\mathbf{x}, r)$ refers to the ball centered at $\mathbf{x}$ with radius $r$, i.e., $\mathcal{B}(\mathbf{x}, r) = \{\mathbf{y} : \|\mathbf{y} - \mathbf{x}\|_2 \leq r\}$. $\Pr(\cdot)$ denotes the probability of the corresponding event. Finally, $\jmath$ is the imaginary unit.

II Problem Formulation
In this section, we present the problem formulation, including the system model and the CIbased symbol scaling model for onebit precoding design.
II-A System Model
As depicted in Fig. 1, we consider a downlink multi-user massive MIMO system, in which a BS equipped with $N$ antennas serves $K$ single-antenna users simultaneously, where $K \leq N$. We assume that one-bit DACs are employed at the BS and that ideal ADCs with infinite precision are employed at each receiver. We also assume that perfect CSI is available at the BS, as in [12, 11, 13, 14, 15, 16, 17, 18, 19, 20, 23, 21, 22]. The received signal vector can then be expressed as

$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n},$

where $\mathbf{H} \in \mathbb{C}^{K \times N}$ is the flat-fading channel matrix between the BS and the users, $\mathbf{x} \in \mathbb{C}^{N}$ is the transmitted signal, and $\mathbf{n}$ is the additive white Gaussian noise.
As one-bit DACs are adopted, each entry of $\mathbf{x}$, i.e., the output signal at each antenna element, can only be chosen from four symbols. Specifically, $x_n \in \mathcal{X} \triangleq \frac{1}{\sqrt{2N}}\{\pm 1 \pm \jmath\}$, where the normalization is chosen for simplicity such that the total transmit power is $1$. Let $\mathbf{s}$ be the intended data symbol vector for the $K$ users, whose entries are drawn from a unit-norm PSK constellation, i.e., $|s_k| = 1$ for all $k$. In this paper, we restrict our attention to the nonlinear precoding scheme, in which the transmitted signal $\mathbf{x}$ is designed on a symbol-by-symbol basis as a function of the channel matrix $\mathbf{H}$ and the data symbol vector $\mathbf{s}$. At the receiver side, we assume that symbol-wise nearest-neighbor decoding is employed, that is, each user maps its received signal to the nearest constellation point.

Our goal is to design the transmitted signal $\mathbf{x}$ such that the SEP, i.e., $\Pr(\hat{s}_k \neq s_k)$, is as low as possible. In this paper, we focus on the CI formulation of this optimization problem.
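To make the signal model concrete, the following sketch builds the one-bit transmit alphabet, a unit-norm PSK constellation, and the symbol-wise nearest-neighbor decoder. The unit-total-power normalization and all function names are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

def onebit_alphabet(N):
    """One-bit DAC alphabet per antenna: (1/sqrt(2N)) * (+-1 +- 1j),
    normalized so that a length-N signal has unit total power (assumption)."""
    s = 1.0 / np.sqrt(2 * N)
    return np.array([s * (a + 1j * b) for a in (-1, 1) for b in (-1, 1)])

def psk_constellation(M):
    """Unit-norm M-PSK constellation points."""
    return np.exp(1j * 2 * np.pi * np.arange(M) / M)

def nearest_neighbor_decode(y, constellation):
    """Map each received sample to the closest constellation point."""
    idx = np.argmin(np.abs(y[:, None] - constellation[None, :]), axis=1)
    return constellation[idx]
```

With this normalization, every one-bit signal of length $N$ has total power $N \cdot |x_n|^2 = 1$, so the transmit power does not depend on which of the four symbols each antenna emits.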
II-B CI-Based Symbol Scaling Model for One-Bit Precoding
CI refers to interference that pushes the received signal away from the corresponding decision boundaries of the modulated-symbol constellation and thus contributes to the useful signal power [27]. See [27, 24, 25, 26, 20] for more detailed discussions on CI. In this subsection, we introduce the mathematical formulation of the CI effect and the corresponding symbol scaling model proposed in [17].
For clarity, in Fig. 2 we depict a piece of the decision region for 8-PSK modulation, where without loss of generality we focus on the data symbol $s_k$ of user $k$. We denote by $s_k^{\mathcal{A}}$ and $s_k^{\mathcal{B}}$ the unit vectors in the directions of the two decision boundaries of $s_k$.
The CI metric aims to maximize the distance from the noise-free received signal to the corresponding decision boundaries. To formulate this distance mathematically, we decompose the noise-free received signal $\mathbf{h}_k^{\mathsf{T}}\mathbf{x}$, where $\mathbf{h}_k^{\mathsf{T}}$ is the $k$-th row of $\mathbf{H}$, along $s_k^{\mathcal{A}}$ and $s_k^{\mathcal{B}}$ as

$\mathbf{h}_k^{\mathsf{T}}\mathbf{x} = \alpha_k^{\mathcal{A}} s_k^{\mathcal{A}} + \alpha_k^{\mathcal{B}} s_k^{\mathcal{B}}.$
As can be observed from Fig. 2, the lengths of the two components $\alpha_k^{\mathcal{A}} s_k^{\mathcal{A}}$ and $\alpha_k^{\mathcal{B}} s_k^{\mathcal{B}}$ are $\alpha_k^{\mathcal{A}}$ and $\alpha_k^{\mathcal{B}}$, respectively. Therefore, the distance from $\mathbf{h}_k^{\mathsf{T}}\mathbf{x}$ to the nearer decision boundary can be expressed as

$d_k = \min\{\alpha_k^{\mathcal{A}}, \alpha_k^{\mathcal{B}}\}\, \sin\tfrac{2\pi}{M},$

where $\sin\frac{2\pi}{M}$ accounts for the angle between the two boundary directions. Since $\sin\frac{2\pi}{M}$ is known as long as the constellation level $M$ is given, the distance is determined only by $\min\{\alpha_k^{\mathcal{A}}, \alpha_k^{\mathcal{B}}\}$.
Based on the above discussion, the CI effect for all users in the system can be characterized by the value of $\min_{k}\, \min\{\alpha_k^{\mathcal{A}}, \alpha_k^{\mathcal{B}}\}$, which measures the minimum distance from all noise-free received signals to their corresponding decision boundaries. Accordingly, the one-bit precoding design problem that maximizes the CI effect can be formulated as

$\max_{\mathbf{x}}\ \min_{k}\ \min\{\alpha_k^{\mathcal{A}}, \alpha_k^{\mathcal{B}}\} \quad \text{s.t.}\ x_n \in \mathcal{X},\ n = 1, \ldots, N.$  (1)
By denoting $\bar{\mathbf{x}} = \sqrt{2N}\,\mathbf{x}$ and $\bar{\mathbf{H}} = \frac{1}{\sqrt{2N}}\,\mathbf{H}$, we can further remove the problem-dependent normalization factor from the constraint on $\mathbf{x}$. With a slight abuse of notation, we still write $\mathbf{x}$ and $\mathbf{H}$; problem (1) can then be rewritten as

$\max_{\mathbf{x}}\ \min_{k}\ \min\{\alpha_k^{\mathcal{A}}, \alpha_k^{\mathcal{B}}\}$  (2a)

$\text{s.t.}\ x_n \in \{\pm 1 \pm \jmath\},\ n = 1, \ldots, N.$  (2b)
We refer to problem (2) as (P), the CI-based symbol scaling model for one-bit precoding design.
III Complexity Analysis
In spite of the existence of various works on problem (P), its complexity status remained unknown. In this section, we fill this theoretical gap by characterizing the complexity of problem (P).
We first consider the case where there is only a single user in the system.
Theorem 1.
The CI-based one-bit precoding problem (P) is NP-hard in the single-user case, i.e., $K = 1$.
Proof.
Notice that when $K = 1$, (P) reduces to the following problem:

$\max_{\mathbf{x}}\ \min\{\alpha^{\mathcal{A}}, \alpha^{\mathcal{B}}\} \quad \text{s.t.}\ x_n \in \{\pm 1 \pm \jmath\},\ n = 1, \ldots, N.$  (3)
Next we shall build a polynomialtime transformation from the partition problem [29] to problem (3). The partition problem is to determine whether a given set of positive integers can be partitioned into two subsets such that the sum of elements in each subset is the same.
Now we construct an instance of problem (3) based on the given instance of the partition problem with positive integers $a_1, a_2, \ldots, a_n$. Let the number of antennas at the BS be $N = n$, let the transmitted data symbol be drawn from the PSK constellation set, and set the channel vector according to $a_1, a_2, \ldots, a_n$. With the above constructed parameters, problem (3) becomes
(4)  
s.t.  
Let the optimal solution of problem (4) be denoted by $\mathbf{x}^\star$. Since each entry of $\mathbf{x}^\star$ belongs to $\{\pm 1 \pm \jmath\}$, a simple change of variables relates the objective value of (4) to a signed sum of $a_1, a_2, \ldots, a_n$.
Now, it is straightforward to argue that the constructed problem (4) attains the corresponding threshold value if and only if the partition problem has a “yes” answer. Finally, the above transformation can be done in polynomial time. Since the partition problem is NP-complete, we conclude that problem (3) is NP-hard. ∎
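The reduction hinges on a simple equivalence: the partition instance has a “yes” answer exactly when signs $x_i \in \{+1, -1\}$ exist that drive the signed sum to zero, the same kind of binary choice that the one-bit entries encode. A brute-force sketch (hypothetical helper, exponential time, tiny inputs only):

```python
from itertools import product

def has_partition(nums):
    """Brute-force check of the partition problem: can `nums` be split into
    two subsets with equal sums?  Equivalently, do signs x_i in {+1, -1}
    exist with sum_i x_i * nums[i] == 0?"""
    return any(sum(x * a for x, a in zip(signs, nums)) == 0
               for signs in product((1, -1), repeat=len(nums)))
```

For example, `[1, 5, 6]` splits as `{1, 5}` versus `{6}`, while `[2, 3, 7]` admits no equal-sum split.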
In the following Theorem 2, we consider the more general case. The proof of Theorem 2 is provided in [30].
Theorem 2.
The CI-based one-bit precoding problem (P) is strongly NP-hard. Moreover, there is no polynomial-time constant-approximation algorithm for (P), unless P $=$ NP.
The above complexity results reveal that the (worst-case) computational complexity of globally solving (P) is exponential (if P $\neq$ NP), which is prohibitively high for massive MIMO systems, whose corresponding problem sizes are large. In addition, since the precoding scheme has shifted from the block level to the symbol level, (P) must be solved at the symbol rate, which imposes a stringent requirement on the efficiency of the corresponding algorithm. As such, instead of insisting on finding the optimal solution, we focus our attention on designing efficient algorithms for finding high-quality solutions of problem (P).
IV Proposed Negative $\ell_1$ Penalty Approach
In this section, we first introduce a compact form of problem (P), which is more favorable for the subsequent algorithmic design. Then, we transform the compact form into a novel negative $\ell_1$ penalty model and give the algorithmic framework of the proposed approach.
IV-A A Compact Form of (P)
In this subsection, we briefly introduce a compact form of (P) proposed in [17]. Recall that $\alpha_k^{\mathcal{A}}$ and $\alpha_k^{\mathcal{B}}$ are both real numbers. Therefore, by rewriting the complex-valued constraints (2a) in real-valued form, we can express them explicitly as functions of $\mathbf{H}$, $\mathbf{s}$, and $\mathbf{x}$. Moreover, the original maximization problem can be converted into a minimization problem (by adding a negative sign to the objective). We then arrive at the following compact form:

$\min_{\mathbf{x} \in \{-1, 1\}^{2N}}\ \max_{1 \leq i \leq 2K}\ \mathbf{a}_i^{\mathsf{T}} \mathbf{x},$  (5)

where the vectors $\mathbf{a}_i$ are constructed from the real and imaginary parts of $\mathbf{H}$ and $\mathbf{s}$, and $\mathbf{a}_i^{\mathsf{T}}$ is the $i$-th row of the matrix $\mathbf{A} = [\mathbf{a}_1, \ldots, \mathbf{a}_{2K}]^{\mathsf{T}}$. See [17] for detailed derivations. In the following, we shall design algorithms based on the above compact form, which appears to be easier to handle than the original form; with a slight abuse of notation, we still refer to it as (P).
IV-B Proposed Negative $\ell_1$ Penalty Approach
One main difficulty of problem (P) lies in its discrete one-bit constraint. To deal with this difficulty, we resort to the penalty technique [31], which penalizes the constraint into the objective with a carefully selected penalty function. Specifically, the proposed approach relaxes the discrete one-bit constraint $\mathbf{x} \in \{-1, 1\}^{2N}$ into the continuous box constraint $\mathbf{x} \in [-1, 1]^{2N}$ and includes a negative $\ell_1$ penalty in the objective:

$\min_{\mathbf{x} \in [-1, 1]^{2N}}\ \max_{1 \leq i \leq 2K}\ \mathbf{a}_i^{\mathsf{T}} \mathbf{x} - \lambda \|\mathbf{x}\|_1,$

where $\lambda > 0$ is the penalty parameter. Intuitively, the negative $\ell_1$ penalty term encourages large magnitudes of the entries of $\mathbf{x}$, pushing them toward the boundary values $\pm 1$ at which the relaxation becomes tight.
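The exactness phenomenon behind the theorems below can be observed numerically on a tiny instance: once $\lambda$ dominates the slopes of the piecewise-linear term, the box-constrained minimum of the penalized objective is attained at a one-bit vertex. A brute-force sketch; the matrix and all names are illustrative assumptions:

```python
import numpy as np
from itertools import product

def penalty_obj(x, A, lam):
    """max_i (A x)_i - lam * ||x||_1: the negative-l1-penalized objective."""
    return np.max(A @ x) - lam * np.linalg.norm(x, 1)

def min_over_box_grid(A, lam, grid):
    """Approximate global minimum over the box [-1,1]^n by exhaustive grid
    search (tiny n only; the grid includes the vertices)."""
    return min(penalty_obj(np.array(p), A, lam) for p in product(grid, repeat=A.shape[1]))

def min_over_vertices(A, lam):
    """Global minimum restricted to the one-bit points {-1, 1}^n."""
    return min(penalty_obj(np.array(p), A, lam)
               for p in product((-1.0, 1.0), repeat=A.shape[1]))
```

When $\lambda$ exceeds every $|a_{ij}|$, increasing any $|x_j|$ strictly decreases the objective, so the box minimum coincides with the vertex minimum.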
Next, we establish the connection between the original problem (P) and the penalty model. In particular, Theorem 3 establishes the equivalence between global solutions of the two problems, and Theorem 4 characterizes the relationship between local minimizers of the penalty model and feasible points of (P). In fact, the following theorems also hold for any Lipschitz continuous penalty function, with the threshold on the penalty parameter determined by the corresponding Lipschitz constant.
Theorem 3.
If the penalty parameter $\lambda$ is sufficiently large, any optimal solution of the penalty model is also an optimal solution of (P), and vice versa.
The proof of Theorem 3 is given in Appendix A. Theorem 3 shows that when the penalty parameter is sufficiently large, the original problem and the penalty model share the same global solutions.
Theorem 4.
If the penalty parameter $\lambda$ is sufficiently large, any local minimizer of the penalty model is a feasible point of (P). Conversely, for such $\lambda$, any feasible point of (P) is also a local minimizer of the penalty model.
The proof of Theorem 4 is provided in [30]. Theorem 4 establishes the relationship between local minimizers of the penalty model and feasible points of (P). In particular, the first part of Theorem 4 shows that for a sufficiently large penalty parameter $\lambda$, any local minimizer of the penalty model is also a feasible point of (P); the second part shows that, for the same $\lambda$, all feasible points of (P) are also local minimizers of the penalty model, so the penalty model has exponentially many local minimizers in this case. Our goal is thus to find a good local minimizer of the penalty model with a sufficiently large $\lambda$, which is then a high-quality solution of (P).
To achieve this goal, we employ the homotopy (sometimes called warm-start) technique [32][33], which turns out to be very helpful in guiding the corresponding (iterative) algorithm to find a high-quality solution in practice. More specifically, the proposed approach solves the penalty model with a small penalty parameter at the beginning, then gradually increases the penalty parameter and traces the solution path of the corresponding penalty problems, until the penalty parameter is sufficiently large and a feasible point of problem (P) is obtained. We name the above procedure the negative $\ell_1$ penalty (NL1P) approach and give its algorithmic framework as follows.
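The outer loop just described can be sketched as follows. The inner solver is abstracted as a callable, and all parameter values (the initial $\lambda$, the increase factor, tolerances) are illustrative assumptions, not the paper's Algorithm 1.

```python
import numpy as np

def nl1p_homotopy(solve_penalty, x0, lam0=0.1, rho=2.0, lam_max=1e4, tol=1e-6):
    """Schematic NL1P outer loop: solve a sequence of penalty problems with
    increasing penalty parameter, warm-starting each from the previous
    solution, until every entry reaches the one-bit values +/-1.
    `solve_penalty(x_init, lam)` is an abstract inner solver (e.g. an AO-type
    method) returning an approximate minimizer of the penalty model."""
    x, lam = x0, lam0
    while lam <= lam_max:
        x = solve_penalty(x, lam)                  # warm start from previous x
        if np.all(np.abs(np.abs(x) - 1.0) < tol):  # feasible for the one-bit model
            break
        lam *= rho                                 # gradually increase the penalty
    return np.sign(x)                              # final one-bit solution
```

Starting from a small $\lambda$ keeps the early subproblems close to the (easier) relaxation; the warm start then traces the solution path as the penalty tightens.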
IV-C Remarks on the Proposed NL1P Approach
In this subsection, we discuss the relationship between the proposed NL1P approach and existing algorithms for solving problem (P).
IV-C1 Comparison with LP-Relaxation-Based Approaches
Most of the existing approaches (e.g., MSM [16][17], OPSU [18], PBB [18]) are based on the LP relaxation model, which corresponds to the penalty model with $\lambda = 0$. Generally speaking, these approaches consist of two stages: in the first stage, the LP relaxation is solved; in the second stage, optimization or greedy techniques are used to determine the values of the elements of the LP solution that do not satisfy the one-bit constraint. A key difference between the proposed approach and the LP-relaxation-based approaches is that the proposed approach solves the negative $\ell_1$ penalty model, which is an equivalent reformulation of the original problem, while the LP relaxation solved in the existing approaches is generally not equivalent to the original problem. This explains why our approach usually returns better solutions than the LP-relaxation-based approaches, as observed in the simulations.
IV-C2 Comparison with the Work in [23]
It is interesting to note that, though with a different motivation, the penalty model is of the same form as the problem considered in [23], where one-bit precoding design for QAM modulation based on the SEP metric is studied. In contrast to our proposed approach, the authors of [23] developed a penalty method based on a smooth approximation. Specifically, they applied the log-sum-exp approximation to the maximum function and added a negative square penalty term to the objective. To obtain a tight approximation, the smoothing parameter should be chosen as small as possible, while a small smoothing parameter results in a large Lipschitz constant of the gradient of the objective, which in turn leads to slow convergence. Therefore, the choice of the smoothing parameter is a key factor affecting the performance of their algorithm. In contrast, our proposed approach deals with the nonsmooth objective directly and does not involve any smooth approximation, thus avoiding this dilemma. Nevertheless, the resulting nonsmooth penalty model seems more challenging to solve than the smooth penalty model in [23]. In the next section, we shall propose an efficient algorithm for solving it by exploiting its special structure.
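The smoothing trade-off described above can be made concrete: the log-sum-exp function over-estimates the maximum by at most $\mu \log m$, so a tight fit forces the smoothing parameter $\mu$ to be small, while the gradient's Lipschitz constant grows like $1/\mu$. A minimal sketch (the stabilized form avoids overflow for small $\mu$):

```python
import numpy as np

def log_sum_exp(z, mu):
    """Smooth approximation of max(z): mu * log(sum(exp(z / mu))).
    Satisfies max(z) <= lse(z, mu) <= max(z) + mu * log(len(z)),
    so smaller mu means a tighter fit, but the gradient's Lipschitz
    constant grows like 1/mu, slowing gradient-based methods."""
    m = np.max(z)
    return m + mu * np.log(np.sum(np.exp((z - m) / mu)))  # stabilized form
```

The sandwich bound above is exactly why the smoothing parameter creates a dilemma between approximation accuracy and convergence speed.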
IV-C3 Why Not a Negative Square Penalty
One may ask why we do not add a negative square penalty to the objective as in [23], whereby the resulting model is

$\min_{\mathbf{x} \in [-1, 1]^{2N}}\ \max_{1 \leq i \leq 2K}\ \mathbf{a}_i^{\mathsf{T}} \mathbf{x} - \lambda \|\mathbf{x}\|_2^2.$  (6)
Next we show that (6) is not a good penalty model for (P). Specifically, for any value of the penalty parameter $\lambda$, local minimizers of (6) are not necessarily feasible points of (P). We give an example as follows.
Example 1.
Consider the following problem:
(7)  
s.t. 
where
The corresponding negative square penalty problem is
(8)  
s.t. 
The main reason for the failure of the negative square penalty is that a smooth penalty is used for a problem whose objective is nonsmooth. This also explains why we choose the nonsmooth negative $\ell_1$ penalty for problem (P).
V An Efficient Alternating Optimization Algorithm for Solving the Penalty Model
In this section, we propose an efficient algorithm for solving the nonsmooth nonconvex penalty subproblem arising in the NL1P approach. More specifically, we first transform the penalty model into an equivalent min-max problem in Section V-A. Then we propose an efficient alternating optimization (AO) algorithm for solving a class of nonsmooth min-max problems (which includes our problem as a special case) and give the convergence analysis in Sections V-B and V-C, respectively. In Section V-D, we apply the proposed AO algorithm to our problem and give some discussions.
V-A Min-Max Reformulation of the Penalty Model
In this subsection, we reformulate the penalty model into an equivalent min-max problem. Recall that its objective is the maximum of a finite collection of functions. By introducing an auxiliary variable $\mathbf{y}$ in the simplex

$\Delta = \{\mathbf{y} \in \mathbb{R}^{2K} : \mathbf{y} \geq \mathbf{0},\ \mathbf{1}^{\mathsf{T}}\mathbf{y} = 1\},$  (9)

the penalty model can be equivalently transformed into the following min-max problem:

$\min_{\mathbf{x} \in [-1, 1]^{2N}}\ \max_{\mathbf{y} \in \Delta}\ \mathbf{y}^{\mathsf{T}} \mathbf{A} \mathbf{x} - \lambda \|\mathbf{x}\|_1.$
The penalty model and its min-max reformulation are equivalent in the sense that an optimal solution (stationary point) of one problem can be easily constructed from an optimal solution (stationary point) of the other [34].
Below we focus on designing an efficient algorithm for solving the reformulated min-max problem. In the next subsection, we develop an algorithm for a class of nonsmooth nonconvex-concave min-max problems that includes it as a special case.
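The reformulation rests on a standard identity: the pointwise maximum of finitely many values equals the maximum of their convex combinations over the probability simplex, attained at a vertex. A quick numerical check (illustrative names):

```python
import numpy as np

def simplex_max(f):
    """max over the probability simplex of y^T f: since the objective is
    linear in y, the maximum is attained at a vertex, i.e., by putting all
    mass on the largest coordinate of f."""
    y = np.zeros_like(f)
    y[np.argmax(f)] = 1.0       # optimal y is a vertex of the simplex
    return float(y @ f)
```

Because the inner maximization is linear in the auxiliary variable, replacing `max(f)` with the simplex form changes nothing in value while exposing a smooth bilinear structure to the algorithm.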
V-B Proposed AO Algorithm
Min-max problems have drawn considerable interest (especially in the machine learning and signal processing communities) in recent years, and various algorithms have been proposed for different types of min-max problems [35, 36, 37, 38, 34, 39]. However, previous works mainly consider the smooth case [36, 35, 37, 38], and the few works that focus on nonsmooth min-max problems all require the nonsmooth term to be convex [34][39]. To the best of our knowledge, no existing work covers our problem of interest, and thus no existing algorithm can be directly applied to solve it.

In this subsection, we consider a class of nonsmooth nonconvex-concave min-max problems

$\min_{\mathbf{x} \in \mathcal{X}}\ \max_{\mathbf{y} \in \mathcal{Y}}\ f(\mathbf{x}, \mathbf{y}) - h(\mathbf{x}),$  (10)

where $f$ is a smooth function that is nonconvex with respect to $\mathbf{x}$ and concave with respect to $\mathbf{y}$, $h$ is a nonsmooth, proper closed convex function, and $\mathcal{X}$ and $\mathcal{Y}$ are compact convex sets. Problem (10) includes our min-max reformulation as a special case: $f$ and $h$ correspond to the bilinear term $\mathbf{y}^{\mathsf{T}}\mathbf{A}\mathbf{x}$ and the $\ell_1$ norm $\lambda\|\mathbf{x}\|_1$, respectively, while $\mathcal{X}$ and $\mathcal{Y}$ correspond to the box $[-1, 1]^{2N}$ and the simplex in (9), respectively.
Our proposed algorithm for solving problem (10) can be regarded as an extension of the algorithms proposed in [34] and [35] from the smooth case to the nonsmooth case, which is of independent interest. In [34] and [35], the authors proposed unified frameworks for solving several classes of min-max problems, including smooth nonconvex-concave ones, which correspond to the special case of (10) with $h \equiv 0$.
Similar to [34] and [35], we consider a perturbed function $\tilde{f}(\mathbf{x}, \mathbf{y})$ of the original objective, in which a perturbation term is introduced to make the objective strongly concave in $\mathbf{y}$. It is shown in [34] and [35] that this perturbation is important for the convergence of the corresponding algorithms.
At each iteration, the proposed algorithm updates $\mathbf{x}$ and $\mathbf{y}$ alternately as follows:

$\mathbf{x}^{k+1} \in \arg\min_{\mathbf{x} \in \mathcal{X}} \left\{ f(\mathbf{x}^k, \mathbf{y}^k) + \langle \nabla_{\mathbf{x}} f(\mathbf{x}^k, \mathbf{y}^k), \mathbf{x} - \mathbf{x}^k \rangle - h(\mathbf{x}) + \tfrac{1}{2\beta_k}\|\mathbf{x} - \mathbf{x}^k\|_2^2 \right\},$  (11a)

$\mathbf{y}^{k+1} = \mathcal{P}_{\mathcal{Y}}\!\left( \mathbf{y}^k + \alpha_k \nabla_{\mathbf{y}} \tilde{f}(\mathbf{x}^{k+1}, \mathbf{y}^k) \right),$  (11b)

where $\beta_k$ and $\alpha_k$ are properly selected regularization parameters. Generally, the solution of subproblem (11a) might not be unique, in which case we simply choose one element of the solution set. Since the above algorithm updates the variables $\mathbf{x}$ and $\mathbf{y}$ in an alternating fashion, we name it the alternating optimization (AO) algorithm and summarize it as Algorithm 2.
Some remarks on the proposed AO algorithm and its parameters are as follows. For subproblem (11a), the sum of the first three terms is a local (linear) approximation of the objective around the current iterate. To make this approximation accurate, the next iterate $\mathbf{x}^{k+1}$ should not be far from the current one $\mathbf{x}^k$, and thus a proximal regularization term is included. This idea is the same as that in the gradient projection and proximal point algorithms. Similarly, $\mathbf{y}$ is updated via a classical gradient projection step on the perturbed function. Note that the parameters $\beta_k$ and $\alpha_k$ in (11) trade off the goal of minimizing the local approximation of the corresponding functions against the goal of keeping the approximation accurate, while the perturbation parameter controls the accuracy and strong concavity of the perturbed function. Properly selecting these parameters plays a vital role in guaranteeing convergence and good performance of the proposed algorithm.
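One iteration of the updates in (11) can be sketched for a toy instance in the spirit of the min-max reformulation above, with bilinear $f(\mathbf{x}, \mathbf{y}) = \mathbf{y}^{\mathsf{T}}\mathbf{A}\mathbf{x}$ and a negative $\ell_1$ nonsmooth part. For simplicity the sketch takes the $\mathbf{y}$-set to be a box (so its projection is a clip) and a fixed perturbation $-(\gamma/2)\|\mathbf{y} - \mathbf{y}_{\mathrm{ref}}\|^2$; these simplifications, the parameter values, and the function names are assumptions, not the paper's exact Algorithm 2.

```python
import numpy as np

def ao_iteration(x, y, A, lam, beta, alpha, gamma, y_ref):
    """One AO iteration for the toy problem
        min_{x in [-1,1]^n} max_{y in [0,1]^m}  y^T A x - lam*||x||_1 .
    x-update (11a-style): exact coordinate-wise minimizer of the
    proximal-linearized subproblem (push away from zero by beta*lam,
    then clip to the box).
    y-update (11b-style): projected gradient ascent on the perturbed
    objective f(x, y) - (gamma/2)*||y - y_ref||^2 (assumed form)."""
    v = x - beta * (A.T @ y)                       # gradient step on the linearization
    sgn = np.where(v >= 0.0, 1.0, -1.0)            # sgn(0) taken as +1
    x_new = np.clip(v + beta * lam * sgn, -1.0, 1.0)
    grad_y = A @ x_new - gamma * (y - y_ref)       # ascent direction, perturbed
    y_new = np.clip(y + alpha * grad_y, 0.0, 1.0)  # box projection for the toy Y-set
    return x_new, y_new
```

The x-update is exact here because, per coordinate, the subproblem is a quadratic minus a scaled absolute value, whose minimizer over an interval is the shifted gradient point pushed away from zero and then clipped.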
The efficiency of the proposed AO algorithm depends on how efficiently the subproblems in (11) can be solved. Subproblem (11a) is a nonsmooth nonconvex problem, which generally does not admit a closed-form solution. However, for many cases of interest, subproblem (11a) either has a closed-form solution or can be solved efficiently to high accuracy. For instance, if $\mathcal{X}$ is a Cartesian product of simple compact convex sets and $h$ is separable across the corresponding blocks, then the exact solution can be obtained by solving simple one-dimensional problems. Fortunately, our min-max reformulation is such a case, and we give a detailed discussion in Section V-D. Subproblem (11b) is a projection onto the set $\mathcal{Y}$, which can be computed efficiently for many choices of $\mathcal{Y}$, such as the simplex in (9).
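For the simplex case, subproblem (11b) can use the classical sort-based Euclidean projection; the fast implementation cited as [40] is of this flavor. A self-contained sketch:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex
    {y : y >= 0, sum(y) = 1} via the classical sort-based method.
    O(n log n), dominated by the sort."""
    u = np.sort(v)[::-1]                      # sort in decreasing order
    css = np.cumsum(u) - 1.0                  # cumulative sums minus the target sum
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)            # the optimal shift
    return np.maximum(v - theta, 0.0)
```

The projection subtracts a single scalar shift from all coordinates and clips at zero; the sort only serves to locate that shift, which is why linear-time refinements of this scheme exist.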
V-C Convergence Analysis
In this subsection, we establish the global convergence of the proposed AO algorithm. Before doing this, we give the following definition of the stationary point, which is a generalization of [37, Definition 3.1] from the smooth case to the nonsmooth case.
Definition 1.
A pair $(\mathbf{x}^\star, \mathbf{y}^\star) \in \mathcal{X} \times \mathcal{Y}$ is called a stationary point of problem (10) if

$\mathbf{0} \in \nabla_{\mathbf{x}} f(\mathbf{x}^\star, \mathbf{y}^\star) - \partial h(\mathbf{x}^\star) + \partial \mathbb{I}_{\mathcal{X}}(\mathbf{x}^\star), \qquad \mathbf{0} \in -\nabla_{\mathbf{y}} f(\mathbf{x}^\star, \mathbf{y}^\star) + \partial \mathbb{I}_{\mathcal{Y}}(\mathbf{y}^\star),$

where $\mathbb{I}_{\mathcal{X}}$ and $\mathbb{I}_{\mathcal{Y}}$ are the indicator functions of $\mathcal{X}$ and $\mathcal{Y}$, respectively.
To establish the convergence, we need to impose the following assumptions on and in problem (10).
Assumption 1.
The function $f$ is continuously differentiable, and there exist constants $L_{x}$, $L_{xy}$, and $L_{yx}$ such that for all $\mathbf{x}, \mathbf{x}' \in \mathcal{X}$ and $\mathbf{y}, \mathbf{y}' \in \mathcal{Y}$, we have

$\|\nabla_{\mathbf{x}} f(\mathbf{x}, \mathbf{y}) - \nabla_{\mathbf{x}} f(\mathbf{x}', \mathbf{y})\|_2 \leq L_{x} \|\mathbf{x} - \mathbf{x}'\|_2, \quad \|\nabla_{\mathbf{x}} f(\mathbf{x}, \mathbf{y}) - \nabla_{\mathbf{x}} f(\mathbf{x}, \mathbf{y}')\|_2 \leq L_{xy} \|\mathbf{y} - \mathbf{y}'\|_2, \quad \|\nabla_{\mathbf{y}} f(\mathbf{x}, \mathbf{y}) - \nabla_{\mathbf{y}} f(\mathbf{x}', \mathbf{y})\|_2 \leq L_{yx} \|\mathbf{x} - \mathbf{x}'\|_2.$
Assumption 2.
The function $h$ is Lipschitz continuous on $\mathcal{X}$ with constant $L_h$, i.e.,

$|h(\mathbf{x}) - h(\mathbf{x}')| \leq L_h \|\mathbf{x} - \mathbf{x}'\|_2 \quad \text{for all } \mathbf{x}, \mathbf{x}' \in \mathcal{X}.$
With the above definition and assumptions, we are ready to present the convergence result of the proposed AO algorithm. The proof of the following theorem can be found in [30].
V-D AO Algorithm for Solving the Min-Max Reformulation
In this subsection, we specialize the proposed AO algorithm to our min-max problem and carefully investigate its behavior on this special problem, including implementation details (see Algorithm 3) and convergence results. We also propose a low-complexity implementation of Algorithm 3 to further reduce the computational cost.
V-D1 Implementation Details
Specializing Algorithm 2 to our problem, the subproblems for $\mathbf{x}$ and $\mathbf{y}$ become

$\mathbf{x}^{k+1} \in \arg\min_{\mathbf{x} \in [-1, 1]^{2N}} \left\{ \langle \mathbf{A}^{\mathsf{T}} \mathbf{y}^k, \mathbf{x} \rangle - \lambda \|\mathbf{x}\|_1 + \tfrac{1}{2\beta_k} \|\mathbf{x} - \mathbf{x}^k\|_2^2 \right\}$  (12)

and

$\mathbf{y}^{k+1} = \mathcal{P}_{\Delta}\!\left( \mathbf{y}^k + \alpha_k \nabla_{\mathbf{y}} \tilde{f}(\mathbf{x}^{k+1}, \mathbf{y}^k) \right).$  (13)
Subproblem (12) is separable across coordinates and has a closed-form solution. More specifically, by denoting $\mathbf{v}^k = \mathbf{x}^k - \beta_k \mathbf{A}^{\mathsf{T}} \mathbf{y}^k$, subproblem (12) decouples into $2N$ one-dimensional problems of the form

$\min_{x \in [-1, 1]}\ \frac{1}{2\beta_k}(x - v)^2 - \lambda |x|,$  (14)

which admits the closed-form solution

$x^\star = \mathcal{P}_{[-1, 1]}\big( v + \beta_k \lambda\, \mathrm{sgn}(v) \big).$  (15)

A detailed derivation of (15) is given in Appendix B. Note that when $v = 0$ the solution of (14) is not unique, and we only need to choose one from the solution set; taking $\mathrm{sgn}(0) = 1$, the solution of (14) can be expressed in the unified way (15). The solution of subproblem (13) involves only simple matrix and vector operations and one projection onto the simplex, which has a very fast implementation [40].
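As a sanity check on a closed-form rule of this type, the following sketch compares the unified solution against a dense grid search over $[-1, 1]$. The scalar problem coded here, a proximal quadratic minus a scaled absolute value, is an assumed form of (14), with hypothetical function names:

```python
import numpy as np

def solve_scalar(v, lam, beta):
    """Closed-form minimizer over [-1, 1] of
        (x - v)**2 / (2*beta) - lam*|x| :
    push v away from zero by beta*lam (the negative l1 penalty rewards
    magnitude), then project onto [-1, 1].  sgn(0) is taken as +1."""
    s = 1.0 if v >= 0 else -1.0
    return float(np.clip(v + beta * lam * s, -1.0, 1.0))

def solve_scalar_grid(v, lam, beta, n=200001):
    """Brute-force check by dense grid search."""
    xs = np.linspace(-1.0, 1.0, n)
    return float(xs[np.argmin((xs - v) ** 2 / (2 * beta) - lam * np.abs(xs))])
```

The rule is exact because the objective is a convex quadratic on each of $[-1, 0]$ and $[0, 1]$, and the piece on the side of $\mathrm{sgn}(v)$ always attains the smaller value.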
In total, the dominant cost at each iteration lies in computing $\mathbf{A}^{\mathsf{T}} \mathbf{y}^k$ and $\mathbf{A} \mathbf{x}^{k+1}$, which requires $\mathcal{O}(NK)$ real-number multiplications, and computing one projection onto the simplex of dimension $2K$, whose computational complexity is $\mathcal{O}(K \log K)$. Therefore, the AO algorithm enables us to solve the min-max problem in a computationally efficient manner, especially when the problem dimension is large. We summarize the specialization of the AO algorithm as Algorithm 3.
VD2 Convergence Behavior
According to Theorem 5, the AO algorithm (with properly chosen parameters) is guaranteed to find a stationary point of problem (), whose corresponding part is also a stationary point of problem (P) due to the equivalence between problems (P) and (). The remaining question is whether the obtained stationary point satisfies the one-bit constraint. Next, we give an affirmative answer to this question.
We first characterize the stationary points of (P).
Theorem 6.
If all stationary points of (P) must satisfy
The proof of Theorem 6 is provided in Appendix C. Theorems 5 and 6 suggest that every limit point of the sequence generated by Algorithm 3 must have all of its elements being either or . Obviously, zero elements do not satisfy the one-bit constraint in problem (P) and thus are undesirable. Fortunately, the following Corollary 1 shows that zero elements will not occur in Algorithm 3. Note that for problem (), . The following corollary is a combination of the results in Theorem 5, Theorem 6, and the closed-form solution (15). The detailed proof is relegated to Appendix D.
Corollary 1.
Let be the sequence generated by Algorithm 3 with , and , where , , , , and . Then if every limit point of must satisfy the one-bit constraint.
In summary, when the penalty parameter in problem (P) is sufficiently large, every limit point of the sequence generated by Algorithm 3 (with properly selected parameters) is not only a stationary point of (P) but also a feasible point of (P), and thus a local minimizer (according to Theorem 4) of problem (P). This desirable property results from the combination of the structural properties of problem (P) and the design of Algorithm 3.
VD3 Remarks on AO Algorithm for Solving Problem (P)
Recall that the core of our proposed NL1P approach is the AO algorithm for solving a sequence of penalty problems (P) (equivalent to ()) with gradually increasing , while the core of the LP-relaxation-based approaches (MSM [16, 17], OPSU [18], PBB [18]) is the interior-point algorithm for solving the LP relaxation model, followed by some rounding procedure. Compared to the interior-point algorithm for solving the LP relaxation, our proposed AO algorithm has the following advantages.
First, the AO algorithm can be performed efficiently when solving problem (P): at each iteration, only two matrix-vector multiplications and one projection onto the simplex need to be computed. In contrast, the per-iteration complexity of the interior-point algorithm is . When is large (which is the case for the massive MIMO system), such computational complexity is unacceptable for practical implementation. Therefore, our proposed algorithm is more suitable for the large-scale problems arising in the massive MIMO scenario.
Second, our proposed AO algorithm enjoys nice theoretical properties. In particular, when the penalty parameter in problem (P) is sufficiently large and the parameters in the AO algorithm are properly selected, any limit point of the sequence generated by the AO algorithm is a local minimizer of (P) and, more importantly, it satisfies the one-bit constraint.
VD4 A Low-Complexity Implementation of Algorithm 3
To further reduce the computational cost, in this part we propose a low-complexity implementation of Algorithm 3. To be specific, we perform Algorithm 3 in a more aggressive manner by keeping the values of the variables fixed in later iterations once they satisfy the one-bit constraint. For clarity, we summarize the above procedure in Algorithm 4.
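The freezing strategy behind Algorithm 4 can be sketched generically as follows. This is a hypothetical illustration only: `update_step` stands in for one iteration of Algorithm 3 restricted to the still-free entries, and we use a real-valued surrogate of the one-bit alphabet (entries of magnitude one); none of these names come from the paper.

```python
import numpy as np

def freeze_converged(x, tol=1e-6):
    """Boolean mask of entries that already satisfy the (surrogate)
    one-bit constraint |x_i| = 1 up to tolerance tol."""
    return np.abs(np.abs(x) - 1.0) < tol

def accelerated_loop(x, update_step, max_iter=100):
    """Run `update_step` on the free entries only; once an entry
    satisfies the one-bit constraint it stays fixed thereafter."""
    frozen = np.zeros(x.size, dtype=bool)
    for _ in range(max_iter):
        free = ~frozen
        if not free.any():
            break                          # all entries are frozen
        x[free] = update_step(x, free)     # update only free entries
        frozen |= freeze_converged(x)      # newly converged entries lock in
    return x
```

As the iterations proceed, `free` shrinks, so each pass touches fewer coordinates, which is the source of the claimed acceleration.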
If Algorithm 4 is employed to solve the subproblem (P) in Algorithm 1, then the number of elements in that need to be updated gradually decreases as the algorithm proceeds. Therefore, replacing Algorithm 3 with Algorithm 4 to solve the subproblem (P) can accelerate the convergence of the NL1P approach. We refer to the corresponding algorithm as the accelerated negative penalty (ANL1P) approach. It will be shown in the simulations that ANL1P achieves almost the same performance as NL1P with less CPU time.
Vi Simulation Results
In this section, we present simulation results to demonstrate the performance of our proposed algorithms.
Via Simulation Setup and Choice of Parameters
We consider multiuser massive MIMO systems where the BS is equipped with hundreds of antennas. We assume a standard Rayleigh fading channel, i.e., the channel matrix is composed of independent and identically distributed Gaussian random variables with zero mean and unit variance. We set the length of the transmission block to be and define the transmit SNR as , where unit transmit power is assumed. High-order PSK modulations, including PSK and PSK, are considered. All the results are obtained with Monte Carlo simulations over 1000 independent channel realizations. All the algorithms are implemented in MATLAB (Release 2018b) in OS X 10.14 on a MacBook Pro with a 2.4 GHz Intel Core i5 processor and 8 GB of RAM. As in Section II, we use the triple to describe the considered system, where denotes the total number of users in the system, is the number of transmit antennas at the BS, and refers to the constellation level for PSK modulation.
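The channel and symbol model of this setup can be sketched in a few lines. The sketch below is illustrative: the sizes (16 users, 128 antennas, 8-PSK, block length 10) are assumptions for the example, not the exact configurations simulated in the paper, and the helper names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def rayleigh_channel(K, N):
    """K x N channel with i.i.d. circularly symmetric complex Gaussian
    entries of zero mean and unit variance (standard Rayleigh fading)."""
    return (rng.standard_normal((K, N))
            + 1j * rng.standard_normal((K, N))) / np.sqrt(2)

def psk_symbols(M, T):
    """T random symbols drawn uniformly from an M-ary PSK constellation."""
    return np.exp(2j * np.pi * rng.integers(M, size=T) / M)

H = rayleigh_channel(16, 128)   # e.g. K = 16 users, N = 128 BS antennas
s = psk_symbols(8, 10)          # e.g. 8-PSK, transmission block length 10
```

A Monte Carlo BER estimate would then redraw `H` per trial and average the error counts over independent channel realizations.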
The parameters used in our algorithms are as follows. In Algorithm 1, the initial point is chosen as ; the penalty parameter is initialized as and increased by a factor of at each iteration. In Algorithm 3 (Algorithm 4), we set the initial point of as and the other parameters as and . We terminate Algorithm 3 (Algorithm 4) for solving the subproblem in Algorithm 1 when its iteration number exceeds or when the distance between its successive iterates is less than . Recall that we can always obtain an intermediate point at each iteration of Algorithm 1. In our implementation, we quantize all intermediate points generated by Algorithm 1 to satisfy the one-bit constraint and choose the quantized point with the best function value as the final solution.
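The quantize-and-select step at the end of the paragraph above can be sketched as follows. We assume the one-bit transmit alphabet {(±1 ± 1j)/√2} per antenna (unit-power entries with one-bit I/Q DACs); the helper names and the generic `objective` callable are ours, not the paper's notation.

```python
import numpy as np

def quantize_one_bit(x):
    """Map a complex vector onto the one-bit alphabet {(+-1 +- 1j)/sqrt(2)}
    by taking the signs of the real and imaginary parts.
    (Exact zeros would need a tie-breaking convention; ignored here.)"""
    return (np.sign(x.real) + 1j * np.sign(x.imag)) / np.sqrt(2)

def best_quantized(points, objective):
    """Quantize every intermediate point and return the candidate with
    the smallest value of the (problem-specific) objective function."""
    candidates = [quantize_one_bit(p) for p in points]
    return min(candidates, key=objective)
```

In the actual algorithm, `points` would be the intermediate iterates of Algorithm 1 and `objective` the penalized cost being minimized.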
We compare the proposed NL1P and ANL1P approaches with the existing state-of-the-art linear and nonlinear precoding approaches listed in Table I. We also include the unquantized ZF precoder, termed InfBit ZF, which serves as the BER lower bound for the one-bit precoding approaches.
ViB BER Performance
We first present the BER results for different multiuser massive MIMO systems.
In Fig. 3, a system with PSK modulation is considered. It can be observed that linear precoding suffers from a BER floor in the high SNR regime due to the coarse one-bit quantization, while all of the nonlinear approaches exhibit significantly better BER performance. Among the presented nonlinear precoding schemes, the CI-based methods generally perform better than the MMSE-based methods, and the PBB algorithm achieves the best BER performance. However, since a branch-and-bound procedure is involved, the PBB algorithm is computationally inefficient (especially when the number of users is large), as will be demonstrated in Section VIC. As can be observed from Fig. 3, all the CI-based approaches achieve comparable BER performance in this system, with the two proposed algorithms performing slightly better than the state-of-the-art OPSU precoder.
In Fig. 4 and Fig. 5, we investigate the more difficult cases of a higher user-antenna ratio and higher-level modulation, respectively. More specifically, in Fig. 4 we present the BER results for a system with PSK modulation, and in Fig. 5 we consider the same system as in Fig. 3 but with higher-order PSK modulation. The PBB approach is not included in Fig. 4 due to its prohibitively high complexity. Since the problem becomes more difficult in these two cases, it is not surprising to observe a notable performance loss for all the precoding methods. In particular, only the CI-based OPSU, PBB, and the two proposed approaches achieve satisfactory BER performance, while all the other approaches suffer from severe error floors at relatively high SNRs. Moreover, compared to the OPSU approach, the performance gain of our proposed algorithms becomes more prominent in these two difficult cases. In particular, we can observe an SNR gain of up to nearly 6 dB and 2.5 dB in Fig. 4 and Fig. 5, respectively, when the BER is ; as the BER becomes lower, the performance gain in terms of the SNR becomes larger.
In Fig. 6, we further depict the BER of the compared one-bit precoders versus the number of users, where the number of transmit antennas at the BS is fixed to be , the SNR is fixed to be , and PSK modulation is adopted. Among all the presented precoders, the proposed NL1P approach achieves the best BER performance, followed by the proposed ANL1P approach, which incurs only a slight performance loss. Both of them exhibit significantly better performance than the other precoding schemes in the sense that, under the same BER requirement, the two proposed precoders can serve many more users. For example, if we require the BER to be less than , then the NL1P precoder and the ANL1P precoder can serve nearly 40 and 38 users, respectively, while the state-of-the-art OPSU precoder can serve only 32 users, which demonstrates the superiority of our algorithms.
ViC Computational Efficiency
Now we evaluate the computational efficiency of the compared algorithms by reporting their CPU time. Since the linear and MMSE-based approaches fail to achieve satisfactory BER performance in many cases, we are mostly interested in the CPU time comparison of the CI-based methods in this subsection.