# CI-Based One-Bit Precoding for Multiuser Downlink Massive MIMO Systems with PSK Modulation: A Negative ℓ_1 Penalty Approach

In this paper, we consider the one-bit precoding problem for the multiuser downlink massive multiple-input multiple-output (MIMO) system with phase shift keying (PSK) modulation and focus on the celebrated constructive interference (CI)-based problem formulation. We first establish the NP-hardness of the problem (even in the single-user case), which reveals the intrinsic difficulty of globally solving the problem. Then, we propose a novel negative ℓ_1 penalty model for the considered problem, which penalizes the one-bit constraint into the objective with a negative ℓ_1-norm term, and show the equivalence between (global and local) solutions of the original problem and the penalty problem when the penalty parameter is sufficiently large. We further transform the penalty model into an equivalent min-max problem and propose an efficient alternating optimization (AO) algorithm for solving it. The AO algorithm enjoys low per-iteration complexity and is guaranteed to converge to the stationary point of the min-max problem. To further reduce the computational cost, we also propose a low-complexity implementation of the AO algorithm, where the values of the variables will be fixed in later iterations once they satisfy the one-bit constraint. Numerical results show that, compared against the state-of-the-art CI-based algorithms, both of the proposed algorithms generally achieve better bit-error-rate (BER) performance with lower computational cost, especially when the problem is difficult (e.g., high-order modulations, large number of antennas, or high user-antenna ratio).

## I Introduction

### I-A Related Works

Early works on downlink transmission with one-bit DACs have mainly focused on linear-quantized precoding schemes, in which the precoders are obtained by simply quantizing the classical linear precoders [12, 11, 13]. Despite the advantage of their low computational complexities, such linear precoders usually suffer from high symbol error floors, especially in the high SNR regime. To overcome the error floor issues, there have been emerging works on analyzing and designing nonlinear precoders for one-bit downlink transmission.

Nonlinear precoding schemes based on the minimum mean square error (MMSE) criterion have been considered in [13, 14, 15]. More specifically, the corresponding MMSE model has been formulated and several nonlinear MMSE precoders have been proposed in [13], including the semidefinite relaxation (SDR) precoder and the more computationally efficient squared-infinity norm Douglas-Rachford splitting (SQUID) precoder. To further reduce the computational cost of SQUID, two precoders named C1PO and C2PO that are based on biconvex relaxation have been proposed in [14]. In [15], the authors have designed an algorithm based on the alternating direction method of multipliers (ADMM) framework, which is guaranteed to converge under some mild conditions.

Note that the nonlinear precoding schemes shift the precoding design from traditional block-level to symbol-level. Through a symbol-by-symbol design, multiuser interference can be constructive to the useful signal power. Therefore, it is helpful to take the constructive interference (CI) into consideration when designing (symbol-level) precoders. The MMSE metric, however, views all interference as destructive, thus is sub-optimal for the nonlinear transmission schemes.

The concept of CI has been well studied for symbol-level precoding [24, 25, 26, 27]. Roughly speaking, the CI effect is measured by the distance from the noise-free received signal to the boundary of the decision region. Recently, the idea of CI has been incorporated into the one-bit precoding design [16, 17, 18, 19, 20]. More specifically, the CI model for one-bit precoding, which maximizes the CI effect subject to the one-bit constraint, was first formulated in [16]. Later, the authors of [17] proposed an alternative CI-based model, called the symbol scaling model, which admits a simpler formulation. In [28], the authors showed that these two CI formulations are equivalent. There are also some works that directly consider the symbol error probability (SEP) criterion [21, 22, 23]. In fact, the CI criterion is closely related to the SEP criterion. In particular, it has been shown in [22] that for PSK signaling, maximizing the CI effect can be seen as minimizing an upper-bound approximation of the SEP.

Various algorithms have also been proposed for solving the CI-based formulations, most of which are based on the symbol scaling model [17, 18, 19]. In [17], a low-complexity 3-stage heuristic algorithm has been proposed, which achieves acceptable performance in small-scale systems but suffers from an error floor in large-scale systems. To further improve the performance, two algorithms based on the linear programming (LP) relaxation of the symbol scaling model have been proposed in [18]. More specifically, the authors first proved that most entries of the solution of the LP relaxation already satisfy the one-bit constraint. Building upon this observation, they proposed the partial branch-and-bound (P-BB) algorithm, where the BB procedure is performed only for the entries that do not comply with the one-bit requirement, thus greatly reducing the complexity compared to the full-BB algorithm [19]. A greedy approach named ordered partial sequential update (OPSU) has also been proposed in [18], where the values of the elements that do not satisfy the one-bit constraint are determined sequentially according to a simple criterion. It has been shown that the OPSU precoder achieves significantly better performance than the maximum safety margin (MSM) precoder [16, 17] obtained by directly quantizing the solution of the LP relaxation. In addition, the OPSU approach is much more computationally efficient than the P-BB algorithm, with only a small performance loss.

In summary, compared to the MMSE-based approaches, the CI-based approaches generally enjoy significantly better BER performance. However, either their performance degrades in large-scale systems with high-order modulation (e.g., OPSU) or their computational costs are prohibitively high (e.g., P-BB). Please see Table I for a summary of models and/or algorithms for one-bit precoding design.

### I-B Our Contributions

This paper focuses on the CI-based symbol scaling model for one-bit downlink transmission with PSK signaling [17]. The main contribution of this paper is an efficient negative $\ell_1$ penalty (NL1P) approach for solving the considered problem. Two key features of the proposed approach are as follows. First, our approach is based on a novel penalty model, which is shown to be equivalent to the original problem when the penalty parameter is sufficiently large. This is in sharp contrast to the LP relaxation model considered in the previous works [17, 18, 19]. Second, the dominant cost of the proposed approach at each iteration is two matrix-vector multiplications and one projection onto the simplex, which makes it particularly suitable for solving large-scale one-bit precoding problems.

We summarize the contributions of the paper as follows.

1. Complexity Analysis: We characterize the complexity status of the considered one-bit precoding problem. Specifically, we show that the considered problem is NP-hard even in the single-user case and strongly NP-hard in the general case. These results fill a theoretical gap, as the complexity status of the problem has remained unknown despite the existence of various heuristic approaches for solving it.

2. Novel Penalty Model: We propose a novel negative $\ell_1$ penalty model for the considered problem, in which the one-bit constraint is penalized into the objective with a negative $\ell_1$-norm term. We show that when the penalty parameter is sufficiently large, the penalty model is an exact reformulation of the original problem, in the sense that the two problems share the same global and local solutions.

3. Efficient Algorithms: To solve the penalty model, we further transform it into an equivalent min-max problem. We propose an efficient alternating optimization (AO) algorithm for solving a class of non-smooth nonconvex-concave min-max problems (which includes our problem as a special case) and prove its convergence. We also propose a low-complexity implementation of the AO algorithm when it is applied to the penalty problem of interest. Simulation results show that both the proposed algorithm and its low-complexity implementation generally outperform the state-of-the-art CI-based algorithms in terms of both BER performance and computational efficiency.

### I-C Organization and Notations

The remaining parts of the paper are organized as follows. Section II introduces the system model and the CI-based symbol scaling model for one-bit precoding design. Section III establishes the complexity status of the considered problem. A framework of the proposed negative $\ell_1$ penalty approach is given in Section IV, after which an efficient algorithm for solving the penalty model is developed in Section V. Simulation results are shown in Section VI and the paper is concluded in Section VII.

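Throughout this paper, we use $\mathbb{R}$ and $\mathbb{C}$ to represent the real and complex space, respectively. We use $x$, $\mathbf{x}$, $\mathbf{X}$, and $\mathcal{X}$ to denote scalar, column vector, matrix, and set, respectively. The symbols $\mathbf{1}$ and $\mathbf{0}$ are column vectors with all elements being $1$ and $0$, respectively. For a vector $\mathbf{x}$, $\mathbf{x}(i)$ refers to its $i$-th entry, where $x_i$ is also used if it does not cause any ambiguity; $\mathbf{x} \geq \mathbf{0}$ ($\mathbf{x} > \mathbf{0}$) means that each element of $\mathbf{x}$ is nonnegative (positive). For a matrix $\mathbf{X}$, $\mathbf{X}(i,j)$ refers to its $(i,j)$-th element; $\mathrm{mean}(\mathbf{X})$ returns the mean value of $\mathbf{X}$. For a set $\mathcal{X}$, $\mathrm{int}(\mathcal{X})$ refers to its interior; $\mathrm{Proj}_{\mathcal{X}}(\cdot)$ is the projection operator onto set $\mathcal{X}$. $\mathrm{sgn}(\cdot)$ represents the sign of a real number, which returns $1$ if the number is nonnegative and returns $-1$ otherwise. $\|\cdot\|_p$ denotes the $\ell_p$ norm of the corresponding matrix or vector. $(\cdot)^T$, $\mathcal{R}(\cdot)$, $\mathcal{I}(\cdot)$, and $|\cdot|$ return the transpose, the real part, the imaginary part, and the modulus of their corresponding argument, respectively. The subdifferential of a convex function $f$ is denoted by $\partial f$. $\mathrm{dom}(f)$ refers to the domain of the function $f$. $\mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I})$ represents the zero-mean circularly symmetric complex Gaussian distribution with covariance matrix $\sigma^2\mathbf{I}$, where $\mathbf{I}$ denotes the identity matrix. $\mathcal{B}(\mathbf{x}, \delta)$ refers to the ball centered at $\mathbf{x}$ with radius $\delta$. $\mathbb{P}(\cdot)$ denotes the probability of the corresponding event. Finally, $j$ is the imaginary unit.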

## II Problem Formulation

In this section, we present the problem formulation, including the system model and the CI-based symbol scaling model for one-bit precoding design.

### II-A System Model

As depicted in Fig. 1, we consider a downlink multiuser massive MIMO system, in which a BS equipped with $N_t$ antennas serves $K$ single-antenna users simultaneously. We assume that one-bit DACs are employed at the BS and ideal ADCs with infinite precision are employed at each receiver side. We also assume that perfect CSI is available at the BS, as in [12, 11, 13, 14, 15, 16, 17, 18, 19, 20, 23, 21, 22]. The received signal vector can then be expressed as

$$\mathbf{y} = \mathbf{H}\mathbf{x}_T + \mathbf{n},$$

where $\mathbf{H} \in \mathbb{C}^{K \times N_t}$, whose $k$-th row is $\mathbf{h}_k^T$, is the flat-fading channel matrix between the BS and the users, $\mathbf{x}_T$ is the transmitted signal, and $\mathbf{n}$ is the additive white Gaussian noise.

As one-bit DACs are adopted, each entry of $\mathbf{x}_T$, i.e., the output signal at each antenna element, can only be chosen from four symbols. Specifically, $\mathbf{x}_T(i) \in \left\{\pm\frac{1}{\sqrt{2N_t}} \pm \frac{1}{\sqrt{2N_t}} j\right\}$ for $i = 1, 2, \dots, N_t$. Here we normalize $\mathbf{x}_T$ such that $\|\mathbf{x}_T\|_2 = 1$ for simplicity. Let $\mathbf{s}$ be the intended data symbol vector for the users, whose entries $s_k$ are drawn from a unit-norm $M$-PSK constellation. In this paper, we restrict our attention to the nonlinear precoding scheme, in which the transmitted signal $\mathbf{x}_T$ is designed on a symbol-by-symbol basis as a function of the channel matrix $\mathbf{H}$ and the data symbol vector $\mathbf{s}$. At the receiver side, we assume that symbol-wise nearest-neighbor decoding is employed, that is, each user maps its received signal $y_k$ to the nearest constellation point.

Our goal is to design the transmitted signal $\mathbf{x}_T$ such that the SEP is as low as possible. In this paper, we focus on the CI formulation of this optimization problem.

### II-B CI-Based Symbol Scaling Model for One-Bit Precoding

CI refers to interference that pushes the received signals away from their corresponding decision boundaries of the modulated-symbol constellation, and thus contributes to the useful signal power [27]. See [27, 24, 25, 26, 20] for more detailed discussions on CI. In this subsection, we introduce the mathematical formulation of the CI effect and the corresponding symbol scaling model proposed in [17].

For clarity, Fig. 2 depicts a piece of the decision region for 8-PSK modulation, where $s_k$ denotes the data symbol for user $k$. We denote by $s_k^A$ and $s_k^B$ the unit vectors in the directions of the two decision boundaries of $s_k$.

The CI metric aims to maximize the distance from the noise-free received signal to the corresponding decision boundary. To formulate this distance mathematically, we decompose the noise-free received signal $\hat{y}_k$, which corresponds to the point $Y$ in Fig. 2, along $s_k^A$ and $s_k^B$, as

$$\hat{y}_k = \mathbf{h}_k^T\mathbf{x}_T = \alpha_k^A s_k^A + \alpha_k^B s_k^B.$$

As can be observed from Fig. 2, the lengths of $\overrightarrow{YA}$ and $\overrightarrow{YB}$ are given by the coefficients $\alpha_k^A$ and $\alpha_k^B$. Therefore, the distance from $\hat{y}_k$ to the decision boundary can be expressed as

$$\min\left\{|\overrightarrow{YC}|, |\overrightarrow{YD}|\right\} = \min\left\{|\overrightarrow{YA}|\sin\theta, |\overrightarrow{YB}|\sin\theta\right\} = \min\left\{\alpha_k^A, \alpha_k^B\right\}\sin\theta,$$

where $|\cdot|$ denotes the length of the corresponding vector. Since $\theta$ is known as long as the constellation level $M$ is given, the distance is only determined by $\min\{\alpha_k^A, \alpha_k^B\}$.

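Based on the above discussions, the CI effect for all users in the system can be characterized by the value of $\min_{k \in \{1, 2, \dots, K\}} \min\{\alpha_k^A, \alpha_k^B\}$, which measures the minimum distance from all noise-free received signals to their corresponding decision boundaries. Accordingly, the one-bit precoding design problem that maximizes the CI effect can be formulated as

$$\begin{aligned}
\max_{\mathbf{x}_T}\ & \min_{k \in \{1, 2, \dots, K\}} \left\{\alpha_k^A, \alpha_k^B\right\} \\
\text{s.t.}\ & \mathbf{h}_k^T\mathbf{x}_T = \alpha_k^A s_k^A + \alpha_k^B s_k^B,\quad k = 1, 2, \dots, K, \\
& \mathbf{x}_T(i) \in \left\{\pm\tfrac{1}{\sqrt{2N_t}} \pm \tfrac{1}{\sqrt{2N_t}} j\right\},\quad i = 1, 2, \dots, N_t.
\end{aligned} \tag{1}$$

By rescaling $\mathbf{x}_T$ and the coefficients $\alpha_k^A, \alpha_k^B$ by the common factor $\sqrt{2N_t}$, we can remove the problem-dependent quantity $N_t$ from the constraint on $\mathbf{x}_T$. With a slight abuse of notation, we still use $\mathbf{x}_T$ and $\alpha_k^A, \alpha_k^B$ for the rescaled variables; problem (1) can then be rewritten as

$$\begin{aligned}
(\mathrm{P}_0)\quad \max_{\mathbf{x}_T}\ & \min_{k \in \{1, 2, \dots, K\}} \left\{\alpha_k^A, \alpha_k^B\right\} \\
\text{s.t.}\ & \mathbf{h}_k^T\mathbf{x}_T = \alpha_k^A s_k^A + \alpha_k^B s_k^B,\quad k = 1, 2, \dots, K, && \text{(2a)} \\
& \mathbf{x}_T(i) \in \{\pm 1 \pm j\},\quad i = 1, 2, \dots, N_t. && \text{(2b)}
\end{aligned}$$

We refer to (P$_0$) as the CI-based symbol scaling model for one-bit precoding design.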
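To make the symbol scaling objective concrete, the following minimal sketch evaluates the coefficients $\alpha_k^A, \alpha_k^B$ for a given transmit vector by solving the real 2x2 system behind constraint (2a). It assumes that the decision boundaries of an $M$-PSK symbol $s_k$ point along $s_k e^{\mp j\pi/M}$; this choice, and the function names `ci_margins` and `ci_objective`, are illustrative assumptions rather than definitions taken from the text.

```python
import numpy as np

def ci_margins(H, x, s, M):
    """Return (alpha_A, alpha_B) per user for the decomposition in (2a).

    Assumption (not stated in the text): the decision boundaries of the
    M-PSK symbol s_k point along s_k*exp(-j*pi/M) and s_k*exp(+j*pi/M).
    """
    s_A = s * np.exp(-1j * np.pi / M)
    s_B = s * np.exp(+1j * np.pi / M)
    y_hat = H @ x  # noise-free received signals h_k^T x_T
    alpha_A = np.empty(len(s))
    alpha_B = np.empty(len(s))
    for k in range(len(s)):
        # Write y_hat_k = alpha_A * s_A + alpha_B * s_B as a real 2x2 system.
        B = np.array([[s_A[k].real, s_B[k].real],
                      [s_A[k].imag, s_B[k].imag]])
        rhs = np.array([y_hat[k].real, y_hat[k].imag])
        alpha_A[k], alpha_B[k] = np.linalg.solve(B, rhs)
    return alpha_A, alpha_B

def ci_objective(H, x, s, M):
    """Objective of (P0): the smallest coefficient over all users."""
    alpha_A, alpha_B = ci_margins(H, x, s, M)
    return min(alpha_A.min(), alpha_B.min())
```

A negative value of `ci_objective` indicates that at least one noise-free received signal lies outside its decision region under the stated boundary assumption.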

## III Complexity Analysis

In spite of the existence of various works on problem (P$_0$), its complexity status has remained unknown. In this section, we fill this theoretical gap by characterizing the complexity of problem (P$_0$).

We first consider the case where there is only a single user in the system.

###### Theorem 1.

The CI-based one-bit precoding problem (P$_0$) is NP-hard in the single-user case, i.e., $K = 1$.

###### Proof.

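Notice that when $K = 1$, (P$_0$) reduces to the following problem:

$$\begin{aligned}
\max_{\mathbf{x}_T}\ & \min\left\{\alpha^A, \alpha^B\right\} \\
\text{s.t.}\ & \mathbf{h}^T\mathbf{x}_T = \alpha^A s^A + \alpha^B s^B, \\
& \mathbf{x}_T(i) \in \{\pm 1 \pm j\},\quad i = 1, 2, \dots, N_t.
\end{aligned} \tag{3}$$

Next we shall build a polynomial-time transformation from the partition problem [29] to problem (3). The partition problem is to determine whether a given set of positive integers $\{a_1, a_2, \dots, a_N\}$ can be partitioned into two subsets such that the sum of elements in each subset is the same.

Now we construct an instance of problem (3) based on the given instance of the partition problem. Let the number of antennas at the BS be $N$. The transmitted data symbol, the PSK constellation (and hence $s^A$ and $s^B$), and the channel vector $\mathbf{h}$ are chosen, with $\mathbf{h}$ built from $\mathbf{a} = [a_1, a_2, \dots, a_N]^T$, such that problem (3) becomes

$$\begin{aligned}
\max_{\mathbf{x}_T}\ & \min\left\{\alpha^A, \alpha^B\right\} \\
\text{s.t.}\ & \begin{bmatrix}\alpha^A \\ \alpha^B\end{bmatrix} = \begin{bmatrix}\mathbf{a}^T & -\mathbf{a}^T \\ \mathbf{a}^T & \mathbf{a}^T\end{bmatrix}\begin{bmatrix}\mathcal{R}(\mathbf{x}_T) \\ \mathcal{I}(\mathbf{x}_T)\end{bmatrix}, \\
& \mathbf{x}_T(i) \in \{\pm 1 \pm j\},\quad i = 1, 2, \dots, N.
\end{aligned} \tag{4}$$

Let the optimal solution of problem (4) be $\mathbf{x}_T^*$. Since $\mathbf{a} > \mathbf{0}$, it is easy to argue that $\mathcal{R}(\mathbf{x}_T^*) = \mathbf{1}$. By defining $\mathcal{S} = \{i \mid \mathcal{I}(\mathbf{x}_T^*(i)) = 1\}$, it then follows that

$$\alpha^A = 2\sum_{i \notin \mathcal{S}} a_i,\qquad \alpha^B = 2\sum_{i \in \mathcal{S}} a_i.$$

Since $\alpha^A + \alpha^B = 2\sum_{i=1}^{N} a_i$, the optimal value of our constructed problem (4) is $\sum_{i=1}^{N} a_i$ if and only if the partition problem has a "yes" answer. Finally, the above transformation can be done in polynomial time. Since the partition problem is NP-complete, we can conclude that problem (3) is NP-hard. ∎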
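The reduction can be checked numerically on tiny instances. The sketch below brute-forces the constructed problem (4), using only the linear map displayed there, and compares the result with a brute-force partition test. The function names `best_margin` and `has_partition` are hypothetical, and the enumeration is exponential, so it is meant only as an illustration of the argument.

```python
from itertools import product

def best_margin(a):
    """Brute-force optimal value of problem (4) for positive integers a."""
    n = len(a)
    best = None
    for re in product((-1, 1), repeat=n):       # real parts of x_T
        for im in product((-1, 1), repeat=n):   # imaginary parts of x_T
            alpha_A = sum(ai * (r - i) for ai, r, i in zip(a, re, im))
            alpha_B = sum(ai * (r + i) for ai, r, i in zip(a, re, im))
            val = min(alpha_A, alpha_B)
            best = val if best is None else max(best, val)
    return best

def has_partition(a):
    """Brute-force check: can a be split into two subsets with equal sums?"""
    total = sum(a)
    return any(
        2 * sum(ai for ai, pick in zip(a, mask) if pick) == total
        for mask in product((0, 1), repeat=len(a))
    )
```

On these small cases, `best_margin(a) == sum(a)` holds exactly when `has_partition(a)` is true, which mirrors the "if and only if" statement in the proof.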

In the following Theorem 2, we consider the more general case. The proof of Theorem 2 is provided in [30].

###### Theorem 2.

The CI-based one-bit precoding problem (P$_0$) is strongly NP-hard. Moreover, there is no polynomial-time constant approximation algorithm for (P$_0$), unless P $=$ NP.

The above complexity results reveal that the (worst-case) computational complexity of globally solving (P$_0$) is exponential (if P $\neq$ NP), which is prohibitively high for the massive MIMO system whose corresponding problem size is large. In addition, since the precoding scheme has been shifted from block-level to symbol-level, (P$_0$) must be solved at the symbol rate, which imposes a high requirement on the efficiency of the corresponding algorithm. As such, instead of insisting on finding the optimal solution, we focus our attention on designing efficient algorithms for finding high-quality solutions of problem (P$_0$).

## IV Proposed Negative ℓ_1 Penalty Approach

In this section, we first introduce a compact form of problem (P$_0$), which is more favorable for the following algorithmic design. Then, we transform the compact form into a novel negative $\ell_1$ penalty model and give the algorithmic framework of the proposed negative $\ell_1$ penalty approach.

### IV-A A Compact Form of (P0)

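In this subsection, we briefly introduce a compact form of (P$_0$) proposed in [17]. Recall that $\alpha_k^A$ and $\alpha_k^B$ are both real numbers. Therefore, by rewriting the complex-valued constraints (2a) into the real-valued form, we can express them explicitly in terms of the transmitted signal, the channel, and the data symbols. Moreover, the original maximization problem can be converted into a minimization problem (by adding a negative sign in the objective). Then we arrive at the following compact form:

$$\begin{aligned}
\min_{\mathbf{x}}\ & \max_{l \in \{1, 2, \dots, 2K\}} \alpha_l \\
\text{s.t.}\ & \mathbf{\Lambda} = \mathbf{A}\mathbf{x}, \\
& x_i \in \{-1, 1\},\quad i = 1, 2, \dots, 2N_t,
\end{aligned} \tag{5}$$

where $\mathbf{x}$ collects the real and imaginary parts of $\mathbf{x}_T$, $\mathbf{\Lambda} = [\alpha_1, \alpha_2, \dots, \alpha_{2K}]^T$, and $\mathbf{A} \in \mathbb{R}^{2K \times 2N_t}$ is determined by the channel and the data symbols. See [17] for detailed derivations.

The constraint $\mathbf{\Lambda} = \mathbf{A}\mathbf{x}$ in problem (5) can be further substituted into the objective, which leads to the following form:

$$(\mathrm{P})\quad \min_{\mathbf{x} \in \{-1, 1\}^n}\ \max_{l \in \{1, 2, \dots, m\}} \mathbf{a}_l^T\mathbf{x},$$

where $n = 2N_t$, $m = 2K$, and $\mathbf{a}_l^T$ is the $l$-th row of $\mathbf{A}$. In the following, we shall design algorithms based on the compact form (P), which appears to be easier to handle than the form (P$_0$).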

### IV-B Proposed Negative ℓ_1 Penalty Approach

One main difficulty of problem (P) lies in its discrete one-bit constraint. To deal with this difficulty, we resort to the penalty technique [31], which penalizes the constraint into the objective with a carefully selected penalty function. Specifically, the proposed approach relaxes the discrete one-bit constraint into the continuous constraint $\mathbf{x} \in [-1, 1]^n$, and includes a negative $\ell_1$ penalty into the objective as

$$(\mathrm{P}_\lambda)\quad \min_{\mathbf{x} \in [-1, 1]^n}\ \max_{l \in \{1, 2, \dots, m\}} \mathbf{a}_l^T\mathbf{x} - \lambda\|\mathbf{x}\|_1,$$

where $\lambda > 0$ is the penalty parameter. Intuitively, the negative $\ell_1$ penalty term in (P$_\lambda$) encourages large magnitudes of the entries of $\mathbf{x}$.

Next, we establish the connection between the original problem (P) and the penalty model (P$_\lambda$). In particular, Theorem 3 establishes the equivalence between global solutions of the two problems and Theorem 4 characterizes the relationship between local minimizers of problem (P$_\lambda$) and feasible points of problem (P). In fact, the following theorems also hold when the max term in the objective is replaced by any Lipschitz continuous function, with the threshold on $\lambda$ determined by the Lipschitz constant of that function.

###### Theorem 3.

If the penalty parameter $\lambda$ is sufficiently large, any optimal solution of (P$_\lambda$) is also an optimal solution of (P), and vice versa.

The proof of Theorem 3 is given in Appendix A. Theorem 3 shows that when the penalty parameter is sufficiently large, problems (P) and (P$_\lambda$) share the same global solutions.

###### Theorem 4.

If the penalty parameter $\lambda$ is sufficiently large, any local minimizer of (P$_\lambda$) is a feasible point of (P). On the other hand, for such $\lambda$, any feasible point of (P) is also a local minimizer of (P$_\lambda$).

The proof of Theorem 4 is provided in [30]. Theorem 4 establishes the relationship between local minimizers of (P$_\lambda$) and feasible points of (P). In particular, the first part of Theorem 4 shows that for a sufficiently large penalty parameter $\lambda$, if a local minimizer of (P$_\lambda$) is obtained, then it is also a feasible point of (P); the second part of Theorem 4 shows that with the same $\lambda$, all feasible points of (P) are also local minimizers of (P$_\lambda$), and thus problem (P$_\lambda$) has exponentially many local minimizers in this case. Our goal here is to find a good local minimizer of problem (P$_\lambda$) with a sufficiently large $\lambda$, which is thus a high-quality solution of (P).

To achieve this goal, we employ the homotopy (sometimes called warm-start) technique [32, 33], which turns out to be very helpful in guiding the corresponding (iterative) algorithm to find a high-quality solution in practice. More specifically, the proposed approach solves problem (P$_\lambda$) with a small penalty parameter at the beginning, then gradually increases the penalty parameter and traces the solution path of the corresponding penalty problems, until the penalty parameter is sufficiently large and a feasible point of problem (P) is obtained. We name the above procedure for solving problem (P) the negative $\ell_1$ penalty (NL1P) approach and summarize its algorithmic framework in Algorithm 1.
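The homotopy idea described above can be sketched as a simple outer loop. The code below is only an illustration under stated assumptions: the inner solver `subgradient_solver` is a hypothetical stand-in (a projected subgradient-type heuristic, not the algorithm developed later in the paper), and the schedule parameters `lam0`, `growth`, `tol`, and `max_outer` are placeholder values.

```python
import numpy as np

def penalty_objective(A, x, lam):
    """Objective of the penalty model: max_l a_l^T x - lam * ||x||_1."""
    return np.max(A @ x) - lam * np.sum(np.abs(x))

def subgradient_solver(A, x0, lam, step=0.05, iters=200):
    """Hypothetical inner solver: projected subgradient-type steps on [-1, 1]^n."""
    x = x0.copy()
    for _ in range(iters):
        l = np.argmax(A @ x)              # row attaining the max term
        g = A[l] - lam * np.sign(x)       # descent direction where differentiable
        x = np.clip(x - step * g, -1.0, 1.0)
    return x

def nl1p(A, x0, lam0=0.1, growth=2.0, tol=1e-9, max_outer=50,
         inner=subgradient_solver):
    """Homotopy loop: warm-start each penalty problem and increase lam
    until the iterate satisfies the one-bit constraint (up to tol)."""
    x, lam = x0.copy(), lam0
    for _ in range(max_outer):
        x = inner(A, x, lam)
        if np.all(np.abs(np.abs(x) - 1.0) <= tol):
            break
        lam *= growth
    return x, lam
```

Warm-starting each inner solve from the previous solution is what distinguishes this loop from simply solving the penalty problem once with a large penalty parameter.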

### IV-C Remarks on Proposed NL1P Approach

In this subsection, we give some discussions on the relationship between the proposed NL1P approach and existing algorithms for solving problem (P).

#### IV-C1 Comparison with LP Relaxation Based Approaches

Most of the existing approaches (e.g., MSM [16, 17], OPSU [18], P-BB [18]) are based on the LP relaxation model, which corresponds to problem (P$_\lambda$) with $\lambda = 0$. Generally speaking, approaches of this kind consist of two stages: in the first stage, the LP relaxation model is solved; in the second stage, some optimization or greedy techniques are utilized to determine the values of the elements of the LP solution that do not satisfy the one-bit constraint. A key difference between the proposed approach and the LP relaxation based approaches is that the proposed approach seeks to solve the negative $\ell_1$ penalty model, which is an equivalent reformulation of the original problem, while the LP relaxation model solved in the existing approaches is generally not equivalent to the original problem. This explains why our approach usually returns better solutions than the LP relaxation based approaches, as observed in the simulation.

#### IV-C2 Comparison with the Work in [23]

It is interesting to note that, though with different motivations, problem (P) is in the same form as the problem considered in [23], where one-bit precoding design for QAM modulation based on the SEP metric is studied. In contrast to our proposed approach that deals with the non-smooth objective, the authors in [23] developed a penalty method based on a smooth approximation. Specifically, they applied the log-sum-exponential approximation to the maximum function and added a negative square penalty term, i.e., $-\lambda\|\mathbf{x}\|_2^2$, to the objective. In order to obtain a tight approximation, the smoothing parameter should be chosen as small as possible, while a small smoothing parameter will result in a large Lipschitz constant of the gradient of the objective, which further leads to slow convergence. Therefore, the choice of the smoothing parameter is a key factor that affects the performance of their algorithm. In contrast, our proposed approach deals with the non-smooth objective directly and does not involve any smooth approximation, and thus avoids the dilemma of choosing the smoothing parameter. Nevertheless, the resulting non-smooth penalty model (P$_\lambda$) seems more challenging to solve than the smooth penalty model in [23]. In the next section, we shall propose an efficient algorithm for solving problem (P$_\lambda$) by taking care of its special structure.

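#### IV-C3 Why not Negative Square Penalty

One may ask why we do not add the negative square penalty to the objective as in [23], whereby the resulting model is

$$\min_{\mathbf{x} \in [-1, 1]^n}\ \max_{l \in \{1, 2, \dots, m\}} \mathbf{a}_l^T\mathbf{x} - \lambda\|\mathbf{x}\|_2^2. \tag{6}$$

Next we show that (6) is not a good penalty model for (P). Specifically, for any $\lambda > 0$, local minimizers of (6) are not necessarily feasible points of (P). We give an example as follows.

###### Example 1.

Consider the following problem:

$$\begin{aligned}
\min_{\mathbf{x}}\ & \max_{l \in \{1, 2, 3, 4\}} \mathbf{a}_l^T\mathbf{x} \\
\text{s.t.}\ & x_i \in \{-1, 1\},\quad i = 1, 2,
\end{aligned} \tag{7}$$

where

$$\mathbf{A} = \begin{pmatrix}\mathbf{a}_1^T \\ \mathbf{a}_2^T \\ \mathbf{a}_3^T \\ \mathbf{a}_4^T\end{pmatrix} = \begin{pmatrix}1 & -1 \\ -1 & 1 \\ 1 & 1 \\ -1 & -1\end{pmatrix}.$$

The corresponding negative square penalty problem is

$$\begin{aligned}
\min_{\mathbf{x}}\ & \max_{l \in \{1, 2, 3, 4\}} \mathbf{a}_l^T\mathbf{x} - \lambda\|\mathbf{x}\|_2^2 \\
\text{s.t.}\ & -1 \leq x_i \leq 1,\quad i = 1, 2.
\end{aligned} \tag{8}$$

We claim that for any $\lambda > 0$, $\mathbf{x} = \mathbf{0}$ is a local minimizer of (8) (but not a feasible point of (7)). Since $\max_{l} \mathbf{a}_l^T\mathbf{x} = |x_1| + |x_2| = \|\mathbf{x}\|_1$ for this $\mathbf{A}$, (8) is equivalent to

$$\begin{aligned}
\min_{\mathbf{x}}\ & \|\mathbf{x}\|_1 - \lambda\|\mathbf{x}\|_2^2 \\
\text{s.t.}\ & -1 \leq x_i \leq 1,\quad i = 1, 2.
\end{aligned}$$

Given any $\lambda > 0$, for all $\mathbf{x}$ with $\|\mathbf{x}\|_2 \leq 1/\lambda$, it holds that

$$\|\mathbf{x}\|_1 - \lambda\|\mathbf{x}\|_2^2 \geq \|\mathbf{x}\|_2 - \lambda\|\mathbf{x}\|_2^2 = \|\mathbf{x}\|_2\left(1 - \lambda\|\mathbf{x}\|_2\right) \geq 0,$$

which implies that $\mathbf{x} = \mathbf{0}$, whose objective value is $0$, is a local minimizer of (8).

The main reason for the failure of the negative square penalty is that a smooth penalty is used in a problem whose objective is non-smooth. This also explains why we choose the non-smooth negative $\ell_1$ penalty for problem (P).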
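Example 1 is small enough to probe numerically. The sketch below samples points in a small ball around the origin and compares objective values, using the matrix from the example; the helper name `is_local_min_at_zero`, the sampling scheme, and the chosen values of `lam` and `radius` are illustrative assumptions. A sampling test like this can only suggest local minimality, it does not prove it.

```python
import numpy as np

# Matrix A from Example 1; here max_l a_l^T x equals ||x||_1.
A = np.array([[1, -1], [-1, 1], [1, 1], [-1, -1]], dtype=float)

def f_square(x, lam):
    """Objective of (8): max term with a negative square penalty."""
    return np.max(A @ x) - lam * np.dot(x, x)

def f_l1(x, lam):
    """Same max term with a negative l1 penalty instead."""
    return np.max(A @ x) - lam * np.sum(np.abs(x))

def is_local_min_at_zero(f, lam, radius, samples=2000, seed=0):
    """Sample points with ||x||_2 <= radius and compare against f(0)."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(-1.0, 1.0, size=(samples, 2))
    norms = np.linalg.norm(pts, axis=1, keepdims=True)
    pts = radius * pts / np.maximum(1.0, norms)
    zero_val = f(np.zeros(2), lam)
    return all(f(p, lam) >= zero_val - 1e-12 for p in pts)
```

With `lam = 2.0` and `radius = 0.5`, the check passes for `f_square` and fails for `f_l1`, consistent with the claim that the origin is a spurious local minimizer only under the square penalty.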

## V An Efficient Alternating Optimization Algorithm for solving problem (Pλ)

In this section, we propose an efficient algorithm for solving the non-smooth non-convex subproblem (P$_\lambda$) in the NL1P approach. More specifically, we first transform problem (P$_\lambda$) into an equivalent min-max problem ($\hat{\mathrm{P}}_\lambda$) in Section V-A. Then we propose an efficient alternating optimization (AO) algorithm for solving a class of non-smooth min-max problems (which includes our problem ($\hat{\mathrm{P}}_\lambda$) as a special case) and give the convergence analysis in Section V-B and Section V-C, respectively. In Section V-D, we apply the proposed AO algorithm to solve problem ($\hat{\mathrm{P}}_\lambda$) and give some discussions.

### V-A Min-Max Reformulation of (Pλ)

In this subsection, we reformulate problem (P$_\lambda$) into an equivalent min-max problem. Recall that the objective in (P$_\lambda$) is the maximum of a finite collection of functions. By introducing an auxiliary variable

$$\mathbf{y} \in \Delta \triangleq \left\{\mathbf{y} \in \mathbb{R}^m \mid \mathbf{1}^T\mathbf{y} = 1,\ \mathbf{y} \geq \mathbf{0}\right\}, \tag{9}$$

(P$_\lambda$) can be equivalently transformed into the following min-max problem:

$$(\hat{\mathrm{P}}_\lambda)\quad \min_{\mathbf{x} \in [-1, 1]^n}\ \max_{\mathbf{y} \in \Delta}\ \mathbf{y}^T\mathbf{A}\mathbf{x} - \lambda\|\mathbf{x}\|_1.$$

The two problems (P$_\lambda$) and ($\hat{\mathrm{P}}_\lambda$) are equivalent in the sense that an optimal solution (stationary point) of one problem can be easily constructed given an optimal solution (stationary point) of the other problem [34].
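For the optimal values, the equivalence rests on a standard identity, spelled out here as a reading aid (it is a textbook fact rather than a statement quoted from the paper). For any fixed $\mathbf{x}$, with $\mathbf{a}_l^T$ the $l$-th row of $\mathbf{A}$ and $\Delta$ the unit simplex in $\mathbb{R}^m$,

$$\max_{\mathbf{y} \in \Delta} \mathbf{y}^T\mathbf{A}\mathbf{x} = \max_{\mathbf{y} \in \Delta} \sum_{l=1}^{m} y_l\, \mathbf{a}_l^T\mathbf{x} = \max_{l \in \{1, 2, \dots, m\}} \mathbf{a}_l^T\mathbf{x},$$

since a linear function of $\mathbf{y}$ attains its maximum over the simplex at a vertex, i.e., at a unit vector that selects a single row of $\mathbf{A}$.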

Below we shall focus on designing an efficient algorithm for solving the reformulated min-max problem ($\hat{\mathrm{P}}_\lambda$). In the next subsection, we shall develop an algorithm for solving a class of non-smooth nonconvex-concave min-max problems, which includes ($\hat{\mathrm{P}}_\lambda$) as a special case.

### V-B Proposed AO Algorithm

Min-max problems have drawn considerable interest (especially in the machine learning and signal processing communities) in recent years. Various algorithms have been proposed for different types of min-max problems [35, 36, 37, 38, 34, 39]. However, previous works mainly consider the smooth case [36, 35, 37, 38]. The few works that focus on non-smooth min-max problems all require the non-smooth term to be convex [34, 39]. To the best of our knowledge, there are no existing works that cover problem ($\hat{\mathrm{P}}_\lambda$), and thus no existing algorithm can be directly applied to solve it.

In this subsection, we consider a class of non-smooth nonconvex-concave min-max problems

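$$\min_{\mathbf{x} \in \mathcal{X}}\ \max_{\mathbf{y} \in \mathcal{Y}}\ F(\mathbf{x}, \mathbf{y}) \triangleq f(\mathbf{x}, \mathbf{y}) - g(\mathbf{x}), \tag{10}$$

where $f(\mathbf{x}, \mathbf{y})$ is a smooth function that is non-convex with respect to $\mathbf{x}$ and concave with respect to $\mathbf{y}$, $g(\mathbf{x})$ is a non-smooth, proper closed convex function, and $\mathcal{X}$ and $\mathcal{Y}$ are compact convex sets in $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively. Problem (10) includes problem ($\hat{\mathrm{P}}_\lambda$) as a special case. To be specific, $f(\mathbf{x}, \mathbf{y})$ and $g(\mathbf{x})$ correspond to the term $\mathbf{y}^T\mathbf{A}\mathbf{x}$ and the scaled $\ell_1$ norm $\lambda\|\mathbf{x}\|_1$, respectively; $\mathcal{X}$ and $\mathcal{Y}$ correspond to $[-1, 1]^n$ and the simplex set $\Delta$ in (9), respectively.

Our proposed algorithm for solving problem (10) can be regarded as an extension of the algorithms proposed in [34] and [35] from the smooth case to the non-smooth case, which is of independent interest. In [34] and [35], the authors proposed unified frameworks for solving a few different classes of min-max problems including the smooth nonconvex-concave ones, which are a special case of (10) with $g(\mathbf{x}) = 0$.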

Similar to [34] and [35], a perturbed function of the original objective:

$$\tilde{F}(\mathbf{x}, \mathbf{y}) = F(\mathbf{x}, \mathbf{y}) - \frac{c_k}{2}\|\mathbf{y}\|^2 = f(\mathbf{x}, \mathbf{y}) - g(\mathbf{x}) - \frac{c_k}{2}\|\mathbf{y}\|^2$$

is considered, where the perturbed term is introduced to make $\tilde{F}$ strongly concave in $\mathbf{y}$. It is shown in [34] and [35] that the perturbed term is important for the convergence of the corresponding algorithms.

At each iteration, the proposed algorithm updates $\mathbf{x}$ and $\mathbf{y}$ alternately as follows:

$$\begin{aligned}
\mathbf{x}_{k+1} &\in \arg\min_{\mathbf{x} \in \mathcal{X}}\ \tilde{F}(\mathbf{x}_k, \mathbf{y}_k) + \langle\nabla_{\mathbf{x}} f(\mathbf{x}_k, \mathbf{y}_k), \mathbf{x} - \mathbf{x}_k\rangle - g(\mathbf{x}) + \frac{\tau_k}{2}\|\mathbf{x} - \mathbf{x}_k\|^2, && \text{(11a)} \\
\mathbf{y}_{k+1} &= \arg\max_{\mathbf{y} \in \mathcal{Y}}\ \tilde{F}(\mathbf{x}_{k+1}, \mathbf{y}_k) + \langle\nabla_{\mathbf{y}} \tilde{F}(\mathbf{x}_{k+1}, \mathbf{y}_k), \mathbf{y} - \mathbf{y}_k\rangle - \frac{1}{2\rho_k}\|\mathbf{y} - \mathbf{y}_k\|^2 \\
&= \mathrm{Proj}_{\mathcal{Y}}\left(\mathbf{y}_k + \rho_k\nabla_{\mathbf{y}} f(\mathbf{x}_{k+1}, \mathbf{y}_k) - \rho_k c_k \mathbf{y}_k\right), && \text{(11b)}
\end{aligned}$$

where $\tau_k$ and $\rho_k$ are properly selected regularization parameters. Generally, the solution to the $\mathbf{x}$-subproblem might not be unique, and in this case we only need to choose one from the solution set. Since the above algorithm updates the variables $\mathbf{x}$ and $\mathbf{y}$ in an alternating fashion, we name it the alternating optimization (AO) algorithm and summarize it as Algorithm 2.

Some remarks on the proposed AO algorithm and its parameters are as follows. For the $\mathbf{x}$-subproblem (11a), the sum of the first three terms is a local (linear) approximation of $\tilde{F}(\mathbf{x}, \mathbf{y}_k)$. To keep the approximation accurate, we need the next iterate $\mathbf{x}_{k+1}$ to be not far from the current one $\mathbf{x}_k$, and thus the regularization term $\frac{\tau_k}{2}\|\mathbf{x} - \mathbf{x}_k\|^2$ is included. This idea is the same as that in the gradient projection and proximal point algorithms. Similarly, $\mathbf{y}$ is updated via a classical gradient projection step of the perturbed function. Note that the parameters $\tau_k$ and $\rho_k$ in (11) trade off between the goal of minimizing the local approximation of the corresponding functions and the goal of making the approximation accurate, and $c_k$ controls the accuracy and strong concavity of the perturbed function. Properly selecting those parameters plays a vital role in guaranteeing convergence and good performance of the proposed algorithm.

The efficiency of the proposed AO algorithm depends on the efficiency of solving the subproblems in (11). The $\mathbf{x}$-subproblem (11a) is a non-smooth non-convex problem, which generally does not admit a closed-form solution. However, for many cases of interest, the $\mathbf{x}$-subproblem (11a) either has a closed-form solution or can be efficiently solved to high accuracy. For instance, if $\mathcal{X}$ is a Cartesian product of simple compact convex sets, i.e., $\mathcal{X} = \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_n$, and $g$ is simple and separable in $\mathbf{x}$, i.e., $g(\mathbf{x}) = \sum_{i=1}^{n} g_i(x_i)$, then the exact solution can be obtained by solving $n$ simple one-dimensional problems. Fortunately, problem ($\hat{\mathrm{P}}_\lambda$) is such a case and we shall give a detailed discussion on this in Section V-D later on. The $\mathbf{y}$-subproblem (11b) is a projection problem onto the set $\mathcal{Y}$ and can be efficiently solved for many choices of $\mathcal{Y}$, such as the simplex set in (9).

### V-C Convergence Analysis

In this subsection, we establish the global convergence of the proposed AO algorithm. Before doing this, we give the following definition of the stationary point, which is a generalization of [37, Definition 3.1] from the smooth case to the non-smooth case.

###### Definition 1 .

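A pair $(\hat{x}, \hat{y})$ is called a stationary point of problem (10) if

$$
\begin{cases}
0 \in \nabla_x f(\hat{x}, \hat{y}) - \partial g(\hat{x}) + \partial I_{\mathcal{X}}(\hat{x}), \\
0 \in -\nabla_y f(\hat{x}, \hat{y}) + \partial I_{\mathcal{Y}}(\hat{y}),
\end{cases}
$$

where $I_{\mathcal{X}}$ and $I_{\mathcal{Y}}$ are the indicator functions of $\mathcal{X}$ and $\mathcal{Y}$, respectively.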

To establish the convergence, we need to impose the following assumptions on $f$ and $g$ in problem (10).

###### Assumption 1.

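The function $f$ is continuously differentiable and there exist constants $L_x$, $L_y$, $L_{12}$, and $L_{21}$ such that for all $x, x_1, x_2 \in \mathcal{X}$ and $y, y_1, y_2 \in \mathcal{Y}$, we have

$$
\begin{aligned}
\|\nabla_x f(x_1, y) - \nabla_x f(x_2, y)\|_2 &\le L_x \|x_1 - x_2\|_2, \\
\|\nabla_x f(x, y_1) - \nabla_x f(x, y_2)\|_2 &\le L_{21} \|y_1 - y_2\|_2, \\
\|\nabla_y f(x, y_1) - \nabla_y f(x, y_2)\|_2 &\le L_y \|y_1 - y_2\|_2, \\
\|\nabla_y f(x_1, y) - \nabla_y f(x_2, y)\|_2 &\le L_{12} \|x_1 - x_2\|_2.
\end{aligned}
$$

###### Assumption 2.

The function $g$ is Lipschitz continuous on $\mathcal{X}$ with constant $G$, i.e.,

$$
|g(x_1) - g(x_2)| \le G \|x_1 - x_2\|_2, \quad \forall\, x_1, x_2 \in \mathcal{X}.
$$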
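As a worked instance of Assumptions 1 and 2, consider the bilinear case that can be read off from subproblems (12) and (13). The identification $f(x, y) = y^T A x$, $g(x) = \lambda \|x\|_1$, and $\mathcal{X} = [-1, 1]^n$ is an assumption made here for illustration. Under it, the constants can be taken as follows:

$$
\begin{aligned}
\nabla_x f(x, y) = A^T y, \quad \nabla_y f(x, y) = A x
\quad &\Longrightarrow \quad L_x = L_y = 0, \quad L_{12} = L_{21} = \|A\|_2, \\
\big|\lambda \|x_1\|_1 - \lambda \|x_2\|_1\big| \le \lambda \|x_1 - x_2\|_1 \le \lambda \sqrt{n}\, \|x_1 - x_2\|_2
\quad &\Longrightarrow \quad G = \lambda \sqrt{n}.
\end{aligned}
$$

The first line holds because each gradient is linear in one variable and independent of the other. The second combines the reverse triangle inequality with $\|v\|_1 \le \sqrt{n}\,\|v\|_2$.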

With the above definition and assumptions, we are ready to present the convergence result of the proposed AO algorithm. The proof of the following theorem can be found in [30].

###### Theorem 5.

Suppose that Assumptions 1 and 2 hold. Let be the sequence generated by Algorithm 2 with . If , with , , and with , , then any limit point of is a stationary point of problem (10).

### V-D AO Algorithm for Solving $(\hat{P}_\lambda)$

In this subsection, we specialize the proposed AO algorithm to problem $(\hat{P}_\lambda)$ and carefully investigate its behavior on this special problem, including implementation details (see Algorithm 3) and convergence results. We also propose a low-complexity implementation of Algorithm 3 to further reduce the computational cost.

#### V-D1 Implementation Details

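Specializing Algorithm 2 to problem $(\hat{P}_\lambda)$, the subproblems for $x$ and $y$ become

$$
x_{k+1} \in \arg\min_{x \in [-1, 1]^n} \; y_k^T A x - \lambda \|x\|_1 + \frac{\tau_k}{2} \|x - x_k\|_2^2 \tag{12}
$$

and

$$
y_{k+1} = \mathrm{Proj}_{\Delta}\left(y_k + \rho_k A x_{k+1} - \rho_k c_k y_k\right). \tag{13}
$$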

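The $x$-subproblem (12) is separable and has a closed-form solution. More specifically, denoting by $A_i$ the $i$-th column of $A$, the subproblem (12) decouples into $n$ problems of the following form:

$$
x_{k+1}(i) \in \arg\min_{-1 \le x \le 1} \; (A_i^T y_k)\, x - \lambda |x| + \frac{\tau_k}{2} \left(x - x_k(i)\right)^2, \quad i = 1, 2, \dots, n, \tag{14}
$$

which admits the closed-form solution

$$
x_{k+1}(i) = \mathrm{sgn}(a_{ik}) \min\left\{ |a_{ik}| + \frac{\lambda}{\tau_k},\, 1 \right\}, \quad i = 1, 2, \dots, n, \tag{15}
$$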

where $a_{ik} = x_k(i) - A_i^T y_k / \tau_k$. A detailed derivation of (15) is given in Appendix B. Note that when $a_{ik} = 0$ the solution of (14) is not unique and we only need to choose one from the solution set. Here we fix one such choice, so that the solution of (14) can be expressed in a unified way as (15). The solution of the $y$-subproblem (13) involves only simple matrix and vector operations and a projection onto the simplex, which has a very fast implementation [40].
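The closed-form update (15) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the quantity `a` is obtained by completing the square in (14), and the tie-break at `a == 0` (taking the sign as +1) is an assumption.

```python
import numpy as np

def x_update(x_k, A, y_k, lam, tau):
    """Coordinate-wise closed-form solution (15) of the x-subproblem (12)."""
    # Completing the square in (14) gives a(i) = x_k(i) - A_i^T y_k / tau.
    a = x_k - (A.T @ y_k) / tau
    # Assumed tie-break: use +1 as the sign when a(i) == 0.
    sign = np.where(a >= 0, 1.0, -1.0)
    return sign * np.minimum(np.abs(a) + lam / tau, 1.0)

def scalar_objective(x, c, x_old, lam, tau):
    """Objective of the one-dimensional problem (14), with c = A_i^T y_k."""
    return c * x - lam * np.abs(x) + 0.5 * tau * (x - x_old) ** 2
```

A quick sanity check is to compare `scalar_objective` at the returned value with its minimum over a fine grid on [-1, 1] for each coordinate. The closed form should never be worse than the grid minimum.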

In total, the dominant cost at each iteration lies in the two matrix-vector products $A^T y_k$ and $A x_{k+1}$ and in one projection onto the simplex. Therefore, the AO algorithm enables us to solve problem $(\hat{P}_\lambda)$ in a computationally efficient manner, especially when the dimension of the problem is large. We summarize the specialization of the AO algorithm for solving problem $(\hat{P}_\lambda)$ as Algorithm 3.
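To make the per-iteration cost concrete, the following sketch alternates the updates (12) and (13). Several things here are assumptions rather than details taken from the paper: $\Delta$ is taken to be the standard probability simplex, the projection uses the well-known sort-based method instead of the fast routine of [40], the parameters `tau`, `rho`, and `c` are held constant instead of following the schedules required by the convergence theory, and the function names are hypothetical.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto {y : y >= 0, sum(y) = 1} (sort-based method)."""
    u = np.sort(v)[::-1]                  # entries in decreasing order
    css = np.cumsum(u) - 1.0              # shifted cumulative sums
    idx = np.arange(1, v.size + 1)
    rho = idx[u - css / idx > 0][-1]      # number of entries that stay positive
    theta = css[rho - 1] / rho            # uniform shift
    return np.maximum(v - theta, 0.0)

def ao_sketch(A, x0, lam, tau, rho, c, num_iters=100):
    """Alternate the x-update (15) and the y-update (13) with fixed parameters."""
    m, _ = A.shape
    x = np.asarray(x0, dtype=float).copy()
    y = np.full(m, 1.0 / m)               # assumed start: centre of the simplex
    for _ in range(num_iters):
        # x-update: one product A^T y, then the closed form (15).
        a = x - (A.T @ y) / tau
        x = np.where(a >= 0, 1.0, -1.0) * np.minimum(np.abs(a) + lam / tau, 1.0)
        # y-update: one product A x, then a projection onto the simplex.
        y = project_simplex(y + rho * (A @ x) - rho * c * y)
    return x, y
```

Each pass through the loop performs exactly two matrix-vector products and one simplex projection. Note that whenever `lam / tau >= 1`, the closed form returns entries of magnitude one after a single update.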

#### V-D2 Convergence Behavior

According to Theorem 5, the AO algorithm (with properly chosen parameters) is guaranteed to find a stationary point of problem $(\hat{P}_\lambda)$, whose corresponding $x$-part is also a stationary point of problem $(P_\lambda)$ due to the equivalence between problems $(P_\lambda)$ and $(\hat{P}_\lambda)$. The remaining question is whether the obtained stationary point satisfies the one-bit constraint. Next we give an affirmative answer to this question.

We first characterize the stationary points of (P).

###### Theorem 6.

If all stationary points of (P) must satisfy

$$
\hat{x}_i \in \{-1, 1, 0\}, \quad i = 1, 2, \dots, n.
$$

The proof of Theorem 6 is provided in Appendix C. Theorems 5 and 6 suggest that every limit point of the sequence generated by Algorithm 3 must have all of its elements being either $\pm 1$ or $0$. Obviously, zero elements do not satisfy the one-bit constraint in problem (P) and thus are undesirable. Fortunately, the following Corollary 1 shows that zero elements do not occur in Algorithm 3. Note that for problem $(\hat{P}_\lambda)$, . The following corollary is a combination of the results in Theorem 5, Theorem 6, and the closed-form solution (15). The detailed proof is relegated to Appendix D.

###### Corollary 1.

Let be the sequence generated by Algorithm 3 with , and , where , , , , and . Then if every limit point of must satisfy the one-bit constraint.

In summary, when the penalty parameter in problem (P) is sufficiently large, every limit point of the sequence generated by Algorithm 3 (with properly selected parameters) is not only a stationary point of (P) but also a feasible point of (P) and thus a local minimizer (according to Theorem 4) of problem (P). This nice property is a result of the combination of nice properties of problem (P) and Algorithm 3.

#### V-D3 Remarks on AO Algorithm for Solving Problem $(P_\lambda)$

Recall that the core of our proposed NL1P approach is the AO algorithm for solving a sequence of penalty problems $(P_\lambda)$ (equivalent to $(\hat{P}_\lambda)$) with gradually increasing $\lambda$, while the core of the LP relaxation based approaches (MSM [16], [17], OPSU [18], P-BB [18]) is the interior-point algorithm for solving the LP relaxation model, followed by some rounding procedure. Compared to the interior-point algorithm for solving the LP relaxation, our proposed AO algorithm has the following advantages.

First, the AO algorithm can be performed efficiently when solving problem (P), where at each iteration only two matrix-vector multiplications and one projection onto the simplex need to be computed. In contrast, the per-iteration complexity of the interior-point algorithm is . When is large (which is the case for the massive MIMO system), such computational complexity is unacceptable for practical implementation. Therefore, our proposed algorithm is more suitable for solving large-scale problems arising from the massive MIMO scenario.

Second, our proposed AO algorithm enjoys nice theoretical properties. In particular, when the penalty parameter in problem (P) is sufficiently large and the parameters in the AO algorithm are properly selected, any limit point of the sequence generated by the AO algorithm is a local minimizer of (P) and more importantly, it satisfies the one-bit constraint.

#### V-D4 A Low-Complexity Implementation of Algorithm 3

To further reduce the computational cost, in this part we propose a low-complexity implementation of Algorithm 3. To be specific, we consider performing Algorithm 3 in a more aggressive manner by keeping the values of variables fixed in later iterations once they satisfy the one-bit constraint. For clarity, we summarize the above procedure in Algorithm 4.
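The freezing idea can be sketched as a masked version of the closed-form update (15). This is an illustration only: the function name, the boolean `frozen` mask, and the tie-break at zero are assumptions, and the actual bookkeeping of Algorithm 4 may differ.

```python
import numpy as np

def frozen_x_update(x_k, A, y_k, lam, tau, frozen):
    """Apply (15) only to coordinates that are not yet fixed at +1 or -1."""
    free = ~frozen
    x_next = x_k.copy()
    # Only the columns of A belonging to free coordinates are touched.
    a = x_k[free] - (A[:, free].T @ y_k) / tau
    sign = np.where(a >= 0, 1.0, -1.0)    # assumed tie-break at a == 0
    x_next[free] = sign * np.minimum(np.abs(a) + lam / tau, 1.0)
    # Coordinates that now satisfy the one-bit constraint stay fixed from here on.
    new_frozen = frozen | (np.abs(x_next) == 1.0)
    return x_next, new_frozen
```

As more coordinates become frozen, the product with `A[:, free].T` involves fewer columns, which is where the per-iteration saving would come from in this sketch.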

If Algorithm 4 is employed to solve the subproblem $(P_\lambda)$ in Algorithm 1, then the number of elements in $x$ that need to be updated will gradually decrease as the algorithm proceeds. Therefore, replacing Algorithm 3 with Algorithm 4 to solve the subproblem $(P_\lambda)$ can accelerate the convergence of the NL1P approach. We call the resulting algorithm the accelerated negative $\ell_1$ penalty (ANL1P) approach. The simulations show that ANL1P achieves almost the same performance as NL1P with less CPU time.

## VI Simulation Results

In this section, we present simulation results to demonstrate the performance of our proposed algorithms.

### VI-A Simulation Setup and Choice of Parameters

We consider multiuser massive MIMO systems where the BS is equipped with hundreds of antennas. We assume a standard Rayleigh fading channel, i.e., the channel matrix is composed of independent and identically distributed Gaussian random variables with zero mean and unit variance. We set the length of the transmission block to be and define the transmit SNR to be , where the unit transmit power is assumed. High-order PSK modulations, including -PSK and -PSK, are considered. All the results are obtained with Monte Carlo simulations of 1000 independent channel realizations. All the algorithms are implemented in MATLAB (Release 2018b) in OS X 10.14 on a MacBook Pro with a 2.4-GHz Intel Core i5 processor with access to 8 GB of RAM.
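For readers who want to reproduce the channel model, a Rayleigh fading matrix of the kind described above could be drawn as follows. This is a sketch under assumptions: the entries are taken to be circularly symmetric complex Gaussian, the matrix is laid out as users by antennas, and the function and argument names are hypothetical. The paper's own experiments were run in MATLAB.

```python
import numpy as np

def rayleigh_channel(num_users, num_antennas, rng):
    """Draw a channel matrix with i.i.d. zero-mean, unit-variance complex Gaussian entries."""
    real = rng.standard_normal((num_users, num_antennas))
    imag = rng.standard_normal((num_users, num_antennas))
    # Dividing by sqrt(2) gives each complex entry unit variance.
    return (real + 1j * imag) / np.sqrt(2.0)
```

A Monte Carlo study would call this once per channel realization with a seeded `numpy.random.default_rng` generator so that runs are repeatable.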

As in Section II, we use the triple to describe the considered system, where denotes the total number of users in the system, is the number of transmit antennas at the BS, and refers to the constellation level for PSK modulation.

The parameters used in our algorithms are as follows. In Algorithm 1, the initial point is chosen as ; the penalty parameter is initialized as and increased by a factor of at each iteration. In Algorithm 3 (Algorithm 4), we set the initial point of as and the other parameters as and . We terminate Algorithm 3 (Algorithm 4) for solving the subproblem in Algorithm 1 when its iteration number is more than or when the distance of its successive iterates is less than . Recall that we can always obtain an intermediate point at each iteration of Algorithm 1. In our implementation, we quantize all intermediate points generated by Algorithm 1 to satisfy the one-bit constraint and choose the quantized point with the best function value as the final solution.

We compare the proposed NL1P and ANL1P approaches with existing state-of-the-art linear and nonlinear precoding approaches listed in Table I. We also include the unquantized ZF precoder, termed Inf-Bit ZF, which serves as a BER lower bound for the one-bit precoding approaches.

### VI-B BER Performance

We first present the BER results for different multiuser massive MIMO systems.

In Fig. 3, a system with -PSK modulation is considered. It can be observed that linear precoding suffers a BER floor in the high SNR regime due to the coarse one-bit quantization, while all of the nonlinear approaches exhibit significantly better BER performance. Of the presented nonlinear precoding schemes, the CI-based methods generally perform better than the MMSE-based methods, among which the P-BB algorithm achieves the best BER performance. However, since a branch and bound process is included, the P-BB algorithm is computationally inefficient (especially when the number of users is large), as will be demonstrated in Section VI-C. As can be observed from Fig. 3, all the CI-based approaches achieve comparable BER performance in this system, with the two proposed algorithms showing a slightly better performance than the state-of-the-art OPSU precoder.

In Fig. 4 and Fig. 5, we investigate the more difficult cases, i.e., higher user-antenna ratio and higher-level modulation, respectively. More specifically, in Fig. 4 we present the BER result for a system with -PSK modulation and in Fig. 5 we consider a system as in Fig. 3 but with higher-order -PSK modulation. The P-BB approach is not included in Fig. 4 due to its prohibitively high complexity. Since the problem becomes more difficult in these two cases, it is not surprising to observe a remarkable performance loss for all the precoding methods. In particular, only the CI-based OPSU, P-BB, and the two proposed approaches achieve satisfactory BER performance, while all the other approaches suffer from severe error floors at relatively high SNRs. Moreover, compared to the OPSU approach, the performance gain of our proposed algorithms becomes more prominent in these two difficult cases. In particular, we observe an SNR gain of up to nearly 6 dB and 2.5 dB in Fig. 4 and Fig. 5, respectively, when the BER is ; as the BER becomes lower, the performance gain in terms of the SNR also becomes larger.

In Fig. 6, we further depict the BER of the compared one-bit precoders versus the number of users, where the number of transmit antennas at the BS is fixed to be , the SNR is fixed to be , and -PSK modulation is adopted. Among all the presented precoders, the proposed NL1P approach achieves the best BER performance, followed by the proposed ANL1P approach, for which only a slight performance loss is observed. Both of them exhibit significantly better performance than the other precoding schemes in the sense that, under the same BER requirement, the two proposed precoders can serve many more users. For example, if we require the BER to be less than , then the NL1P precoder and the ANL1P precoder can serve nearly 40 and 38 users, respectively, while the state-of-the-art OPSU precoder can serve only 32 users, which demonstrates the superiority of our algorithms.

### VI-C Computational Efficiency

Now we evaluate the computational efficiency of the compared algorithms by reporting their CPU time. Since linear and MMSE-based approaches fail to achieve satisfactory BER performance in many cases, we are mostly interested in the CPU time comparison of the CI-based methods in this subsection.

In Fig. 7 and Fig. 8, we compare the average CPU time (in seconds) of the algorithms versus different numbers of transmit antennas and different numbers of users, respectively. We can make the following observations from Figs. 7 and 8.