I Introduction
Multiuser multi-antenna (or multiple-input multiple-output, MIMO) techniques can significantly improve the spectral and energy efficiency of wireless communications by exploiting the degrees of freedom in the spatial domain. They have been widely adopted in modern wireless communications systems such as the fourth- and fifth-generation (4G and 5G) cellular networks [1][2], the high-efficiency wireless local area network (WiFi) standard 802.11ax [3], and the latest satellite digital video broadcasting standard DVB-S2X [4]. Among the multiuser MIMO techniques, beamforming is one of the most promising and practical schemes to mitigate multiuser interference and exploit the gain of MIMO antennas.
In the last two decades, the optimal beamforming strategies have been intensively studied for the multiple-input single-output (MISO) downlink where a base station (BS) with multiple antennas serves multiple single-antenna users. For instance, the problem of signal-to-interference-plus-noise ratio (SINR) balancing, i.e., maximization of the minimum SINR of all users under a total power constraint, was studied in [5, 6]; the total BS transmit power minimization problem under quality of service (QoS) constraints was investigated in [7, 8, 9, 10]; and the sum rate maximization problem under the total power constraint was tackled in [6, 11, 12, 13]. The existing approaches mainly make use of advances in convex optimization techniques such as second-order cone programming (SOCP) [8, 9] and semidefinite programming (SDP) [14], and of the uplink-downlink duality, which indicates that under the sum power constraint, the achievable SINR region and the normalized beamforming in the downlink are the same as those in the dual uplink channel.
Early works mostly focus on the optimal beamforming design under the sum power constraint across all antennas of a transmitter. This constraint does not take into account the fact that each transmit antenna has its own power amplifier, and therefore its power is individually limited. The per-antenna power constraints were first systematically studied in [15], where a dual framework was proposed to minimize the maximum transmit power of each antenna under users’ SINR constraints. This work has sparked much research interest in optimizing beamforming under per-antenna power constraints. The work in [16] studied the optimization of nonlinear zero-forcing (ZF) dirty-paper-coding based beamforming under per-antenna power constraints. Generic optimization of beamforming for multibeam satellite systems was studied in [17] under general linear and nonlinear power constraints. Per-antenna constant envelope precoding for large multiuser MIMO systems was investigated in [18]. Transceiver designs for multi-antenna multi-hop cooperative communications under per-antenna power constraints were proposed in [19], where both linear and nonlinear transceivers were investigated. Signal-to-leakage-plus-noise ratio (SLNR) maximized precoding for the downlink under per-antenna power constraints was considered in [20], where a semi-closed-form optimal solution was proposed. A general framework for covariance matrix optimization of MIMO systems under different types of power constraints was proposed in [21]. More recently, optimal MIMO precoding under both a total consumed power constraint and individual radiated power constraints was studied in [22], and numerical algorithms were developed to maximize the mutual information.
The problem of interest in this paper is to efficiently maximize the minimum received SINR, i.e., to balance the SINR, in the multiuser MISO downlink under per-antenna power constraints at the BS. This problem, although quasi-convex, is more challenging than its counterpart with the total power constraint and the problem of minimizing the per-antenna power in [15], and until now no efficient algorithm has existed. Consequently, existing beamforming techniques are unable to support real-time applications because the small-scale fading channel varies quickly. For instance, in a WLAN 802.11n system operating at 2.4 GHz with a pedestrian speed of m/s, the coherence time is ms; and in a Long-Term Evolution (LTE) downlink operating at 2.6 GHz with a residential area vehicle velocity of m/s, the coherence time is only ms. Traditional time-consuming optimization routines will produce an obsolete beamforming solution that no longer matches the current channel state, which leads to significant performance degradation, as will be demonstrated in our experiment. In [23], the dual problem was derived and an optimal solution at much reduced computational cost was developed. However, it was found that the best solution is obtained by a commercial nonlinear solver [24], which does not exploit the structure of the problem and is still not efficient. Although there are simple heuristic beamforming solutions in closed form, such as ZF beamforming and regularized ZF (RZF) beamforming, the reduced complexity often leads to performance loss. Even worse, the work in [25] showed that the conventional ZF beamforming under per-antenna power constraints no longer admits a simple pseudo-inverse form as in the case of the total power constraint; instead, the optimal ZF beamforming requires solving an SOCP problem, which has much higher complexity.
In this paper, we take a different approach and develop deep learning (DL) enabled beamforming solutions to dramatically improve the computational efficiency. Recently DL has been recognized as a promising solution for addressing various problems in several areas of wireless networks. This is because deep neural networks have the ability to model highly nonlinear functions at considerably low complexity. One of the areas of interest is to deal with scenarios in which the channel model does not exist, e.g., in underwater and molecular communications [26], or is difficult to characterize analytically due to imperfections and nonlinearities [27]. In these situations, DL based detection has been proposed to tackle the underlying unknown nonlinearities [28]. Another area of interest is to optimize the end-to-end system performance [29, 30]. Conventional communication systems are based on a modular design in which each block (e.g., coding, modulation) is optimized independently, which cannot guarantee optimal overall performance. In contrast, DL holds great promise for further improvement by considering end-to-end performance optimization. The third area of interest is to overcome the complexity of wireless networks [27], which is the focus of our paper. In this aspect, DL has found many exciting applications in wireless communications such as channel decoding [31, 32], MIMO detection [33, 34]
, and channel estimation [35, 36]. The current work belongs to the framework of learning to optimize in wireless resource allocation. The rationale is that the DL technique bypasses the complex optimization procedures, and learns the optimal mapping from the channel state to the beamforming solution directly by training a neural network. The trained neural network can then be used as a function mapping to obtain the real-time beamforming solution with the channel state as input. As a result, the computational complexity is transferred to the offline training phase (see Footnote 1), and hence the complexity during the online transmission phase is greatly reduced. The most successful application of DL in this framework so far is power allocation [37, 38, 39, 40, 41], in which the power vector is treated as the training output, while the channel gains are taken as the input of the DL network. In this case, the power variables only take positive values, and the number of power variables is normally the number of users and is therefore relatively small and easy to handle.
Footnote 1: To the best of our knowledge, the computational complexity of the training phase is not well understood, due to the complex implementation of the backpropagation process and because it depends very much on the specific application regarding the required number of training examples for satisfactory generalization. That said, this is usually not a concern in most applications because training takes place offline given sufficient computational capability, and retraining is only performed infrequently when the specific applications depart considerably from those training examples.
However, there are few works that focus on the learning approach to optimize the beamforming design in multi-antenna communications, with the exception of [42, 43, 44, 45, 46, 47]. The difficulty is partly due to the large number of complex variables contained in the beamforming matrix that need to be optimized. An outage-based approach to transmit beamforming was studied in [42] to deal with the channel uncertainty at the BS; however, only a single user was considered. The work in [43] designed a decentralized robust precoding scheme based on a deep neural network (DNN). The projection over a finite-dimensional subspace in [43] reduced the difficulty, but also limited the performance. A DL model was used in [44] to predict the beamforming matrix directly from the signals received at the distributed BSs in millimeter wave systems. However, both [43] and [44] predicted the beamforming matrix in a finite solution space at the cost of performance loss. The works in [42, 45] directly estimated the beamforming matrix without exploiting the problem structure, so the number of variables to predict increases significantly as the numbers of transmit antennas and users increase. This leads to high training complexity and low learning accuracy of the neural networks when the numbers of transmit antennas and users are large. In our previous works [46][47], we proposed a beamforming neural network to optimize the beamforming vectors, but it is restricted to the total power constraint. We note that none of the existing works has addressed the SINR balancing problem under the practical per-antenna power constraints, for which a DL solution becomes even more attractive.
In this paper, we propose a DL enabled beamforming optimization approach for SINR balancing to provide an improved performance-complexity tradeoff under per-antenna power constraints. Inspired by the model-driven learning philosophy [48], we propose to first learn the dual variables with reduced dimension rather than the original large beamforming matrix and then recover the beamforming solution from the learned dual solution, by exploiting the structure or model of the beamforming optimization problem. Our main contributions are summarized as follows:

A subgradient algorithm is first proposed which not only demonstrates faster convergence than the best known algorithm in [23], but also facilitates the development of the DL solutions.

A general DL structure to learn the dual variables is proposed, and two learning strategies are proposed to achieve the performance-complexity tradeoff. A heuristic method is developed to facilitate the generalization of the proposed DL algorithms by augmenting the training set so that they can adapt to varying numbers of active users and antennas without retraining.

Both software simulations and testbed experiments using software defined radio (SDR) are carried out to validate the performance of the proposed algorithms. To the best of our knowledge, this is the first testbed demonstration of deep learning enabled multiuser beamforming.
The remainder of this paper is organized as follows. Section II introduces the system model and formulates the SINR balancing problem and its dual formulation. Section III proposes the subgradient algorithm. Section IV provides the general structure for the beamforming optimization based on learning the dual variables, together with the recovery algorithms. Numerical and experimental results are presented in Section V. Finally, conclusions are drawn in Section VI.
Notations: The notations are given as follows. Matrices and vectors are denoted by bold capital and lowercase symbols, respectively. , , and stand for transpose, conjugate, conjugate transpose and inverse/pseudo inverse (when applicable) operations of a matrix, respectively. indicates that the matrix is positive definite. The operator denotes the operation to diagonalize the vector into a matrix whose main diagonal elements are from . Finally, represents a complex Gaussian vector with zeromean and covariance matrix . denotes the nonnegative field.
II System Model and Problem Formulation
Consider an MISO downlink channel where an antenna BS transmits signals to single-antenna users. For the user , its channel vector, beamforming vector, and data symbol are denoted as , , respectively, where . The additive white Gaussian noise (AWGN) at the receiver is denoted as . All wireless links exhibit independent frequency non-selective Rayleigh block fading. The received signal at user is
(1) 
The SINR at the receiver of user is given by
(2) 
The beamforming matrix is collected in . Then the perantenna power at antenna can be expressed as
(3) 
where is a zero vector except its th element being 1.
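As a minimal numerical sketch of the quantities defined in (1) to (3), the snippet below evaluates the per-user SINR and the per-antenna transmit power for a given beamforming matrix. The symbols were lost in this copy of the text, so the names used here (`H`, whose row k is assumed to hold the conjugate-transposed channel of user k; `W`, whose column k is the beamforming vector of user k; and `noise_power`) are assumptions for illustration only.

```python
import numpy as np

def sinr_per_user(H, W, noise_power=1.0):
    """SINR of each user for a linear downlink beamformer.

    H: (K, Nt) array, row k is assumed to be h_k^H (hypothetical naming).
    W: (Nt, K) array, column k is the beamforming vector of user k.
    """
    G = np.abs(H @ W) ** 2              # G[k, j] = |h_k^H w_j|^2
    signal = np.diag(G)                 # desired-signal power of each user
    interference = G.sum(axis=1) - signal
    return signal / (interference + noise_power)

def per_antenna_power(W):
    """Transmit power radiated by each antenna: sum over users of |W[n, k]|^2."""
    return np.sum(np.abs(W) ** 2, axis=1)

# Small hand-checkable example with 2 antennas and 2 users.
H = np.array([[1.0, 0.0], [0.0, 2.0]], dtype=complex)
W = np.array([[1.0, 0.5], [0.0, 1.0]], dtype=complex)
sinr = sinr_per_user(H, W)              # user 1 sees interference from user 2 only
ant_power = per_antenna_power(W)
```

The minimum entry of `sinr` is the objective that SINR balancing maximizes, and each entry of `ant_power` is what a per-antenna constraint bounds individually.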
The problem of interest is to maximize the minimum user SINR, i.e., SINR balancing, under perantenna power constraints . Mathematically, it can be formulated as follows:
P1:  
s.t.  (5)  
The SINR balancing problem is in general quasi-convex, so it can be solved via methods such as bisection search and generalized eigenvalue programming [8][23]. However, these methods suffer from high complexity and computational delay, and are not practical for real-time data transmissions.
In [23], a useful dual formulation of P1 is derived as
P2:  
s.t.  (7)  
where , are dual variables associated with the SINR constraint (5) and the perantenna power constraint (5) in P1, and is related to the minimum SINR in P1 by the relation . For the solution of P2, it is assumed that .
Although the problem P2 is still a quasi-convex problem, compared to the original problem P1, it can be more efficiently solved because it only involves nonnegative variables while P1 needs to optimize real variables. The problem P2 can be solved using standard nonlinear solvers such as Matlab’s built-in function ‘fmincon’. Currently the fastest optimal solution is known to be achieved by Ziena’s nonlinear solver Knitro [24], as compared and shown in [23]. However, general-purpose solvers do not exploit the special analytical properties of the problem P2, so they are not efficient. In addition, it is not known how to recover the optimal beamforming matrix once P2 is solved. These issues will be studied in the next section.
III A Subgradient Algorithm to Solve P2
In this section, we derive a fast subgradient algorithm to solve P2, based on the downlink-uplink duality results derived in [15]. According to [15, Theorem 1], the problem P2 can be equivalently written as the following max-max problem:
P3:  
s.t.  (8)  
P3 can be interpreted as the maximization of the minimum user SINR in the virtual uplink in which singleantenna users transmit signals to the BS with the total power constraint . The uncertain covariance matrix of the received noise vector is characterized by . The normalized receive beamforming at the BS for user is denoted by which has the same direction as the downlink transmit beamforming, while denotes the uplink transmit power of user . Because the covariance matrix of the received noise vector is also a variable, P3 is still difficult to solve. To tackle this problem, we first keep the variable fixed, and then reach the subproblem below:
P4:  
s.t.  (11)  
P4 can be interpreted as the nonlinear SINR balancing problem with a total power constraint and colored noise with covariance matrix . In the following, we propose an efficient fixed-point iteration in Algorithm 1 below to solve P4.
Algorithm 1 to Solve P4:

Initialize that satisfies . Suppose is the iteration index, and the achievable SINR in the uplink is . Repeat the following steps 2) to 5) until convergence.

For each , define .

Solve an auxiliary variable as
(12) 
Normalize to obtain as:
(13) 
Calculate . Then update the achievable SINR in the uplink as
(14)
It can be proved that Algorithm 1 converges to the optimal solution of P4. The proof is similar to [53, Theorem 11.1] and a refined version is provided in Appendix A for completeness.
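The flavor of such a fixed-point iteration can be illustrated with a short sketch. Since equations (12) to (14) did not survive in this copy, the code below is not a transcription of Algorithm 1; it implements the standard normalized fixed-point iteration for uplink SINR balancing with MMSE receivers and colored noise, which is assumed here to be of the same type. All names (`Hc`, `lam`, `Q`, `total_power`) are hypothetical.

```python
import numpy as np

def mmse_gain(Hc, lam, Q):
    """g[k] = h_k^H (Q + sum_{j != k} lam_j h_j h_j^H)^{-1} h_k.

    With MMSE receivers the uplink SINR of user k is lam[k] * g[k].
    Hc: (Nt, K) array whose column k is the channel vector h_k.
    """
    K = Hc.shape[1]
    Sigma = Q + (Hc * lam) @ Hc.conj().T      # Q + sum_j lam_j h_j h_j^H
    g = np.empty(K)
    for k in range(K):
        hk = Hc[:, k]
        Sigma_k = Sigma - lam[k] * np.outer(hk, hk.conj())
        g[k] = np.real(hk.conj() @ np.linalg.solve(Sigma_k, hk))
    return g

def balance_uplink(Hc, Q, total_power, iters=500, tol=1e-10):
    """Normalized fixed-point iteration: lam_k <- 1 / g_k, rescaled to the power budget."""
    K = Hc.shape[1]
    lam = np.full(K, total_power / K)         # feasible starting point
    for _ in range(iters):
        lam_new = 1.0 / mmse_gain(Hc, lam, Q)
        lam_new *= total_power / lam_new.sum()
        done = np.max(np.abs(lam_new - lam)) < tol
        lam = lam_new
        if done:
            break
    return lam, lam * mmse_gain(Hc, lam, Q)

rng = np.random.default_rng(0)
Hc = (rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))) / np.sqrt(2)
lam, sinr = balance_uplink(Hc, np.eye(4), total_power=10.0)
```

At a fixed point every `lam[k] * g[k]` takes the same value, which is the balanced uplink SINR for the given noise covariance.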
The optimal uplink beamforming for a given can be derived according to the minimum mean square error (MMSE) criterion:
(15) 
With the inner maximization problem P4 solved for given , we can obtain the objective function value . Next we solve the outer maximization of using a subgradient projection algorithm, where the subgradient can be found using the downlink beamforming obtained from the normalized uplink beamforming.
As proved in Appendix B, is a concave function in . A subgradient of can be expressed as because is the dual variable associated with the th antenna power constraint. The proof is omitted. Based on this result, we propose the following subgradient based algorithm [15][49] to solve P3.
Algorithm 2 to Solve P3:

Initialize . Suppose is the iteration index. Repeat the following steps 2) to 7) until convergence.

Given , call Algorithm 1 to find the optimal .

Calculate . Then update the achievable SINR in the uplink as
(16) 
Find the optimal normalized uplink beamforming
(17) 
Find the downlink power to achieve the SINR , i.e., to solve the following linear equation set:
(18) 
Update the downlink beamforming vector as and .

Update using the subgradient Euclidean projection method with step size :
(19) where .
This projection can be solved efficiently using the bisection search. The detailed projection algorithm is provided in Algorithm 3 of Appendix C.

Regulate the downlink beamforming. Update the beamforming vector as follows to satisfy all per-antenna power constraints:
(20)
Remarks: The subgradient algorithm exploits the structure of the original problem P1, so it is more efficient than a general nonlinear solver. However, the step size is a critical parameter. We find that gives a satisfactory performance. We observe that in general Algorithm 2 can solve P3 faster than the available numerical solver such as Knitro and achieve close to optimal performance, which can be seen in Fig. 1(a) and will be verified using simulation results in Section V. However, there is no guarantee that a subgradient algorithm converges to the exact optimal solution. It may only converge to the neighbourhood of the optimal solution, and its convergence may be slow, as seen in Fig. 1(b). In addition, the subgradient algorithm may not guarantee that the perantenna power constraints will be satisfied, and that is why Step 8) of Algorithm 2 is necessary to regulate the perantenna power.
Another important implication of the development of Algorithm 2 is that it provides an efficient way to recover the primal variable, i.e., the downlink beamforming vectors, given various dual variables either and , or only . The details will be given in the next section.
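The projection step of a subgradient method of this kind can be sketched as follows. The exact feasible set in (19) is not recoverable from this copy, so the code assumes, purely for illustration, a set of the form {q ≥ 0 : pᵀq = c} with positive weights p; this is not the paper's Algorithm 3. For that set the Euclidean projection has the form max(0, v − μp), and the scalar μ can be found by bisection because the weighted sum is non-increasing in μ.

```python
import numpy as np

def project_weighted_simplex(v, p, c, iters=100):
    """Euclidean projection of v onto {q >= 0 : p @ q = c}, p > 0, c > 0 (assumed set).

    KKT conditions give q = max(0, v - mu * p); mu is located by bisection.
    """
    v = np.asarray(v, dtype=float)
    p = np.asarray(p, dtype=float)
    # At lo every entry is active and p @ q >= c; at hi q = 0 and p @ q = 0 < c.
    lo = min(np.min(v / p), (p @ v - c) / (p @ p))
    hi = np.max(v / p)
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if p @ np.maximum(0.0, v - mu * p) > c:
            lo = mu
        else:
            hi = mu
    return np.maximum(0.0, v - 0.5 * (lo + hi) * p)

# One projected subgradient ascent step with a hypothetical subgradient g and step size t.
q_old = np.array([0.5, 0.5])
g = np.array([1.5, -1.5])
t = 1.0
q_new = project_weighted_simplex(q_old + t * g, p=np.ones(2), c=1.0)
q_same = project_weighted_simplex(q_old, p=np.ones(2), c=1.0)
```

A point that is already feasible is returned unchanged, while an infeasible update is pulled back to the nearest feasible point.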
IV The Proposed Deep Learning Structure and Strategies
In this section, we develop DL based solutions of P1 that can achieve a better performance-efficiency tradeoff than the currently available solutions. Instead of learning to optimize the original beamforming matrix directly, we will learn the optimization of the dual variables in P2. This will dramatically reduce the number of variables that need to be learned. In the sequel, we will first introduce a general DL structure that takes the channel as the input, and whose output is the dual variable(s) in P2. We will also devise a generalized learning solution such that the proposed DL structure can deal with varying numbers of users and antennas and varying transmit power without retraining. We will then propose two learning strategies, i.e., one is to learn the dual variables and with fast recovery of the original beamforming solution, and the other is to learn only the dual variable with improved learning accuracy, to achieve various tradeoffs.
IV-A A General DL Structure
We first show the existence of a neural network that can approximate the solution of the optimization problem P2. To this end, we define and as two tensors with the optimized dual variables and , respectively. The neural network aims to learn the continuous mapping
(21) 
where is the initialization set of dual variables and denotes the continuous mapping process in Algorithm 2 to achieve the stationary point from the input set of channel coefficients together with the initialization set of dual variables. The following theorem will prove the existence of a feedforward network which imitates the continuous mapping in (21).
Theorem 1
For any given accuracy , there exists a positive constant large enough such that a feedforward neural network with layers can produce similar performance to the mapping process in (21), i.e.,
(22) 
where is the set of the neural network parameters including weights and biases.
Proof: The result in Theorem 1 can be obtained directly by applying the universal approximation theorem in [50] to the continuous mapping in Algorithm 2.
Based on results in Theorem 1, next we find solutions through designing the neural networks with the DL technique. Similar to our previous work [47], we introduce a general DL structure to approximate the mapping function from the channel coefficients to the beamforming solutions, as shown in Fig. 2. In addition to the conventional neural network module, the adopted DL structure also introduces a signal processing module based on expert knowledge for beamforming recovery from the key features, such as the dual variables and in problem P2. Predicting the beamforming matrix directly may lead to high complexity since the number of the variables in the beamforming matrix depends on both the number of users and the number of BS antennas . Thus instead of predicting the beamforming matrix directly, we predict some key features (i.e., the dual variables and ) whose variables are much less than those in the beamforming matrix. Then these key features are used to recover the beamforming matrix in the signal processing module.
The adopted DL structure takes the convolutional neural network (CNN) architecture as the backbone because the parameter sharing adopted in the CNN can reduce the number of learned parameters when compared to a fully-connected DNN. Moreover, the CNN is well known to be effective for extracting features, which will benefit the generation of the beamforming solution using the channel features. The adopted DL structure includes two main modules: the neural network module and the signal processing module [51]. Here we give a short description of the two modules; for more details, readers are referred to [47].
IV-A1 Neural Network Module
The neural network module is a data-driven approach to approximate the mapping function from the complex channels to the key features. In addition to the input and output layers, the neural network module also includes convolutional (CL) layers, batch normalization (BN) layers, activation (AC) layers, a flatten layer, and a fully-connected (FC) layer. The input of the neural network module is the complex channel coefficients, which are not supported by the current neural network software. To address this issue, we separate the complex channel vector into two components and and form the new input , where and contain the real and imaginary parts of each element in , respectively. Each CL layer consists of many filters which apply convolution operations to the layer input, capture special patterns and pass the result to the next layer. The parameters of the filters are shared among different channel coefficients. The main function of the BN layers is to normalize the output of the CL layers by two trainable parameters, i.e., a “mean” parameter and a “standard deviation” parameter. Besides, the BN layers can reduce the probability of overfitting and enable a higher learning rate.
AC layers help neural networks extract the useful information and suppress the insignificant points of the input data. The rectified linear unit (ReLU) and sigmoid functions are suitable choices for the last AC layer, since the predicted variables are continuous and positive numbers. The function of the flatten layer is to change the shape of its input into a vector for the FC layer to interpret. In addition to these functional layers, the loss function, marked ‘MSE/MAE’ on the output layer in Fig. 2, is also very important in the introduced DL structure. The mean absolute error (MAE) or the mean square error (MSE) is used in the loss function to update parameters. The loss function together with the learning rate determines how to update the parameters of the neural network module.
IV-A2 Signal Processing Module
The neural network module offers universality in learning the key features from data, while the signal processing module aims to recover the beamforming matrix from the predicted key features at the output layer. Different from the neural network module, whose model is unknown, the signal processing module utilizes the (partially) known models of the data to recover the beamforming matrix. The learned key features and the functionalities in the signal processing module are designed according to expert knowledge. Note that the expert knowledge is problem-dependent and has no unified form, but what is common is that the expert knowledge can significantly reduce the number of variables to be predicted compared to the beamforming matrix [47]. For example, the dual forms of the original problems are typical expert knowledge for beamforming optimization. The details of the signal processing module used to recover the beamforming matrix are provided in the next two subsections.
IV-B To Learn and and the Recovery Algorithm
With the above proposed general DL structure, we need to decide which features of the dual optimization variables in P2 will be learned, and what signal processing function is needed to recover the beamforming matrix. The first option is to learn both and , so the output has variables. Once they are learned, the following algorithm with steps taken from Algorithm 2 can be used to find a feasible beamforming solution that satisfies the per-antenna power constraints.
Algorithm 4: To recover from and

Given the learned solution of and , Calculate . Then update the achievable SINR in the uplink as
(23) 
Find the optimal normalized uplink beamforming as
(24) 
Find the downlink power to achieve the SINR , i.e., to solve the following linear equation set:
(25) 
Update the downlink beamforming vector as and .

Regulate the downlink beamforming. Update the beamforming vector as follows to satisfy all per-antenna power constraints:
(26)
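The recovery steps listed above can be sketched in a few lines under stated assumptions. Equations (23) to (26) are missing in this copy, so the code below is an illustrative reconstruction rather than the paper's exact procedure: it assumes MMSE uplink directions computed from the dual variables (called `lam` and `q` here, both hypothetical names), a target SINR `gamma` supplied by the caller, a linear system for the downlink powers, and a uniform scaling as the final regulation step.

```python
import numpy as np

def recover_beamformers(Hc, lam, q, gamma, p_max, noise_power=1.0):
    """Sketch: dual variables -> feasible downlink beamforming matrix.

    Hc: (Nt, K) channel vectors as columns; lam: (K,), q: (Nt,) dual variables;
    gamma: target SINR; p_max: (Nt,) per-antenna power limits. All names are assumed.
    """
    Nt, K = Hc.shape
    Sigma = np.diag(q) + (Hc * lam) @ Hc.conj().T
    U = np.linalg.solve(Sigma, Hc)            # MMSE receive directions
    U = U / np.linalg.norm(U, axis=0)         # normalize each column
    G = np.abs(Hc.conj().T @ U) ** 2          # G[k, j] = |h_k^H u_j|^2
    D = np.diag(np.diag(G))
    # SINR_k = gamma for all k  <=>  (D / gamma - (G - D)) p = noise_power * 1.
    # The solution is positive only if gamma is achievable with these directions.
    p = np.linalg.solve(D / gamma - (G - D), noise_power * np.ones(K))
    W = U * np.sqrt(p)
    ant = np.sum(np.abs(W) ** 2, axis=1)      # per-antenna power before regulation
    scale = min(1.0, np.sqrt(np.min(p_max / np.maximum(ant, 1e-15))))
    return W * scale                          # uniform scaling keeps the directions

# Orthogonal toy channel: 3 antennas, 2 users, hand-checkable.
Hc = np.eye(3, 2, dtype=complex)
lam, q, p_max = np.ones(2), np.ones(3), np.ones(3)
W_scaled = recover_beamformers(Hc, lam, q, gamma=2.0, p_max=p_max)   # needs scaling
W_free = recover_beamformers(Hc, lam, q, gamma=0.5, p_max=p_max)     # already feasible
```

Uniform scaling is only one simple way to enforce feasibility; after scaling, the users' SINRs are in general no longer exactly equal unless the interference terms vanish, as they do in this toy channel.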
IV-C To Learn Only and the Recovery Algorithm
The above learning strategy is straightforward and fast if the learning result is satisfactory; however, the learning accuracy can be much improved if the number of variables is reduced. This motivates us to use the proposed DL structure to learn only the dual variable with output size of , which contains fewer variables than the above approach that learns both and . The idea of this approach is that given , the optimal can be efficiently optimized using Algorithm 2, which is more accurate than the learning approach above. An additional advantage is that the output size does not depend on the number of users, so it can more easily adapt to a varying number of users. Once is learned, the following algorithm with steps taken from Algorithm 2 can be used to derive a feasible beamforming solution to the original problem P1.
Algorithm 5: To recover from

Given the learned solution , call Algorithm 1 to find the optimal .

Calculate . Then update the achievable SINR in the uplink as
(27) 
Find the optimal normalized uplink beamforming as
(28) 
Find the downlink power to achieve the SINR , i.e., to solve the following linear equation set:
(29) 
Update the downlink beamforming vector as and .

Regulate the downlink beamforming. Update the beamforming vector as follows to satisfy all per-antenna power constraints:
(30)
IV-D Generalization of the Proposed DL Structure
In this subsection, we generalize the proposed DL structure so that it can adapt to changes in the number of users and antennas. Although the above DL approaches can achieve satisfactory performance for beamforming design, applying them in practice faces difficulties caused by the dynamics of wireless networks. In other words, when the number of transmit antennas or the number of users changes, a new model should be trained for prediction. This fact suggests that the applicability of the DL approaches is limited. Transfer learning and training set augmentation are effective ways to improve the generalization. The former transfers an existing model to a new scenario with some additional training and labelling effort [52], whereas the latter aims to train a large-scale model which adapts to different and by adding more samples into the training set, so that the training set can cover more possible scenarios. In this work, we adopt the latter method for simplicity. Without loss of generality, we take the DL approach to learning only as an example and give more details about the training set augmentation method.
In the training set augmentation method, we aim to train a large-scale model with input and output. In order to make the large-scale model adaptable to different and values, we generate an augmented training set. Different from the training set whose samples have the same and values, the samples in the augmented training set are diverse, i.e., the numbers of transmit antennas and the numbers of users in different samples could vary. However, the size of each sample is fixed as input and output. For the cases where (or ) , the redundant rows (or columns) of the channel matrix are filled with 0’s. Similarly, the redundant elements of the output are set to 0 when . In each sample, we assume each is generated with the equal probability of and each is generated with the equal probability of . Therefore, the occurrence probabilities of different values are statistically equal among all samples and so are those of different values. It is suggested that the number of samples in the augmented training set for the large-scale model should be 5 to 10 times that in the training set with fixed and values. However, this approach works only if the number of users or antennas does not exceed the maximum values used in the training set; otherwise retraining will be needed.
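The zero-padding scheme described above can be sketched as follows. The sketch assumes uniformly drawn antenna and user counts, an (antennas × users) layout of the channel matrix, and a real/imaginary split of the input; the names `nt_max`, `k_max`, `make_padded_sample` and `pad_label` are hypothetical, and the labels themselves would still have to come from an optimal solver.

```python
import numpy as np

def make_padded_sample(rng, nt_max, k_max):
    """Draw a channel with random (nt, k) and zero-pad it to a fixed input size."""
    nt = int(rng.integers(1, nt_max + 1))     # uniform over 1..nt_max
    k = int(rng.integers(1, k_max + 1))       # uniform over 1..k_max
    H = (rng.standard_normal((nt, k)) + 1j * rng.standard_normal((nt, k))) / np.sqrt(2)
    H_pad = np.zeros((nt_max, k_max), dtype=complex)
    H_pad[:nt, :k] = H                        # unused rows and columns stay zero
    x = np.stack([H_pad.real, H_pad.imag])    # real-valued network input
    return x, nt, k

def pad_label(dual, nt_max):
    """Zero-pad a per-antenna label vector to the fixed output size."""
    y = np.zeros(nt_max)
    y[: len(dual)] = dual
    return y

rng = np.random.default_rng(0)
x, nt, k = make_padded_sample(rng, nt_max=8, k_max=4)
y = pad_label(np.ones(nt), nt_max=8)          # placeholder label, not a solver output
```

Because every sample shares the same shape, a single model can be trained on all configurations up to the chosen maxima.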
V Performance Evaluation
Both simulations and experiments are carried out to evaluate the performance of the proposed DL enabled beamforming optimization. We assume that all channel entries undergo independent and identically distributed Rayleigh flatfading with zero mean and unit variance unless otherwise specified, and perfect CSI is available at the BS. All transmit power is normalized by the noise power.
The training samples (dual variables) are generated by solving the problem P2 using Knitro for its stability and efficiency, but can also be generated by solving the problem P1 using the bisection search method at the cost of more computational time during the offline training. In our simulation, we use 20000 training samples and 5000 testing samples. All of the proposed DL networks have one input layer, two CL layers, two BN layers, three AC layers, one flatten layer, one FC layer, and one output layer. Besides, each CL layer has 8 kernels of size and the first two AC layers adopt the ReLU function. Each CL applies stride 1 and zero padding 1 such that the output width and height of all CLs remain the same as those of the input [56]. To be specific, the input size of the first CL is and the output size is . Both the input size and output size of the second CL are . When parameter sharing is considered, the numbers of parameters in the first and second CL are 72 (weights) + 8 (biases) = 80 and 584, respectively, with a total of 664. When no parameter sharing is considered, the numbers of parameters in the two CLs are and , respectively, with a total of . The Adam optimizer [57] is used with the mean squared error based loss function. We adopt the sigmoid function in the last AC layer.
We will compare the performance and running time of the following schemes when possible:

The optimal solution obtained by solving P2 using Knitro.

The proposed subgradient algorithm (Algorithm 2) in Section III.

The proposed solution based on learned and .

The proposed solution based on learned only.

ZF Solution [25].

When , pseudo inverse of the channel is the optimal beamforming direction, i.e.,
(31) and the achievable SINR is . The overall optimal beamforming matrix is given by .

However, when , the optimal solution relies on solving the following SOCP problem P7, so the associated complexity is high:
P7: (32) s.t. 
When , there is no feasible ZF solution.


RZF Solution [58]. This is a low-complexity heuristic solution that improves the performance of ZF, especially in the low-SNR regime. The beamforming direction is given by:
(33) where and the overall beamforming matrix is given by .
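The two heuristic baselines can be illustrated with a short sketch. The code below is not the paper's exact formulation: it assumes a $K \times N_t$ channel matrix `H` whose row $k$ is user $k$'s channel, uses the textbook pseudo-inverse and regularized-inverse forms, treats the regularization weight `alpha` as a free parameter, and applies a simple common scaling to respect a per-antenna limit `p_max`. All function names are hypothetical.

```python
import numpy as np


def zf_directions(H):
    """Unit-norm zero-forcing directions (Nt x K) from the channel pseudo-inverse."""
    W = np.linalg.pinv(H)  # H @ W = I when H has full row rank (K <= Nt)
    return W / np.linalg.norm(W, axis=0, keepdims=True)


def rzf_directions(H, alpha):
    """Unit-norm regularized ZF directions; alpha = 0 recovers ZF."""
    K = H.shape[0]
    W = H.conj().T @ np.linalg.inv(H @ H.conj().T + alpha * np.eye(K))
    return W / np.linalg.norm(W, axis=0, keepdims=True)


def scale_to_per_antenna_limit(W, p_max):
    """Common scaling so that the most heavily loaded antenna uses exactly p_max."""
    per_antenna_power = np.sum(np.abs(W) ** 2, axis=1)
    return W * np.sqrt(p_max / per_antenna_power.max())


rng = np.random.default_rng(0)
H = (rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))) / np.sqrt(2)
W_zf = scale_to_per_antenna_limit(zf_directions(H), p_max=1.0)
W_rzf = scale_to_per_antenna_limit(rzf_directions(H, alpha=0.3), p_max=1.0)
print(np.round(np.abs(H @ W_zf), 3))   # off-diagonal entries vanish for ZF
print(np.round(np.abs(H @ W_rzf), 3))  # RZF tolerates some residual interference
```

The common scaling is the simplest way to meet per-antenna limits and generally leaves some antennas below their limit, which is one reason an optimized power allocation can do better.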
For a fair comparison, all iterative algorithms are deemed to have converged when the relative change of the objective function value is below . All algorithms are implemented on an Intel i7-7700U CPU with 32 GB RAM using Matlab R2017b. One NVIDIA Titan Xp GPU is used to train the neural network.
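The shared-parameter counts quoted in the network description (80 for the first CL and 664 in total) can be checked with a few lines of arithmetic. The sketch assumes $3\times 3$ kernels and a single-channel input to the first CL, which is what the stated stride, padding, and totals imply; the helper names are hypothetical.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus biases of a convolutional layer with shared kernels."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    biases = out_channels
    return weights + biases


def conv_out_size(in_size, kernel, stride, padding):
    """Output width (or height) of a convolution along one dimension."""
    return (in_size + 2 * padding - kernel) // stride + 1


first_cl = conv_params(3, 3, in_channels=1, out_channels=8)   # 72 weights + 8 biases
second_cl = conv_params(3, 3, in_channels=8, out_channels=8)  # 576 weights + 8 biases
print(first_cl, second_cl, first_cl + second_cl)

# Stride 1 with zero padding 1 keeps the spatial size unchanged for a 3x3 kernel.
print(conv_out_size(in_size=10, kernel=3, stride=1, padding=1))
```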
V-A Simulation Results
We first compare the SINR and running time results for a system with in Fig. 3. In Fig. 3 (a), we can see that both the proposed subgradient solution and the solution based on learned achieve close-to-optimal performance and outperform the RZF and ZF solutions, especially in the low signal-to-noise ratio (SNR) regime. As the SNR increases, all solutions converge to the optimal solution. Fig. 3 (b) shows that both of the proposed learning-based solutions achieve more than an order of magnitude reduction in computational time compared to the optimal algorithm. The proposed subgradient algorithm is also more efficient than the optimal solution using Knitro. The ZF and RZF solutions have the lowest complexity because no optimization is involved. In addition, we compare the robustness of the various schemes against channel errors in Fig. 4. The channel vectors are modelled as , where is the imperfect channel estimate, is the channel error vector and is the variance of channel estimation error. As expected, the channel estimation error degrades the SINR performance of all the solutions. However, the results show that the proposed learning-based solutions and the optimal solution are very robust, whereas the performance loss of the ZF and RZF beamforming is severe.
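A robustness test of this kind can be sketched as follows: the beamformer is designed from the channel estimate and then evaluated on the true channel. The additive error model, the example values of `sigma_e2` and `power`, the ZF design with equal per-user power, and all names are assumptions for illustration only.

```python
import numpy as np


def complex_gaussian(rng, shape, variance):
    """Circularly symmetric complex Gaussian samples with the given variance."""
    return np.sqrt(variance / 2) * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))


def sinr(H, W, noise=1.0):
    """Per-user SINR; row k of H is user k's channel, column k of W its beam."""
    G = np.abs(H @ W) ** 2
    signal = np.diag(G)
    return signal / (G.sum(axis=1) - signal + noise)


rng = np.random.default_rng(1)
K, Nt, sigma_e2, power = 3, 4, 0.05, 10.0

H_hat = complex_gaussian(rng, (K, Nt), 1.0)   # imperfect estimate known to the BS
E = complex_gaussian(rng, (K, Nt), sigma_e2)  # channel estimation error
H_true = H_hat + E                            # assumed additive error model

# Design on the estimate: ZF directions with equal power per user (placeholder design).
W = np.linalg.pinv(H_hat)
W = np.sqrt(power) * W / np.linalg.norm(W, axis=0, keepdims=True)

sinr_design = sinr(H_hat, W)   # SINR the BS believes it achieves
sinr_actual = sinr(H_true, W)  # SINR the users actually experience
print(10 * np.log10(sinr_design.min()), 10 * np.log10(sinr_actual.min()))
```

With ZF the interference is nulled only for the estimated channel, so any estimation error leaks straight into the interference term; this is consistent with the severe loss reported for ZF and RZF.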
Next we demonstrate the scalability of the algorithms when and the number of users varies from 2 to 10 when dB in Fig. 5. As can be seen from Fig. 5 (a), as both the numbers of users and antennas increase, the achievable SINR first decreases and then increases. The performance of the ZF and RZF solutions drops quickly. As the number of users increases, both learning-based solutions significantly outperform the ZF solution and the performance gap widens, while their gap to the optimal solution remains constant. Fig. 5 (b) shows the complexity performance. The proposed algorithm that learns both and has a lower complexity. As the number of users increases, e.g., when , it achieves a nearly 50-fold reduction in computational time compared to the optimal algorithm. The proposed algorithm that learns only achieves 0.5 dB higher SINR than the one that learns both and , at the cost of slightly increased time complexity. Next we examine the SINR performance of the system using a more realistic 3GPP Spatial Channel Model (3GPP TR 25.996) [59], as shown in Fig. 6. We consider a scenario of urban micro cells and assume that the distances between the BS and the users are uniformly distributed between 50 m and 300 m. The total system bandwidth is 20 MHz. Trends similar to those in Fig. 5 (a) are observed in Fig. 6, and both learning-based solutions still significantly outperform the RZF and ZF solutions.
We then consider the performance of a system with transmit antennas at the BS, and vary the number of users when dB in Fig. 7 (a). There is a gap of about 1 to 2 dB between the learned solutions and the optimal solution, while the ZF solution is almost optimal when . However, from Fig. 7 (b), we can see that the ZF solution has the highest complexity in this case because it must be optimized by solving the SOCP problem P7. The proposed algorithm that learns both and achieves more than two orders of magnitude reduction in computational complexity compared to the ZF solution.
Next we demonstrate the generalization property of our proposed algorithm that learns only . We train a model with only once, and then use it when and vary. As shown in Fig. 8, the SINR performance of the proposed generalization algorithm using the same model not only follows the same trend as that of the optimal solution with respect to the number of users, but is also close to it. More specifically, the achieved SINRs of the two schemes decrease as the number of users increases when the number of BS antennas is fixed. This observation validates the feasibility of the training set augmentation method and motivates further research on improving the generalization of the proposed DL-based algorithms. Besides, we find that adding more antennas improves the SINR performance because of the spatial gain.
V-B Testbed Results
To evaluate the proposed learning-based algorithm in a real-world scenario, we have implemented a multiuser beamforming testbed system based on SDR in our lab environment.
V-B1 Testbed Setup
The multiuser beamforming testbed system is based on the SDR structure and consists of one PC hosting Matlab, a Gigabit Ethernet switch, four NI USRP devices acting as transmitters or receivers, and a CDA-2990 Clock Distribution Device. The USRP devices and the Clock Distribution Device used for synchronization are illustrated in Fig. 9.
We adopt the SDR system since it provides a flexible development environment as well as a practical prototype. The USRP devices serve as the radio front ends in the SDR system and support different interfacing methods, including PCIe and Gigabit Ethernet connections. Besides, the USRP devices support a wide range of baseband signal processing platforms, including Matlab, LabVIEW and GNU Radio. The transmitters and receivers are implemented using USRP-2950 devices, which support the Radio Frequency (RF) range from 50 MHz to 2.2 GHz [60]. For evaluation purposes, the 900 MHz Industrial, Scientific and Medical (ISM) frequency band is used. The key parameters of the multiuser beamforming system are listed in Table I.
In the experiment, we consider a scenario consisting of one BS with four transmit antennas and four single-antenna users, i.e., . We combine two USRP-2950 devices into a cooperative four-antenna transmitter and employ two USRP-2950 devices as four individual single-antenna users. All channels on the USRP devices are synchronized using the CDA-2990 Clock Distribution Device. Omnidirectional tri-band SMA-703 antennas are used for both the transmitters and the receivers, and the receiver antennas are extended using RF cables. Both static and dynamic channel conditions are examined to evaluate the proposed learning-based beamforming algorithms. For the static channel scenario, the transmitter antennas are placed next to each other with a spacing of 0.1 m, while the receiver antennas are placed 1.5 m away from the transmitter antennas as well as from each other. For the dynamic scenario, a low-mobility scenario is emulated, where one of the receiving antennas moves at a speed of 0.6 m/s. Besides, the experiment also uses different transmit powers to evaluate the algorithms’ performance in different SNR configurations, where 0 dB of transmit power gain corresponds to a transmit power of dBm. Since the multiuser beamforming system coordinates several USRP devices as transmitters and receivers at the same time, a Gigabit Ethernet switch is used to enable multiple USRP interfacing.
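Table I. Key parameters of the multiuser beamforming system

| Parameters | Descriptions |
|---|---|
| Clock and PPS source | CDA-2990, 10 MHz and 1 PPS |
| Radio front end | USRP-2950, 50 MHz to 2.2 GHz |
| Antennas | Tri-band SMA-703 |
| Modulation | QPSK |
| Prefix | Gold code of length 127 |
| Baseband sample rate | 40 kilosamples/second (ksps) |
| Pulse shaping | Raised cosine filter |
| Channel estimation | MMSE estimator |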
The baseband signal processing modules and the proposed learning-based beamforming algorithms are implemented as Matlab functions on a PC with one Intel i7-4790 CPU and 32 GB of RAM. In the experiment, all users share the same channel and all use Quadrature Phase Shift Keying (QPSK) modulation. The payloads are prefixed with a different Gold sequence for each user, which is exploited for both synchronization and channel estimation. Besides, all baseband signals are shaped using a raised cosine filter. During the experiment, each user decodes its own payload and feeds its channel estimate back to the transmitter. The transmitters and receivers are controlled by different Matlab sessions, while the channel estimation information is exchanged locally through the PC’s cache storage. The beamforming algorithms optimize the beam weight vectors using the aggregated channel estimation information. The transmitter applies the optimized beam weight vectors to generate the signals for each antenna before transmission.
V-B2 Experiment Results and Discussions
To demonstrate the performance of the proposed learning algorithm (based on learned and ), three benchmark algorithms are implemented on the multiuser beamforming system, namely the theoretically optimal solution, the ZF solution and the RZF solution. Each algorithm is evaluated under both static and dynamic conditions, and we choose the bit error rate (BER) as the performance metric. To obtain the BER performance of each solution, a real-time experiment is conducted using the testbed illustrated in Fig. 9 with different transmit powers. For each transmit power, the BS sends packets each containing 256 QPSK symbols, and the BER is calculated from the bit errors averaged over all packets.
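The BER bookkeeping described above can be sketched in a few lines. The packet length of 256 QPSK symbols comes from the experiment description; the Gray mapping, the number of packets `num_packets`, the additive-noise channel and its level `noise_std` are hypothetical stand-ins for the over-the-air link.

```python
import numpy as np

SYMBOLS_PER_PACKET = 256  # from the experiment description


def qpsk_modulate(bits):
    """Map bit pairs to unit-energy QPSK symbols (Gray mapping assumed)."""
    b = bits.reshape(-1, 2)
    return ((1 - 2 * b[:, 0]) + 1j * (1 - 2 * b[:, 1])) / np.sqrt(2)


def qpsk_demodulate(symbols):
    """Hard-decision demapping that inverts qpsk_modulate."""
    bits = np.empty((symbols.size, 2), dtype=int)
    bits[:, 0] = symbols.real < 0
    bits[:, 1] = symbols.imag < 0
    return bits.reshape(-1)


def bit_error_rate(tx_packets, rx_packets):
    """Total bit errors divided by total transmitted bits over all packets."""
    errors = sum(int(np.count_nonzero(tx != rx)) for tx, rx in zip(tx_packets, rx_packets))
    total = sum(tx.size for tx in tx_packets)
    return errors / total


rng = np.random.default_rng(0)
num_packets = 100  # hypothetical; the paper's packet count is not reproduced here
noise_std = 0.3    # hypothetical per-dimension noise standard deviation
tx_packets, rx_packets = [], []
for _ in range(num_packets):
    bits = rng.integers(0, 2, size=2 * SYMBOLS_PER_PACKET)
    symbols = qpsk_modulate(bits)
    noise = noise_std * (rng.standard_normal(symbols.shape) + 1j * rng.standard_normal(symbols.shape))
    tx_packets.append(bits)
    rx_packets.append(qpsk_demodulate(symbols + noise))
print(bit_error_rate(tx_packets, rx_packets))
```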
Fig. 10 depicts the BER results in the static and dynamic channel conditions as the transmit power gain varies. Under the static condition shown in Fig. 10 (a), the proposed learning-based algorithm outperforms the ZF and RZF solutions across the considered transmit power range. Specifically, the BER performance gain of the learning-based algorithm is approximately 4 dB over the ZF solution and 3 dB over the RZF solution in the relatively low transmit SNR regime, and this gain reduces as the transmit SNR grows. Compared to the theoretically optimal solution, the learning-based algorithm performs similarly in the low transmit SNR regime but becomes inferior at high transmit SNR. This is expected because, under static channel conditions, there is sufficient time to run the theoretically optimal algorithm, which therefore achieves the best performance.
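Table II. Typical time consumption of the considered algorithms

| Processing/Solution | CSI Feedback Period | Learned Solution | Theoretically Optimal Solution | ZF Solution | RZF Solution |
|---|---|---|---|---|---|
| Typical Time (second) | 2x | 5x | 8x | 2x | 2x |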
However, the algorithms show different BER performance under the dynamic channel conditions, as depicted in Fig. 10 (b). The learning-based algorithm outperforms all benchmark algorithms in the medium-to-high SNR range, which corresponds to 0 to 12 dB in Fig. 10 (b). It is worth noting that the learning-based algorithm is superior to the nominally optimal solution under dynamic channel conditions; in particular, the maximum achieved BER performance gain is approximately 1 dB over the theoretically optimal solution. This result is expected, and can be explained as follows. The beamforming algorithms require up-to-date CSI for optimization, but the computational delay of the theoretically optimal solution is considerably long, and by the time the solution is found, the channel has changed. In other words, the theoretically optimal beamforming solution is optimized based on outdated CSI, so the mismatch leads to performance degradation and the theoretically optimal performance can no longer be guaranteed. This can be verified by the typical time consumption of the considered algorithms listed in Table II. This performance degradation is worse under dynamic channel conditions than under static channel conditions, as shown by Fig. 10 (a) and Fig. 10 (b). It is seen from Table II that the ZF and RZF solutions require much less computational time when optimizing the beamforming weights, so the performance of the ZF solution is close to that of the theoretically optimal solution (degraded by operating on outdated CSI) in the experiment, and the RZF solution even outperforms the optimal solution. However, the BER performance of the ZF and RZF solutions is still inferior to that of the proposed learning-based algorithm.
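To get a feel for how quickly the CSI becomes outdated in the dynamic scenario, the following back-of-the-envelope sketch computes the maximum Doppler shift for the stated receiver speed (0.6 m/s) and carrier frequency (900 MHz). The coherence-time rule of thumb $T_c \approx 0.423/f_d$ is a common textbook approximation and is an assumption here, not something taken from the paper; the function names are hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def max_doppler_hz(speed_mps, carrier_hz):
    """Maximum Doppler shift f_d = v * f_c / c."""
    return speed_mps * carrier_hz / SPEED_OF_LIGHT


def coherence_time_s(doppler_hz):
    """Rule-of-thumb coherence time; other definitions differ by a constant factor."""
    return 0.423 / doppler_hz


f_d = max_doppler_hz(speed_mps=0.6, carrier_hz=900e6)
t_c = coherence_time_s(f_d)
print(f"Doppler ~ {f_d:.2f} Hz, coherence time ~ {t_c:.2f} s")
```

Under these assumptions the channel decorrelates on a sub-second time scale, so any algorithm whose feedback-plus-computation delay approaches that scale is effectively optimizing for a channel that no longer exists.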
It is worth noting that under both the static and dynamic channel conditions, the precise channel models are not known, so in the experiment we resort to the neural network trained on small-scale fading for online learning of the beamforming solution. The results in Fig. 10 show that the network trained for one channel model generalizes well to different channel conditions, which greatly reduces the need to retrain the neural network.
VI Conclusions and Future Directions
In this paper, we have developed deep-learning-enabled solutions for fast optimization of downlink beamforming under per-antenna power constraints. Our solutions are both model-driven and data-driven, and are achieved by exploiting the structure of the beamforming problem, learning the dual variables from labelled data and then recovering the original beamforming solutions. Our solutions naturally adapt to a varying number of active users in dynamic environments without retraining, which makes them more general. The simulation results have shown the superior performance-complexity tradeoff achieved by the proposed solutions, and the results have been further verified by testbed experiments using software-defined radio.
We would like to point out a few promising future directions. This paper assumes that perfect CSI is available; however in practice, CSI estimation is never perfect. One future direction would be to investigate a more advanced robust learning framework to mitigate channel estimation errors or other types of impairments. As a step further, another promising future direction will be to study how to use deep learning to map directly from the pilot signals to the beamformed signals, bypassing the explicit channel estimation step.
In order to reduce computational complexity of the training process when the channel conditions change, one possible method is to use a wide range of channel realizations during the offline training phase, in order that the neural network can learn to generalize from a wider range of channel variations. Another approach is to employ transfer learning [52]. The main idea is that knowledge learned from one training task for a given channel condition may be transferred to a similar training task for a different channel condition, and can help train a new model with additional examples, which is worthy of further study.
Appendix A. Proof of the convergence and optimality of Algorithm 1 to Solve P4
The proof has two parts. The first part is devoted to the proof of convergence and the second part addresses the uniqueness and optimality of the fixed point after convergence.
Let us start with which is achievable for the power vector . It is easily seen that given , (user index is omitted for convenience) is a standard interference function, which satisfies the following properties [54][55]:

is componentwise monotonically decreasing;

If , then ;

, for all , are all feasible solutions given the SINR constraint .
Assume that at the th iteration, the dual variable is and the achievable SINR is . Then at the th iteration, according to (P1), , and as such and in Step 4). According to P2, in Step 5) we have the SINR result . Then, according to (P3), , and therefore , i.e., the balanced SINR is increasing as the iteration goes. Since is upper bounded, the algorithm converges to a fixed point . Next, we prove that the fixed point is also optimal.
We see that satisfies the following fixedpoint equation:
(34) 
and it satisfies the total virtual uplink power is . Clearly, the total uplink transmit power is a monotonic nondecreasing function of the SINR constraint. This implies that there is no solution which provides a strictly higher SINR but still maintains the power constraint .
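The convergence argument follows the fixed-point framework of [54]. As a generic illustration of that framework (not Algorithm 1 itself), the sketch below iterates $p \leftarrow I(p)$ for a classical uplink power control problem with fixed receivers. The link-gain matrix `G`, the SINR target `gamma`, the unit noise power, and the function names are all hypothetical.

```python
import numpy as np


def interference_function(p, G, gamma, noise):
    """I_k(p) = gamma * (sum_{j != k} G[k, j] * p[j] + noise) / G[k, k]."""
    direct = np.diag(G)
    cross = G @ p - direct * p
    return gamma * (cross + noise) / direct


def fixed_point_power_control(G, gamma, noise=1.0, tol=1e-12, max_iter=10_000):
    """Iterate p <- I(p) from p = 0 until the update stalls."""
    p = np.zeros(G.shape[0])
    for _ in range(max_iter):
        p_next = interference_function(p, G, gamma, noise)
        if np.max(np.abs(p_next - p)) <= tol * max(1.0, float(np.max(p_next))):
            return p_next
        p = p_next
    raise RuntimeError("no convergence: the SINR target may be infeasible")


G = np.array([[1.0, 0.1, 0.1],
              [0.1, 1.0, 0.1],
              [0.1, 0.1, 1.0]])  # toy link gains
gamma = 2.0                      # common SINR target
p = fixed_point_power_control(G, gamma)
achieved = np.diag(G) * p / (G @ p - np.diag(G) * p + 1.0)
print(p, achieved)
```

Starting from zero power, the iterates increase monotonically and stop at the unique fixed point, where every user meets the target with equality; this mirrors the structure of the proof above.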
Appendix B. Proof that of P4 is a concave function in .
Proof: First note that Algorithm 1 to solve P4 is a fixed-point iteration, which means that a solution satisfying the first two constraints (11) and (11) with equality is optimal. This indicates that there is no local optimum, and the gap between P4 and its dual problem is zero. It therefore suffices to prove that the objective function of the dual problem of P4 is concave in .
By using (11) of [23], we can rewrite P4 as
P4’:  
s.t.  (35)  
Its Lagrangian is
(36)  
where are dual variables. Note that it is derived based on the maximization rather than the commonly used minimization of an objective function .
The dual objective function is expressed as which is to be minimized over and only contains a linear term of about , and the constraints of the dual problem (although not derived here) do not involve . Therefore the dual objective function is a pointwise minimum of a family of affine functions about and as a result concave [61, Sec.3.2.2], so is . This completes the proof.
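The last step relies on a standard fact [61, Sec. 3.2.2]: a pointwise minimum of affine functions is concave. A short derivation in generic notation is given below; the symbols $x$, $\lambda$, $a_\lambda$, $b_\lambda$ and $\theta$ are placeholders introduced for this sketch and are not the paper's variables.

```latex
\begin{align}
g(x) &= \min_{\lambda \in \Lambda} \left( a_\lambda^{T} x + b_\lambda \right), \\
g\big(\theta x_1 + (1-\theta) x_2\big)
  &= \min_{\lambda \in \Lambda} \Big[ \theta \left( a_\lambda^{T} x_1 + b_\lambda \right)
     + (1-\theta) \left( a_\lambda^{T} x_2 + b_\lambda \right) \Big] \\
  &\ge \theta \min_{\lambda \in \Lambda} \left( a_\lambda^{T} x_1 + b_\lambda \right)
     + (1-\theta) \min_{\lambda \in \Lambda} \left( a_\lambda^{T} x_2 + b_\lambda \right) \\
  &= \theta\, g(x_1) + (1-\theta)\, g(x_2), \qquad \theta \in [0, 1].
\end{align}
```

The inequality holds because minimizing the two terms separately can only lower each of them, which is exactly the definition of concavity of $g$.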
Appendix C. To find the subgradient Euclidean projection in Algorithm 2
The Euclidean projection is needed when the update of based on the subgradient in Algorithm 2 does not fall into the feasible set . It needs to solve the following optimization problem:
(37) 
where . Although P5 is a convex problem and can be solved by a standard numerical algorithm, below we derive its analytical property and propose a more efficient bisection algorithm to solve it.
Its Lagrangian can be expressed as
(38) 
where and are dual variables.
Setting its first-order derivative to zero leads to
(39) 
Substitute it to and we get
(40) 
Therefore the remaining task is to find that satisfies (40). Obviously the left hand side of (40) is monotonic in , so we propose the following bisection method to find the optimal .
Algorithm 3 to Solve P5:

Set the upper and lower bounds of as and . Repeat the following steps until convergence.

Calculate .

If , ; otherwise .
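The bisection idea can be sketched for one concrete choice of feasible set. The code below assumes the set is a weighted simplex $\{x \ge 0,\ a^{T}x = b\}$ with $a > 0$ and $b > 0$; this choice, the symbols `y`, `a`, `b`, `mu`, and the function name are assumptions for illustration and may differ from the constraint set of P5.

```python
import numpy as np


def project_weighted_simplex(y, a, b, tol=1e-12, max_iter=200):
    """Euclidean projection of y onto {x >= 0, a^T x = b}, assuming a > 0 and b > 0.

    The KKT conditions give x(mu) = max(y - mu * a, 0); mu is found by bisection
    because a^T x(mu) is nonincreasing in mu.
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)

    def excess(mu):
        return a @ np.maximum(y - mu * a, 0.0) - b

    hi = np.max(y / a)               # x(hi) = 0, so excess(hi) = -b < 0
    lo = np.max(y / a - b / a ** 2)  # one entry alone meets the budget, so excess(lo) >= 0
    for _ in range(max_iter):
        mu = 0.5 * (lo + hi)
        if excess(mu) > 0:
            lo = mu
        else:
            hi = mu
        if hi - lo <= tol:
            break
    return np.maximum(y - 0.5 * (lo + hi) * a, 0.0)


x = project_weighted_simplex(y=[1.0, 1.0], a=[1.0, 2.0], b=2.0)
print(x, np.dot([1.0, 2.0], x))
```

Each bisection step costs only one pass over the vector, so the projection is much cheaper than calling a general-purpose convex solver inside every subgradient iteration.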
References
 [1] C. Lim, T. Yoo, B. Clerckx, B. Lee, and B. Shim, “Recent trend of multiuser MIMO in LTE-advanced,” IEEE Commun. Mag., vol. 51, no. 3, pp. 127–135, Mar. 2013.
 [2] F. Boccardi, R. W. Heath Jr., A. Lozano, T. L. Marzetta, and P. Popovski, “Five disruptive technology directions for 5G,” IEEE Commun. Mag., vol. 52, no. 2, pp. 74–80, Feb. 2014.
 [3] B. Bellalta, “IEEE 802.11ax: High-efficiency WLANs,” IEEE Wireless Commun., vol. 23, no. 1, pp. 38–46, Feb. 2016.
 [4] P. D. Arapoglou, A. Ginesi, S. Cioni, S. Erl, F. Clazzer, S. Andrenacci, and A. Vanelli-Coralli, “DVB-S2X-enabled precoding for high throughput satellite systems,” International Journal of Satellite Communications and Networking, vol. 34, no. 3, pp. 439–455, May 2016.
 [5] M. Schubert and H. Boche, “Solution of the multiuser downlink beamforming problem with individual SINR constraints,” IEEE Trans. Veh. Technol., vol. 53, no. 1, pp. 18–28, Jan. 2004.
 [6] E. Björnson, M. Bengtsson, and B. Ottersten, “Optimal multiuser transmit beamforming: A difficult problem with a simple solution structure,” IEEE Signal Process. Mag., vol. 31, no. 4, pp. 142–148, Jul. 2014.
 [7] F. Rashid-Farrokhi, K. R. Liu, and L. Tassiulas, “Transmit beamforming and power control for cellular wireless systems,” IEEE J. Sel. Areas Commun., vol. 16, no. 8, pp. 1437–1450, Oct. 1998.
 [8] A. Wiesel, Y. C. Eldar, and S. Shamai, “Linear precoding via conic optimization for fixed MIMO receivers,” IEEE Trans. Signal Process., vol. 54, no. 1, pp. 161–176, Jan. 2006.
 [9] A. B. Gershman, N. D. Sidiropoulos, S. Shahbazpanahi, M. Bengtsson, and B. Ottersten, “Convex optimization-based beamforming,” IEEE Signal Process. Mag., vol. 27, no. 3, pp. 62–75, May 2010.
 [10] Q. Shi, M. Razaviyayn, M. Hong, and Z. Luo, “SINR constrained beamforming for a MIMO multiuser downlink system: Algorithms and convergence analysis,” IEEE Trans. Signal Process., vol. 64, no. 11, pp. 2920–2933, Jun. 2016.
 [11] T. Yoo and A. Goldsmith, “On the optimality of multiantenna broadcast scheduling using zero-forcing beamforming,” IEEE J. Sel. Areas Commun., vol. 24, no. 3, pp. 528–541, Mar. 2006.
 [12] S. S. Christensen, R. Agarwal, E. de Carvalho, and J. M. Cioffi, “Weighted sum-rate maximization using weighted MMSE for MIMO-BC beamforming design,” IEEE Trans. Wireless Commun., vol. 7, no. 12, pp. 4792–4799, Dec. 2008.
 [13] Q. Shi, M. Razaviyayn, Z. Luo, and C. He, “An iteratively weighted MMSE approach to distributed sum-utility maximization for a MIMO interfering broadcast channel,” IEEE Trans. Signal Process., vol. 59, no. 9, pp. 4331–4340, Sept. 2011.
 [14] Z.-Q. Luo, W.-K. Ma, A. M.-C. So, Y. Ye, and S. Zhang, “Semidefinite relaxation of quadratic optimization problems,” IEEE Signal Process. Mag., vol. 27, no. 3, pp. 20–34, May 2010.
 [15] W. Yu and T. Lan, “Transmitter optimization for the multi-antenna downlink with per-antenna power constraints,” IEEE Trans. Signal Process., vol. 55, no. 6, pp. 2646–2660, June 2007.
 [16] L.-N. Tran, M. Juntti, M. Bengtsson, and B. Ottersten, “Beamformer designs for MISO broadcast channels with zero-forcing dirty paper coding,” IEEE Trans. Wireless Commun., vol. 12, no. 3, pp. 1173–1185, Mar. 2013.
 [17] G. Zheng, S. Chatzinotas, and B. Ottersten, “Generic optimization of linear precoding in multibeam satellite systems,” IEEE Trans. Wireless Commun., vol. 11, no. 6, pp. 2308–2320, June 2012.
 [18] S. K. Mohammed and E. G. Larsson, “Per-antenna constant envelope precoding for large multi-user MIMO systems,” IEEE Trans. Commun., vol. 61, no. 3, pp. 1059–1071, Mar. 2013.
 [19] C. Xing, Y. Ma, Y. Zhou, and F. Gao, “Transceiver optimization for multi-hop communications with per-antenna power constraints,” IEEE Trans. Signal Process., vol. 64, no. 6, pp. 1519–1534, Mar. 2016.
 [20] H. Shen, W. Xu, A. Swindlehurst, and C. Zhao, “Transmitter optimization for per-antenna power constrained multi-antenna downlinks: An SLNR maximization methodology,” IEEE Trans. Signal Process., vol. 64, no. 10, pp. 2712–2725, May 2016.
 [21] C. Xing, Y. Jing, S. Wang, J. Wang, and J. An, “A general framework for covariance matrix optimization in MIMO systems,” [Online]. Available: http://arxiv.org/abs/1711.04449.
 [22] H. V. Cheng, D. Persson, and E. G. Larsson, “Optimal MIMO precoding under a constraint on the amplifier power consumption,” IEEE Trans. Signal Process., vol. 67, no. 1, pp. 218–229, Jan. 2019.
 [23] A. J. Fehske, F. Richter, and G. P. Fettweis, “SINR balancing for the multiuser downlink under general power constraints,” in Proc. IEEE Global Telecommunications Conference (GLOBECOM), New Orleans, Louisiana, 30 Nov.–4 Dec. 2008, pp. 1–6.
 [24] Ziena Optimization Inc, available: http://www.ziena.com.
 [25] A. Wiesel, Y. C. Eldar, and S. Shamai, “Zero-forcing precoding and generalized inverses,” IEEE Trans. Signal Process., vol. 56, no. 9, pp. 4409–4418, Sept. 2008.
 [26] N. Farsad and A. Goldsmith, “Detection algorithms for communication systems using deep learning,” [Online]. Available: http://arxiv.org/abs/1705.08044.
 [27] R. Shafin, L. Liu, V. Chandrasekhar, H. Chen, J. Reed, and J. Zhang, “Artificial intelligence-enabled cellular networks: a critical path to beyond-5G and 6G,” [Online]. Available: https://arxiv.org/abs/1907.07862, Jul. 2019.
 [28] S. Mosleh, L. Liu, C. Sahin, Y. R. Zheng, and Y. Yi, “Brain-inspired wireless communications: where reservoir computing meets MIMO-OFDM,” IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 10, pp. 4694–4708, Oct. 2018.
 [29] T. J. O’Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Trans. Cogn. Commun. Netw., vol. 3, no. 4, pp. 563–575, Dec. 2017.
 [30] S. Dörner, S. Cammerer, J. Hoydis, and S. ten Brink, “Deep learning based communication over the air,” IEEE J. Sel. Topics Signal Process., vol. 12, no. 1, pp. 132–143, Feb. 2018.
 [31] F. Liang, C. Shen, and F. Wu, “An iterative BP-CNN architecture for channel decoding,” IEEE J. Sel. Topics Signal Process., vol. 12, no. 1, pp. 144–159, Feb. 2018.
 [32] H. Kim, Y. Jiang, R. Rana, S. Kannan, S. Oh, and P. Viswanath, “Communication algorithms via deep learning,” [Online]. Available: http://arxiv.org/abs/1805.09317.
 [33] H. He, C.-K. Wen, S. Jin, and G. Y. Li, “A model-driven deep learning network for MIMO detection,” in Proc. IEEE Global Conference on Signal and Information Processing (GlobalSIP), Anaheim, CA, USA, Nov. 2018, pp. 1–5.
 [34] N. Samuel, T. Diskin, and A. Wiesel, “Learning to detect,” IEEE Trans. Signal Process., vol. 67, no. 10, pp. 2554–2564, May 2019.
 [35] H. He, C.-K. Wen, S. Jin, and G. Y. Li, “Deep learning-based channel estimation for beamspace mmWave massive MIMO systems,” IEEE Wireless Commun. Lett., vol. 7, no. 5, pp. 852–855, Oct. 2018.
 [36] C.-K. Wen, W.-T. Shih, and S. Jin, “Deep learning for massive MIMO CSI feedback,” IEEE Wireless Commun. Lett., vol. 7, no. 5, pp. 748–751, Oct. 2018.
 [37] H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. D. Sidiropoulos, “Learning to optimize: Training deep neural networks for wireless resource management,” IEEE Trans. Signal Process., vol. 66, no. 20, pp. 5438–5453, Oct. 2018.
 [38] F. Liang, C. Shen, W. Yu, and F. Wu, “Towards optimal power control via ensembling deep neural networks,” to appear in IEEE Trans. Commun.
 [39] W. Lee, M. Kim, and D.-H. Cho, “Deep power control: Transmit power control scheme based on convolutional neural network,” IEEE Commun. Lett., vol. 22, no. 6, pp. 1276–1279, Jun. 2018.
 [40] X. Li, J. Fang, W. Cheng, H. Duan, Z. Chen, and H. Li, “Intelligent power control for spectrum sharing in cognitive radios: A deep reinforcement learning approach,” IEEE Access, vol. 6, pp. 25463–25473, May 2018.
 [41] T. V. Chien, T. N. Canh, E. Björnson, and E. G. Larsson, “Power control in cellular massive MIMO with varying user activity: a deep learning solution,” [Online]. Available: http://arxiv.org/abs/1901.03620.
 [42] Y. Shi, A. Konar, N. D. Sidiropoulos, X. Mao, and Y. Liu, “Learning to beamform for minimum outage,” IEEE Trans. Signal Process., vol. 66, no. 19, pp. 5180–5193, Oct. 2018.
 [43] P. de Kerret and D. Gesbert, “Robust decentralized joint precoding using team deep neural network,” in Proc. Int. Symp. Wireless Commun. Systems (ISWCS), Lisbon, Portugal, Aug. 2018.
 [44] A. Alkhateeb, S. Alex, P. Varkey, Y. Li, Q. Qu, and D. Tujkovic, “Deep learning coordinated beamforming for highly-mobile millimeter wave systems,” IEEE Access, vol. 6, pp. 37328–37348, June 2018.
 [45] H. Huang, W. Xia, J. Xiong, J. Yang, G. Zheng, and X. Zhu, “Unsupervised learning-based fast beamforming design for downlink MIMO,” IEEE Access, vol. 7, pp. 7599–7605, Jan. 2019.
 [46] W. Xia, G. Zheng, Y. Zhu, J. Zhang, J. Wang, and A. Petropulu, “Deep learning based beamforming neural networks in downlink MISO systems,” in Proc. IEEE Int. Conf. Commun. (ICC) Workshop, Shanghai, China, May 2019, pp. 1–5.
 [47] W. Xia, G. Zheng, Y. Zhu, J. Zhang, J. Wang, and A. P. Petropulu, “A deep learning framework for optimization of MISO downlink beamforming,” to appear in IEEE Trans. Commun. [Online]. Available: http://arxiv.org/abs/1901.00354.
 [48] H. He, S. Jin, C.-K. Wen, F. Gao, G. Y. Li, and Z. Xu, “Model-driven deep learning for physical layer communications,” IEEE Wireless Commun., vol. 26, no. 5, pp. 77–83, Oct. 2019.
 [49] “Subgradient methods for constrained problems,” Stanford EE364B lecture notes, [Online]. Available: https://web.stanford.edu/class/ee364b/lectures/constr_subgrad_slides.pdf.
 [50] K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural Networks, vol. 2, no. 5, pp. 359–366, 1989.
 [51] Z. Zhang, X. Chen, and Z. Tian, “A hybrid neural network framework and application to radar automatic target recognition,” arXiv preprint arXiv:1809.10795, 2018.
 [52] Y. Shen, Y. Shi, J. Zhang, and K. B. Letaief, “Transfer learning for mixed-integer resource allocation problems in wireless networks,” in Proc. IEEE Int. Conf. Commun. (ICC), Shanghai, China, May 2019.
 [53] G. Zheng, Y. M. Huang, and K. K. Wong, “Chapter 10: Network MIMO Techniques,” in Heterogeneous Cellular Networks: Theory, Simulation, and Deployment, X. Chu, D. López-Pérez, F. Gunnarsson, and Y. Yang, Eds. Cambridge University Press, July 2013.
 [54] R. D. Yates, “A framework for uplink power control in cellular radio systems,” IEEE J. Sel. Areas Commun., vol. 13, no. 7, pp. 1341–1347, Sep. 1995.
 [55] S. Ulukus and R. D. Yates, “Adaptive power control and multiuser interference suppression,” ACM Wireless Net., vol. 4, no. 6, pp. 489–496, 1998.
 [56] “Stanford CS class CS231n: Convolutional Neural Networks for Visual Recognition”, [Online]. Available: http://cs231n.github.io/convolutionalnetworks/.
 [57] J. Ba and D. Kingma, “Adam: A method for stochastic optimization,” in Proc. Int. Conf. Learning Representations (ICLR), San Diego, USA, May 2015, pp. 1–15.
 [58] C. Peel, B. Hochwald, and A. Swindlehurst, “A vector-perturbation technique for near-capacity multiantenna multiuser communication-Part I: Channel inversion and regularization,” IEEE Trans. Commun., vol. 53, no. 1, pp. 195–202, Jan. 2005.
 [59] J. Salo, G. Del Galdo, J. Salmi, P. Kyösti, M. Milojevic, D. Laselva, and C. Schneider, MATLAB implementation of the 3GPP Spatial Channel Model (3GPP TR 25.996), [Online]. Available: http://www.tkk.fi/Units/Radio/scm/, Jan. 2005.
 [60] USRP-2950, [Online]. Available: http://www.ni.com/engb/shop/select/usrpsoftwaredefinedradioreconfigurabledevice?modelId=125061.
 [61] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004.