PrecoderNet: Hybrid Beamforming for Millimeter Wave Systems Using Deep Reinforcement Learning

07/31/2019 ∙ by Qisheng Wang, et al. ∙ 0

Millimeter wave (mmWave) communication with large-scale antenna arrays is a promising solution to the frequency resource shortage in next-generation wireless communication. However, a fully digital beamforming structure becomes infeasible due to its prohibitively high hardware cost and unacceptable energy consumption, while traditional hybrid beamforming algorithms have a non-negligible gap to the optimal upper bound. In this paper, we consider a mmWave point-to-point massive multiple-input-multiple-output (MIMO) system and propose a new hybrid analog and digital beamforming (HBF) scheme based on deep reinforcement learning (DRL) to improve the spectral efficiency and reduce the system bit error rate (BER). At the base station (BS) side, we propose a novel DRL-based HBF design method called PrecoderNet to design the hybrid precoding matrix. The DRL agent takes the system sum rate as the state and the real/imaginary parts of the digital beamformer as actions. At the user side, the minimum mean-square-error (MMSE) criterion is used to design the receiving hybrid precoders, which minimizes the distance between the processed signals and the transmitted signals. Furthermore, HBF design algorithms such as weighted MMSE and orthogonal matching pursuit (OMP) are used as benchmarks to verify the performance of our algorithm. Finally, simulation results demonstrate that the proposed PrecoderNet outperforms the benchmarks in terms of spectral efficiency and BER while being more tractable in practical implementation.


I Introduction

With huge amounts of traffic data constantly rushing into cellular mobile systems, mmWave communication is considered a promising solution to the frequency resource shortage problem while also improving system spectral efficiency compared with the current 4G-LTE system [1]. mmWave signals commonly experience fast attenuation during transmission and have weak penetration ability due to their extremely short wavelength. However, conventional massive multiple-input-multiple-output (MIMO) beamforming technologies are capable of providing sufficient antenna array gain for mmWave systems to cope with the high penetration loss in practical implementations. Massive MIMO combined with mmWave communication makes it physically feasible to pack more antennas into an array and obtain non-trivial antenna array gain, which in turn leads to better performance. For example, a 128-antenna MIMO array at the 3 GHz LTE frequency point with half-wavelength antenna spacing needs a square with size of 6.4 m, while at a millimeter-wave frequency of 28 GHz the same array requires only 0.68 m, a drastic reduction [2].
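The two sizes quoted above can be checked with a few lines of arithmetic. The sketch below assumes the figures were obtained by counting one half-wavelength slot per antenna along a line; the function name `array_extent_m` is introduced here for illustration only.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0


def array_extent_m(num_antennas: int, carrier_hz: float) -> float:
    """Extent of `num_antennas` slots spaced half a wavelength apart."""
    wavelength_m = SPEED_OF_LIGHT_M_S / carrier_hz
    return num_antennas * wavelength_m / 2.0


lte_extent = array_extent_m(128, 3e9)       # about 6.4 m
mmwave_extent = array_extent_m(128, 28e9)   # about 0.69 m
print(f"3 GHz: {lte_extent:.2f} m, 28 GHz: {mmwave_extent:.2f} m")
```

Under this assumption the 28 GHz array is shorter by the frequency ratio 28/3, which is consistent with the reduction described in the text.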

However, the design of the beamforming matrix in mmWave systems is constrained by expensive millimeter-wave radio-frequency (RF) chains. When combined with massive MIMO, a traditional fully digital beamformer needs a dedicated RF chain, used for AD/DA conversion and up/down conversion, for each transmit and receive antenna, imposing intolerable power consumption and hardware cost on the system. To address these problems, there are two main approaches in research on hybrid beamforming (HBF) design. The first regards HBF as a matrix factorization problem and minimizes the Frobenius norm or, equivalently, the Euclidean distance [3], [4]. The authors of [3] proposed an HBF architecture called spatially sparse precoding (SSP) based on orthogonal matching pursuit (OMP), which jointly designs and optimizes the precoding matrices of the transceiver. It converts the fully digital beamformer [5] into a low-dimensional baseband digital beamformer requiring RF chains and a high-dimensional analog beamformer realized only by phase shifters, which significantly reduces the number of expensive RF chains required while achieving near-fully-digital performance. They decouple the original precoding problem into hybrid precoding and combining sub-problems and assume a high transmit signal-to-noise ratio (SNR) for the convenience of problem relaxation, which results in suboptimal solutions.

The second approach directly optimizes the original objective using the minimum mean square error (MMSE) criterion [5], [6], [7]. The MMSE-based massive MIMO hybrid precoding design in [6] proves that hybrid beamforming can realize any fully digital beamformer when the number of RF chains is greater than or equal to twice the number of transmitted streams. Despite the viable performance, the higher algorithm complexity and RF chain count introduce non-negligible processing delays in most 5G application scenarios. The same delay problem also emerges in manifold optimization (MO) based hybrid precoding algorithms [7], [8]. Due to the hardware and constant modulus constraints on the analog precoding matrix, it is still challenging to design an optimal and practical mmWave hybrid beamformer, especially one with low complexity and low processing delay.

In the previous literature, deep supervised learning (DSL) has been used to reduce the algorithm complexity of hybrid beamformer design [9], [10], [11]. Deep neural network based supervised learning shows great performance during online testing, but often requires an extensive sample library for offline training, which is sensitive to the environment, i.e., the channel conditions in mmWave systems. In this paper, we demonstrate that deep reinforcement learning (DRL) can be used for the design of hybrid beamforming matrices. DRL has been shown to approach or even surpass human performance in the game of Go [12] and in robot control [13] due to its powerful ability to deal with nonlinear non-convex problems. A model-free DRL agent recasts the action prediction problem as a Markov decision process (MDP) [14], obtains feedback and the current state from the environment, and uses a few shots to effectively learn the optimal behavioral policy for complex problems based on the principle of maximizing the expected long-term reward. The value-based DRL algorithm deep Q-networks (DQN) [12] focuses on discrete control problems, while the policy-based deep deterministic policy gradient (DDPG) can deal with continuous action control problems [15]. We use DDPG to design the hybrid beamformer in this paper because of its continuous action space and sparse reward.

In our work, we propose a novel DDPG-based hybrid beamforming design algorithm called PrecoderNet. The current channel information is taken as the state, while performance indicators such as spectral efficiency and bit error rate (BER) are regarded as the reward function. The real/imaginary parts of the precoder matrix elements are selected as the action. Therefore, the mmWave hybrid precoding problem can be modeled as an MDP that can be effectively solved by DRL. We use the state as the input of the DRL agent, so the output of the agent is exactly the vectorized form of the HBF matrix. More specifically, we develop a novel network architecture called PrecoderNet based on the DDPG algorithm [15] to eliminate the performance gap with low computational complexity. We remark that the DDPG-based PrecoderNet can efficiently reuse previously generated samples to train the agent, without computing the large offline training database required by DSL. Thus, our algorithm is more energy-efficient and tractable than the DSL method. Furthermore, the DRL algorithm is essentially a gradient descent algorithm, so good initial points have a significant impact on convergence. Inspired by [16], we utilize external knowledge from the hybrid precoding designs in [3] to significantly accelerate the learning process of the precoding design problem.

Following the idea of initializing PrecoderNet with the OMP solution in [3] and exploring the globally optimal HBF solution with DDPG, which improves the performance and helps ensure the convergence of our algorithm, we evaluate our approach empirically on a narrowband single-user massive MIMO mmWave HBF communication scenario. Simulation results show that the proposed method outperforms the benchmarks in both spectral efficiency (rate) and BER, and has a smaller gap to the fully digital upper bound. It is worth noting that our algorithm can also be extended to multi-user (MU) large-scale MIMO systems and to wideband scenarios.

The rest of this paper is organized as follows. Section II introduces the researched mmWave system. After the introduction of RL background (Section III), we describe the proposed algorithm in Section IV. The experimental results are given in Section V. Finally, Section VI concludes the paper and provides some discussions about the approach.

II System Model

II-A Network Model

Consider a mmWave single-cell multiuser downlink large-scale MIMO system in which the base station (BS) is equipped with transmitting antennas and independent data streams are up-converted by RF chains (), then transmitted simultaneously to serve users with receiving antennas per user. The numbers of transceiver antennas satisfy . Each data stream on the BS side is converted from digital to analog (DA) by a dedicated RF chain after being processed by a baseband digital beamforming matrix. At the user side, receiving antennas connected with RF chains () for analog-to-digital (AD) conversion decode the received signal. Due to the limited number of RF chains on both sides, fully digital beamforming, which requires one RF chain per transceiver antenna, is impractical under mmWave conditions. Instead, we consider a hybrid beamforming architecture, as shown in Fig. 1.


Fig. 1: System model

Based on the aforementioned hardware constraints, the equivalent beamforming matrix consists of one baseband digital beamforming matrix and one analog beamforming matrix connected after the RF chains, where the low-dimensional digital matrix only needs a small number of RF chains, and the high-dimensional analog matrix can be constructed with simple phase shifters (PS) to greatly reduce the hardware complexity. The analog beamforming matrix, which consists of PSs, is subject to constant modulus constraints, i.e., , or . Though the PSs can only provide limited beamforming gain, the large-scale antenna arrays compensate for this.
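To illustrate what the constant modulus constraint means for a phase-shifter network, the following minimal sketch keeps only the phase of each entry of an unconstrained matrix and fixes its magnitude. The helper name `project_to_constant_modulus`, the example matrix, and the choice of modulus 1/sqrt(N_t) (a common normalization in [3]) are assumptions made for illustration, not the authors' procedure.

```python
import cmath
import math


def project_to_constant_modulus(matrix, modulus):
    """Keep each entry's phase and force its magnitude to `modulus`."""
    return [[modulus * cmath.exp(1j * cmath.phase(entry)) for entry in row]
            for row in matrix]


num_tx_antennas = 4  # hypothetical array size
unconstrained = [[1 + 1j, -2j],
                 [0.5, -3 + 4j],
                 [1j, 1],
                 [-1, 2 - 2j]]
analog = project_to_constant_modulus(unconstrained,
                                     1 / math.sqrt(num_tx_antennas))
```

Every entry of `analog` now differs from its neighbors only in phase, which is exactly what a bank of phase shifters can realize.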


Fig. 2: Architecture of a point-to-point fully-connected HBF

In our hybrid beamforming architecture, we use a fully-connected structure between the transmitting RF chains and the transmit antennas, similar to [3]. For simplicity of presentation, we consider a point-to-point mmWave MIMO single-user HBF scenario as shown in Fig. 2. The output signal of each RF chain is propagated to the transmit antennas via phase shifters. The signals are then combined and finally transmitted by the transmitting antennas. Therefore, the transmitter needs a total of PSs. The signal received by each receiving antenna at the user side is divided into streams by a splitter and processed by the receiving analog precoding matrix ; the data streams are then combined and passed to the RF chains. The analog precoding matrix also satisfies constant modulus constraints. A total of PSs are required at the user side. After the RF chains perform down-conversion and AD conversion on the signals, the receiving digital beamforming matrix recovers the data streams for subsequent demodulation.

II-B Channel Model

According to [1], mmWave systems exhibit severe attenuation, strong penetration loss and limited scattering paths. In addition, the large-scale MIMO antenna arrays are integrated in a much smaller physical size, so the spatial correlation between antennas cannot be ignored. Therefore, we adopt the geometric Saleh-Valenzuela (S-V) channel model [17], similar to [3] and [18]. Consider a uniform linear array (ULA) with half-wavelength antenna spacing . Assume that there are scattering clusters in the environment and that each cluster contributes scattering rays. The discrete narrowband channel is given in (1):

(1)

where denotes the complex path gain of the ray in the cluster, and , are the normalized receiver and transmitter array responses, respectively, where the angles of arrival and departure are denoted as and , respectively. The array response of a ULA with antennas can be expressed as (2):

(2)
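As a concrete illustration of a normalized ULA array response, the sketch below uses the textbook steering vector with entries exp(j 2π (d/λ) n sin φ)/sqrt(N), which is the form used in [3]. Whether it matches the lost equation (2) symbol for symbol is an assumption, and the function name `ula_response` is introduced here.

```python
import cmath
import math


def ula_response(num_antennas, angle_rad, spacing_over_wavelength=0.5):
    """Normalized ULA steering vector for azimuth angle `angle_rad`."""
    phase_step = 2 * math.pi * spacing_over_wavelength * math.sin(angle_rad)
    scale = 1 / math.sqrt(num_antennas)
    return [scale * cmath.exp(1j * phase_step * n)
            for n in range(num_antennas)]


# 16-element array, 30 degrees: adjacent elements differ by a 90 degree phase.
a = ula_response(16, math.radians(30))
```

Each entry has magnitude 1/sqrt(N), so the vector has unit norm and automatically satisfies a constant modulus constraint.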

The average power of all clusters must satisfy the power constraints of the channel: , where is a constant to .
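A clustered narrowband channel of this kind can be simulated as a scaled sum of rank-one terms, one per ray. The sketch below follows the construction in [3]: unit-variance complex Gaussian path gains, cluster mean angles drawn uniformly, and Laplacian angular spread within each cluster. All of these distributional choices, the parameter values, and the function names are assumptions for illustration, since the corresponding symbols are not recoverable from the text.

```python
import numpy as np


def ula_response(num_antennas, angle_rad):
    """Half-wavelength ULA steering vector with unit norm."""
    n = np.arange(num_antennas)
    return np.exp(1j * np.pi * n * np.sin(angle_rad)) / np.sqrt(num_antennas)


def sv_channel(n_t, n_r, n_clusters, n_rays, angle_spread_rad, rng):
    """Draw one clustered narrowband channel matrix of shape (n_r, n_t)."""
    h = np.zeros((n_r, n_t), dtype=complex)
    for _ in range(n_clusters):
        # Cluster mean angles: uniform over [0, 2*pi) (assumption).
        mean_aod, mean_aoa = rng.uniform(0, 2 * np.pi, size=2)
        for _ in range(n_rays):
            # Laplacian spread around the cluster mean (assumption, as in [3]).
            aod = rng.laplace(mean_aod, angle_spread_rad / np.sqrt(2))
            aoa = rng.laplace(mean_aoa, angle_spread_rad / np.sqrt(2))
            # Unit-variance complex Gaussian path gain (equal cluster power).
            gain = (rng.standard_normal()
                    + 1j * rng.standard_normal()) / np.sqrt(2)
            h += gain * np.outer(ula_response(n_r, aoa),
                                 ula_response(n_t, aod).conj())
    return np.sqrt(n_t * n_r / (n_clusters * n_rays)) * h


rng = np.random.default_rng(0)
H = sv_channel(n_t=64, n_r=16, n_clusters=4, n_rays=5,
               angle_spread_rad=np.deg2rad(7.5), rng=rng)
```

With this scaling the expected squared Frobenius norm of `H` equals n_t * n_r, the usual channel power normalization.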

II-C Signal Model

The discrete-time transmit signal is denoted by , where represents the transmitted data streams satisfying the power constraint . The received signal at the user side can then be written as (3):

(3)

where is the additive white Gaussian noise with zero mean and covariance matrix , i.e., . When circularly symmetric complex Gaussian signals s are transmitted in the system, the spectral efficiency can be represented by (4):

(4)

where is the interference and noise covariance matrix after combination in the receiver.
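For readers who want to evaluate a rate numerically, the sketch below computes the spectral efficiency in the form used in [3], log2 det(I + (ρ/N_s) R_n^{-1} W^H H F F^H H^H W) with R_n = σ² W^H W. Here `f` and `w` stand for the overall precoder and combiner (the products of the analog and digital parts), and `rho` and `noise_var` for the transmit power and noise variance. These names and the exact form are assumptions taken from [3], not a transcription of equation (4).

```python
import numpy as np


def spectral_efficiency(h, f, w, rho, noise_var):
    """Rate in bit/s/Hz for overall precoder f and combiner w."""
    n_s = f.shape[1]
    effective = w.conj().T @ h @ f                 # N_s x N_s effective channel
    r_n = noise_var * (w.conj().T @ w)             # noise covariance after combining
    signal = (rho / n_s) * np.linalg.solve(r_n, effective @ effective.conj().T)
    _, logdet = np.linalg.slogdet(np.eye(n_s) + signal)
    return logdet / np.log(2)


# Two parallel unit-gain streams at 0 dB per-stream SNR give 2 * log2(2) bits.
rate = spectral_efficiency(np.eye(2), np.eye(2), np.eye(2),
                           rho=2.0, noise_var=1.0)
```

Using `slogdet` instead of `det` avoids overflow when the number of streams or the SNR is large.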

III Preliminaries

In this section, we briefly introduce the basic background on the MDP dynamic programming model and the DRL algorithm used in this paper, DDPG, for the reader's reference.

Markov Decision Process (MDP): An MDP involves one agent, and the interaction between the agent and the environment can be represented by a quintuple <S, ,r,,>. S represents the state space while means the action space. The reward r:SR is the feedback from the environment that measures the chosen action under the current state. is a discount factor that converts an infinite-horizon problem into one with a finite upper bound, so that the MDP can converge within finite steps. represents the policy according to which the agent selects actions, and the chosen action is .
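The role of the discount factor can be made concrete with a short sketch: with rewards bounded by r_max and a discount factor strictly below one, the discounted sum can never exceed r_max / (1 - discount). The function name `discounted_return` and the numbers below are illustrative assumptions.

```python
def discounted_return(rewards, discount):
    """Sum of rewards[k] * discount**k, accumulated from the last step back."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + discount * total
    return total


short_horizon = discounted_return([1.0, 1.0, 1.0], 0.5)   # 1 + 0.5 + 0.25
long_horizon = discounted_return([1.0] * 200, 0.5)        # approaches 1 / (1 - 0.5)
```

Even with 200 steps of unit reward, the return stays below the bound of 2, which is what keeps the value function finite.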

Deep Q-networks (DQN): DQN [12] approximates the action-value function of value-based Q-learning, (s,a)= , as a deep neural network with parameter , where is the expected return of the current state-action pair under the discount factor. The goal of DQN is to maximize the target [] of the s-a pair, and the Q-value is updated by the Bellman equation from dynamic programming. Gradient descent is then carried out after random sampling from the experience replay, and the action with the largest Q-value is selected with probability or a random action is selected with probability .
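The action selection rule described above is commonly called epsilon-greedy. A minimal sketch follows, where `epsilon` denotes the probability of a random action; the name and the example Q-values are assumptions, since the original symbols are not recoverable.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


rng = random.Random(0)
q_values = [0.1, 0.7, 0.3]
greedy_action = epsilon_greedy(q_values, epsilon=0.0, rng=rng)   # always index 1
explored = [epsilon_greedy(q_values, epsilon=1.0, rng=rng) for _ in range(50)]
```

Because the greedy step requires a maximum over a finite action set, this rule suits discrete actions, which is why a continuous-action method is needed for beamformer entries.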

Deep Deterministic Policy Gradient (DDPG): DDPG [15] is an actor-critic (AC) algorithm that uses a policy-based deterministic policy network parameterized by to generate a deterministic action . DDPG updates the learned actor policy network parameterized by with gradient descent, taking the Q-network of DQN as the critic, so that it maximizes the output Q-value.

For convenience, Table I summarizes the symbols and notation.

Action the agent chooses at time slot
State the agent reaches at time slot
Reward of the agent at time slot
Number of transmitting antennas
Number of receiving antennas
Number of transmitting RF chains
Number of receiving RF chains
Transmitting digital beamforming matrix
Transmitting analog beamforming matrix
Receiving digital beamforming matrix
Receiving analog beamforming matrix
Number of scattering clusters
Number of rays in each cluster
Power gain of ray in cluster
Angle of departure of ray in cluster
Angle of arrival of ray in cluster
Wavelength of mmWave
Antenna spacing distance
Covariance of environment AWGN n
Power covariance of cluster
Noise covariance matrix in the receiver
Parameter of neural network
TABLE I: Summary of symbols and notations

IV Algorithm

IV-A Problem Formulation

In this work, we consider a narrowband mmWave point-to-point downlink massive MIMO system as shown in Fig. 2. In such a communication system, we aim to maximize the spectral efficiency (4) by hybrid beamforming and to ensure acceptable user quality of service (QoS), measured by BER, under the aforementioned hardware constraints. Perfect instantaneous channel state information (CSI) is assumed to be known at both the transmitter and the receiver, which can be accurately estimated by the zero-forcing method [19]. Thus the HBF design problem can be written as

(5)

(5a) and (5b) are the constant modulus constraints on the transceiver analog beamforming matrices and (5c) is the total transmit power constraint. The joint optimization of the four precoders under non-convex constraints is usually difficult to solve [3], [18]. A tractable, sub-optimal but efficient method is to decouple the transmitter and receiver HBF designs and solve them in a sequential manner [3], [6], [8], [18]. The previous literature indicates that this approach can achieve near-fully-digital performance. Following this line, we further use deep reinforcement learning to search for a near-globally-optimal solution, exploiting the ability of DRL algorithms to handle nonlinear non-concave problems, and propose the so-called PrecoderNet to design the HBF by combining DRL and the MMSE criterion.

IV-B DDPG-based Transmitting Hybrid Beamformer Design

In this section, we first focus on the design of the hybrid beamforming matrix at the transmitter side. Without loss of generality, we assume an identical number of transmit and receive RF chains, i.e., , to simplify the notation. According to [3], the original problem (5) with fixed and can be converted to a Euclidean distance minimization problem as follows (6):

(6)

where is the fully digital solution as well as the right unitary matrix of the singular value decomposition (SVD). This conversion is based on the assumptions that is an approximately diagonal matrix and that the transmit SNR is high, i.e., , which results in a suboptimal design for . However, it has been found in [3] that the optimal is exactly a linear combination of the array response vectors in (2), and the design of essentially selects the best complex weighting factors of these with constant modulus.
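The selection view described above is what the OMP-based spatially sparse precoding of [3] exploits: analog columns are picked greedily from a dictionary of array response vectors, and the digital part is refit by least squares after each pick. The sketch below is a hedged rendering of that benchmark, not of PrecoderNet itself. The names `omp_hybrid_precoder`, `f_opt`, and `dictionary`, the DFT-style toy dictionary, and the guard against reselecting a column are all assumptions made for illustration.

```python
import numpy as np


def omp_hybrid_precoder(f_opt, dictionary, n_rf):
    """Greedy sparse approximation f_opt ~ f_rf @ f_bb in the spirit of [3]."""
    n_s = f_opt.shape[1]
    chosen = []
    residual = f_opt.copy()
    for _ in range(n_rf):
        # Correlate every candidate steering vector with the residual.
        energy = np.sum(np.abs(dictionary.conj().T @ residual) ** 2, axis=1)
        if chosen:
            energy[chosen] = -np.inf          # guard: never pick a column twice
        chosen.append(int(np.argmax(energy)))
        f_rf = dictionary[:, chosen]
        # Least-squares baseband precoder for the columns picked so far.
        f_bb, *_ = np.linalg.lstsq(f_rf, f_opt, rcond=None)
        residual = f_opt - f_rf @ f_bb
        norm = np.linalg.norm(residual)
        if norm > 1e-12:
            residual = residual / norm
    # Enforce the total power constraint ||f_rf f_bb||_F^2 = n_s.
    f_bb = np.sqrt(n_s) * f_bb / np.linalg.norm(f_rf @ f_bb)
    return f_rf, f_bb


# Toy dictionary: 8 orthonormal DFT steering vectors; the target uses two of them.
n = np.arange(8)
dictionary = np.exp(2j * np.pi * np.outer(n, n) / 8) / np.sqrt(8)
f_opt = dictionary[:, [2, 5]]
f_rf, f_bb = omp_hybrid_precoder(f_opt, dictionary, n_rf=2)
```

In this toy case the target is exactly sparse in the dictionary, so the product `f_rf @ f_bb` recovers `f_opt`. With a realistic channel the fit is only approximate, which is the gap the text attributes to the sparsity assumption.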

Traditional approaches to designing , such as OMP in [3] and MO in [8], both assume good sparsity of the digital precoding matrix, which is not satisfied in practical implementations. In addition, the hypothesis of an approximately infinite number , makes the dimension of go to infinity. This property inspires us to utilize the continuous-control DRL approach DDPG [15] to deal with such a high-dimensional problem. To the best of our knowledge, this is the first time that DRL is successfully applied to hybrid beamforming design.

As introduced in Section III, DDPG is composed of an actor net that generates actions and a critic net that evaluates the output of the actor. It produces continuous action values, which matches the continuity of the HBF elements. We propose a DDPG-based mmWave HBF architecture called PrecoderNet to devise the transmitting hybrid precoding matrix, each part of which has a specific role, as shown in Fig. 3.


Fig. 3: Architecture of PrecoderNet

As illustrated in Fig. 3, the beamforming agent first receives and the estimated channel H and interacts with the environment to obtain the spectral efficiency as the reward. Our agent then reshapes the complex-valued matrix into a vector and further separates the real/imaginary parts as the final input of the neural networks. The input series expressed in (7) are denoted as and abbreviated as and , respectively.

(7)

where denotes the current communication state composed of and at time slot t, and K equals . The baseband digital beamforming design strategy is based on the quality value (Q-function) of state , expressed in (8):

(8)
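The reshaping of a complex matrix into a real-valued network input, and the reverse step that turns a network output back into a matrix, can be sketched as follows. The ordering (all real parts first, then all imaginary parts) and the helper names are assumptions; the paper's equation (7) may order the entries differently.

```python
import numpy as np


def matrix_to_state(matrix):
    """Flatten a complex matrix into [real parts, imaginary parts]."""
    flat = np.asarray(matrix).reshape(-1)
    return np.concatenate([flat.real, flat.imag])


def state_to_matrix(state, shape):
    """Inverse of matrix_to_state for a matrix of the given shape."""
    state = np.asarray(state, dtype=float)
    half = state.size // 2
    return (state[:half] + 1j * state[half:]).reshape(shape)


f_bb = np.array([[1 + 2j, 3 - 1j],
                 [0.5j, -2.0]])
state = matrix_to_state(f_bb)            # length 2 * rows * cols = 8
restored = state_to_matrix(state, f_bb.shape)
```

A matrix with K complex entries therefore maps to a real vector of length 2K, which fixes the input and output layer sizes of the networks.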

The notation is the finite set of all actions a and is the discount factor that keeps the MDP a bounded, iteratively solvable problem. Taking the state as excitation, the actor net A, consisting of neural networks, outputs a vector as the selected action. This vector is reorganized into a matrix as the new state . Afterwards, the agent stores the tuple in an experience replay buffer D [12]. The critic net C evaluates the actor net by sampling a size-N minibatch of prior experience from the replay buffer D as an approximation of , and the loss function of C is given in (9) and (10).

(9)
(10)
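For reference, the critic target and loss in the DDPG paper [15] take the standard form below. The symbols (Q and Q' for the critic and its target network, μ' for the target actor, θ for network parameters, N for the minibatch size) follow [15] and are assumed here; they are not a transcription of equations (9) and (10), whose notation may differ.

```latex
y_i = r_i + \gamma\, Q'\!\left(s_{i+1},\, \mu'\!\left(s_{i+1} \mid \theta^{\mu'}\right) \,\middle|\, \theta^{Q'}\right),
\qquad
L\!\left(\theta^{Q}\right) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - Q\!\left(s_i, a_i \mid \theta^{Q}\right) \right)^{2}
```

The target y_i is computed with the slowly updated target networks, which is what stabilizes the regression of the critic.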

The policy gradients used to update C and A are given in (11) and (12). The actor net contains an evaluated network A and a target network, parameterized by and respectively, which are used to mitigate the over-fitting problem. The critic net also contains an evaluated network C and a target network C’ parameterized by and respectively, as shown in Fig. 3. We soft update all the target networks by according to [15]. The algorithm converges within a few time slots, as shown in Section V.

(11)
(12)
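The soft update in [15] moves each target parameter a small step toward its online counterpart: target ← τ · online + (1 − τ) · target. A minimal sketch on plain lists of floats is given below; the value of `tau` is a hypothetical choice, since the setting used in the paper is not recoverable from the text.

```python
def soft_update(target_params, online_params, tau):
    """Return target parameters moved a fraction tau toward the online ones."""
    return [tau * online + (1.0 - tau) * target
            for target, online in zip(target_params, online_params)]


target = [0.0, 2.0]
online = [1.0, 2.0]
target = soft_update(target, online, tau=0.1)   # first entry moves 10% of the way
```

A small `tau` makes the target networks change slowly, so the critic regresses toward a nearly stationary target.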

In this way, PrecoderNet can learn an optimal digital beamformer online and use samples from previously stored experience to update its parameters, which improves the learning efficiency while reducing the computational complexity compared with deep supervised learning [10], [13] and [16]. Finally, we can transmit the signals and then design the receiver beamforming matrix via the MMSE criterion.

IV-C MMSE-based Receiving Hybrid Beamformer Design

In the second part of this section, we solve the receiver hybrid beamforming combiner design problem, i.e., and , based on the learned and via the MMSE criterion. The received signal at the receiver antennas is and the processed received signals are shown in (3). With fixed hybrid precoders , we can minimize the mean-square-error (MSE) between the transmitted and processed signals, which can be stated as follows (13):

(13)

where represents the product of , and the analog beamformer still has to satisfy the constant modulus constraint (13a). The solution of this minimum MSE problem without the hardware constraint is well known [20] and is given in (14):

(14)
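The unconstrained linear MMSE combiner can be computed directly from the second-order statistics of the received signal. The sketch below assumes the signal model of [3], y = sqrt(ρ) H F s + n with E[s s^H] = I/N_s and E[n n^H] = σ² I, where `f` is the overall precoder. These modeling details and the function name are assumptions and may differ from the paper's equation (14) in scaling.

```python
import numpy as np


def mmse_combiner(h, f, rho, noise_var):
    """Return W such that W^H y is the linear MMSE estimate of s."""
    n_r = h.shape[0]
    n_s = f.shape[1]
    hf = h @ f
    # Covariance of the received signal, E[y y^H].
    cov_y = (rho / n_s) * (hf @ hf.conj().T) + noise_var * np.eye(n_r)
    # Cross-covariance E[s y^H]; W^H = cross @ inv(cov_y).
    cross = (np.sqrt(rho) / n_s) * hf.conj().T
    # cov_y is Hermitian, so W = inv(cov_y) @ cross^H.
    return np.linalg.solve(cov_y, cross.conj().T)


# Scalar sanity check: unit channel, unit power, unit noise gives W = 1/2.
w_scalar = mmse_combiner(np.array([[1.0 + 0j]]), np.array([[1.0 + 0j]]),
                         rho=1.0, noise_var=1.0)
```

In the hybrid setting this unconstrained solution serves as the target that the constant modulus analog combiner and the digital combiner jointly approximate.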

In the hybrid receiving beamformer design, we use the same idea as for the transmitting beamformer and convert this problem into a Euclidean distance minimization problem, as proved in [3]. The optimal analog beamformer is a linear combination of the receiver array responses, similar to , and thus the optimal can be obtained by the OMP method. Note that after the update of , PrecoderNet iterates to a new until the agent converges. In addition, if the CSI changes, our proposed algorithm can automatically learn a new optimal solution in almost no time.

V Simulation

In this section, we demonstrate the performance of the proposed algorithm by presenting simulation results compared with the benchmarks [3], [5], and give the corresponding parameter settings.

V-A Hyperparameters of PrecoderNet

In our experiments, we construct the DDPG-based PrecoderNet from four-layer feedforward neural networks, using the Adam optimizer [21] to perform gradient descent on the evaluated network and the target network. The size of the input layer is and the output layer has neurons. There are two hidden layers with 400 and 300 neurons, in that order. Each of the first three layers is followed by a ReLU activation, while the output layer uses the tanh function to provide a decent gradient [22]. The learning rate is 1e-4 and the discount factor is set empirically. We set to soft update the target network. As in DDPG, the additive exploration noise is Gaussian noise which obeys .
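The forward pass of an actor with the layer widths stated above (400 and 300 hidden units, ReLU activations, tanh output) can be sketched in a few lines. The input and output dimensions, the random initialization, and the function names are hypothetical, since the exact sizes are not recoverable from the text. Training with Adam is not shown.

```python
import numpy as np


def init_actor(state_dim, action_dim, rng, hidden=(400, 300)):
    """Random weights and zero biases for a small fully-connected actor."""
    sizes = (state_dim, *hidden, action_dim)
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def actor_forward(params, state):
    """ReLU on the hidden layers, tanh on the output layer."""
    x = np.asarray(state, dtype=float)
    for weights, bias in params[:-1]:
        x = np.maximum(x @ weights + bias, 0.0)
    weights, bias = params[-1]
    return np.tanh(x @ weights + bias)


rng = np.random.default_rng(0)
params = init_actor(state_dim=72, action_dim=72, rng=rng)  # hypothetical sizes
action = actor_forward(params, rng.standard_normal(72))
```

The tanh output bounds every action component to [-1, 1], so the real and imaginary parts of the beamformer entries need a separate scaling step to meet the power constraint.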

V-B Results

Consider a narrowband¹ mmWave massive MIMO point-to-point hybrid beamforming system consisting of a BS with transmitting antennas and a user with receiving antennas. Without loss of generality, we set , representing that there are six data streams to be sent, and the number of RF chains at both transmitter/receiver sides . The environment noise obeys a complex Gaussian distribution with zero mean and covariance , i.e., . The spread angles of the transmitter and receiver in the azimuth domain are equal, i.e., . Assume the scattering cluster number and all clusters have equal power, i.e., , while the ray number of each cluster in the limited-scattering mmWave environment. We first compare the performance of the proposed PrecoderNet with the traditional hybrid beamforming algorithm [3] and the fully digital beamforming algorithm [5]; the theoretical upper bound, in which signals are sent via the eigenmodes of the channel, is also provided. The signal-to-noise ratio on the horizontal axis is given as SNR.

¹ We remark that our proposed algorithm is model-free because it directly evaluates the value function, and thus can be easily extended to wideband scenarios regardless of the concrete environment model, which is more general in mmWave systems.


Fig. 4: Spectral efficiency of HBF for 64×16 mmWave system

The spectral efficiency in a 64×16 system with uniform linear arrays (ULA) at both the BS and user sides is shown in Fig. 4. Our proposed algorithm obtains a higher rate than the optimal unconstrained MMSE-based fully digital beamformer [8] and the OMP-based hybrid beamformer [3]. In the low SNR region ranging from -15 dB to -10 dB, the rate our method achieves is slightly lower than that of MMSE but higher than that of SSP-OMP. When the SNR is larger than 10 dB, we obtain the best spectral efficiency. In addition, the results of PrecoderNet are much closer to the upper bound than those of the compared algorithms.


Fig. 5: BER of HBF for 64×16 mmWave system
(15)

We further compare the bit error rate (BER) achieved after processing by the receiver hybrid precoders of the above three algorithms, as shown in Fig. 5. The BER is defined as the ratio of the number of erroneously demodulated signals to the total number of transmitted signals, as expressed in (15). We transmit symbols per data stream and use quadrature phase-shift keying (QPSK) to modulate the data into four constellation points in the Cartesian plane. With the additive white Gaussian noise , the received signals at the user side are preprocessed by and then demodulated according to the maximum likelihood criterion [23]. Simulation results indicate that our proposed approach achieves the best BER performance. For example, at the same SNR = -5 dB, our algorithm obtains a BER of while the benchmarks are both about .
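The BER measurement described above can be sketched for a single stream as follows. For equiprobable QPSK symbols in Gaussian noise, maximum likelihood detection reduces to choosing the nearest constellation point, which for the mapping below is a sign decision on the real and imaginary parts. The Gray mapping, the noise level, and the function names are assumptions for illustration.

```python
import math
import random


def qpsk_modulate(bits):
    """Map bit pairs to unit-energy QPSK points (bit 0 -> +, bit 1 -> -)."""
    scale = 1 / math.sqrt(2)
    return [complex(1 - 2 * bits[i], 1 - 2 * bits[i + 1]) * scale
            for i in range(0, len(bits), 2)]


def qpsk_demodulate(symbols):
    """Nearest-point (maximum likelihood) decisions via the component signs."""
    bits = []
    for symbol in symbols:
        bits.append(0 if symbol.real >= 0 else 1)
        bits.append(0 if symbol.imag >= 0 else 1)
    return bits


def bit_error_rate(sent_bits, received_bits):
    """Fraction of positions where the received bit differs from the sent bit."""
    errors = sum(a != b for a, b in zip(sent_bits, received_bits))
    return errors / len(sent_bits)


rng = random.Random(0)
sent = [rng.randint(0, 1) for _ in range(2000)]
noisy = [s + complex(rng.gauss(0, 0.5), rng.gauss(0, 0.5))
         for s in qpsk_modulate(sent)]
ber = bit_error_rate(sent, qpsk_demodulate(noisy))
```

In the full system the symbols would first pass through the precoder, the channel, and the combiner; only the final demodulation and counting step is shown here.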


Fig. 6: Spectral efficiency of HBF for 128×32 mmWave system

Fig. 7: BER of HBF for 128×32 mmWave system

We then extend these methods to a 128×32 system and observe the same performance indicators as above. From Fig. 6, we can see that PrecoderNet always achieves better spectral efficiency in both the low and high SNR regions compared with the other two algorithms. We also examine the BER of these three approaches in this scenario with larger antenna arrays, as shown in Fig. 7, with the same settings as in Fig. 5. The results show that our method obtains a better BER, and when SNR > 15 dB, the BER of PrecoderNet is nearly zero, which dramatically outperforms the benchmarks.

Finally, the application complexity was analyzed by comparing the time consumed by the proposed algorithm with that of the benchmarks, as summarized in Table II.

Algorithm      Time (averaged over 2000 episodes)
PrecoderNet    0.0023934 s (2.3934 ms)
SSP-OMP        0.254 s (254 ms)
MMSE           0.466 s (466 ms)
TABLE II: Run time comparison of three approaches

The above times are averaged over the results of 2000 simulations. The time consumption shows that a trained PrecoderNet computes a usable digital precoding matrix in roughly one hundredth of the time required by the other algorithms (2.39 ms versus 254 ms and 466 ms), which makes it more energy efficient and well suited to mmWave systems.

VI Conclusion

In this paper, we focus on the hybrid beamforming design problem for mmWave massive MIMO systems and propose a novel HBF design algorithm called PrecoderNet, which uses DRL at the transmitter side and the MMSE criterion at the receiver side. The system spectral efficiency and BER are used to demonstrate the performance of the proposed algorithm. Numerical results reveal that the proposed algorithm outperforms the benchmarks in the high SNR region and is closer to the upper bound as the SNR increases. Moreover, the spectral efficiency gain over the benchmarks becomes more pronounced in the large SNR regime. As for system reliability, by using PrecoderNet, the BER of the entire system can be decreased nearly to zero as the SNR increases, which confirms that deep reinforcement learning is a promising approach to the (hybrid) beamforming design problem.

References

  • [1] Y. Niu, Y. Li, D. Jin, L. Su, and A. V. Vasilakos, “A survey of millimeter wave communications (mmWave) for 5G: opportunities and challenges,” Wireless Networks, vol. 21, no. 8, pp. 2657–2676, 2015.
  • [2] W. Hong, K.-H. Baek, Y. Lee, Y. Kim, and S.-T. Ko, “Study and prototyping of practically large-scale mmWave antenna systems for 5G cellular devices,” IEEE Communications Magazine, vol. 52, no. 9, pp. 63–69, 2014.
  • [3] O. El Ayach, S. Rajagopal, S. Abu-Surra, Z. Pi, and R. W. Heath, “Spatially sparse precoding in millimeter wave MIMO systems,” IEEE Transactions on Wireless Communications, vol. 13, no. 3, pp. 1499–1513, 2014.
  • [4] X. Gao, L. Dai, S. Han, I. Chih-Lin, and R. W. Heath, “Energy-efficient hybrid analog and digital precoding for mmWave MIMO systems with large antenna arrays,” IEEE Journal on Selected Areas in Communications, vol. 34, no. 4, pp. 998–1009, 2016.
  • [5] H. Sampath, P. Stoica, and A. Paulraj, “Generalized linear precoder and decoder design for MIMO channels using the weighted MMSE criterion,” IEEE Transactions on Communications, vol. 49, no. 12, pp. 2198–2206, 2001.
  • [6] F. Sohrabi and W. Yu, “Hybrid digital and analog beamforming design for large-scale antenna arrays,” IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 3, pp. 501–513, 2016.
  • [7] D. H. Nguyen, L. B. Le, and T. Le-Ngoc, “Hybrid MMSE precoding for mmWave multiuser MIMO systems,” in 2016 IEEE International Conference on Communications (ICC), pp. 1–6, IEEE, 2016.
  • [8] X. Yu, J.-C. Shen, J. Zhang, and K. B. Letaief, “Alternating minimization algorithms for hybrid precoding in millimeter wave MIMO systems,” IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 3, pp. 485–500, 2016.
  • [9] W. Xia, G. Zheng, Y. Zhu, J. Zhang, J. Wang, and A. P. Petropulu, “A deep learning framework for optimization of MISO downlink beamforming,” arXiv preprint arXiv:1901.00354, 2019.
  • [10] A. Alkhateeb, S. Alex, P. Varkey, Y. Li, Q. Qu, and D. Tujkovic, “Deep learning coordinated beamforming for highly-mobile millimeter wave systems,” IEEE Access, vol. 6, pp. 37328–37348, 2018.
  • [11] H. Huang, Y. Song, J. Yang, G. Gui, and F. Adachi, “Deep-learning-based millimeter-wave massive MIMO for hybrid precoding,” IEEE Transactions on Vehicular Technology, vol. 68, no. 3, pp. 3027–3032, 2019.
  • [12] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, p. 529, 2015.
  • [13] S. Gu, E. Holly, T. Lillicrap, and S. Levine, “Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates,” in 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 3389–3396, IEEE, 2017.
  • [14] D. S. Bernstein, R. Givan, N. Immerman, and S. Zilberstein, “The complexity of decentralized control of Markov decision processes,” Mathematics of Operations Research, vol. 27, no. 4, pp. 819–840, 2002.
  • [15] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” arXiv preprint arXiv:1509.02971, 2015.
  • [16] X. Gao, S. Jin, C.-K. Wen, and G. Y. Li, “ComNet: Combination of deep learning and expert knowledge in OFDM receivers,” IEEE Communications Letters, vol. 22, no. 12, pp. 2627–2630, 2018.
  • [17] V. Raghavan and A. M. Sayeed, “Sublinear capacity scaling laws for sparse MIMO channels,” IEEE Transactions on Information Theory, vol. 57, no. 1, pp. 345–364, 2010.
  • [18] T. Lin, J. Cong, Y. Zhu, J. Zhang, and K. B. Letaief, “Hybrid beamforming for millimeter wave systems using the MMSE criterion,” IEEE Transactions on Communications, vol. 67, no. 5, pp. 3693–3708, 2019.
  • [19] Q. H. Spencer, A. L. Swindlehurst, and M. Haardt, “Zero-forcing methods for downlink spatial multiplexing in multiuser MIMO channels,” IEEE Transactions on Signal Processing, vol. 52, no. 2, pp. 461–471, 2004.
  • [20] T. Kailath, A. H. Sayed, and B. Hassibi, Linear Estimation. Prentice Hall, 2000.
  • [21] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  • [22] H. V. Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double Q-learning,” Computer Science, 2015.
  • [23] D. J. Zwickl, Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. PhD thesis, 2006.