I Introduction
Millimeter wave (mmWave) communication has attracted considerable interest in the last few years, thanks to the high data rates enabled by its large available bandwidth [1, 2, 3]. This makes mmWave a key technology for next-generation wireless systems [4, 5, 6, 7]. Most of the prior research has focused on developing beamforming strategies [8, 9, 10], evaluating the performance [11, 12, 13], or studying the practical feasibility of mmWave communication in fixed or low-mobility wireless systems [14, 15, 16]. But can mmWave also support highly-mobile yet data-hungry applications, such as vehicular communications or wireless augmented/virtual reality (AR/VR)? Enabling these applications faces several critical challenges: (i) the sensitivity of mmWave signal propagation to blockages and the large signal-to-noise ratio (SNR) differences between line-of-sight (LOS) and non-LOS (NLOS) links severely impact the reliability of mobile systems, (ii) with mobility, and in dense deployments, the user needs to frequently hand over from one base station (BS) to another, which imposes control overhead and introduces latency, and (iii) with large arrays, adjusting the beamforming vectors requires large training overhead, which imposes a fundamental limit on supporting mobile users. In this paper, we develop a novel solution based on coordinated beamforming, leveraging tools from machine learning, to jointly address all these challenges and enable highly-mobile mmWave systems.
I-A Prior Work
Coordinating the transmission between multiple BSs to simultaneously serve the same user is one main solution for enhancing the coverage and overcoming the frequent handover problems in dense mmWave systems [17, 18, 19]. In [17], extensive measurements for 73 GHz coordinated multipoint transmission were conducted in an urban open-square scenario in downtown Brooklyn. The measurements showed that serving a user simultaneously by a number of BSs achieves significant coverage improvement. Analyzing the network coverage of coordinated mmWave beamforming was also addressed in prior work [18, 19], mainly using tools from stochastic geometry. In [18], the performance of heterogeneous mmWave cellular networks was analyzed to show that a considerable coverage gain can be achieved using base station cooperation, where the user is simultaneously served by multiple BSs. In [19], a setup where the user is only connected to LOS BSs was considered and the probability of having at least one LOS BS was analyzed. The results showed that the density of coordinating BSs should scale with the square of the blockage density to maintain the same LOS connectivity. While [17, 18, 19] demonstrated the significant coverage gain of BS coordination, they did not investigate how to construct these coordinated beamforming vectors, which is normally associated with high coordination overhead. This paper, therefore, targets developing low-complexity mmWave coordination strategies that harness the coordination coverage and latency gains but with low coordination overhead.

The other major challenge with highly-mobile mmWave systems is the huge training overhead associated with adjusting large array beamforming vectors. Developing beamforming/channel estimation solutions to reduce this training overhead has attracted considerable research interest in the last few years [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33]. This prior research has mainly focused on three directions: (i) beam training [20, 21, 22, 23], (ii) compressive channel estimation [24, 25, 26, 27, 28], and (iii) location-aided beamforming [29, 30, 31, 32, 33]. In beam training, the candidate beams at the transmitter and receiver are directly trained using exhaustive or adaptive search to select the ones that optimize the metric of interest, e.g., SNR. Beam training, though, requires large overhead to train all the possible beams and is mainly suitable for single-user and single-stream transmissions [20, 21, 22, 23]. In order to enable spatial multiplexing in mmWave systems, [24, 25, 26, 27, 28] proposed to leverage the sparsity of mmWave channels and formulated the mmWave channel estimation problem as a sparse reconstruction problem. Compressive sensing tools were then used to efficiently estimate the parameters (angles of arrival/departure, path gains, etc.) of the sparse channel. While compressive channel estimation techniques can generally reduce the training overhead compared to exhaustive search solutions, they still require relatively large training overhead that scales with the number of antennas. Further, compressive channel estimation techniques normally make hard assumptions on the exact sparsity of the channel and the quantization of the angles of arrival/departure, which leaves their practical feasibility uncertain.

To further reduce the training overhead, and given the directive nature of mmWave beamforming, out-of-band information such as the locations of the transmitter and receiver can be leveraged to reduce the beamforming training overhead [29, 30, 31, 32, 33]. In [29], the transmitter/receiver location information was exploited to guide the sensing matrix design used in the compressive estimation of the channel.
Position information was also leveraged in [30, 31] to build the beamforming vectors in LOS mmWave backhaul and vehicular systems. In [32, 33], the BSs serving vehicular systems build a database relating the vehicle position and the beam training result. This database is then leveraged to reduce the training overhead with the knowledge of the vehicle location. While the solutions in [29, 30, 31, 32, 33] showed that the position information can reduce the training overhead, relying only on the location information to design the beamforming vectors has several limitations. First, position-acquisition sensors, such as GPS, have limited accuracy, normally on the order of meters, which may not work efficiently with narrow-beam systems. Second, GPS sensors do not work well inside buildings, which makes these solutions incapable of supporting indoor applications. Further, the beamforming vectors are not merely a function of the transmitter/receiver location but also of the environment geometry, blockages, etc. This makes location-based beamforming solutions mainly suitable for LOS environments, as the same location in an NLOS environment may correspond to different beamforming vectors depending, for example, on the position of the obstacles.
I-B Contribution
In this paper, we propose a novel integrated communication and machine learning solution for highly-mobile mmWave applications. Our proposed solution considers a coordinated beamforming system where a set of BSs simultaneously serve one mobile user. For this system, a deep learning model learns how to predict the BS beamforming vectors directly from the signals received at the distributed BSs using only omni or quasi-omni beam patterns. This is motivated by the intuition that the signals jointly received at the distributed BSs draw a defining multipath signature not only of the user location, but also of its surrounding environment. This proposed solution has multiple gains. First, making the beamforming prediction based on the uplink received signals, and not on position information, enables the developed strategy to support both LOS and NLOS scenarios and waives the requirement for special position-acquisition sensors. Second, the prediction of the optimal beams requires only omni-received pilots, which can be captured with negligible training overhead. Further, the deep learning model in the proposed system operation does not require any training before deployment, as it learns and adapts to any environment. Finally, since the proposed deep learning model is integrated with the coordinated beamforming system, it inherits the coverage and reliability gains of coordination. More specifically, the contributions of this paper can be summarized as follows.

We propose a low-complexity coordinated beamforming system in which a number of BSs adopting RF beamforming, linked to a central cloud processor applying baseband processing, simultaneously serve a mobile user. For this system, we formulate the training and design problem of the central baseband and BS RF beamforming vectors to maximize the system effective achievable rate. The effective rate is a metric that accounts for the tradeoff between the beamforming training overhead and the achievable rate with the designed beamforming vectors, which makes it a suitable metric for highly-mobile mmWave systems.

We develop a baseline coordinated beamforming strategy for the adopted system, which depends on uplink training in designing the RF and baseband beamforming vectors. With this baseline solution, the BSs first select their RF beamforming vectors from a predefined codebook. Then, a central processor designs its baseband beamforming to ensure coherent combining at the user. We prove that in some special yet important cases, the baseline beamforming strategy obtains optimal achievable rates. This solution, though, requires high training overhead, which motivates the integration with machine learning models.

We propose a novel integrated deep learning and coordinated beamforming solution, and develop its system operation and machine learning modeling. The key idea of the proposed solution is to leverage the signals received at the coordinating BSs with only omni or quasi-omni patterns, i.e., with negligible training overhead, to predict their RF beamforming vectors. Further, the developed solution enables harvesting the wide-coverage and low-latency coordinated beamforming gains with low coordination overhead, rendering it a promising enabling solution for highly-mobile mmWave applications.

Extensive simulations were performed to evaluate the performance of the developed solution and the impact of the key system and machine learning parameters. In both LOS and NLOS scenarios, the results show that the effective achievable rate of the developed solution approaches that of the genie-aided coordinated beamforming, which knows the optimal beamforming vectors with no training overhead. Compared to the baseline solution, deep-learning coordinated beamforming achieves a noticeable gain, especially when users are moving at high speed and when the BSs deploy large antenna arrays. The results also confirm the ability of the proposed deep-learning-based beamforming to learn and adapt to time-varying environments, which is important for the system robustness. Further, the results show that learning coordinated beamforming may not require phase synchronization among the coordinating BSs, which is especially important for practical implementations. All of this highlights the capability of the proposed deep-learning solution in efficiently supporting highly-mobile applications in large-array mmWave systems.
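Notation: We use the following notation throughout this paper: $\mathbf{A}$ is a matrix, $\mathbf{a}$ is a vector, $a$ is a scalar, and $\mathcal{A}$ is a set. $|\mathbf{A}|$ is the determinant of $\mathbf{A}$, whereas $\mathbf{A}^T$, $\mathbf{A}^H$, $\mathbf{A}^*$ are its transpose, Hermitian (conjugate transpose), and conjugate respectively. $\mathrm{diag}(\mathbf{a})$ is a diagonal matrix with the entries of $\mathbf{a}$ on its diagonal, and $\mathrm{blkdiag}(\mathbf{A}_1, \ldots, \mathbf{A}_N)$ is a block diagonal matrix with the matrices $\mathbf{A}_1, \ldots, \mathbf{A}_N$ on the diagonal. $\mathbf{I}$ is the identity matrix and $\mathcal{N}_{\mathbb{C}}(\mathbf{m}, \mathbf{R})$ is a complex Gaussian random vector with mean $\mathbf{m}$ and covariance $\mathbf{R}$.

II System and Channel Models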
In this section, we describe the adopted frequency-selective coordinated mmWave system and channel models. The key assumptions made for each model are also highlighted.
II-A System Model
Consider the mmWave communication system in Fig. 1, where base stations (BSs) or access points (APs) are simultaneously serving one mobile station (MS). Each BS is equipped with antennas and all the BSs are connected to a centralized/cloud processing unit. For simplicity, we assume that every BS has only one RF chain and applies analog-only beamforming using networks of phase shifters [1]. Extensions to more sophisticated mmWave precoding architectures at the BSs, such as hybrid precoding [8, 9], are also interesting for future research. In this paper, we assume that the mobile user has a single antenna. The developed algorithms and solutions, though, can be extended to multi-antenna users.
In the downlink transmission, the data symbol at subcarrier is first precoded using the digital precoder at the central/cloud processing unit. The resulting symbols are transformed to the time domain using K-point IFFTs. A cyclic prefix of length is then added to the symbol blocks before sending them to the BSs using error-negligible and delay-negligible wired channels, e.g., optical fiber cables. Every BS applies a time-domain analog beamforming and transmits the resulting signal. The discrete-time transmitted complex baseband signal from the th BS at the th subcarrier can then be written as
(1) 
where the transmitted signal on the th subcarrier is normalized such that , with the average total transmit power. Since the RF beamforming is assumed to be implemented using networks of quantized phase shifters, the entries of are modeled as , where is a quantized angle. Adopting a persubcarrier transmit power constraint and defining , the cloud baseband precoder and the BSs RF beamformers satisfy
(2) 
At the user, assuming perfect frequency and carrier offset synchronization, the received signal is transformed to the frequency domain using a K-point FFT. Denoting the channel vector between the user and the th BS at the th subcarrier as , the received signal at subcarrier after processing can be expressed as
(3) 
where is the receive noise at subcarrier .
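As a rough numerical illustration of the downlink model above, the following sketch builds one constant-modulus RF beamformer per BS, a unit-norm cloud precoder per subcarrier, and the resulting received signal. The symbol names (`N` BSs, `M` antennas per BS, `K` subcarriers), the dimensions, the random channels, and the 3-bit phase quantization are all assumptions made here for illustration; they are not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 4, 16, 8            # hypothetical: BSs, antennas per BS, subcarriers
P_total, sigma2 = 1.0, 1e-3   # hypothetical transmit power and noise variance


def quantized_rf_beam(num_antennas, bits, generator):
    """Constant-modulus beam with phases drawn from a 2**bits-level grid."""
    levels = 2 ** bits
    phases = 2 * np.pi * generator.integers(0, levels, size=num_antennas) / levels
    return np.exp(1j * phases) / np.sqrt(num_antennas)


# One analog beamformer per BS (single RF chain each), shape (N, M).
f_rf = np.stack([quantized_rf_beam(M, 3, rng) for _ in range(N)])

# Random frequency-domain channels h[k, n] in C^M, for illustration only.
h = (rng.standard_normal((K, N, M)) + 1j * rng.standard_normal((K, N, M))) / np.sqrt(2)

# Effective scalar channel per BS after RF beamforming: h_{k,n}^T f_n.
h_eff = np.einsum("knm,nm->kn", h, f_rf)  # (K, N)

# Cloud precoder per subcarrier, normalized to unit norm.
f_cp = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))
f_cp /= np.linalg.norm(f_cp, axis=1, keepdims=True)

# Data symbols with |s_k|^2 = P/K, and receiver noise.
s = np.sqrt(P_total / K) * np.exp(2j * np.pi * rng.random(K))
v = np.sqrt(sigma2 / 2) * (rng.standard_normal(K) + 1j * rng.standard_normal(K))

# Received signal per subcarrier: sum over BSs of h^T f_rf f_cp s, plus noise.
y = np.sum(h_eff * f_cp, axis=1) * s + v

# Power check with the block-diagonal RF matrix blkdiag(f_1, ..., f_N).
F_rf = np.zeros((N * M, N), dtype=complex)
for n in range(N):
    F_rf[n * M:(n + 1) * M, n] = f_rf[n]
power = np.linalg.norm(F_rf @ f_cp.T, axis=0) ** 2  # one value per subcarrier
```

Because every RF beamformer has unit norm and the RF matrix is block diagonal, the per-subcarrier power of the combined precoder equals the squared norm of the cloud precoder, which is why normalizing `f_cp` alone is enough in this sketch.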
II-B Channel Model
We adopt a geometric wideband mmWave channel model [3, 34, 7, 35] with clusters. Each cluster is assumed to contribute with one ray that has a time delay , and azimuth/elevation angles of arrival (AoA) . Further, let denote the path loss between the user and the th BS, and represent a pulse shaping function for -spaced signaling evaluated at seconds [27]. With this model, the delay-d channel vector between the user and the th BS, , follows
(4) 
where is the array response vector of the th BS at the AoA . Given the delay-d channel in (4), the frequency domain channel vector at subcarrier , , can be written as
(5) 
Considering a block-fading channel model, the channel vectors are assumed to stay constant over the channel coherence time, which depends on the user mobility and the channel multipath components [36]. In the next section, we will develop the problem formulation and discuss this channel coherence time in more detail.
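To make the two steps of the channel model concrete, the sketch below first sums the per-path contributions into delay-domain channel vectors and then maps them to subcarriers with a discrete Fourier transform over the delay taps. Several choices are assumptions made only for this example: a half-wavelength uniform linear array with a single angle per path (the paper uses azimuth/elevation angles), a sinc pulse as the pulse-shaping function, the square-root scaling by the number of antennas over the path loss, and all function and parameter names.

```python
import numpy as np


def ula_response(num_antennas, theta):
    """Array response of a half-wavelength ULA (assumed array geometry)."""
    return np.exp(1j * np.pi * np.arange(num_antennas) * np.sin(theta)) / np.sqrt(num_antennas)


def delay_d_channel(M, D, T_s, gains, delays, angles, pathloss):
    """Delay-domain channel vectors h_d for d = 0..D-1, shape (D, M)."""
    h = np.zeros((D, M), dtype=complex)
    for alpha, tau, theta in zip(gains, delays, angles):
        a = ula_response(M, theta)
        for d in range(D):
            # sinc pulse used as a stand-in for the pulse-shaping function
            h[d] += alpha * np.sinc((d * T_s - tau) / T_s) * a
    return np.sqrt(M / pathloss) * h


def frequency_channel(h_delay, K):
    """Subcarrier channels h_k = sum_d h_d exp(-j 2 pi k d / K), shape (K, M)."""
    D = h_delay.shape[0]
    k = np.arange(K)[:, None]
    d = np.arange(D)[None, :]
    dft = np.exp(-2j * np.pi * k * d / K)  # (K, D)
    return dft @ h_delay


# Two hypothetical paths: complex gains, delays in units of T_s, angles in radians.
h_d = delay_d_channel(M=8, D=4, T_s=1.0, gains=[1.0, 0.5j],
                      delays=[0.0, 1.3], angles=[0.2, -0.7], pathloss=1.0)
h_k = frequency_channel(h_d, K=16)
```

With a single path whose delay falls exactly on a sampling instant, only one delay tap is nonzero and the resulting channel is identical on every subcarrier, which is a quick sanity check on the transform.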
III Problem Formulation
The main goal of the proposed coordinated mmWave beamforming system is to enable wireless applications with high mobility and high data rate requirements, and with strict constraints on the coverage, reliability, and latency. Thanks to simultaneously serving the user from multiple BSs, the coordinated beamforming system in Section II provides transmission diversity and robustness against blockage, which directly enhances the system coverage, reliability, and latency. The main challenge with this system, however, is achieving the high data rate requirements, as the time overhead of training and designing the cloud baseband and terminal RF beamforming vectors can be very large, especially for highly-mobile users. With this motivation, this paper focuses on developing efficient channel training and beamforming design strategies that maximize the system effective achievable rate and enable highly-mobile mmWave applications. Next, we formulate the effective achievable rate optimization problem.
Achievable Rate: Given the system and channel models in Section II, and employing the cloud and RF beamformers , , the user achievable rate is expressed as
(6) 
where denotes the signaltonoise ratio.
Due to the constraints on the RF hardware, such as the availability of only quantized angles, , for the RF phase shifters, the BSs RF beamforming vectors , can take only certain values [8, 20, 37, 24]. Therefore, we assume that the RF beamforming vectors are selected from finite-size codebooks, which we formally state in the following assumption.
Assumption 1
The BSs RF beamforming vectors are subject to the quantized codebook constraint, , where the cardinality of is .
The optimal cloud baseband and terminals RF beamforming vectors that maximize the system achievable rate can then be found by solving
(7)  
(8)  
(9) 
which is addressed in the next lemma.
Lemma 1
Proof: The proof is straightforward, and follows from the maximum ratio transmit solution by noting that the power constraint can be reduced to , given the block diagonal structure of the RF precoding matrix .
Effective Achievable Rate: The optimal achievable rate , given by Lemma 1, assumes perfect channel knowledge at the cloud processing unit and RF terminals. Obtaining this channel knowledge, however, is very challenging and requires large training overhead in mmWave systems with RF architectures. This is mainly due to (i) the large number of antennas at the BSs, and (ii) the RF filtering of the channel seen at the baseband [9]. To accurately evaluate the actual rate experienced by the mobile user, it is important to incorporate the impact of this time overhead required for the channel training and beamforming design. For that, we adopt the effective achievable rate metric, which we define shortly.
The formulation of the effective achievable rate requires understanding how often the beamforming vectors need to be redesigned as the user moves. This can be captured by one of two metrics: (i) the channel coherence time, which is the time over which the multipath channel remains almost constant, and (ii) the channel beam coherence time, which is a recent concept introduced for mmWave systems to represent the average time over which the beams stay aligned [36]. While the channel coherence time is normally shorter than the beam coherence time, it was shown in [36] that updating the beams every beam coherence time incurs negligible receive power loss compared to updating them every channel coherence time. Adopting this model, we make the following assumption on the system operation.
Assumption 2
The cloud baseband and terminal RF beamforming vectors are assumed to be retrained and redesigned every beam coherence time, , such that the first time of every beam coherence time is allocated for the channel training and beamforming design, and the rest of it is used for the data transmission using the designed beamforming vectors.
Now, we define the effective achievable rate, , as the achievable rate using certain precoders, , times the percentage of time these precoders are used for data transmission, i.e.,
(12) 
The effective achievable rate in (12) captures the impact of user mobility on the actually experienced data rate. For example, with higher mobility, the beam coherence time decreases, which results in lower data rate for the same beamforming vectors and beam training overhead. The objective of this paper is then to develop efficient channel training and beamforming design strategies that maximize the system effective achievable rate. If represents a certain channel training/beamforming design strategy that requires training overhead to design the cloud and RF beamforming vectors , the final problem formulation can then be written as
(13)  
(14)  
(15) 
Solving the problem in (13)-(15) means developing solutions that require very low channel training overhead to realize beamforming vectors that maximize the system achievable rate, . It is worth noting also that represents an ultimate upper bound for the effective achievable rate with and .
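The verbal definition of the effective achievable rate, namely the achievable rate scaled by the fraction of each beam coherence time that is left for data transmission, can be sketched as a one-line function. The function and argument names below are hypothetical, and the numbers in the example are illustrative rather than taken from the paper.

```python
def effective_rate(rate, t_train, t_beam):
    """Achievable rate scaled by the share of the beam coherence time used for data.

    rate    : achievable rate with the designed beamformers (e.g., bps/Hz)
    t_train : time spent on channel training and beamforming design
    t_beam  : beam coherence time (same unit as t_train)
    """
    if not 0 <= t_train <= t_beam:
        raise ValueError("training time must lie between 0 and the beam coherence time")
    return (1 - t_train / t_beam) * rate


# Illustrative numbers: 10 bps/Hz, 2 ms of training out of a 20 ms beam coherence time.
example = effective_rate(10.0, 2e-3, 20e-3)  # roughly 9 bps/Hz
```

Setting the training time to zero returns the achievable rate itself, which corresponds to the genie-aided upper bound mentioned above.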
In the literature, two main directions to address this mmWave channel estimation/beamforming design problem are compressed sensing and beam training. In compressed sensing, the sparsity of mmWave channels is leveraged and random beams are employed to estimate the multipath channel parameters, such as the angles of arrival and path gains [24, 38, 25, 27, 26]. The estimated channel can then be used to construct the beamforming vectors. The other approach is to directly train the RF beamforming vectors through exhaustive or hierarchical search to find the best beams [20, 21, 7]. Each of the two directions has its own advantages and limitations. Both of them, though, require large training overhead, which makes them inefficient in handling highly-mobile mmWave applications. In this paper, we show that integrating machine learning tools with typical mmWave beam training solutions can yield efficient channel training/beamforming design strategies that have very low training overhead and near-optimal achievable rates, which enables highly-mobile mmWave systems.
In the next sections, we present a baseline coordinated mmWave beamforming solution based on conventional beam training techniques. Then, we show in Section V how machine learning models can be integrated with the proposed baseline solution, leading to novel techniques with near-optimal effective achievable rates for mmWave systems.
IV Baseline Coordinated Beamforming
In this section, we present a baseline solution for the channel training/beamforming design problem in (13)-(15) based on conventional communication system tools. The proposed solution has low beamforming design complexity and enables the integration with the machine learning model in Section V. In the following subsections, we present the baseline solution and evaluate its achievable rate performance and mobility support.
IV-A Proposed Solution
As shown in Section III, for a given set of RF beamforming vectors , the cloud baseband beamformers can be written optimally as a function of the effective channel . This implies that the cloud baseband and terminal RF beamforming design problem is separable and can be solved in two stages for the RF and baseband beamformers. To find the optimal RF beamforming vectors, though, an exhaustive search over all possible BS beamforming combinations is needed, as indicated in (11). This yields high computational complexity, especially for large antenna systems with large codebook sizes. To obtain a low-complexity solution, we propose the following system operation.
Uplink Simultaneous Beam Training: In this stage, the user transmits repeated pilot sequences of the form to the BSs. During this training time, every BS switches between its RF beamforming vectors such that it combines every received pilot sequence with a different RF beamforming vector. Let denote the th beamforming codeword in , then the combined received signal at the th BS for the th training sequence can be expressed as
(16) 
where is the receive noise vector at the th BS and th subcarrier.
The combined signals for all the beamforming codewords are then fed back from all the BSs/terminals to the cloud processor, which calculates the received power using every RF beamforming vector and selects the BSs downlink RF beamforming vectors separately for every BS, according to
(17) 
Note that selecting the RF beamforming vectors disjointly for the different BSs avoids the combinatorial optimization complexity of the exhaustive search and enables the integration with the machine learning model, as will be discussed in Section V. Further, this disjoint optimization can be shown to yield optimal achievable rate in some important special cases for mmWave systems, which will be discussed in the next subsection. Once the RF beamforming vectors are selected, the cloud baseband beamforming vectors are constructed according to (10).

Downlink Coordinated Data Transmission: The designed cloud and RF beamforming vectors are employed for the downlink data transmission to achieve the coverage, reliability, and latency gains of the coordinated beamforming transmission. With the proposed baseline solution for the channel training/beamforming design, and denoting the beam training pilot sequence time as , the effective achievable rate, , can be characterized as
(18) 
where the RF beamforming vectors , are given by (17).
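The two-stage baseline described above (disjoint per-BS beam selection, followed by a maximum-ratio baseband precoder at the cloud) can be sketched as follows. This is a simplified illustration under stated assumptions: the channels are used directly instead of noisy combined pilots, the codebook is a beamsteering codebook for a half-wavelength uniform linear array, and the per-beam selection metric is the average rate over subcarriers, which is our reading of (17); received power could be plugged in the same way. All function names are hypothetical.

```python
import numpy as np


def beamsteering_codebook(M, P):
    """P beamsteering codewords for a half-wavelength ULA (assumed array), shape (P, M)."""
    angles = np.linspace(-np.pi / 2, np.pi / 2, P, endpoint=False)
    m = np.arange(M)[None, :]
    return np.exp(1j * np.pi * m * np.sin(angles)[:, None]) / np.sqrt(M)


def select_rf_beams(h, codebook, snr):
    """Pick one codeword index per BS, independently of the other BSs.

    h: channels of shape (K, N, M); codebook: (P, M). Returns indices of shape (N,).
    """
    gains = np.abs(np.einsum("knm,pm->knp", h, codebook)) ** 2   # (K, N, P)
    per_beam_rate = np.mean(np.log2(1 + snr * gains), axis=0)    # (N, P)
    return np.argmax(per_beam_rate, axis=1)


def mrt_baseband(h, f_rf):
    """Unit-norm maximum-ratio baseband precoder per subcarrier, shape (K, N)."""
    h_eff = np.einsum("knm,nm->kn", h, f_rf)
    return np.conj(h_eff) / np.linalg.norm(h_eff, axis=1, keepdims=True)


def achievable_rate(h, f_rf, f_cp, snr):
    """Average rate over subcarriers for the given RF and baseband beamformers."""
    h_eff = np.einsum("knm,nm->kn", h, f_rf)
    combined = np.abs(np.sum(h_eff * f_cp, axis=1)) ** 2
    return np.mean(np.log2(1 + snr * combined))


# Toy check: each BS sees a single path aligned with one codeword.
codebook = beamsteering_codebook(M=8, P=16)
true_idx = np.array([3, 10])
h = np.broadcast_to(np.conj(codebook[true_idx])[None], (4, 2, 8))
selected = select_rf_beams(h, codebook, snr=1.0)
f_cp = mrt_baseband(h, codebook[selected])
rate = achievable_rate(h, codebook[selected], f_cp, snr=1.0)
```

The search cost grows linearly with the codebook size per BS, rather than exponentially in the number of BSs as a joint search would, which is the complexity saving the disjoint selection is meant to provide.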
IV-B Performance Analysis and Mobility Support
In this subsection, we evaluate the achievable rate performance of the proposed solution and discuss its mobility support.
Achievable Rate: Despite its low complexity and the disjoint RF beamforming design, the achievable rate of the baseline coordinated beamforming solution converges to the upper bound in important special cases for mmWave systems, namely in the single-path channel and large antenna regimes, which is captured by the following proposition.
Proposition 1
Consider the system and channel models in Section II, with a pulse shaping function , then the achievable rate of the baseline coordinated beamforming solution satisfies
(19) 
and when a beamsteering codebook is adopted, with beamforming codewords for some quantized angles , the achievable rate of the baseline solution follows
(20) 
Proof: The proof is simple and is omitted due to space limitation.
Proposition 1 shows that, for some important special cases, the disjoint RF beamforming design across BSs achieves the same data rate of the upper bound which requires combinatorial optimization complexity.
Effective Achievable Rate and Mobility Support: The effective achievable rate depends on (i) the time overhead in training the channel and designing the beamforming vectors, and (ii) the achievable rate using the constructed beamforming vectors. While the baseline solution can achieve optimal rate in some special yet important mmWave-relevant cases, the main drawback of this solution is the requirement of large training overhead, as it exhaustively searches over all the codebook beamforming vectors. This makes it very inefficient in supporting wireless applications with high throughput and mobility requirements. For example, consider a system model with BSs employing uniform planar antenna arrays, and adopting an oversampled beamsteering RF codebook of size . If the pilot sequence training time is µs, this means that the training overhead will consume of the channel beam coherence time for a vehicle moving with speed mph, whose beam coherence time is around ms [36]. In the next section, we show how machine learning can be integrated with this baseline solution to dramatically reduce this training overhead and enable highly-mobile mmWave applications.
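The overhead argument can be checked with simple arithmetic. Since all BSs sweep their codebooks simultaneously and combine each received pilot with a different codeword, an exhaustive sweep takes one pilot duration per codeword. The sketch below computes the resulting share of the beam coherence time; the function name and the numbers are hypothetical and are not the values of the example in the text.

```python
def exhaustive_training_fraction(codebook_size, pilot_time, beam_coherence_time):
    """Fraction of each beam coherence time spent sweeping the whole codebook."""
    if codebook_size <= 0 or pilot_time <= 0 or beam_coherence_time <= 0:
        raise ValueError("all arguments must be positive")
    # The sweep cannot use more than the whole coherence interval.
    return min(1.0, codebook_size * pilot_time / beam_coherence_time)


# Hypothetical numbers: 512 codewords, 10 microsecond pilots.
slow_user = exhaustive_training_fraction(512, 10e-6, 20e-3)  # 20 ms beam coherence time
fast_user = exhaustive_training_fraction(512, 10e-6, 5e-3)   # 5 ms beam coherence time
```

In this toy setting the slower user spends about a quarter of every coherence interval on training, while for the faster user the sweep no longer fits in the interval at all, leaving no time for data.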
V Deep Learning Coordinated Beamforming
Machine learning has attracted considerable interest in the last few years, thanks to its ability to create smart systems that can make successful decisions and accurate predictions. Inspired by these gains, this section introduces a novel application of machine learning in mmWave coordinated beamforming. We show that leveraging machine learning tools can yield interesting performance gains that are very difficult to attain with traditional communication systems. In the next subsections, we first explain the main idea of the proposed coordinated deep learning beamforming solution, highlighting its advantages. Then, we delve into a detailed description of the system operation and the machine learning modeling. For a brief background on machine/deep learning, we refer the reader to [39].
V-A The Main Idea
As discussed in Section IV, the key challenge in supporting highly-mobile mmWave applications is the large training overhead associated with estimating the large-scale MIMO channel or scanning the large number of narrow beams. An important note about these beam training solutions (and similarly for compressed sensing) is that they normally do not make any use of past experience, i.e., the previous beam training results. Intuitively, the beam training result is a function of the environment setup (user/BS locations, room furniture, street buildings and trees, etc.). These functions, though, are difficult to characterize by closed-form equations, as they generally involve many parameters and are unique for every environment setup.
In this paper, we propose to integrate deep learning models with the communication system design to learn the implicit mapping function relating the environment setup, which includes the environment geometry and user location among others, and the beam training results. To achieve that, the main question is how to characterize the user locations and environment setup in the learning models at the BSs. One solution is to rely on the GPS data fed back from the users. This solution, however, has several drawbacks: (i) the GPS accuracy is normally on the order of meters, which may not be reliable for mmWave narrow beamforming, and (ii) GPS devices do not work well inside buildings, and therefore will not support indoor applications, such as wireless virtual/augmented reality. Further, relying only on the user location is insufficient, as the beamforming direction depends also on the environment, which is not captured by the GPS data. In the proposed solution, the machine learning model uses the uplink pilot signal received at the terminal BSs with only omni or quasi-omni beam patterns to learn and predict the best RF beamforming vectors. Note that these received pilot signals at the BSs are the results of the interaction between the transmitted signal from the user and the different elements of the environment through propagation, reflection, and diffraction. Therefore, these pilots, which are received jointly at the different BSs, draw an RF signature of the environment and the user/BS locations, which is the signature we need to learn the beamforming directions.
This proposed coordinated deep learning solution operates in two phases. In the first phase (learning), the deep learning model monitors the beam training operations and learns the mapping from the omni-received pilots to the beam training results. In the second phase (prediction), the system relies on the developed deep learning model to predict the best RF beamforming vectors using only the omni-received pilots, totally eliminating the need for beam training. This solution, therefore, achieves multiple important gains at the same time. First, it does not need any special resources for learning, such as GPS data, as the deep learning model learns how to select the beamforming vectors directly from the received uplink pilot signal. Second, since the deep learning model predicts the best RF beamforming vectors using only omni-received uplink pilots, the proposed solution has negligible training overhead and can efficiently support highly-mobile mmWave applications, as will be shown in Section VI. It is worth noting here that while combining the uplink training signal with omni patterns penalizes the receive SNR, we show in Section VI-C that this is still sufficient to efficiently train the learning model with reasonable uplink transmit power. Another key advantage of the proposed system operation is that the deep learning model does not need to be trained before deployment, as it learns and adapts to any environment, and can support both LOS and NLOS scenarios. Further, as we will see in Section VI, the deep learning model learns and memorizes the different scenarios it experiences, such as different traffic patterns, which enables it to become more robust over time. Finally, since the proposed deep learning model is integrated with the baseline coordinated beamforming solution, the resulting system inherits the coverage, reliability, and latency gains discussed in Section III.
V-B System Operation
The proposed deep learning coordinated beamforming integrates machine learning with the baseline beamforming solution in Section IV to reduce the training overhead and achieve high effective achievable rates. This integrated system operates in two phases, namely the online learning and the deep learning prediction phases depicted in Figures 2 and 3. Next, we explain the two phases in detail.
Phase 1: Online learning phase: In this phase, the machine learning model monitors the operation of the baseline coordinated beamforming system and trains its neural network. Specifically, for every beam coherence time, the user sends repeated uplink training pilot sequences. Similar to the baseline solution explained in Section IV-A, every BS switches between its RF beamforming beams in the codebook such that it combines every received pilot sequence with a different RF beamforming vector. The only difference is that every BS will also receive one additional uplink pilot sequence using an omni (or quasi-omni) beam, as depicted in Fig. 3(a), to obtain the received signal
(21)
The combined signals will be fed back from all the BS terminals to the cloud. The cloud performs two tasks. First, it selects the downlink RF beamforming vector for every BS according to (17) and the baseband beamformers as in (10), which is similar to the baseline solution in Section IV-A. Second, it feeds the machine learning model with (i) the omni-received sequences from all the BSs, which represent the inputs to the deep learning model, and (ii) the achievable rate of every RF beamforming vector, defined as
(22) 
which represent the desired outputs from the machine learning model, as will be described in detail in Section V-C. The deep learning model is, therefore, trained online to learn the implicit relation between the OFDM omni-received signals captured jointly at all the BSs, which represent a defining signature for the user location/environment, and the rates of the different RF beamforming vectors. Once the model is trained, the system operation switches to the second phase: deep learning prediction. It is important to note here that using omni patterns at the BSs during the uplink training reduces the receive SNR compared to the case when the received signal is combined with narrow beams. We show in Section VI-C, though, that this receive SNR with omni patterns is sufficient to efficiently train the neural networks under reasonable assumptions on the uplink training power.
Phase 2: Deep learning prediction phase:
In this phase, the system relies on the trained deep learning model to predict the RF beamforming vectors based only on the omni-received signals captured at the BS terminals. Specifically, at every beam coherence time, the user transmits an uplink pilot sequence. The BS terminals combine the received signals using the omni (or quasi-omni) beamforming patterns used in the online learning phase. This constructs the combined signals, which are fed back to the cloud processor, as depicted in Fig. 3(b). Using these omni combined signals, the cloud then asks the trained deep learning model to predict, for every BS, the best RF beamforming vector that maximizes the achievable rate in (22). Finally, the predicted RF beamforming vectors are used by the BS terminals to combine the uplink pilot sequence and to estimate the effective channels, which are used to construct the cloud baseband beamforming vectors according to (10).
In the deep learning prediction phase, the system effective achievable rate is given by
(23) 
where the training time represents the time spent for the uplink training of the omni pattern and the predicted beam, each requiring one beam training pilot sequence time. Note that we neglected the processing time of executing the deep learning model, as it is normally one or two orders of magnitude less than the over-the-air beam training time. It is also worth mentioning that, in general, the deep learning model can predict several of the best beams for every BS to be refined in the uplink training, instead of just predicting the single best beam. In this case, the training overhead will be one pilot sequence time for the omni pattern plus one for every predicted beam, which will still be much smaller than the baseline training overhead, as the number of predicted beams should typically be much smaller than the codebook size.
An important question is when the system should switch its operation from the first phase (learning) to the second phase (prediction). During the learning phase, and thanks to the proposed system design, the cloud processor can keep calculating both the effective achievable rate of the baseline solution and the estimated effective rate of the deep learning model. The system can then switch to the deep learning prediction phase when the estimated effective rate of the deep learning model exceeds that of the baseline solution. This also results in an overall effective achievable rate equal to the larger of the two rates. Note that this result implies that the deep learning model will only be leveraged when it can achieve a better rate than the baseline solution, and that it has almost no cost on the system performance. Finally, we assume for simplicity that the system will completely switch to the second phase after the deep learning model is trained. In practice, however, the system should periodically switch back to the online learning phase to update the learning model with any changes in the environment. Designing and optimizing this mixed system operation for time-varying environment models is an interesting future research direction.
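The switching rule can be illustrated with a short sketch. It is only an illustration under stated assumptions: the effective rate is taken to be the achievable rate scaled by the fraction of the beam coherence time left after training, the baseline is assumed to spend one pilot sequence time per codebook beam, and all function and parameter names are hypothetical.

```python
def effective_rate(rate, training_time, coherence_time):
    """Achievable rate scaled by the fraction of time left for data (assumed form)."""
    if training_time >= coherence_time:
        return 0.0
    return (1.0 - training_time / coherence_time) * rate


def choose_phase(baseline_rate, predicted_rate, codebook_size,
                 pilot_time, coherence_time, beams_to_refine=1):
    """Return the phase to operate in and the resulting effective rate."""
    # Baseline: exhaustive sweep, one pilot sequence per codebook beam.
    r_baseline = effective_rate(
        baseline_rate, codebook_size * pilot_time, coherence_time)
    # Prediction: one omni pilot plus one pilot per predicted beam.
    r_predicted = effective_rate(
        predicted_rate, (beams_to_refine + 1) * pilot_time, coherence_time)
    phase = "prediction" if r_predicted > r_baseline else "learning"
    return phase, max(r_baseline, r_predicted)


# Hypothetical numbers: 1024 beams, 1 microsecond pilots, 2 ms coherence time.
phase, rate = choose_phase(10.0, 9.0, 1024, 1e-6, 2e-3)
```

With these hypothetical values the prediction phase wins even though its raw rate is lower, because the exhaustive sweep consumes about half of the coherence time.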
V-C Machine Learning Modeling
In this subsection, we describe the different elements of the proposed machine learning model: (i) the input/output representation and normalization, (ii) the neural network architecture, and (iii) the adopted deep learning model. It is worth mentioning that the machine learning model presented in this section is just one possible solution for the integrated communication and learning system proposed in Section V-B, with no optimality guarantees on its performance or complexity. Developing other machine learning models with higher performance and less complexity is an interesting and important future research direction.
Input representation and normalization: As discussed in Section V-B, the proposed deep learning coordinated beamforming solution relies on omni (or quasi-omni) received signals to predict distributed beamforming directions. Based on that, we propose to define the inputs to the neural network model as the OFDM omni-received sequences collected from the BSs. Since the sparse mmWave channel is highly correlated in the frequency domain [54], we will only consider a subset of the OFDM symbols for the inputs of the learning model. For simplicity, we will set the inputs of the model to be equal to the first samples of the K-point OFDM symbol. Note that inputting the raw data directly to the neural network without extracting further features is motivated by the ability of deep neural networks to learn the hidden and relevant features of the inputs [39]. Finally, we represent every received signal by two inputs carrying its real and imaginary components. Therefore, the total number of inputs to the learning model is twice the number of considered samples times the number of BSs, as depicted in Fig. 4.
Normalizing the inputs of the neural network normally allows using higher learning rates and makes the model less affected by the initialization of the neural network weights and the outliers of the training samples [40]. For our application, there are four main approaches to normalizing the model inputs: (i) per-carrier per-BS normalization, where we independently normalize every received signal of every carrier and BS, (ii) per-BS normalization, where we apply the same normalization/scaling to all the carriers of the BS, but independently from the other BSs, (iii) per-sample normalization, where the inputs of every learning sample are subject to the same normalization/scaling, and (iv) per-dataset normalization, where we only scale the whole dataset by a single factor. In our coordinated beamforming application, the correlation between the received signals at the same BS may carry important information that will be lost if a per-carrier normalization is adopted. Similarly, the correlation between the signals received at different BSs from the same user may carry some information about the relative location and multi-path patterns for this user and every BS. This information will be distorted when using a per-BS normalization. Further, the correlation between the joint multi-path patterns at the BSs for different user locations may carry relevant information, which will be lost when using a per-sample normalization. Therefore, it is intuitive to adopt a per-dataset normalization in our coordinated beamforming application to avoid losing any information that could be useful for the learning model. This intuition is also confirmed by the simulation results in Section VI. In these simulations, we consider a simple per-dataset normalization where all the inputs are divided by a constant scaling factor, defined as
(24) 
where denotes the absolute value of the omnireceived signal at the th BS and th subcarrier for the th learning sample.
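A minimal sketch of the per-dataset input normalization follows. Since the exact expression of (24) is not reproduced here, the sketch assumes the scaling factor is the largest absolute value of the omni-received signal over all BSs, subcarriers, and learning samples; the function names and the ordering of real and imaginary parts are hypothetical.

```python
def to_real_inputs(samples):
    """Map each complex sample vector to [real parts..., imaginary parts...]."""
    return [[v.real for v in s] + [v.imag for v in s] for s in samples]


def per_dataset_scale(samples):
    """Largest magnitude over the whole dataset (assumed form of the scaling factor)."""
    return max(abs(v) for s in samples for v in s)


def normalize_per_dataset(samples):
    """Divide every real-valued input by one dataset-wide factor."""
    scale = per_dataset_scale(samples)
    return [[x / scale for x in row] for row in to_real_inputs(samples)]


# Two hypothetical learning samples, each holding two omni-received values.
dataset = [[3 + 4j, 1 + 0j], [0 + 1j, 0.5 + 0j]]
inputs = normalize_per_dataset(dataset)
```

Because a single factor is applied everywhere, the relative magnitudes across subcarriers, BSs, and user locations are preserved, which is the property the text argues for.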
Output representation and normalization: As shown in Section IV, separating the BS RF and cloud baseband beamforming design problems yields low-complexity yet highly-efficient systems, with achievable rates approaching the optimal bound in some important cases. With this motivation, we propose to have independent deep learning models, one for each BS, where the objective of every model is to predict the best RF beamforming vector with the highest data rate for its BS. Note that every model will still rely on the omni-received sequences from all the BSs to predict the beamforming vectors of its BS, as shown in Fig. 4. Further, every deep learning model has one output per RF beamforming vector, each representing the predicted rate with that vector.
In the online learning phase, explained in Section V-B, a new training sample for the deep learning models is generated every beam coherence time. The training sample for the model of a given BS consists of (i) the omni-received sequences, which are the inputs to the deep learning model, and (ii) the achievable rates for the RF beamforming vectors of that BS, which represent the desired outputs from the model. Note that both the omni-received sequences and the achievable rates are constructed during the uplink training phase, as described in Section V-B. These training samples are used by the cloud to train the deep learning models of the BSs. For the training of every BS model, the desired outputs of every training sample are normalized as
(25) 
The objective of this per-sample normalization is to regularize the deep neural network and make sure it does not learn only from the samples with higher data rates (higher output values). This is particularly important for mmWave systems where some user locations have LOS links (with high data rates) while others experience non-LOS connections (with much lower data rates). In this case, if the training samples are not normalized, the neural network model may learn only from the LOS samples, as will be illustrated in Section VI.
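The effect of the output normalization can be illustrated as follows. The exact form of (25) is not reproduced here, so this sketch assumes each training sample's rates are divided by the largest rate in that sample; the function name and the numbers are hypothetical.

```python
def normalize_rates(rates):
    """Scale one sample's per-beam rates so that the best beam maps to 1 (assumed form)."""
    peak = max(rates)
    if peak <= 0:
        return [0.0 for _ in rates]
    return [r / peak for r in rates]


# Hypothetical per-beam rates for a LOS and a non-LOS user location.
los_sample = [8.0, 4.0, 2.0]
nlos_sample = [0.4, 0.2, 0.1]
los_targets = normalize_rates(los_sample)
nlos_targets = normalize_rates(nlos_sample)
```

After normalization both samples present targets on the same scale, so the non-LOS sample contributes as much to the training loss as the LOS one.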
Neural network architecture: The main objective of this paper is to develop an integrated communication-learning coordinated beamforming approach for highly-mobile mmWave applications. Optimizing the deep neural network model, though, is out of the scope of this paper, and is one of the important future research directions. In this paper, we adopt a simple neural network architecture based on fully-connected layers. As shown in Fig. 4, the neural network architecture consists of a stack of fully-connected layers with the same number of nodes per layer. The fully-connected layers use rectified linear unit (ReLU) activations [39]. Every fully-connected layer is followed by a dropout layer to regularize the neural network and avoid overfitting [41]. The performance of the proposed deep learning coordinated beamforming solution with the adopted neural network architecture, as well as comparisons with other network architectures, will be discussed in Section VI.
Loss function and learning model: The objective of the deep learning model is to predict the best RF beamforming vectors with the highest achievable rates for every BS. Therefore, we adopt a regression learning model in which the neural network of every BS model is trained to make its outputs as close as possible to the desired normalized achievable rates. Note that adopting a regression model enables the neural network to predict not only the best RF beamforming vector, but also the second best, third best, and so on, or generally, a set of the best RF beams. Formally, the neural network for every model is trained to minimize the loss function, defined as
(26)
where the loss is the mean-squared error between the model outputs and the desired normalized achievable rates, and is minimized over the set of all the parameters in the neural network. Note that the outputs of the learning model are functions of the network parameters and the model inputs. To simplify the notation, though, we dropped these dependencies from the output symbol.
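The regression formulation can be sketched in a few lines. This is an illustration only, with hypothetical function names: it computes the mean-squared error between predicted and desired normalized rates for one sample, and shows how the regression outputs can be ranked to pick several of the best beams rather than only the single best one.

```python
def mse(predicted, desired):
    """Mean-squared error between predicted and desired normalized rates."""
    if len(predicted) != len(desired):
        raise ValueError("predicted and desired must have the same length")
    return sum((p - d) ** 2 for p, d in zip(predicted, desired)) / len(predicted)


def top_beams(predicted_rates, count=1):
    """Indices of the `count` beams with the highest predicted rates, best first."""
    order = sorted(range(len(predicted_rates)),
                   key=lambda i: predicted_rates[i], reverse=True)
    return order[:count]


# Hypothetical outputs of one BS model over a four-beam codebook.
predicted = [0.1, 0.9, 0.4, 0.7]
desired = [0.0, 1.0, 0.5, 0.6]
loss = mse(predicted, desired)
candidates = top_beams(predicted, count=2)
```

A classifier that outputs only the index of the best beam would not provide this ranking, which is the practical reason the text gives for choosing regression.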
V-D Effective Achievable Rate and Mobility Support
As shown in Section IV, the achievable rate with the baseline coordinated beamforming solution approaches the optimal bound in some special yet important cases. The challenge with the baseline solution, though, is the requirement of exhaustive beam training, which consumes a large share of the training resources and significantly reduces the effective achievable rate. For the deep learning coordinated beamforming solution, the learning model is trained to approach the achievable rate of the baseline solution, which is optimal in some cases. Further, it requires only two training resources, one for the omni pattern and one for the predicted beam training, which makes its training overhead almost negligible. This means that the proposed deep learning coordinated beamforming solution, when efficiently trained, can approach the optimal effective achievable rate and support highly-mobile mmWave applications, as will be shown in the following section.
VI Simulation Results
In this section, we evaluate the performance of the proposed coordinated deep-learning beamforming solution, and illustrate its ability to support highly-mobile mmWave applications. First, we present the considered simulation setup in Section VI-A. Then, we show the capability of the proposed deep learning solution in predicting the beamforming directions and approaching the optimal effective achievable rate in Section VI-B. In Sections VI-C and VI-D, we study the impact of the main communication and machine learning parameters on the system performance. Finally, Sections VI-E and VI-F investigate several important aspects of the integrated communication/learning beamforming system, such as its ability to adapt to the environment, its sensitivity to BS synchronization, and its performance in untrained scenarios.
VI-A Simulation Setup
This section describes in detail the various aspects of the considered simulation setup including the communication system/channel models, the machine learning model, and the simulation scenarios. While the coordinated beamforming strategies proposed in this paper are general for indoor/outdoor applications, we focus in these simulation results on the vehicular application, which is one important use case for 5G cellular systems [32, 42].
System Setup and Channel Generation: We adopt the mmWave system and channel models in Section II, where a number of BSs are simultaneously serving one mobile user over the 60 GHz band. Since the proposed deep-learning coordinated beamforming approach relies on learning the correlation between the transceiver locations/environment geometry and the beamforming directions, it is important to generate realistic data for the channel parameters (AoAs, AoDs, path loss, delay, etc.). With this motivation, our simulations use the commercial ray-tracing simulator, Wireless InSite [43], which is widely used in mmWave research [44, 33, 45], and is verified with channel measurements [46, 47]. In the following points, we summarize the environment/system setup and channel generation.

Environment setup: We consider the system model in Section II in a street-level environment, where the BSs are installed on lamp posts to simultaneously serve one vehicular mobile user, as depicted in Fig. 5(a). The 4 lamp posts are located on the corners of a rectangle, with m distance between the lamp posts on each side of the street (along the y-axis), and m distance between the lamp posts across the street (along the x-axis). In the ray-tracing, we use ITU 60 GHz 3-layer dielectric material for the buildings, ITU 60 GHz single-layer dielectric for the ground, and ITU 60 GHz glass for the windows. This ensures that the important ray-tracing parameters, such as the reflection and penetration coefficients, accurately model the mmWave system operational frequency.

Base stations setup: Each BS is installed on one lamp post at height 6 m, and has a uniform planar array (UPA) facing the street, i.e., on the y-z plane. Unless otherwise mentioned, the BS UPAs consist of 32 columns and 8 rows, resulting in a total of 256 antenna elements, and use dBm transmit power. Adopting the system model in Section II, the BSs are assumed to be connected to a central processor via links with negligible error and delay. In practice, this can be realized using optical fiber links connecting the four BSs together, with one of them hosting the central processor.

Mobile user setup: The vehicular mobile user has a single antenna that is deployed at a height of 2 m. We show the car in Fig. 5(a) only for illustration. This car, though, is not modeled in the ray-tracing simulations, which only consider the mobile user antenna. At every beam coherence time, the location of the mobile user antenna is randomly selected from a uniform x-y grid of candidate locations, as depicted in Fig. 5(b). The x-y rectangular grid has dimensions m m with a resolution of m, i.e., a total of 240 thousand points. This x-y grid shares the same center with the rectangle defined by the 4 BSs. During the uplink training, the MS is assumed to use dBm transmit power.

Ray-tracing based channel generation: In our simulations, we adopt the frequency-selective geometric channel model in Section II-B. For this model, the important question is how to generate the channel parameters, such as the AoAs, AoDs, path gains, and delays of each ray. These parameters are normally generated using stochastic models [48, 1, 49]. In this paper, though, the key idea is to leverage the power of deep neural networks in learning the mapping between the omni-received multi-path signatures and the beamforming directions. This implicitly relies on learning the underlying environment geometry and the interplay between this geometry and the transmitter/receiver locations. Therefore, it is crucial to generate realistic channel parameters that correspond to a real environment geometry. This is the main motivation for using ray-tracing in generating the channel parameters.
In the Wireless InSite ray-tracing [43], we use the X3D model with the Shooting and Bouncing Ray (SBR) tracing mode. In this mode, the simulator shoots hundreds of rays from the transmitters and selects the ones that find paths to the receiver, for which it generates the key parameters (AoAs, AoDs, etc.). Considering the ray-tracing channel parameters for the strongest 25 paths, which normally span a power gap of more than 20 dB, we construct the channel matrix between each BS and the mobile user using MATLAB, according to (4). The considered setup adopts an OFDM system of size . Note that for every candidate user location in the x-y grid, we generate the channel vectors that correspond to the channels between this user and the 4 BSs.
Coordinated Beamforming: In the simulation results, the beamforming vectors are constructed as described in Sections IVV. At every beam coherence time, a new user location is selected, and the channel vectors are constructed based on the parameters generated from the raytracing simulations as described earlier in this section. For the baseline coordinated beamforming, we first simulate the uplink beam training at each BS by calculating for all the beamforming vectors in the codebook . Then, the best RF beamforming vector for every BS is determined based on (17). Finally, the effective achievable rate is calculated according to (18). In these simulations, we consider an oversampled beamsteering codebook of beams, with denoting the number of columns and rows of the BSs UPAs, and defining the oversampling factors in the azimuth and elevation directions. The th beamforming vector in this codebook is expressed as , where is the UPA array steering vector with the quantized angles .
The simulation of the deep-learning coordinated beamforming approach is similar to the baseline coordinated beamforming with the following extra steps. First, at every beam coherence time, i.e., at every new user location, in addition to calculating the combined signals for all the beams, we also calculate the omni-received sequences in (21). To do that, we consider the signal received by only the first antenna element, which is equivalent to adopting a beamforming vector in (21) that selects this element. For the noise term in (21), we add random noise samples with the noise power corresponding to GHz system bandwidth and dB noise figure. The omni-received sequences from the BSs and the rates corresponding to every beamforming vector, calculated based on (22), form one data point for the machine learning model. By randomly picking user locations, we build the dataset for the machine learning model. In the second phase of the deep-learning coordinated beamforming approach, we simulate the uplink training by only calculating the omni-received sequences. We then use the machine learning model to predict the best RF beamforming vector for every BS. Finally, the effective achievable rate is calculated using (23).
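The omni reception step can be sketched as follows, under explicit assumptions: the noise is modeled as circularly symmetric complex Gaussian, the thermal noise density is taken as roughly -174 dBm/Hz, and the bandwidth and noise figure values below are hypothetical placeholders rather than the values used in the paper's simulations.

```python
import math
import random


def noise_power_dbm(bandwidth_hz, noise_figure_db):
    """Receiver noise power from bandwidth and noise figure (about -174 dBm/Hz density)."""
    return -174.0 + 10.0 * math.log10(bandwidth_hz) + noise_figure_db


def dbm_to_watts(power_dbm):
    """Convert a power in dBm to watts."""
    return 10.0 ** ((power_dbm - 30.0) / 10.0)


def omni_receive(channel, pilot, noise_power_watts, rng):
    """Receive with the first antenna element only, i.e. combiner [1, 0, ..., 0]."""
    sigma = math.sqrt(noise_power_watts / 2.0)  # per real/imaginary component
    noise = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
    return channel[0] * pilot + noise


# Hypothetical values: 1 GHz bandwidth, 5 dB noise figure, 2-element channel.
rng = random.Random(0)
p_noise = dbm_to_watts(noise_power_dbm(1e9, 5.0))
r_omni = omni_receive([2e-5 + 1e-5j, 5e-6 + 0j], 1 + 0j, p_noise, rng)
```

Only the first channel coefficient reaches the output, so the array gain of the UPA is not exploited, which is the SNR penalty of omni reception that the text discusses.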
Machine Learning Model: We consider the deep learning model described in detail in Section V-C. The neural network model of every BS has inputs, which are the real and imaginary components of the omni-received sequences of the BSs, and outputs, which represent the achievable rates of the RF candidate beamforming vectors. Unless otherwise mentioned, the neural network model has fully-connected layers, each of nodes, i.e., with . The fully-connected layers use ReLU activation units and every layer is followed by a dropout regularization layer of dropout rate . For training the model, we use a dataset with a maximum size of thousand samples and a batch size of . In the deep learning experimental work, we used the Keras libraries [50] with a TensorFlow [51] backend.
LOS and NLOS Scenarios: In order to evaluate the performance of our proposed deep-learning coordinated beamforming solution in a rich mmWave environment with blockage, we consider both LOS and NLOS scenarios in the simulations. Earlier in this section, we described the LOS scenario, which is depicted in Fig. 5. The NLOS scenario is similar to the LOS one but with a large bus of dimensions 20 m x 5 m standing in front of BS 3, as shown in Fig. 6. This bus blocks the LOS path between BS 3 and most of the candidate user locations in the x-y grid.
Next, we evaluate the performance of the proposed deep learning coordinated beamforming solution for various communication and machine learning parameters.
VI-B Does the System Learn How to Beamform?
The proposed deep-learning coordinated beamforming solution relies on the ability of deep neural networks to learn the relation between the multi-path signatures collected jointly at multiple BS locations and the RF beamforming vectors. The first question that we need to address then is whether these networks are successfully learning how to select the optimal RF beamforming vectors, with the optimality defined according to (17). To answer this question and to evaluate the quality of this learning, we plot the effective achievable rate of the proposed deep-learning coordinated beamforming for different training dataset sizes in Figures 7 and 8.
In Fig. 7, we consider the LOS scenario, described in Section VI-A, where the BSs, each with a UPA, are simultaneously serving one mobile user, moving with speed mph. The BSs use a beamsteering codebook with an oversampling factor of 2 in both the azimuth and elevation directions. For this scenario, we plot the effective achievable rate of the proposed deep-learning coordinated beamforming solution in Fig. 7 versus the size of the dataset used in training the neural network model. Recall that every point in the training dataset is collected in one beam coherence time. This means that if the system spends a given number of beam coherence times training its neural network model, then it will be able to predict the beamforming vectors that achieve the effective rate corresponding to that dataset size in Fig. 7. This figure shows that the effective achievable rate of the proposed deep-learning coordinated beamforming approaches the optimal rate, defined in Lemma 1, with reasonable dataset sizes. This means that the neural network model is successfully predicting the best RF beamforming vector, out of 1024 candidate beams, for every BS using multi-path signatures received with only a single antenna (or omni pattern) at every BS. This clearly illustrates the ability of the proposed deep-learning based solution to support highly-mobile mmWave applications with negligible training overhead. Fig. 7 also shows that it is better to select several of the best beams predicted by the neural network and refine them through beam training, as described in Section V-B. Finally, Fig. 7 illustrates that leveraging deep learning can achieve considerable data rate gains compared to the baseline coordinated beamforming solution.
In Fig. 8, we adopt the NLOS scenario described in Section VI-A, where a large bus is standing in front of BS 3, as shown in Fig. 6. The system, channel, and machine learning models are identical to those adopted in Fig. 7. For this NLOS scenario, Fig. 8 compares the effective achievable rate of (i) the developed deep-learning coordinated beamforming strategy with , (ii) the baseline coordinated beamforming, and (iii) the upper bound, for different training dataset sizes. The result in this figure is very important as it shows that the deep learning model can not only learn LOS beamforming, but also predict the best NLOS beamforming vectors given the joint multi-path signatures. Note that this is a key advantage of our proposed deep learning solution, which relies on the multi-path signature, not on the user location/coordinates, in predicting the beams. If the system relies only on the knowledge of the user location, it will not be able to efficiently predict the beamforming vectors in NLOS scenarios, as the same user location may correspond to different NLOS setups and, consequently, different beamforming vectors.
VI-C Impact of Communication System Parameters
The main motivation for the deep-learning coordinated beamforming solution is supporting highly-mobile applications in large-array mmWave systems. In achieving that, our proposed deep learning model makes beamforming predictions based on signals received with only omni or quasi-omni antennas, i.e., with low SNR. In this section, we evaluate the impact of the key system parameters, namely the user mobility, the number of BS antennas, and the uplink transmit power, on the performance of the developed deep-learning coordinated beamforming strategy.
Impact of User Speed and Number of BS antennas: In Fig. 9, we consider the LOS scenario, described in Section VI-A, with 4 BSs serving one mobile user. Each BS is assumed to have a UPA with rows, columns, and is using a beamsteering codebook with an oversampling factor of 2 in both the elevation and azimuth directions. In Fig. 9, we plot the effective achievable rate of the deep-learning coordinated beamforming solution, the baseline coordinated beamforming, and the upper bound for different numbers of BS antennas and user speeds. Recall that the number of beams in the beamsteering codebooks equals 4 (the overall azimuth/elevation oversampling factor) times the number of antennas. First, consider the baseline coordinated beamforming solution performance in Fig. 9. As more antennas are deployed at the BSs, the beamforming gain increases but the training overhead also increases, resulting in a tradeoff for the effective achievable rate in (18). This tradeoff defines an optimal number of BS antennas for every user speed (or equivalently beam coherence time), as shown in Fig. 9. It is important to note that the performance of the baseline coordinated beamforming solution degrades significantly when the number of BS antennas grows beyond this optimum or when the user speed increases. This illustrates why traditional beamforming strategies are not capable of supporting highly-mobile users in mmWave systems with large arrays.
In contrast, the deep-learning coordinated beamforming, which is trained with a dataset of size k samples, achieves almost the same performance as the upper bound for different values of user speeds and BS antennas. This is thanks to the negligible uplink training overhead using omni patterns. It is worth noting here that while larger arrays may require bigger datasets (longer time) for training the neural network model during the online learning phase, the uplink training overhead in the deep learning prediction phase does not depend on the number of antennas, as it relies on omni or quasi-omni patterns. Therefore, once the neural network model is trained, the deep-learning coordinated beamforming solution works efficiently with large antenna arrays. This is a key advantage of our developed deep-learning based solution over traditional mmWave channel training/estimation techniques such as analog beam training [52, 20] and compressive sensing [24, 25, 53].
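The scaling argument can be made concrete with a short sketch. The codebook size follows the relation stated in the text (overall oversampling factor times the number of antennas); the pilot and coherence times are hypothetical, as are the function names, and the baseline is assumed to spend one pilot sequence time per beam.

```python
def codebook_size(columns, rows, oversampling_az=2, oversampling_el=2):
    """Number of beams in an oversampled beamsteering codebook for a UPA."""
    return oversampling_az * columns * oversampling_el * rows


def overhead_fraction(num_pilots, pilot_time, coherence_time):
    """Fraction of the beam coherence time consumed by uplink training."""
    return min(1.0, num_pilots * pilot_time / coherence_time)


pilot_time, coherence_time = 1e-6, 2e-3  # hypothetical timing values
for columns, rows in [(8, 8), (16, 8), (32, 8)]:
    beams = codebook_size(columns, rows)
    baseline = overhead_fraction(beams, pilot_time, coherence_time)
    # Prediction phase: one omni pilot plus one predicted-beam pilot.
    predicted = overhead_fraction(2, pilot_time, coherence_time)
    print(f"{columns}x{rows}: {beams} beams, "
          f"baseline {baseline:.3f}, prediction {predicted:.3f}")
```

The baseline overhead doubles every time the array doubles, while the prediction-phase overhead stays fixed at two pilot sequence times regardless of the array size.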
Impact of Uplink Transmit Power and Omni Training Pattern:
An important aspect of the proposed deep learning coordinated beamforming solution is the use of only omni (or quasi-omni) beam patterns at the BSs during the uplink training. This raises, though, questions on whether the received signals with omni reception, which are the inputs to the neural network model, have sufficient SNR for the system operation, and whether the MS will need to use very high uplink transmit power to ensure enough receive SNR at the BSs. To answer these questions, we plot the effective achievable rates of the proposed deep learning solution, baseline solution, and optimal bound in Fig. 10 versus the uplink transmit power. We also assume that the downlink transmit power during data transmission by every BS equals the uplink transmit power from the MS. The rest of the communication system and machine learning parameters are similar to the setup in Fig. 9. As shown in Fig. 10, for low values of uplink transmit power, the performance of the deep-learning strategy is worse than the baseline solution, as the SNR of the omni-received sequences is low and the learning model is not able to learn and predict the right beamforming vectors. For reasonable uplink transmit power, though, dBm to dBm, the deep-learning coordinated beamforming achieves good gain over the baseline solution. This means that the receive SNR with omni patterns during uplink training is sufficient to draw a defining RF signature of the environment and efficiently train the neural network model.
It is important to note here that the main reason why beamforming is needed during mmWave beam training or channel estimation is to estimate the directional information at every BS, such as the angles of arrival and departure. The proposed deep learning coordinated beamforming system does not need this step, as it predicts this information via deep learning using the signals captured at multiple distributed BSs.
VI-D Impact of Machine Learning Parameters
The primary objective of this paper is to motivate leveraging machine learning tools in highly-mobile mmWave communication systems. Optimizing the machine learning model itself, though, is out of the scope of this paper, and merits independent study. In this section, we briefly highlight the impact of some machine learning parameters, such as the input/output normalization and the neural network architecture, on the system performance.
Impact of Input and Output Normalization: The proper normalization of the inputs and outputs of the neural network allows realizing efficient machine learning models with high learning rates and robustness against weight initialization biases, among other system gains. In Fig. 11, we plot the effective achievable rates for different input normalization strategies, namely per-dataset, per-sample, per-base-station, and per-element normalization, which are explained in detail in Section V-C. This figure considers the NLOS scenario, described in Section VI-A, with BSs employing 16×8 UPAs and with a deep-learning model trained using a 20k-sample dataset. As shown in Fig. 11, the per-dataset normalization achieves the highest effective achievable rate among the four candidate strategies. To understand the intuition behind this performance, it is important to note that the correlation among the received signals at the different subcarriers of each BS may carry useful information, such as the distance between the user and the BS. Similarly, the correlation between the received signals of the same user at the 4 BSs and the correlation between the received signals at different user locations may carry information that helps the neural network model in learning the mapping between the multi-path signatures and the beamforming vectors. The per-dataset normalization is the only strategy, among the 4 candidates, that preserves all these kinds of correlation. Therefore, it allows the machine learning model to leverage all the information carried by the training dataset.
In Fig. 11, we also plot the effective rates with and without per-BS output normalization. The normalization strategy is explained in Section V-C. Fig. 11 shows that normalizing the outputs of the training dataset is required to achieve good data rates. To justify this performance, we first emphasize that these results consider the NLOS scenario in Fig. 6. In this scenario, some achievable rates (the outputs of the machine learning model) correspond to NLOS links while others result from LOS links. The challenge here is that the achievable rates corresponding to NLOS links have much smaller values than those of LOS links. Without output normalization, the training of the neural network weights will be dominated by the LOS-related outputs, which have large differences between their candidate beams (the output bins). These weights will not be sensitive to the relatively small differences between the rates of the NLOS-related outputs. In other words, the machine learning model will only learn how to beamform to the users with LOS links. This highlights the importance of normalizing the outputs of the neural network training dataset.
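The effect described above can be illustrated with a small numerical sketch. It assumes, hypothetically, that per-BS output normalization divides the candidate-beam rates of each BS by the largest rate of that BS; the precise definition is the one in Section V-C. The function name and the rate values are illustrative only.

```python
def normalize_outputs_per_bs(rates):
    """Scale the candidate-beam rates of each BS so that its best beam is 1.

    rates[bs][beam] is the achievable rate of one candidate beam at one BS.
    """
    normalized = []
    for bs_rates in rates:
        peak = max(bs_rates)
        # Guard against an all-zero rate vector (no usable beam).
        normalized.append([r / peak if peak > 0 else 0.0 for r in bs_rates])
    return normalized


def spread(values):
    """Gap between the best and the worst candidate beam."""
    return max(values) - min(values)


# Illustrative rates for three candidate beams (values are made up).
rates = [
    [8.0, 6.0, 2.0],        # BS with a LOS link: large rates, large gaps
    [0.5, 0.375, 0.125],    # BS with an NLOS link: small rates, small gaps
]

normalized = normalize_outputs_per_bs(rates)

spread_before = [spread(r) for r in rates]       # LOS gap dwarfs NLOS gap
spread_after = [spread(r) for r in normalized]   # both gaps become comparable
```

Before normalization, an error in ranking the LOS beams costs far more than the same error on the NLOS beams, so a squared-error style loss is dominated by the LOS outputs. After normalization, both BSs contribute on the same scale.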
Impact of Network Architecture: In the simulation results of this paper, we adopt the fully-connected neural network architecture in Fig. 4. To motivate future research into optimizing the machine learning model, we compare in Fig. 12 the effective achievable rates of the fully-connected architecture and another architecture based on convolutional neural networks (CNNs). This figure adopts the LOS scenario with BSs employing UPAs and steering codebooks with an oversampling factor of 2 in the azimuth direction. The fully-connected architecture consists of layers with 512 nodes per layer. The CNN-based architecture first applies four 2D convolutional filters to two input channels representing the real and imaginary parts of the omni-received sequences. A max-pooling layer is then added, followed by three fully-connected layers with 512 nodes per layer. This results in a total of 754k parameters in the CNN-based architecture compared to 1048k parameters in the fully-connected architecture. Despite its lower complexity, the CNN architecture achieves almost the same effective spectral efficiency as the fully-connected model, as shown in Fig. 12. One intuition for this efficient performance is that CNNs extract local information using small-sized filters. In our model, these filters may capture the correlation between adjacent samples in the OFDM sequence, which helps extract valuable information with lower complexity than the brute-force approach of the fully-connected model. This highlights the potential of exploring new neural network architectures for integrated learning/communication systems.
VI-E System Adaptability and Robustness
One main advantage of integrating machine learning into wireless communication is realizing robust systems that adapt efficiently to the highly-mobile aspects of the environment. To examine this gain, we plot the effective achievable rates in Fig. 13 for an important setup in which the environment changes multiple times, as follows.

First, when the system started working, at a dataset size of 0 samples, the LOS scenario in Fig. 5 was considered, where 4 BSs serve a car moving alone in the street. The BSs employ UPAs and use beamsteering codebooks with an oversampling factor of 2 in only the azimuth direction.

After some time, which is spent building a dataset of size k samples, a large bus appeared suddenly and stopped in front of BS 3, as depicted in Fig. 5. Since the deep-learning model was trained only for the LOS scenario before the bus arrived, the effective achievable rate of the deep-learning coordinated beamforming solution degraded significantly at the first moment of the bus arrival. This is clear in the effective rate transition at dataset size k samples in Fig. 13. Assuming that the bus remained parked in front of BS 3 for some time, the deep-learning model started learning this new NLOS scenario. In other words, the neural network weights that were initially adjusted for the LOS dataset were then refined based on the new NLOS training samples.
After more time, which is spent building an overall dataset of size k samples, the bus left. Interestingly, the performance of the proposed deep-learning solution did not degrade this time, but rather matched that of the first stage (before the bus arrived). This is important, as it shows that the deep-learning model has generalized its learning to both the LOS and NLOS scenarios, which is also confirmed by the performance of the deep-learning solution after the bus arrives again at the dataset size k samples.
The results in Fig. 13 show that the coordinated beamforming system became more robust over time and is able to adapt and perform well in both the LOS and NLOS scenarios. More generally, this means that when the deep-learning coordinated beamforming system is first deployed in a new environment, it will experience many new scenarios for which it was not trained, such as cars and pedestrians blocking the signals, trees growing, etc. After some time, the model will generalize its learning to cover all these scenarios and develop into a robust and adaptable system.
VI-F Does the System Require Phase Synchronization to Learn?
The machine learning model in the proposed deep-learning beamforming solution relies on the signals received jointly at multiple BSs. Therefore, the phase of these signals may intuitively carry useful information that helps the model learn how to predict the beamforming vectors for each multi-path signature. Maintaining this phase information, though, is difficult in practice, as it requires perfect synchronization of the oscillators at the terminal BSs. In this section, we evaluate the performance of the proposed deep-learning coordinated beamforming solution in a setting where the phase synchronization requirements are relaxed.
In Fig. 14, we consider the LOS scenario in Section VI-A, and plot the effective achievable rates of the proposed deep-learning coordinated beamforming solution under three different assumptions on the phase synchronization: (i) perfect phase synchronization, where the clocks of the 4 BSs are perfectly synchronized, (ii) no synchronization, where a uniform random phase is added to the omni-received signal at every BS, and (iii) received signal strength indicators (RSSI), where only the amplitude of the omni-received sequence is fed to the neural network model. As shown in Fig. 14, the performance of the deep-learning coordinated beamforming with no phase synchronization approaches that with perfect phase synchronization as more time is spent training the neural network (or, equivalently, as larger datasets are adopted). This result is very useful for practical implementations, as it means that phase synchronization may not be needed to learn coordinated beamforming if large enough datasets are adopted. Fig. 14 also illustrates that relying only on RSSI in deep-learning coordinated beamforming, which does not require any phase information, still achieves a reasonable gain over the baseline coordinated beamforming solution.
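The three input assumptions compared above can be sketched as simple transformations of the omni-received sequences. The sketch assumes, as an illustration only, that the lack of synchronization is modeled by one random phase per BS, drawn uniformly from [0, 2π) and applied to the whole sequence of that BS. The function names and the toy sequences are hypothetical and are not taken from the simulation code of this paper.

```python
import cmath
import math
import random


def perfect_sync(received):
    """Case (i): the sequences are passed to the model unchanged."""
    return [list(seq) for seq in received]


def no_sync(received, rng):
    """Case (ii): rotate the sequence of each BS by its own random phase."""
    rotated = []
    for seq in received:
        theta = rng.uniform(0.0, 2.0 * math.pi)  # one phase offset per BS
        rotation = cmath.exp(1j * theta)
        rotated.append([x * rotation for x in seq])
    return rotated


def rssi_only(received):
    """Case (iii): keep only the amplitudes and drop all phase information."""
    return [[abs(x) for x in seq] for seq in received]


# received[bs][n]: toy omni-received sequences at two BSs (made-up values).
received = [[1 + 1j, 2 - 1j], [0.5j, -1 + 0j]]
rng = random.Random(0)  # fixed seed so the sketch is reproducible

synced = perfect_sync(received)
unsynced = no_sync(received, rng)
amplitudes = rssi_only(unsynced)

# The phase relation between samples of the same BS survives the rotation.
ratio_before = received[0][0] / received[0][1]
ratio_after = unsynced[0][0] / unsynced[0][1]
```

Under this model the amplitudes and the phase relations within each BS are unaffected by the missing synchronization, and only the phase relation across BSs is lost. This is consistent with the observation that the model can still learn without phase synchronization, although the sketch itself does not prove it.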
Finally, it is worth mentioning that while Fig. 14 shows that the machine learning model can learn well with no phase synchronization, both the baseline and the deep-learning coordinated beamforming solutions still need this synchronization in the downlink data transmission phase, as the signals from the 4 BSs need to add coherently at the mobile user antenna. This requirement, though, can be relaxed if the user is served by only one BS at a time. This way, the 4 BSs coordinate the learning, but only one of them beamforms to the user at any given time. Clearly, these different approaches to coordinated beamforming trade off implementation complexity against system performance (data rate, reliability, etc.). Investigating this tradeoff for practical systems is an interesting future research direction.
VII Conclusion
In this paper, we developed an integrated machine learning and coordinated beamforming strategy that enables highly-mobile applications in large antenna array mmWave systems. The key idea of the developed strategy is to leverage a deep learning model that learns the mapping from the omni-received uplink pilots to the beam training result. This is motivated by the intuition that the signal received at multiple distributed BSs renders a defining RF signature for the user location and its interaction with the surrounding environment. The proposed solution requires negligible training overhead and performs almost as well as the genie-aided solution that perfectly knows the optimal beamforming vectors. Further, thanks to integrating deep learning with the coordinated transmission from multiple BSs, the developed solution ensures reliable coverage and low latency, resulting in a comprehensive framework to enable highly-mobile mmWave applications. Extensive simulations, based on accurate ray-tracing, were performed to evaluate the proposed solution in various LOS and NLOS environments. These results indicated that the proposed solution attains high data rate gains compared to coordinated beamforming strategies that do not leverage machine learning, especially in high-mobility, large-array scenarios. The results also illustrated that, with sufficient learning time, the deep learning model efficiently adapts to changing environments, yielding a robust beamforming system. From a practical perspective, the results illustrated that phase synchronization among the coordinated BSs is not necessary for learning how to accurately predict the beamforming vectors. The results in this paper encourage several future research directions, such as the extension to multi-user systems, the investigation of time-varying scenarios, and the development of more sophisticated machine learning models for mmWave beamforming.
References
 [1] R. W. Heath, N. GonzalezPrelcic, S. Rangan, W. Roh, and A. M. Sayeed, “An overview of signal processing techniques for millimeter wave MIMO systems,” IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 3, pp. 436–453, April 2016.
 [2] J. G. Andrews, T. Bai, M. N. Kulkarni, A. Alkhateeb, A. K. Gupta, and R. W. Heath, “Modeling and analyzing millimeter wave cellular systems,” IEEE Transactions on Communications, vol. 65, no. 1, pp. 403–430, Jan 2017.
 [3] T. Rappaport, S. Sun, R. Mayzus, H. Zhao, Y. Azar, K. Wang, G. Wong, J. Schulz, M. Samimi, and F. Gutierrez, “Millimeter wave mobile communications for 5G cellular: It will work!” IEEE Access, vol. 1, pp. 335–349, May 2013.
 [4] F. Boccardi, R. Heath, A. Lozano, T. Marzetta, and P. Popovski, “Five disruptive technology directions for 5G,” IEEE Communications Magazine, vol. 52, no. 2, pp. 74–80, Feb. 2014.
 [5] W. Roh, J.Y. Seol, J. Park, B. Lee, J. Lee, Y. Kim, J. Cho, K. Cheun, and F. Aryanfar, “Millimeterwave beamforming as an enabling technology for 5G cellular communications: theoretical feasibility and prototype results,” IEEE Communications Magazine, vol. 52, no. 2, pp. 106–113, February 2014.
 [6] S. Hur, S. Baek, B. Kim, Y. Chang, A. F. Molisch, T. S. Rappaport, K. Haneda, and J. Park, “Proposal on millimeterwave channel modeling for 5G cellular system,” IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 3, pp. 454–469, Apr. 2016.
 [7] IEEE 802.11ad, “IEEE 802.11ad standard draft D0.1.” [Online]. Available: www.ieee802.org/11/Reports/tgadupdate.htm
 [8] O. El Ayach, S. Rajagopal, S. AbuSurra, Z. Pi, and R. Heath, “Spatially sparse precoding in millimeter wave MIMO systems,” IEEE Transactions on Wireless Communications, vol. 13, no. 3, pp. 1499–1513, Mar. 2014.
 [9] A. Alkhateeb, J. Mo, N. GonzalezPrelcic, and R. Heath, “MIMO precoding and combining solutions for millimeterwave systems,” IEEE Communications Magazine,, vol. 52, no. 12, pp. 122–131, Dec. 2014.
 [10] A. F. Molisch, V. V. Ratnam, S. Han, Z. Li, S. L. H. Nguyen, L. Li, and K. Haneda, “Hybrid beamforming for massive MIMO: A survey,” IEEE Communications Magazine, vol. 55, no. 9, pp. 134–141, 2017.
 [11] T. Bai, A. Alkhateeb, and R. Heath, “Coverage and capacity of millimeterwave cellular networks,” IEEE Communications Magazine, vol. 52, no. 9, pp. 70–77, Sept. 2014.
 [12] S. Singh, M. Kulkarni, A. Ghosh, and J. Andrews, “Tractable model for rate in selfbackhauled millimeter wave cellular networks,” IEEE Journal on Selected Areas in Communications, vol. 33, no. 10, pp. 2196–2211, Oct. 2015.
 [13] Y. Zhu, Z. Zhang, Z. Marzi, C. Nelson, U. Madhow, B. Y. Zhao, and H. Zheng, “Demystifying 60 GHz outdoor picocells,” in Proc. of the 20th annual international conference on Mobile computing and networking. ACM, 2014, pp. 5–16.
 [14] M. Cudak, T. Kovarik, T. A. Thomas, A. Ghosh, Y. Kishiyama, and T. Nakamura, “Experimental mm wave 5G cellular system,” in Globecom Workshops (GC Wkshps), 2014. IEEE, 2014, pp. 377–381.
 [15] A. Ghosh, T. Thomas, M. Cudak, R. Ratasuk, P. Moorut, F. Vook, T. Rappaport, G. MacCartney, S. Sun, and S. Nie, “Millimeterwave enhanced local area systems: A highdatarate approach for future wireless networks,” IEEE Journal on Selected Areas in Communications, vol. 32, no. 6, pp. 1152–1163, June 2014.
 [16] W. Hong, K.H. Baek, Y. Lee, Y. Kim, and S.T. Ko, “Study and prototyping of practically largescale mmwave antenna systems for 5G cellular devices,” IEEE Communications Magazine, vol. 52, no. 9, pp. 63–69, Sept. 2014.
 [17] G. R. MacCartney, T. S. Rappaport, and A. Ghosh, “Base station diversity propagation measurements at 73 GHz millimeterwave for 5G coordinated multipoint (CoMP) analysis,” CoRR, vol. abs/1710.03626, 2017. [Online]. Available: http://arxiv.org/abs/1710.03626
 [18] D. Maamari, N. Devroye, and D. Tuninetti, “Coverage in mmwave cellular networks with base station cooperation,” IEEE Transactions on Wireless Communications, vol. 15, no. 4, pp. 2981–2994, April 2016.
 [19] A. K. Gupta, J. G. Andrews, and R. W. Heath, “Macrodiversity in cellular networks with random blockages,” IEEE Transactions on Wireless Communications, vol. 17, no. 2, pp. 996–1010, Feb 2018.
 [20] J. Wang, Z. Lan, C. Pyo, T. Baykas, C. Sum, M. Rahman, J. Gao, R. Funada, F. Kojima, H. Harada et al., “Beam codebook based beamforming protocol for multiGbps millimeterwave WPAN systems,” IEEE Journal on Selected Areas in Communications, vol. 27, no. 8, pp. 1390–1399, Nov. 2009.
 [21] S. Hur, T. Kim, D. Love, J. Krogmeier, T. Thomas, and A. Ghosh, “Millimeter wave beamforming for wireless backhaul and access in small cell networks,” IEEE Transactions on Communications, vol. 61, no. 10, pp. 4391–4403, Oct. 2013.
 [22] S. Noh, M. D. Zoltowski, and D. J. Love, “Multiresolution codebook and adaptive beamforming sequence design for millimeter wave beam alignment,” IEEE Transactions on Wireless Communications, vol. 16, no. 9, pp. 5689–5701, Sept 2017.
 [23] D. D. Donno, J. Palacios, and J. Widmer, “Millimeterwave beam training acceleration through lowcomplexity hybrid transceivers,” IEEE Transactions on Wireless Communications, vol. 16, no. 6, pp. 3646–3660, June 2017.
 [24] A. Alkhateeb, O. El Ayach, G. Leus, and R. Heath, “Channel estimation and hybrid precoding for millimeter wave cellular systems,” IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 5, pp. 831–846, Oct. 2014.
 [25] D. Ramasamy, S. Venkateswaran, and U. Madhow, “Compressive adaptation of large steerable arrays,” in Proc. of Information Theory and Applications Workshop (ITA), CA, 2012, pp. 234–239.
 [26] M. E. Rasekh, Z. Marzi, Y. Zhu, U. Madhow, and H. Zheng, “Noncoherent mmwave path tracking,” in Proc. of International Workshop on Mobile Computing Systems and Applications. NY, USA: ACM, 2017, pp. 13–18.
 [27] P. Schniter and A. Sayeed, “Channel estimation and precoder design for millimeter wave communications: The sparse way,” in the Asilomar Conference on Signals, Systems and Computers (ASILOMAR), Nov. 2014.
 [28] Y. Han and J. Lee, “Twostage compressed sensing for millimeter wave channel estimation,” in Proc. of IEEE International Symposium on Information Theory (ISIT), July 2016, pp. 860–864.
 [29] A. Abdelreheem, E. M. Mohamed, and H. Esmaiel, “Millimeter wave locationbased beamforming using compressive sensing,” in Proc. of International Conference on Microelectronics (ICM), Dec 2016, pp. 213–216.
 [30] G. C. Alexandropoulos, “Position aided beam alignment for millimeter wave backhaul systems with large phased arrays,” CoRR, vol. abs/1701.03291, 2017. [Online]. Available: http://arxiv.org/abs/1701.03291
 [31] N. Garcia, H. Wymeersch, E. G. Strm, and D. Slock, “Locationaided mmwave channel estimation for vehicular communication,” in Proc. of IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), July 2016, pp. 1–5.
 [32] J. Choi, V. Va, N. GonzalezPrelcic, R. Daniels, C. R. Bhat, and R. W. Heath, “Millimeterwave vehicular communication to support massive automotive sensing,” IEEE Communications Magazine, vol. 54, no. 12, pp. 160–167, December 2016.
 [33] V. Va, J. Choi, T. Shimizu, G. Bansal, and R. W. Heath, “Inverse multipath fingerprinting for millimeter wave V2I beam alignment,” IEEE Transactions on Vehicular Technology, vol. PP, no. 99, pp. 1–1, 2017.
 [34] M. Akdeniz, Y. Liu, M. Samimi, S. Sun, S. Rangan, T. Rappaport, and E. Erkip, “Millimeter wave channel modeling and cellular capacity evaluation,” IEEE Journal on Sel. Areas in Communications, vol. 32, no. 6, pp. 1164–1179, June 2014.
 [35] M. Samimi and T. Rappaport, “Ultrawideband statistical channel model for non line of sight millimeterwave urban channels,” in Proc. of the IEEE Global Communications Conference (GLOBECOM), Dec 2014, pp. 3483–3489.
 [36] V. Va, J. Choi, and R. W. Heath, “The impact of beamwidth on temporal channel variation in vehicular channels and its implications,” IEEE Transactions on Vehicular Technology, vol. 66, no. 6, pp. 5014–5029, June 2017.
 [37] P. Wang, Y. Li, L. Song, and B. Vucetic, “Multigigabit millimeter wave wireless communications for 5G: From fixed access to cellular networks,” IEEE Communications Magazine, vol. 53, no. 1, pp. 168–178, Jan. 2015.
 [38] A. Alkhateeb, G. Leus, and R. Heath, “Compressedsensing based multiuser millimeter wave systems: How many measurements are needed?” in in Proc. of the IEEE International Conf. on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia, arXiv preprint arXiv:1505.00299, April 2015.
 [39] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press. [Online]. Available: http://www.deeplearningbook.org

[40]
S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in
Proc. of the International Conference on Machine Learning, vol. 37. PMLR, pp. 448–456.  [41] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
 [42] 5GInfrastructureAssociation (5G PPP), “5G automotive vision.” [Online]. Available: https://5gppp.eu/wpcontent/uploads/2014/02/5GPPPWhitePaperonAutomotiveVerticalSectors.pdf
 [43] Remcom, “Wireless insite.” [Online]. Available: http://www.remcom.com/wirelessinsite
 [44] X. Yang and Y. Lu, “Propagation characteristics of millimeter wave in circular tunnels,” in Proc. of International Symposium on Antennas, Propagation EM Theory, Oct 2006, pp. 1–5.
 [45] Q. Li, H. ShiraniMehr, T. Balercia, A. Papathanassiou, G. Wu, S. Sun, M. K. Samimi, and T. S. Rappaport, “Validation of a geometrybased statistical mmwave channel model using raytracing simulation,” in Proc. of IEEE Vehicular Technology Conference (VTC Spring), May 2015, pp. 1–5.
 [46] S. Wu, S. Hur, K. Whang, and M. Nekovee, “Intracluster characteristics of 28 ghz wireless channel in urban micro street canyon,” in 2016 IEEE Global Communications Conference (GLOBECOM), Dec 2016, pp. 1–6.
 [47] Q. Li, H. ShiraniMehr, T. Balercia, A. Papathanassiou, G. Wu, S. Sun, M. K. Samimi, and T. S. Rappaport, “Validation of a geometrybased statistical mmWave channel model using raytracing simulation,” in Proc. of IEEE Vehicular Technology Conference (VTC Spring), May 2015, pp. 1–5.
 [48] A. Alkhateeb and R. W. Heath, “Frequency selective hybrid precoding for limited feedback millimeter wave systems,” IEEE Transactions on Communications, vol. 64, no. 5, pp. 1801–1818, May 2016.
 [49] F. Sohrabi and W. Yu, “Hybrid digital and analog beamforming design for largescale MIMO systems,” in Proc. of the IEEE International Conf. on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia, Apr. 2015.
 [50] F. BranchaudCharron, F. Rahman, and T. Lee, “Keras.” [Online]. Available: https://github.com/kerasteam
 [51] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al., “Tensorflow: A system for largescale machine learning.” in OSDI, vol. 16, 2016, pp. 265–283.
 [52] S. Hur, T. Kim, D. J. Love, J. V. Krogmeier, T. A. Thomas, and A. Ghosh, “Multilevel millimeter wave beamforming for wireless backhaul,” in Proc. of 2011 IEEE GLOBECOM Workshops (GC Wkshps), Houston, TX, 2011, pp. 253–257.
 [53] J. Lee, G.T. Gil, and Y. Lee, “Exploiting spatial sparsity for estimating channels of hybrid MIMO systems in millimeter wave communications,” in Proc. of IEEE Global Communications Conference (GLOBECOM), Dec 2014, pp. 3326–3331.
 [54] K. Venugopal, A. Alkhateeb, N. González Prelcic, and R. W. Heath, “Channel Estimation for Hybrid ArchitectureBased Wideband Millimeter Wave Systems,” in IEEE Journal on Selected Areas in Communications, vol.35, pp. 1996–2009, 2017.