I Introduction
Owing to the large available bandwidth, millimeter wave (mmWave) communications have been recognized as one of the key technologies for meeting the demand for unprecedentedly high data rates in future mobile networks [1]. By equipping large-scale antenna arrays, massive multiple-input multiple-output (MIMO) can provide sufficiently large array gains for spatial multiplexing and beamforming [2]. MmWave massive MIMO communications combine the merits of both and have thus attracted significant interest [3]. However, the expensive and power-hungry hardware used in mmWave bands is the main obstacle to equipping a dedicated radio frequency (RF) chain for each antenna. The mainstream solution to this problem is the two-stage hybrid architecture, where a large number of antennas are connected to far fewer RF chains via phase shifters [4], [5].
I-A Related Work
For mmWave massive MIMO systems with the hybrid architecture, both the analog and digital processing should be carefully designed to achieve performance comparable to that of fully-digital systems. In [4], a low-complexity hybrid precoding scheme at the base station (BS) has been proposed for the massive MIMO downlink with single-antenna users. The hybrid architecture has been further introduced to the user side in [6], where hybrid block diagonalization (HBD) has been used for the analog and digital processing design. By exploiting the sparsity of mmWave channels, the hybrid precoding and combining at both the transmitter and receiver have been optimized in [7]. The heuristic hybrid beamforming design in [8] can approach the performance of the fully-digital architecture. The alternating minimization algorithms for both fully-connected and sub-connected hybrid architectures in [9] have low complexity and limited performance loss. In [10], the hybrid processing along with channel estimation has been designed and analyzed for both sparse and non-sparse channels. The uniform channel decomposition and nonlinear digital processing have been introduced in [11] for hybrid beamforming design. In the existing works, the hybrid processing matrices at the transmitter and receiver are usually optimized separately due to the intractability of the joint optimization with nonconvex constraints, which leaves room for further performance improvement through joint optimization.

Deep learning (DL) has achieved great success in various fields, including computer vision [12], speech signal processing [13], [14], and so on, due to its unique ability to extract and learn inherent features. It has recently been introduced to wireless communications and has proved quite powerful in the optimization of communication systems [15]–[18] and resource allocation [19]–[23]. In [17], DL has been successfully applied to pilot-assisted signal detection for orthogonal frequency division multiplexing (OFDM) systems with non-ideal transceiver and channel conditions. For wideband mmWave massive MIMO systems in time-varying channels, channel correlation has been exploited by a deep convolutional neural network (CNN) in [24] to improve the accuracy and accelerate the computation of channel estimation. A deep neural network (DNN) has been utilized in [25] to model the mapping relationship among antennas for reliable channel estimation in massive MIMO systems with mixed-resolution ADCs. An autoencoder-like DNN has been developed in [26] to reduce the overhead of channel state information (CSI) feedback in the frequency division duplex massive MIMO system. In [27], a CNN has been utilized in CSI compression and decompression to significantly improve the recovery accuracy. By combining the residual network and CNN, an efficient channel quantization scheme has been proposed from the bit-level perspective in [28]. DL-based end-to-end optimization has been developed in [29] and [30] by breaking the block structures at the transceiver. DL has recently been used to design the hybrid processing matrices for massive MIMO systems with various transceiver architectures [31]–[35]. In [31], the analog and digital precoder design has been modeled as a DNN mapping based on geometric mean decomposition. In [32], a DNN has been applied to design the analog precoder for massive multiple-input single-output (MISO) systems. A deep CNN has been applied to learn the phases of the analog precoder and combiner for mmWave massive MIMO systems in [33]. For the same system, channel estimation and analog processing have been jointly optimized by DL with reduced pilot overhead in [34]. In [35], a deep CNN along with an equivalent channel hybrid precoding algorithm has been proposed to design the hybrid processing matrices.
I-B Motivation and Contribution
The research on DL-based hybrid processing for mmWave massive MIMO systems is still at an exploratory stage and has many open issues. The existing works have applied DL to design the analog precoder [32], the analog combiner [35], the analog precoder and combiner [33], [34], and the analog and digital precoders [31]. In other words, only part of the hybrid processing at the mmWave transceiver is currently designed by DL. In addition, conventional hybrid processing schemes are usually used to generate label matrices for the DNN to approximate, which limits the performance of the DL-based approaches. These problems in the existing works motivate us to propose a general DL-based joint hybrid processing framework (DLJHPF) with the following two unique features:


The framework jointly optimizes the analog and digital processing matrices at both the transmitter and receiver in an end-to-end manner without pre-designed label matrices. By doing this, it can be applied to various types of mmWave transceiver architectures and has the potential to surpass the performance of the existing schemes.
The main contributions of this paper are summarized as follows.


We model the joint analog and digital processing design for the transceiver as a DL-based framework, which consists of the NN-based hybrid processing designer, the signal flow simulator, and the NN-based signal demodulator. For the sake of practical implementation, it does not break the original block structures at the transceiver but still allows backpropagation (BP) based end-to-end optimization by minimizing the cross-entropy loss between the recovered and original bits. The trainability of DLJHPF is proved theoretically.

We extend the proposed framework to OFDM systems by simply modifying the structure of the training data. The extension does not complicate the framework architecture and keeps the training time relatively short even if the number of subcarriers is large.

We verify the effectiveness of the proposed framework by numerical results based on the 3rd Generation Partnership Project (3GPP) channel model, which depicts the real channel environment well. The proposed DLJHPF achieves a remarkable improvement in bit-error rate (BER) performance even with mismatched CSI and channel scenarios. Thanks to its careful design, DLJHPF reduces the runtime significantly by fully exploiting parallel computing and is thus more suitable for rapidly varying mmWave channels.
The rest of the paper is organized as follows. Section II describes the channel model and signal transmission process for the considered mmWave massive MIMO system. The proposed DLJHPF is elaborated in Section III. Simulation results are provided in Section IV to verify the effectiveness of the proposed framework and finally Section V gives concluding remarks.
Notations: In this paper, we use upper and lower case boldface letters to denote matrices and vectors, respectively. $\|\cdot\|_F$, $(\cdot)^T$, $(\cdot)^H$, and $\mathbb{E}\{\cdot\}$ represent the Frobenius norm, transpose, conjugate transpose, and expectation, respectively. $\mathcal{CN}(\mu, \sigma^2)$ represents the circularly symmetric complex Gaussian distribution with mean $\mu$ and variance $\sigma^2$. $[\mathbf{A}]_{i,j}$ and $[\mathbf{a}]_i$ denote the $(i,j)$th element of matrix $\mathbf{A}$ and the $i$th element of vector $\mathbf{a}$, respectively. $|\cdot|$ denotes the amplitude of a complex number.
II System Model
As shown in Fig. 1, we consider a point-to-point massive MIMO system working at mmWave bands, where the transmitter and the receiver are equipped with and antennas, respectively. To reduce the hardware cost and power consumption, and RF chains are used at the transmitter and the receiver, respectively, and are connected to the large-scale antenna arrays via phase shifters.
II-A Channel Model
Due to the sparse scattering property, the Saleh-Valenzuela channel model is widely used to depict the mmWave propagation environment, where the scattering of multiple rays forms several clusters. According to [7], the channel matrix between the receiver and the transmitter can be represented as
(1) 
where and denote the number of scattering clusters and the number of rays in each cluster, respectively, is the propagation gain of the th path in the th cluster with being the average power gain, and are the azimuth angles of arrival and departure (AoA/AoD) at the receiver and the transmitter, respectively, of the th path in the th cluster. (Footnote 1: The path gain is the fast fading and varies in the time scale of the channel coherence interval. Other parameters, , , , , are slow fading and may be unchanged in a large time scale compared to . The Doppler spread determines how often these channel parameters change.) For a uniform linear array with antenna elements and an azimuth angle of , the response vector can be expressed as
(2) 
where and denote the distance between the adjacent antennas and carrier wavelength, respectively.
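As an illustration of the clustered channel model in (1) and the array response in (2), the following is a minimal NumPy sketch. The extracted text has lost the original symbols, so every name here (`ula_response`, `sv_channel`, `n_cl`, `n_ray`, `angle_spread`) is hypothetical. The half-wavelength spacing, the unit-norm scaling of the response vector, the Laplacian angle spread, the unit average path power, and the normalization factor are assumptions borrowed from the common form of this model rather than values confirmed by the text.

```python
import numpy as np

def ula_response(n_ant, phi, d_over_lambda=0.5):
    """Unit-norm response vector of an n_ant-element uniform linear array.

    d_over_lambda is the antenna spacing divided by the carrier wavelength
    (half-wavelength spacing is an assumption).
    """
    k = np.arange(n_ant)
    return np.exp(1j * 2 * np.pi * d_over_lambda * k * np.sin(phi)) / np.sqrt(n_ant)

def sv_channel(n_t, n_r, n_cl, n_ray, rng, angle_spread=np.deg2rad(7.5)):
    """Draw one narrowband clustered channel matrix of shape (n_r, n_t)."""
    h = np.zeros((n_r, n_t), dtype=complex)
    for _ in range(n_cl):
        # Assumed: cluster mean angles are uniform in [0, 2*pi).
        mean_aoa, mean_aod = rng.uniform(0, 2 * np.pi, size=2)
        for _ in range(n_ray):
            # Assumed: rays are Laplacian-distributed around the cluster mean.
            aoa = mean_aoa + rng.laplace(scale=angle_spread / np.sqrt(2))
            aod = mean_aod + rng.laplace(scale=angle_spread / np.sqrt(2))
            # Assumed: path gain ~ CN(0, 1), i.e. unit average power gain.
            alpha = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
            h += alpha * np.outer(ula_response(n_r, aoa), ula_response(n_t, aod).conj())
    # Assumed normalization so that the average squared Frobenius norm is n_t * n_r.
    return np.sqrt(n_t * n_r / (n_cl * n_ray)) * h
```

A call such as `sv_channel(64, 16, 3, 5, np.random.default_rng(0))` would return one 16-by-64 channel realization under these assumptions.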
In the above channel model, we assume that the transmitted signal is narrowband and, therefore, the channel matrix is independent of frequency. For wideband transmission, OFDM is used to convert a frequency-selective channel into multiple flat-fading channels, and the corresponding channel matrices differ across subcarriers. Accordingly, the design of DLJHPF in Section III starts with narrowband systems and is then extended to wideband OFDM systems.
II-B Signal Transmission
The transmitter sends parallel data streams to the receiver through the wireless channel. The bits of each data stream are first mapped to the symbol by the ary modulation. The symbol vector intended for the receiver, with , is successively processed by the digital precoder, , at the baseband and the analog precoder, , through the phase shifters, yielding the transmitted signal
(3) 
where denotes the transmit power. represents the phaseonly modulation by the phase shifters and thus has the constraint of , . is normalized as to satisfy the total power constraint at the transmitter. Then the received signal at the receiver is given by
(4) 
where is additive white Gaussian noise (AWGN) with elements.
The received signal is then processed by the hybrid architecture at the receiver as
(5) 
where and represent the analog combiner and digital combiner, respectively. A hardware constraint is imposed on such that , similar to . Then the detected signal vector, , is demodulated to recover the original bits of data streams.
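The transmission chain in (3) to (5) can be sketched as a single function. This is only an illustrative sketch: the function name `simulate_link` and all argument names are hypothetical, and applying the combiners through their conjugate transposes is an assumption, since the extracted text does not show whether the original equations use the combiner matrices directly or their Hermitian transposes.

```python
import numpy as np

def simulate_link(s, h, f_rf, f_bb, w_rf, w_bb, power, noise_var, rng):
    """Return the detected symbol vector for one channel use.

    s      : (n_s,) symbol vector
    h      : (n_r, n_t) channel matrix
    f_rf   : (n_t, n_rf_t) analog precoder, f_bb : (n_rf_t, n_s) digital precoder
    w_rf   : (n_r, n_rf_r) analog combiner, w_bb : (n_rf_r, n_s) digital combiner
    """
    # Transmitted signal after digital and analog precoding, cf. (3).
    x = np.sqrt(power) * f_rf @ f_bb @ s
    # Complex AWGN with per-element variance noise_var.
    n_r = h.shape[0]
    noise = np.sqrt(noise_var / 2) * (rng.standard_normal(n_r) + 1j * rng.standard_normal(n_r))
    # Received signal at the antenna array, cf. (4).
    y = h @ x + noise
    # Analog then digital combining, cf. (5); Hermitian convention is assumed.
    return w_bb.conj().T @ w_rf.conj().T @ y
```

With identity matrices for the channel and all processing stages and zero noise, the output is simply the symbol vector scaled by the square root of the transmit power, which is a convenient sanity check.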
Since the performance of the digital communication system is ultimately determined by BER, we aim to jointly design , , , and to minimize the BER between the original and demodulated bits, that is
(6)  
(7)  
(8)  
(9) 
The BER in (6) is a complicated nonlinear function of , , , and without a closed-form expression, and the constraints in (7) and (8) are nonconvex, which makes this optimization problem intractable for traditional approaches. DL, trained with the BP algorithm, is a potential solution, and thus we develop DLJHPF to address this problem.
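Since the symbols of (6) to (9) did not survive extraction, the block below is a hedged reconstruction of the problem from the surrounding description only. The symbol names ($\mathbf{F}_{\rm RF}$, $\mathbf{F}_{\rm BB}$, $\mathbf{W}_{\rm RF}$, $\mathbf{W}_{\rm BB}$, $N_{\rm t}$, $N_{\rm r}$, $N_{\rm s}$), the $1/\sqrt{N}$ amplitude of the phase-shifter entries, and the right-hand side of the power constraint are assumptions following common usage in the hybrid precoding literature, not values confirmed by the text.

```latex
\begin{align}
\min_{\mathbf{F}_{\rm RF},\,\mathbf{F}_{\rm BB},\,\mathbf{W}_{\rm RF},\,\mathbf{W}_{\rm BB}} \quad
  & \mathrm{BER} \\
\text{s.t.} \quad
  & \big|[\mathbf{F}_{\rm RF}]_{i,j}\big| = \tfrac{1}{\sqrt{N_{\rm t}}}, \quad \forall i, j, \\
  & \big|[\mathbf{W}_{\rm RF}]_{i,j}\big| = \tfrac{1}{\sqrt{N_{\rm r}}}, \quad \forall i, j, \\
  & \big\|\mathbf{F}_{\rm RF}\mathbf{F}_{\rm BB}\big\|_F^2 = N_{\rm s}.
\end{align}
```

The two constant-modulus constraints correspond to the phase-only hardware restriction on the analog precoder and combiner, and the last line is the total transmit power normalization on the digital precoder.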
III Proposed DLJHPF
In this section, we first briefly review the existing work on DNN-based end-to-end communications. Then we propose DLJHPF, where the framework is first described, followed by the details of training, deployment, and testing along with the corresponding complexity analysis. Finally, we extend the framework to OFDM systems over wideband mmWave channels.
III-A DNN-based End-to-End Communications
Prior works have shown that DNN-based end-to-end optimization is an efficient tool to minimize BER. The BP algorithm makes DNN-based end-to-end communications over the air possible so long as the optimized performance metric is differentiable [15], [29], [30]. For the DNN-based end-to-end communication system, the modules at the transmitter and the receiver are each replaced by a DNN. Specifically, the DNN at the transmitter encodes the original symbols into the transmitted signal and the one at the receiver recovers the original symbols from the output of the wireless channel. In the training stage, the error between the original and recovered symbols is computed and the weights of the two DNNs are adjusted iteratively based on the error gradient propagated from the output layer of the DNN at the receiver to optimize the recovery accuracy.
In this paper, we focus on the DL-based joint analog and digital processing design for the transceiver in mmWave massive MIMO systems. However, the existing DNN-based end-to-end communication is not suitable for this task since it integrates the modules of the transceiver into two DNNs and thus cannot meet the hardware and power constraints in practical implementation. To address this challenge, we design DLJHPF as follows.
III-B Framework Description
As shown in Fig. 2, the proposed DLJHPF consists of three parts: hybrid processing designer, signal flow simulator, and NN demodulator, which are elaborated as follows.
Hybrid processing designer: It uses NNs to output the hybrid processing matrices for the transceiver based on the channel matrix, . It includes six fully-connected NNs that generate the analog and digital processing matrices for the transmitter and the receiver. Specifically, is first converted to a real-valued vector. (Footnote 2: In Fig. 2, only the main process of the framework is shown, while the matrix and vector reshaping process is omitted.) Then it is input into two NNs, called the precoder phase NN (PPNN) and the combiner phase NN (CPNN), to generate the corresponding phases, and , respectively, for the phase shifters. With and , two complex-valued vectors with constant-amplitude elements are generated as
(10) 
(11) 
based on which, and are given by
(12) 
(13) 
where denotes the operation reshaping a vector to a matrix. Then, and along with are used to generate a low-dimensional equivalent channel, i.e.,
(14) 
is converted to a real-valued vector before it is input into four parallel NNs. The first two NNs, corresponding to the real part digital combiner NN (ReDCNN) and the imaginary part digital combiner NN (ImDCNN), output two vectors, , , respectively. Then can be obtained as
(15) 
Another two NNs, corresponding to the real part digital precoder NN (ReDPNN) and the imaginary part digital precoder NN (ImDPNN), output two vectors, , , respectively. Then the unnormalized digital precoder is given by
(16) 
The following normalization utilizes and in (12) to output the final digital precoder as
(17) 
Signal flow simulator: In the training stage, it simulates the process from the original bits, , to the detected signal, , over the channel, , with AWGN, , where with the size of , , and are generated in the simulation environment. It bridges the back propagation of the error gradient from NN demodulator to hybrid processing designer as we will elaborate in Section III.C. In the deployment and testing stage, the signal flow simulator is replaced by the actual transceiver and the actual wireless fading channel. In these two stages, the analog and digital processing matrices at the transceiver are provided by the hybrid processing designer based on the simulated or actual .
NN demodulator: It is a fully-connected NN, which receives the detected signal, , from the signal flow simulator (in the training stage) or the actual receiver (in the testing stage) and outputs recovered bits , each element of which lies in the interval . is then reshaped to with the same size as .
Remark 1. The learning of hybrid processing matrices, , , , and , in DLJHPF is embedded into the signal transmission and demodulation process instead of approximating predesigned label matrices. All NNs are optimized jointly sharing the mapping principle from at the transmitter to at the receiver that resembles an autoencoder. By minimizing the error between and , each NN in hybrid processing designer can learn to output the appropriate vectors with specific meaning implicitly, i.e., phases of phase shifters and real and imaginary parts of the digital precoder and combiner. By doing this, DLJHPF will have the potential to break through the performance of the existing schemes.
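The matrix construction performed by the hybrid processing designer in (10) to (17) can be sketched in NumPy as follows. The NN outputs are treated here as plain arrays, so this shows only the custom-layer arithmetic and not the networks themselves. All function names are hypothetical, and the $1/\sqrt{N}$ amplitude of the analog entries, the Hermitian form of the equivalent channel, and the target value of the power normalization are assumptions, because the original symbols were lost in extraction.

```python
import numpy as np

def phases_to_analog(theta, n_ant, n_rf):
    """Map n_ant * n_rf phases to a constant-modulus analog matrix, cf. (10)-(13)."""
    v = np.exp(1j * theta) / np.sqrt(n_ant)  # assumed 1/sqrt(n_ant) amplitude
    return v.reshape(n_ant, n_rf)

def equivalent_channel(h, f_rf, w_rf):
    """Low-dimensional equivalent channel, cf. (14); Hermitian combining is assumed."""
    return w_rf.conj().T @ h @ f_rf

def parts_to_matrix(re, im, shape):
    """Combine real and imaginary output vectors into a complex matrix, cf. (15), (16)."""
    return (re + 1j * im).reshape(shape)

def normalize_precoder(f_rf, f_bb_raw, n_s):
    """Scale the digital precoder so that ||f_rf @ f_bb||_F^2 equals n_s, cf. (17)."""
    return np.sqrt(n_s) * f_bb_raw / np.linalg.norm(f_rf @ f_bb_raw, 'fro')
```

Because every step is built from elementary differentiable operations, the same arithmetic can be expressed as custom layers inside an automatic differentiation framework, which is what allows the error gradient to flow back to the phase and digital NNs.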
III-C Framework Training
The goal of offline training is to determine the weights of the NNs in hybrid processing designer and NN demodulator based on the training samples with the input tuple and the label , where is generated by certain channel model and is generated according to the distribution. By minimizing the endtoend error between the original bits, , and the recovered bits, , the weights of each NN in DLJHPF are adjusted iteratively and the training procedure is elaborated as follows.
The proposed DLJHPF is actually an integrated DNN consisting of neuron layers and custom layers. The training model in Fig. 3 demonstrates the detailed training process of the framework. For each training sample, is converted into a real-valued vector by matrix-to-vector reshaping and real and imaginary parts stacking, which is input into PPNN and CPNN consisting of dense and batch normalization (BN) layers to generate the corresponding phases, and , respectively. Then (10) and (11) are executed by the same custom layer. Afterwards, the output vectors are reshaped according to (12) and (13) to generate and , respectively. Next, (14) is executed by a custom layer to generate , followed by matrix-to-vector reshaping and real and imaginary parts stacking. This vector is input into four NNs consisting of dense and BN layers, i.e., ReDCNN, ImDCNN, ReDPNN, and ImDPNN, respectively. The output vectors of the former two NNs are used to generate through real and imaginary parts combining and vector-to-matrix reshaping as (15). Using the same operation, the output vectors of the latter two NNs are used to generate as (16). After obtaining , a custom layer is added to perform the normalization in (17) to generate . Then (5) is executed through a custom layer by using the input tuple and the generated , , , and to yield the detected signal, . After real and imaginary stacking, is converted to a real-valued vector and input into the NN demodulator consisting of dense and BN layers to output the recovered bits, , which is then reshaped to . The binary cross-entropy (BCE) loss between and is calculated as
(18)
where denotes the number of training samples, superscript is added to indicate the index of the training sample, and is expressed as the function of the parameter set of all NNs in DLJHPF, i.e., .
Recalling the optimization problem in (6), the BER over the training set can be written as
(19) 
where is the binary demodulated bit matrix with for and otherwise. With fixed, minimizing in (18) with respect to yields , which also minimizes in (19). Therefore, DLJHPF can directly minimize the BER over the training set by minimizing the BCE loss and the feasibility is guaranteed by the following theorem.
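The relation between the BCE loss in (18) and the BER in (19) can be illustrated with a short NumPy sketch. The function names are hypothetical, averaging over all bits and samples is an assumed normalization, and the 0.5 decision threshold is an assumption consistent with a Sigmoid output layer.

```python
import numpy as np

def bce_loss(bits, probs, eps=1e-12):
    """Binary cross-entropy between original bits and soft outputs in (0, 1)."""
    probs = np.clip(probs, eps, 1 - eps)  # avoid log(0)
    return -np.mean(bits * np.log(probs) + (1 - bits) * np.log(1 - probs))

def ber(bits, probs, threshold=0.5):
    """Bit-error rate after hard decisions on the soft outputs."""
    hard = (probs > threshold).astype(bits.dtype)
    return np.mean(hard != bits)
```

When the soft outputs approach the original bits, the BCE loss approaches zero and the hard decisions coincide with the original bits, so the BER is zero as well. For outputs in between, the differentiable BCE loss acts as a surrogate for the non-differentiable BER.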
Theorem 1. The proposed DLJHPF is trainable and can minimize the BCE loss through BP algorithm.
Proof:
Considering mini-batch training, the BCE loss over a batch is written as
(20)  
where denotes the batch size. Then will be updated times in each epoch.
To prove Theorem 1, we need to show that is differentiable with respect to each parameter in . According to [36], the outputs are differentiable with respect to the corresponding weights and inputs for each NN in DLJHPF. Since DLJHPF can be viewed as an integrated DNN consisting of neuron layers and custom layers, the proof can be further simplified to proving that is differentiable with respect to the outputs of each NN due to the chain rule. In the following, we prove the differentiability of with respect to the outputs of each NN by incorporating the custom layers.
NN demodulator: From (20), is differentiable with respect to .
Re/ImDCNN: As mentioned in Section III.B, and are the outputs of ReDCNN and ImDCNN, respectively. Without loss of generality, we will prove that is differentiable with respect to and . According to (5) and (15), is the function of and , that is
(21)  
where and denotes the component of independent of and with the subscripts ‘re’ and ‘im’ indicating the real and imaginary parts, respectively. Since and are a part of inputs of NN demodulator, is differentiable with respect to and . Then we have
(22)  
(23)  
Re/ImDPNN: Since and are the outputs of ReDPNN and ImDPNN, respectively, we also aim to prove that is differentiable with respect to and . Considering the normalization in (17), we first calculate the derivatives of with respect to the real and imaginary parts of , i.e., and , which can be obtained similarly to (22) and (23). According to (17), we have
(24) 
(25) 
where with and independent of and . Then we can find that and are differentiable with respect to and , which leads to
PPNN: We again aim to prove that is differentiable with respect to , which is one of the outputs of PPNN and generates and as
(28)  
From (5) and (14), and influence the values of and . According to the previous proof, is differentiable with respect to the real and imaginary parts of each element in and . , , , and , are also differentiable with respect to and . Resorting to chain rule, we have
By considering (28)(IIIC), we arrive at
(31)  
CPNN: The proof is similar to that of PPNN and thus is omitted for simplicity.
Now we have shown is differentiable with respect to each parameter in , which completes the proof.
It can be seen that the proposed DLJHPF is abstracted into an integrated DNN, where the hybrid processing matrices, , , , and , are essentially the trainable weights therein. From the proof of Theorem 1, each weight of this integrated DNN can be optimized iteratively through the BP algorithm by minimizing the BCE loss. Therefore, precoding and combining matrices optimized for the training set are obtained.
For the NNs in Fig. 3, each dense layer uses the rectified linear unit (ReLU) activation function and is followed by a BN layer to avoid gradient diffusion and overfitting. The number of dense layers and the number of neurons in each dense layer need to be adjusted according to the input and output dimensions. Since the outputs of the NNs will be used for hybrid processing at the transmitter and the receiver, the activation functions of the output layers should be carefully designed and are elaborated as follows.
PPNN and CPNN: The two NNs generate the phases for and , respectively. Since (10) and (11) are periodic functions, the ReLU activation function is used in the output layer to provide the unbiased output for all possible phases. We may also use Sigmoid or hyperbolic tangent as the activation function, after which the outputs are multiplied by or to obtain the final phases with the range of or . According to the simulation trials, ReLU and hyperbolic tangent achieve almost the same performance while Sigmoid performs worse. Therefore, ReLU is preferable since it is simple and avoids evaluating exponential functions.
Re/ImDPNN and Re/ImDCNN: The four NNs generate the real and imaginary parts for and , respectively. Since can be normalized by (17) while has no constraint, the output layers do not apply any activation function to impose constraints and directly output the values that are input into the neurons.
NN demodulator: This NN approximates the original bits, , based on . The approximation for each element in is a binary classification and thus the Sigmoid activation function is used for the output layer of the NN demodulator.
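The three candidate output activations for the phase NNs can be compared with a small NumPy sketch. The function names are hypothetical. The scaling factors (pi for the hyperbolic tangent, 2*pi for Sigmoid) and the resulting ranges are assumptions, since the extracted text omits the actual constants.

```python
import numpy as np

def phase_relu(z):
    """ReLU output: nonnegative and unbounded; the periodicity of exp(j*theta) wraps it."""
    return np.maximum(z, 0.0)

def phase_tanh(z):
    """Hyperbolic tangent scaled by pi (assumed), giving phases in [-pi, pi]."""
    return np.pi * np.tanh(z)

def phase_sigmoid(z):
    """Sigmoid scaled by 2*pi (assumed), giving phases in [0, 2*pi]."""
    return 2 * np.pi / (1.0 + np.exp(-z))

def to_unit_modulus(theta):
    """Any real phase maps to a unit-modulus complex entry."""
    return np.exp(1j * theta)
```

Whichever activation is chosen, the subsequent complex exponential guarantees the constant-modulus constraint, so the choice affects only how the phase range is covered during training and not the feasibility of the output.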
III-D Deployment and Testing
In this subsection, we elaborate on the deployment and testing of the trained DLJHPF for practical implementation, where is assumed to be available at both the transmitter and the receiver. (Footnote 3: Although only the estimated channel is available in practical implementation, the simulation results show that a relatively accurate channel estimate hardly causes any performance loss and is almost equivalent to .)
The practical deployment of DLJHPF includes the following three parts:
Deployment of hybrid processing designer: PPNN and CPNN will be deployed together at both the transmitter and the receiver to output the analog processing matrices, and , based on which the equivalent channel, , can be generated via (14). ReDPNN and ImDPNN are equipped at the transmitter to generate the digital precoder, , while ReDCNN and ImDCNN are equipped at the receiver to generate the digital combiner, , both based on .
Deployment of signal flow simulator: It is only used for the training stage and will be replaced by the actual transceiver and wireless fading channel in the deployment and testing stage.
Deployment of NN demodulator: It will be deployed at the receiver to output the recovered bits, , based on the detected signal, , after compensating the impact of the fading channel.
When testing the trained DLJHPF in the real world, the channel may change rapidly due to the relative motion of the transceiver and scatterers, in which case DLJHPF will face new propagation scenarios with channel statistics different from those of the training stage. This channel scenario discrepancy places a high requirement on the robustness of DLJHPF. Fortunately, the offline trained framework in Section III.C is quite robust to new channel scenarios that have not been observed before, as shown by our simulation results (Figs. 7 and 9). Further online fine-tuning may provide only marginal performance improvement but requires a relatively large overhead and needs to be performed frequently in rapidly changing channel scenarios. In addition, only the NNs at the receiver can be fine-tuned and thus the performance after fine-tuning will still have an intrinsic loss compared to the end-to-end training in Section III.C. To sum up, the proposed framework can cope with the mismatch of the channel scenario without relying on fine-tuning in most cases.
III-E Complexity Analysis
In this subsection, we analyze the computational complexity of the proposed DLJHPF in the testing stage in terms of the required number of floating point operations (FLOPs). According to Fig. 3, the total required FLOPs of all neural layers in DLJHPF is given by
(32) 
where denotes the set including all NNs in DLJHPF, and represent the number of neural layers and the number of neurons of the th neural layer of the NN .
In addition, the complexity of matrix multiplications in the framework is given by
(33) 
Then, the total complexity of the proposed DLJHPF can be expressed as
(34) 
It is noted that the NNs can be run efficiently via parallel computing on a graphics processing unit (GPU), and the simple matrix multiplications cause only negligible computational load for the central processing unit (CPU) compared with the existing schemes. Therefore, the proposed DLJHPF has low complexity and a very limited runtime.
III-F Extension to OFDM Systems
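To make the FLOP accounting for the neural layers concrete, the following is a small Python sketch. The exact expression in (32) is not recoverable from the extracted text, so the counting convention here (one multiplication and one addition per weight in each dense layer, with biases and BN ignored) is an assumption, and the function names and layer sizes are hypothetical.

```python
def dense_flops(layer_sizes):
    """Approximate FLOPs of one forward pass through a stack of dense layers.

    layer_sizes lists the number of neurons per layer, input layer first.
    Assumed convention: 2 * n_in * n_out FLOPs per dense layer.
    """
    return sum(2 * n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def total_dense_flops(networks):
    """Sum the dense-layer FLOPs over a dict mapping NN name to its layer sizes."""
    return sum(dense_flops(sizes) for sizes in networks.values())
```

For example, `total_dense_flops({"PPNN": [4, 8, 2], "CPNN": [4, 8, 2]})` evaluates the two hypothetical networks and adds their costs; the same pattern extends to the four digital NNs and the demodulator.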
In this subsection, we extend the proposed DLJHPF to the wideband OFDM systems. Two key issues need to be considered for the extension:


In the OFDM systems, the digital precoder and combiner can be designed independently for different subcarriers while the analog precoder and combiner must be shared by all subcarriers. It is critical to design the unified analog precoder and combiner performing well for all subcarriers.

It is important to maintain the relatively small size, i.e., the number of hidden layers and the number of neurons in each layer in the NNs, and short training time for DLJHPF when the number of subcarriers is large.
In the following, we study how to address the two issues when extending DLJHPF to the OFDM systems.
According to [24], the channel matrix between the receiver and the transmitter of the th subcarrier can be expressed as
(35) 
where , , , and denote the delay of the th cluster, the sampling rate, and the number of OFDM subcarriers, respectively. The signal transmission model in (5) becomes subcarrier dependent. (Footnote 4: Although and are also different for different subcarriers, they are independent of the channel and thus the index in them is omitted.) The detected signal of the th subcarrier is given by
(36)  
In the following, we propose a simple method to design the structure of training data so that the DLJHPF in Section III.C can be flexibly extended to OFDM systems without changing the framework architecture. That is, both the framework size and training time will not be increased. The process of training and testing is detailed as follows.
Training: Compared to the training sample with the input tuple in Section III.C, we modify the input tuple as