I Introduction
Multiple-antenna technology, also known as multiple-input multiple-output (MIMO), is one of the most important techniques for advanced wireless communications systems. It has already been incorporated into many wireless standards, e.g., 802.11n/ac [1] and LTE 4G [2]. It has also been shown theoretically that MIMO can increase spectrum efficiency linearly with the numbers of transmit and receive antennas [3]. Of much interest are low-complexity MIMO functional units that have good performance.
A MIMO transmitter transmits multiple data streams, one on each transmit antenna. A MIMO receiver receives a multiplexed copy of the multiple data streams plus noise on each receive antenna. A MIMO detector demultiplexes and decodes the multiplexed data on all the receive antennas into the originally transmitted multiple data streams plus noise and interference.
To achieve near-capacity performance, advanced channel coding schemes, such as LDPC and polar codes, have been suggested for 5G systems [4, 5]. These channel codes protect the data streams against channel fading, interference, and noise. The output of a MIMO detector consists of a noisy version of the codeword transmitted by the transmitter. The function of channel decoding is to map the noisy codeword to the original information bits at the transmitter.
For optimal MIMO decoding, MIMO detection and channel decoding need to be performed in a joint manner. The conventional MIMO decoding schemes all use a model-based approach. However, due to the complex MIMO signal model, the optimal solution to the joint MIMO detection and channel decoding problem (i.e., the maximum likelihood decoding of the transmitted codewords from the received MIMO signals) is computationally infeasible.
As a practical measure, the current model-based MIMO receivers all use suboptimal MIMO decoding methods with affordable computational complexities. For example, instead of joint MIMO detection and channel decoding, [6, 7, 8] proposed to perform MIMO detection and channel decoding sequentially and separately, where MIMO detection is realized by linear equalization with zero-forcing (ZF) or minimum mean square error (MMSE) criteria. By contrast, [9, 10, 11] proposed to perform MIMO detection and channel decoding iteratively, with soft information exchanged between the two components. In this way, MIMO detection and channel decoding are performed in a joint manner. However, to contain complexity, the original MIMO signal model is relaxed and replaced by an approximate model (i.e., one that models the MIMO signal and the channel code separately). As a result, the solutions are still suboptimal. This leaves a gap for further performance improvement with better MIMO decoder designs.
To narrow the performance gap, this work applies recent advances in deep learning to the design of MIMO receivers. In particular, we leverage deep neural networks (DNNs) with supervised training to solve the joint MIMO detection and channel decoding problem. We show that a DNN can be trained to give much better decoding performance than conventional MIMO receivers do. Our simulations show that a DNN implementation consisting of seven hidden layers can outperform conventional model-based linear or iterative receivers.
I-A Related Work
Many MIMO detection schemes have been proposed [12]. Linear MIMO detection can first be used to cancel multiple-antenna interference with low complexity; after that, channel decoding is performed [6, 7, 8]. In these schemes, linear MIMO detection and channel decoding operate in a sequential manner. Since linear MIMO detection introduces noise amplification and correlation, such sequential linear MIMO detection and channel decoding schemes typically result in large performance loss due to the mismatch between the noise models at the output of the MIMO detector and the input of the channel decoder.
To enhance the performance of MIMO detection, nonlinear MIMO detectors have also been proposed, e.g., MIMO detectors based on sphere decoding [13, 14, 15], semidefinite relaxation [16, 17], and lattice reduction [18, 15]. Unfortunately, these nonlinear MIMO detectors can only output hard estimates of channel symbols, making them incompatible with modern channel decoders that require soft input to achieve superior decoding performance.
Sphere decoding and list decoding algorithms have been used for soft MIMO detection [19, 9, 10, 11], which produces soft output. This soft information can then be fed to a channel decoder. Moreover, information exchange can be performed iteratively between soft MIMO detection and channel decoding to improve the overall performance of MIMO decoding. Although these iterative MIMO decoding schemes perform better than the sequential schemes, their solutions are still approximate and suboptimal, due to the mismatch between the noise model of the soft output of the MIMO detector and the assumed noise model at the input of the channel decoder. Furthermore, iterative information exchange introduces large decoding latencies.
Unlike the above model-based approaches, [20] proposed a deep learning approach for MIMO detection. Specifically, the method approximates MIMO detection using deep neural networks (DNNs). The method progressively improves the approximation by adjusting the weights of a DNN based on a series of training MIMO signals. Compared with model-based MIMO detection, deep-learning MIMO detection achieves similar detection accuracy with faster detection speed. However, this deep-learning MIMO detection scheme can only perform hard MIMO detection and cannot be combined with a soft channel decoding scheme.
A DNN was first used to perform channel decoding in [21], followed by further work in [22, 23]. It was shown that DNN channel decoding can approach MAP performance with lower decoding latency than traditional channel decoding. Work [24] employed a neural network constructed by unfolding the factor graph of linear codes to improve the performance of belief propagation decoding when the factor graph of the linear codes contains many small loops. Work [25] investigated the DNN-based joint equalization and channel decoding problem for non-MIMO systems. A survey on the applications of deep learning to wireless systems can be found in [26].
The remainder of this paper is organized as follows. Section II presents the system model of MIMO systems. Section III reviews the existing model-based MIMO receivers. Section IV presents our deep learning MIMO receiver. Section V provides the simulation results. Finally, Section VI concludes the paper.
II System Model
This section presents the system model of MIMO systems and the format of the received MIMO signals.
Consider a MIMO system where the transmitter is equipped with antennas and the receiver is equipped with antennas. The channel between each transmit-receive antenna pair is assumed to incur frequency-flat fading and the channel state remains constant within one transmitted packet. We assume and parallel data streams are transmitted, one on each transmit antenna.
Figure 1 shows the block diagram of the MIMO transmitter. At the transmitter side, a vector of information bits, , is first channel-encoded into a codeword vector of length , where is the code rate. The valid set of codewords is denoted by and thus . The coded bits in vector are modulated to a vector of complex data symbols, , where is the number of code bits per complex data symbol. The modulation constellation is scaled so that the modulated symbols in have unit average power. Through serial-to-parallel conversion, the vector is partitioned into consecutive data vectors of length , , i.e., we have . Then, pilot vectors of length , , are prepended to the data vectors to form an signal matrix , where is the data matrix that contains the data vectors, and is the pilot matrix that contains the pilot vectors. We assume to facilitate the channel estimation [27]. The signal matrix represents one transmitted packet. The symbols of the th column vector in the signal matrix are simultaneously transmitted on the transmit antennas in the th time slot.

At the receiver side, the received signals are written into an matrix, , where the th vector contains the received signals on the receive antennas in the th time slot. The received signal matrix can be written as
(1) 
where is an complex channel matrix with zero-mean and variance independent complex Gaussian entries, and is the additive white Gaussian noise (AWGN) matrix that has zero-mean and unit-variance independent complex Gaussian entries. We also divide the received signal matrix and the AWGN matrix into two subparts: , , where is the matrix that contains the received signal vectors for the transmitted pilot vectors, is the matrix that contains the received signal vectors for the transmitted data vectors, and , , are the matrices containing the noise components in , , respectively. The aim of the MIMO receiver is to decode the transmitted information bits in from the received signal matrix .

For comparison with our proposed MIMO receiver, in Section III we review some conventional model-based MIMO receivers.
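The signal model of this section can be sketched numerically as follows. This is only an illustration: the inline symbols are not recoverable from the text here, so the names `X`, `H`, `W`, `Y` and the parameter `h_var` (the per-entry channel variance) are assumptions chosen for the example, and BPSK symbols stand in for the modulated data.

```python
import numpy as np

def crandn(rng, shape, var=1.0):
    """I.i.d. zero-mean circularly symmetric complex Gaussian samples."""
    scale = np.sqrt(var / 2.0)
    return scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

def simulate_packet(X, n_rx, h_var, rng):
    """Send the (n_tx x T) signal matrix X over a flat-fading MIMO channel.

    h_var is an assumed name for the channel-entry variance; the noise has
    unit variance, as stated in the text.
    """
    n_tx, T = X.shape
    H = crandn(rng, (n_rx, n_tx), var=h_var)  # constant over the whole packet
    W = crandn(rng, (n_rx, T), var=1.0)       # AWGN
    Y = H @ X + W                             # received signal matrix
    return Y, H, W

rng = np.random.default_rng(0)
X = 1.0 - 2.0 * rng.integers(0, 2, size=(2, 8))  # 2 antennas, 8 time slots, BPSK
Y, H, W = simulate_packet(X, n_rx=2, h_var=1.0, rng=rng)
```

Each column of `Y` holds the samples seen on all receive antennas in one time slot, which matches the column-per-slot layout described above.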
III Model-based MIMO Receivers
Traditional MIMO receivers have been extensively studied in the literature and implemented in real systems. This section gives a brief overview of these MIMO receivers.
A symbol-wise optimal MIMO receiver decodes each information bit, , from the received signal matrix by minimizing the symbol error probability or, equivalently, maximizing the a posteriori probability (APP):
(2) 
where denotes the estimate of the information bit , and . The problem as expressed in (2) is in fact a joint MIMO detection and channel decoding problem, since data symbol detection and channel decoding are implicitly performed in (2). We point out that joint MIMO detection and channel decoding as in (2) requires knowledge of the channel matrix . In practice, the channel matrix is typically estimated from the received pilot signals , e.g., the least-squares (LS) estimate of the channel matrix is given by: [27]; then, the channel matrix estimate is substituted back into (2) to replace the real channel matrix .
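As an illustration of pilot-based estimation, the sketch below uses the textbook least-squares form, the received pilot block multiplied by the pseudo-inverse of the pilot matrix. The LS expression did not survive in the text above, so treating it as this standard form is an assumption, as are the names `ls_channel_estimate`, `hadamard`, `Y_p`, and `X_p`. Hadamard pilots are used because the simulation section mentions them.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    M = np.array([[1.0]])
    while M.shape[0] < n:
        M = np.block([[M, M], [M, -M]])
    return M

def ls_channel_estimate(Y_p, X_p):
    """LS estimate: Y_p times the pseudo-inverse of the pilot matrix X_p.

    For a full-row-rank X_p this equals Y_p X_p^H (X_p X_p^H)^{-1}.
    """
    return Y_p @ np.linalg.pinv(X_p)

# Two transmit antennas, four pilot slots: use two rows of a 4x4 Hadamard matrix.
X_p = hadamard(4)[:2, :]
```

With orthogonal pilot rows, the product of `X_p` with its conjugate transpose is a scaled identity, so the estimate reduces to a simple correlation of the received pilots with the known pilot sequences.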
Even with the above approximation which replaces by , the exact computation of APP, , is difficult and highly complex. The computation difficulty is due to: i) the correlation among the data symbols introduced by channel encoding; ii) the parallel signal interference caused by the MIMO channel. Therefore, suboptimal MIMO detection and channel decoding schemes with manageable implementation complexities are typically used in practice. We overview two suboptimal schemes in the following.
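To make the complexity argument concrete, here is a brute-force sketch of the bitwise APP computation for a toy system. It enumerates every information vector, which is only feasible for very short codes. The generator matrix `G`, the BPSK mapping, equal priors on the information bits, and the function name `bitwise_app` are all assumptions made for the example; the toy code is not the polar code used later in the paper.

```python
import itertools
import numpy as np

def bitwise_app(Y, H, G, n_tx):
    """Return P(b_k = 1 | Y) for each information bit by full enumeration.

    Assumes BPSK, unit-variance complex AWGN, equally likely messages, and a
    linear code with K x N binary generator matrix G (N divisible by n_tx).
    """
    K, N = G.shape
    T = N // n_tx
    msgs = np.array(list(itertools.product([0, 1], repeat=K)))  # 2^K candidates
    log_lik = np.empty(len(msgs))
    for i, b in enumerate(msgs):
        c = b @ G % 2                              # channel encoding
        X = (1.0 - 2.0 * c).reshape(T, n_tx).T     # BPSK + serial-to-parallel
        log_lik[i] = -np.linalg.norm(Y - H @ X) ** 2
    w = np.exp(log_lik - log_lik.max())            # numerically stable weights
    w /= w.sum()
    return msgs.T @ w                              # marginalize onto each bit

G = np.array([[1, 1, 0, 1],
              [0, 1, 1, 1]])
```

The loop runs over all candidate messages, so the cost grows exponentially with the number of information bits, which is why practical receivers fall back on the suboptimal schemes reviewed next.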
III-A Linear MIMO Receivers
One suboptimal MIMO detection and channel decoding approach is to cancel the parallel signal interference with a linear MIMO detection first and then perform channel decoding next. We refer to this approach as linear MIMO receivers. For example, the zero-forcing (ZF) detection [6] removes the interference by
(3) 
where is the post-cancellation signal and is the post-cancellation noise. Since parallel signal interference is already removed in (3), the post-cancellation signals, , can be fed to a traditional channel decoder to recover data symbols. Figure 2 shows the block diagram for this linear MIMO receiver.
There is no loss of information in (3) since one can get back from . The suboptimality in the linear MIMO decoding arises from the fact that the traditional channel decoder assumes the transformed noise is white, but it is actually not after the transformation in (3). Although the complexity of this linear MIMO receiver is low, its performance is far from optimal.
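The following sketch illustrates zero-forcing with a pseudo-inverse and shows why the post-cancellation noise is no longer white. The exact form of (3) is not recoverable here, so the pseudo-inverse formulation and the names `zf_detect` and `zf_noise_covariance` are assumptions for illustration.

```python
import numpy as np

def zf_detect(Y_d, H_hat):
    """Zero-forcing: multiply the received data block by the pseudo-inverse of H_hat."""
    return np.linalg.pinv(H_hat) @ Y_d

def zf_noise_covariance(H_hat):
    """Covariance of the post-cancellation noise for unit-variance white input noise."""
    P = np.linalg.pinv(H_hat)
    return P @ P.conj().T

rng = np.random.default_rng(2)
H = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
C = zf_noise_covariance(H)  # generally has nonzero off-diagonal entries
```

For a full-column-rank channel estimate, `C` equals the inverse of the Gram matrix of the channel, so its off-diagonal entries vanish only when the channel columns are orthogonal. A channel decoder that assumes white noise therefore works with a mismatched model.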
III-B Iterative MIMO Receivers
The second MIMO detection and channel decoding approach performs iterative soft-in soft-out MIMO detection and channel decoding. Using , and the prior information about the data symbols, a soft MIMO detector computes the extrinsic information about the data symbols [9] and delivers the soft information to a soft channel decoder. The soft channel decoder then computes new extrinsic information about the data symbols and sends it back to the soft MIMO detector for further iteration.
In the next round of iteration, the soft MIMO detector replaces the prior information about the data symbols with the information sent from the soft channel decoder and recomputes its extrinsic information about these data symbols. Several rounds of such iterations are performed to ensure the convergence of the overall MIMO detection and channel decoding process. We refer to such iterative MIMO detection and channel decoding schemes as iterative MIMO receivers. They yield an approximate solution to the joint MIMO detection and channel decoding problem expressed in (2).
Figure 3 shows the block diagram for the iterative MIMO receiver. The soft MIMO detection often used is the sphere decoding algorithm [11] and the soft channel decoding often used is the belief propagation algorithm. The complexity of the iterative MIMO receiver is much higher than that of the linear MIMO receiver. Although the iterative MIMO receiver performs better than the linear MIMO receiver, there is still a large performance gap with respect to the optimal MIMO receiver. Moreover, the iterative information exchange introduces large decoding latency.
IV Deep-Learning MIMO Receivers
We propose to employ deep neural networks (DNN) to solve the joint MIMO detection and channel decoding problem stated in (2) with the goal of improving performance. The DNNs are trained under the framework of supervised learning.
We consider the training of the DNN at the MIMO receiver after the channel matrix estimate has already been computed from the received pilot signals. Using this channel matrix estimate at the MIMO receiver, we generate a set of training signals to train a DNN to solve the joint MIMO detection and channel decoding problem (2) under the framework of supervised learning. The training and deployment framework of the DNN for MIMO is illustrated in Figure 4. We describe the associated procedures in the following.
The receiver generates the training data by calling a functional block that mimics the operation at the MIMO transmitter. Specifically, for training purposes, the receiver randomly generates many length binary vectors, , . Each binary vector is transformed into a data matrix using the functional block of the MIMO transmitter as described in Section II. Then, with the channel matrix estimate given by the channel estimator, the receiver generates a training signal by multiplying with followed by adding AWGN:
where is the th training signal and is the corresponding generated AWGN. The training set is given by , where is the th training signal and is the corresponding label for . We emphasize that the training set is dependent on the channel matrix estimate .
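The training-set generation described above can be sketched as follows. The transmitter block is modeled here with an assumed binary generator matrix `G` and BPSK mapping, and the complex training signals are flattened into real and imaginary parts before being used as network inputs. These choices, and the name `make_training_set`, are illustrative assumptions; feeding the channel estimate itself to the network, as the paper does, is omitted for brevity.

```python
import numpy as np

def make_training_set(H_hat, G, n_tx, n_samples, rng):
    """Generate (features, labels) pairs for a fixed channel estimate H_hat."""
    K, N = G.shape
    T = N // n_tx
    n_rx = H_hat.shape[0]
    labels = rng.integers(0, 2, size=(n_samples, K))       # random information bits
    codewords = labels @ G % 2                             # channel encoding
    X = (1.0 - 2.0 * codewords).reshape(n_samples, T, n_tx).transpose(0, 2, 1)
    noise = np.sqrt(0.5) * (rng.standard_normal((n_samples, n_rx, T))
                            + 1j * rng.standard_normal((n_samples, n_rx, T)))
    Y = H_hat @ X + noise                                  # one training signal per sample
    feats = np.concatenate([Y.real.reshape(n_samples, -1),
                            Y.imag.reshape(n_samples, -1)], axis=1)
    return feats, labels

rng = np.random.default_rng(3)
G = np.array([[1, 1, 0, 1], [0, 1, 1, 1]])
H_hat = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
feats, labels = make_training_set(H_hat, G, n_tx=2, n_samples=10, rng=rng)
```

Because `H_hat` is baked into every sample, a new training set (and a retrained network) is needed whenever the channel estimate changes, which is the dependence emphasized above.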
We use the generated training set to train a DNN, , that approximates the solution to problem (2), where is the set containing all the weights of the edges in the DNN. When we feed the training signals to the inputs of the DNN, we also feed the channel matrix estimate to the DNN (as illustrated in Figure 4). We optimize the DNN weights by minimizing the cross-entropy loss function [28]:
(4) 
where is the th target information bit of the th label vector , and is the soft estimate of given by the DNN. The training algorithm used to minimize (4) is the stochastic gradient descent (SGD) algorithm [28]. After the training is finished, the weights of the DNN are fixed to and we can use the trained DNN to decode the received signals as . We have the following remarks on this DNN for MIMO:
The variables of interest to the DNN are the data symbols in . The size of the variable space is thus , where is the length of (note that we have the one-to-one mapping: ). According to the results shown in [21], the decoding performance of the DNN is best when the DNN sees all possible codewords during training. As in [21], we also adopt short codes and train the DNN with all different codewords.

The training of the DNN is quite time-consuming. Therefore, the training procedure introduces a large decoding latency, and the receiver cannot be deployed for applications with stringent latency requirements, such as voice transmissions; it is, however, suitable for data transmissions with relaxed latency requirements.
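For reference, the cross-entropy loss used to train the DNN can be sketched in NumPy as below. The normalization in (4) is not recoverable from the text, so averaging over all samples and bits is an assumption, as are the function name and the clipping constant `eps`.

```python
import numpy as np

def binary_cross_entropy(targets, probs, eps=1e-12):
    """Mean binary cross-entropy between target bits and soft bit estimates.

    targets: array of 0/1 information bits, shape (n_samples, K)
    probs:   DNN outputs in (0, 1), same shape
    """
    p = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(targets * np.log(p) + (1 - targets) * np.log(1.0 - p))

targets = np.array([[1, 0], [0, 1]])
uninformative = np.full((2, 2), 0.5)
loss = binary_cross_entropy(targets, uninformative)  # equals log(2)
```

An output of 0.5 for every bit carries no information and gives a loss of log 2, while confident correct estimates drive the loss toward zero.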
V Simulation Results
In this section, we present simulation results for the evaluation of the proposed DNN MIMO receiver. The modulations used are BPSK and QPSK. The channel code used is the polar code [5] with code rate . We assume that each packet consists of bits in the simulations. The adoption of the short packet length is due to the exponential training complexity when a DNN is used to perform channel decoding [21]. (Footnote 1: The extension of DNN channel decoding to long packet lengths can follow the solution of [23]. We will consider how to incorporate the solution of [23] into our DNN joint MIMO detection and channel decoding scheme in future work.) Packets of short length are of interest in some practical systems such as the Internet of Things (IoT). After channel encoding and modulation, information bits are transformed to 32 BPSK symbols or 16 QPSK symbols. Our simulations assume MIMO matrices of dimensions , and .
We implement a DNN consisting of one input layer, six hidden layers and one output layer using the deep-learning software toolkit Keras. The nonlinear activation function at the neurons of the input layer and the hidden layers is the rectified linear unit (ReLU) function [28]. The input layer is a densely-connected layer. Each hidden layer is a densely-connected layer with batch normalization (BN) operations before the ReLU operations. The output layer is a densely-connected layer with sigmoid activation functions. The architecture of the DNN is illustrated in Figure 5. We train our NN over several “epochs”. In each epoch, the gradient of the loss function is computed over the entire training set using Adam, a method for stochastic gradient descent optimization [29]. Our training set contains all different codewords, is the length of information bits. Setting the number of learning epochs to , we train the DNN with datasets of different training SNRs (from 0 dB to 6 dB). After the training is finished, the trained DNN is used to decode the received MIMO signals.

For comparison, we treat the following two traditional MIMO receivers as our benchmarks: i) the linear MIMO receiver that employs ZF MIMO detection followed by the MAP polar decoding of [5], ii) the iterative MIMO receiver that iterates between the sphere MIMO detection of [11] and the MAP polar decoding of [5]. We investigate the performance of MIMO receivers with perfect knowledge as well as with imperfect knowledge of the channel matrix. For the latter, we assume LS estimation [27] is used to estimate the channel matrix. For a fixed SNR, we evaluate the average BER results of the MIMO receivers over 100 different MIMO channel realizations.
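The layer structure of the DNN described in this section (a dense input layer with ReLU, six dense hidden layers with batch normalization before ReLU, and a dense sigmoid output layer) can be sketched as a plain NumPy forward pass. The paper's implementation uses Keras; this sketch only mirrors the structure. The layer width, the weight initialization, the use of batch statistics for normalization without learned scale and shift, and the names `init_params` and `forward` are all assumptions, since the text does not give them.

```python
import numpy as np

def init_params(n_in, n_out, width, n_hidden, rng):
    """Random weights for: input dense layer, n_hidden hidden layers, output layer."""
    sizes = [n_in] + [width] * (n_hidden + 1) + [n_out]
    return [(rng.standard_normal((a, b)) * np.sqrt(2.0 / a), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(x, params, eps=1e-3):
    """Forward pass over a batch x of shape (batch, n_in)."""
    W, b = params[0]
    h = np.maximum(x @ W + b, 0.0)                   # input layer: dense + ReLU
    for W, b in params[1:-1]:                        # hidden: dense -> BN -> ReLU
        z = h @ W + b
        z = (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)
        h = np.maximum(z, 0.0)
    W, b = params[-1]
    return 1.0 / (1.0 + np.exp(-(h @ W + b)))        # output: dense + sigmoid

rng = np.random.default_rng(4)
params = init_params(n_in=8, n_out=2, width=16, n_hidden=6, rng=rng)
soft_bits = forward(rng.standard_normal((5, 8)), params)
```

Each output lies between 0 and 1 and can be read as a soft estimate of one information bit; thresholding at 0.5 would give hard decisions.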
Figure 6 and Figure 7 show the BER of the MIMO receivers with perfect knowledge of the MIMO channel matrix for BPSK and QPSK, respectively. We can observe that our DNN MIMO receiver can indeed outperform the linear and iterative MIMO receivers in terms of BER. For example, the DNN MIMO receiver has around 1 dB and 3.5 dB SNR gain over the linear and iterative MIMO receivers, respectively, at the BER of for BPSK and MIMO channels.
Figure 8 and Figure 9 show the BER of the MIMO receivers with imperfect knowledge of the MIMO channel matrix for BPSK and QPSK, respectively. For the channel matrix estimation, we place a Hadamard matrix at the beginning of the packets as pilots and use the LS estimation based on the received pilots to estimate the channel matrix at the receivers. In general, the performance trend is the same for perfect and imperfect channel estimates. The only difference is that, with imperfect channel estimates, the gain obtained by our DNN MIMO receiver is even larger. For example, the DNN MIMO receiver now has around 2 dB and 10 dB SNR gain over the linear and iterative MIMO receivers at the BER of for BPSK and MIMO channels.
VI Conclusions
This work used a deep-learning tool, the deep neural network, to develop a new solution to the problem of joint MIMO detection and channel decoding. Conventional MIMO receivers perform MIMO detection and channel decoding in a sequential or an iterative manner. The algorithms of these conventional MIMO receivers relax the signal model of coded MIMO. As a result, they are suboptimal solutions to the joint MIMO detection and channel decoding problem, leaving room for further improvement. Our deep learning solution uses a DNN for joint MIMO detection and channel decoding under the framework of supervised learning. The deep-learning MIMO receiver does not separate MIMO detection and channel decoding into two parts and does not perform sequential or iterative operations on them. It treats MIMO detection and channel decoding as a joint decoding process and employs a single DNN to approximate that process. This joint process improves the overall decoding performance. In our simulations, we trained a DNN consisting of six hidden layers to decode MIMO signals. The simulation results demonstrate notable gains obtained by our deep-learning MIMO receiver over the conventional linear and iterative MIMO receivers.
A drawback of the currently proposed deep-learning MIMO receiver is that the DNN needs to be trained for each different channel matrix, introducing a large decoding latency. In general, training the same DNN for MIMO decoding with different channel matrices is challenging, since the space of all possible channel matrices is huge. It is impossible to let the DNN see all the channel realizations. In [20], a scheme to construct one DNN for MIMO detection with different channel matrices is given. However, it is not clear how to extend the associated DNN to solve the problem of joint MIMO detection and channel decoding. A DNN for joint MIMO detection and channel decoding that can handle different channel matrices with one training (i.e., with no need to retrain and readjust the weights in the DNN for each different channel matrix) is an interesting direction for further investigation.
References
[1] O. Bejarano, E. W. Knightly, and M. Park, “IEEE 802.11ac: from channelization to multi-user MIMO,” IEEE Communications Magazine, vol. 51, no. 10, pp. 84–90, 2013.
[2] A. Ghosh and R. Ratasuk, Essentials of LTE and LTE-A. Cambridge University Press, 2011.
[3] A. Goldsmith, S. A. Jafar, N. Jindal, and S. Vishwanath, “Capacity limits of MIMO channels,” IEEE Journal on Selected Areas in Communications, vol. 21, no. 5, pp. 684–702, 2003.
[4] T. Richardson and S. Kudekar, “Design of low-density parity check codes for 5G new radio,” IEEE Communications Magazine, vol. 56, no. 3, pp. 28–34, 2018.
[5] E. Arikan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 3051–3073, 2009.
[6] T. Haustein, C. Von Helmolt, E. Jorswieck, V. Jungnickel, and V. Pohl, “Performance of MIMO systems with channel inversion,” in Vehicular Technology Conference, 2002. VTC Spring 2002. IEEE 55th, vol. 1. IEEE, 2002, pp. 35–39.
[7] K. R. Kumar, G. Caire, and A. L. Moustakas, “Asymptotic performance of linear receivers in MIMO fading channels,” arXiv preprint arXiv:0810.0883, 2008.
[8] A. Hedayat and A. Nosratinia, “Outage and diversity of linear receivers in flat-fading MIMO channels,” IEEE Transactions on Signal Processing, vol. 55, no. 12, pp. 5868–5873, 2007.
[9] B. M. Hochwald and S. Ten Brink, “Achieving near-capacity on a multiple-antenna channel,” IEEE Transactions on Communications, vol. 51, no. 3, pp. 389–399, 2003.
[10] E. Witte, F. Borlenghi, G. Ascheid, R. Leupers, and H. Meyr, “A scalable VLSI architecture for soft-input soft-output depth-first sphere decoding,” IEEE Transactions on Circuits and Systems II: Express Briefs, 2010.
[11] C. Studer and H. Bolcskei, “Soft-input soft-output single tree-search sphere decoding,” IEEE Transactions on Information Theory, vol. 56, no. 10, pp. 4827–4842, 2010.
[12] E. G. Larsson, “MIMO detection methods: How they work [lecture notes],” IEEE Signal Processing Magazine, vol. 26, no. 3, 2009.
[13] E. Viterbo and J. Boutros, “A universal lattice code decoder for fading channels,” IEEE Transactions on Information Theory, vol. 45, no. 5, pp. 1639–1642, 1999.
[14] L. G. Barbero and J. S. Thompson, “Fixing the complexity of the sphere decoder for MIMO detection,” IEEE Transactions on Wireless Communications, vol. 7, no. 6, 2008.
[15] E. Agrell, T. Eriksson, A. Vardy, and K. Zeger, “Closest point search in lattices,” IEEE Transactions on Information Theory, vol. 48, no. 8, pp. 2201–2214, 2002.
[16] P. H. Tan and L. K. Rasmussen, “The application of semidefinite programming for detection in CDMA,” IEEE Journal on Selected Areas in Communications, vol. 19, no. 8, pp. 1442–1449, 2001.
[17] B. Steingrimsson, Z.-Q. Luo, and K. M. Wong, “Soft quasi-maximum-likelihood detection for multiple-antenna wireless channels,” IEEE Transactions on Signal Processing, vol. 51, no. 11, pp. 2710–2719, 2003.
[18] C. Windpassinger and R. F. Fischer, “Low-complexity near-maximum-likelihood detection and precoding for MIMO systems using lattice reduction,” in Information Theory Workshop, 2003. Proceedings. 2003 IEEE. IEEE, 2003, pp. 345–348.
[19] E. G. Larsson and J. Jalden, “Fixed-complexity soft MIMO detection via partial marginalization,” IEEE Transactions on Signal Processing, vol. 56, no. 8, pp. 3397–3407, 2008.
[20] N. Samuel, T. Diskin, and A. Wiesel, “Deep MIMO detection,” in IEEE 18th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC). IEEE, 2017, pp. 1–5.
[21] T. Gruber, S. Cammerer, J. Hoydis, and S. ten Brink, “On deep learning-based channel decoding,” in Information Sciences and Systems (CISS), 2017 51st Annual Conference on. IEEE, 2017, pp. 1–6.

[22] J. Seo, J. Lee, and K. Kim, “Decoding of polar code by using deep feed-forward neural networks,” in 2018 International Conference on Computing, Networking and Communications (ICNC). IEEE, 2018, pp. 238–242.
[23] S. Cammerer, T. Gruber, J. Hoydis, and S. ten Brink, “Scaling deep learning-based decoding of polar codes via partitioning,” in GLOBECOM 2017 - 2017 IEEE Global Communications Conference. IEEE, 2017, pp. 1–6.
[24] E. Nachmani, E. Marciano, L. Lugosch, W. J. Gross, D. Burshtein, and Y. Be’ery, “Deep learning methods for improved decoding of linear codes,” IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 1, pp. 119–131, 2018.
[25] H. Ye and G. Y. Li, “Initial results on deep learning for joint channel equalization and decoding,” in Vehicular Technology Conference (VTC-Fall), 2017 IEEE 86th. IEEE, 2017, pp. 1–5.
[26] T. Wang, C.-K. Wen, H. Wang, F. Gao, T. Jiang, and S. Jin, “Deep learning for wireless physical layer: Opportunities and challenges,” China Communications, vol. 14, no. 11, pp. 92–111, 2017.
[27] M. Biguesh and A. B. Gershman, “Training-based MIMO channel estimation: a study of estimator tradeoffs and optimal training signals,” IEEE Transactions on Signal Processing, vol. 54, no. 3, pp. 884–893, 2006.
[28] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.
[29] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.