Massive MIMO is foreseen to be one of the key enablers of future-generation communication systems, in which the spectral efficiency is expected to be several orders of magnitude higher than that of current systems. Scaling up the number of antennas at the base station (BS) offers numerous advantages but incurs enormous energy consumption. To alleviate this predicament, equipping the BS with low-resolution analog-to-digital converters (ADCs) and digital-to-analog converters (DACs) can greatly reduce the energy consumption of massive MIMO systems [3, 4, 5, 6, 7].
The last few years have witnessed the publication of many technical results on the performance of massive MIMO systems with low-resolution ADCs. In this work, we focus on the data equalization problem. For frequency-flat channels, it has been shown that the performance gap between the quantized MIMO system and the ideal one can be narrowed by increasing the number of antennas. Similar results have been reported for the frequency-selective case, i.e., for single-cell sparse broadband massive MIMO and for multi-cell millimeter-wave massive MIMO.
Most of these works are based on the Bussgang theorem, under which the severe nonlinearity introduced by the ADCs is approximated by a linear function of the input plus a distortion term. Bussgang-theorem-based data equalization methods have low computational complexity, comparable to that of classical linear data equalization methods. On the other hand, they suffer from performance saturation in the mid-to-high SNR regime [12, 13, 14] and show limited adaptability in practical 5G millimeter-wave massive MIMO systems.
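To make the Bussgang decomposition concrete, the sketch below estimates, by Monte Carlo, the Bussgang gain of a 1-bit (sign) quantizer driven by a real zero-mean Gaussian input and compares it with the closed form $\sqrt{2/\pi}/\sigma$, which follows from $\mathbb{E}[x\,\mathrm{sign}(x)] = \sigma\sqrt{2/\pi}$. It is an illustrative sketch under these assumptions (scalar real input, sign quantizer); the function name `bussgang_gain_one_bit` is hypothetical and not part of the proposed method.

```python
import math
import random


def bussgang_gain_one_bit(sigma: float, num_samples: int = 100_000, seed: int = 0):
    """Monte Carlo Bussgang gain for y = sign(x), x ~ N(0, sigma^2).

    Returns (empirical_gain, theoretical_gain, corr_input_distortion).
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, sigma) for _ in range(num_samples)]
    ys = [1.0 if x > 0 else -1.0 for x in xs]  # 1-bit quantizer keeps only the sign

    # Bussgang gain: B = E[x y] / E[x^2]
    exy = sum(x * y for x, y in zip(xs, ys)) / num_samples
    exx = sum(x * x for x in xs) / num_samples
    gain = exy / exx

    # Closed form for the sign quantizer: E[|x|] / sigma^2 = sqrt(2/pi) / sigma
    theory = math.sqrt(2.0 / math.pi) / sigma

    # With the theoretical gain, the distortion d = y - B x is uncorrelated with x,
    # so its empirical correlation with the input should be close to zero.
    corr = sum(x * (y - theory * x) for x, y in zip(xs, ys)) / num_samples
    return gain, theory, corr


g, th, c = bussgang_gain_one_bit(1.0)
print(f"empirical {g:.4f}, theory {th:.4f}, corr(x, d) {c:.4f}")
```

The residual correlation is only approximately zero because it is a finite-sample estimate; the decomposition itself is exact in expectation for Gaussian inputs.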
It is hard to capture the nonlinearity introduced by the ADCs with a tractable mathematical model. As a result, the best approach to the data equalization problem is unclear. Alternatively, it is appealing to learn the nonlinear structure of the employed ADCs by leveraging deep learning based methods.
Recently, deep learning based detection methods have been proposed for MIMO-OFDM systems with ideal ADCs [17, 18]. It has been shown that deep learning based methods achieve performance comparable to the minimum mean-square error receiver and are robust to nonlinear clipping noise. A recurrent neural network based approach has also been used to detect data sequences in molecular communication systems without channel state information. Besides, a supervised-learning-aided estimator has been proposed for frequency-flat MIMO systems with low-resolution ADCs.
In this paper, instead of relying solely on deep learning, we propose a new deep neural network optimized equalization framework for MIMO-OFDM systems with low-resolution ADCs, which jointly exploits structural knowledge of MIMO systems and the power of unsupervised deep learning.
The remainder of this paper is structured as follows. Section II introduces the signal model of MIMO-OFDM systems with low-resolution ADCs and discusses the Bussgang-theorem-based linearized data equalization method. In Section III, we detail the design of the proposed method. A coarse deep neural network based equalizer, following the structure of widely used supervised learning methods, is first proposed. To enhance the generalization of the proposed equalizer, a fine deep neural network based equalizer is then developed by leveraging unsupervised learning. Numerical case studies in Section IV evaluate the performance of the proposed approach. Finally, the conclusions and acknowledgements are given in Section V and Section VI, respectively.
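Throughout this paper, vectors and matrices are given in lowercase and uppercase boldface letters, e.g., $\mathbf{x}$ and $\mathbf{X}$, respectively. We use $\mathbf{X}^H$ to denote the conjugate transpose of $\mathbf{X}$. The element in the $i$th row and $j$th column of $\mathbf{X}$ is denoted by $X_{i,j}$. We use $\Re(\mathbf{x})$, $\Im(\mathbf{x})$, and $\|\mathbf{x}\|_2$ to represent the real part, the imaginary part, and the $\ell_2$-norm of vector $\mathbf{x}$, respectively.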
II-A MIMO-OFDM System Model
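We consider a MIMO-OFDM uplink system with $N$ receive antennas at the BS, simultaneously serving $K$ single-antenna user terminals (UTs) over $S$ subcarriers. We use $\mathbf{h}_{n,k} \in \mathbb{C}^{L}$ to denote the channel impulse response of $L$ taps between the $n$-th receive antenna and the $k$-th transmit antenna, for $n = 1, \ldots, N$, $k = 1, \ldots, K$. Let $\mathbf{x}_k \in \mathbb{C}^{S}$ be the block of frequency-domain symbols sent from the $k$-th antenna in one OFDM symbol. After the IFFT at the transmitters and the removal of the cyclic prefix at the BS, the received signal at the $n$-th receive antenna reads

$$\mathbf{r}_n = \sum_{k=1}^{K} \mathbf{H}_{n,k}\mathbf{F}^H\mathbf{x}_k + \mathbf{z}_n, \qquad (1)$$

where $\mathbf{H}_{n,k} \in \mathbb{C}^{S \times S}$ denotes the circulant channel convolution matrix whose first column is $\mathbf{h}_{n,k}$ zero-padded to length $S$, $\mathbf{F}$ is the unitary discrete Fourier transform (DFT) matrix of dimension $S \times S$, and the elements of $\mathbf{z}_n$ are assumed to be independent and identically distributed (i.i.d.) zero-mean Gaussian variables with variance $\sigma^2$.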
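The following sketch simulates the received signal model above for a single receive antenna in plain Python and checks numerically that the circulant channel is diagonalized by the DFT: without noise, the unitary DFT of the received block equals the per-subcarrier product of the channel frequency response and the transmitted symbols. The helper names and the circularly-symmetric complex noise model are illustrative assumptions; the naive $O(S^2)$ DFT is used only to keep the example dependency-free.

```python
import cmath
import random


def dft(v):
    """Unitary DFT: (F v)[k] = (1/sqrt(S)) * sum_n v[n] exp(-j 2 pi k n / S)."""
    s = len(v)
    return [sum(v[n] * cmath.exp(-2j * cmath.pi * k * n / s) for n in range(s)) / s ** 0.5
            for k in range(s)]


def idft(v):
    """Inverse unitary DFT, i.e. multiplication by F^H."""
    s = len(v)
    return [sum(v[k] * cmath.exp(2j * cmath.pi * k * n / s) for k in range(s)) / s ** 0.5
            for n in range(s)]


def freq_response(h, s):
    """Unnormalized DFT of h zero-padded to length s (eigenvalues of the circulant matrix)."""
    return [sum(h[n] * cmath.exp(-2j * cmath.pi * k * n / s) for n in range(len(h)))
            for k in range(s)]


def circulant_apply(h, t):
    """Multiply t by the S x S circulant matrix whose first column is h zero-padded to S."""
    s = len(t)
    hp = list(h) + [0j] * (s - len(h))
    return [sum(hp[(n - m) % s] * t[m] for m in range(s)) for n in range(s)]


def received_signal(hs, xs, noise_var=0.0, seed=0):
    """r = sum_k H_k F^H x_k + z at one receive antenna (one tap vector per user)."""
    rng = random.Random(seed)
    s = len(xs[0])
    r = [0j] * s
    for h, x in zip(hs, xs):
        r = [a + b for a, b in zip(r, circulant_apply(h, idft(x)))]
    std = (noise_var / 2) ** 0.5  # assumed circularly-symmetric complex Gaussian noise
    return [v + complex(rng.gauss(0, std), rng.gauss(0, std)) for v in r]
```

Usage: with two users, three taps and eight subcarriers, `dft(received_signal(hs, xs))` matches `sum_k freq_response(hs[k], 8)[j] * xs[k][j]` on every subcarrier `j` up to rounding error, which is exactly the eigenvalue decomposition of the circulant channel matrix.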
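For the MIMO-OFDM uplink system with low-resolution ADCs, it is assumed that the real and imaginary parts of the received signals are quantized separately by a $b$-bit symmetric uniform quantizer $\mathcal{Q}_b(\cdot)$, i.e.,

$$\mathbf{y}_n = \mathcal{Q}(\mathbf{r}_n) = \mathcal{Q}_b\big(\Re(\mathbf{r}_n)\big) + j\,\mathcal{Q}_b\big(\Im(\mathbf{r}_n)\big). \qquad (2)$$

Specifically, the real-valued quantizer $\mathcal{Q}_b(\cdot)$ maps its input $r$ to a set of labels $\{\ell_0, \ell_1, \ldots, \ell_{2^b-1}\}$, which are determined by the set of thresholds $\{\tau_0, \tau_1, \ldots, \tau_{2^b}\}$ such that $\mathcal{Q}_b(r) = \ell_i$ if $r \in (\tau_i, \tau_{i+1}]$. For a $b$-bit ADC with step size $\Delta$, the thresholds and quantization labels are respectively given by

$$\tau_i = \big(-2^{b-1} + i\big)\Delta, \quad i = 1, \ldots, 2^b - 1, \qquad \tau_0 = -\infty, \quad \tau_{2^b} = +\infty, \qquad (3)$$

$$\ell_i = \big(-2^{b-1} + i + \tfrac{1}{2}\big)\Delta, \quad i = 0, \ldots, 2^b - 1. \qquad (4)$$

In the case of 1-bit ADCs, the output set reduces to $\{-\Delta/2, +\Delta/2\}$, i.e., only the sign of each real-valued input is retained.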
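A minimal sketch of the $b$-bit symmetric uniform quantizer, assuming the common mid-rise convention (thresholds at integer multiples of the step size, labels at the bin midpoints, outermost bins extending to $\pm\infty$). The function names are hypothetical; real and imaginary parts are quantized separately, as stated above.

```python
import math


def quantize_real(r: float, bits: int, delta: float) -> float:
    """b-bit symmetric uniform (mid-rise) quantizer.

    Returns the label (-2^(b-1) + i + 1/2) * delta for r in (tau_i, tau_{i+1}],
    where tau_i = (-2^(b-1) + i) * delta.
    """
    half = 2 ** (bits - 1)
    # Bin index i with tau_i < r <= tau_{i+1}
    i = math.ceil(r / delta) + half - 1
    # Outermost bins extend to -inf / +inf (saturation)
    i = min(max(i, 0), 2 ** bits - 1)
    return (-half + i + 0.5) * delta


def quantize_complex(v, bits: int, delta: float):
    """Quantize the real and imaginary parts of each complex sample separately."""
    return [complex(quantize_real(z.real, bits, delta), quantize_real(z.imag, bits, delta))
            for z in v]


# 1-bit example: only the sign survives, scaled to +-delta/2
print(quantize_complex([0.2 - 3j, -0.7 + 0.1j], bits=1, delta=2.0))
```

With `bits=1` and `delta=2.0`, every output is one of $\pm 1 \pm j$, which illustrates how little of the received amplitude information a 1-bit ADC preserves.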
II-C Data Equalization and Linearized Solution
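The severe nonlinearity of (2) makes the channel state information almost unknown to the BS, which makes data equalization in quantized frequency-selective MIMO systems challenging. As reviewed in Section I, many researchers have proposed to represent the nonlinearity applied to a Gaussian signal by the sum of a linear transformation and an uncorrelated quantization error term, according to the Bussgang theorem. Following the Bussgang theorem, (5) can be approximately represented by

$$\mathbf{y}_n \approx \mathbf{B}_n\mathbf{r}_n + \mathbf{e}_n,$$

where $\mathbf{y}_n$ and $\mathbf{r}_n$ are the quantized and unquantized received signals at the $n$-th antenna, $\mathbf{B}_n$ is a diagonal matrix (the Bussgang decomposition factor), and $\mathbf{e}_n$ is referred to as the quantization error, which is uncorrelated with $\mathbf{r}_n$. The circulant channel matrix has the eigenvalue decomposition

$$\mathbf{H}_{n,k} = \mathbf{F}^H\boldsymbol{\Lambda}_{n,k}\mathbf{F},$$

where $\mathbf{F}$ is the unitary DFT matrix and the diagonal matrix $\boldsymbol{\Lambda}_{n,k}$ contains the channel frequency response on the $S$ subcarriers. Then the Bussgang theorem based data equalization problem can be formulated as

$$\min_{\mathbf{x}_1, \ldots, \mathbf{x}_K \in \mathcal{X}^{S}} \ \sum_{n=1}^{N}\Big\|\mathbf{y}_n - \mathbf{B}_n\mathbf{F}^H\sum_{k=1}^{K}\boldsymbol{\Lambda}_{n,k}\mathbf{x}_k\Big\|_2^2, \qquad (9)$$

where $\mathcal{X}$ is the set of constellation points. Discarding the nonconvex constraint $\mathbf{x}_k \in \mathcal{X}^{S}$, one can use minimum mean squared error (MMSE) based methods for channel estimation and then data equalization. Unfortunately, such linearized data equalization methods suffer from performance saturation in the mid-to-high SNR regime. Specifically, the BER curves achievable with linearized data equalization methods saturate at a certain finite SNR, above which no further improvement can be obtained. As a result, it is vital to capture the nonlinearity of (2) with new techniques, e.g., deep neural network based methods.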
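The relaxed, linearized approach can be illustrated on a single-user, per-subcarrier toy model $\tilde y_s = g_s x_s + w_s$, where $g_s$ lumps the Bussgang gain and the channel frequency response, and $w_s$ (noise plus quantization distortion) is treated as Gaussian with known variance. This is a simplifying assumption for illustration, not the multi-user MMSE benchmark itself: the constellation constraint is dropped, a scalar LMMSE filter is applied, and the estimate is then projected back onto the QPSK alphabet $\{\pm 1 \pm j\}$. The function name `lmmse_then_slice` is hypothetical.

```python
import math


def lmmse_then_slice(y, g, c, es=2.0):
    """Per-subcarrier LMMSE estimate of x from y = g x + w, then QPSK slicing.

    y, g: complex values (one per subcarrier); c: variances of the effective noise
    w (Gaussian noise plus quantization distortion, assumed Gaussian);
    es: symbol energy E|x|^2 (2 for the alphabet {+-1 +-1j}).
    Returns (soft estimates, hard QPSK decisions).
    """
    soft, hard = [], []
    for ys, gs, cs in zip(y, g, c):
        # Relaxed problem (constraint x in X dropped): scalar LMMSE filter
        xs = es * gs.conjugate() * ys / (es * abs(gs) ** 2 + cs)
        soft.append(xs)
        # Project the relaxed solution back onto the QPSK constellation
        hard.append(complex(math.copysign(1.0, xs.real), math.copysign(1.0, xs.imag)))
    return soft, hard
```

Because the distortion term is treated as additive Gaussian noise whose power grows with the signal power, raising the SNR does not remove it; this is the mechanism behind the BER saturation of linearized equalizers described above.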
III LEMO: Learn to Equalize for MIMO-OFDM
For the coarsely quantized MIMO-OFDM system, the true channel state information is almost completely unknown, and the best data equalization method becomes unclear. As shown in Fig. 1, we propose a deep learning based method that trains a data-driven detector to recover the transmitted symbols from pilots.
III-A Coarse Deep Neural Network based Equalizer
We start by applying supervised deep learning to data equalization in quantized MIMO-OFDM systems. The method employed in this subsection is referred to as the coarse deep neural network (CDNN) based equalizer.
III-A1 Data Preprocessing
Generally, a supervised deep learning based approach has two stages: offline training and online testing. Let $M$ be the number of collected data samples. For $m = 1, \ldots, M$, we use $(\mathbf{u}_m, \mathbf{v}_m)$ to represent the data collected at the $m$-th sampling. In this paper, the elements of $\mathbf{u}_m$ and $\mathbf{v}_m$ are the subcarrier-wise real-valued received signals (outputs of the low-resolution ADCs) and the transmitted signals, respectively.
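A minimal sketch of the subcarrier-wise preprocessing, assuming the quantized received signals have already been transformed to the frequency domain and that each complex vector is stacked as $[\Re(\cdot);\Im(\cdot)]$. One training pair of real-valued received and transmitted vectors is produced per subcarrier; the function names are hypothetical.

```python
def to_real(v):
    """Stack a complex vector as [Re(v); Im(v)]."""
    return [z.real for z in v] + [z.imag for z in v]


def build_samples(y_freq, x_freq):
    """Build subcarrier-wise real-valued training pairs (input, label).

    y_freq[n][s]: quantized received signal of antenna n on subcarrier s (after a DFT);
    x_freq[k][s]: transmitted pilot symbol of user k on subcarrier s.
    Returns one pair per subcarrier, with input length 2N and label length 2K.
    """
    num_sub = len(x_freq[0])
    samples = []
    for s in range(num_sub):
        u = to_real([row[s] for row in y_freq])  # all N antennas on subcarrier s
        v = to_real([row[s] for row in x_freq])  # all K users on subcarrier s
        samples.append((u, v))
    return samples
```

Treating each subcarrier as a separate sample multiplies the number of training pairs obtained from one pilot OFDM block by the number of subcarriers.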
III-A2 Network Parameter Optimization
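At the offline training stage, the pilot data set $\mathcal{D} = \{(\mathbf{u}_m, \mathbf{v}_m)\}_{m=1}^{M}$, consisting of pairs of real-valued received signals $\mathbf{u}_m$ and transmitted signals $\mathbf{v}_m$, is used to find the optimal set of parameters $\boldsymbol{\theta} = \{\mathbf{W}_1, \ldots, \mathbf{W}_J\}$ of a $J$-layer neural network by solving the optimization problem

$$\min_{\boldsymbol{\theta}}\ \mathcal{L}_{\mathrm{s}}(\boldsymbol{\theta}) = \frac{1}{M}\sum_{m=1}^{M}\big\|\mathbf{v}_m - \psi\big(\mathbf{W}_J\mathbf{a}_{m,J-1}\big)\big\|_2^2, \quad \mathbf{a}_{m,l} = \sigma\big(\mathbf{W}_l\mathbf{a}_{m,l-1}\big), \quad \mathbf{a}_{m,0} = \mathbf{u}_m, \qquad (10)$$

for $l = 1, \ldots, J-1$, where $\sigma(\cdot)$ is the activation function and $\mathbf{W}_l$ is the weight matrix of the $l$-th layer. $\psi(\cdot)$ is a specially designed activation, which will be explained later. The parameters are updated to minimize the expected loss (10) by a stochastic gradient based method, e.g., stochastic gradient descent (SGD) or Adam.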
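The supervised objective can be sketched as a forward pass plus an empirical mean-squared error over the pilot set. This plain-Python illustration rests on stated assumptions: biases are omitted, ReLU is assumed for the hidden activation (the text does not fix it), and the output activation is passed in as a placeholder because the proposed one is described later. In practice, the gradients used by SGD or Adam would come from an automatic differentiation framework such as TensorFlow.

```python
def matvec(w, a):
    """Dense matrix-vector product with w given as a list of rows."""
    return [sum(wij * aj for wij, aj in zip(row, a)) for row in w]


def relu(a):
    """Assumed hidden-layer activation."""
    return [max(0.0, x) for x in a]


def forward(weights, u, out_act=lambda a: a):
    """Hidden layers a_l = relu(W_l a_{l-1}); output out_act(W_J a_{J-1})."""
    a = u
    for w in weights[:-1]:
        a = relu(matvec(w, a))
    return out_act(matvec(weights[-1], a))


def supervised_loss(weights, samples, out_act=lambda a: a):
    """Empirical mean-squared error over (input, label) pairs."""
    total = 0.0
    for u, v in samples:
        f = forward(weights, u, out_act)
        total += sum((vi - fi) ** 2 for vi, fi in zip(v, f))
    return total / len(samples)
```

For example, with an identity first layer and output row `[2.0, -1.0]`, the input `[1.0, -3.0]` is mapped to `[2.0]`, because the ReLU zeroes the negative hidden unit.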
The deep learning based equalizer described above tries to learn, in a supervised manner, a nonlinear mapping between the quantized received signals and the transmitted signals, with the aim of capturing the true channel information that has been almost erased by the low-resolution ADCs.
However, a network trained only with a supervised loss faces an under-constrained problem: it may find a solution that fits the training data well but does not generalize, especially in neural discrete representation learning (the data equalization problem is itself a discrete optimization problem). These limitations motivate us to design a fine deep neural network (FDNN) based equalizer.
III-B Fine Deep Neural Network based Equalizer
Developing a deep neural network with high generalization ability plays a key role in data equalization for quantized frequency-selective MIMO systems. Generally, increasing the number of layers and the number of neurons per layer can help improve the generalization ability of deep neural networks. However, training a very deep neural network faces the vanishing or exploding gradient problem. Besides, real-time requirements prevent the widespread use of very deep neural networks.
III-B1 The Skew-Symmetric Weight Matrix
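We start with a simple example. Let $\mathbf{a}$ and $\mathbf{c}$ be two complex-valued vectors and $\mathbf{G}$ a complex-valued matrix. The equation $\mathbf{c} = \mathbf{G}\mathbf{a}$ can be equivalently written in real-valued form as

$$\begin{bmatrix}\Re(\mathbf{c})\\ \Im(\mathbf{c})\end{bmatrix} = \begin{bmatrix}\Re(\mathbf{G}) & -\Im(\mathbf{G})\\ \Im(\mathbf{G}) & \Re(\mathbf{G})\end{bmatrix}\begin{bmatrix}\Re(\mathbf{a})\\ \Im(\mathbf{a})\end{bmatrix}.$$

In this work, we focus on a real-valued deep neural network whose weight structure should accommodate complex-valued data. Let

$$\mathbf{W}_l = \begin{bmatrix}\mathbf{W}_l^{11} & \mathbf{W}_l^{12}\\ \mathbf{W}_l^{21} & \mathbf{W}_l^{22}\end{bmatrix}$$

be the weight matrix in the $l$-th layer. In the training stage, we set $\mathbf{W}_l^{22} = \mathbf{W}_l^{11}$ and $\mathbf{W}_l^{21} = -\mathbf{W}_l^{12}$, which means only half of the weight matrix is updated. This simple operation not only makes the proposed network suitable for training on complex-valued OFDM symbols but also helps reduce the computational complexity.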
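The weight-tying rule can be checked numerically. The sketch builds the real-valued block matrix from its two trainable halves (upper-left block $A$ and upper-right block $B$, with the lower blocks tied to $-B$ and $A$) and confirms that it reproduces a complex matrix-vector product when $A = \Re(\mathbf{G})$ and $B = -\Im(\mathbf{G})$. The function names are hypothetical; only the block structure follows the text.

```python
def matvec(w, a):
    """Dense matrix-vector product with w given as a list of rows."""
    return [sum(wij * aj for wij, aj in zip(row, a)) for row in w]


def block_weight(a, b):
    """Build W = [[A, B], [-B, A]] from the trainable halves A (= W11) and B (= W12)."""
    top = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    bottom = [[-x for x in rb] + list(ra) for ra, rb in zip(a, b)]
    return top + bottom


def complex_to_halves(g):
    """For a complex G: A = Re(G) and B = -Im(G), so that W acts like G."""
    a = [[z.real for z in row] for row in g]
    b = [[-z.imag for z in row] for row in g]
    return a, b


G = [[1 + 2j, 3 - 1j], [0.5j, -2 + 0j]]
vec = [1 - 1j, 2 + 0.5j]
W = block_weight(*complex_to_halves(G))
real_out = matvec(W, [z.real for z in vec] + [z.imag for z in vec])
print(real_out)  # equals [Re(G vec); Im(G vec)] up to rounding
```

Only the two upper blocks are free parameters, so a layer mapping $2m$ real inputs to $2n$ real outputs trains $2nm$ weights instead of $4nm$.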
III-B2 Activation Function Design
As shown in (9), the data equalization problem itself is a nonconvex optimization problem, and discarding the nonconvex constraint yields a suboptimal solution. In this work, we design a new activation function that helps fully exploit stochastic gradient descent for this discrete optimization problem. Specifically, in the case of QPSK signaling and as shown in Fig. 2, we propose to use the activation function:
where $a$ is the magnitude of the transmitted signal and $t$ is a trainable parameter that controls the layer's outputs so that they become sufficiently close to the constellation points. Since higher-order modulation data can be represented by a linear combination of QPSK data, we leave out the design for higher-order modulation.
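The exact expression of the proposed activation is not reproduced here. As a hypothetical stand-in with the properties described (output magnitude set by the signal amplitude $a$, sharpness set by a trainable parameter $t$), the sketch below applies $a\tanh(tx)$ to each real-valued output. For large $t$ it approaches the QPSK levels $\pm a$, while for moderate $t$ it stays smooth, so SGD still receives useful gradients. This is an illustration, not the authors' function.

```python
import math


def soft_qpsk(values, a=1.0, t=1.0):
    """Hypothetical stand-in activation: a * tanh(t * x), applied elementwise.

    For large t the outputs approach the QPSK levels {-a, +a}; for small t the
    map is nearly linear around zero.
    """
    return [a * math.tanh(t * x) for x in values]


def tanh_grad(x, a=1.0, t=1.0):
    """Derivative d/dx of a * tanh(t x) = a * t * (1 - tanh(t x)^2)."""
    return a * t * (1.0 - math.tanh(t * x) ** 2)


# Unit-energy QPSK: each real component has magnitude 1/sqrt(2)
print(soft_qpsk([0.3, -0.2], a=1 / math.sqrt(2), t=100.0))
```

The gradient is largest near zero and vanishes as $t|x|$ grows, so a sharp setting of $t$ pushes outputs onto the constellation but slows learning; this is one way to read the trade-off that the trainable parameter is meant to manage.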
III-B3 Generalization Enhancement
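In this paper, motivated by pioneering work, we propose to improve the representation learned by the employed neural network by adding an unsupervised loss to (10). Specifically, the proposed neural network based equalizer updates the network parameters by minimizing

$$\mathcal{L}(\boldsymbol{\theta}) = \mathcal{L}_{\mathrm{s}}(\boldsymbol{\theta}) + \lambda\,\mathcal{L}_{\mathrm{u}}(\boldsymbol{\theta}),$$

where $\mathcal{L}_{\mathrm{s}}$ is the supervised loss in (10), $\mathcal{L}_{\mathrm{u}}$ is the unsupervised loss, and $\lambda$ is the penalty parameter that balances $\mathcal{L}_{\mathrm{s}}$ and $\mathcal{L}_{\mathrm{u}}$. We use an unsupervised loss to promote the generalization of the proposed FDNN equalizer for three reasons:
It has been shown in previous studies that learning multiple tasks jointly can improve generalization error bounds. In our case, we jointly minimize $\mathcal{L}_{\mathrm{s}}$ and $\mathcal{L}_{\mathrm{u}}$.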
IV Case Studies
In this section, we evaluate the performance of the proposed equalizers via numerical simulations implemented on the TensorFlow platform and run on a PC with an Intel Core i7-7700K CPU and two NVIDIA GTX 1080 Ti GPUs.
IV-A System and Network Parameter Setup
We consider a QPSK-modulated MIMO-OFDM system with 128 BS antennas simultaneously serving 8 users. Following the IEEE 802.11a standard, we use 64 subcarriers in an OFDM block. The channel tap length is . It is assumed that the coherence time is . A typical urban environment with a maximum delay of 16 is considered.
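For reference, the stated simulation parameters and the real-valued network dimensions they imply can be collected in a small configuration object. The input and output sizes assume subcarrier-wise $[\Re(\cdot);\Im(\cdot)]$ stacking of the received and transmitted vectors; the class name is hypothetical, and the unspecified tap length and coherence time are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimConfig:
    """Parameters stated in the system setup (unstated values are omitted)."""
    num_bs_antennas: int = 128  # BS receive antennas
    num_users: int = 8          # single-antenna users
    num_subcarriers: int = 64   # following IEEE 802.11a
    bits_per_symbol: int = 2    # QPSK

    @property
    def input_dim(self) -> int:
        # real-valued [Re; Im] received signal on one subcarrier
        return 2 * self.num_bs_antennas

    @property
    def output_dim(self) -> int:
        # real-valued [Re; Im] transmitted symbols on one subcarrier
        return 2 * self.num_users

    @property
    def bits_per_ofdm_block(self) -> int:
        # information bits carried by all users in one OFDM block
        return self.num_users * self.num_subcarriers * self.bits_per_symbol


cfg = SimConfig()
print(cfg.input_dim, cfg.output_dim, cfg.bits_per_ofdm_block)
```

Under these assumptions, each network sample maps 256 real inputs to 16 real outputs.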
This layer is connected to layer as shown in Fig. 1.
$\mathcal{N}(0, 0.01)$ denotes a Gaussian matrix whose elements have zero mean and variance 0.01, and IM denotes the identity matrix. The trainable parameter in the proposed activation function is swept from 1 to 100 to find its optimal value in every mini-batch training step.
IV-B Effect of the Quantization Level
In Fig. 3, we investigate the BER performance of the proposed equalizer with different quantization levels. The benchmark is the MMSE equalizer with infinite-resolution ADCs (MMSE-inf.bit). It can be observed that the performance of the proposed FDNN equalizer improves as the number of ADC bits increases. The performance gap between the MMSE-inf.bit equalizer and the proposed equalizer is negligible when the ADCs have at least 2 bits. Interestingly, the proposed FDNN equalizer with infinite-resolution ADCs (FDNN-inf.bit) slightly outperforms the MMSE-inf.bit equalizer when the SNR is above 10 dB. These results demonstrate that the proposed network can closely approximate the nonlinearity introduced by the employed ADCs.
IV-C Effect of the Distribution of the Channel Taps
In Fig. 4, we investigate the effect of the distribution of the channel taps. For the Gaussian (GD) case, the taps are assumed to be uncorrelated zero-mean Gaussian variables with unit variance. For the Poisson (PD) channel model, which is widely used in molecular and optical communication systems, the Poisson parameter is set to . Fig. 4 shows that the BER curves of the MMSE equalizer do not decrease noticeably as the SNR increases beyond 10 dB. On the other hand, the CDNN and FDNN equalizers perform robustly over a wide range of SNR. Besides, the proposed FDNN equalizer outperforms the CDNN and MMSE equalizers in the scenarios where GD pilots and PD pilots are used. The FDNN equalizer is also more robust than the CDNN equalizer in the case of PD pilots. This phenomenon may be explained by the enhanced generalization of the FDNN.
IV-D Effect of the Number of Pilots
In a real communication system, the available feedback (in this paper, we only consider the number of pilots) is limited, which poses another challenge in the design of an effective neural network. In Fig. 5, we show the effect of the number of pilots.
From Fig. 5, a larger number of pilots leads to better performance for all compared equalizers, and they have comparable performance when 32 pilots are used. When only 8 pilots are used, the proposed FDNN equalizer significantly outperforms the MMSE equalizer. The proposed FDNN equalizer achieves the target BER with 8 pilots, whereas the MMSE equalizer needs 16 pilots, demonstrating the potential for pilot savings.
V Conclusions and Future Work
In this paper, a deep neural network based equalizer has been proposed for MIMO-OFDM systems with low-resolution ADCs. The experimental results show that the proposed equalizer is robust to different channel tap distributions (i.e., Gaussian and Poisson) and significantly outperforms the linearized MMSE equalizer. In addition, for a given target bit error rate, the proposed learning architecture with an unsupervised loss requires fewer pilots than both the learning architecture without such a design and the linearized MMSE equalizer.
In future work, we will provide a theoretical analysis, e.g., based on Rademacher complexity, of the generalization improvement of the fine deep neural network based equalizer. Besides, it is interesting to investigate how many parameters are needed to train a neural network based method for the data equalization problem in MIMO-OFDM systems with low-resolution ADCs.
The authors would like to thank Dr. Yanwen Fan, from the EECS department of the University of Tennessee, Knoxville, for fruitful discussions on the neural network algorithm implementation.
-  L. Lu, G. Y. Li, A. L. Swindlehurst, A. Ashikhmin, and R. Zhang, “An overview of massive MIMO: Benefits and challenges,” IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 742–758, 2014.
-  F. Rusek, D. Persson, B. K. Lau, and E. G. Larsson, “Scaling up MIMO: Opportunities and challenges with very large arrays,” IEEE Signal Process. Mag., vol. 30, no. 1, pp. 40–60, 2012.
-  S. Wang, Y. Li, and J. Wang, “Multiuser detection in massive spatial modulation MIMO with low-resolution ADCs,” IEEE Transactions on Wireless Communications, vol. 14, no. 4, pp. 2156–2168, 2015.
-  S. Wang and Z. Lin, “Signal processing in massive MIMO with IQ imbalances and low-resolution ADCs,” IEEE Transactions on Wireless Communications, vol. PP, no. 99, pp. 1–1, 2016.
-  F. Wang, J. Fang, H. Li, and S. Li, “Quantization design and channel estimation for massive MIMO systems with one-bit ADCs,” IEEE Transactions on Vehicular Technology, vol. PP, no. 99, 2017.
-  L. Chu, F. Wen, L. Li, and R. Qiu, “Efficient nonlinear precoding for massive MU-MIMO downlink systems with 1-bit DACs,” submitted to IEEE Transactions on Wireless Communications, 2018. [Online]. Available: https://arxiv.org/abs/1804.08839
-  L. Chu, F. Wen, and R. Qiu, “Robust precoding design for coarsely quantized MU-MIMO under channel uncertainties,” in IEEE International Conference on Communications, 2019.
-  S. Jacobsson, G. Durisi, M. Coldrey, U. Gustavsson, and C. Studer, “Throughput analysis of massive MIMO uplink with low-resolution ADCs,” IEEE Transactions on Wireless Communications, vol. 16, no. 6, pp. 4038–4051, 2017.
-  A. Mezghani and A. L. Swindlehurst, “Blind estimation of sparse broadband massive MIMO channels with ideal and one-bit ADCs,” IEEE Transactions on Signal Processing, vol. PP, no. 99, pp. 1–1, 2017.
-  J. Xu, W. Xu, H. Zhang, G. Y. Li, and X. You, “Performance analysis of multi-cell millimeter wave massive MIMO networks with low-precision ADCs,” IEEE Transactions on Communications, pp. 1–1, 2018.
-  J. J. Bussgang, “Crosscorrelation functions of amplitude-distorted Gaussian signals,” MIT Res. Lab. Electron., vol. 216, 1952.
-  Y.-S. Jeon, N. Lee, S.-N. Hong, and R. W. Heath, “One-bit sphere decoding for uplink massive MIMO systems with one-bit ADCs,” IEEE Transactions on Wireless Communications, vol. 17, no. 7, pp. 4509–4521, 2018.
-  N. J. Myers and R. W. Heath, “Message passing-based joint CFO and channel estimation in millimeter wave systems with one-bit ADCs,” IEEE Transactions on Wireless Communications, pp. 1–1, 2019.
-  C. K. Wen, C. J. Wang, S. Jin, K. K. Wong, and P. Ting, “Bayes-optimal joint channel-and-data estimation for massive MIMO with low-precision ADCs,” IEEE Transactions on Signal Processing, vol. 64, no. 10, pp. 2541–2556, 2016.
-  J. Zhang, L. Dai, L. Xu, L. Ying, and L. Hanzo, “On low-resolution ADCs in practical 5G millimeter-wave massive MIMO systems,” IEEE Communications Magazine, vol. PP, no. 99, pp. 2–8, 2018.
-  Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.
-  H. Ye, G. Y. Li, and B. H. Juang, “Power of deep learning for channel estimation and signal detection in OFDM systems,” IEEE Wireless Communications Letters, vol. 7, no. 1, pp. 114–117, 2018.
-  H. He, C. Wen, S. Jin, and G. Y. Li, “A model-driven deep learning network for MIMO detection,” in 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2018, pp. 584–588.
-  N. Farsad and A. Goldsmith, “Neural network detection of data sequences in communication systems,” IEEE Transactions on Signal Processing, vol. 66, no. 21, pp. 5663–5678, 2018.
-  Y. S. Jeon, S. N. Hong, and N. Lee, “Supervised-learning-aided communication framework for MIMO systems with low-resolution ADCs,” IEEE Transactions on Vehicular Technology, vol. PP, no. 99, pp. 1–1, 2018.
-  A. Goldsmith, Wireless Communications. New York, NY, USA: Cambridge University Press, 2005.
-  H. V. Poor, An Introduction to Signal Detection and Estimation (2nd Ed.). Berlin, Heidelberg: Springer-Verlag, 1994.
-  C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, “Understanding deep learning requires rethinking generalization,” CoRR, vol. abs/1611.03530, 2017. [Online]. Available: http://arxiv.org/abs/1611.03530
-  B. Hanin, “Which neural net architectures give rise to exploding and vanishing gradients?” in Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, Eds., 2018, pp. 582–591.
-  Z. Mao, X. Wang, and X. Wang, “Semidefinite programming relaxation approach for multiuser detection of QAM signals,” IEEE Transactions on Wireless Communications, vol. 6, no. 12, pp. 4275–4279, 2007.
-  D. Erhan, Y. Bengio, A. C. Courville, P. A. Manzagol, and S. Bengio, “Why does unsupervised pre-training help deep learning?” Journal of Machine Learning Research, vol. 11, no. 3, pp. 625–660, 2010.
-  P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P. A. Manzagol, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,” Journal of Machine Learning Research, vol. 11, no. 12, pp. 3371–3408, 2010.
-  A. Maurer and M. Pontil, “Excess risk bounds for multitask learning with trace norm regularization,” Journal of Machine Learning Research, vol. 30, pp. 55–76, 2013.
-  M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, and M. Isard, “TensorFlow: A system for large-scale machine learning,” in 12th USENIX Symposium on Operating Systems Design and Implementation, 2016, pp. 265–283.
-  P. Bartlett and S. Mendelson, “Rademacher and Gaussian complexities: Risk bounds and structural results,” Journal of Machine Learning Research, vol. 3, pp. 463–482, Nov. 2002.