I Introduction
Massive MIMO is foreseen to be one of the key enablers for future-generation communication systems, in which the spectral efficiency is expected to be several orders of magnitude higher than in current systems [1]. Scaling up the number of antennas at the BS offers numerous advantages [2], but it also incurs very high energy consumption. To alleviate this problem, the BS can be equipped with low-resolution ADCs/DACs, which greatly reduces the energy consumption [3, 4, 5, 6, 7] of the massive MIMO system.
The last few years have seen many technical results on the performance of massive MIMO systems with low-resolution ADCs. In this work, we focus on the data equalization problem. For the frequency-flat channel, the authors of [8] show that the performance gap between the quantized MIMO system and the ideal one can be narrowed by increasing the number of antennas. Similar results have been reported for the frequency-selective case, i.e., single-cell sparse broadband massive MIMO [9] and multi-cell millimeter wave massive MIMO [10].
Most of these works are based on the Bussgang theorem [11], with which the severe nonlinearity introduced by the ADCs is approximated by a linear function of the input plus a distortion term. Bussgang-theorem-based data equalization methods have low computational complexity, comparable to that of classical linear data equalization methods. On the other hand, they suffer from performance saturation in the mid-to-high SNR regime [12, 13, 14] and show a low degree of adaptability in practical 5G millimeter-wave massive MIMO systems [15].
It is hard to capture the nonlinearity introduced by the ADCs with a tractable mathematical model. As a result, the best approach to the data equalization problem is unclear. Alternatively, it would be interesting to comprehend (or learn) the nonlinear structure of the employed ADCs by leveraging the potential of deep learning based methods [16].
Recently, deep learning based detection methods have been proposed for MIMO-OFDM systems with ideal DACs [17, 18]. It has been shown that the deep learning based method has performance comparable to that of the minimum mean-square error receiver and is robust to nonlinear clipping noise. The authors of [19] use a recurrent neural network based approach to detect data sequences in molecular communication systems with blind channel state information. Besides, a supervised-learning-aided estimator has been proposed for the frequency-flat MIMO system with low-resolution ADCs [20].
In this paper, instead of relying solely on deep learning methods, we propose a new deep neural network optimized equalization framework for MIMO-OFDM systems with low-resolution ADCs, which jointly exploits structural knowledge from MIMO systems and harnesses the power of unsupervised deep learning.
The remainder of this paper is structured as follows. Section II introduces the signal model for MIMO-OFDM systems with low-resolution ADCs, in which the Bussgang-theorem-based linearized data equalization method is discussed. In Section III, we detail the design of the proposed method. A coarse deep neural network based equalizer, following the structure of the widely used supervised learning method, is first proposed. To enhance the generalization of the proposed equalizer, a fine deep neural network based equalizer is then developed by leveraging knowledge from unsupervised learning. Numerical case studies are provided in Section IV to evaluate the performance of the proposed approach. Lastly, the conclusion and acknowledgment of this paper are given in Section V and Section VI, respectively.
Throughout this paper, vectors and matrices are given in lowercase and uppercase boldface letters, e.g., and , respectively. We use to denote the conjugate transpose of . The th row and th column element of is denoted by . We use , , and to represent the real part, the imaginary part, and the norm of vector .
II Preliminaries
II-A MIMO-OFDM System Model
We consider a MIMO-OFDM uplink system [21] with receive antennas at the BS, simultaneously serving single-antenna user terminals (UTs) over subcarriers. We use to denote the channel impulse response of taps between the th receive antenna and the th transmit antenna, for , . Let be a sequence of OFDM symbols to be sent from the th antenna. After removing the cyclic prefix and applying an IFFT, the received signal at the th receive antenna reads
(1) 
where denotes the circulant channel convolution matrix, is the discrete Fourier transform (DFT) matrix [21] of dimension , and the elements of are assumed to be independent and identically distributed (i.i.d.) zero-mean Gaussian variables with variance .
II-B Quantization
For the MIMO-OFDM uplink system with low-resolution ADCs, it is assumed that the real and imaginary parts of the received signals are quantized separately by a bit symmetric uniform quantizer , denoted by
(2) 
Specifically, the real-valued quantizer maps the input to a set of labels , which are determined by the set of thresholds , such that . For a bit ADC with step size , the thresholds and quantization labels are respectively given by
(3) 
and
(4) 
In the case of 1-bit ADCs, the output set reduces to .
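The quantizer described above can be sketched in a few lines of NumPy. Since the threshold and label expressions (3) and (4) are not reproduced in this text, the sketch assumes the common mid-rise form, in which the thresholds are integer multiples of the step size and each label is the midpoint of its cell; the function names `uniform_quantize` and `quantize_complex` are hypothetical.

```python
import numpy as np

def uniform_quantize(x, bits, step):
    """Symmetric uniform mid-rise quantizer for real inputs (assumed form)."""
    levels = 2 ** bits
    # Index of the cell [k*step, (k+1)*step) that contains each input.
    idx = np.floor(np.asarray(x, dtype=float) / step)
    # Saturate inputs that fall beyond the outermost thresholds.
    idx = np.clip(idx, -levels // 2, levels // 2 - 1)
    # The label is taken to be the midpoint of the cell.
    return (idx + 0.5) * step

def quantize_complex(y, bits, step):
    """Quantize the real and imaginary parts separately."""
    y = np.asarray(y)
    return uniform_quantize(y.real, bits, step) + 1j * uniform_quantize(y.imag, bits, step)

# 1-bit example: only the sign of each part survives.
one_bit = quantize_complex(np.array([0.3 - 2.0j, -0.1 + 0.4j]), bits=1, step=2.0)
# 2-bit example: four labels, with saturation at both ends.
two_bit = uniform_quantize(np.array([-5.0, -0.2, 0.2, 5.0]), bits=2, step=1.0)
```

Under this assumption a 1-bit quantizer keeps only the signs of the real and imaginary parts, scaled by half the step size.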
II-C Data Equalization and Linearized Solution
The severe nonlinearity of (2) makes the channel state information almost unknown to the BS, which makes data equalization challenging in quantized frequency-selective MIMO systems. Recently, as discussed in Section I, many researchers have proposed to represent the quantized output of a Gaussian signal as the sum of a linear transformation and an uncorrelated quantization error term, according to the Bussgang theorem [11]. Following the Bussgang theorem, (5) can be approximately represented by
(6)
where is a diagonal matrix and is referred to as the quantization error. The circulant channel matrix has an eigenvalue decomposition
(7) 
where is a diagonal matrix and . Substituting (7) into (IIC) gives the subcarrier-wise input-output relationship:
(8) 
where is the Bussgang decomposition factor. Then the Bussgang theorem based data equalization problem can be formulated as
(9) 
where is the set of the constellation points. Discarding the nonconvex constraint , one can use the minimum mean squared error (MMSE) based method [22] for channel estimation and then data equalization. Unfortunately, such a linearized data equalization method suffers from performance saturation in the mid-to-high SNR regime. Specifically, the BER curves achievable with linearized data equalization methods saturate at a certain finite SNR, above which no further improvement can be obtained. As a result, it is vital to capture the nonlinearity of (2) with new techniques, such as deep neural network based methods.
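The Bussgang linearization behind this approach can be illustrated numerically. The sketch below is a toy check rather than the paper's derivation: it assumes a real-valued, unit-variance Gaussian input and a 1-bit quantizer with labels of plus and minus one, estimates the linear gain from samples, and confirms that the residual distortion is uncorrelated with the input. For this special case the gain is known in closed form, namely the square root of 2/pi.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unit-variance real Gaussian input (an assumption made for this toy check).
y = rng.standard_normal(200_000)

# 1-bit quantizer with labels +1 and -1.
q = np.sign(y)

# Bussgang gain: the linear coefficient that leaves the error uncorrelated with y.
gain = np.mean(q * y) / np.mean(y * y)

# Distortion term of the decomposition q = gain * y + e.
e = q - gain * y

# Sample correlation between the distortion and the input (zero by construction).
corr = np.mean(e * y)

print("estimated gain:", gain)
print("closed form   :", np.sqrt(2.0 / np.pi))
```

The distortion term is uncorrelated with the input but not independent of it, which is one reason linear processing based on this model is only approximate.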
III LEMO: Learn to Equalize for MIMO-OFDM
For the coarsely quantized MIMO-OFDM system, the true channel state information is almost completely unknown, so the best method for data equalization is unclear. As shown in Fig. 1, we propose a deep learning based method that trains a data-driven detector on pilots to determine the transmitted symbols.
III-A Coarse Deep Neural Network based Equalizer
We start by applying supervised deep learning [16] to data equalization in the quantized MIMO-OFDM system. The method employed in this subsection is referred to as the coarse deep neural network (CDNN) based equalizer.
III-A1 Data Preprocessing
Generally, the supervised deep learning based approach has two parts: offline training and online testing. Let be the number of collected data samples. For , we use
to represent the data collected at the th sampling. In this paper, the elements of and are the subcarrier-wise real-valued received signals (outputs of the low-resolution ADCs) and the transmitted signals, respectively.
III-A2 Network Parameters Optimization
At the offline training stage, the pilot data set:
is used to find the optimal set of parameters of a layer neural network by solving the optimization problem:
(10) 
where
for , is the activation function, and is the weight matrix at the th layer. is a specially designed activation, which will be explained later. The parameters are updated to minimize the expected loss (10) by using a stochastic gradient based method [16], e.g., stochastic gradient descent (SGD) or Adam.
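As a rough illustration of the supervised setup, the following NumPy sketch runs a forward pass through a small fully connected network with ReLU hidden layers and evaluates a mean squared error loss on a batch of pilots. It is only a sketch under assumptions: the layer sizes, the squared-error form of the loss, the identity output activation, and the names `forward` and `mse_loss` are all hypothetical, and no parameter update is performed.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(y, weights, out_act=lambda z: z):
    """Fully connected network: ReLU on hidden layers, out_act on the last layer."""
    h = y
    for W in weights[:-1]:
        h = relu(h @ W)
    return out_act(h @ weights[-1])

def mse_loss(x_hat, x):
    """Mean over the batch of the squared Euclidean error (assumed loss form)."""
    return np.mean(np.sum((x_hat - x) ** 2, axis=1))

rng = np.random.default_rng(0)
n_in, n_hidden, n_out, n_samples = 32, 64, 4, 10  # hypothetical sizes

# Gaussian initialization with standard deviation 0.1 (variance 0.01).
shapes = [(n_in, n_hidden), (n_hidden, n_hidden), (n_hidden, n_out)]
weights = [0.1 * rng.standard_normal(s) for s in shapes]

y = rng.standard_normal((n_samples, n_in))            # stand-in for quantized received pilots
x = rng.choice([-1.0, 1.0], size=(n_samples, n_out))  # stand-in for real-valued QPSK components

x_hat = forward(y, weights)
loss = mse_loss(x_hat, x)
```

In practice the gradient of this loss with respect to the weights would be obtained by automatic differentiation and passed to SGD or Adam.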
The deep learning based equalizer described above tries to learn a nonlinear relationship between the transmitted signals and the quantized received signals through the supervised task, with the aim of recovering (or learning) the true channel matrix information that has been almost lost through the use of low-resolution ADCs.
However, training a supervised neural network alone is likely to yield an underconstrained problem, whose solution fits the training data well but cannot generalize well [23], especially in the case of neural discrete representation learning (the data equalization problem is itself a discrete optimization problem [22]). These limitations motivate us to design a fine deep neural network (FDNN) based equalizer.
III-B Fine Deep Neural Network based Equalizer
Developing a deep neural network with high generalization ability plays a key role in the data equalization of quantized frequency-selective MIMO systems. Generally, increasing the number of layers and the number of neurons per layer can help improve the generalization ability of deep neural networks [16]. However, training a very deep neural network faces the vanishing or exploding gradient problem [24]. Besides, real-time requirements prevent the widespread use of very deep neural networks.
III-B1 The Skew-symmetric Weights Matrix
We start with a simple example. Let and be two complex-valued vectors and a complex-valued matrix. The equation
is equivalent to
In this work, we focus on the real-valued deep neural network, whose weight structure should accommodate complex-valued data. Let
be the weight matrix in the th layer. In the training stage, we set and , which means only half of the weight matrix will be updated. This simple operation not only makes the proposed network suitable for training on complex-valued OFDM symbols but also helps to reduce computational complexity.
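The equivalence between a complex-valued linear map and a structured real-valued one can be checked numerically. The sketch below uses the standard real-valued embedding, in which the real and imaginary parts are stacked and the matrix is arranged in a two-by-two block pattern with one off-diagonal block negated. The exact constraint the authors impose on the weight blocks is not recoverable from this text, so the block layout and the names `real_block` and `stack` are assumptions.

```python
import numpy as np

def real_block(A, B):
    """Real-valued embedding of the complex matrix A + jB."""
    return np.block([[A, -B], [B, A]])

def stack(v):
    """Stack the real part on top of the imaginary part."""
    return np.concatenate([v.real, v.imag])

rng = np.random.default_rng(0)
H = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
x = rng.standard_normal(2) + 1j * rng.standard_normal(2)

# Complex-valued product.
y = H @ x

# The same product computed with real arithmetic only.
W = real_block(H.real, H.imag)
y_real = W @ stack(x)
```

Only the two blocks `A` and `B` are free, so a layer with this structure carries half as many independent parameters as an unconstrained real matrix of the same size.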
III-B2 Activation Function Design
As shown in (9), the data equalization problem itself is a nonconvex optimization problem. Discarding the nonconvex constraint yields a suboptimal solution. In this work, we design a new activation function to help us fully exploit the stochastic gradient descent method for the discrete optimization problem. Specifically, in the case of QPSK signaling and as shown in Fig. 2, we propose to use the activation function:
(11) 
where is the magnitude of the transmitted signal and is a trainable parameter that controls the layer’s outputs so that they are sufficiently close to the constellation points. Since the data in higher-order modulation can be represented by a linear combination of QPSK data [25], we leave out the design for higher-order modulation.
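The expression of the activation (11) is not reproduced in this text, so the following sketch does not implement it. As a stand-in that shows the same qualitative behavior, it uses a scaled hyperbolic tangent: an amplitude parameter fixes the magnitude of the target points and a sharpness parameter, trainable in the paper's setting, pushes the outputs toward those points while keeping the function differentiable. The function name and both parameter names are hypothetical.

```python
import numpy as np

def soft_qpsk_activation(z, amplitude, sharpness):
    """Stand-in soft decision: a smooth map whose outputs approach +/- amplitude.

    This is NOT the paper's activation (11); it is an assumed tanh-based substitute.
    """
    return amplitude * np.tanh(sharpness * z)

z = np.array([-0.8, -0.05, 0.05, 0.8])
a = 1.0 / np.sqrt(2.0)  # per-component magnitude of unit-power QPSK

soft = soft_qpsk_activation(z, a, sharpness=1.0)     # gentle: outputs stay spread out
sharp = soft_qpsk_activation(z, a, sharpness=100.0)  # steep: outputs sit near +/- a
```

A small sharpness keeps useful gradients early in training, while a large one makes the outputs nearly discrete.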
III-B3 Generalization Enhancement
In this paper, motivated by the pioneering work [26], we propose to improve the representation of the employed neural network by adding an unsupervised loss to (10). Specifically, the proposed neural network based equalizer updates the neural network parameters by minimizing
(12) 
where
, and is the penalty parameter that balances and . We use an unsupervised loss to promote the generalization of the proposed FDNN equalizer for three reasons:

Previous studies have shown that learning multiple tasks jointly can improve the generalization error bounds [28]. In our case, we jointly minimize and .

A nonlinear autoencoder (12) has indeed been found to be helpful for key feature extraction [27, 16]. In our case, it helps to represent of high dimension from of low dimension (the number of users is much smaller than the number of receive antennas).
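The structure of the combined objective can be sketched as a supervised term plus a weighted reconstruction term. Because the expression (12) is not reproduced in this text, the sketch assumes that both terms are mean squared errors and that the unsupervised term compares the received signal with its reconstruction from the symbol estimate; the name `combined_loss` and its arguments are hypothetical.

```python
import numpy as np

def combined_loss(x_hat, x, y_rec, y, penalty):
    """Supervised error on the symbols plus a weighted reconstruction error (assumed form)."""
    supervised = np.mean(np.sum((x_hat - x) ** 2, axis=1))
    unsupervised = np.mean(np.sum((y_rec - y) ** 2, axis=1))
    return supervised + penalty * unsupervised

# Tiny hand-made batch with a single sample.
x = np.array([[1.0, -1.0]])           # transmitted symbols (low dimension)
x_hat = np.array([[0.5, -1.0]])       # network estimate of the symbols
y = np.array([[1.0, 0.0, -1.0]])      # received signal (high dimension)
y_rec = np.array([[1.0, 1.0, -1.0]])  # reconstruction of the received signal

total = combined_loss(x_hat, x, y_rec, y, penalty=0.1)
```

The penalty parameter trades off fitting the pilot labels against reproducing the received signal, which needs no labels.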
IV Case Studies
In this section, using the TensorFlow platform [29], we evaluate the performance of the proposed equalizers via numerical simulations on a PC with an Intel Core i7-7700K CPU and two NVIDIA GTX 1080 Ti GPUs.
IV-A System and Network Parameters Setup
We consider a QPSK-modulated MIMO-OFDM system with 128 BS antennas, simultaneously serving 8 users. Following the IEEE 802.11a standard, we use 64 subcarriers in an OFDM block. The channel tap length is . It is assumed that the coherence time is . A typical urban environment with max delay 16 is considered.
Table I: Network Parameters
Layer  Activation  Weights
Input  -  -
Fully connected  relu  GM
Fully connected  relu  GM
Fully connected  relu  GM
Fully connected  relu  GM
Fully connected  relu  GM
Fully connected  relu  GM
Fully connected  IM
Note: This layer is connected to layer as shown in Fig. 1.
Tab. I lists the parameters of the proposed network, i.e., the layer sizes, layer initializations, and activation functions. The notation GM in Tab. I means a Gaussian matrix whose elements have zero mean and variance 0.01. IM means the identity matrix. The parameter in the proposed activation function will be updated from 1 to 100 to find the optimal on every mini-batch training.
IV-B Effect of the Quantization Level
In Fig. 3, we investigate the BER performance of the proposed equalizer with different quantization levels. The benchmark equalizer is the MMSE equalizer with infinite-resolution ADCs (MMSE-inf.bit). It can be observed that the performance of the proposed FDNN equalizer improves as the number of ADC bits increases. The performance gap between the MMSE-inf.bit equalizer and the proposed equalizer is negligible when the number of ADC bits is no less than 2. Interestingly, the proposed FDNN equalizer with infinite-resolution ADCs (FDNN-inf.bit) slightly outperforms the MMSE-inf.bit equalizer when the SNR is over 10 dB. These results demonstrate that the proposed network can closely approximate the nonlinearity introduced by the employed ADCs.
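For reference, the infinite-resolution MMSE benchmark and the BER metric can be sketched as follows. This is a simplified single-subcarrier, flat-fading stand-in with a perfectly known channel, not the paper's simulation setup: the i.i.d. Gaussian channel, the unit-power QPSK symbols, and the function names are assumptions, while the antenna and user counts follow the setup described in Section IV-A.

```python
import numpy as np

def mmse_equalize(H, y, noise_var):
    """Linear MMSE estimate for unit-power symbols: (H^H H + noise_var * I)^{-1} H^H y."""
    n_users = H.shape[1]
    gram = H.conj().T @ H + noise_var * np.eye(n_users)
    return np.linalg.solve(gram, H.conj().T @ y)

def qpsk_bits(s):
    """Hard decision: one bit from the sign of the real part, one from the imaginary part."""
    return np.stack([s.real > 0, s.imag > 0], axis=-1)

def bit_error_rate(s_hat, s):
    return np.mean(qpsk_bits(s_hat) != qpsk_bits(s))

rng = np.random.default_rng(0)
n_rx, n_users, snr_db = 128, 8, 10.0
noise_var = 10 ** (-snr_db / 10)

# i.i.d. complex Gaussian channel with unit-variance entries (assumed model).
H = (rng.standard_normal((n_rx, n_users)) + 1j * rng.standard_normal((n_rx, n_users))) / np.sqrt(2)
# Unit-power QPSK symbols, one per user.
s = (rng.choice([-1.0, 1.0], n_users) + 1j * rng.choice([-1.0, 1.0], n_users)) / np.sqrt(2)
# Complex Gaussian noise with total variance noise_var per antenna.
noise = np.sqrt(noise_var / 2) * (rng.standard_normal(n_rx) + 1j * rng.standard_normal(n_rx))

y = H @ s + noise  # unquantized (infinite-resolution) received signal
ber = bit_error_rate(mmse_equalize(H, y, noise_var), s)
```

A quantized variant would apply the ADC model to `y` before equalization, which is where the linear receiver starts to saturate.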
IV-C The Effect of the Distribution of the Channel Taps
In Fig. 4, we study the performance of the proposed equalizers (CDNN and FDNN) under two kinds of pilots: Gaussian distributed pilots (GP) and Poisson distributed pilots (PP) (see footnote 1). Fig. 4 shows that the BER curves of the MMSE equalizer do not decrease appreciably as the SNR increases above 10 dB. On the other hand, the CDNN and FDNN equalizers have robust performance over a wide range of SNRs. Besides, the proposed FDNN equalizer outperforms the CDNN and MMSE equalizers when either GP or PP is used. The FDNN equalizer is more robust than the CDNN equalizer in the case of PP. This phenomenon may be explained by the enhanced generalization of the FDNN.
Footnote 1: For the case of GP, the taps are assumed to be uncorrelated zero-mean Gaussian variables with unit variance. For the Poisson channel model, a widely used model in molecular and optical communication systems [19], the Poisson parameter is set to .
IV-D The Effect of Pilots Numbers
The feedback (in this paper, we only consider the number of pilots) is limited in a real communication system, which poses another challenge in the design of an effective neural network. In Fig. 5, we show the effect of the number of pilots.
From Fig. 5, a larger number of pilots leads to better performance for all compared equalizers, and they have comparable performance when 32 pilots are used. When only 8 pilots are used, the proposed FDNN equalizer significantly outperforms the MMSE equalizer. The proposed FDNN equalizer and the MMSE equalizer achieve a target BER of with 8 pilots and 16 pilots, respectively, demonstrating the potential for pilot savings.
V Conclusions and Future Work
In this paper, a deep neural network based equalizer has been proposed for MIMO-OFDM systems with low-resolution ADCs. The experimental results show that the proposed equalizer is robust to different channel tap distributions (i.e., Gaussian and Poisson) and significantly outperforms the linearized MMSE equalizer. In addition, given a target bit error rate, the proposed learning architecture with an unsupervised loss is more efficient in terms of the number of pilots than the learning architecture without such a design or the linearized MMSE equalizer.
In our future work, we will provide a theoretical analysis, e.g., based on the Rademacher complexity analysis [30], of the generalization improvement in the fine deep neural network based equalizer. Besides, it is interesting to investigate how many parameters are needed to train a neural network based method for the data equalization problem in MIMO-OFDM systems with low-resolution ADCs.
VI Acknowledgment
The authors would like to thank Dr. Yanwen Fan, from the EECS department of the University of Tennessee, Knoxville, for fruitful discussions in the neural network algorithm implementation.
References
 [1] L. Lu, G. Y. Li, A. L. Swindlehurst, A. Ashikhmin, and R. Zhang, “An overview of massive MIMO: Benefits and challenges,” IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 742–758, 2014.
 [2] F. Rusek, D. Persson, B. K. Lau, and E. G. Larsson, “Scaling up MIMO: Opportunities and challenges with very large arrays,” IEEE Signal Process. Mag., vol. 30, no. 1, pp. 40–60, 2012.
 [3] S. Wang, Y. Li, and W. Jing, “Multiuser detection in massive spatial modulation MIMO with low-resolution ADCs,” IEEE Transactions on Wireless Communications, vol. 14, no. 4, pp. 2156–2168, 2015.
 [4] S. Wang and Z. Lin, “Signal processing in massive MIMO with IQ imbalances and low-resolution ADCs,” IEEE Transactions on Wireless Communications, vol. PP, no. 99, pp. 1–1, 2016.
 [5] F. Wang, J. Fang, H. Li, and S. Li, “Quantization design and channel estimation for massive MIMO systems with one-bit ADCs,” IEEE Transactions on Vehicular Technology, vol. PP, no. 99, 2017.
 [6] L. Chu, W. Fei, L. Lily, and Q. Robert, “Efficient nonlinear precoding for massive MU-MIMO downlink systems with 1-bit DACs,” Submitted to IEEE Transactions on Wireless Communications, 2018. [Online]. Available: https://arxiv.org/abs/1804.08839
 [7] L. Chu, F. Wen, and Q. Robert, “Robust precoding design for coarsely quantized MU-MIMO under channel uncertainties,” in IEEE International Conference on Communications, 2019.
 [8] S. Jacobsson, G. Durisi, M. Coldrey, U. Gustavsson, and C. Studer, “Throughput analysis of massive MIMO uplink with low-resolution ADCs,” IEEE Transactions on Wireless Communications, vol. 16, no. 6, pp. 4038–4051, 2017.
 [9] A. Mezghani and A. L. Swindlehurst, “Blind estimation of sparse broadband massive MIMO channels with ideal and one-bit ADCs,” IEEE Transactions on Signal Processing, vol. PP, no. 99, pp. 1–1, 2017.
 [10] J. Xu, W. Xu, H. Zhang, G. Y. Li, and X. You, “Performance analysis of multi-cell millimeter wave massive MIMO networks with low-precision ADCs,” IEEE Transactions on Communications, pp. 1–1, 2018.
 [11] J. J. Bussgang, “Crosscorrelation functions of amplitude-distorted Gaussian signals,” MIT Res. Lab. Electron., vol. 216, 1952.
 [12] Y.-S. Jeon, N. Lee, S.-N. Hong, and R. W. Heath, “One-bit sphere decoding for uplink massive MIMO systems with one-bit ADCs,” IEEE Transactions on Wireless Communications, vol. 17, no. 7, pp. 4509–4521, 2018.
 [13] N. J. Myers and R. W. Heath, “Message passing-based joint CFO and channel estimation in millimeter wave systems with one-bit ADCs,” IEEE Transactions on Wireless Communications, pp. 1–1, 2019.
 [14] C. K. Wen, C. J. Wang, J. Shi, K. K. Wong, and P. Ting, “Bayes-optimal joint channel-and-data estimation for massive MIMO with low-precision ADCs,” IEEE Transactions on Signal Processing, vol. 64, no. 10, pp. 2541–2556, 2016.
 [15] J. Zhang, L. Dai, L. Xu, L. Ying, and L. Hanzo, “On low-resolution ADCs in practical 5G millimeter-wave massive MIMO systems,” IEEE Communications Magazine, vol. PP, no. 99, pp. 2–8, 2018.
 [16] Y. Lecun, Y. Bengio, and G. Hinton, “Deep learning.” Nature, vol. 521, no. 7553, p. 436, 2015.
 [17] H. Ye, G. Y. Li, and B. H. Juang, “Power of deep learning for channel estimation and signal detection in OFDM systems,” IEEE Wireless Communications Letters, vol. 7, no. 1, pp. 114–117, 2018.
 [18] H. He, C. Wen, S. Jin, and G. Y. Li, “A model-driven deep learning network for MIMO detection,” in 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2018, pp. 584–588.
 [19] N. Farsad and A. Goldsmith, “Neural network detection of data sequences in communication systems,” IEEE Transactions on Signal Processing, vol. 66, no. 21, pp. 5663–5678, 2018.
 [20] Y. S. Jeon, S. N. Hong, and N. Lee, “Supervised-learning-aided communication framework for MIMO systems with low-resolution ADCs,” IEEE Transactions on Vehicular Technology, vol. PP, no. 99, pp. 1–1, 2018.
 [21] A. Goldsmith, Wireless Communications. New York, NY, USA: Cambridge University Press, 2005.
 [22] H. V. Poor, An Introduction to Signal Detection and Estimation (2nd ed.). Berlin, Heidelberg: Springer-Verlag, 1994.
 [23] C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals, “Understanding deep learning requires rethinking generalization,” CoRR, vol. abs/1611.03530, 2017. [Online]. Available: http://arxiv.org/abs/1611.03530
 [24] B. Hanin, “Which neural net architectures give rise to exploding and vanishing gradients?” in Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, Eds., 2018, pp. 582–591.
 [25] Z. Mao, X. Wang, and X. Wang, “Semidefinite programming relaxation approach for multiuser detection of QAM signals,” IEEE Transactions on Wireless Communications, vol. 6, no. 12, pp. 4275–4279, 2007.

 [26] D. Erhan, Y. Bengio, A. C. Courville, P. A. Manzagol, and S. Bengio, “Why does unsupervised pretraining help deep learning?” Journal of Machine Learning Research, vol. 11, no. 3, pp. 625–660, 2010.
 [27] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P. A. Manzagol, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,” Journal of Machine Learning Research, vol. 11, no. 12, pp. 3371–3408, 2010.
 [28] A. Maurer and M. Pontil, “Excess risk bounds for multitask learning with trace norm regularization,” Journal of Machine Learning Research, vol. 30, pp. 55–76, 2013.
 [29] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, and M. Isard, “TensorFlow: A system for large-scale machine learning,” in 12th USENIX Symposium on Operating Systems Design and Implementation, 2016, pp. 265–283.
 [30] P. Bartlett and S. Mendelson, “Rademacher and Gaussian complexities: Risk bounds and structural results,” Journal of Machine Learning Research, vol. Nov., no. 3, pp. 224–240, 2002.