I. Introduction
Deploying large antenna arrays has been considered one of the key ingredients of future communications, such as massive MIMO for sub-6 GHz systems [1, 2] and millimeter-wave communications [3, 4]. Excessive power consumption, however, has become a bottleneck in realizing such systems due to the large number of high-precision analog-to-digital converters (ADCs) at receivers. In this regard, employing low-precision ADCs has been widely studied to reduce the power consumption at receivers [5, 6, 7, 8]. As an extreme case of low-resolution ADCs, one-bit ADC systems have attracted considerable attention by significantly simplifying the analog processing of receivers [9, 10, 11, 12, 13, 14, 15, 16].
State-of-the-art detection methods have been developed for one-bit ADC systems [11, 12, 13]. In [11], an iterative multiuser detection method was proposed using a message-passing de-quantization algorithm. In [12], a high-complexity one-bit maximum-likelihood (ML) detection method and low-complexity zero-forcing (ZF)-type detection methods were developed for a quantized distributed-reception scenario. By converting the ML estimation problem in [12] into a convex optimization problem, an efficient near-ML detection method was proposed in [13]. Such detection methods, however, require the estimation of channel state information (CSI) from one-bit quantized signals. Although high-performance channel estimation techniques have been developed for one-bit ADC systems [14, 13, 15], channel estimation with one-bit quantized signals still suffers from degraded estimation accuracy compared to high-precision ADC systems. In this regard, we investigate a learning-based detection approach that replaces one-bit channel estimation with a probability learning process.
Recently, such learning-based detection approaches were studied in [17, 18]. Since the primary challenge of learning-based detection is its strong dependency on the training length, detection techniques such as empirical ML-like detection and minimum-center-distance detection were developed in [17] to overcome this challenge. In [18], by contrast, channel estimation was used to initialize the likelihood functions for ML detection, and a learning-based likelihood function was used to update them afterwards. Unlike the previous approach [17], which focused on developing robust detection methods, we instead focus on developing robust learning methods for the likelihood functions, so as to reduce the dependency of the learning process on the training length.
In this paper, we investigate a learning-based ML detection approach that replaces one-bit channel estimation with a robust probability learning process, as shown in Fig. 1. We propose a biased-learning algorithm that sets a small minimum probability for each likelihood function, preventing zero-probability likelihood functions from wiping out the information obtained through training. With knowledge of the SNR, we further propose a dithering-and-learning technique that infers the likelihood functions from dithered signals: we first add a dithering signal to the quantization input and then estimate the true likelihood function from the dithered quantized signals. The proposed method makes it possible to estimate the likelihood probabilities with a reasonable training length by drawing out sign changes in the sequence of quantized signals within the training phase. Accordingly, the BS can directly perform ML detection, which is optimal in minimizing the probability of detection error for equally probable transmit symbols. The likelihood probabilities can be further updated by utilizing correctly decoded data symbols as pilot symbols. Simulation results demonstrate that, unlike conventional learning-based one-bit ML detection, the proposed detection techniques show robust detection performance in terms of symbol error rate, achieving performance comparable to the optimal one-bit ML detection that requires estimation of the CSI.
II. System Model
We consider uplink multiuser MIMO communications in which $N_u$ users, each with a single antenna, transmit signals to the BS with $N_r$ antennas. We assume the number of receive antennas is much larger than the number of users, $N_r \gg N_u$. The uplink transmission is composed of a pilot transmission phase and a data transmission phase: the users first transmit pilot symbols during $N_p$ symbol times and then transmit data symbols during $N_d$ symbol times. The total number of distinct pilot symbol vectors is denoted as $K$, and each pilot symbol vector $\mathbf{s}_k$, $k \in \{1,\dots,K\}$, is transmitted $N_{\mathrm{tr}}$ times during the pilot transmission phase, i.e., $N_p = K N_{\mathrm{tr}}$. Let $\mathbf{s}_t$ denote a data symbol vector at time $t$. Then, the received signal vector at time $t$ is
$$\mathbf{y}_t = \sqrt{\rho}\, \mathbf{H} \mathbf{s}_t + \mathbf{n}_t, \qquad (1)$$

where $\rho$ denotes the average user transmit power, $\mathbf{H} \in \mathbb{C}^{N_r \times N_u}$ is the channel matrix, and $\mathbf{n}_t$ represents the additive noise vector at time $t$ that follows a complex Gaussian distribution with zero mean and variance $N_0$, i.e., $\mathbf{n}_t \sim \mathcal{CN}(\mathbf{0}, N_0 \mathbf{I})$. Here, $\mathbf{I}$ represents the identity matrix of proper dimension. Each user symbol $s_{t,u}$ is generated from the set of symbols $\mathcal{S}$ and is assumed to have zero mean and unit variance, i.e., $\mathbb{E}[s_{t,u}] = 0$ and $\mathbb{E}[|s_{t,u}|^2] = 1$, where $s_{t,u}$ denotes the $u$th element of $\mathbf{s}_t$. We assume a block-fading narrowband channel where the channel is invariant during the transmission of the $N_p + N_d$ symbol times.¹ We define the signal-to-noise ratio (SNR) as $\rho/N_0$.

¹Although we assume a narrowband channel model for convenience, the proposed methods are applicable to any block-fading channel model.

The received signals in (1) are quantized by one-bit ADCs: the real and imaginary parts of each received signal are quantized separately, each ADC outputting only the sign of its input, i.e., either $+1$ or $-1$. The quantized signal can be represented as
$$\mathbf{r}_t = \mathcal{Q}(\mathrm{Re}\{\mathbf{y}_t\}) + j\, \mathcal{Q}(\mathrm{Im}\{\mathbf{y}_t\}), \qquad (2)$$

where $\mathcal{Q}(\cdot)$ is an element-wise one-bit quantizer, and $\mathrm{Re}\{\mathbf{a}\}$ and $\mathrm{Im}\{\mathbf{a}\}$ denote the real and imaginary parts of a complex vector $\mathbf{a}$, respectively. The received signal in the complex-vector form can be rewritten in a real-vector form as
$$\mathbf{y}_{\mathrm{r},t} = \sqrt{\rho}\, \mathbf{H}_{\mathrm{r}} \mathbf{s}_{\mathrm{r},t} + \mathbf{n}_{\mathrm{r},t}, \qquad (3)$$

where

$$\mathbf{H}_{\mathrm{r}} = \begin{bmatrix} \mathrm{Re}\{\mathbf{H}\} & -\mathrm{Im}\{\mathbf{H}\} \\ \mathrm{Im}\{\mathbf{H}\} & \mathrm{Re}\{\mathbf{H}\} \end{bmatrix}, \qquad (4)$$

$$\mathbf{y}_{\mathrm{r},t} = \begin{bmatrix} \mathrm{Re}\{\mathbf{y}_t\} \\ \mathrm{Im}\{\mathbf{y}_t\} \end{bmatrix}, \quad \mathbf{s}_{\mathrm{r},t} = \begin{bmatrix} \mathrm{Re}\{\mathbf{s}_t\} \\ \mathrm{Im}\{\mathbf{s}_t\} \end{bmatrix}, \quad \mathbf{n}_{\mathrm{r},t} = \begin{bmatrix} \mathrm{Re}\{\mathbf{n}_t\} \\ \mathrm{Im}\{\mathbf{n}_t\} \end{bmatrix}. \qquad (5)$$
Accordingly, we can rewrite the quantized signal in a real-vector form as

$$\mathbf{r}_{\mathrm{r},t} = \mathcal{Q}(\mathbf{y}_{\mathrm{r},t}), \qquad (6)$$

where each element is quantized to $+1$ if $y_{\mathrm{r},t,i} \ge 0$ and to $-1$ otherwise.
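The real-valued reformulation in (3)–(6) and the one-bit quantizer can be sketched as follows; this is a minimal NumPy illustration in which the dimensions, powers, and the QPSK constellation are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def to_real(H):
    """Real-valued channel form, as in (4)."""
    return np.block([[H.real, -H.imag], [H.imag, H.real]])

def quantize_one_bit(y_r):
    """Element-wise one-bit quantizer (6): +1 if the input is >= 0, else -1."""
    return np.where(y_r >= 0, 1.0, -1.0)

rng = np.random.default_rng(0)
Nr, Nu, rho, N0 = 8, 2, 1.0, 0.5                 # illustrative dimensions and powers
H = (rng.standard_normal((Nr, Nu)) + 1j * rng.standard_normal((Nr, Nu))) / np.sqrt(2)
s = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], Nu) / np.sqrt(2)  # unit-variance QPSK
s_r = np.concatenate([s.real, s.imag])           # real-vector symbol, as in (5)
n_r = rng.normal(0.0, np.sqrt(N0 / 2), 2 * Nr)   # real noise, variance N0/2 per dimension
y_r = np.sqrt(rho) * to_real(H) @ s_r + n_r      # real form of (1), i.e., (3)
r = quantize_one_bit(y_r)                        # one-bit observation (6)
```

Each complex receive antenna contributes two one-bit outputs (real and imaginary), so `r` has $2N_r$ entries.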
III. Robust One-Bit ML Detection
In this section, we propose robust learning-based ML detection methods for one-bit ADC systems that achieve the ML detection performance without estimating channels. Being identical to maximum a posteriori estimation when all possible transmit symbols are equally probable, ML estimation is optimal in minimizing the probability of detection error. We first introduce the conventional one-bit ML detection with CSI in the following subsection.
III-A One-Bit ML Detection with CSI
Let $\mathbf{s}_k$ be the $k$th pilot symbol vector, i.e., the $k$th element of the set of all possible symbol vectors, with real-vector form $\mathbf{s}_{\mathrm{r},k}$. The likelihood probability of the quantized signal vector $\mathbf{r}_{\mathrm{r},t}$ for a given channel $\mathbf{H}$ and transmit symbol vector $\mathbf{s}_k$ can be approximated as

$$\Pr(\mathbf{r}_{\mathrm{r},t} \mid \mathbf{H}, \mathbf{s}_k) \approx \prod_{i=1}^{2N_r} \Pr(r_{\mathrm{r},t,i} \mid \mathbf{H}, \mathbf{s}_k), \qquad (7)$$

where $\Pr(r_{\mathrm{r},t,i} \mid \mathbf{H}, \mathbf{s}_k)$ denotes the likelihood function for the $i$th element when the symbol vector $\mathbf{s}_k$ is transmitted over the given channel. Defining $\psi_{i,k} \triangleq \Pr(r_{\mathrm{r},t,i} = +1 \mid \mathbf{H}, \mathbf{s}_k)$, it is given as

$$\psi_{i,k} = \Phi\!\left(\sqrt{2\rho/N_0}\; g_{i,k}\right), \qquad (8)$$

where $g_{i,k} = \mathbf{h}_{\mathrm{r},i}^{\top} \mathbf{s}_{\mathrm{r},k}$, $\mathbf{h}_{\mathrm{r},i}^{\top}$ is the $i$th row of $\mathbf{H}_{\mathrm{r}}$, and $\Phi(\cdot)$ is the cumulative distribution function (CDF) of the standard Gaussian distribution. Note that (7) becomes an exact representation of $\Pr(\mathbf{r}_{\mathrm{r},t} \mid \mathbf{H}, \mathbf{s}_k)$ when all elements of $\mathbf{r}_{\mathrm{r},t}$ are independent of each other. Based on (7), the ML detection rule is given as

$$\hat{k} = \arg\max_{k \in \mathcal{K}} \prod_{i=1}^{2N_r} \Pr(r_{\mathrm{r},t,i} \mid \mathbf{H}, \mathbf{s}_k), \qquad (9)$$

where we define the index set of all possible symbol vectors as $\mathcal{K} = \{1, \dots, K\}$ with $K = |\mathcal{S}|^{N_u}$. Here, $|\mathcal{S}|$ denotes the cardinality of the set $\mathcal{S}$. The detection rule in (9) is computed by using (8) with the CSI.
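The per-element likelihood (8) and the ML rule (9) can be sketched as follows; this is a minimal Python sketch assuming the real-vector notation above, and the exhaustive search over all candidates is written for clarity rather than efficiency:

```python
import numpy as np
from math import erf, sqrt

def phi(x):
    """CDF of the standard Gaussian distribution."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def likelihood(r, H_r, s_r, rho, N0):
    """Product-form likelihood (7) of a one-bit observation r, using (8)."""
    g = sqrt(2.0 * rho / N0) * (H_r @ s_r)   # scaled noiseless outputs g_{i,k}
    p = 1.0
    for r_i, g_i in zip(r, g):
        psi = phi(g_i)                       # Pr(r_i = +1 | H, s_k)
        p *= psi if r_i > 0 else 1.0 - psi
    return p

def ml_detect(r, H_r, candidates, rho, N0):
    """ML rule (9): pick the candidate index with the largest likelihood."""
    return max(range(len(candidates)),
               key=lambda k: likelihood(r, H_r, candidates[k], rho, N0))
```

The candidate list enumerates the real-vector forms of all $K = |\mathcal{S}|^{N_u}$ symbol vectors, which is why this brute-force rule is tractable only for small $N_u$.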
III-B Robust One-Bit ML Detection without CSI
In this subsection, we introduce a learning-based one-bit ML detection approach that does not require channel estimation, and we propose learning techniques that are robust with respect to the training length $N_{\mathrm{tr}}$. During the pilot transmission of length $N_p = K N_{\mathrm{tr}}$, each pilot symbol vector $\mathbf{s}_k$ is transmitted $N_{\mathrm{tr}}$ times, and the BS learns the likelihood functions by measuring the frequency of $+1$ (and $-1$) during the transmission as

$$\hat{\psi}_{i,k} = \frac{1}{N_{\mathrm{tr}}} \sum_{t \in \mathcal{T}_k} \mathbb{1}\{r_{\mathrm{r},t,i} = +1\}, \qquad (10)$$

where $\mathcal{T}_k$ is the set of the $N_{\mathrm{tr}}$ time indices during which $\mathbf{s}_k$ is transmitted, and $\mathbb{1}\{A\}$ is an indicator function that equals $1$ if $A$ is true and $0$ otherwise. After learning the likelihood functions using (10), the BS has the estimate of the likelihood probability for the quantized signal vector $\mathbf{r}_{\mathrm{r},t}$ in the data transmission phase as

$$\hat{\Pr}(\mathbf{r}_{\mathrm{r},t} \mid \mathbf{s}_k) = \prod_{i=1}^{2N_r} \hat{\psi}_{i,k}^{\,\mathbb{1}\{r_{\mathrm{r},t,i} = +1\}} \left(1 - \hat{\psi}_{i,k}\right)^{\mathbb{1}\{r_{\mathrm{r},t,i} = -1\}}, \qquad (11)$$
and can perform the ML detection in (9).
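The empirical learning step (10) and the resulting likelihood estimate (11) can be sketched as follows; the function and variable names are illustrative, not taken from the paper:

```python
import numpy as np

def learn_likelihoods(R_train, N_tr):
    """Empirical learning (10). R_train[k] is an (N_tr x 2Nr) array holding the
    one-bit observations (entries +/-1) recorded while pilot vector k was sent;
    each column's fraction of +1 values estimates psi_{i,k}."""
    return np.stack([(R_k > 0).sum(axis=0) / N_tr for R_k in R_train])

def likelihood_from_estimates(r, psi_k):
    """Estimated likelihood (11) of an observation r under candidate k."""
    return np.prod(np.where(r > 0, psi_k, 1.0 - psi_k))
```

The ML rule then simply maximizes `likelihood_from_estimates` over the candidate index, with no channel knowledge involved.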
With a limited training length $N_{\mathrm{tr}}$, however, the empirical likelihood function $\hat{\psi}_{i,k}$ may have a probability of zero after learning if no change in the sign of the quantized output sequence $r_{\mathrm{r},t,i}$, $t \in \mathcal{T}_k$, is observed during the $N_{\mathrm{tr}}$ transmissions of the symbol vector $\mathbf{s}_k$. Likelihood functions with zero probability make the likelihood probability of the observed signal in (11) zero for many candidate symbol vectors, which may include the desired one. Consequently, the zero-probability likelihood functions wipe out the entire information obtained during the pilot-based learning phase, thereby severely degrading the detection performance. Note that zero probabilities are even more likely at high SNR.
Fig. 2 shows the symbol error rate (SER) performance of the learning-based ML detection for different training lengths $N_{\mathrm{tr}}$, for the considered numbers of receive antennas and users and the considered QAM modulation. The optimal one-bit ML detection introduced in Section III-A is also evaluated; note that the optimal one-bit ML detection requires the CSI. As discussed, the SER increases as the number of training symbols for each candidate decreases. In addition, the gaps between the optimal one-bit ML case and the learning-based one-bit ML cases become larger as the SNR increases, which also matches intuition. Accordingly, the primary challenge of such learning-based detection is to make it robust to the training length over all SNR ranges. Therefore, we propose robust learning methods for one-bit ML detection with respect to the training length $N_{\mathrm{tr}}$.
III-B1 Robust Learning-Based One-Bit ML without SNR
To address this challenge without requiring SNR knowledge or the CSI, we propose a biased-learning ML detection approach, which is simple but highly robust to the training length $N_{\mathrm{tr}}$; we will later extend this approach to the case with SNR knowledge to further improve the learning performance. In this approach, we limit the minimum likelihood function value to a small bias probability $\epsilon > 0$. The bias probability needs to be small, since a zero empirical probability means that no change in the sign of the quantized output sequence was observed within the $N_{\mathrm{tr}}$ transmissions. The proposed biased-learning ML detection approach is summarized as follows:

1. In the pilot transmission phase, the BS computes the likelihood functions $\hat{\psi}_{i,k}$ in (10).

2. If a zero probability is observed for any likelihood function, the BS sets $\hat{\psi}_{i,k} = \epsilon$ when $\hat{\psi}_{i,k} = 0$ and $\hat{\psi}_{i,k} = 1 - \epsilon$ when $\hat{\psi}_{i,k} = 1$.
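The bias step above amounts to clipping the empirical probabilities away from the endpoints; a minimal sketch, in which the default value of $\epsilon$ is an illustrative choice rather than the paper's:

```python
import numpy as np

def bias_likelihoods(psi_hat, eps=1e-3):
    """Biased learning: replace empirical probabilities of exactly 0 (or 1)
    with eps (or 1 - eps), so that unobserved sign changes during training
    cannot zero out the likelihood product in (11)."""
    psi = psi_hat.copy()
    psi[psi == 0.0] = eps
    psi[psi == 1.0] = 1.0 - eps
    return psi
```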
Although the proposed biased-learning ML detection approach prevents the loss of the information obtained from the measurements during the pilot transmission, it cannot capture the variation of probabilities among the zero-probability likelihood functions. In addition, the number of likelihood functions with zero probability tends to increase with the SNR, so the biased-learning ML detection needs to replace a large number of zero-probability likelihood functions with the bias probability $\epsilon$. Accordingly, although the proposed biased-learning ML detection improves the detection performance, its strong dependency on the bias probability at high SNR may not be desirable. To resolve these challenges, we further propose a dithering-and-learning one-bit ML detection method for the case where SNR knowledge is available.
III-B2 Robust Learning-Based One-Bit ML with SNR
Now, we propose a dithering-and-learning one-bit ML detection method under SNR knowledge at the BS, i.e., the noise variance $N_0$ is known to the BS as well as the transmit power $\rho$. As shown in Fig. 1, the BS adds dithering signals to the quantization inputs during the pilot transmission phase only, in order to draw out changes in the sign of the output sequences. After dithering, the quantization input in the real-vector form becomes

$$\mathbf{y}_{\mathrm{r},t}^{\mathrm{d}} = \mathbf{y}_{\mathrm{r},t} + \mathbf{d}_{\mathrm{r},t} \qquad (12)$$
$$= \sqrt{\rho}\, \mathbf{H}_{\mathrm{r}} \mathbf{s}_{\mathrm{r},t} + \mathbf{n}_{\mathrm{r},t} + \mathbf{d}_{\mathrm{r},t}. \qquad (13)$$

We use a dithering signal $\mathbf{d}_{\mathrm{r},t}$ that follows a Gaussian distribution with zero mean and variance $\sigma_{\mathrm{d}}^2/2$ per real dimension, i.e., $\mathbf{d}_{\mathrm{r},t} \sim \mathcal{N}(\mathbf{0}, (\sigma_{\mathrm{d}}^2/2)\mathbf{I})$, and the variance is known to the BS. Then, the dithered and quantized signal becomes

$$\mathbf{r}_{\mathrm{r},t}^{\mathrm{d}} = \mathcal{Q}(\mathbf{y}_{\mathrm{r},t}^{\mathrm{d}}). \qquad (14)$$
Now, the BS computes the estimated likelihood function $\hat{\psi}_{i,k}^{\mathrm{d}}$ for the dithered signals as in (10), with $\mathbf{r}_{\mathrm{r},t}^{\mathrm{d}}$ in place of $\mathbf{r}_{\mathrm{r},t}$. Let $\psi_{i,k}^{\mathrm{d}} = \Pr(r_{\mathrm{r},t,i}^{\mathrm{d}} = +1 \mid \mathbf{H}, \mathbf{s}_k)$. Then, as in (8), $\psi_{i,k}^{\mathrm{d}}$ is theoretically derived as

$$\psi_{i,k}^{\mathrm{d}} = \Phi\!\left(\sqrt{2\rho/(N_0 + \sigma_{\mathrm{d}}^2)}\; g_{i,k}\right). \qquad (15)$$

Since $\rho$, $N_0$, and $\sigma_{\mathrm{d}}^2$ are known to the BS and $\hat{\psi}_{i,k}^{\mathrm{d}}$ is estimated from (10), the BS can estimate $g_{i,k}$ by inverting (15):

$$\hat{g}_{i,k} = \sqrt{\frac{N_0 + \sigma_{\mathrm{d}}^2}{2\rho}}\; \Phi^{-1}\!\left(\hat{\psi}_{i,k}^{\mathrm{d}}\right). \qquad (16)$$

Finally, the BS uses the estimated $\hat{g}_{i,k}$ and the known $\rho$ and $N_0$ to estimate the true (non-dithered) likelihood function by applying (8). Since the likelihood function of the dithered signal in (15) is much less likely to have zero probability than in the non-dithered case, the BS can learn the majority of the likelihood functions with a reasonable training length $N_{\mathrm{tr}}$.
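The inversion chain (15) → (16) → (8) can be sketched as follows; this is a minimal sketch in which the bisection-based $\Phi^{-1}$ is only an implementation convenience (to avoid external dependencies), not part of the method:

```python
from math import erf, sqrt

def phi(x):
    """Standard Gaussian CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def phi_inv(p, lo=-10.0, hi=10.0):
    """Inverse standard Gaussian CDF by bisection over [lo, hi]."""
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def dither_and_learn(psi_d_hat, rho, N0, sigma_d2):
    """Recover the non-dithered likelihood: invert (15) to estimate g as in
    (16), then re-apply (8) with the true noise variance N0."""
    g_hat = sqrt((N0 + sigma_d2) / (2.0 * rho)) * phi_inv(psi_d_hat)
    return phi(sqrt(2.0 * rho / N0) * g_hat)
```

Because the dithered likelihood sits closer to $1/2$, its empirical estimate is far less likely to hit exactly $0$ or $1$, which is precisely what makes the inversion usable with a short training length.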
Likelihood functions with zero probability can still exist even after dithering when $N_{\mathrm{tr}}$ is insufficient; if the BS observes zero-probability likelihood functions, it can also apply the biased-learning approach to them. Unlike conventional dithering approaches [19, 20], the dithering signal is not removed after quantization in the dithering-and-learning method, since dithering is used only to draw out sign changes in the sequence of received signals within the $N_{\mathrm{tr}}$ transmissions. In addition, the BS only needs to know the variance of the dithering distribution, and dithering is not used during the data transmission phase.
The estimation of the SNR, or equivalently of the noise variance $N_0$ in this paper, can be performed by offline training as shown in Fig. 3(a). The offline training first collects training data and measures the SNR by estimating channels. Then, the BS obtains data sets of pairs of the SNR and the average number of zero-probability likelihood functions, out of the $2N_r K$ likelihood functions, over the tested SNR values. Using the collected data sets, the offline training provides a mapping from the average number of zero-probability likelihood functions to the SNR level, yielding the estimated SNR. As the mapping function, popular supervised learning methods such as linear regression and neural networks can be used [21]. Fig. 3 shows an example of offline training with 5th-order linear regression, which we use in the simulations.

III-C Post Update of Likelihood Functions
The performance of the proposed algorithms can be further improved by adopting a post-update approach that exploits correctly decoded data symbols to refine the initially estimated likelihood functions [18]. To this end, the BS divides the data transmission into subframes and appends cyclic-redundancy-check (CRC) bits to each data subframe. Then, whenever a data subframe is correctly decoded, which can be determined by checking the CRC, the BS uses the decoded symbols as additional pilot symbols to update the initial likelihood functions.
Using the correctly decoded symbols, the likelihood functions for the biased-learning approach can be updated directly: after decoding each data subframe, the likelihood functions are updated as in (10) by counting the frequency of $+1$ over the successfully decoded subframes accumulated so far. For the dithering-and-learning method, the likelihood functions are updated after decoding each data subframe as shown in [18]:

$$\hat{\psi}_{i,k}^{(\ell)} = (1 - \eta_{\ell})\, \hat{\psi}_{i,k}^{(0)} + \eta_{\ell}\, \tilde{\psi}_{i,k}^{(\ell)}, \qquad (17)$$

where $\ell$ is the number of successfully decoded data subframes so far, $\eta_{\ell}$ is the update rate after decoding the $\ell$th data subframe, $\hat{\psi}_{i,k}^{(0)}$ is the likelihood function initially estimated in the training phase, and $\tilde{\psi}_{i,k}^{(\ell)}$ is the likelihood function for the candidate symbol vector $\mathbf{s}_k$ at the $i$th quantized output learned from the correctly decoded data subframes. The optimal value of the update rate, however, needs to be determined empirically. Accordingly, this update approach can provide more benefit to the biased-learning method than to the dithering-and-learning method.
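The update rule (17) amounts to a convex combination of the two likelihood estimates; a minimal sketch, in which the names are illustrative and the schedule for the update rate is left empirical, as in the text:

```python
import numpy as np

def post_update(psi_init, psi_decoded, eta):
    """Post update (17): convex combination of the initial likelihood estimate
    (from the training phase) and the likelihood re-learned from correctly
    decoded subframes, with update rate eta in [0, 1]."""
    return (1.0 - eta) * psi_init + eta * psi_decoded
```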
IV. Simulation Results
In this section, we evaluate the performance of the proposed learning-based algorithms in terms of the SER. In the simulations, we compare the following learning-based detection methods, which do not require channel estimation:
- Learning 1-bit ML: conventional learning-based ML
- Empirical ML detection (eMLD) in [17]
- Minimum-mean-distance (MMD) detection in [17]
- Minimum-center-distance (MCD) detection in [17]
- Biased-learning 1-bit ML (proposed)
- Dithering-and-learning 1-bit ML (proposed), with perfect SNR knowledge and with estimated SNR
In addition, we evaluate one-bit ADC detection methods that require channel estimation to provide reference performance: one-bit zero-forcing (ZF) detection in [12] and the optimal one-bit ML detection introduced in Section III-A. We consider $N_r$ receive antennas and $N_u$ users with QAM transmission, Rayleigh channels whose elements each follow $\mathcal{CN}(0,1)$, and a bias probability $\epsilon$ for the simulations. In addition, we use the proposed offline training with 5th-order linear regression to estimate the SNR for the dithering-and-learning method.
Fig. 5 shows the SER for the two considered training lengths in (a) and (b) with the chosen dithering noise variance. We note that the proposed algorithms closely follow the SER performance of the optimal one-bit ML case over the considered SNR range in both Fig. 5(a) and Fig. 5(b). Although the one-bit ZF detection shows better performance than the other methods at low SNR, it shows large performance degradation in the medium-to-high SNR regime. The proposed methods outperform the one-bit ZF detection and the other learning-based methods with the same training length, namely the conventional learning-based one-bit ML, eMLD, MMD, and MCD, in most cases. In particular, as the number of training symbols increases, the performance gap between the proposed methods and the other learning-based methods increases.
The performance improvement is achieved because the proposed methods provide robust likelihood-function learning with the same training length, and thus the ML detection, which is optimal for equally probable symbols, can be performed directly. We further note that the dithering-and-learning method with the estimated SNR achieves performance similar to the perfect-SNR case, which shows the effectiveness of the offline learning and the robustness of the proposed detection method to SNR estimation error. Accordingly, the proposed learning-based detection methods achieve near-optimal detection performance over the low-to-high SNR regime, providing robust performance with respect to the training length. In addition, the training length can be reduced to half of the desired length by exploiting a symmetry of the constellation mapping and quantization [18].
Fig. 6 shows the average number of zero-probability likelihood functions versus the SNR level for the non-dithering case and the dithering case, for the considered training length and dithering noise variance. As the SNR increases, the number of zero-probability likelihood functions for the non-dithering case increases rapidly, and a large fraction of the likelihood functions have zero probability at high SNR. For the dithering case, however, the number of zero-probability likelihood functions increases slowly with the SNR and converges to a small fraction due to the dithering effect. Accordingly, the dithering case provides a much larger number of nonzero likelihood functions than the non-dithering case at high SNR. Therefore, with dithering, the proposed algorithm can estimate many more likelihood functions, thereby increasing the detection accuracy. This corresponds to the discussion provided in Section III-B.
The proposed methods with the post likelihood-function update are also evaluated in Fig. 7 for the chosen subframe structure and CRC length. In the high-SNR regime, where most subframes can be correctly decoded, the biased-learning method shows a noticeable SER improvement and outperforms the dithering-and-learning method, while the dithering-and-learning method shows marginal or no improvement. This matches the intuition that the post-update approach provides more opportunity for the biased-learning one-bit ML detection to improve its detection accuracy. The dithering-and-learning method, however, still shows high detection accuracy and robustness to the training length and the SNR level. Therefore, the proposed algorithms provide near-optimal one-bit ML detection performance with a reasonable training length.
V. Conclusion
In this paper, we proposed robust learning-based one-bit ML detection methods for uplink massive MIMO communications. Since the performance of a learning-based one-bit detection approach can be severely degraded when the number of training symbols is insufficient, the proposed methods address this challenge by adopting a bias probability and a dithering technique. Without requiring channel knowledge, the biased-learning method and the dithering-and-learning method perform ML detection by learning the likelihood functions, and are robust to the number of training symbols. Simulation results demonstrate the detection performance of the proposed methods in terms of symbol error rate. Therefore, the proposed robust learning-based one-bit ML detection methods can potentially achieve an improved performance-power tradeoff for one-bit massive MIMO systems.
References
[1] H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, "Energy and spectral efficiency of very large multiuser MIMO systems," IEEE Trans. Commun., vol. 61, no. 4, pp. 1436–1449, 2013.
[2] E. G. Larsson, O. Edfors, F. Tufvesson, and T. L. Marzetta, "Massive MIMO for next generation wireless systems," IEEE Commun. Mag., vol. 52, no. 2, pp. 186–195, 2014.
[3] Z. Pi and F. Khan, "An introduction to millimeter-wave mobile broadband systems," IEEE Commun. Mag., vol. 49, no. 6, 2011.
[4] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. Soong, and J. C. Zhang, "What will 5G be?" IEEE J. Sel. Areas Commun., vol. 32, no. 6, pp. 1065–1082, 2014.
[5] C.-K. Wen, C.-J. Wang, S. Jin, K.-K. Wong, and P. Ting, "Bayes-optimal joint channel-and-data estimation for massive MIMO with low-precision ADCs," IEEE Trans. Signal Process., vol. 64, no. 10, pp. 2541–2556, 2016.
[6] C. Studer and G. Durisi, "Quantized massive MU-MIMO-OFDM uplink," IEEE Trans. Commun., vol. 64, no. 6, pp. 2387–2399, 2016.
[7] J. Choi, B. L. Evans, and A. Gatherer, "Resolution-adaptive hybrid MIMO architectures for millimeter wave communications," IEEE Trans. Signal Process., vol. 65, no. 23, pp. 6201–6216, 2017.
[8] J. Choi, J. Sung, B. L. Evans, and A. Gatherer, "Antenna selection for large-scale MIMO systems with low-resolution ADCs," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Process., 2018.
[9] A. Mezghani and J. A. Nossek, "On ultra-wideband MIMO systems with 1-bit quantized outputs: Performance analysis and input optimization," in Proc. IEEE Int. Symp. Inform. Theory, 2007, pp. 1286–1289.
[10] J. Mo and R. W. Heath, "Capacity analysis of one-bit quantized MIMO systems with transmitter channel state information," IEEE Trans. Signal Process., vol. 63, no. 20, pp. 5498–5512, 2015.
[11] S. Wang, Y. Li, and J. Wang, "Multiuser detection in massive spatial modulation MIMO with low-resolution ADCs," IEEE Trans. Wireless Commun., vol. 14, no. 4, pp. 2156–2168, 2015.
[12] J. Choi, D. J. Love, D. R. Brown III, and M. Boutin, "Quantized distributed reception for MIMO wireless systems using spatial multiplexing," IEEE Trans. Signal Process., vol. 63, no. 13, pp. 3537–3548, 2015.
[13] J. Choi, J. Mo, and R. W. Heath, "Near maximum-likelihood detector and channel estimator for uplink multiuser massive MIMO systems with one-bit ADCs," IEEE Trans. Commun., vol. 64, no. 5, pp. 2005–2018, 2016.
[14] J. Mo, P. Schniter, N. G. Prelcic, and R. W. Heath, "Channel estimation in millimeter wave MIMO systems with one-bit quantization," in Proc. Asilomar Conf. Signals, Systems and Comput., 2014, pp. 957–961.
[15] Y. Li, C. Tao, G. Seco-Granados, A. Mezghani, A. L. Swindlehurst, and L. Liu, "Channel estimation and performance analysis of one-bit massive MIMO systems," IEEE Trans. Signal Process., vol. 65, no. 15, pp. 4075–4089, 2017.
[16] C. Mollén, J. Choi, E. G. Larsson, and R. W. Heath Jr., "Uplink performance of wideband massive MIMO with one-bit ADCs," IEEE Trans. Wireless Commun., vol. 16, no. 1, pp. 87–100, 2017.
[17] Y.-S. Jeon, S.-N. Hong, and N. Lee, "Supervised-learning-aided communication framework for MIMO systems with low-resolution ADCs," IEEE Trans. Veh. Technol., 2018.
[18] Y.-S. Jeon, M. So, and N. Lee, "Reinforcement-learning-aided ML detector for uplink massive MIMO systems with low-precision ADCs," in Proc. IEEE Wireless Commun. and Networking Conf., 2018, pp. 1–6.
[19] L. Schuchman, "Dither signals and their effect on quantization noise," IEEE Trans. Commun. Technol., vol. 12, no. 4, pp. 162–165, 1964.
[20] R. M. Gray and T. G. Stockham, "Dithered quantizers," IEEE Trans. Inform. Theory, vol. 39, no. 3, pp. 805–812, 1993.
[21] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," Neural Networks, vol. 2, no. 5, pp. 359–366, 1989.