1 Introduction
Quantization of signals of interest is an integral part of all modern digital signal processing applications, such as sensing, communication, and inference. In an ideal hardware implementation of a quantization system, a high-resolution analog-to-digital converter (ADC) with bit resolution $b$ and sampling frequency $f_s$ samples the original analog signal and maps the obtained samples into a discrete state space of size $2^b$. Generally, a large number of bits is required to obtain an accurate digital representation of the analog signal. In such a case, the quantization process has negligible impact on the performance of algorithms that were typically developed under the assumption of infinite-precision samples, and thus, the high-resolution (in terms of amplitude) quantization process can be directly modeled as an additive noise source. However, a crucial obstacle with modern ADCs is that their power consumption, manufacturing cost, and chip area grow exponentially with their resolution [1, 2, 3].
The high sampling rate required of ADCs used in next-generation communication systems is another obstacle that must be tackled. For instance, the promising millimeter-wave (mmWave) multiple-input multiple-output (MIMO) communication technology requires a very large bandwidth, and the sampling rate of the ADCs must increase accordingly. However, manufacturing ADCs with both high resolution (e.g., more than 8 bits) and a high sampling rate is extremely costly, and such devices may not even be available. Moreover, in other applications such as spectral sensing and cognitive radio, which require extremely high sampling rates, the cumulative cost and power consumption of using high-resolution and extremely fast ADCs are typically prohibitive and impractical. Hence, when signals across a wide frequency band are of interest, a fundamental trade-off between sampling rate, amplitude quantization precision, cost, and power consumption is encountered. An immediate solution to these challenges is to use low-resolution, and specifically one-bit, ADCs. From a sampling viewpoint, one bit per sample is the most extreme case of quantization. More precisely, one-bit sampling can be seen as a process in which we repeatedly compare the amplitude of a signal (at each sample) to a reference threshold level and use a single bit to convey whether the signal amplitude resides above or below that threshold. The use of such one-bit signed measurements, and more specifically one-bit ADCs, allows for an extremely high sampling rate at low cost and low power consumption. Due to these appealing sampling properties, the problem of recovering a signal from its one-bit measurements has attracted a great deal of interest over the past few years [4, 5, 6, 7, 8], and it is therefore vital to develop algorithms that can deal with low-resolution samples in different applications.
The fields of machine learning (ML), and more particularly deep learning, are impacting various areas of engineering and have recently attracted a great deal of attention for tackling long-standing signal processing problems. The advent of low-cost, specialized, powerful computing resources (e.g., GPUs, and more recently TPUs) and the continually increasing amount of data generated by humans and machines, in conjunction with new optimization and learning methods, have paved the way for deep neural networks (DNNs) and machine learning-based models to prove their effectiveness in many engineering areas (see, e.g., [9, 10, 11] and the references therein).
The main advantage of deep learning-based models is that they employ several nonlinear transformations to obtain an abstract representation of the underlying data. Model-based machine learning frameworks (e.g., probabilistic graphical models), on the other hand, incorporate prior knowledge of the system parameters into the inference process. A recent promising approach to bridging the gap between deep learning-based and model-based methods is the paradigm of
deep unfolding [12]. In particular, the iterations of a conventional recursive algorithm, such as the fast iterative soft thresholding algorithm (FISTA), projected gradient descent, or approximate message passing (AMP), can be used as a baseline to design the architecture of a deep network with trainable parameters specifically customized to the problem of interest. Such a methodology improves both the accuracy and the computational efficiency of the original framework. The deep unfolding method has already shown remarkable performance improvements in a wide range of applications such as MIMO communications [13, 14], multichannel source separation [15], and sparse inverse problems [16, 17].
In this paper, we consider the general problem of high-dimensional signal recovery from random one-bit measurements. Specifically, we propose an efficient signal recovery framework based on the deep unfolding technique that has the advantage of low complexity and near-optimal performance compared to traditional methods. Our proposed inference framework has a wide range of applications in the areas of wireless communications, detection and estimation, and sensing.
2 Problem Formulation
We begin by considering a general linear signal acquisition and one-bit quantization model with time-varying thresholds, described as follows:
Signal Model: $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{w}$,  (1)
Quantization Model: $\mathbf{r} = \operatorname{sgn}(\mathbf{y} - \boldsymbol{\tau})$,  (2)
where $\boldsymbol{\tau} \in \mathbb{R}^m$ denotes the vector of one-bit quantization thresholds, $\mathbf{y} \in \mathbb{R}^m$ denotes the received signal prior to quantization, $\mathbf{A} \in \mathbb{R}^{m \times n}$ denotes the sensing matrix, $\mathbf{x} \in \mathbb{R}^n$ denotes the multidimensional unknown vector to be recovered, and $\mathbf{w}$ denotes the zero-mean Gaussian noise with a known covariance matrix $\boldsymbol{\Sigma}$. Furthermore, $\operatorname{sgn}(\cdot)$ denotes the signum function applied element-wise to its vector argument.
The above model covers a wide range of applications. For instance, the model (1)-(2) arises in MIMO communication systems in which $\mathbf{A}$ is the channel matrix, $\mathbf{x}$ is the signal sent by the transmitter, $\mathbf{w}$ is the additive Gaussian noise in the system, and the base station is equipped with one-bit ADCs; the goal is then to recover the transmitted symbols $\mathbf{x}$ from the one-bit measurements $\mathbf{r}$.
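To make the acquisition model concrete, the following minimal sketch generates synthetic one-bit measurements $\mathbf{r} = \operatorname{sgn}(\mathbf{A}\mathbf{x} + \mathbf{w} - \boldsymbol{\tau})$. All variable names, dimensions, and distribution parameters here are our own illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 16                          # number of one-bit samples, signal dimension
A = rng.standard_normal((m, n))         # sensing matrix (rows a_i^T)
x = rng.uniform(-1.0, 1.0, size=n)      # unknown vector to be recovered
sigma = 0.5 * np.ones(m)                # per-sample noise standard deviations
tau = rng.uniform(-3.0, 3.0, size=m)    # time-varying quantization thresholds

y = A @ x + sigma * rng.standard_normal(m)  # received signal prior to quantization
r = np.where(y >= tau, 1.0, -1.0)           # one-bit (signed) measurements
```

Each comparison against a threshold costs a single comparator, which is what makes one-bit ADCs cheap; all information about the amplitude of the sample beyond its position relative to the threshold is discarded.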
2.1 Maximum Likelihood Estimator Derivation
Given knowledge of the sensing matrix $\mathbf{A}$, the noise covariance $\boldsymbol{\Sigma} = \operatorname{diag}(\sigma_1^2, \dots, \sigma_m^2)$, and the corresponding quantization thresholds $\boldsymbol{\tau}$, our goal is to recover the original (likely high-dimensional) signal $\mathbf{x}$ from the one-bit random measurements $\mathbf{r}$. In such a scenario, each binary observation $r_i \in \{-1, +1\}$ follows a Bernoulli distribution with parameter $p_i$, given by:
$p_i = \Pr\{r_i = +1\} = \Phi\!\left(\dfrac{\mathbf{a}_i^T \mathbf{x} - \tau_i}{\sigma_i}\right)$,  (3)
where $\Phi(\cdot)$ represents the cumulative distribution function (cdf) of the standard Gaussian distribution and $\mathbf{a}_i^T$ denotes the $i$th row of the matrix $\mathbf{A}$. In particular, the probability mass function (pmf) of each binary observation can be compactly expressed as:
$\Pr\{r_i\} = \Phi\!\left(\dfrac{r_i(\mathbf{a}_i^T \mathbf{x} - \tau_i)}{\sigma_i}\right)$,  (4)
where we have used the symmetry of the Gaussian cdf, i.e., $1 - \Phi(u) = \Phi(-u)$. Therefore, the log-likelihood of the quantized observations $\mathbf{r}$ given the unknown vector $\mathbf{x}$ can be expressed as:
$\mathcal{L}(\mathbf{x}) = \log \Pr\{\mathbf{r} \mid \mathbf{x}\} = \sum_{i=1}^{m} \log \Pr\{r_i \mid \mathbf{x}\}$  (5)
$\phantom{\mathcal{L}(\mathbf{x})} = \sum_{i=1}^{m} \log \Phi\!\left(\dfrac{r_i(\mathbf{a}_i^T \mathbf{x} - \tau_i)}{\sigma_i}\right)$,  (6)
where $\log(\cdot)$ denotes the natural logarithm. As a result, the maximum likelihood (ML) estimate of $\mathbf{x}$ can be obtained as
$\hat{\mathbf{x}}_{\mathrm{ML}} = \arg\max_{\mathbf{x}} \; \mathcal{L}(\mathbf{x})$.  (7)
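As a hedged illustration of the ML objective, the following sketch evaluates the one-bit log-likelihood, assuming the model $\mathbf{r} = \operatorname{sgn}(\mathbf{A}\mathbf{x} + \mathbf{w} - \boldsymbol{\tau})$ with independent noise of per-sample standard deviations $\sigma_i$ (the function name and its signature are ours, not the paper's):

```python
import math
import numpy as np

def one_bit_loglik(x, A, r, tau, sigma):
    """Log-likelihood of one-bit observations r in {-1, +1}:
    sum_i log Phi(r_i * (a_i^T x - tau_i) / sigma_i)."""
    # Standard Gaussian cdf, vectorized via the stdlib erf.
    Phi = lambda u: 0.5 * (1.0 + np.vectorize(math.erf)(u / math.sqrt(2.0)))
    u = (r / sigma) * (A @ x - tau)
    # Clip guards log(0) when a measurement strongly contradicts the candidate x.
    return float(np.sum(np.log(np.clip(Phi(u), 1e-300, None))))
```

With enough measurements, the true signal attains a visibly higher log-likelihood than an uninformed guess, which is what the iterative maximization exploits.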
Observe that the maximum likelihood estimator has to satisfy the following condition:
$\nabla_{\mathbf{x}} \mathcal{L}(\mathbf{x}) \big|_{\mathbf{x} = \hat{\mathbf{x}}_{\mathrm{ML}}} = \mathbf{0}$,  (8)
where the gradient of the log-likelihood function with respect to the unknown vector $\mathbf{x}$ can be derived as follows:
$\nabla_{\mathbf{x}} \mathcal{L}(\mathbf{x}) = \sum_{i=1}^{m} \dfrac{\phi(u_i)}{\Phi(u_i)} \cdot \dfrac{r_i}{\sigma_i}\, \mathbf{a}_i$,  (9)
where $u_i = r_i(\mathbf{a}_i^T \mathbf{x} - \tau_i)/\sigma_i$ and $\phi(\cdot)$ denotes the standard Gaussian probability density function. It can be observed from (9) that the gradient of the log-likelihood function is a linear combination of the rows of the sensing matrix $\mathbf{A}$. Let $f(\cdot)$ be a nonlinear function defined as follows:
$f(\mathbf{u}) = \dfrac{\phi(\mathbf{u})}{\Phi(\mathbf{u})}$,  (10)
where the functions $\phi(\cdot)$ and $\Phi(\cdot)$, and the division, are applied element-wise on the vector argument $\mathbf{u}$. In addition, let $\mathbf{R} = \operatorname{diag}(\mathbf{r})$ be a diagonal matrix containing the one-bit observations and $\mathbf{B} = \boldsymbol{\Sigma}^{-1/2}\mathbf{R}\mathbf{A}$ be the semi-whitened version of the one-bit matrix $\mathbf{R}\mathbf{A}$. Then, the gradient of the log-likelihood function in (9) can be compactly written as follows:
$\nabla_{\mathbf{x}} \mathcal{L}(\mathbf{x}) = \mathbf{B}^T f\big(\mathbf{B}\mathbf{x} - \boldsymbol{\Sigma}^{-1/2}\mathbf{R}\boldsymbol{\tau}\big)$.  (11)
Recall that the ML estimator must satisfy the condition given in (8), i.e.,
$\mathbf{B}^T f\big(\mathbf{B}\hat{\mathbf{x}}_{\mathrm{ML}} - \boldsymbol{\Sigma}^{-1/2}\mathbf{R}\boldsymbol{\tau}\big) = \mathbf{0}$.  (12)
Other than in certain low-dimensional cases, finding a closed-form expression for $\hat{\mathbf{x}}_{\mathrm{ML}}$ that satisfies (12) is a difficult task [18, 19, 20]. Therefore, we resort to iterative methods to find the ML estimate, i.e., to solve (7).
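A minimal sketch of the compact gradient, assuming the notation $\mathbf{B} = \boldsymbol{\Sigma}^{-1/2}\operatorname{diag}(\mathbf{r})\mathbf{A}$ and $f(u) = \phi(u)/\Phi(u)$ (the function name and clipping constant are our own choices):

```python
import math
import numpy as np

def loglik_grad(x, A, r, tau, sigma):
    """Gradient of the one-bit log-likelihood:
    B^T f(B x - Sigma^{-1/2} diag(r) tau), with B = Sigma^{-1/2} diag(r) A."""
    Phi = lambda u: 0.5 * (1.0 + np.vectorize(math.erf)(u / math.sqrt(2.0)))
    phi = lambda u: np.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    B = (r / sigma)[:, None] * A           # row i of B is (r_i / sigma_i) a_i^T
    u = B @ x - (r / sigma) * tau          # standardized arguments u_i
    return B.T @ (phi(u) / np.clip(Phi(u), 1e-300, None))
```

A finite-difference check against the log-likelihood itself is a quick way to validate such a hand-derived gradient before building anything on top of it.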
In this paper, the well-known gradient ascent method is employed to iteratively solve (7). Namely, given an initial point $\mathbf{x}^{(0)}$, the update equation at the $k$th iteration is given by:
$\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \delta_k \nabla_{\mathbf{x}} \mathcal{L}(\mathbf{x}) \big|_{\mathbf{x} = \mathbf{x}^{(k)}}$  (13)
$\phantom{\mathbf{x}^{(k+1)}} = \mathbf{x}^{(k)} + \delta_k \mathbf{B}^T f\big(\mathbf{B}\mathbf{x}^{(k)} - \boldsymbol{\Sigma}^{-1/2}\mathbf{R}\boldsymbol{\tau}\big)$,  (14)
where $\delta_k$ is the step size at the $k$th iteration. The maximum likelihood estimator derived from the signal model, together with the corresponding optimization steps, can be unfolded into a multi-layer deep neural network, which improves the accuracy and computational efficiency of the original framework.
In the next section, we unfold the above iterations into the layers of a deep neural network, where each layer corresponds to one iteration of the above optimization method. Interestingly, we fix the complexity budget of the inference framework (by fixing the number of layers $L$) and train the network to yield the most accurate estimate of the parameter $\mathbf{x}$ in at most $L$ iterations.
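The iterative recovery described above can be sketched end to end. The step-size heuristic, iteration count, and zero initialization below are our own illustrative choices, not the paper's schedule:

```python
import math
import numpy as np

def recover_ml(A, r, tau, sigma, n_iter=300, step=None):
    """Gradient-ascent ML recovery of x from one-bit measurements r:
    x <- x + step * B^T f(B x - Sigma^{-1/2} diag(r) tau)."""
    Phi = lambda u: 0.5 * (1.0 + np.vectorize(math.erf)(u / math.sqrt(2.0)))
    phi = lambda u: np.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    m = A.shape[0]
    if step is None:
        step = float(np.mean(sigma) ** 2) / m  # heuristic: roughly inverse curvature scale
    B = (r / sigma)[:, None] * A               # semi-whitened one-bit matrix
    c = (r / sigma) * tau                      # Sigma^{-1/2} diag(r) tau
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        u = B @ x - c
        x = x + step * (B.T @ (phi(u) / np.clip(Phi(u), 1e-300, None)))
    return x
```

With enough one-bit samples and thresholds spread over the signal's dynamic range, this fixed-step iteration recovers the unknown vector to a small normalized error, which is the baseline the unfolded network is later compared against.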
3 Signal Recovery via Deep Unfolding
Conventionally, first-order optimization methods, such as gradient descent, have a slow convergence rate and thus take a large number of iterations to converge to a solution. Herein, we are interested in finding a good solution under the condition that the complexity of the inference algorithm is fixed. This is important since, by unfolding the optimization algorithm, we fix the computational complexity of the inference model (in this case, a DNN with $L$ layers) and optimize the parameters of the network to find the best possible estimator under a fixed-complexity constraint. Below, we introduce DeepRec, our deep learning-based signal recovery framework, which is designed based on iterations of the form (14) to find the maximum likelihood estimate of the unknown parameter.
The DeepRec Architecture. The construction of DeepRec involves unfolding $L$ iterations, each of the form (14), as the layers of a deep neural network. In particular, each step of the gradient ascent method depends on the previous signal estimate $\mathbf{x}^{(k)}$, the step size $\delta_k$, the scaled one-bit matrix $\mathbf{B}$, the sensing matrix $\mathbf{A}$, and the threshold vector $\boldsymbol{\tau}$. In addition, the form of the gradient vector (11) makes it convenient and insightful to unfold the iterations onto the layers of a DNN, in that each iteration of the gradient ascent method is a linear combination of the system parameters followed by a nonlinearity. The $l$th layer of DeepRec can be characterized via the following operations and variables:
$\mathbf{v}^{(l)} = \mathbf{W}_1^{(l)} \mathbf{x}^{(l)} + \mathbf{b}^{(l)}$,  (15)
$\mathbf{z}^{(l)} = f\big(\mathbf{v}^{(l)}\big)$,  (16)
$\mathbf{t}^{(l)} = \mathbf{x}^{(l)} + \mathbf{W}_2^{(l)} \mathbf{z}^{(l)}$,  (17)
$\mathbf{x}^{(l+1)} = \sigma\big(\mathbf{t}^{(l)}\big)$,  (18)
where $\mathbf{x}^{(0)} = \mathbf{0}$, $\sigma(\cdot)$ denotes a nonlinear activation function, and the goal is to optimize the DNN parameters, described as follows:
$\boldsymbol{\Theta} = \big\{\mathbf{W}_1^{(l)}, \mathbf{W}_2^{(l)}, \mathbf{b}^{(l)}\big\}_{l=0}^{L-1}$.  (19)
Note that (15)-(18) reduce to the gradient ascent update (14) when $\mathbf{W}_1^{(l)} = \mathbf{B}$, $\mathbf{b}^{(l)} = -\boldsymbol{\Sigma}^{-1/2}\mathbf{R}\boldsymbol{\tau}$, $\mathbf{W}_2^{(l)} = \delta_l \mathbf{B}^T$, and $\sigma(\cdot)$ is the identity map.
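Since each layer is just a parameterized gradient step, a forward pass through an $L$-layer unfolded network can be sketched as follows. The per-layer parameterization, the tanh activation, and all names here are our illustrative choices, not necessarily the paper's exact construction:

```python
import math
import numpy as np

def deeprec_forward(theta, x0):
    """Forward pass of an unfolded network. Each layer mirrors one gradient step
    x <- act(x + W2 @ f(W1 @ x + b)), with f(u) = phi(u)/Phi(u); the trainable
    (W1, W2, b) play the roles of (B, delta * B^T, -Sigma^{-1/2} diag(r) tau)."""
    Phi = lambda u: 0.5 * (1.0 + np.vectorize(math.erf)(u / math.sqrt(2.0)))
    phi = lambda u: np.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    x = x0
    for W1, W2, b in theta:                      # one (W1, W2, b) triple per layer
        u = W1 @ x + b
        z = phi(u) / np.clip(Phi(u), 1e-300, None)
        x = np.tanh(x + W2 @ z)                  # tanh activation is our assumption
    return x
```

In practice the triples in `theta` would be TensorFlow variables trained end to end against the loss below; this numpy version only exposes the data flow of one inference pass.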
The proposed DeepRec architecture with $L$ layers can be interpreted as a class of estimator functions $\hat{\mathbf{x}}_{\boldsymbol{\Theta}}$, parametrized by $\boldsymbol{\Theta}$, that estimate the unknown parameter $\mathbf{x}$ given the system parameters. In order to find the best estimator function associated with our problem, we conduct a learning process via minimizing a loss function $\ell(\cdot, \cdot)$, i.e.,
$\boldsymbol{\Theta}^\star = \arg\min_{\boldsymbol{\Theta}} \; \mathbb{E}\big\{\ell\big(\mathbf{x}, \hat{\mathbf{x}}_{\boldsymbol{\Theta}}\big)\big\}$.  (20)
In this work, we employ the following least squares (LS) loss function:
$\ell\big(\mathbf{x}, \hat{\mathbf{x}}_{\boldsymbol{\Theta}}\big) = \big\|\mathbf{x} - \hat{\mathbf{x}}_{\boldsymbol{\Theta}}\big\|_2^2$,  (21)
where, during the training phase, we synthetically generate the system parameters according to their statistical models.
4 Numerical Results
We now demonstrate the performance of the proposed DeepRec framework for the problem of one-bit signal recovery. The proposed framework was implemented using the TensorFlow library
[21], with the ADAM stochastic optimizer [22] and an exponentially decaying step size. In the learning process of the network, we employed the batch training method and performed the training over multiple epochs. In all of the simulations, we used the normalized mean square error (NMSE), defined as $\mathrm{NMSE} = \mathbb{E}\big\{\|\mathbf{x} - \hat{\mathbf{x}}\|_2^2 / \|\mathbf{x}\|_2^2\big\}$, as the performance metric.
The training was performed based on data generated via the following model. Each element of the vector $\mathbf{x}$ is assumed to be i.i.d. and uniformly distributed. The sensing matrix $\mathbf{A}$ is assumed to be fixed and to follow a normal distribution. The quantization thresholds were also generated from a uniform distribution, where the lower and upper bounds of the distribution are chosen such that it at least covers the domain of $\mathbf{A}\mathbf{x}$. The noise is assumed to be independent from one sample to another and to follow a normal distribution, where the variance of each noise element is different, i.e., the noise covariance is $\boldsymbol{\Sigma} = \operatorname{diag}(\sigma_1^2, \dots, \sigma_m^2)$. Note that we trained the network over a wide range of noise powers in order to make the DeepRec network more robust to noise.
Fig. 1(a) demonstrates the performance of the DeepRec network for different numbers of layers $L$. It can be observed that the averaged NMSE decreases dramatically as the number of layers increases. Such a result is expected, as each layer corresponds to one iteration of the original optimization algorithm; thus, as the number of layers increases, the output of the network converges to a better estimate.
Fig. 1(b) demonstrates the performance of the proposed DeepRec architecture and the original gradient ascent method of (14) in terms of averaged NMSE for different numbers of one-bit samples $m$. In this simulation, we implemented the DeepRec network with $L$ layers. It can be clearly seen from Fig. 1(b) that the proposed deep recovery architecture (DeepRec) significantly outperforms the original optimization method in terms of accuracy, providing a considerably better estimate than the gradient ascent method for the same number of iterations/layers. For a fair comparison, we assumed a fixed step size for the gradient ascent method.
Fig. 1(c) shows a comparison of the computational cost (machine runtime) between the gradient ascent method and the proposed DeepRec network for different numbers of one-bit samples $m$. It can be seen that our proposed method (DeepRec) has a significantly lower computational cost than the original optimization algorithm, making DeepRec a good candidate for real-time signal processing and big data applications (the results were obtained on a standard PC with a quad-core 2.30 GHz CPU and 4 GB of memory).
5 Conclusion
We have considered the application of model-based machine learning, and specifically the deep unfolding technique, to the problem of recovering a high-dimensional signal from its one-bit quantized noisy measurements obtained via random thresholding. We proposed a novel deep architecture, which we refer to as DeepRec, that is able to accurately perform the task of one-bit signal recovery. Our numerical results show that the proposed DeepRec network significantly improves upon the performance of traditional optimization methods in terms of both accuracy and efficiency.
References
 [1] Bin Le, Thomas W Rondeau, Jeffrey H Reed, and Charles W Bostian, “Analog-to-digital converters,” IEEE Signal Processing Magazine, vol. 22, no. 6, pp. 69–77, 2005.
 [2] Nir Shlezinger, Yonina C Eldar, and Miguel RD Rodrigues, “Hardware-limited task-based quantization,” arXiv preprint arXiv:1807.08305, 2018.
 [3] Alon Kipnis, Yonina C Eldar, and Andrea J Goldsmith, “Fundamental distortion limits of analog-to-digital compression,” IEEE Transactions on Information Theory, vol. 64, no. 9, pp. 6013–6033, 2018.
 [4] Christopher Gianelli, Luzhou Xu, Jian Li, and Petre Stoica, “One-bit compressive sampling with time-varying thresholds: Maximum likelihood and the Cramér-Rao bound,” in 2016 50th Asilomar Conference on Signals, Systems and Computers. IEEE, 2016, pp. 399–403.
 [5] Naveed Naimipour and Mojtaba Soltanalian, “Graph clustering using one-bit comparison data,” in 2018 IEEE Asilomar Conference on Signals, Systems and Computers. IEEE, 2018.
 [6] Yongzhi Li, Cheng Tao, Gonzalo Seco-Granados, Amine Mezghani, A Lee Swindlehurst, and Liu Liu, “Channel estimation and performance analysis of one-bit massive MIMO systems,” IEEE Transactions on Signal Processing, vol. 65, no. 15, pp. 4075–4089, 2017.
 [7] Shahin Khobahi and Mojtaba Soltanalian, “Signal recovery from 1-bit quantized noisy samples via adaptive thresholding,” in 2018 IEEE Asilomar Conference on Signals, Systems and Computers. IEEE, 2018.
 [8] Fangqing Liu, Heng Zhu, Jian Li, Pu Wang, and Philip V Orlik, “Massive MIMO channel estimation using signed measurements with antenna-varying thresholds,” in 2018 IEEE Statistical Signal Processing Workshop (SSP). IEEE, 2018, pp. 188–192.
 [9] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.

 [10] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1026–1034.
 [11] Li Deng, Dong Yu, et al., “Deep learning: methods and applications,” Foundations and Trends® in Signal Processing, vol. 7, no. 3–4, pp. 197–387, 2014.
 [12] John R Hershey, Jonathan Le Roux, and Felix Weninger, “Deep unfolding: Model-based inspiration of novel deep architectures,” arXiv preprint arXiv:1409.2574, 2014.
 [13] Hengtao He, Chao-Kai Wen, Shi Jin, and Geoffrey Ye Li, “A model-driven deep learning network for MIMO detection,” arXiv preprint arXiv:1809.09336, 2018.
 [14] Neev Samuel, Tzvi Diskin, and Ami Wiesel, “Learning to detect,” arXiv preprint arXiv:1805.07631, 2018.
 [15] Scott Wisdom, John Hershey, Jonathan Le Roux, and Shinji Watanabe, “Deep unfolding for multichannel source separation,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016, pp. 121–125.
 [16] Karol Gregor and Yann LeCun, “Learning fast approximations of sparse coding,” in Proceedings of the 27th International Conference on Machine Learning. Omnipress, 2010, pp. 399–406.
 [17] Mark Borgerding and Philip Schniter, “Onsager-corrected deep learning for sparse linear inverse problems,” in 2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE, 2016, pp. 227–231.
 [18] Michel T Ivrlac and Josef A Nossek, “On MIMO channel estimation with single-bit signal quantization,” in ITG Smart Antenna Workshop, 2007.
 [19] Amine Mezghani, Felix Antreich, and Josef A Nossek, “Multiple parameter estimation with quantized channel output,” in 2010 International ITG Workshop on Smart Antennas (WSA). IEEE, 2010, pp. 143–150.
 [20] Jianhua Mo, Philip Schniter, Nuria González Prelcic, and Robert W Heath, “Channel estimation in millimeter wave MIMO systems with onebit quantization,” in 2014 48th Asilomar Conference on Signals, Systems and Computers. IEEE, 2014, pp. 957–961.
 [21] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al., “TensorFlow: A system for large-scale machine learning,” in OSDI, 2016, vol. 16, pp. 265–283.
 [22] Diederik P Kingma and Jimmy Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.