Deep Signal Recovery with One-Bit Quantization

Machine learning, and more specifically deep learning, has shown remarkable performance in sensing, communications, and inference. In this paper, we consider the application of the deep unfolding technique to the problem of signal reconstruction from one-bit noisy measurements. Namely, we propose a model-based machine learning method and unfold the iterations of an inference optimization algorithm into the layers of a deep neural network for one-bit signal recovery. The resulting network, which we refer to as DeepRec, can efficiently handle the recovery of high-dimensional signals from acquired one-bit noisy measurements. The proposed method results in an improvement in accuracy and computational efficiency with respect to the original framework, as shown through numerical analysis.


1 Introduction

Quantization of signals of interest is an integral part of all modern digital signal processing applications such as sensing, communication, and inference. In an ideal hardware implementation of a quantization system, a high-resolution analog-to-digital converter (ADC) with b-bit resolution and sampling frequency f_s samples the original analog signal and maps the obtained samples into a discrete state space of size 2^b. Generally, a large number of bits is required to obtain an accurate digital representation of the analog signal. In such a case, the quantization process has negligible impact on the performance of algorithms that are typically developed under the assumption of infinite-precision samples, and thus the high-resolution (in terms of amplitude) quantization process can be directly modeled as an additive noise source. However, a crucial obstacle with modern ADCs is that their power consumption, manufacturing cost, and chip area grow exponentially with their resolution [1, 2, 3].

The high sampling rates required of ADCs in next-generation communication systems pose another obstacle that must be tackled. For instance, the promising millimeter wave (mmWave) multiple-input multiple-output (MIMO) communication technology requires a very large bandwidth, and the sampling rate of the corresponding ADCs must increase accordingly. However, manufacturing ADCs that are both high-resolution (e.g., more than 8 bits) and high-rate is extremely costly, and such devices may not even be available. Moreover, in other applications such as spectrum sensing and cognitive radio, which require extremely high sampling rates, the cumulative cost and power consumption of high-resolution, extremely fast ADCs are typically prohibitive and impractical. Hence, when signals across a wide frequency band are of interest, a fundamental trade-off arises between sampling rate, amplitude quantization precision, cost, and power consumption. An immediate solution to these challenges is to use low-resolution, and specifically one-bit, ADCs. One-bit sampling is the most extreme case of quantization: the amplitude of the signal at each sample is compared with a reference threshold, and a single bit conveys whether the amplitude lies above or below that threshold. The use of one-bit signed measurements, and more specifically one-bit ADCs, thereby allows for extremely high sampling rates at low cost and low power consumption. Due to these appealing sampling properties, the problem of recovering a signal from its one-bit measurements has attracted a great deal of interest over the past few years [4, 5, 6, 7, 8], and it is vital to develop algorithms that can operate on low-resolution samples in different applications.

Machine learning (ML), and more particularly deep learning, is impacting many areas of engineering and has recently attracted a great deal of attention for tackling long-standing signal processing problems. The advent of low-cost, powerful, specialized computing resources (e.g., GPUs and, more recently, TPUs) and the continually increasing amount of data generated by humans and machines, in conjunction with new optimization and learning methods, have paved the way for deep neural networks (DNNs) and other machine learning-based models to prove their effectiveness in many engineering areas (see, e.g., [9, 10, 11] and the references therein).

The main advantage of deep learning-based models is that they employ several non-linear transformations to obtain an abstract representation of the underlying data. Model-based machine learning frameworks (e.g., probabilistic graphical models), on the other hand, incorporate prior knowledge of the system parameters into the inference process. A recent and promising approach to bridging the gap between purely data-driven and model-based methods is the paradigm of deep unfolding [12]. In particular, the iterations of a conventional recursive algorithm, such as the fast iterative shrinkage-thresholding algorithm (FISTA), projected gradient descent, or approximate message passing (AMP), are used as a baseline to design the architecture of a deep network with trainable parameters customized to the problem of interest. Such a methodology can improve both the accuracy and the computational efficiency of the original framework. Deep unfolding has already shown remarkable performance improvements in a wide range of applications such as MIMO communications [13, 14], multi-channel source separation [15], and sparse inverse problems [16, 17].

In this paper, we consider the general problem of high-dimensional signal recovery from random one-bit measurements. Specifically, we propose an efficient signal recovery framework based on the deep unfolding technique that offers low complexity and near-optimal performance compared to traditional methods. The proposed inference framework has a wide range of applications in wireless communications, detection and estimation, and sensing.

2 Problem Formulation

We begin by considering a general linear signal acquisition and one-bit quantization model with time-varying thresholds, described as follows:

Signal Model:   y = A x + w,     (1)
Quantization Model:   r = sign(y − τ),     (2)

where τ ∈ ℝ^m denotes the vector of one-bit quantization thresholds, y ∈ ℝ^m denotes the received signal prior to quantization, A ∈ ℝ^{m×n} denotes the sensing matrix, x ∈ ℝ^n denotes the multidimensional unknown vector to be recovered, and w ∈ ℝ^m denotes the zero-mean Gaussian noise with a known covariance matrix Σ. Furthermore, sign(·) denotes the signum function applied element-wise to its vector argument.

The above model covers a wide range of applications. For instance, the model (1)-(2) describes MIMO communication systems in which A is the channel matrix, x is the signal sent by the transmitter, w is the additive Gaussian noise in the system, and the base station is equipped with one-bit ADCs; the goal is then to recover the transmitted symbols x from the one-bit observations r.
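As an illustration of the acquisition model, the following minimal NumPy sketch generates one-bit measurements according to (1)-(2); the dimensions and distribution parameters are placeholders chosen for illustration, not the values used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

m, n = 512, 16                        # number of measurements / signal dimension (illustrative)
A = rng.standard_normal((m, n))       # sensing (or channel) matrix
x = rng.uniform(-1.0, 1.0, size=n)    # unknown signal to be recovered
tau = rng.uniform(-2.0, 2.0, size=m)  # time-varying quantization thresholds
sigma = rng.uniform(0.1, 0.5, size=m) # per-sample noise standard deviations, Sigma = diag(sigma**2)
w = sigma * rng.standard_normal(m)    # zero-mean Gaussian noise

y = A @ x + w                         # unquantized received signal, Eq. (1)
r = np.sign(y - tau)                  # one-bit measurements, Eq. (2)
```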

2.1 Maximum Likelihood Estimator Derivation

Given the knowledge of the sensing matrix A, the noise covariance Σ, and the corresponding quantization thresholds τ, our goal is to recover the original (likely high-dimensional) signal x from the one-bit random measurements r. In such a scenario, each binary observation r_i follows a Bernoulli distribution with parameter p_i, given by:

p_i ≜ Pr(r_i = +1 | x) = Pr(y_i ≥ τ_i) = Φ((a_i^T x − τ_i)/σ_i),     (3)

where σ_i ≜ √([Σ]_{ii}), Φ(·) represents the cumulative distribution function (cdf) of a standard Gaussian distribution, and a_i^T denotes the i-th row of the matrix A. In particular, the probability mass function (pmf) of each binary observation can be compactly expressed as:

Pr(r_i | x) = Φ(r_i (a_i^T x − τ_i)/σ_i),     (4)

where we used the symmetry of the Gaussian cdf, i.e., 1 − Φ(u) = Φ(−u). Therefore, the log-likelihood of the quantized observations r given the unknown vector x can be expressed as:

L(x) ≜ log Pr(r | x) = ∑_{i=1}^{m} log Pr(r_i | x)     (5)
     = ∑_{i=1}^{m} log Φ(r_i (a_i^T x − τ_i)/σ_i),     (6)

where log(·) denotes the natural logarithm and we have assumed that the noise samples, and hence the binary observations, are mutually independent (i.e., Σ is diagonal), so that the likelihood factors across observations. As a result, the maximum likelihood (ML) estimate of x can be obtained as

x̂_ML = arg max_x L(x).     (7)

Observe that the maximum likelihood estimator has to satisfy the following first-order optimality condition:

∇_x L(x) |_{x = x̂_ML} = 0,     (8)

where the gradient of the log-likelihood function with respect to the unknown vector x can be derived as follows:

∇_x L(x) = ∑_{i=1}^{m} [φ(z_i)/Φ(z_i)] (r_i/σ_i) a_i,   with   z_i ≜ r_i (a_i^T x − τ_i)/σ_i,     (9)

where φ(·) denotes the probability density function (pdf) of a standard Gaussian distribution. It can be observed from (9) that the gradient of the log-likelihood function is a linear combination of the rows of the sensing matrix A. Let f(·) be a non-linear function defined as follows:

f(z) ≜ φ(z)/Φ(z),     (10)

where the functions φ(·), Φ(·), and the division are applied element-wise on the vector argument z. In addition, let R ≜ Diag(r) be a diagonal matrix containing the one-bit observations and Ω ≜ Σ^{−1/2} R A be the semi-whitened version of the one-bit matrix R A. Then, the gradient of the log-likelihood function in (9) can be compactly written as follows:

∇_x L(x) = Ω^T f(Ω x − Σ^{−1/2} R τ).     (11)

Recall that the ML estimator must satisfy the condition given in (8), i.e.,

Ω^T f(Ω x̂_ML − Σ^{−1/2} R τ) = 0.     (12)

Other than in certain low-dimensional cases, finding a closed-form expression for x̂_ML that satisfies (12) is a difficult task [18, 19, 20]. Therefore, we resort to iterative methods in order to find the ML estimate, i.e., to solve (7).
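For reference, a minimal sketch of the log-likelihood (6) and its gradient in the compact form (11), reusing the variables of the earlier NumPy snippet; the element-wise ratio φ(·)/Φ(·) is evaluated with SciPy (a production implementation would guard against underflow of Φ(·) for large negative arguments).

```python
import numpy as np
from scipy.stats import norm

def log_likelihood(x_hat, A, r, tau, sigma):
    """Log-likelihood of the one-bit observations r at a candidate signal x_hat, Eq. (6)."""
    z = r * (A @ x_hat - tau) / sigma
    return np.sum(norm.logcdf(z))

def grad_log_likelihood(x_hat, A, r, tau, sigma):
    """Gradient of the log-likelihood in compact form, Eq. (11)."""
    Omega = (r / sigma)[:, None] * A        # semi-whitened one-bit matrix Sigma^{-1/2} R A
    z = Omega @ x_hat - (r / sigma) * tau   # argument of the element-wise non-linearity
    f = norm.pdf(z) / norm.cdf(z)           # f(z) = phi(z) / Phi(z), applied element-wise
    return Omega.T @ f                      # Omega^T f(.)
```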

In this paper, the well-known gradient ascent method is employed to iteratively solve (7). Namely, given an initial point x^(0), the update equation at each iteration is given by:

x^(k+1) = x^(k) + α_k ∇_x L(x) |_{x = x^(k)}     (13)
        = x^(k) + α_k Ω^T f(Ω x^(k) − Σ^{−1/2} R τ),     (14)

where α_k is the step size at the k-th iteration. The obtained maximum likelihood estimator derived from the signal model, and the corresponding optimization steps, can be unfolded into a multi-layer deep neural network, which improves the accuracy and computational efficiency of the original framework.
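A minimal sketch of the resulting iterative recovery (13)-(14), built on the grad_log_likelihood helper above; the step size and iteration count are illustrative choices, not tuned values.

```python
import numpy as np

def recover_ml(A, r, tau, sigma, num_iters=200, step=1e-3):
    """Plain gradient ascent on the log-likelihood, Eqs. (13)-(14)."""
    x_hat = np.zeros(A.shape[1])   # initial point x^(0) = 0
    for _ in range(num_iters):
        x_hat = x_hat + step * grad_log_likelihood(x_hat, A, r, tau, sigma)
    return x_hat

# Usage with the variables generated in the earlier snippet:
x_ml = recover_ml(A, r, tau, sigma)
```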

In the next section, we unfold the above iterations into the layers of a deep neural network, where each layer corresponds to one iteration of the above optimization method. In effect, we fix the complexity budget of the inference framework (by fixing the number of layers to L) and train the network to yield the most accurate estimate of the parameter x in at most L iterations.

3 Signal Recovery via Deep Unfolding

Conventionally, first-order optimization methods such as gradient descent have slow convergence rates, and thus require a large number of iterations to converge to a solution. Herein, we are interested in finding a good solution under the condition that the complexity of the inference algorithm is fixed. This is important since, by unfolding the optimization algorithm, we fix the computational complexity of the inference model (a DNN with L layers in this case) and optimize the parameters of the network to find the best possible estimator under that fixed-complexity constraint. Below, we introduce DeepRec, our deep learning-based signal recovery framework, which is designed based on iterations of the form (14) to approximate the maximum likelihood estimate of the unknown parameter.
The DeepRec Architecture. The construction of DeepRec involves unfolding L iterations, each of the form (14), into the layers of a deep neural network. In particular, each step of the gradient ascent method depends on the previous signal estimate x_{k−1}, the step size α_k, the scaled one-bit matrix Ω, the sensing matrix A, and the threshold vector τ. In addition, the form of the gradient vector (11) makes it convenient and insightful to unfold the iterations onto the layers of a DNN, in that each iteration of the gradient method is a linear combination of the system parameters followed by a non-linearity. The k-th layer of DeepRec can be characterized via the following operations and variables:

z_k = Ω x_{k−1} − Σ^{−1/2} R τ,     (15)
g_k = Ω^T f(z_k),     (16)
u_k = W_k x_{k−1} + V_k g_k + b_k,     (17)
x_k = σ(u_k),     (18)

where x_0 denotes the initial estimate and σ(·) denotes a non-linear activation function. Note that (15)-(18) reduce to the gradient ascent update (14) when W_k = I, V_k = α_k I, b_k = 0, and σ(·) is the identity map; the trainable weights thus generalize the hand-crafted step size of the original iterations. The goal is to optimize the DNN parameters, described as follows:

Θ ≜ {(W_k, V_k, b_k)}_{k=1}^{L}.     (19)
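The authors implement DeepRec in TensorFlow; the following PyTorch sketch is included only to illustrate the unfolded structure of (15)-(18), in which each layer combines the previous estimate and the likelihood-gradient direction through trainable linear maps and a tanh non-linearity. The class and parameter names (DeepRecLayer, W, V) and the choice of tanh for σ(·) are illustrative assumptions, not the exact architecture of the paper.

```python
import torch
import torch.nn as nn

class DeepRecLayer(nn.Module):
    """One unfolded layer: trainable combination of the previous estimate and the
    likelihood-gradient direction, followed by a non-linearity (illustrative form)."""
    def __init__(self, n):
        super().__init__()
        self.W = nn.Linear(n, n, bias=False)   # acts on the previous estimate x_{k-1}
        self.V = nn.Linear(n, n, bias=True)    # acts on the gradient direction g_k
        self.std_normal = torch.distributions.Normal(0.0, 1.0)

    def forward(self, x_prev, Omega, t):
        # z_k = Omega x_{k-1} - Sigma^{-1/2} R tau  (batch b, measurements m, unknowns n)
        z = torch.einsum('bmn,bn->bm', Omega, x_prev) - t
        # f(z) = phi(z) / Phi(z), element-wise (no underflow guard in this sketch)
        f = torch.exp(self.std_normal.log_prob(z)) / self.std_normal.cdf(z)
        g = torch.einsum('bmn,bm->bn', Omega, f)          # g_k = Omega^T f(z_k)
        return torch.tanh(self.W(x_prev) + self.V(g))     # x_k = sigma(W_k x_{k-1} + V_k g_k + b_k)

class DeepRec(nn.Module):
    """L unfolded layers, each mimicking one gradient-ascent iteration of (14)."""
    def __init__(self, n, num_layers):
        super().__init__()
        self.layers = nn.ModuleList([DeepRecLayer(n) for _ in range(num_layers)])

    def forward(self, Omega, t):
        x = Omega.new_zeros(Omega.shape[0], Omega.shape[2])   # x_0 = 0
        for layer in self.layers:
            x = layer(x, Omega, t)
        return x
```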
Figure 1: The performance of DeepRec: (a) the NMSE performance of the DeepRec network for different numbers of layers L; (b) the averaged NMSE of the proposed DeepRec architecture and the original gradient descent method of (14) for different numbers of one-bit samples m; (c) a comparison of the computational cost of the gradient descent method and the proposed DeepRec network for different numbers of one-bit samples m.

The proposed DeepRec architecture with L layers can be interpreted as a class of estimator functions x̂_Θ, parametrized by Θ, that estimate the unknown parameter x given the system parameters. In order to find the best estimator function associated with our problem, we conduct a learning process via minimizing a loss function ℓ(·, ·), i.e.,

Θ* = arg min_Θ  E{ℓ(x, x̂_Θ)}.     (20)

In this work, we employ the following least squares (LS) loss function:

ℓ(x, x̂_Θ) = ‖x − x̂_Θ‖_2^2,     (21)

where, during the training phase, we synthetically generate the system parameters (and hence the training pairs) according to their statistical model.
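A minimal training-loop sketch for the learning problem (20)-(21), using the hypothetical DeepRec module above; train_deeprec and data_generator are illustrative names, where data_generator is assumed to return torch tensors (x, Ω, Σ^{−1/2}Rτ) drawn from the statistical model described in Section 4. The paper's own implementation uses TensorFlow; this is only a structural sketch.

```python
import torch

def train_deeprec(model, data_generator, num_steps=10000, lr=1e-3):
    """Minimize the LS loss (21) over synthetically generated training batches."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(num_steps):
        x_true, Omega, t = data_generator()    # batch drawn from the statistical model
        x_hat = model(Omega, t)                # DeepRec estimate of the unknown signal
        loss = torch.mean(torch.sum((x_true - x_hat) ** 2, dim=1))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```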

4 Numerical Results

We now demonstrate the performance of the proposed DeepRec framework for the problem of one-bit signal recovery. The proposed framework was implemented using the TensorFlow library [21], with the Adam stochastic optimizer [22] and an exponentially decaying step size. In the learning process, the network was trained with mini-batches over a number of epochs. In all of the simulations, we used the normalized mean square error (NMSE), defined as NMSE = E{‖x − x̂‖_2^2 / ‖x‖_2^2}, as the performance metric.

The training was performed on data generated via the following model. Each element of the vector x is assumed to be i.i.d. and uniformly distributed over a bounded interval. The sensing matrix A is assumed to be fixed, with its entries drawn from a Normal distribution. The quantization thresholds were also generated from a uniform distribution, τ_i ~ U(τ_l, τ_u), where the lower and upper bounds are chosen such that the thresholds at least cover the dynamic range of the unquantized signal y. The noise is assumed to be independent from one sample to another, with a possibly different variance for each noise element, i.e., the noise covariance is Σ = diag(σ_1^2, …, σ_m^2). Note that we trained the network over a wide range of noise powers in order to make the DeepRec network more robust to noise.
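A NumPy sketch of the data-generation procedure just described; the bounds, batch structure, and the generate_batch helper are illustrative placeholders rather than the exact values used in the experiments. The last two lines also form the quantities Ω and Σ^{−1/2}Rτ consumed by the recovery routines sketched earlier.

```python
import numpy as np

rng = np.random.default_rng(1)

def generate_batch(A, batch_size, x_bound=1.0, tau_bound=2.0, sigma_range=(0.05, 0.5)):
    """Draw a synthetic batch according to the statistical model described above.
    All bounds are illustrative placeholders, not the paper's values."""
    m, n = A.shape
    x = rng.uniform(-x_bound, x_bound, size=(batch_size, n))        # i.i.d. uniform signal entries
    tau = rng.uniform(-tau_bound, tau_bound, size=(batch_size, m))  # thresholds covering the range of y
    sigma = rng.uniform(*sigma_range, size=(batch_size, m))         # per-sample noise std (varied for robustness)
    w = sigma * rng.standard_normal((batch_size, m))                # independent Gaussian noise
    r = np.sign(x @ A.T + w - tau)                                  # one-bit measurements, Eq. (2)
    Omega = (r / sigma)[:, :, None] * A                             # Sigma^{-1/2} R A, per example
    t = (r / sigma) * tau                                           # Sigma^{-1/2} R tau, per example
    return x, r, tau, sigma, Omega, t
```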

Fig. 1(a) demonstrates the performance of the DeepRec network for different numbers of layers L. It can be observed that the averaged NMSE decreases dramatically as the number of layers increases. Such a result is expected, as each layer corresponds to one iteration of the original optimization algorithm; thus, as the number of layers increases, the output of the network converges to a more accurate estimate.

Fig. 1(b) compares the proposed DeepRec architecture with the original gradient descent method of (14) in terms of averaged NMSE for different numbers of one-bit samples m. In this simulation, we implemented the DeepRec network with a fixed number of layers L. It can be clearly seen from Fig. 1(b) that the proposed deep recovery architecture (DeepRec) significantly outperforms the original optimization method in terms of accuracy, providing a considerably better estimate than the gradient descent method for the same number of iterations/layers. For a fair comparison, we assumed a fixed step size for the gradient descent method.

Fig. 1(c) shows a comparison of the computational cost (machine runtime) between the gradient descent method and the proposed DeepRec network for different numbers of one-bit samples m. It can be seen that our proposed method (DeepRec) has a significantly lower computational cost than the original optimization algorithm for our problem, making DeepRec a good candidate for real-time signal processing and big data applications (the results were obtained on a standard PC with a quad-core 2.30 GHz CPU and 4 GB of memory).

5 Conclusion

We have considered the application of model-based machine learning, and specifically the deep unfolding technique, to the problem of recovering a high-dimensional signal from its one-bit quantized noisy measurements obtained with random time-varying thresholds. We proposed a novel deep architecture, referred to as DeepRec, that can accurately perform the task of one-bit signal recovery. Our numerical results show that the proposed DeepRec network significantly improves upon traditional optimization methods in terms of both accuracy and efficiency.

References

  • [1] Bin Le, Thomas W Rondeau, Jeffrey H Reed, and Charles W Bostian, “Analog-to-digital converters,” IEEE Signal Processing Magazine, vol. 22, no. 6, pp. 69–77, 2005.
  • [2] Nir Shlezinger, Yonina C Eldar, and Miguel RD Rodrigues, “Hardware-limited task-based quantization,” arXiv preprint arXiv:1807.08305, 2018.
  • [3] Alon Kipnis, Yonina C Eldar, and Andrea J Goldsmith, “Fundamental distortion limits of analog-to-digital compression,” IEEE Transactions on Information Theory, vol. 64, no. 9, pp. 6013–6033, 2018.
  • [4] Christopher Gianelli, Luzhou Xu, Jian Li, and Petre Stoica, “One-bit compressive sampling with time-varying thresholds: Maximum likelihood and the cramér-rao bound,” in 2016 50th Asilomar Conference on Signals, Systems and Computers. IEEE, 2016, pp. 399–403.
  • [5] Naveed Naimipour and Mojtaba Soltanalian, “Graph clustering using one-bit comparison data,” in 2018 IEEE Asilomar Conference on Signals, Systems and Computers. IEEE, 2018.
  • [6] Yongzhi Li, Cheng Tao, Gonzalo Seco-Granados, Amine Mezghani, A Lee Swindlehurst, and Liu Liu, “Channel estimation and performance analysis of one-bit massive MIMO systems,” IEEE Transactions on Signal Processing, vol. 65, no. 15, pp. 4075–4089, 2017.
  • [7] Shahin Khobahi and Mojtaba Soltanalian, “Signal recovery from 1-bit quantized noisy samples via adaptive thresholding,” in 2018 IEEE Asilomar Conference on Signals, Systems and Computers. IEEE, 2018.
  • [8] Fangqing Liu, Heng Zhu, Jian Li, Pu Wang, and Philip V Orlik, “Massive MIMO channel estimation using signed measurements with antenna-varying thresholds,” in 2018 IEEE Statistical Signal Processing Workshop (SSP). IEEE, 2018, pp. 188–192.
  • [9] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
  • [10] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1026–1034.
  • [11] Li Deng, Dong Yu, et al., “Deep learning: methods and applications,” Foundations and Trends® in Signal Processing, vol. 7, no. 3–4, pp. 197–387, 2014.
  • [12] John R Hershey, Jonathan Le Roux, and Felix Weninger, “Deep unfolding: Model-based inspiration of novel deep architectures,” arXiv preprint arXiv:1409.2574, 2014.
  • [13] Hengtao He, Chao-Kai Wen, Shi Jin, and Geoffrey Ye Li, “A model-driven deep learning network for MIMO detection,” arXiv preprint arXiv:1809.09336, 2018.
  • [14] Neev Samuel, Tzvi Diskin, and Ami Wiesel, “Learning to detect,” arXiv preprint arXiv:1805.07631, 2018.
  • [15] Scott Wisdom, John Hershey, Jonathan Le Roux, and Shinji Watanabe, “Deep unfolding for multichannel source separation,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016, pp. 121–125.
  • [16] Karol Gregor and Yann LeCun, “Learning fast approximations of sparse coding,” in Proceedings of the 27th International Conference on Machine Learning. Omnipress, 2010, pp. 399–406.
  • [17] Mark Borgerding and Philip Schniter, “Onsager-corrected deep learning for sparse linear inverse problems,” in 2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE, 2016, pp. 227–231.
  • [18] Michel T Ivrlac and Josef A Nossek, “On MIMO channel estimation with single-bit signal-quantization,” in ITG Smart Antenna Workshop, 2007.
  • [19] Amine Mezghani, Felix Antreich, and Josef A Nossek, “Multiple parameter estimation with quantized channel output,” in 2010 International ITG Workshop on Smart Antennas (WSA). IEEE, 2010, pp. 143–150.
  • [20] Jianhua Mo, Philip Schniter, Nuria González Prelcic, and Robert W Heath, “Channel estimation in millimeter wave MIMO systems with one-bit quantization,” in 2014 48th Asilomar Conference on Signals, Systems and Computers. IEEE, 2014, pp. 957–961.
  • [21] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al., “Tensorflow: a system for large-scale machine learning.,” in OSDI, 2016, vol. 16, pp. 265–283.
  • [22] Diederik P Kingma and Jimmy Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.