# Deep One-bit Compressive Autoencoding

Parameterized mathematical models play a central role in the understanding and design of complex information systems. However, they often cannot take into account the intricate interactions innate to such systems. In contrast, purely data-driven approaches do not need explicit mathematical models for data generation and have wider applicability, at the cost of interpretability. In this paper, we consider the design of a one-bit compressive autoencoder, and propose a novel hybrid model-based and data-driven methodology that allows us not only to design the sensing matrix for one-bit data acquisition, but also to learn the latent parameters of an iterative optimization algorithm specifically designed for the problem of one-bit sparse signal recovery. Our results demonstrate a significant improvement compared to state-of-the-art model-based algorithms.


## 1 Introduction

In the past decade, compressive sensing (CS) has shown significant potential in enhancing signal sensing and recovery performance with simpler hardware resources, and has thus attracted considerable attention among researchers. CS is a method of signal acquisition that ensures the exact or almost exact reconstruction of certain classes of signals using far fewer samples than are needed in Nyquist sampling [4472240]. The signals are typically reconstructed by finding the sparsest solution of an under-determined system of equations using various available means.

Note that in a practical setting, each measurement needs to be digitized into a finite-precision value for further processing and storage, which inevitably introduces a quantization error. This error is generally treated as measurement noise with limited energy, an approach that does not perform well in extreme cases. One-bit CS is one such extreme case, where the quantizer is a simple sign comparator and each measurement is represented using only one bit, i.e., $+1$ or $-1$ [4558487, 5955138, 6418031, 6404739, 6178284, zhang2014efficient]. One-bit quantizers are not only low-cost and low-power hardware components, but also much faster than traditional scalar quantizers, and they greatly reduce the complexity of the hardware implementation. Several algorithms have been introduced in the literature for efficient reconstruction of sparse signals in one-bit CS scenarios (e.g., see [4558487, 5955138, 6418031, 6404739, 6178284, zhang2014efficient] and the references therein).

### 1.1 Relevant Prior Art

The current one-bit CS recovery algorithms generally exploit the consistency principle, which assumes that the element-wise product of the sparse signal and the corresponding measurement is always positive [4558487]. In [4558487], the authors introduce a regularizer-based recovery algorithm called renormalized fixed point iteration (RFPI), which uses a convex barrier function as a regularizer for the consistency principle. We discuss the formulation of RFPI in detail in Section 2. Another such reconstruction algorithm, referred to as restricted step shrinkage (RSS), can be found in [5955138]; it uses a nonlinear barrier function as the regularizer. Compared to the RFPI algorithm, RSS has three important advantages: provable convergence, improved consistency, and feasible performance [Li2018]. Ref. [6418031] introduces a penalty-based robust recovery algorithm, called binary iterative hard thresholding (BIHT), in order to enforce the consistency principle. In contrast to the RFPI algorithm, BIHT needs the sparsity level of the signal as input. Both RFPI and BIHT, however, perform very poorly in the presence of measurement noise, i.e., when bit flips occur. In [6404739] and [6178284], the authors propose modified versions of RFPI and BIHT, referred to as noise-adaptive renormalized fixed point iteration (NARFPI) and adaptive outlier pursuit with sign flips (AOP-f), that are robust against bit flips in the measurement vector.

We note that model-based algorithms, such as the ones discussed above, often do not take into account the intricate interactions and the latent parameters innate to complex signal processing systems. Therefore, there has recently been a high demand for developing effective real-time signal processing algorithms that use the data to achieve improved performance [7780424, ILIADIS20189, 7447163]. In particular, data-driven approaches relying on deep neural architectures such as convolutional neural networks [7780424], deep fully connected networks [ILIADIS20189], and stacked denoising autoencoders [7447163] have been studied for sparse signal recovery in the generic quantized CS setting. Such data-driven approaches do not need explicit mathematical models for data generation and have wider applicability. On the other hand, they lack the interpretability and trustworthiness that come with model-based signal processing techniques. The advantages associated with both model-based and data-driven methods show the need for developing approaches that enjoy the benefits of both frameworks [hershey2014deep, 8683876].

In this paper, we bridge the gap between the data-driven and model-based approaches in the one-bit CS area and propose a specialized, yet hybrid, methodology for the purpose of sparse signal recovery from one-bit measurements.

## 2 Problem Formulation

In a one-bit CS scenario, the dynamics of the data acquisition process (i.e., the encoder module) can be formulated as:

$$\text{Encoder Module:}\quad \mathbf{r} = \operatorname{sign}(\boldsymbol{\Phi}\mathbf{x}), \tag{1}$$

where $\boldsymbol{\Phi} \in \mathbb{R}^{m \times n}$ denotes the sensing matrix, and $\mathbf{x} \in \mathbb{R}^{n}$ is assumed to be a $K$-sparse signal. Given one-bit measurements of the form (1), one can pose the problem of sparse signal recovery from one-bit measurements as the following non-convex program:

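$$\min_{\mathbf{x}} \ \|\mathbf{x}\|_0, \quad \text{s.t.} \quad \mathbf{r} = \operatorname{sign}(\boldsymbol{\Phi}\mathbf{x}), \tag{2}$$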

where the constraint in (2) is imposed to ensure a consistent reconstruction from the one-bit information. Note further that the one-bit measurement consistency principle can be equivalently expressed as $\mathbf{R}\boldsymbol{\Phi}\mathbf{x} \succeq \mathbf{0}$, where $\mathbf{R} = \operatorname{diag}(\mathbf{r})$ and $\succeq$ is an element-wise inequality operator. Inspired by the CS literature, the above non-convex optimization problem can be further reformulated as a non-convex $\ell_1$-minimization program on the unit sphere:

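$$\min_{\mathbf{x}} \ \|\mathbf{x}\|_1, \quad \text{s.t.} \quad \mathbf{R}\boldsymbol{\Phi}\mathbf{x} \succeq \mathbf{0}, \quad \|\mathbf{x}\|_2 = 1, \tag{3}$$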

where the $\ell_1$-norm acts as a sparsity-inducing function. The intuition behind finding the sparsest signal on the unit sphere (i.e., fixing the energy of the recovered signal) is two-fold: first, it significantly reduces the feasible set of the optimization problem, and second, it avoids the trivial solution $\mathbf{x} = \mathbf{0}$. There exists an extensive body of research on solving the above non-convex optimization problem (e.g., see [4558487, 5955138, 6404739, Plan2013, 6638799, zhang2014efficient], and the references therein). The most notable methods utilize a regularization term to enforce the consistency principle as a penalty term added to the $\ell_1$-objective function, viz.

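$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \ \|\mathbf{x}\|_1 + \alpha\,\mathcal{R}(\mathbf{R}\boldsymbol{\Phi}\mathbf{x}), \quad \text{s.t.} \quad \|\mathbf{x}\|_2 = 1, \tag{4}$$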

where $\alpha > 0$ is the penalty factor. In this paper, we build upon the work done in [4558487] and use its proposed RFPI algorithm as a baseline to design the decoder function of the proposed one-bit compressive autoencoder (AE). In particular, we unfold the iterations of a renormalized fixed-point algorithm onto the layers of a neural network such that each layer of the proposed deep architecture mimics the behavior of one iteration of the baseline algorithm. Next, we perform end-to-end learning, using back-propagation to tune the parameters of both the decoder (i.e., the parameters of the RFP iterations) and the encoder (i.e., the sensing matrix $\boldsymbol{\Phi}$) functions of the proposed compressive AE.
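To make the measurement model (1) and the consistency condition $\mathbf{R}\boldsymbol{\Phi}\mathbf{x} \succeq \mathbf{0}$ concrete, the following is a minimal sketch in plain Python. The helper names (`one_bit_encode`, `is_consistent`), the Gaussian sensing matrix, the problem sizes, and the convention $\operatorname{sign}(0) = +1$ are assumptions made for illustration and are not taken from the paper.

```python
import random

def matvec(A, x):
    """A x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def one_bit_encode(Phi, x):
    """Eq. (1): r = sign(Phi x), using the convention sign(0) = +1."""
    return [1.0 if y >= 0 else -1.0 for y in matvec(Phi, x)]

def is_consistent(Phi, r, x):
    """Check R Phi x >= 0 element-wise, where R = diag(r)."""
    return all(ri * yi >= 0 for ri, yi in zip(r, matvec(Phi, x)))

rng = random.Random(0)
m, n = 8, 4                                   # hypothetical sizes
Phi = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
x = [0.6, 0.0, -0.8, 0.0]                     # 2-sparse, unit l2-norm
r = one_bit_encode(Phi, x)

consistent_true = is_consistent(Phi, r, x)                       # true signal
consistent_scaled = is_consistent(Phi, r, [3.0 * a for a in x])  # rescaled copy
consistent_negated = is_consistent(Phi, r, [-a for a in x])      # sign-flipped copy
print(consistent_true, consistent_scaled, consistent_negated)
```

Any positive rescaling of a consistent signal remains consistent, because the one-bit measurements discard amplitude information. This is one way to see why the recovery problem is posed on the unit sphere.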

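Let $\rho(\mathbf{u}) \triangleq \max\{-\mathbf{u}, \mathbf{0}\}$, applied element-wise. In order to enforce the consistency constraint in (3), the RFPI algorithm utilizes the regularization term $\mathcal{R}(\mathbf{u}) = \tfrac{1}{2}\|\rho(\mathbf{u})\|_2^2$. Note that the function $\rho(\cdot)$ can be expressed in terms of the well-known rectified linear unit (ReLU) function extensively used by the deep learning research community, i.e., $\rho(\mathbf{u}) = \mathrm{ReLU}(-\mathbf{u})$. The RFPI algorithm is a first-order (gradient-based) optimization method that operates as follows: given an initial point $\mathbf{x}_0$ on the unit sphere, the gradient step-size $\delta$, and a shrinkage threshold $\delta/\alpha$, at each iteration $i$, the estimated signal $\mathbf{x}_i$ is obtained using the following update steps: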

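$$
\begin{aligned}
\mathbf{d}_i &= -(\mathbf{R}\boldsymbol{\Phi})^T \rho(\mathbf{R}\boldsymbol{\Phi}\mathbf{x}_{i-1}), && \text{(5a)}\\
\mathbf{t}_i &= (1 + \delta\,\mathbf{d}_i^T\mathbf{x}_{i-1})\,\mathbf{x}_{i-1} - \delta\,\mathbf{d}_i, && \text{(5b)}\\
\mathbf{v}_i &= \operatorname{sign}(\mathbf{t}_i) \odot \mathrm{ReLU}\!\left(|\mathbf{t}_i| - (\delta/\alpha)\mathbf{1}\right), && \text{(5c)}\\
\mathbf{x}_i &= \frac{\mathbf{v}_i}{\|\mathbf{v}_i\|_2}. && \text{(5d)}
\end{aligned}
$$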
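As a reading aid, the form of the descent step (5b) can be recovered by assuming, as in standard gradient projection on a sphere, that $\mathbf{d}_i$ plays the role of the gradient of the consistency penalty and that only its component tangent to the unit sphere at $\mathbf{x}_{i-1}$ is followed. The symbol $\tilde{\mathbf{d}}_i$ is introduced here for illustration only:

$$
\begin{aligned}
\tilde{\mathbf{d}}_i &= \mathbf{d}_i - (\mathbf{d}_i^T\mathbf{x}_{i-1})\,\mathbf{x}_{i-1},\\
\mathbf{t}_i &= \mathbf{x}_{i-1} - \delta\,\tilde{\mathbf{d}}_i
= (1 + \delta\,\mathbf{d}_i^T\mathbf{x}_{i-1})\,\mathbf{x}_{i-1} - \delta\,\mathbf{d}_i .
\end{aligned}
$$

Since $\|\mathbf{x}_{i-1}\|_2 = 1$, we have $\mathbf{x}_{i-1}^T\tilde{\mathbf{d}}_i = 0$, so the step moves along the sphere to first order before the final renormalization.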

After the descent in (5a)-(5b), the update step in (5c) corresponds to a shrinkage step. More precisely, any element of the vector $\mathbf{t}_i$ whose magnitude is below the threshold $\delta/\alpha$ is set to zero (leading to enhanced sparsity). Finally, the algorithm projects the obtained vector onto the unit sphere to produce the latest estimate of the signal.
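The update steps above can be sketched in plain Python as follows. This is an illustrative reading of (5a)-(5d), not the authors' implementation: it assumes $\rho(\mathbf{u}) = \mathrm{ReLU}(-\mathbf{u})$, a Gaussian sensing matrix, the initial point $\boldsymbol{\Phi}^T\mathbf{r}$ normalized to unit length, and hand-picked values of `delta` and `alpha`; the helper names and problem sizes are hypothetical.

```python
import math
import random

def matvec(A, x):
    """A x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def rmatvec(A, y):
    """A^T y for a matrix stored as a list of rows."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def normalize(v):
    nrm = math.sqrt(sum(a * a for a in v))
    return [a / nrm for a in v]

def rfpi_step(x, Phi, r, delta, alpha):
    """One pass through (5a)-(5d); returns the next unit-norm estimate."""
    # (5a): d = -(R Phi)^T rho(R Phi x), with rho(u) = ReLU(-u), R = diag(r)
    u = [ri * yi for ri, yi in zip(r, matvec(Phi, x))]
    rho = [max(-a, 0.0) for a in u]
    d = [-g for g in rmatvec(Phi, [ri * pi for ri, pi in zip(r, rho)])]
    # (5b): descent step restricted to the tangent space of the sphere
    dx = sum(a * b for a, b in zip(d, x))
    t = [(1.0 + delta * dx) * xi - delta * di for xi, di in zip(x, d)]
    # (5c): soft-thresholding, sign(t) * ReLU(|t| - delta / alpha)
    thr = delta / alpha
    v = [math.copysign(max(abs(a) - thr, 0.0), a) for a in t]
    # (5d): projection back onto the unit sphere (keep x if v vanished)
    return normalize(v) if any(v) else x

def sign_mismatches(x, Phi, r):
    """Number of measurements whose sign is not reproduced by x."""
    return sum(1 for ri, yi in zip(r, matvec(Phi, x)) if ri * yi < 0)

rng = random.Random(0)
m, n = 64, 16                                      # hypothetical sizes
Phi = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
x_true = normalize([1.0, -2.0] + [0.0] * (n - 2))  # 2-sparse target
r = [1.0 if y >= 0 else -1.0 for y in matvec(Phi, x_true)]

x = normalize(rmatvec(Phi, r))                     # assumed initial point
before = sign_mismatches(x, Phi, r)
for _ in range(50):
    x = rfpi_step(x, Phi, r, delta=0.01, alpha=10.0)
print(before, sign_mismatches(x, Phi, r))
```

The printed pair reports the number of inconsistent measurements before and after the iterations; the actual values depend on the random draw and on the chosen `delta` and `alpha`.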

While effective in signal reconstruction, the RFPI method has several drawbacks. For instance, the algorithm must be run on several problem instances, with the value of the penalty factor $\alpha$ increased at each outer iteration and the previously obtained solution used as the initial point for each new problem instance. Moreover, it is not straightforward to choose the fixed step-size $\delta$ and the shrinkage threshold, which may depend on the latent parameters of the information system. In fact, by carefully tuning the step-sizes and the shrinkage thresholds, one can significantly boost the performance of the algorithm and further alleviate the mentioned drawbacks of this method. In the next section, we show how this tuning can be done by learning from the data. In particular, we slightly modify and over-parameterize the above update steps of the RFPI algorithm, unfold them onto the layers of a deep neural network, define a decoder function based on the unfolded iterations, and seek to jointly learn the parameters of the proposed AE.

## 3 The Proposed One-Bit Compressive AutoEncoder

We pursue the design of a novel one-bit compressive sensing-based autoencoder architecture that allows us to jointly design the parameters of both the encoder and the decoder module when one-bit quantizers are employed in the data acquisition process (i.e., the encoding module) for a $K$-sparse input signal $\mathbf{x} \in \mathbb{R}^{n}$. Briefly speaking, an AE is a generative model comprised of an encoder and a decoder module that are sequentially connected. The purpose of an AE is to learn an abstract representation of the input data, while providing a powerful data reconstruction system through the decoder module. The input to such a system is a set of signals following a certain distribution, and the output is the recovered signal $\hat{\mathbf{x}}$ from the decoder module. Hence, the goal is to jointly learn an abstract representation of the underlying distribution of the signals through the encoder module and, simultaneously, to learn a decoder module that reconstructs the compressed signals from the obtained abstract representations. Therefore, an AE can be defined by two functionals: i) an encoder function $f^{\text{Encoder}}_{\Upsilon_1}$, parameterized on a set of variables $\Upsilon_1$, that maps the input signal into a new vector space, and ii) a decoder function $f^{\text{Decoder}}_{\Upsilon_2}$, parameterized on $\Upsilon_2$, which maps the output of the encoder module back into the original signal space. Hence, the governing dynamics of a general AE can be expressed as $\hat{\mathbf{x}} = f^{\text{Decoder}}_{\Upsilon_2} \circ f^{\text{Encoder}}_{\Upsilon_1}(\mathbf{x})$, where $\hat{\mathbf{x}}$ denotes the reconstructed signal.

In light of the above, we seek to interpret a one-bit CS system as an AE module that facilitates not only the design of the sensing matrix $\boldsymbol{\Phi}$ that best captures the information of a $K$-sparse signal when one-bit quantizers are employed, but also the learning of the parameters of an iterative optimization algorithm specifically designed for the task of signal recovery. To this end, we modify and unfold the iterations of the form (5a)-(5d) onto the layers of a deep neural network and then use deep-learning tools to tune the parameters of the proposed one-bit compressive AE. In particular, we define the encoder module of the proposed AE as follows:

$$f^{\text{Encoder}}_{\Upsilon_1}(\mathbf{x}) = \widetilde{\operatorname{sign}}(\boldsymbol{\Phi}\mathbf{x}), \tag{6}$$

where $\Upsilon_1 = \{\boldsymbol{\Phi}\}$ denotes the set of parameters of the encoder function, and $\widetilde{\operatorname{sign}}(\mathbf{x}) = \tanh(c\,\mathbf{x})$ for some choice of $c > 0$ (a fixed value of $c$ was used in the numerical investigations). Note that we replaced the original $\operatorname{sign}(\cdot)$ function with a smooth approximation of it based on the hyperbolic tangent function. The reason for such a replacement is that the $\operatorname{sign}(\cdot)$ function is not continuous and its gradient is zero everywhere except at the origin, where it is undefined; hence, its use would cripple any stochastic gradient-based optimization method (as used in back-propagation in deep learning). As for the decoder function, define $g_{\phi_i}: \mathbb{R}^{n} \to \mathbb{R}^{n}$ as follows:

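$$
\begin{aligned}
g_{\phi_i}(\mathbf{z}; \boldsymbol{\Phi}, \mathbf{R}) &= \frac{\mathbf{v}}{\|\mathbf{v}\|_2}, \quad \text{with} && \text{(7a)}\\
\mathbf{v} &= \widetilde{\operatorname{sign}}(\mathbf{t}) \odot \mathrm{ReLU}(|\mathbf{t}| - \boldsymbol{\tau}_i), && \text{(7b)}\\
\mathbf{t} &= (1 + \delta_i\,\mathbf{d}^T\mathbf{z})\,\mathbf{z} - \delta_i\,\mathbf{d}, && \text{(7c)}\\
\mathbf{d} &= -(\mathbf{R}\boldsymbol{\Phi})^T \rho(\mathbf{R}\boldsymbol{\Phi}\mathbf{z}), && \text{(7d)}
\end{aligned}
$$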

where $\phi_i = \{\delta_i, \boldsymbol{\tau}_i\}$ represents the parameters of the function $g_{\phi_i}$, and $\boldsymbol{\tau}_i$ denotes the vector of sparsity-inducing thresholds. Next, we define the proposed composite decoder function as follows:

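$$f^{\text{Decoder}}_{\Upsilon_2}(\mathbf{z}_0) = g_{\phi_{L-1}} \circ g_{\phi_{L-2}} \circ \cdots \circ g_{\phi_1} \circ g_{\phi_0}(\mathbf{z}_0; \boldsymbol{\Phi}, \mathbf{R}),$$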

where $\Upsilon_2 = \{\phi_i\}_{i=0}^{L-1}$ represents the learnable (tunable) parameters of the decoder function. Note that we have over-parameterized the iterations of the RFPI algorithm by introducing a new threshold vector $\boldsymbol{\tau}_i$ at each iteration for the sparsity-inducing step (i.e., Eq. (7b)). Moreover, in contrast with the original iterations, we have introduced a new step-size $\delta_i$ at each step of the iteration as well. Hence, the above decoder function can be interpreted as performing $L$ iterations of the original RFPI algorithm with additional degrees of freedom (as compared to the base algorithm), expressed in terms of the set of shrinkage thresholds $\{\boldsymbol{\tau}_i\}_{i=0}^{L-1}$ and gradient step-sizes $\{\delta_i\}_{i=0}^{L-1}$. As a result, the proposed decoder function is much more expressive than the iterations of the RFPI algorithm.
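A forward pass through the unfolded decoder can be sketched as below. This is a plain-Python illustration under stated assumptions rather than the paper's PyTorch model: the per-layer step-sizes and threshold vectors are hand-picked instead of learned, the surrogate sign is taken to be $\tanh(c\,\cdot)$ with a hypothetical $c = 10$, and the sensing matrix, problem sizes, and initial point are chosen only for illustration.

```python
import math
import random

def matvec(A, x):
    """A x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def rmatvec(A, y):
    """A^T y for a matrix stored as a list of rows."""
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def layer(z, Phi, r, delta_i, tau_i, c):
    """One decoder layer g_{phi_i}, following (7a)-(7d)."""
    u = [ri * yi for ri, yi in zip(r, matvec(Phi, z))]
    # (7d): d = -(R Phi)^T rho(R Phi z), with rho(u) = ReLU(-u)
    d = [-g for g in rmatvec(Phi, [ri * max(-a, 0.0) for ri, a in zip(r, u)])]
    # (7c): descent step with a layer-specific step-size delta_i
    dz = sum(a * b for a, b in zip(d, z))
    t = [(1.0 + delta_i * dz) * zi - delta_i * di for zi, di in zip(z, d)]
    # (7b): shrinkage with a layer-specific threshold vector tau_i
    v = [math.tanh(c * a) * max(abs(a) - th, 0.0) for a, th in zip(t, tau_i)]
    # (7a): renormalization (keep z if every entry was shrunk to zero)
    nrm = math.sqrt(sum(a * a for a in v))
    return [a / nrm for a in v] if nrm > 0 else z

def decoder(z0, Phi, r, deltas, taus, c=10.0):
    """Compose the layers g_{phi_0}, ..., g_{phi_{L-1}}; return every layer output."""
    outputs, z = [], z0
    for delta_i, tau_i in zip(deltas, taus):
        z = layer(z, Phi, r, delta_i, tau_i, c)
        outputs.append(z)
    return outputs

rng = random.Random(1)
m, n, L = 48, 12, 5                                # hypothetical sizes
Phi = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
x_true = [0.6, -0.8] + [0.0] * (n - 2)
r = [1.0 if y >= 0 else -1.0 for y in matvec(Phi, x_true)]

z0 = rmatvec(Phi, r)
z0 = [a / math.sqrt(sum(b * b for b in z0)) for a in z0]
deltas = [0.02 / (i + 1) for i in range(L)]        # one step-size per layer
taus = [[0.001 * (i + 1)] * n for i in range(L)]   # one threshold vector per layer
outputs = decoder(z0, Phi, r, deltas, taus)
print(len(outputs), [round(a, 3) for a in outputs[-1]])
```

Returning the output of every layer, rather than only the last one, is a design choice that makes it straightforward to form a loss over all layers or to stop decoding early.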

### 3.1 Loss Function Characterization and Training

The training of an AE should be carried out by defining a proper loss function $G(\mathbf{x}; \hat{\mathbf{x}})$ that provides a measure of the similarity between the input and the output of the AE. The goal is to minimize the distance between the input target signal $\mathbf{x}$ and the recovered signal $\hat{\mathbf{x}}$ according to a similarity criterion. A widely used option for the loss function is the output MSE loss, i.e., $\|\mathbf{x} - \hat{\mathbf{x}}\|_2^2$. Nevertheless, in deep architectures with a high number of layers and parameters, such a simple choice of the loss function makes it difficult to back-propagate the gradients, and hence, the vanishing gradient problem arises. Therefore, for the training of the proposed AE, a better choice for the loss function is the cumulative MSE loss of all layers. As a result, one can also feed-forward the decoder function for only $l < L$ layers (a lower-complexity decoding), and consider the output of the $l$-th layer as a good approximation of the target signal. For training, one also needs to account for the constraint that the gradient step-sizes must be non-negative. Since the decoder function is parameterized on the step-sizes and the shrinkage thresholds, we regularize the training loss function to ensure that the network chooses positive step-sizes and thresholds at each layer. With this in mind, we suggest the following loss function for training the proposed one-bit compressive AE. Let $\tilde{g}_i(\mathbf{x}_i)$ denote the output of the $i$-th layer of the decoder, and define the loss function for training as

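$$
G(\mathbf{x}; \hat{\mathbf{x}}) = \underbrace{\sum_{i=0}^{L-1} w_i \,\|\mathbf{x} - \tilde{g}_i(\mathbf{x}_i)\|_2^2}_{\text{accumulated MSE loss of all layers}} + \underbrace{\lambda \sum_{i=0}^{L-1} \mathrm{ReLU}(-[\boldsymbol{\delta}]_i) + \lambda \sum_{i=0}^{nL-1} \mathrm{ReLU}(-[\boldsymbol{\tau}]_i)}_{\text{regularization term for the step-sizes and shrinkage thresholds}}, \tag{8}
$$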

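where $w_i$ denotes the weight of the $i$-th layer in the accumulated loss, $\boldsymbol{\delta} = [\delta_0, \dots, \delta_{L-1}]^T$, and $\boldsymbol{\tau} = [\boldsymbol{\tau}_0^T, \dots, \boldsymbol{\tau}_{L-1}^T]^T$.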
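The structure of the training loss (8) can be sketched in a few lines of plain Python. The function name `ae_loss`, the layer weights, the value of `lam`, and the toy numbers are hypothetical, and the threshold penalty is assumed to sum over every entry of every threshold vector.

```python
def relu(a):
    return max(a, 0.0)

def ae_loss(x, layer_outputs, weights, deltas, taus, lam):
    """Accumulated per-layer MSE plus ReLU penalties on negative parameters."""
    # Weighted squared error between the target and each layer's output
    mse = sum(
        w * sum((a - b) ** 2 for a, b in zip(x, out))
        for w, out in zip(weights, layer_outputs)
    )
    # Penalize negative step-sizes (one scalar per layer)
    step_penalty = lam * sum(relu(-d) for d in deltas)
    # Penalize negative thresholds (every entry of every threshold vector)
    thr_penalty = lam * sum(relu(-t) for tau in taus for t in tau)
    return mse + step_penalty + thr_penalty

# Toy example with L = 2 layers and n = 2
x = [1.0, 0.0]
layer_outputs = [[0.8, 0.6], [1.0, 0.0]]
weights = [0.5, 1.0]
deltas = [0.1, -0.2]                  # second step-size is negative
taus = [[0.01, 0.0], [-0.03, 0.02]]   # one threshold entry is negative
loss = ae_loss(x, layer_outputs, weights, deltas, taus, lam=10.0)
print(loss)
```

In this toy case the data term contributes 0.2, the negative step-size contributes 2.0, and the negative threshold contributes 0.3, so the penalties dominate until the parameters move back into the non-negative range.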

## 4 Numerical Results

In this section, we present simulation results that investigate the performance of the proposed one-bit compressive AE. For training purposes, we randomly generate $K$-sparse signals of length $n$ on the unit sphere, i.e., $\|\mathbf{x}\|_2 = 1$. Furthermore, we fix the total number of layers of the decoder function to $L = 30$, which is equivalent to performing only 30 optimization iterations of the form (7). As for the sensing matrix to be learned, we assume $\boldsymbol{\Phi} \in \mathbb{R}^{m \times n}$. The results presented here are averaged over multiple realizations of the system parameters. Note that we consider the case $m > n$, due to the focus of this study on one-bit sampling, where a large number of one-bit samples is usually available, as opposed to the usual CS settings.

The proposed one-bit CS AE is implemented using the PyTorch library [paszke2017automatic]. The Adam algorithm [kingma2014adam] with a fixed learning rate is utilized for optimization of the parameters of the proposed AE. We perform batch learning with mini-batches of a fixed size over a fixed total number of epochs. For training of the proposed AE, we fix the sparsity level $K$ of the training signals, and evaluate the performance of the proposed method on target signals with the same sparsity level, as well as with a different sparsity level (which was not shown to the network during the training phase). In all scenarios, the same initial point is used for all optimization algorithms.
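The generation of training signals described above can be sketched as follows. The helper name `sparse_unit_signal`, the Gaussian amplitudes on the support, and the sizes `n`, `k`, and `num_signals` are assumptions for illustration and are not the values used in the paper.

```python
import math
import random

def sparse_unit_signal(n, k, rng):
    """Draw a k-sparse signal of length n and scale it to unit l2-norm."""
    support = rng.sample(range(n), k)       # k distinct non-zero positions
    x = [0.0] * n
    for j in support:
        x[j] = rng.gauss(0.0, 1.0)          # assumed amplitude distribution
    nrm = math.sqrt(sum(a * a for a in x))
    return [a / nrm for a in x]

rng = random.Random(0)
n, k, num_signals = 128, 16, 100            # hypothetical sizes
train_set = [sparse_unit_signal(n, k, rng) for _ in range(num_signals)]

first = train_set[0]
print(sum(1 for a in first if a != 0.0), round(sum(a * a for a in first), 6))
```

A test set at a sparsity level unseen during training can be produced by calling the same function with a different `k`.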

Fig. 1 illustrates the MSE of the recovered signal versus the total number of optimization iterations $i$, for the two sparsity levels shown in panels (a) and (b). We compare our algorithm with the RFPI iterations in (5a)-(5d) in the following scenarios:
Case 1: The RFPI algorithm with a randomly generated sensing matrix $\boldsymbol{\Phi}$ whose elements are i.i.d., and fixed values for $\delta$ and $\alpha$.
Case 2: The RFPI algorithm where the learned $\boldsymbol{\Phi}$ is utilized and the values for $\delta$ and $\alpha$ are fixed as in the previous case.
Case 3: The RFPI algorithm with a randomly generated $\boldsymbol{\Phi}$ (same as Case 1), but with the learned shrinkage thresholds vector $\boldsymbol{\tau}$ and a fixed step-size.
Case 4: The proposed one-bit CS AE method corresponding to the iterations of the form (7a)-(7d), with the learned $\boldsymbol{\Phi}$, $\boldsymbol{\delta}$, and $\boldsymbol{\tau}$.

### 4.1 Discussion and Concluding Remarks

It can be seen from Fig. 1 that, for both sparsity levels, the proposed method performs significantly better in terms of MSE than the RFPI algorithm described in Case 1. Furthermore, the effectiveness of the learned $\boldsymbol{\Phi}$ (Case 2) and the learned $\boldsymbol{\tau}$ (Case 3) compared to the base algorithm (Case 1) is clearly evident, as both algorithms with learned parameters significantly outperform the original RFPI. Finally, although we trained the network on signals of a single sparsity level, it still generalizes well to the sparsity level that was not seen during training (see Fig. 1 (b)). This is presumably because the proposed AE is a hybrid model-based and data-driven approach that exploits the existing domain knowledge of the problem as well as the available data at hand. Furthermore, note that the proposed method achieves high accuracy very quickly and, unlike the original RFPI algorithm, does not require solving (4) for several instances, which shows great potential for use in real-time applications.