The recent flourish of studies on compressed sensing has inspired researchers to apply the idea to wireless communications. Fletcher et al. [1] showed that a detection problem for on-off random access channels is equivalent to a sparse signal recovery problem discussed in compressed sensing. Kaneko et al. [2] used a compressed sensing technique to develop an identification protocol for a passive RFID system. For such applications in wireless communications, we need powerful sparse signal recovery algorithms that are suitable for hardware implementation in order to achieve high-speed signal processing and energy efficiency.
Binary (or one-bit) compressed sensing, first proposed by Boufounos and Baraniuk [3], is a variant of the original compressed sensing. In the scenario of binary compressed sensing, a receiver is given a linear observation vector quantized to binary values, typically in the binary alphabet $\{+1, -1\}$. The linear observation vector is the product of a sensing matrix and a hidden sparse signal vector. The receiver then tries to reconstruct the original sparse signal; this process is called sparse signal recovery. It is known that the power consumption of an AD converter is closely related to the number of its quantization levels and to the sampling frequency; in particular, the power consumption of an AD converter increases as the sampling frequency increases. When we aim to develop an extremely low-power device for battery-powered sensors, or a digital signal processing device operating at a very high sampling frequency, binary quantization by a comparator is a reasonable choice. Binary compressed sensing is well suited to such situations. Moreover, since the input to the receiver is restricted to binary values, no gain control is required in the case of binary compressed sensing. This fact further simplifies the hardware needed in the receiver.
The optimal sparse signal recovery for binary compressed sensing can be attained by solving a certain integer programming (IP) problem. However, the IP problem is, in general, computationally hard to solve, and the approach can handle only small problems. Boufounos and Baraniuk [3] studied a relaxation method that replaces the $\ell_0$-norm with the $\ell_1$-norm and introduced a convex relaxation of the integer constraints. Although the nonlinearity induced by binary quantization prohibits direct application of known sparse recovery algorithms for the original compressed sensing, several sparse recovery algorithms for binary compressed sensing have been developed [4, 5, 6, 7] based on known iterative methods for the original compressed sensing. Boufounos [5] proposed a greedy algorithm called Matched Sign Pursuit (MSP) that is a counterpart of Orthogonal Matching Pursuit (OMP). Jacques et al. [6] presented the Binary Iterative Hard Thresholding (BIHT) algorithm by reforming the Iterative Hard Thresholding (IHT) algorithm [8].
Although the known sparse recovery algorithms exhibit reasonable recovery performance, they may not be suitable for applications in high-speed wireless communications. This is because most of these algorithms require a number of iterations to achieve reasonable recovery results. Most known algorithms also require matrix-vector products costing $O(mn)$ operations per iteration, where $m$ is the length of the observation vector and $n$ is the length of the sparse signal vector.
Our approach for sparse signal recovery is to employ feedforward neural networks as building blocks of sparse signal recovery schemes. We expect that appropriately designed and trained neural networks can greatly reduce the required computing resources and are well suited to hardware implementation. In this paper, we propose majority voting neural networks composed of several independently trained neural networks, each of which is a feedforward 3-layer neural network employing the sigmoid function as its activation function. As far as we know, there are no previous studies on sparse signal recovery based on neural networks. Our focus is thus not only on the practical aspects of neural sparse signal recovery but also on the fundamental behavior of neural networks for sparse signal recovery. Recently, deep neural networks [9, 10] have been actively studied because they provide surprisingly excellent performance in image/speech recognition and natural language processing. Such powerful neural networks can be used in wireless communications as well; this work can be seen as a first attempt in this direction.
II Sparse Recovery by Neural Networks
II-A Binary compressed sensing
The main problem of binary compressed sensing is to reconstruct an unknown sparse signal vector $x$ from the observation signal vector $u$ under the condition that these signals satisfy the relationship
\[
u = \mathrm{sign}(A x). \tag{1}
\]
The sign function, applied componentwise, is defined by
\[
\mathrm{sign}(a) =
\begin{cases}
+1, & a \ge 0, \\
-1, & a < 0.
\end{cases} \tag{2}
\]
The matrix $A \in \mathbb{R}^{m \times n}$ is called a sensing matrix. We assume that the length $m$ of the observation signal vector is smaller than the length $n$ of the sparse signal vector, i.e., $m < n$. This problem setup is similar to that of the original compressed sensing. The notable difference between them is that the observation signal is binarized in the sensing process of binary compressed sensing. A receiver obtains the observation signal $u$ and then tries to recover the corresponding hidden signal $x$. We here make two assumptions on the signal $x$ and the sensing matrix $A$. The first assumption is sparsity of the hidden signal $x$: the original binary signal $x \in \{0, 1\}^n$ contains only $k$ non-zero elements, where $k$ is a positive integer much smaller than $n$; i.e., the Hamming weight of $x$ should be $k$. We call the set of binary vectors with Hamming weight $k$ the set of $k$-sparse signals. The second assumption is that the receiver completely knows the sensing matrix $A$.
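The sensing model above can be sketched in a few lines of plain Python. The sizes $n$, $m$, $k$ below are toy values chosen for exposition (not the experimental settings), and the helper names `sign` and `observe` are ours:

```python
import random

def sign(a):
    # componentwise binary quantization into {+1, -1}
    return [1 if v >= 0 else -1 for v in a]

def observe(A, x):
    # u = sign(A x): the receiver sees only the sign of each linear measurement
    m, n = len(A), len(x)
    return sign([sum(A[i][j] * x[j] for j in range(n)) for i in range(m)])

random.seed(0)
n, m, k = 8, 4, 2                     # toy sizes (hypothetical), m < n
A = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
x = [1] * k + [0] * (n - k)
random.shuffle(x)                     # a random k-sparse binary vector
u = observe(A, x)                     # binary observation of length m
```

The recovery problem is then to invert `observe` for the known $A$: find the $k$-sparse binary $x$ consistent with $u$.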
II-B Network architecture
When we need extremely high-speed signal processing or an energy-efficient sparse signal recovery method for battery-powered sensors, it is reasonable to develop a sparse signal recovery algorithm suited to the situation. The sparse signal recovery method based on neural networks described in this section requires only a few matrix-vector products to obtain an output signal, which is an estimate of the sparse vector. Thus, the proposed method needs smaller computational costs than those required by conventional iterative methods.
Our sparse recovery method is based on the 3-layer feedforward neural network illustrated in Fig. 1. This architecture is a fairly common one: it consists of the input, hidden, and output layers. Adjacent layers are connected by weighted edges, and each layer includes neural units that can keep real values. As the activation function, we employ the sigmoid function to determine the values of the hidden and output layers. In our problem setting, the observation signal $u$ is fed into the input layer from the left in Fig. 1. The signal propagates from left to right, and the output signal eventually comes out of the output layer. The network should be trained so that the output signal is an accurate estimate of the original sparse signal $x$. The precise description of the network in Fig. 1 is given by the following equations:
\[
h = \sigma(W_1 u + b_1), \tag{3}
\]
\[
y = \sigma(W_2 h + b_2), \tag{4}
\]
\[
\hat{x} = \mathrm{round}(y). \tag{5}
\]
The function $\sigma(\cdot)$ is the sigmoid function defined by $\sigma(a) = 1/(1 + e^{-a})$. In this paper, we follow the simple convention that $\sigma(v)$ represents $(\sigma(v_1), \ldots, \sigma(v_d))$ where $v = (v_1, \ldots, v_d)$. The round function gives the nearest integer to its argument and is also applied componentwise. Equation (3) defines the signal transformation from the input layer to the hidden layer: an affine transformation is first applied to the input signal $u$, and then the sigmoid function is applied. The weight matrix $W_1 \in \mathbb{R}^{\alpha \times m}$
and the bias vector $b_1 \in \mathbb{R}^{\alpha}$ define the affine transformation. The resulting signal $h$ is kept in the units of the hidden layer, which are called hidden units; the parameter $\alpha$ thus means the number of hidden units. From the hidden layer to the output layer, equation (4) governs the second signal transformation. The second transformation, yielding the signal $y$, consists of an affine transformation, based on the weight matrix $W_2 \in \mathbb{R}^{n \times \alpha}$ and the bias vector $b_2 \in \mathbb{R}^{n}$, and a nonlinear mapping based on the sigmoid function. The vector $y$ emitted from the output layer is finally rounded to the nearest integer vector because we assume that the non-zero elements in the original sparse signal take the value one. Since the range of the sigmoid function lies in the open interval $(0, 1)$, each element of the estimate vector $\hat{x}$ takes the value zero or one.
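The forward computation (3)-(5) can be sketched in plain Python as follows. The helper names `sigmoid`, `affine`, and `forward` are ours, and the tiny weights in the usage example are arbitrary illustrative values, not trained parameters:

```python
import math

def sigmoid(v):
    # componentwise sigmoid; outputs lie in the open interval (0, 1)
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def affine(W, v, b):
    # W v + b, with the matrix W given as a list of rows
    return [sum(wj * vj for wj, vj in zip(row, v)) + bi for row, bi in zip(W, b)]

def forward(u, W1, b1, W2, b2):
    h = sigmoid(affine(W1, u, b1))          # eq. (3): input -> hidden layer
    y = sigmoid(affine(W2, h, b2))          # eq. (4): hidden -> output layer
    return [int(round(yi)) for yi in y], y  # eq. (5): round y to a {0,1} vector

# toy network: m = 2 inputs, alpha = 2 hidden units, n = 3 outputs
W1, b1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]
W2, b2 = [[4.0, -4.0], [-4.0, 4.0], [-4.0, -4.0]], [0.0, 0.0, 2.0]
x_hat, y = forward([1, -1], W1, b1, W2, b2)
```

Note that only two matrix-vector products are needed per recovery, which is the source of the computational advantage discussed above.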
II-C Training process
The network in Fig. 1 can be seen as a parametrized estimator $\hat{x} = f_\Theta(u)$, where $\Theta = \{W_1, b_1, W_2, b_2\}$ is the set of trainable parameters. These parameters should be adjusted in the training phase so as to minimize the error probability. However, it may be computationally intractable to minimize the error probability directly. Instead of direct minimization of the error probability itself, we minimize a loss function consisting of a cross-entropy-like term and an $L_1$-regularization term. In this subsection, the details of the training process are described.
In the training phase of the network, the parameter $\Theta$ is updated in order to minimize the value of the given loss function. Let $\mathcal{D} = \{(u_1, x_1), \ldots, (u_N, x_N)\}$ be the set of training data used in the training phase. The signals $u_i$ and $x_i$ are related by $u_i = \mathrm{sign}(A x_i)$. In the training process, we use randomly generated training samples: the sparse vectors $x_i$ are generated uniformly at random from the set of $k$-sparse vectors. The sample $u_i$ is fed into the network, and the corresponding sample $x_i$ is used as the supervisory signal.
We employ stochastic gradient descent (SGD) algorithms to minimize the loss function described later. It is empirically known that SGD and its variants behave very well for non-convex objective functions that are computationally hard to minimize; this is the reason why SGD and related algorithms are widely used for training deep neural networks. In order to use SGD, we need to partition the training set into minibatches. A minibatch is a subset of the training data, and the use of minibatches introduces a stochastic disturbance into the training process. Such a disturbance helps a search point in an SGD process escape from stationary points of the non-convex objective function to be minimized. We divide the training data $\mathcal{D}$ into minibatches as follows:
\[
\mathcal{D} = \mathcal{D}_1 \cup \mathcal{D}_2 \cup \cdots \cup \mathcal{D}_B.
\]
In this case, every minibatch contains $N/B$ pairs of samples. We denote the $t$-th ($t \in \{1, \ldots, B\}$) minibatch by $\mathcal{D}_t$.
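The partition into minibatches can be expressed directly; the sample set and batch size below are toy values for illustration:

```python
def minibatches(samples, batch_size):
    # split the training set into consecutive minibatches of equal size
    # (assumes len(samples) is a multiple of batch_size)
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]

batches = minibatches(list(range(12)), 4)   # N = 12 samples, batch size 4
```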
II-D Loss function
The choice of the loss function, i.e., the objective function to be minimized in training processes, is crucial for achieving appropriate recovery performance. We introduce a loss function designed for sparse signal recovery. The loss function for the $t$-th minibatch is defined by
\[
L_t(\Theta) = \sum_{(u, x) \in \mathcal{D}_t} \left( -\sum_{i=1}^{n} x_i \ln y_i + \lambda \sum_{i=1}^{n} |y_i| \right). \tag{6}
\]
The vector $y$ is given by equations (3) and (4); i.e., it is the output of our neural network corresponding to the input $u$. The first term of (6) measures the closeness between $x$ and $y$. This measure is closely related to the cross entropy that is often used in supervised classification problems. In the case of a classification problem, $x$ is a one-hot vector that can be interpreted as a probability vector. In our case, since $x$ contains $k$ ones, the first term is not exactly the cross entropy. It has been empirically observed that this term plays an important role in sparse signal recovery; for example, several numerical experiments indicated that the squared Euclidean distance between $x$ and $y$ is not suitable for our purpose. The second term of equation (6) is the $L_1$-regularization term promoting sparsity of the output $y$. The regularization parameter $\lambda$ adjusts the strength of the regularization. Some experiments showed that the $L_1$-regularization term is indispensable for obtaining sparse output vectors.
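A plausible single-sample instantiation of this loss is sketched below, under our reading that the closeness term sums $-x_i \ln y_i$ over the signal positions and the regularizer is $\lambda \|y\|_1$; both the exact functional form and the value of `lam` are assumptions for illustration:

```python
import math

def sample_loss(y, x, lam=0.01):
    # cross-entropy-like closeness term between target x and network output y;
    # only positions with x_i = 1 contribute, so for k-sparse x this is not
    # a proper cross entropy
    ce = -sum(xi * math.log(yi) for xi, yi in zip(x, y))
    # L1 regularization term promoting a sparse output
    # (y lies in (0, 1), so |y_i| = y_i)
    l1 = lam * sum(y)
    return ce + l1
```

Summing `sample_loss` over a minibatch gives the per-minibatch objective fed to the optimizer.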
The training process of our network can be summarized as follows. We first generate the training data $\mathcal{D}$. In the $t$-th update iteration of the parameter $\Theta$, the minibatch $\mathcal{D}_t$ is fed to the Adam optimizer [11] with the loss function (6). The Adam optimizer is a variant of SGD that provides fast convergence in many cases, and it is widely used in learning processes for deep neural networks. An iteration corresponding to the processing of one minibatch is called a learning step. A training process finishes when all the minibatches have been processed.
III Numerical Results
As the primal performance measure of sparse signal reconstruction, we adopt the recovery rate, which is the probability of the event $\hat{x} = x$ under the assumption that $x$ is chosen uniformly at random from the set of $k$-sparse binary vectors. In this section, we evaluate the sparse signal recovery performance of the feedforward neural network in Fig. 1.
III-A Details on experiments
We used TensorFlow [12] to implement and train our neural networks. TensorFlow is a framework designed for distributed, data-flow-based numerical computation that is especially well suited for training deep neural networks. TensorFlow supports automatic back propagation for computing the gradient vectors required for parameter updates, and it also provides GPU computing that can significantly accelerate training processes. It is rather straightforward to implement our neural networks and the training process described in the previous section using TensorFlow.
The details of the parameters used throughout the paper are as follows. The sensing matrix $A$ is generated at random; i.e., each element of $A$ is independently drawn from a Gaussian distribution. The sensing matrix is generated just before an experiment and is fixed during the experiment. At the beginning of a training process, the weight matrices $W_1$ and $W_2$ at the hidden and output layers are initialized with pseudo-random numbers; namely, each element of these matrices follows a Gaussian distribution. The bias vectors $b_1$ and $b_2$ at the hidden and output layers are initialized to the zero vector. The length $n$ of the original sparse signal, the number $\alpha$ of hidden units, the minibatch size, the initial learning coefficient of the Adam optimizer, and the coefficient $\lambda$ of the regularization term are fixed throughout the experiments.
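The initialization just described can be sketched as follows; the standard deviation `std` is a placeholder value, since the variance used in the actual experiments is not reproduced here:

```python
import random

def init_params(m, n, alpha, std=0.1):
    # Gaussian-initialized weight matrices, zero bias vectors
    W1 = [[random.gauss(0.0, std) for _ in range(m)] for _ in range(alpha)]
    b1 = [0.0] * alpha
    W2 = [[random.gauss(0.0, std) for _ in range(alpha)] for _ in range(n)]
    b2 = [0.0] * n
    return W1, b1, W2, b2

W1, b1, W2, b2 = init_params(m=4, n=8, alpha=6)
```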
III-B Initial experiments
As an example of sparse signal reconstruction via our neural network, we show a $k$-sparse vector $x$ and the corresponding output $y$ of the trained neural network in Fig. 2. The parameter $\Theta$ was obtained by a training process over all minibatches, and the length $m$ of the observation signal was fixed in this experiment.
From Fig. 2, we can observe that the output $y$ matches the original signal $x$ fairly well. For example, the support of the large-valued components of $y$ exactly coincides with the support of $x$, and some of these components take values pretty close to one. It is also seen that some components with small values, e.g., at indices around 110 and 145, are false positives, meaning that the corresponding components of the original $x$ are zero. This is the reason why we introduced the round function at the final stage of our neural network in (5): it is expected that the round function eliminates the effect of the small-valued components that may otherwise produce false positive elements in the final estimate $\hat{x}$.
Fig. 3 shows the relationship between the recovery rate and the number of learning steps. From Fig. 3, we can see that the recovery rate increases as the number of learning steps increases, although the progress contains fluctuations. The recovery rate appears to saturate after a sufficient number of steps; in the following experiments, the number of learning steps is therefore fixed based on this observation.
III-C Sparse recovery by integer programming
We introduce integer programming (IP)-based sparse signal recovery as a performance benchmark in the subsequent subsections because it provides the optimal recovery rate. The IP formulation shown here is based on the linear programming formulation in [4]. Although IP-based sparse signal recovery requires huge computational resources, it is applicable to moderate-size problems if we employ a recent advanced IP solver. We used IBM CPLEX Optimizer to solve the IP problem shown below. The problem to be solved is to find a feasible binary vector $x$ satisfying the following conditions:
\[
\sum_{j=1}^{n} A_{ij} x_j \ge 0 \quad \text{for all } i \text{ such that } u_i = +1,
\]
\[
\sum_{j=1}^{n} A_{ij} x_j < 0 \quad \text{for all } i \text{ such that } u_i = -1,
\]
\[
\sum_{j=1}^{n} x_j = k, \qquad x_j \in \{0, 1\} \quad (j = 1, \ldots, n),
\]
where $A_{ij}$ is the $(i, j)$ element of the sensing matrix $A$ and $u_i$ is the $i$-th element of the observation signal $u$. If a feasible solution satisfying all the above conditions exists, it becomes the estimate $\hat{x}$. It is clear that these conditions are consistent with our setting of binary compressed sensing.
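The feasibility conditions can be checked directly. The sketch below replaces the CPLEX IP solver with an exhaustive search over all $k$-sparse supports, which is tractable only for very small $n$ and is meant purely to illustrate the conditions:

```python
from itertools import combinations

def recover_by_search(A, u, k):
    # return a k-sparse binary x with sign(A x) = u, or None if none exists
    m, n = len(A), len(A[0])
    for support in combinations(range(n), k):
        x = [1 if j in support else 0 for j in range(n)]
        z = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        # feasibility: z_i >= 0 exactly where u_i = +1, z_i < 0 where u_i = -1
        if all((zi >= 0) == (ui == 1) for zi, ui in zip(z, u)):
            return x
    return None

# tiny consistency check: build an observation and recover from it
A = [[1.0, -2.0, 0.5], [0.3, 1.0, -1.0]]
x_true = [1, 0, 0]
u = [1 if sum(A[i][j] * x_true[j] for j in range(3)) >= 0 else -1 for i in range(2)]
x_hat = recover_by_search(A, u, k=1)
```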
III-D Experimental results
In Fig. 4, it can be seen that the recovery rate tends to increase as $m$ increases for all values of $k$. It can also be observed that the recovery rate strongly depends on the sparseness parameter $k$: denser signals (larger $k$) are harder to recover. The IP-based sparse recovery provides a considerably higher recovery rate than our neural network for the same $m$ and $k$. Although the computational costs of the neural network in the recovery phase are much smaller than those required for IP-based sparse recovery, the performance gap appears rather large, and the neural-based reconstruction should be further improved.
IV Majority Voting Neural Networks
In the previous section, we saw that neural-based sparse signal recovery is successful under some parameter settings, but there is still much room for improvement in recovery performance. In this section, we propose a promising variant of the feedforward neural networks, called the majority voting neural network. The majority voting neural network consists of several independently trained neural networks. The outputs of these networks are combined by soft majority voting nodes, and the final estimation vector is obtained by rounding the output of the soft majority voting nodes. Combining several neural networks to obtain improved performance is not a novel idea (see, e.g., [14]), but it will be shown that the idea is very effective for our purpose.
IV-A Network architecture
From the statistics of reconstruction errors observed in our computer experiments, we found that many reconstruction error events (i.e., $\hat{x} \neq x$) occur due to only one symbol mismatch. In addition, we found that independently trained neural networks tend to make symbol errors at distinct positions. These observations inspire us to use majority voting to combine the outputs of several independently trained neural networks.
Figure 5 presents the architecture of the majority voting neural network. In this case, the majority voting neural network consists of $T$ component feedforward neural networks defined by
\[
y^{(t)} = \sigma\!\left(W_2^{(t)} \, \sigma\!\left(W_1^{(t)} u + b_1^{(t)}\right) + b_2^{(t)}\right),
\]
where $t \in \{1, \ldots, T\}$. The outputs of the component neural networks are aggregated by the soft majority logic nodes, yielding the estimation vector
\[
\hat{x} = \mathrm{th}\!\left(\sum_{t=1}^{T} y^{(t)}\right),
\]
where $\mathrm{th}(\cdot)$ is the threshold function, applied componentwise, defined by
\[
\mathrm{th}(a) =
\begin{cases}
1, & a \ge T/2, \\
0, & a < T/2.
\end{cases}
\]
In the following experiments, each component network was trained independently; that is, the training sets were independently generated for each component network.
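The soft majority voting stage can be sketched as follows. Here `outputs` holds the $T$ component output vectors $y^{(t)}$, and the handling of the tie at exactly $T/2$ is our assumption:

```python
def majority_vote(outputs):
    # outputs: list of T component output vectors with entries in [0, 1];
    # sum the soft votes per position and threshold the sum at T/2
    T = len(outputs)
    sums = [sum(col) for col in zip(*outputs)]
    return [1 if s >= T / 2 else 0 for s in sums]

# three component nets disagreeing in a few positions
votes = majority_vote([[1, 0, 1, 0],
                       [1, 1, 0, 0],
                       [0, 0, 1, 0]])
```

A position is declared non-zero only when at least half of the component networks vote for it, which suppresses isolated single-symbol errors made by individual networks.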
IV-B Required computational resources
The simple architecture of the majority voting neural network is advantageous for both software and hardware implementation. In the case of a software implementation, the computation time required for matrix-vector products is dominant in a recovery process. For computing the output $y^{(t)}$ of one component network, we need approximately $2\alpha(m + n)$ basic arithmetic operations such as additions and multiplications. Since there are $T$ components, approximately $2T\alpha(m + n)$ basic arithmetic operations are required for computing the output. This number is competitive with known iterative methods [5, 6, 7] because, in most iterative algorithms, $O(mn)$ basic operations are required in each iteration to compute matrix-vector products involving $A$ and $A^T$. For a hardware implementation, the parallelism in the architecture of the majority voting neural network may enable us to create high-speed sparse recovery circuits on an FPGA or ASIC. Note that the implementation of neural networks on FPGAs has recently become a hot research topic [13].
IV-C Experimental results
Fig. 6 presents comparisons of the recovery rates of the majority voting neural networks with different numbers of component networks. The length $n$ of the sparse signal and the sparseness parameter $k$ are fixed in this experiment.
From Fig. 6, we can observe a significant improvement in recovery performance compared with that of the single neural network. The majority voting neural network with 3 component nets achieves a noticeably higher recovery rate than the single feedforward neural network discussed in the previous section, and the majority net with 5 components shows further improvement. This result implies that the soft majority voting process introduced in this section is effective in improving reconstruction performance. Another implication is that independently trained nets tend to have different estimation error patterns; this property explains the improvement in recovery rate observed in this experiment. We can still see a gap between the curves of the IP-based sparse recovery and the majority voting nets. This gap can be considered the price we pay for the reduction in required computing resources.
Fig. 7 also shows the recovery rates for another parameter setting. In Fig. 7, we can see the same tendency observed in the previous experimental result in Fig. 6: the performance of sparse recovery tends to improve as the number of component networks grows, and the recovery rate improves substantially from the single network to the majority voting network with several component networks.
Table I presents statistics on the computation time required for sparse recovery over a number of problem instances. It can be seen that the sparse recovery algorithms based on neural networks run several orders of magnitude faster than the IP-based sparse recovery method. Of course, the computation time depends on the computing environment and the implementation, but the result can be seen as evidence supporting our claim that the proposed network structure is advantageous for reducing the required computing resources.
The processor used was an Intel Core i7-3770K CPU (3.50 GHz, 8 cores) and the memory size was 7.5 GB.
V Concluding summary
In this paper, we proposed sparse signal recovery schemes based on neural networks for binary compressed sensing. Our empirical study shows that the choice of the loss function used for training the neural networks is of prime importance for achieving excellent reconstruction performance. We found a loss function suitable for this purpose, which includes a cross-entropy-like term and an $L_1$-regularization term. The majority voting neural network proposed in this paper is composed of several independently trained feedforward neural networks. From the experimental results, we observed that the majority voting neural network achieves excellent recovery performance, approaching the optimal IP-based performance as the number of component nets grows. The simple architecture of the majority voting neural network is beneficial for both software and hardware implementation. It can be expected that high-speed sparse signal recovery circuits based on neural networks will lead to novel applications in wireless communications, such as multiuser detection in multiple access channels.
The present study was supported by a Grant-in-Aid for Scientific Research (B) (grant number 16H02878) from JSPS. We used the optimization problem solver CPLEX Optimizer and the distributed numerical computation framework TensorFlow in this work. We gratefully acknowledge the IBM Academic Initiative and Google.
[1] A. K. Fletcher, S. Rangan, and V. K. Goyal, "On-off random access channels: a compressed sensing framework," arXiv:0903.1022, 2009.
[2] M. Kaneko, W. Hu, K. Hayashi, and H. Sakai, "Compressed sensing-based tag identification protocol for a passive RFID system," IEEE Commun. Lett., vol. 18, no. 11, pp. 2023–2026, 2014.
[3] P. Boufounos and R. Baraniuk, "1-bit compressive sensing," 42nd Annual Conference on Information Sciences and Systems (CISS), pp. 16–21, 2008.
[4] Y. Plan and R. Vershynin, "One-bit compressed sensing by linear programming," Communications on Pure and Applied Mathematics, vol. 66, no. 8, pp. 1275–1297, 2013.
[5] P. Boufounos, "Greedy sparse signal reconstruction from sign measurements," Asilomar Conf. on Signals, Systems and Computers, pp. 1305–1309, 2009.
[6] L. Jacques, J. Laska, P. Boufounos, and R. Baraniuk, "Robust 1-bit compressive sensing via binary stable embeddings of sparse vectors," IEEE Transactions on Information Theory, vol. 59, no. 4, pp. 2082–2102, 2013.
[7] J. Laska, Z. Wen, W. Yin, and R. Baraniuk, "Trust, but verify: fast and accurate signal recovery from 1-bit compressive measurements," IEEE Trans. Signal Process., vol. 59, no. 11, pp. 5289–5301, 2011.
[8] T. Blumensath and M. Davies, "Iterative hard thresholding for compressive sensing," Appl. Comput. Harmon. Anal., vol. 27, no. 3, pp. 265–274, 2009.
[9] L. Deng, G. Hinton, and B. Kingsbury, "New types of deep neural network learning for speech recognition and related applications: an overview," International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 8599–8603, 2013.
[10] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv:1409.1556, 2014.
[11] D. P. Kingma and J. L. Ba, "Adam: a method for stochastic optimization," arXiv:1412.6980, 2014.
[12] "TensorFlow: large-scale machine learning on heterogeneous systems," http://tensorflow.org/, 2015. Software available from tensorflow.org.
[13] G. Orchard, J. G. Vogelstein, and R. Etienne-Cummings, "Fast neuromimetic object recognition using FPGA outperforms GPU implementation," IEEE Trans. Neural Networks Learn. Syst., vol. 24, no. 8, pp. 1239–1252, 2013.
[14] S. B. Cho and J. H. Kim, "Combining multiple neural networks by fuzzy integral for robust classification," IEEE Trans. Systems, Man, and Cybernetics, vol. 25, no. 2, pp. 380–384, 1995.