Solving inverse problems is at the core of many data acquisition systems, such as magnetic resonance imaging (MRI) and optical coherence tomography (OCT). In many such systems, through proper quantization in time or space, the measurement process can be modeled as a system of linear equations, as follows. The unknown signal to be measured is a high-dimensional signal $\bm{x}$, which belongs to $\mathcal{Q}$, a compact subset of $\mathbb{R}^n$. The measured signal can be modeled as $\bm{y} = A\bm{x} + \bm{z}$. Here, $A \in \mathbb{R}^{m \times n}$, $\bm{y} \in \mathbb{R}^m$, and $\bm{z} \in \mathbb{R}^m$ denote the sensing matrix, the measurement vector, and the measurement noise, respectively. The goal is often to design an efficient algorithm that recovers $\bm{x}$ from the measurements $\bm{y}$, when $m < n$. In addition to computational complexity, the efficiency of an algorithm is measured in terms of its required number of measurements, its reconstruction quality, and its robustness to noise.
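As a concrete illustration of this measurement model, the following sketch generates an underdetermined noisy linear system $\bm{y} = A\bm{x} + \bm{z}$; the dimensions and noise level are illustrative choices, not values from the paper:

```python
import numpy as np

# Toy instance of the measurement model y = A x + z described above:
# x is the unknown n-dimensional signal, A the m x n sensing matrix (m < n),
# and z the additive measurement noise.
rng = np.random.default_rng(0)

n, m = 100, 30                                  # ambient dimension and number of measurements
x = rng.standard_normal(n)                      # unknown signal
A = rng.standard_normal((m, n)) / np.sqrt(m)    # random sensing matrix
z = 0.01 * rng.standard_normal(m)               # measurement noise
y = A @ x + z                                   # measurement vector

print(y.shape)  # (30,)
```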
Compressed sensing, i.e., solving the described ill-posed linear inverse problem, is possible only if the set $\mathcal{Q}$ is appropriately structured. Initially, the main focus in compressed sensing was on simple structures, such as sparsity [1, 2]. While many signals of interest are in fact sparse in some transform domain, most of them exhibit structure beyond sparsity as well. Enabling a recovery algorithm to take advantage of the full structure of the source potentially leads to higher recovery performance. This has motivated researchers in compressed sensing to explore algorithms that go beyond simple models such as sparsity.
Developing a compressed sensing recovery method involves two major steps: i) studying the class of signals of interest (e.g., natural images or MRI images) and discovering a common property among them, say property $\mathcal{P}$, and ii) devising an efficient algorithm that finds a signal that is consistent with the measurements and also satisfies property $\mathcal{P}$. For instance, the well-known iterative hard thresholding algorithm is developed for the case where property $\mathcal{P}$ is sparsity.
In order to move beyond simple structures such as sparsity, one approach is to design learning-based recovery methods that are able to learn complex signal models from training data. In this approach, the structure of the desired class of signals is not given a priori; rather, it is learned from an available training dataset. One of the most recent approaches in this direction is to use generative models that are based on neural networks.
Generative models are commonly used in machine learning to solve classic problems, such as classification. The role of a generative model is to learn the distribution of a class of signals, such that it is able to generate samples from that class. (Refer to Chapters 4 and 12 in  to learn more about using generative models in classification.) Modern generative methods typically achieve this goal by employing trained neural networks. Generative adversarial nets (GANs) and variational auto-encoders (VAEs) are examples of methods for designing complex generative models. The success of such approaches in solving machine learning problems relies heavily on their ability to learn distributions of various complex signals, such as image and audio files. This success has encouraged researchers from areas such as compression, denoising, and compressed sensing to also look into such methods as tools to capture the structure of signals of interest.
Given a class of signals $\mathcal{Q}$, consider a corresponding trained generative function $g: \mathbb{R}^k \to \mathbb{R}^n$. Assume that $g$ is trained with enough samples from $\mathcal{Q}$ that it is able to represent signals from $\mathcal{Q}$ appropriately. In this paper, we study the performance of an optimization-based compressed sensing recovery method that employs $g$ as a mechanism to capture the structure of signals in $\mathcal{Q}$. We derive sharp bounds connecting the properties of the function $g$ (its dimensions, its error in representing the signals in $\mathcal{Q}$, and its smoothness level) to the performance of the resulting recovery method.
The organization of the paper is as follows. Section II describes the problem of compressed sensing using generative models and states our main result in this domain. Section III describes an efficient algorithm based on projected gradient descent to approximate the solution of the exhaustive search optimization discussed in the prior sections. Section IV reviews some related work from the literature. Section V presents numerical results showing the performance of the proposed algorithm. Section VI presents the proofs of the main results and Section VII concludes the paper.
Vectors are denoted by bold letters, such as $\bm{x}$ and $\bm{y}$. Sets are denoted by calligraphic letters, such as $\mathcal{Q}$ and $\mathcal{C}$. For a set $\mathcal{A}$, $|\mathcal{A}|$ denotes its cardinality. For $x \in \mathbb{R}$ and $b \in \mathbb{N}^+$, $[x]_b$ denotes the $b$-bit quantized version of $x$. For a set $\mathcal{Q}$ and $b \in \mathbb{N}^+$, let $[\mathcal{Q}]_b$ denote the set in which every member of $\mathcal{Q}$ is quantized to $b$ bits, i.e.,
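The symbol-by-symbol quantization operator can be sketched as follows; since the display defining $[x]_b$ is missing from this copy, the sketch assumes the common convention of keeping $b$ fractional bits:

```python
import numpy as np

def quantize(x, b):
    """Entrywise b-bit quantization [x]_b, assuming the common convention of
    keeping b fractional bits: [x]_b = 2**-b * floor(2**b * x)."""
    return np.floor((2.0 ** b) * np.asarray(x, dtype=float)) / (2.0 ** b)

# Each entry is rounded down to the nearest multiple of 2**-b
# (here, multiples of 1/8: 0.25, -0.75, 1.0).
print(quantize([0.30, -0.72, 1.05], 3))
```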
II Recovery using generative models
Consider a class of signals defined by a compact set $\mathcal{Q} \subset \mathbb{R}^n$. (For example, $\mathcal{Q}$ can be the set of images of human faces, or the set of MRI images of human brains.) Let the function $g: \mathbb{R}^k \to \mathbb{R}^n$ denote a generative function trained to represent signals in the set $\mathcal{Q}$.
The function $g$ is said to represent $\mathcal{Q}$ with distortion $\delta$, if
$$\sup_{\bm{x} \in \mathcal{Q}} \min_{\bm{z} \in \mathbb{R}^k} \|g(\bm{z}) - \bm{x}\| \le \delta. \qquad (1)$$
Consider the standard problem of compressed sensing where, instead of explicitly knowing the structure of signals in $\mathcal{Q}$, we have access to a function $g$ that is known to represent signals in $\mathcal{Q}$ well. In this setup, a signal $\bm{x} \in \mathcal{Q}$ is measured as $\bm{y} = A\bm{x} + \bm{z}$, where $A \in \mathbb{R}^{m \times n}$, $\bm{y} \in \mathbb{R}^m$, and $\bm{z} \in \mathbb{R}^m$ denote the sensing matrix, the measurement vector, and the measurement noise, respectively. The goal is to recover $\bm{x}$ from $\bm{y}$, typically with $m \ll n$, using the function $g$ to define the structure of signals in $\mathcal{Q}$.
To solve this problem, ideally, we need to find a signal that is i) compatible with the measurements $\bm{y}$, and ii) representable by the function $g$. Hence, ignoring computational complexity issues, we would like to solve the following optimization problem:
$$\hat{\bm{z}} = \arg\min_{\bm{z} \in \mathbb{R}^k} \|\bm{y} - A g(\bm{z})\|. \qquad (2)$$
After finding $\hat{\bm{z}}$, the signal $\bm{x}$ can be estimated as
$$\hat{\bm{x}} = g(\hat{\bm{z}}). \qquad (3)$$
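As a toy illustration of this recovery rule, consider a linear stand-in for the generative function, $g(\bm{z}) = G\bm{z}$ (an assumption made purely for this sketch; the paper's $g$ is a trained neural network), in which case the inner minimization reduces to ordinary least squares:

```python
import numpy as np

# Toy illustration of the recovery rule: z_hat minimizes ||y - A g(z)|| and
# x_hat = g(z_hat). Here g(z) = G z is a linear stand-in, so the
# minimization is solvable in closed form via least squares.
rng = np.random.default_rng(1)
n, m, k = 60, 25, 5
G = rng.standard_normal((n, k))                 # stand-in "generative" map
x = G @ rng.standard_normal(k)                  # a signal representable by g
A = rng.standard_normal((m, n)) / np.sqrt(m)    # sensing matrix
y = A @ x                                       # noiseless measurements

z_hat, *_ = np.linalg.lstsq(A @ G, y, rcond=None)
x_hat = G @ z_hat                               # reconstructed signal

print(np.linalg.norm(x_hat - x) < 1e-6)  # True: exact recovery in this toy case
```

A neural generator admits no such closed form, which is what motivates the iterative method of Section III.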
The main goal of this paper is to theoretically study the performance of this optimization-based recovery method. We derive bounds that establish a connection between the ambient dimension $n$ of the signal, the parameters of the function $g$, and the number of measurements $m$.
To prove such theoretical results, we put some constraints on the function $g$. More precisely, consider $\bm{x} \in \mathcal{Q}$ and let $\bm{y} = A\bm{x} + \bm{z}$, where $A \in \mathbb{R}^{m \times n}$ and $\bm{z} \in \mathbb{R}^m$. Assume that
i) $g$ represents $\mathcal{Q}$ with distortion $\delta$, and
ii) the range of $g$ is a bounded subset of $\mathbb{R}^n$.
Define $\hat{\bm{z}}$ and $\hat{\bm{x}}$ as in (2) and (3), respectively. The following theorem characterizes the connection between the properties of the function $g$ (input dimension $k$ and Lipschitz constant $L$), the number of measurements ($m$), and the reconstruction distortion ($\|\hat{\bm{x}} - \bm{x}\|$).
The proof of Theorem 1 is presented in Section VI-C. Theorem 1 essentially states that, as long as the number of measurements is larger than the stated threshold, the optimization in (2)-(3) is able to recover the desired signal, with a distortion that goes to zero as the representation error $\delta$ converges to zero. The proof employs the following two lemmas, which assume that the optimization is performed over a quantized space.
Consider $\bm{x} \in \mathcal{Q}$ and let $\bm{y} = A\bm{x} + \bm{z}$, where $A \in \mathbb{R}^{m \times n}$ and $\bm{z} \in \mathbb{R}^m$. Assume that the entries of $A$ are i.i.d. $\mathcal{N}(0,1)$. Given the function $g$, define $\hat{\bm{z}}$ and $\hat{\bm{x}}$ as
with a probability larger than
The proof of Lemma 1 is presented in Section VI-A. Lemma 1 does not make any assumptions about the properties of the function $g$ or its ability to describe signals in our desired set $\mathcal{Q}$. The following lemma places some constraints on $g$ and also sets the free parameters and the quantization level $b$, to derive meaningful connections between the number of measurements and the reconstruction quality.
Comparing Lemma 2 with Theorem 1 reveals a factor-of-2 difference between their minimum required numbers of measurements. Currently, this factor of 2 shows up as the price of searching over the continuous space, instead of the discrete (quantized) space. However, it is not clear to us at this point whether this factor is an artifact of the proof, or a fundamental difference between the two results.
III Iterative projected gradient descent
The optimization described in (2) is a challenging non-convex problem. Given that for generative models based on neural networks the cost function is differentiable, prior work proposes applying standard gradient descent to solve this optimization. However, given the non-convexity of the cost function, there is no guarantee that this method succeeds. To address this issue, we propose an efficient algorithm based on projected gradient descent.
For , let
Consider $\bm{x} \in \mathcal{Q}$ and $\bm{y} = A\bm{x} + \bm{z}$, where $\mathcal{Q}$ denotes a compact subset of $\mathbb{R}^n$ and $A \in \mathbb{R}^{m \times n}$. Here, the entries of $A$ are i.i.d. $\mathcal{N}(0,1)$ and the entries of $\bm{z}$ are i.i.d. $\mathcal{N}(0,\sigma^2)$. Assume that the function $g$ is $L$-Lipschitz and satisfies (1), for some $\delta > 0$. Define $\hat{\bm{z}}$ and $\hat{\bm{x}}$ as in (4) and (5), respectively. Choose free parameters and define the corresponding constants as
respectively. Assume that
Let , and for , define as (8). Then, for every , if , then, either , or
with a probability larger than
The proof of Theorem 2 is presented in Section VI-D. Theorem 2 proves that while the original optimization is not convex, given enough measurements, the described projected gradient descent algorithm converges, even in the presence of additive white Gaussian noise.
In order to implement the proposed iterative method, described in (8), the step that might seem challenging is the projection step, i.e., projecting onto the range of $g$. However, one can build (train) a separate neural network that approximates this step. Concatenating this neural network with the neural network that implements $g$ yields an "auto-encoder" that maps a high-dimensional signal to a low-dimensional representation, and then back to its original dimension. From this perspective, the last two steps of the algorithm essentially pass the signal through an auto-encoder.
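A minimal sketch of such a projected gradient descent loop, again with a linear stand-in generator so that the projection onto the range of $g$ is exactly computable (for a neural generator this projection would be approximated by the auto-encoder pass just described; the step size and iteration count are illustrative choices):

```python
import numpy as np

# Sketch of projected gradient descent: a gradient step on ||y - A s||^2
# followed by projection onto the range of g. With the linear stand-in
# g(z) = G z the projection is the orthogonal projector G G^+.
rng = np.random.default_rng(2)
n, m, k = 60, 25, 5
G = rng.standard_normal((n, k))
P = G @ np.linalg.pinv(G)                       # projector onto range(g)
x = G @ rng.standard_normal(k)                  # true signal in range(g)
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x                                       # noiseless measurements

s = np.zeros(n)
eta = 0.5                                       # step size (assumed)
for _ in range(500):
    s = P @ (s + eta * A.T @ (y - A @ s))       # gradient step, then project

print(np.linalg.norm(s - x) / np.linalg.norm(x))  # small relative error
```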
In Section V, we report simulation results showing the performance of the proposed algorithm, where a trained neural network implements the generative function $g$.
IV Related work
As mentioned earlier, researchers in compressed sensing have recently started using neural networks to design novel compressed sensing recovery methods [6, 7]. Various networks have been developed for diverse applications of compressed sensing [8, 9, 10, 11, 12, 13, 14]. The hope is to design algorithms with superior performance compared to existing methods. Similar to this paper, the authors of  also study the application of generative models in CS. For the optimization-based method described in Eqs. (2)-(3),  proves that roughly  measurements are sufficient for recovery. To compare with our main result, note that our Theorem 1 states that the required number of measurements can get arbitrarily close to , regardless of the constant  and the other parameters of the problem.
Another related line of work is the application of compression codes in designing efficient compression-based recovery methods. The goal of such methods is to elevate the scope of structures used by CS algorithms to those used by compression codes. The resulting optimization is very similar to Eq. (3). However, the difference between the two approaches is that, while a lossy compression code can be represented by a discrete set of codewords, a generative function has a continuous input, and it is therefore more challenging to derive theoretical performance bounds.
V Simulation results
We trained a two-layer auto-encoder as our generative model $g$, depicted in Fig. 1, and we used the MNIST digits dataset for both training and testing (on disjoint sets). We used 35,000 digits to train our network, where the output of the first layer (the encoder) is a latent variable of dimension $k$, recalling that the input is of dimension $n$. For the second layer (the decoder), the input is of dimension $k$, and we vary $k$ from 100 to 1,500; this is the input dimension of our generative function in our theory. The $\ell_2$-loss between the input (of the encoder) and the output (of the decoder) is used, and we imposed sparsity on both the encoder and the decoder.
During testing, we compressively sample digits using various numbers of measurements and reconstruct them using the trained auto-encoder, with various values of $k$, by employing the iterative projected gradient descent algorithm described in Section III. The peak signal-to-noise ratio (PSNR) is employed as the metric to compare our proposed algorithm with the Lasso and total variation (TV) algorithms. Fig. 2 plots the average PSNR of the reconstructed digits against an increasing number of measurements for the different algorithms. It can be seen that our proposed algorithm outperforms both Lasso and TV, and that as the number of measurements ($m$) grows, the gap between our algorithm and Lasso shrinks. This may be due to the fact that our trained generative model does not learn a perfect structure; a deeper and more complicated model may perform better. Another interesting finding is that the performance of Lasso increases sharply once $m$ passes a certain threshold. This may be due to the sparsity of the digits. It also confirms that Lasso's performance is bounded by the sparsity of the signal, while the performance of our proposed method relies mostly on the dimension of the generative model, $k$, which can be much smaller than the sparsity of the signal.
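For reference, the PSNR metric used in this comparison can be computed as follows (a peak value of 1.0 is assumed here, for images scaled to $[0, 1]$):

```python
import numpy as np

# PSNR (peak signal-to-noise ratio): 10 * log10(peak^2 / MSE).
def psnr(ref, rec, peak=1.0):
    mse = np.mean((np.asarray(ref, dtype=float) - np.asarray(rec, dtype=float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

print(round(psnr([0.0, 0.5, 1.0], [0.1, 0.5, 0.9]), 2))  # 21.76
```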
In Fig. 3, we explore how the performance of our proposed algorithm changes with $k$. It can be seen that the results fluctuate within 0.5 dB as $k$ increases from 100 to 1,500. This suggests that the smallest of these values of $k$ might already be sufficient to capture the structure of these digits. From our Theorem 1,  measurements might be enough to recover the digits; this corresponds to , which is also a change point in our PSNR curve in Fig. 2. Therefore, even allowing for some approximation, we can claim that our simulations are consistent with our theoretical guarantee.
Fig. 4 shows some of the digits reconstructed by the three algorithms, which again confirms the strong performance of our proposed algorithm.
VI Proofs of the main results

The following lemma on the concentration of chi-squared random variables is used in the proofs.
Lemma 3 (Chi-squared concentration).
Assume that $X_1, \ldots, X_m$ are i.i.d. $\mathcal{N}(0,1)$. For any $\tau > 0$ we have
$$\Pr\Big(\sum_{i=1}^m X_i^2 \ge m(1+\tau)\Big) \le e^{-\frac{m}{2}(\tau - \ln(1+\tau))},$$
and for $\tau \in (0,1)$,
$$\Pr\Big(\sum_{i=1}^m X_i^2 \le m(1-\tau)\Big) \le e^{-\frac{m}{2}(-\tau - \ln(1-\tau))}.$$
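The concentration behavior described in Lemma 3 can also be checked numerically; the following Monte Carlo sketch (sample sizes are arbitrary choices) confirms that the normalized sum $\frac{1}{m}\sum_i X_i^2$ concentrates around 1:

```python
import numpy as np

# Monte Carlo sanity check of chi-squared concentration: the normalized sum
# (1/m) * sum_i X_i^2 of m i.i.d. squared standard normals concentrates
# around 1, so large deviations are rare.
rng = np.random.default_rng(3)
m, trials = 500, 2000
samples = rng.standard_normal((trials, m))
normalized = (samples ** 2).sum(axis=1) / m     # one chi-squared sum / m per trial

tau = 0.3
tail_freq = np.mean(normalized >= 1 + tau)      # empirical upper-tail probability
print(abs(normalized.mean() - 1.0) < 0.01, tail_freq < 0.01)  # True True
```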
Consider and such that . Let . Consider matrix with i.i.d. standard normal entries. Then, for any ,
where is a free parameter smaller than . Specifically, for ,
Consider and , where are i.i.d. . Then the distribution of is the same as the distribution of , where is independent of .
VI-A Proof of Lemma 1
Since $\hat{\bm{z}}$ is the minimizer of $\|\bm{y} - A g(\bm{z})\|$ over all candidate $\bm{z}$, we have
But, since ,
From the triangle inequality, and . It follows from (19) that
VI-B Proof of Lemma 2
with a probability larger than
On the other hand,
Since is a compact set, there exist integer numbers and , such that . Therefore, , where . Therefore, since, for , and by assumption, it follows that
where the last line follows because and hence . Therefore, from (28), inserting the value of , we have
Note that only depend on , and and .
VI-C Proof of Theorem 1
As in Lemma 2, define and . Also, given and , define events and as (21) and (22), respectively. Then, as proved in Lemma 2, for and , choosing , , for and , conditioned on , (27) holds. Also, we proved that
where and comes from and . While this result bounds , to prove our desired result, we need to bound by connecting it to . To derive this, note that
where the first step follows from the assumed Lipschitz property of the function $g$ and the definition of the symbol-by-symbol quantization operator, and the second holds since  and . To finish the proof, we need to bound the remaining term.
Since and , it follows that
Therefore, using (32) and applying the triangle inequality, it follows that
where the last step follows from the Lipschitz property of function and (32). Given , define event as
Recall that . Conditioned on ,
Define event as As shown in ,
where the last line follows because and . Combining Lemma 3 and the union bound, it follows that
Using the same argument as the one used to derive (28), since , and , choosing , we have
where . For , from (38), it follows that