Many applications, such as computational imaging and remote sensing, fall within the compressive sensing paradigm. Compressive sensing (CS) [Donoho2006, Candes et al.2006] refers to projecting a high-dimensional, sparse or sparsifiable signal $x \in \mathbb{R}^n$ onto a lower-dimensional measurement $y \in \mathbb{R}^m$, with $m \ll n$, using a small set of linear, non-adaptive frames. The noisy measurement model is:
$$y = Ax + v, \qquad (1)$$
where $v$ is the measurement noise and the measurement matrix $A \in \mathbb{R}^{m \times n}$ is often a random matrix. In this work, we are interested in the problem of recovering the unknown natural signal $x$ from the compressed measurement $y$, given the measurement matrix $A$. Instead of the sparse prior commonly adopted in the CS literature, we turn to a learned prior. Neural network-based inverse problem solvers were explored in [Gregor and LeCun2010, Venkatakrishnan et al.2013, Rick Chang et al.2017, Adler and Öktem2017, Fan et al.2017, Gupta et al.2018]. Recently, following the success of generative adversarial networks (GANs) [Goodfellow et al.2014, Creswell et al.2018, Zhu et al.2016] in modeling the distribution of data, the authors of [Anirudh et al.2018, Bora et al.2017, Shah and Hegde2018] used a GAN as the prior for natural images.
However, [Bora et al.2017] do not provide a guarantee on the convergence of their algorithm for solving the non-convex optimization problem, which requires several random initializations. Similarly, in [Shah and Hegde2018], the inner loop solves a non-convex optimization problem using a gradient descent algorithm with no guarantee of convergence to a global optimum. Furthermore, the conditions imposed in [Shah and Hegde2018] on the random Gaussian measurement matrix for convergence of their outer iterative loop are unnecessarily stringent and cannot be achieved with a moderate number of measurements. Meanwhile, both methods require expensive computation of the Jacobian of the differentiable generator $G$ with respect to the latent input $z$. Since computing the Jacobian involves back-propagation through $G$ at every iteration, these reconstruction algorithms are computationally expensive and are slow even when implemented on a GPU.
Our contributions: In this paper, we propose a GAN-based projection network to solve compressed sensing recovery problems using projected gradient descent (PGD). We are able to reconstruct the image even with compression ratio (i.e., with less than of a full measurement set) using a random Gaussian measurement matrix. The proposed approach provides superior recovery accuracy over existing methods, simultaneously with a speed-up, making the algorithm useful for practical applications. We provide theoretical results on the convergence of the cost function as well as the reconstruction error, given that the eigenvalues of the measurement matrix satisfy certain conditions when restricted to the range of the generator. We complement the theory by proposing a method to design a measurement matrix that satisfies these sufficient conditions for guaranteed convergence. We assess these sufficient conditions for both the random Gaussian measurement matrix and the designed matrix for a given image dataset. Both our analysis and reconstruction experiments show that with the designed matrix, fewer measurements suffice for robust recovery.
2 Problem Formulation
Let $x^* \in \mathbb{R}^n$ denote a ground truth image, $A \in \mathbb{R}^{m \times n}$ a fixed measurement matrix, and $y = Ax^* + v \in \mathbb{R}^m$ the noisy measurement, with Gaussian noise $v$. We assume that the ground truth images lie in a non-convex set $S = R(G)$, the range of the generator $G$. The maximum likelihood estimator (MLE) of $x^*$, $\hat{x}$, can be formulated as follows:
$$\hat{x} = \arg\min_{x \in R(G)} \|y - Ax\|_2^2. \qquad (2)$$
[Bora et al.2017] (whose algorithm we denote by CSGM) solve the following optimization problem
$$\hat{z} = \arg\min_{z} \|y - AG(z)\|_2^2 \qquad (3)$$
in the latent space ($z$), and set $\hat{x} = G(\hat{z})$. Their gradient descent algorithm often gets stuck at local optima. Since the problem is non-convex, the reconstruction is strongly dependent on the initialization of $z$ and requires several random initializations to converge to a good point. To resolve this problem, [Shah and Hegde2018] proposed a projected gradient descent (PGD)-based method (to which we refer as PGD-GAN), shown in fig.1(a), to find the minimizer in (2). They perform gradient descent in the ambient ($x$)-space and project the gradient update term onto $R(G)$. This projection involves solving another non-convex minimization problem (shown in the second box in fig.1(a)) using the Adam optimizer [Kingma and Ba2014] for 100 iterations from random initialization. No convergence result is given for this iterative algorithm that performs the non-linear projection, and the convergence analysis that Shah and Hegde provide for the PGD-GAN algorithm in the noiseless measurement case holds only if one assumes that the inner loop succeeds in finding the optimum projection.
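To make the cost of latent-space optimization concrete, the following is a minimal NumPy sketch of gradient descent on the latent variable with random restarts, in the spirit of CSGM. The toy generator `G(z) = tanh(Wz)`, the dimensions, the step-halving rule, and all function names are assumptions made for illustration and are not taken from [Bora et al.2017]. The point to notice is that every gradient evaluation needs the Jacobian of the generator.

```python
import numpy as np

def make_toy_generator(n, k, rng):
    """Hypothetical one-layer generator G(z) = tanh(W z) and its Jacobian."""
    W = rng.standard_normal((n, k)) / np.sqrt(k)
    G = lambda z: np.tanh(W @ z)
    # Jacobian dG/dz = diag(1 - tanh(W z)^2) W, shape (n, k)
    jac = lambda z: (1.0 - np.tanh(W @ z) ** 2)[:, None] * W
    return G, jac

def latent_descent(y, A, G, jac, z0, steps=200, step=0.1):
    """Minimize ||y - A G(z)||^2 over z; a step is accepted only if it does not increase the loss."""
    z = z0.copy()
    loss = np.sum((y - A @ G(z)) ** 2)
    for _ in range(steps):
        r = y - A @ G(z)
        grad = -2.0 * jac(z).T @ (A.T @ r)   # chain rule through the generator
        t = step
        while t > 1e-12:                     # halve the step until the loss does not increase
            z_new = z - t * grad
            new_loss = np.sum((y - A @ G(z_new)) ** 2)
            if new_loss <= loss:
                z, loss = z_new, new_loss
                break
            t *= 0.5
    return z, loss

def recover_with_restarts(y, A, G, jac, k, restarts, rng):
    """Run latent descent from several random initializations and keep the best."""
    best_z, best_loss = None, np.inf
    for _ in range(restarts):
        z, loss = latent_descent(y, A, G, jac, rng.standard_normal(k))
        if loss < best_loss:
            best_z, best_loss = z, loss
    return G(best_z), best_loss

rng = np.random.default_rng(0)
n, k, m = 40, 4, 20                          # illustrative sizes only
G, jac = make_toy_generator(n, k, rng)
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = G(rng.standard_normal(k))
y = A @ x_true                               # noiseless measurement for the toy example
x_hat, final_loss = recover_with_restarts(y, A, G, jac, k, restarts=3, rng=rng)
```

For a deep generator the Jacobian-vector product would be obtained by back-propagation rather than formed explicitly, which is the per-iteration cost discussed above.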
Our main idea in this paper is to replace this iterative scheme in the inner loop with a learning-based approach, as it often performs better and does not fall into local optima [Zhu et al.2016]. In addition, both earlier approaches require expensive computation of the Jacobian of $G$, which is eliminated in the proposed approach.
3 Proposed Method
In this section, we introduce our methodology and architecture to train a projector using a pre-trained generator and how we use this projector to obtain the optimizer in (2).
Figure 1: (a) PGD with inner-loop. (b) Network-based PGD (NPGD).
3.1 Inner-Loop-Free Scheme
We show that by carefully designing a network architecture with a suitable training strategy, we can train a projector onto $R(G)$, the range of the generator $G$, thereby removing the inner loop required in the earlier approach. The resulting iterative updates of our network-based PGD (NPGD) algorithm are shown in fig.1(b). This approach eliminates the need to solve the inner-loop non-convex optimization problem, which depends on initialization and requires several restarts. Furthermore, our method provides a significant speed-up by a factor of on the CelebA dataset for two major reasons: (i) since there is no inner loop, the total number of iterations required for convergence is significantly reduced, and (ii) it does not require computation of the Jacobian of the generator with respect to its input, $z$. That computation is very expensive, requiring back-propagation through the network for (for [Bora et al.2017]) and (for [Shah and Hegde2018]) where are number of restarts, outer and inner iterations respectively.
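The inner-loop-free update can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the learned projection network is replaced by an exact linear projector onto the range of a toy linear generator `G(z) = Wz`, and the step size `1 / ||A||_2^2`, the dimensions, and the name `npgd` are hypothetical choices rather than the settings used in the paper.

```python
import numpy as np

def npgd(y, A, project, x0, step, iters):
    """Projected gradient descent where the projection is a single function call."""
    x = x0
    for _ in range(iters):
        w = x + step * A.T @ (y - A @ x)   # gradient step on 0.5 * ||y - A x||^2
        x = project(w)                     # one forward pass, no inner optimization loop
    return x

rng = np.random.default_rng(0)
n, k, m = 50, 5, 30                        # illustrative sizes only

# Toy linear "generator" and an exact projector onto its range (stand-in for the network).
W = rng.standard_normal((n, k))
P = W @ np.linalg.pinv(W)
project = lambda w: P @ w

A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = W @ rng.standard_normal(k)
y = A @ x_true                             # noiseless measurement for the toy example

step = 1.0 / np.linalg.norm(A, 2) ** 2     # conservative step size (an assumption)
x_hat = npgd(y, A, project, np.zeros(n), step, iters=500)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
```

Each iteration costs one matrix-vector product with `A`, one with its transpose, and one call to `project`, so the per-iteration cost does not depend on any Jacobian of the generator.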
3.2 Generator-based Projector
A GAN consists of two networks, a generator and a discriminator, which follow an adversarial training strategy to learn the data distribution. A well-trained generator $G$ takes in a random latent variable $z$ and produces sharp-looking images from the training data distribution. The goal is to train a network that projects an image onto $R(G)$. The projector $P_S$ onto a set $S$ should satisfy two main properties: (i) idempotence: for any point $x$, $P_S(P_S(x)) = P_S(x)$; (ii) least distance: for a point $\tilde{x}$, $P_S(\tilde{x}) = \arg\min_{x \in S} \|x - \tilde{x}\|^2$. Figure 2 shows the network structure we used to train a projector using a GAN. We define the multi-task loss to be:
$$\mathcal{L}(\theta) = \mathbb{E}_{z,\nu}\left[\left\|G\left(G_\theta^\dagger(G(z)+\nu)\right) - G(z)\right\|^2\right] + \lambda\, \mathbb{E}_{z,\nu}\left[\left\|G_\theta^\dagger(G(z)+\nu) - z\right\|^2\right], \qquad (4)$$
where $G$ is a generator obtained from the GAN trained on a particular dataset. $G_\theta^\dagger$, parameterized by $\theta$, approximates a non-linear least squares pseudo-inverse of $G$, and $\nu$ indicates the noise added to the generator's output for different $z$, so that the projector network, denoted by $P_G = G G_\theta^\dagger$, is trained on points outside $R(G)$ and learns to project them onto $R(G)$. The objective function consists of two parts. The first is similar to the standard encoder-decoder framework; however, the loss is minimized over $\theta$, the parameters of $G_\theta^\dagger$, while keeping the parameters of $G$ (obtained by standard GAN training) fixed. This ensures that $R(G)$ does not change and that $P_G$ is a mapping onto $R(G)$. The second part is used to keep $G_\theta^\dagger(G(z)+\nu)$ close to the true $z$ used to generate the training image $G(z)$. It can be considered a regularizer for training the projector, with $\lambda$ being the regularization constant.
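As a rough illustration of the two-part objective described above, the sketch below evaluates a Monte Carlo estimate of a reconstruction term plus a latent regularizer. Everything concrete here is an assumption: the generator is a toy linear map, the pseudo-inverse network is replaced by a fixed matrix, squared Euclidean norms are used for both terms, and the values of `lam` and the noise scale are arbitrary placeholders.

```python
import numpy as np

def projector_loss(G, G_dagger, zs, nus, lam):
    """Monte Carlo estimate of a two-term projector loss.

    zs:  (batch, k) latent samples
    nus: (batch, n) perturbations added to the generator output
    """
    total = 0.0
    for z, nu in zip(zs, nus):
        target = G(z)                            # clean point in the range of G
        z_hat = G_dagger(target + nu)            # encode the perturbed point
        recon = np.sum((G(z_hat) - target) ** 2) # first part: map back onto the clean image
        latent = np.sum((z_hat - z) ** 2)        # second part: stay close to the true latent
        total += recon + lam * latent
    return total / len(zs)

rng = np.random.default_rng(0)
n, k, batch = 20, 3, 16                          # illustrative sizes only
W = rng.standard_normal((n, k))
B = np.linalg.pinv(W)                            # stand-in for a trained pseudo-inverse network

G = lambda z: W @ z                              # toy generator, kept fixed
G_dagger = lambda x: B @ x                       # toy "network" with parameters B

zs = rng.standard_normal((batch, k))
loss_clean = projector_loss(G, G_dagger, zs, np.zeros((batch, n)), lam=0.1)
loss_noisy = projector_loss(G, G_dagger, zs, 0.5 * rng.standard_normal((batch, n)), lam=0.1)
```

In an actual training loop only the parameters of the pseudo-inverse network would receive gradient updates, with the generator weights held fixed.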
4 Theoretical Study
4.1 Convergence Analysis
Definition 1 (Restricted Eigenvalue Constraint (REC)).
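Let $S \subseteq \mathbb{R}^n$. For some parameters $0 < \alpha < \beta$, matrix $A \in \mathbb{R}^{m \times n}$ is said to satisfy the $\mathrm{REC}(S, \alpha, \beta)$ if the following holds for all $x_1, x_2 \in S$:
$$\alpha \|x_1 - x_2\|^2 \le \|A(x_1 - x_2)\|^2 \le \beta \|x_1 - x_2\|^2.$$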
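A restricted eigenvalue condition of this kind can be probed empirically by sampling pairs of points and looking at how much the measurement matrix stretches or shrinks their differences. The sketch below is a hedged illustration: the data are random points in a low-dimensional subspace, the function name `secant_ratios` is hypothetical, and the sampled minimum and maximum only bracket the true constants from the inside (the true lower constant can be smaller and the true upper constant larger).

```python
import numpy as np

def secant_ratios(A, X, num_pairs, rng):
    """Return ||A(x1 - x2)||^2 / ||x1 - x2||^2 for randomly sampled pairs of rows of X."""
    N = X.shape[0]
    i = rng.integers(0, N, size=num_pairs)
    j = rng.integers(0, N, size=num_pairs)
    keep = i != j                          # drop degenerate pairs
    d = X[i[keep]] - X[j[keep]]            # differences, one per row
    num = np.sum((d @ A.T) ** 2, axis=1)
    den = np.sum(d ** 2, axis=1)
    return num / den

rng = np.random.default_rng(0)
n, m, N = 30, 10, 200                      # illustrative sizes only

# Toy data set: points in a 3-dimensional subspace of R^n.
X = rng.standard_normal((N, 3)) @ rng.standard_normal((3, n))

# Random measurement matrix with orthonormal rows (via reduced QR).
Q, _ = np.linalg.qr(rng.standard_normal((n, m)))
A = Q.T

ratios = secant_ratios(A, X, num_pairs=2000, rng=rng)
alpha_hat, beta_hat = ratios.min(), ratios.max()
ratio_hat = beta_hat / alpha_hat           # empirical spread of the restricted eigenvalues
```

A histogram of `ratios` gives the same kind of picture as the REC histograms discussed later for the random and designed matrices.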
Definition 2 (Approximate Projection using GAN).
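A concatenated network $G(G^\dagger(\cdot))$ is a $\delta$-approximate projector if the following holds for all $x \in \mathbb{R}^n$:
$$\left\|x - G\left(G^\dagger(x)\right)\right\|^2 \le \min_{z} \|x - G(z)\|^2 + \delta.$$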
The following theorem (proved in the Appendix) provides upper bounds on the cost function and reconstruction error after iterations.
Let matrix satisfy the with , and let the concatenated network be a -approximate projector. Then for every and measurement , executing algorithm 1 with step size , will yield . Furthermore, the algorithm achieves after steps. When ,
By theorem 1, the convergence is linear ("geometric") with rate determined largely by the ratio , and the reconstruction error at convergence is similarly controlled by . We would like ratio as close to 1 as possible and must have for convergence. It has been shown in [Baraniuk and Wakin2009] that a random matrix with orthonormal rows will satisfy this condition with high probability for roughly linear in dimension with log factors dependent on the properties of the manifold, in this case, . However, as we demonstrate later (see figure 3), a random matrix often will not satisfy the desired condition for small or moderate . We propose a fast heuristic method to find a relatively good measurement matrix for an image set, given a fixed .
4.2 Generator-based Measurement Matrix Design
There have been a few attempts to optimize the measurement matrix based on the specific data distribution. [Hegde et al.2015] tried to find a deterministic measurement matrix that satisfies for a finite set , but the time complexity is , with being the size of set . Because the secant set would be of cardinality for a training set of size , with , the time complexity would be infeasible even for fairly small images. Furthermore, the final number of required measurements , which is determined by the algorithm, depends on the isometry constant , and cannot be specified in advance. [Kvinge et al.2018] introduced a heuristic iterative algorithm to find a measurement matrix with orthonormal rows that satisfies the REC with small ratio, but the time complexity is and the space complexity is , which is infeasible for a high-dimensional image dataset. Our method, based on sampling from the secant set, has time complexity , and space complexity , where is a tiny fraction of .
Definition 3 (Secant Set).
The normalized secant set of is defined as follows:
and the associated distribution is denoted as , where
Given , the optimization over A is as follows:
The first inequality is due to an additional constraint on . This results in the largest singular value of being 1 and hence the numerator term, , is at most 1. Finally, we replace the relaxed objective on the RHS of (9) by its empirical estimate obtained by sampling secants according to :
For and large enough, this designed measurement matrix would satisfy the condition for most of the secants in . Constructing an matrix , the previous optimization problems can be rewritten as:
The optimal from these samples would satisfy , where is the eigenvalue decomposition (EVD) of and is the sub-matrix consisting of first columns in . We compute and its EVD at time complexity and space complexity .
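The sampling-based design can be sketched as follows: draw normalized secants, stack them as columns of a matrix, and take leading eigenvectors of the resulting Gram-type matrix as the rows of the measurement matrix. The code is an illustration under assumptions: the data set is synthetic, the function names are hypothetical, and "first columns" is read as the eigenvectors with the largest eigenvalues, which maximizes the average captured secant energy among matrices with orthonormal rows.

```python
import numpy as np

def sample_normalized_secants(X, M, rng):
    """Sample M unit-norm differences between distinct rows of X; returns shape (M, n)."""
    N = X.shape[0]
    i = rng.integers(0, N, size=M)
    j = (i + rng.integers(1, N, size=M)) % N       # offset in [1, N-1] guarantees j != i
    d = X[i] - X[j]
    return d / np.linalg.norm(d, axis=1, keepdims=True)

def design_measurement_matrix(secants, m):
    """Rows of A are the m leading eigenvectors of D D^T, where D stacks secants as columns."""
    D = secants.T                                  # n x M
    C = D @ D.T                                    # n x n, symmetric positive semi-definite
    eigvals, U = np.linalg.eigh(C)                 # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]              # reorder to descending
    return U[:, order[:m]].T                       # m x n with orthonormal rows

rng = np.random.default_rng(0)
n, m, N, M = 30, 5, 200, 1000                      # illustrative sizes only

# Toy data set: points near a 3-dimensional subspace of R^n.
X = rng.standard_normal((N, 3)) @ rng.standard_normal((3, n))
X += 0.01 * rng.standard_normal((N, n))

S = sample_normalized_secants(X, M, rng)
A_designed = design_measurement_matrix(S, m)

# Baseline: random matrix with orthonormal rows.
Q, _ = np.linalg.qr(rng.standard_normal((n, m)))
A_random = Q.T

energy_designed = np.mean(np.sum((S @ A_designed.T) ** 2, axis=1))
energy_random = np.mean(np.sum((S @ A_random.T) ** 2, axis=1))
```

Only the `n x n` matrix and its eigendecomposition are needed, so the cost is governed by the number of sampled secants rather than by the full secant set of the training data.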
Our approach to the design of is related to one of the steps described by [Kvinge et al.2018], however by using the sampling-based estimates per (8) and (10) rather than the secant set for the entire training set, we reduce the computational cost by orders of magnitude to a modest level.
4.2.1 REC Histogram for A
In figure 3, we show the efficacy of this designed matrix by the histogram of values for the designed and random Gaussian and from the secant set of the MNIST dataset. For random , the support of is clearly wider for few measurements , resulting in . For the designed matrix, the support is more concentrated, hence satisfying the sufficient condition for convergence of the PGD algorithm even with as few as measurements (for MNIST), thus ensuring stable recovery.
Network Architecture: We follow the architecture of deep convolutional generative adversarial networks (DCGAN) [Radford et al.2015]. DCGAN builds on multiple convolution, deconvolution and ReLU layers, and uses batch normalization and dropout for better generalization. For two datasets, namely MNIST and CelebA, we designed two projection networks of similar structure but different numbers of layers. The architecture of the model is similar to that of the discriminator in the GAN and only varies in the final layer, where we add a fully-connected layer with output size equal to the latent variable dimension . For training , we used the architecture shown in fig. 2 and objective defined in (4), keeping the pre-trained fixed. We found that using , in (4), gave the best performance. The noise used for perturbing the training images follows . We observed that training with low results in a projector similar to an identity operator and hence only projecting close-by points onto , and high makes it violate idempotence. We empirically set . We then obtain a projection network which approximately projects the image lying outside onto the space spanned by .
Figure 4 shows the relative projection error () of images from the MNIST dataset for different latent variable dimension . The projection error slowly decreases with increasing and saturates around , therefore, we pick for the experiments.
We compare the performance of our algorithm with GAN-based priors ([Bora et al.2017, Shah and Hegde2018]) and a sparsity-based method, specifically Lasso with discrete cosine transform (DCT) basis [Tibshirani1996]. Also, we extensively evaluate the reconstruction performance for the random Gaussian and designed measurement matrices.
The MNIST dataset [LeCun et al.1998] consists of greyscale images of digits with training and test samples. We pre-train the GAN consisting of transposed convolution layers for and convolution layers in the discriminator using rescaled images lying between . We use as the ’s input. The GAN is trained using the Adam optimizer with learning rate , mini-batch size of for epochs. For training the pseudo-inverse of i.e. , we minimize the objective (4), using samples from , and with the same hyper-parameters used for the GAN.
Recovery with random Gaussian matrix: In this set-up, we used the same measurement matrix as ([Bora et al.2017, Shah and Hegde2018]) i.e. . Figure 5 shows the recovery results for selected MNIST images from the test set. It can be seen that our algorithm performs better than the others and avoids local optima. Figure 7(a) provides a quantitative comparison for different .
The CelebA dataset [Liu et al.2015] consists of more than celebrity images. We use the aligned and cropped version, in which each image is preprocessed to a size of and scaled between . We randomly pick images for training the GAN. Images from the held-out set are used for evaluation. The GAN consists of transposed convolution layers in the and convolution layers in . Pre-training of the GAN is done for epochs using the Adam optimizer with learning rate and mini-batch size . is trained in the same way as for the MNIST dataset.
Recovery with Random Gaussian Matrix: is the measurement matrix with . Figure 6 shows the reconstruction of eight test images. We observe that our algorithm outperforms the other three methods visually as it is able to preserve detailed facial features such as sunglasses, hair and better color tones. Figure 7(c) provides a quantitative comparison for different .
5.3 Comparison: Gaussian and designed matrix
We observe that recovery with the designed is possible for fewer measurements . This corroborates our conjecture based on figure 3 that the designed matrix satisfies the desired REC condition with high probability for most of the secants, for smaller . Figure 7 (a)(c) shows that our algorithm consistently outperforms other approaches in terms of reconstruction error and structural similarity index (SSIM) for a random . Furthermore, with the designed , we are able to get performance on-par with the random matrix using smaller . Figure 7 (b)(d) show the recovered images with the designed and random using our algorithm for different . It demonstrates that recovery with the random requires much bigger than the designed one to achieve similar performance.
5.4 Comparison of Run-time for Recovery
Table 1 compares the execution times of our network-based algorithm NPGD and other recovery algorithms. We record the average execution time to process a single image from its measurements over 10 different images. All three algorithms were run on the same workstation with i7-4770K CPU, 32GB RAM and GeForce Titan X GPU.
In this work, we propose a GAN-based projection network for faster recovery in compressed sensing. We show that the method demonstrates superior performance and also provides a speed-up of over existing GAN-based methods, eliminating the expensive computation of the Jacobian matrix at every iteration. We provide a theoretical bound on the reconstruction error for a moderately-conditioned measurement matrix. To help design such a matrix, we propose a method which enables recovery using fewer measurements than with a random Gaussian matrix.
7 Appendix: Proof of Theorem 1
By the assumption of -approximate projection,
where from the gradient update step, we have
Substituting into (12) yields
Rearranging the terms we have
where the last two inequalities follow from . Now the LHS can be rewritten as:
and since ,
For simplicity, we substitute in the following:
Defining , it is easy to see that . For convergence, we require . When reaches , we have
Finally, when , we have
- [Adler and Öktem2017] Jonas Adler and Ozan Öktem. Solving ill-posed inverse problems using iterative deep neural networks. Inverse Problems, 33(12):124007, 2017.
- [Anirudh et al.2018] Rushil Anirudh, Jayaraman J Thiagarajan, Bhavya Kailkhura, and Timo Bremer. An unsupervised approach to solving inverse problems using generative adversarial networks. arXiv preprint arXiv:1805.07281, 2018.
- [Baraniuk and Wakin2009] Richard G Baraniuk and Michael B Wakin. Random projections of smooth manifolds. Foundations of computational mathematics, 9(1):51–77, 2009.
- [Bora et al.2017] Ashish Bora, Ajil Jalal, Eric Price, and Alexandros G Dimakis. Compressed sensing using generative models. arXiv preprint arXiv:1703.03208, 2017.
- [Candes et al.2006] Emmanuel J Candes, Justin K Romberg, and Terence Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences, 59(8):1207–1223, 2006.
- [Creswell et al.2018] Antonia Creswell, Tom White, Vincent Dumoulin, Kai Arulkumaran, Biswa Sengupta, and Anil A Bharath. Generative adversarial networks: An overview. IEEE Signal Processing Magazine, 35(1):53–65, 2018.
- [Donoho2006] David L Donoho. Compressed sensing. IEEE Transactions on information theory, 52(4):1289–1306, 2006.
- [Fan et al.2017] Kai Fan, Qi Wei, Lawrence Carin, and Katherine A Heller. An inner-loop free solution to inverse problems using deep neural networks. In Advances in Neural Information Processing Systems, pages 2370–2380, 2017.
- [Goodfellow et al.2014] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
- [Gregor and LeCun2010] Karol Gregor and Yann LeCun. Learning fast approximations of sparse coding. In Proceedings of the 27th International Conference on International Conference on Machine Learning, pages 399–406. Omnipress, 2010.
- [Gupta et al.2018] Harshit Gupta, Kyong Hwan Jin, Ha Q Nguyen, Michael T McCann, and Michael Unser. Cnn-based projected gradient descent for consistent ct image reconstruction. IEEE transactions on medical imaging, 37(6):1440–1453, 2018.
- [Hegde et al.2015] C. Hegde, A. C. Sankaranarayanan, W. Yin, and R. G. Baraniuk. Numax: A convex approach for learning near-isometric linear embeddings. IEEE Transactions on Signal Processing, 63(22):6109–6121, Nov 2015.
- [Kingma and Ba2014] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- [Kvinge et al.2018] Henry Kvinge, Elin Farnell, Michael Kirby, and Chris Peterson. A gpu-oriented algorithm design for secant-based dimensionality reduction. In 2018 17th International Symposium on Parallel and Distributed Computing (ISPDC), pages 69–76. IEEE, 2018.
- [LeCun et al.1998] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
- [Liu et al.2015] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), December 2015.
- [Radford et al.2015] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
- [Rick Chang et al.2017] JH Rick Chang, Chun-Liang Li, Barnabas Poczos, BVK Vijaya Kumar, and Aswin C. One network to solve them all–solving linear inverse problems using deep projection models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5888–5897, 2017.
- [Shah and Hegde2018] Viraj Shah and Chinmay Hegde. Solving linear inverse problems using gan priors: An algorithm with provable guarantees. arXiv preprint arXiv:1802.08406, 2018.
- [Tibshirani1996] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, 1996.
- [Venkatakrishnan et al.2013] Singanallur V Venkatakrishnan, Charles A Bouman, and Brendt Wohlberg. Plug-and-play priors for model based reconstruction. In Global Conference on Signal and Information Processing (GlobalSIP), 2013 IEEE, pages 945–948. IEEE, 2013.
- [Zhu et al.2016] Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, and Alexei A Efros. Generative visual manipulation on the natural image manifold. In European Conference on Computer Vision, pages 597–613. Springer, 2016.