GAN-based Projector for Faster Recovery in Compressed Sensing with Convergence Guarantees

02/26/2019 · by Ankit Raj et al., University of Illinois at Urbana-Champaign

A Generative Adversarial Network (GAN) with generator G trained to model the prior of images has been shown to perform better than sparsity-based regularizers in ill-posed inverse problems. In this work, we propose a new method of deploying a GAN-based prior to solve linear inverse problems using projected gradient descent (PGD). Our method learns a network-based projector for use in the PGD algorithm, eliminating the need for expensive computation of the Jacobian of G. Experiments show that our approach provides a speed-up of 30-40× over earlier GAN-based recovery methods for similar accuracy in compressed sensing. Our main theoretical result is that if the measurement matrix is moderately conditioned for range(G) and the projector is δ-approximate, then the algorithm is guaranteed to reach O(δ) reconstruction error in O(log(1/δ)) steps in the low noise regime. Additionally, we propose a fast method to design such measurement matrices for a given G. Extensive experiments demonstrate the efficacy of this method by requiring 5-10× fewer measurements than random Gaussian measurement matrices for comparable recovery performance.

1 Introduction

Many applications, such as computational imaging and remote sensing, fall within the compressive sensing paradigm. Compressive sensing (CS) [Donoho2006, Candes et al.2006] refers to projecting a high-dimensional, sparse or sparsifiable signal x ∈ R^n to a lower-dimensional measurement y ∈ R^m (m ≪ n) using a small set of linear, non-adaptive frames. The noisy measurement model is:

y = Ax + η,   (1)

where the measurement matrix A ∈ R^{m×n} is often a random matrix and η ∈ R^m denotes noise. In this work, we are interested in the problem of recovering the unknown natural signal x ∈ R^n from the compressed measurement y, given the measurement matrix A. Instead of the sparse prior commonly adopted in the CS literature, we turn to a learned prior. In [Gregor and LeCun2010, Venkatakrishnan et al.2013, Rick Chang et al.2017, Adler and Öktem2017, Fan et al.2017, Gupta et al.2018], the authors explored neural network-based inverse problem solvers. Recently, with the success of generative adversarial networks (GANs) [Goodfellow et al.2014, Creswell et al.2018, Zhu et al.2016] in modeling the distribution of data, the authors of [Anirudh et al.2018, Bora et al.2017, Shah and Hegde2018] used a GAN as the prior for natural images.
However, [Bora et al.2017] provide no guarantee on the convergence of their algorithm for solving the non-convex optimization problem, which requires several random initializations. Similarly, in [Shah and Hegde2018], the inner loop solves a non-convex optimization problem using a gradient descent algorithm with no guarantee of convergence to a global optimum. Furthermore, the conditions imposed in [Shah and Hegde2018] on the random Gaussian measurement matrix for convergence of their outer iterative loop are unnecessarily stringent and cannot be achieved with a moderate number of measurements. Meanwhile, both of these methods require expensive computation of the Jacobian of the differentiable generator G with respect to the latent input z. Since computing the Jacobian involves back-propagation through G at every iteration, these reconstruction algorithms are computationally expensive and slow even when implemented on a GPU.
Our contributions: In this paper, we propose a GAN-based projection network to solve compressed sensing recovery problems using projected gradient descent (PGD). We are able to reconstruct the image even at a low compression ratio (i.e., with only a small fraction of a full measurement set) using a random Gaussian measurement matrix. The proposed approach provides superior recovery accuracy over existing methods, simultaneously with a 30-40× speed-up, making the algorithm useful for practical applications. We provide theoretical results on the convergence of the cost function as well as of the reconstruction error, given that the eigenvalues of the measurement matrix A satisfy certain conditions when restricted to the range of the generator. We complement the theory by proposing a method to design a measurement matrix that satisfies these sufficient conditions for guaranteed convergence. We assess these sufficient conditions for both a random Gaussian measurement matrix and the designed matrix for a given image dataset. Both our analysis and reconstruction experiments show that with the designed matrix, fewer measurements suffice for robust recovery.

2 Problem Formulation

Let x* ∈ R^n denote a ground-truth image, A ∈ R^{m×n} a fixed measurement matrix, and y = Ax* + η the noisy measurement, with noise η. We assume that the ground-truth images lie in a non-convex set S = range(G), the range of the generator G: R^k → R^n. The maximum likelihood estimator (MLE) of x*, denoted x̂, can be formulated as follows:

x̂ = argmin_{x ∈ S} ||y − Ax||²   (2)

[Bora et al.2017] (whose algorithm we denote by CSGM) solve the following optimization problem

ẑ = argmin_{z ∈ R^k} ||y − A G(z)||²   (3)

in the latent space (z-space), and set x̂ = G(ẑ). Their gradient descent algorithm often gets stuck at local optima. Since the problem is non-convex, the reconstruction depends strongly on the initialization of z and requires several random initializations to converge to a good point. To resolve this problem, [Shah and Hegde2018] proposed a projected gradient descent (PGD)-based method (which we refer to as PGD-GAN), shown in fig. 1(a), to find the minimizer in (2). They perform gradient descent in the ambient (x)-space and project the gradient update onto S. This projection involves solving another non-convex minimization problem (shown in the second box in fig. 1(a)) using the Adam optimizer [Kingma and Ba2014] for 100 iterations from a random initialization. No convergence result is given for this iterative algorithm performing the non-linear projection, and the convergence analysis that Shah and Hegde provide for the PGD-GAN algorithm in the noiseless measurement case only holds if one assumes that the inner loop succeeds in finding the optimal projection.

Our main idea in this paper is to replace this iterative scheme in the inner loop with a learning-based approach, as the latter often performs better and does not fall into local optima [Zhu et al.2016]. Besides, both of the earlier approaches require expensive computation of the Jacobian of G, which is eliminated in the proposed approach.
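To make the cost of these baselines concrete, the following minimal sketch of a CSGM-style latent-space descent is included for illustration (our own PyTorch-style code, not the authors' implementation; the hyper-parameters and tensor shapes are assumptions):

import torch

def csgm_recover(G, A, y, latent_dim, num_restarts=10, iters=1000, lr=0.1):
    # Sketch of the CSGM baseline [Bora et al.2017]: gradient descent on
    # ||y - A G(z)||^2 in z-space (eq. (3)). Every backward() call
    # back-propagates through G, i.e. implicitly uses the Jacobian of G
    # with respect to z -- the per-iteration cost that NPGD avoids.
    best_x, best_loss = None, float("inf")
    for _ in range(num_restarts):                      # several random initializations
        z = torch.randn(1, latent_dim, requires_grad=True)
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(iters):
            loss = ((y - A @ G(z).view(-1)) ** 2).sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
        if loss.item() < best_loss:
            best_loss, best_x = loss.item(), G(z).detach()
    return best_x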

3 Proposed Method

In this section, we introduce our methodology and architecture for training a projector using a pre-trained generator, and describe how we use this projector to obtain the minimizer in (2).

(a) PGD with inner loop
(b) Network-based PGD (NPGD)
Figure 1: (a) Block diagram of PGD using an inner loop [Shah and Hegde2018]; x_t denotes the outer-loop iterate and ẑ is the minimizer of the inner-loop problem, obtained with the Adam optimizer. (b) Block diagram of our network-based PGD (NPGD), with G(D(·)) as a network-based projector onto S. f is the cost function defined in (2).

3.1 Inner-Loop-Free Scheme

We show that by carefully designing a network architecture with a suitable training strategy, we can train a projector onto S, the range of the generator G, thereby removing the inner loop required in the earlier approach. The resulting iterative updates of our network-based PGD (NPGD) algorithm are shown in fig. 1(b). This approach eliminates the need to solve the inner-loop non-convex optimization problem, which depends on the initialization and requires several restarts. Furthermore, our method provides a significant speed-up, by a factor of 30-40× on the CelebA dataset, for two major reasons: (i) since there is no inner loop, the total number of iterations required for convergence is significantly reduced; (ii) NPGD does not require computation of the Jacobian of the generator G with respect to its input z. This computation is very expensive, requiring back-propagation through the network a number of times equal to the number of restarts times the number of iterations (for [Bora et al.2017]), or the number of outer iterations times the number of inner iterations (for [Shah and Hegde2018]).

3.2 Generator-based Projector

Figure 2: Architecture for training a projector onto range(G)

A GAN consists of two networks, a generator and a discriminator, which follow an adversarial training strategy to learn the data distribution. A well-trained generator G takes in a random latent variable z and produces sharp-looking images from the training data distribution in R^n. Our goal is to train a network that projects an image onto S = range(G). A projector P onto a set S should satisfy two main properties: idempotence, i.e., P(x) = x for any point x ∈ S; and least distance, i.e., for a point x ∉ S, P(x) is a closest point to x in S. Figure 2 shows the network structure we use to train a projector using a GAN. We define the multi-task loss to be:

min_θ  E_{z, e} [ ||G(D_θ(G(z) + e)) − G(z)||² + λ ||D_θ(G(z) + e) − z||² ]   (4)

where G is the generator obtained from a GAN trained on the particular dataset, D, parameterized by θ, approximates a non-linear least-squares pseudo-inverse of G, and e denotes the noise added to the generator's output for different z, so that the projector network, denoted P = G∘D, is trained on points outside range(G) and learns to project them onto S. The objective function consists of two parts. The first is similar to a standard encoder-decoder framework; however, the loss is minimized over θ, the parameters of D, while keeping the parameters of G (obtained by standard GAN training) fixed. This ensures that range(G) does not change and that P = G∘D is a mapping onto S. The second part is used to keep D(G(z) + e) close to the true z used to generate the training image G(z), and can be considered a regularizer for training the projector, with λ being the regularization constant.
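A minimal training loop consistent with (4) might look as follows (a sketch under our own assumptions; G, D, the latent prior, the noise level sigma, and the weight lam are placeholders rather than the authors' released settings):

import torch

def train_projector(G, D, latent_dim, steps=10000, lam=0.1, sigma=0.1, lr=1e-4):
    # Train D so that P = G(D(.)) approximately projects onto range(G), per eq. (4).
    for p in G.parameters():
        p.requires_grad_(False)                      # G stays fixed; only D is updated
    opt = torch.optim.Adam(D.parameters(), lr=lr)
    for _ in range(steps):
        z = torch.randn(64, latent_dim)              # latent samples from the GAN prior
        x = G(z)                                     # clean points on range(G)
        x_noisy = x + sigma * torch.randn_like(x)    # perturb off the range
        z_hat = D(x_noisy)
        loss = ((G(z_hat) - x) ** 2).mean() + lam * ((z_hat - z) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return D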

4 Theoretical Study

4.1 Convergence Analysis

Let f(x) = ||y − Ax||² denote the loss function of projected gradient descent. Algorithm 1 describes the proposed network-based projected gradient descent (NPGD) used to solve (2).

Input: measurement y, matrix A, loss function f(x) = ||y − Ax||², projector G∘D
Parameter: step size μ, number of iterations T
Output: an estimate x̂

1:  Let x_0 = 0, t = 0.
2:  while t < T do
3:     w_t ← x_t + μ A^T (y − A x_t)
4:     x_{t+1} ← G(D(w_t)); t ← t + 1
5:  end while
6:  return x̂ = x_T
Algorithm 1 Network-based Projected Gradient Descent
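For concreteness, a NumPy sketch of Algorithm 1 is given below (assuming a trained projector proj implementing x ↦ G(D(x)) on flattened images; the step size and iteration count are placeholders):

import numpy as np

def npgd_recover(y, A, proj, step_size, num_iters=50):
    # Algorithm 1 (NPGD): gradient step on f(x) = ||y - Ax||^2,
    # followed by a network-based projection onto range(G).
    m, n = A.shape
    x = np.zeros(n)                               # x_0 = 0
    for _ in range(num_iters):
        w = x + step_size * (A.T @ (y - A @ x))   # gradient update step
        x = proj(w)                               # projection via G(D(.)), no inner loop
    return x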
Definition 1 (Restricted Eigenvalue Constraint (REC)).

Let S ⊆ R^n. For some parameters 0 < α ≤ β, a matrix A ∈ R^{m×n} is said to satisfy the REC(S, α, β) if the following holds for all x1, x2 ∈ S:

α ||x1 − x2||² ≤ ||A(x1 − x2)||² ≤ β ||x1 − x2||²   (5)
Definition 2 (Approximate Projection using GAN).

A concatenated network G∘D is a δ-approximate projector if the following holds for all x ∈ R^n:

||G(D(x)) − x||² ≤ min_{z ∈ R^k} ||G(z) − x||² + δ   (6)

The following theorem (proved in the Appendix) provides upper bounds on the cost function and the reconstruction error after t iterations.

Theorem 1.

Let the matrix A satisfy the REC(S, α, β) with the ratio β/α sufficiently close to 1, and let the concatenated network G∘D be a δ-approximate projector. Then, for every x* ∈ S and measurement y = Ax* + η, executing Algorithm 1 with a suitable step size μ yields a geometric decrease of the cost f(x_t) = ||y − A x_t||² at every iteration, up to an additive term of order δ + ||η||². Furthermore, the algorithm achieves reconstruction error ||x_T − x*||² = O(δ) after T = O(log(1/δ)) steps. When the noise η is negligible, the error at convergence is dominated by the O(δ) term.

By Theorem 1, the convergence is linear ("geometric"), with rate determined largely by the ratio β/α, and the reconstruction error at convergence is similarly controlled by β/α. We would like this ratio to be as close to 1 as possible, and it must be bounded for the algorithm to converge. It has been shown in [Baraniuk and Wakin2009] that a random matrix A with orthonormal rows will satisfy this condition with high probability for m roughly linear in the intrinsic dimension, with log factors dependent on the properties of the manifold, in this case S = range(G). However, as we demonstrate later (see figure 3), a random matrix often will not satisfy the desired condition for small or moderate m. We propose a fast heuristic method to find a relatively good measurement matrix A for an image set S, given a fixed m.
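To connect this geometric rate to the iteration count quoted in the abstract, the following short calculation is a sketch assuming Theorem 1 yields a per-step bound of the form f(x_{t+1}) ≤ ρ f(x_t) + C, with contraction factor ρ < 1 and slack C = O(δ + ||η||²):

% Unrolling the recursion for T steps:
\[
  f(x_T) \;\le\; \rho^{T} f(x_0) + C \sum_{i=0}^{T-1} \rho^{i}
         \;\le\; \rho^{T} f(x_0) + \frac{C}{1-\rho}.
\]
% In the low-noise regime C = O(\delta), so f(x_T) = O(\delta) as soon as
\[
  T \;\ge\; \frac{\log\bigl(f(x_0)/\delta\bigr)}{\log(1/\rho)} \;=\; O\bigl(\log(1/\delta)\bigr).
\]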

4.2 Generator-based Measurement Matrix Design

There have been a few attempts to optimize the measurement matrix for a specific data distribution. [Hegde et al.2015] find a deterministic measurement matrix that satisfies a near-isometry property on a finite set, but the time complexity grows rapidly with the size of that set. Because the secant set of a training set of size N has cardinality O(N²), the time complexity would be infeasible even for fairly small images. Furthermore, the final number of required measurements m, which is determined by the algorithm, depends on the isometry constant and cannot be specified in advance. [Kvinge et al.2018] introduced a heuristic iterative algorithm to find a measurement matrix with orthonormal rows that satisfies the REC with a small β/α ratio, but its time and space complexity are infeasible for a high-dimensional image dataset. Our method, based on sampling T secants from the secant set, has time and space complexity governed by T and the ambient dimension n, where T is a tiny fraction of the cardinality of the full secant set.

Definition 3 (Secant Set).

The normalized secant set of S is defined as follows:

S(S) = { (x1 − x2) / ||x1 − x2|| : x1, x2 ∈ S, x1 ≠ x2 },   (7)

and the associated distribution is denoted by ν, where ν is the distribution of the normalized secant

s = (G(z1) − G(z2)) / ||G(z1) − G(z2)||,  with z1, z2 drawn i.i.d. from the latent prior.   (8)

Given ν, the optimization over A is as follows:

max_A  [ min_{s ∈ S(S)} ||As||² / max_{s ∈ S(S)} ||As||² ]
  ≥ max_{A: AA^T = I_m}  [ min_{s ∈ S(S)} ||As||² / max_{s ∈ S(S)} ||As||² ]
  ≥ max_{A: AA^T = I_m}  min_{s ∈ S(S)} ||As||²   (9)

The first inequality is due to the additional orthonormality constraint AA^T = I_m on A. This constraint results in the largest singular value of A being 1, and hence each term ||As||² (with ||s|| = 1) is at most 1, which gives the second inequality. Finally, we replace the relaxed objective on the RHS of (9) by its empirical estimate, obtained by sampling T secants s_1, …, s_T according to ν:

max_{A: AA^T = I_m}  (1/T) Σ_{i=1}^{T} ||A s_i||²   (10)

For m and T large enough, this designed measurement matrix will satisfy the desired condition for most of the secants in S(S). Stacking the T sampled secants as the rows of a matrix S_T ∈ R^{T×n}, the previous optimization problem can be rewritten as:

max_{A: AA^T = I_m}  (1/T) ||S_T A^T||_F²   (11)

The optimal A from these samples satisfies A = U_m^T, where U Λ U^T is the eigenvalue decomposition (EVD) of S_T^T S_T (with eigenvalues in decreasing order) and U_m is the sub-matrix consisting of the first m columns of U. We compute S_T^T S_T and its EVD at time complexity O(T n² + n³) and space complexity O(n²).

Our approach to the design of A is related to one of the steps described by [Kvinge et al.2018]; however, by using the sampling-based estimates per (8) and (10), rather than the secant set of the entire training set, we reduce the computational cost by orders of magnitude, to a modest level.
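As an illustration of this construction, the sketch below (our own code, with G, sample_z, and the number of sampled secants T as assumed placeholders) forms the Gram matrix of T sampled normalized secants and takes its top-m eigenvectors as the rows of A:

import numpy as np

def design_measurement_matrix(G, sample_z, m, T=10000):
    # Sampling-based measurement-matrix design (Sec. 4.2):
    # rows of A = top-m eigenvectors of the secant Gram matrix S_T^T S_T.
    n = G(sample_z()).size
    gram = np.zeros((n, n))                 # accumulates S_T^T S_T one secant at a time
    for _ in range(T):
        x1, x2 = G(sample_z()), G(sample_z())
        s = (x1 - x2).ravel()
        s /= (np.linalg.norm(s) + 1e-12)    # normalized secant, eq. (7)
        gram += np.outer(s, s)
    eigvals, eigvecs = np.linalg.eigh(gram) # eigenvalues in ascending order
    A = eigvecs[:, -m:].T                   # m x n matrix with orthonormal rows, eq. (11)
    return A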

4.2.1 REC Histogram for A

In figure 3, we show the efficacy of the designed matrix via the histogram of ||As||² values for the designed and the random Gaussian A, over secants s drawn from the secant set of the MNIST dataset. For a random A, the support of the histogram is clearly wider when the number of measurements m is small, resulting in a large ratio β/α. For the designed matrix, the support is more concentrated, hence satisfying the sufficient condition for convergence of the PGD algorithm even with very few measurements (for MNIST), thus ensuring stable recovery.

Figure 3: The distribution of ||As||² with a random matrix (blue) and the designed matrix (orange), on the MNIST dataset for different numbers of measurements m.
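The histograms of figure 3 can be reproduced, up to the exact sampling details (which we assume), by evaluating ||As||² over freshly sampled secants:

import numpy as np
import matplotlib.pyplot as plt

def rec_histogram(A, G, sample_z, num_secants=2000):
    # Histogram of ||A s||^2 over sampled normalized secants (cf. figure 3).
    vals = []
    for _ in range(num_secants):
        s = (G(sample_z()) - G(sample_z())).ravel()
        s /= (np.linalg.norm(s) + 1e-12)
        vals.append(float(np.sum((A @ s) ** 2)))
    plt.hist(vals, bins=50)
    plt.xlabel("||As||^2")
    plt.show()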

5 Experiments

Network Architecture: We follow the architecture of deep convolutional generative adversarial networks (DCGAN) [Radford et al.2015]. DCGAN builds on multiple convolution, deconvolution and ReLU layers, and uses batch normalization and dropout for better generalization. For the two datasets, MNIST and CelebA, we designed two projection networks of similar structure but with different numbers of layers. The architecture of the model D is similar to that of the discriminator in the GAN and varies only in the final layer, where we add a fully-connected layer whose output dimension equals the latent variable dimension k. For training D, we used the architecture shown in fig. 2 and the objective defined in (4), keeping the pre-trained G fixed. We found that a suitable choice of the regularization weight λ in (4) gave the best performance. The noise e used for perturbing the training images follows a zero-mean Gaussian distribution N(0, σ²I). We observed that training with a low noise variance results in a projector similar to an identity operator, which only projects nearby points onto S, while a high noise variance makes it violate idempotence. We set σ empirically. We then obtain a projection network which approximately projects images lying outside S onto S = range(G).
Figure 4 shows the relative projection error (||G(D(x)) − x|| / ||x||) of images from the MNIST dataset for different latent variable dimensions k. The projection error slowly decreases with increasing k and then saturates, so we pick k at the onset of saturation for the experiments.

Figure 4: Relative projection error for different latent dimensions k on MNIST
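The quantity plotted in figure 4 can be computed as follows (a small sketch; P denotes the trained projector G(D(·)) and images is a batch of flattened test images):

import numpy as np

def relative_projection_error(P, images):
    # Mean relative projection error ||P(x) - x|| / ||x|| over a batch (cf. figure 4).
    proj = P(images)
    num = np.linalg.norm(proj - images, axis=1)
    den = np.linalg.norm(images, axis=1) + 1e-12
    return float(np.mean(num / den))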

We compare the performance of our algorithm with the GAN-based priors of [Bora et al.2017, Shah and Hegde2018] and with a sparsity-based method, specifically Lasso with a discrete cosine transform (DCT) basis [Tibshirani1996]. We also extensively evaluate the reconstruction performance for both random Gaussian and designed measurement matrices.

5.1 MNIST

This dataset [LeCun et al.1998] consists of 28×28 greyscale images of handwritten digits, with 60,000 training and 10,000 test samples. We pre-train a GAN with transposed convolution layers in the generator G and convolution layers in the discriminator, using rescaled images. A random latent vector z serves as G's input. The GAN is trained using the Adam optimizer with a fixed learning rate and mini-batch size for a fixed number of epochs. For training the pseudo-inverse of G, i.e. D, we minimize the objective (4) using samples drawn from the latent prior, with the same hyper-parameters as used for the GAN.
Recovery with a random Gaussian matrix: In this set-up, we used the same type of measurement matrix as [Bora et al.2017, Shah and Hegde2018], i.e. A with i.i.d. Gaussian entries. Figure 5 shows the recovery results for selected MNIST images from the test set. It can be seen that our algorithm performs better than the others and avoids local optima. Figure 7(a) provides a quantitative comparison for different numbers of measurements m.

Figure 5: Reconstruction using a Gaussian measurement matrix. (Code of Shah et al. (PGD-GAN) for MNIST is not available.)

5.2 CelebA

This dataset [Liu et al.2015] consists of more than 200,000 celebrity images. We use the aligned and cropped version, in which each image is preprocessed to a size of 64×64×3 and rescaled. We randomly pick a subset of the images for training the GAN; images from a held-out set are used for evaluation. The GAN consists of transposed convolution layers in the generator G and convolution layers in the discriminator. Pre-training of the GAN is done with the Adam optimizer for a fixed number of epochs. D is trained in the same way as for the MNIST dataset.
Recovery with a random Gaussian matrix: A is a measurement matrix with i.i.d. Gaussian entries. Figure 6 shows the reconstruction of eight test images. We observe that our algorithm visually outperforms the other three methods, as it is able to preserve detailed facial features such as sunglasses and hair, and produces better color tones. Figure 7(c) provides a quantitative comparison for different numbers of measurements m.

Figure 6: Reconstruction using a Gaussian measurement matrix.

5.3 Comparison: Gaussian and designed matrix

We observe that recovery with the designed A is possible with fewer measurements m. This corroborates our conjecture, based on figure 3, that the designed matrix satisfies the desired REC condition with high probability for most of the secants even for smaller m. Figures 7(a) and (c) show that our algorithm consistently outperforms the other approaches in terms of reconstruction error and structural similarity index (SSIM) for a random A. Furthermore, with the designed A, we are able to obtain performance on par with the random matrix using smaller m. Figures 7(b) and (d) show the images recovered by our algorithm with the designed and random A for different m. They demonstrate that recovery with the random A requires a much larger m than with the designed one to achieve similar performance.

Figure 7: (a) Relative reconstruction error and SSIM of the reconstruction algorithms on the MNIST dataset for different numbers of measurements m. (b) Reconstructed images from the MNIST dataset with a random Gaussian matrix (middle row) and a designed matrix with orthonormal rows (bottom row) for different m. (c) Relative reconstruction error and SSIM for the CelebA dataset for different numbers of measurements m. (d) Reconstructed images from the CelebA dataset with a random Gaussian matrix (middle row) and a designed matrix with orthonormal rows (bottom row).

5.4 Comparison of Run-time for Recovery

Table 1 compares the execution times of our network-based algorithm NPGD and the other recovery algorithms. We record the execution time to recover a single image from its measurements, averaged over 10 different images. All three algorithms were run on the same workstation with an i7-4770K CPU, 32 GB RAM and a GeForce Titan X GPU.

m CSGM PGD-GAN NPGD
200 2.9 66 0.09 (32x)
500 3.3 60 0.10 (33x)
1000 4.0 63 0.11 (36x)
2000 5.6 61 0.14 (40x)
Table 1: Comparison of execution time (in seconds) of recovery algorithms on the CelebA dataset. The relative speedup of our NPGD over the CSGM algorithm of Bora et al. is shown in parenthesis.

6 Conclusion

In this work, we propose a GAN-based projection network for faster recovery in compressed sensing. We show that the method demonstrates superior performance and provides a 30-40× speed-up over existing GAN-based methods by eliminating the expensive computation of the Jacobian matrix at every iteration. We provide a theoretical bound on the reconstruction error for a moderately conditioned measurement matrix. To help design such a matrix, we propose a method which enables recovery using fewer measurements than a random Gaussian matrix.

7 Appendix: Proof of Theorem 1

By the assumption of δ-approximate projection,

(12)

where from the gradient update step, we have

(13)

Substituting (13) into (12) yields

(14)

Rearranging the terms we have

(15)

where the last two inequalities follow from the REC(S, α, β). Now the LHS can be rewritten as:

(16)

Combining (15) and (16), and rearranging the terms, we have:

(17)

and since ,

(18)

For simplicity, we substitute in the following:

(19)

Defining , it is easy to see that . For convergence, we require . When reaches , we have

(20)

Finally, when , we have

(21)

References

  • [Adler and Öktem2017] Jonas Adler and Ozan Öktem. Solving ill-posed inverse problems using iterative deep neural networks. Inverse Problems, 33(12):124007, 2017.
  • [Anirudh et al.2018] Rushil Anirudh, Jayaraman J Thiagarajan, Bhavya Kailkhura, and Timo Bremer. An unsupervised approach to solving inverse problems using generative adversarial networks. arXiv preprint arXiv:1805.07281, 2018.
  • [Baraniuk and Wakin2009] Richard G Baraniuk and Michael B Wakin. Random projections of smooth manifolds. Foundations of computational mathematics, 9(1):51–77, 2009.
  • [Bora et al.2017] Ashish Bora, Ajil Jalal, Eric Price, and Alexandros G Dimakis. Compressed sensing using generative models. arXiv preprint arXiv:1703.03208, 2017.
  • [Candes et al.2006] Emmanuel J Candes, Justin K Romberg, and Terence Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences, 59(8):1207–1223, 2006.
  • [Creswell et al.2018] Antonia Creswell, Tom White, Vincent Dumoulin, Kai Arulkumaran, Biswa Sengupta, and Anil A Bharath. Generative adversarial networks: An overview. IEEE Signal Processing Magazine, 35(1):53–65, 2018.
  • [Donoho2006] David L Donoho. Compressed sensing. IEEE Transactions on information theory, 52(4):1289–1306, 2006.
  • [Fan et al.2017] Kai Fan, Qi Wei, Lawrence Carin, and Katherine A Heller. An inner-loop free solution to inverse problems using deep neural networks. In Advances in Neural Information Processing Systems, pages 2370–2380, 2017.
  • [Goodfellow et al.2014] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
  • [Gregor and LeCun2010] Karol Gregor and Yann LeCun. Learning fast approximations of sparse coding. In Proceedings of the 27th International Conference on Machine Learning, pages 399–406. Omnipress, 2010.
  • [Gupta et al.2018] Harshit Gupta, Kyong Hwan Jin, Ha Q Nguyen, Michael T McCann, and Michael Unser. Cnn-based projected gradient descent for consistent ct image reconstruction. IEEE transactions on medical imaging, 37(6):1440–1453, 2018.
  • [Hegde et al.2015] C. Hegde, A. C. Sankaranarayanan, W. Yin, and R. G. Baraniuk. Numax: A convex approach for learning near-isometric linear embeddings. IEEE Transactions on Signal Processing, 63(22):6109–6121, Nov 2015.
  • [Kingma and Ba2014] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • [Kvinge et al.2018] Henry Kvinge, Elin Farnell, Michael Kirby, and Chris Peterson. A gpu-oriented algorithm design for secant-based dimensionality reduction. In 2018 17th International Symposium on Parallel and Distributed Computing (ISPDC), pages 69–76. IEEE, 2018.
  • [LeCun et al.1998] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • [Liu et al.2015] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), December 2015.
  • [Radford et al.2015] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
  • [Rick Chang et al.2017] JH Rick Chang, Chun-Liang Li, Barnabas Poczos, BVK Vijaya Kumar, and Aswin C Sankaranarayanan. One network to solve them all: solving linear inverse problems using deep projection models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5888–5897, 2017.
  • [Shah and Hegde2018] Viraj Shah and Chinmay Hegde. Solving linear inverse problems using gan priors: An algorithm with provable guarantees. arXiv preprint arXiv:1802.08406, 2018.
  • [Tibshirani1996] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, 1996.
  • [Venkatakrishnan et al.2013] Singanallur V Venkatakrishnan, Charles A Bouman, and Brendt Wohlberg. Plug-and-play priors for model based reconstruction. In Global Conference on Signal and Information Processing (GlobalSIP), 2013 IEEE, pages 945–948. IEEE, 2013.
  • [Zhu et al.2016] Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, and Alexei A Efros. Generative visual manipulation on the natural image manifold. In European Conference on Computer Vision, pages 597–613. Springer, 2016.