Generative deep neural networks have shown remarkable performance as natural signal priors in imaging inverse problems, such as denoising, inpainting, compressed sensing, blind deconvolution, and phase retrieval. These generative models can be trained from datasets consisting of images of particular natural signal classes, such as faces, fingerprints, MRIs, and more [Karras et al., 2017, Minaee and Abdolrashidi, 2018, Shin et al., 2018, Chen et al., 2018]. Some such models, including variational autoencoders (VAEs) and generative adversarial networks (GANs) [Goodfellow et al., 2014, Kingma and Welling, 2013, Rezende et al., 2014], learn an explicit low-dimensional manifold that approximates a natural signal class. We will refer to such models as GAN priors. With an explicit parameterization of the natural signal manifold by a low-dimensional latent representation, these generative models allow for direct optimization over a natural signal class. Consequently, they can obtain significant performance improvements over non-learning-based methods. For example, GAN priors have been shown to outperform sparsity priors at compressed sensing with 5-10x fewer measurements. Additionally, GAN priors have led to theory for signal recovery in the linear compressive sensing and nonlinear phase retrieval problems [Bora et al., 2017, Hand and Voroninski, 2017, Hand et al., 2018], and they have also shown promising results for the nonlinear blind image deblurring problem [Asim et al., 2018].
A significant drawback of GAN priors for solving inverse problems is that they can have representation error or bias due to architecture and training. This can happen for many reasons, including because the generator only approximates the natural signal manifold, because the natural signal manifold is of higher dimensionality than modeled, because of mode collapse, or because of bias in the training dataset itself. As many aspects of generator architecture and training lack clear principles, representation error of GANs may continue to be a challenge even after substantial hand-crafting and engineering. Additionally, learning-based methods are particularly vulnerable to the biases of their training data, and training data, no matter how carefully collected, will always contain degrees of bias. As an example, the CelebA dataset [Liu et al., 2015] is biased toward people who are young, who do not have facial hair or glasses, and who have a light skin tone. As we will see, a GAN prior trained on this dataset learns these biases and exhibits image recovery failures because of them.
In contrast, invertible neural networks can be trained as generators with zero representation error. These networks are invertible (one-to-one and onto) by architectural design [Dinh et al., 2016, Gomez et al., 2017, Jacobsen et al., 2018, Kingma and Dhariwal, 2018]. Consequently, they are capable of recovering any image, including those significantly out-of-distribution relative to a biased training set. We call the domain of an invertible generator the latent space, and we call the range of the generator the signal space. These must have equal dimensionality. Flow-based invertible generative models are composed of a sequence of learned invertible transformations. Their strengths include: their architecture allows exact and efficient latent-variable inference, direct log-likelihood evaluation, and efficient image synthesis; they have the potential for significant memory savings in gradient computations; and they can be trained by directly optimizing the likelihood of training images. This paper emphasizes an additional strength: because they lack representation error, invertible models can mitigate dataset bias and improve performance on out-of-distribution data.
In this paper, we study generative invertible neural network priors for imaging inverse problems. We will specifically use the Glow architecture, though our framework could be used with other architectures. A Glow-based model is composed of a sequence of invertible affine coupling layers, 1x1 convolutional layers, and normalization layers. Glow models have been successfully trained to generate high resolution photorealistic images of human faces [Kingma and Dhariwal, 2018].
We present a method for using pretrained generative invertible neural networks as priors for imaging inverse problems. The invertible generator, once trained, can be used for a wide variety of inverse problems, with no specific knowledge of those problems used during the training process. Our method is a standard empirical risk formulation, which we supplement with regularization either by a penalty on the norm of an image’s latent representation or by an initialization at a latent representation of zero or small norm. This regularization promotes images with high likelihood under the invertible model.
We train a generative invertible model using the CelebA dataset. With this fixed model as a signal prior, we study its performance at denoising, compressive sensing, and inpainting. For denoising, it can outperform BM3D [Dabov et al., 2007]. For compressive sensing on test images, it can obtain higher quality reconstructions than Lasso across almost all subsampling ratios, and at similar reconstruction errors can succeed with 10-20x fewer measurements than Lasso. It provides an improvement of about 2-3x fewer linear measurements when compared to [Bora et al., 2017]. Despite being trained on the CelebA dataset, our generative invertible prior can give higher quality reconstructions than Lasso on out-of-distribution images of faces, and, to a lesser extent, unrelated natural images. Our invertible prior outperforms a pretrained DCGAN [Radford et al., 2015] at face inpainting and exhibits qualitatively reasonable results on out-of-distribution human faces. We provide additional experiments in the supplemental materials, including for training on other datasets.
We assume that we have access to a pretrained generative invertible neural network $G: \mathbb{R}^n \to \mathbb{R}^n$. We write $x = G(z)$ and $z = G^{-1}(x)$, where $x \in \mathbb{R}^n$ is an image that corresponds to the latent representation $z \in \mathbb{R}^n$. We will consider a $G$ that has the Glow architecture introduced in [Kingma and Dhariwal, 2018]. It can be trained by direct optimization of the likelihood of a collection of training images of a natural signal class, under a standard Gaussian distribution over the latent space. We consider recovering an image $x$ from possibly noisy linear measurements given by $A \in \mathbb{R}^{m \times n}$,
$$y = A x + \eta,$$
where $\eta \in \mathbb{R}^m$ models noise. Given a pretrained generator $G$, we propose the following penalized empirical risk formulation for recovering the image $x$. One can solve
$$\min_{z \in \mathbb{R}^n} \|A G(z) - y\|^2 + \gamma \|z\| \qquad (1)$$
beginning from an initialization $z_0$. The estimate for the image $x$ is then given by $G(\hat{z})$, where $\hat{z}$ minimizes (1). The penalty term on the norm of $z$ is meant to enforce ‘naturalness’ of the resulting image; up to constants, $\|z\|$ is the square root of the negative log-likelihood of $z$ under a Gaussian prior. Similar performance is observed if the penalization $\gamma \|z\|^2$ is used instead, as demonstrated in the supplement. In the case of a GAN prior and $\gamma = 0$, this formulation reduces to that of [Bora et al., 2017].
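The penalized empirical risk in (1), a squared data misfit plus a multiple `gamma` of the latent norm, can be sketched in a few lines. The block below is a minimal illustration under loud assumptions and is not the implementation used in the experiments: an orthogonal linear map `W` stands in for the Glow generator, plain gradient descent stands in for LBFGS, and the names `make_toy_generator` and `solve_penalized_risk`, the step size, and the noise level are all hypothetical.

```python
import numpy as np

def make_toy_generator(n, seed=0):
    """Hypothetical stand-in for Glow: an invertible linear map G(z) = W z."""
    rng = np.random.default_rng(seed)
    # The Q factor of a random matrix is orthogonal, hence invertible.
    W, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return W

def solve_penalized_risk(A, y, W, gamma, z0, step=0.1, iters=500, eps=1e-8):
    """Gradient descent on ||A G(z) - y||^2 + gamma * ||z|| with G(z) = W z."""
    z = z0.copy()
    B = A @ W  # measurement operator composed with the toy generator
    for _ in range(iters):
        residual = B @ z - y
        # eps keeps the (sub)gradient of ||z|| finite at z = 0.
        grad = 2.0 * B.T @ residual + gamma * z / (np.linalg.norm(z) + eps)
        z = z - step * grad
    return z

n = 16
W = make_toy_generator(n)
rng = np.random.default_rng(1)
x_true = W @ (0.5 * rng.standard_normal(n))
A = np.eye(n)                                   # denoising: identity measurements
y = A @ x_true + 0.05 * rng.standard_normal(n)  # noisy observation
z_hat = solve_penalized_risk(A, y, W, gamma=0.1, z0=np.zeros(n))
x_hat = W @ z_hat                               # the estimate is G(z_hat)
```

In this toy setting the penalty shrinks the latent estimate toward zero relative to the unpenalized solution `W.T @ y`, which mirrors the trade-off between data fit and latent norm described above. With a real invertible network the objective is non-convex and the behavior can differ.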
All the experiments that follow will be for an invertible model we trained on the CelebA dataset of celebrity faces, as in [Kingma and Dhariwal, 2018]. Similar results for models trained on birds and flowers [Wah et al., 2011, Nilsback and Zisserman, 2008] can be found in the supplemental materials. Due to computational considerations, we run experiments on $64 \times 64$ color images with the pixel values scaled between . The train and test sets contain a total of 27,000 and 3,000 images, respectively. We trained a Glow architecture [Kingma and Dhariwal, 2018]; see the supplementary material for details. Once trained, the Glow prior is fixed for use in each of the inverse problems below. We also trained a DCGAN for the same dataset. We solve (1) using LBFGS, which was found to outperform Adam [Kingma and Ba, 2014]. Unless otherwise stated, all Glow experiments were initialized at $z_0 = 0$, and thus there is no randomness in solving (1). DCGAN results are reported for an average of 3 runs because we observed some variance due to random initialization.
We consider the denoising problem with $A = I$ and , for images in the CelebA test dataset. We evaluate the performance of a Glow prior, a DCGAN prior, and BM3D for two different noise levels. Figure 2 shows the recovered PSNR values as a function of $\gamma$ for denoising by the Glow and DCGAN priors, along with the PSNR by BM3D. The figure shows that the performance of the regularized Glow prior increases with $\gamma$, and then decreases. If $\gamma$ is too low, then the network fits to the noise in the image. If $\gamma$ is too high, then data fit is not enforced strongly enough. The left panel reveals that an appropriately regularized Glow prior can outperform BM3D by almost 2 dB. The experiments also reveal that appropriately regularized Glow priors outperform the DCGAN prior, which suffers from representation error and is not aided by the regularization. The right panel confirms that with smaller noise levels, less regularization is needed for optimal performance.
A visual comparison of the recoveries at the noise level using Glow, DCGAN priors, and BM3D can be seen in Figure 3. Note that the recoveries with Glow are sharper than BM3D. See the supplementary material for more quantitative and qualitative results.
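PSNR is the quality metric reported throughout these comparisons. A minimal sketch of how it is commonly computed follows, assuming pixel values in [0, 1] so that the peak value is 1. The function name `psnr` and the `peak` argument are assumptions for illustration and are not taken from the experimental code.

```python
import numpy as np

def psnr(x_hat, x, peak=1.0):
    """Peak signal-to-noise ratio in dB between a recovery and a reference."""
    x_hat = np.asarray(x_hat, dtype=float)
    x = np.asarray(x, dtype=float)
    mse = np.mean((x_hat - x) ** 2)  # mean squared error over all pixels
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# A constant error of 0.1 on a [0, 1] image gives an MSE of 0.01, i.e. 20 dB.
reference = np.zeros((4, 4))
recovery = np.full((4, 4), 0.1)
value = psnr(recovery, reference)
```

On this scale a gain of 2 dB corresponds to reducing the mean squared error by a factor of about 1.6.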
3.2 Compressed Sensing
In compressed sensing, one is given undersampled linear measurements of an image, and the goal is to recover the image from those measurements. In our notation, $A \in \mathbb{R}^{m \times n}$ with $m < n$. As the image is undersampled, there is an affine space of images consistent with the measurements, and an algorithm must select which is most ‘natural.’ A common proxy for naturalness in the literature has been sparsity with respect to the DCT or wavelet bases. With a GAN prior, an image is considered natural if it lies in or near the range of the GAN. For an invertible prior, we consider an image to be natural if it has a latent representation of small norm.
We study compressed sensing in the case that $A$ is an $m \times n$ matrix of i.i.d. entries, and $x$ is an image from the CelebA test set. Here, . We consider the case where $\eta$ is standard i.i.d. Gaussian random noise normalized such that . We compare Glow, DCGAN, and Lasso with respect to the DCT and wavelet bases. (Footnote 1: The inverse problems with Lasso were solved by using coordinate descent.)
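The measurement model for compressed sensing can be sketched as follows. This is an illustration under stated assumptions: the $\mathcal{N}(0, 1/m)$ scaling of the Gaussian entries, the exact-norm rescaling of the noise, the value `noise_norm=0.1`, and the helper name `make_measurements` are all hypothetical choices, since the corresponding specifics are not recoverable from the text above.

```python
import numpy as np

def make_measurements(x, m, noise_norm, seed=0):
    """Draw a random Gaussian A and return (A, y, eta) with y = A x + eta."""
    rng = np.random.default_rng(seed)
    n = x.size
    # Assumed scaling: i.i.d. Gaussian entries with variance 1/m.
    A = rng.standard_normal((m, n)) / np.sqrt(m)
    eta = rng.standard_normal(m)
    # Assumed normalization: rescale the noise to a fixed Euclidean norm.
    eta *= noise_norm / np.linalg.norm(eta)
    y = A @ x.ravel() + eta
    return A, y, eta

rng = np.random.default_rng(42)
x = rng.random((8, 8, 3))   # small stand-in for a color image, n = 192
A, y, eta = make_measurements(x, m=48, noise_norm=0.1)
```

With `m < n` the system is underdetermined, so many images reproduce `y` equally well and a prior is needed to choose among them.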
The left panel of Figure 4 shows that when and , the Glow prior outperforms both DCGAN and Lasso in reconstruction quality over all undersampling ratios. In the case of no undersampling, the Glow and Lasso priors are comparable. Replicating the results of [Bora et al., 2017], our experiments demonstrate that the DCGAN prior can achieve comparable reconstruction error as sparsity priors with 5-10x fewer measurements in some contexts. As also demonstrated in that paper, when there are a sufficient number of measurements, the sparsity priors outperform the GAN prior because the DCT and wavelet bases have zero representation error. The Glow prior (1) can result in 15 dB higher PSNRs than DCGAN, and (2) can give comparable recovery errors with 2-3x fewer measurements at high undersampling ratios. This difference is explained by the representation error of DCGAN. Additional plots and visual comparison, available in the supplemental material, show notable improvements in quality of in- and out-of-distribution images using an invertible prior relative to DCGAN and Lasso.
We conducted several additional experiments to understand the regularizing effects of $\gamma$ and the initialization $z_0$. The right panel of Figure 4 shows the PSNRs under multiple initialization strategies: , , , with given by the solution to Lasso with respect to the wavelet basis, and where is perturbed by a random point in the null space of $A$. We observe that setting $\gamma > 0$ can noticeably improve recovery error for certain initializations. This behavior is expected because if $\gamma = 0$ and the initialization is consistent with the measurements, the optimization algorithm will stay stuck at the initialization. With positive $\gamma$ and such initializations, the formulation will return an image with data-fit error, but higher PSNR. (Footnote 2: Unnatural initializations consistent with measurements could have arbitrarily low PSNRs. We do not include such experiments because the Glow prior has unstable numerical behavior for highly unnatural images, which are well outside the bounds where the network was trained.)
Surprisingly, we observe that if the optimization of (1) for compressed sensing is initialized with a small latent variable, then optimal recovery quality occurs when $\gamma = 0$. As there is no explicit regularization in the objective, this indicates that algorithmic regularization is occurring and that initialization plays a role. (Footnote 3: These experiments are presented for an LBFGS solver. See the supplemental material for experiments revealing the same effect for the Adam solver.) The left panel in Figure 5 shows that as the norm of the latent initialization increases, the norm of the recovered latent representation increases and the PSNR of the recovered image decreases. Moreover, the right panel in Figure 5 shows the norm of the estimated latent representation at each iteration of the optimization. In all our experiments, it monotonically grows versus iteration number. These experiments provide further evidence that smaller latent initializations lead to outputs that are more natural and have smaller latent representations.
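A linear analog helps build intuition for this kind of algorithmic regularization. For an underdetermined least-squares problem, gradient descent started at zero converges to the minimum-norm solution, and a larger initialization yields a larger solution that fits the data equally well. The sketch below demonstrates only this well-known linear fact; it is not a claim about how LBFGS behaves on the non-convex Glow objective, and the matrix `B`, the sizes, and the helper `gradient_descent_ls` are hypothetical.

```python
import numpy as np

def gradient_descent_ls(B, y, z0, iters=5000):
    """Gradient descent on ||B z - y||^2; also records ||z|| per iteration."""
    step = 0.25 / np.linalg.norm(B, 2) ** 2  # small enough for monotone growth
    z = z0.copy()
    norms = [np.linalg.norm(z)]
    for _ in range(iters):
        z = z - step * 2.0 * B.T @ (B @ z - y)
        norms.append(np.linalg.norm(z))
    return z, np.array(norms)

rng = np.random.default_rng(0)
m, n = 20, 50                                  # fewer measurements than unknowns
B = rng.standard_normal((m, n))
y = B @ rng.standard_normal(n)

z_min_norm = np.linalg.pinv(B) @ y             # minimum-norm consistent solution
z_small, norms_small = gradient_descent_ls(B, y, np.zeros(n))
z_large, _ = gradient_descent_ls(B, y, 5.0 * rng.standard_normal(n))
```

Here `z_small` matches `z_min_norm`, its norm grows monotonically over the iterations, and `z_large` has a strictly larger norm while still satisfying `B @ z_large ≈ y`. This is qualitatively the same pattern as reported for the latent norms in Figure 5.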
Additionally, we remark that the optimization landscape of the Glow prior appears to be smoother than that of the DCGAN prior. In Figure 6, we plot versus where and are scaled to have the same norm as , the latent representation of a fixed test image. For DCGAN, we plot the loss landscape versus two pairs of random directions. For Glow, we plot the loss landscape versus a pair of random directions and a pair of directions that linearly interpolate in latent space between and another test image. With the GAN prior, some random latent directions lead to highly irregular landscapes. In contrast, for both random and interpolating directions in latent space, the Glow prior exhibits a smoother objective. These landscapes help explain the observation that DCGAN is sensitive to its random initialization, whereas the Glow prior is not.
Finally, we observe that the Glow prior is much more robust to out-of-distribution examples than the GAN prior. Figure 7 shows recovered images using (1) for compressive sensing for images not belonging to the CelebA dataset. DCGAN’s performance reveals biases of the underlying dataset and limitations of low-dimensional modeling. For example, projecting onto the CelebA-trained DCGAN causes incorrect skin tones, genders, and ages. Its performance on out-of-distribution images is poor. In contrast, the Glow prior mitigates this bias, even demonstrating image recovery for natural images that are not representative of the CelebA training set, including people who are older, have darker skin tones, wear glasses, have a beard, or have unusual makeup. The Glow prior’s performance also extends to significantly out-of-distribution images, such as animated characters and natural images unrelated to faces. See the supplemental material for additional experiments.
In inpainting, one is given a masked image of the form $y = M x$, where $M$ is a masking matrix with binary entries and $x$ is an n-pixel image. The goal is to find $x$. We could rewrite (1) as
$$\min_{z \in \mathbb{R}^n} \|M G(z) - y\|^2 + \gamma \|z\|.$$
There is an affine space of images consistent with the measurements, and an algorithm must select which is most natural. As before, using the minimizer $\hat{z}$, the estimated image is given by $G(\hat{z})$. Our experiments reveal the same story as for compressed sensing. If the initialization for an invertible prior has a small or zero latent representation, then the empirical risk formulation with $\gamma = 0$ exhibits high PSNRs on test images. Algorithmic regularization is again occurring due to initialization. In contrast, DCGAN is limited by its representation error. See Figure 8, and the supplemental materials for more results, including visually reasonable face inpainting, even for out-of-distribution human faces.
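The statement that an affine space of images is consistent with the masked measurements can be checked directly. In the sketch below the binary mask is applied elementwise, which is equivalent to multiplying by a diagonal masking matrix. The mask shape, the fill value `0.5`, and the helper `masked_misfit` are hypothetical.

```python
import numpy as np

def masked_misfit(x_candidate, y, mask):
    """Data misfit ||M x - y||^2, where M zeroes pixels with mask == 0."""
    return float(np.sum((mask * x_candidate - y) ** 2))

rng = np.random.default_rng(0)
x = rng.random((4, 4))       # small stand-in for an image
mask = np.ones((4, 4))
mask[1:3, 1:3] = 0.0         # hide a 2x2 block of pixels
y = mask * x                 # observed (masked) image

# Changing only the hidden pixels leaves the data misfit unchanged.
x_alt = x.copy()
x_alt[1:3, 1:3] = 0.5
misfit_true = masked_misfit(x, y, mask)
misfit_alt = masked_misfit(x_alt, y, mask)
```

Both candidates achieve zero misfit even though they differ on the hidden block, so the misfit term alone cannot pick out the true image. The selection has to come from the prior, whether through a penalty on the latent norm or through the initialization.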
4 Related Work
The idea of analyzing inverse problems with invertible neural networks has appeared in [Ardizzone et al., 2018]. The authors study estimation of the complete posterior parameter distribution under a forward process, conditioned on observed measurements. Specifically, the authors approximate a particular forward process by training an invertible neural network. The inverse map is then directly available. In order to cope with information loss, the authors augment the measurements with additional variables. This work differs from our work because it involves training a separate net for every particular inverse problem. For example, inpainting problems with different masks would each require a separately trained net, and a net trained for inpainting could not be used for denoising. In contrast, our work studies how to use a pretrained invertible generator of a particular signal class for a variety of inverse problems not known at training time. Training invertible networks is challenging and computationally expensive; hence, it is desirable to separate the training of off-the-shelf invertible models from potential applications in a broad variety of scientific domains.
We have demonstrated that pretrained generative invertible models can be used as natural signal priors in imaging inverse problems. Their strength is that every desired image is in the range of an invertible model, and the challenge that they overcome is that every undesired image is also in the range of the model. We study a regularization for empirical loss minimization that promotes recovery of images that have high likelihood under the generative model. We demonstrate that the invertible prior can quantitatively and qualitatively outperform BM3D at denoising. Additionally, it has lower recovery errors than Lasso across all levels of undersampling, and it can get comparable errors from 10-20x fewer measurements, which is a 2x reduction from [Bora et al., 2017]. Our trained invertible model yields significantly better reconstructions than Lasso even on out-of-distribution images, including images with rare features of variation within the training set, and on unrelated natural images.
The lack of representation error of invertible nets presents a significant opportunity for imaging with a learned prior. Any image is potentially recoverable, even if the image is significantly outside of the training distribution. In contrast, methods based on projecting onto an explicit low-dimensional representation of a natural signal manifold will have representation error, perhaps due to modeling assumptions, mode collapse, or bias in a training set. Such methods will see performance prematurely saturate as the number of measurements increases. In contrast, an invertible prior would not see performance saturate. In the extreme case of having a full set of exact measurements, an invertible prior could recover any image exactly.
The lack of representation error of invertible models also presents a challenge for inverse problems. An image to be denoised is already in the range of the model. Similarly, in compressive sensing, there is a space of images that are consistent with undersampled linear measurements and are in the range of the generator. Fortunately, invertible nets come with direct estimates of the likelihood of images, which can be used for regularization. Because of their training, invertible nets attempt to assign high likelihoods to natural images under a Gaussian prior on latent space. Hence, images that are more natural are expected to have smaller norm in latent space.
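The link between likelihood and latent norm in the paragraph above can be made explicit with two standard identities, written here for a standard Gaussian prior on a latent $z \in \mathbb{R}^n$ and an invertible $G$ with $x = G(z)$. The subscripted densities $p_Z$ and $p_X$ are notation introduced for this sketch.

```latex
% Standard Gaussian prior on the latent space
\log p_Z(z) = -\tfrac{1}{2}\|z\|^2 - \tfrac{n}{2}\log(2\pi)
\quad\Longrightarrow\quad
\|z\| = \sqrt{-2\log p_Z(z) - n\log(2\pi)}

% Change of variables for x = G(z), with G invertible
\log p_X(x) = \log p_Z\big(G^{-1}(x)\big)
            + \log\left|\det \frac{\partial G^{-1}(x)}{\partial x}\right|
```

The first identity shows that a smaller latent norm is equivalent to a higher likelihood under the latent prior. The second shows that the image likelihood also contains a Jacobian term, so a penalty on $\|z\|$ alone controls only the latent part of the log-likelihood.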
Experiments verify that natural images have smaller latent representations than unnatural images. In the supplemental materials, we show that adding noise to natural images increases the norm of their latent representations, and that higher noise levels result in larger increases. Additional evidence is that random perturbations in image space induce larger changes in $z$ than comparable natural perturbations in image space. Figure 9 shows a plot of the norm of the change in image space, averaged over 1000 test images, as a function of the size of a perturbation in latent space. Natural directions are given by the interpolation between the latent representations of two images. This difference in sensitivity indicates that the optimization algorithm might obtain a larger decrease in $\|z\|$ by an image modification that reduces unnatural image components than by a correspondingly large modification in a natural direction.
We have demonstrated that invertible generators can be regularized by direct penalization of the norm of latent image representations. This observation is perhaps surprising because when applied to GANs it exacerbates the representation error, as visible in Figure 2 and as reported in [Athar et al., 2018]. We suspect the regularization is effective for invertible priors because of the high dimensionality of latent space and the small fraction of perturbations that correspond to natural image alterations.
Surprisingly, it is possible to regularize an invertible model merely by initialization of an optimization procedure at a latent image representation of zero or small norm. Without direct penalization of $\|z\|$, the optimization algorithm could in principle find any image consistent with the measurements. Our experiments show that this failure mode does not happen when initialized at small $z_0$. When solving compressive sensing via (1) with $\gamma = 0$, we observe that larger latent initializations result in larger latent estimates and lower PSNRs, as shown in Figure 5. We note that this effect persists for both the LBFGS and Adam solvers. Over the course of solving (1), the algorithm finds latent representations whose norms increase monotonically with iteration number. It appears that a reasonable way to incentivize a small $\|\hat{z}\|$ is thus to initialize with a small $\|z_0\|$, and in particular $z_0 = 0$.
It is natural to wonder which images can be effectively recovered using an invertible prior trained on a particular signal class. As expected, we see the best reconstruction errors on in-distribution images and performance degrades as images get further out-of-distribution. Nonetheless, we observe that reconstruction errors of unrelated natural images are still of higher quality than with the Lasso. It appears that the invertible generator learns some general attributes of the class of natural images. We suspect that an invertible prior will be effective at recovering any natural image for which the model assigns a higher likelihood than other feasible images. This leads to several questions: when a generative invertible net is trained, how far out-of-distribution can an image be while maintaining a high likelihood? How do invertible nets learn useful statistics of natural images? Is that due primarily to training? Or is there architectural bias toward natural images, as suggested by the Deep Image Prior and Deep Decoder [Ulyanov et al., 2018, Heckel and Hand, 2018]?
The results of this paper provide further evidence that reducing representational error of generators can significantly enhance the performance of generative models for inverse problems in imaging. This idea was also recently explored in [Athar et al., 2018], where the authors trained a GAN-like prior with a high-dimensional latent space. The high dimensionality of this space lowers representational error, though it is not zero. In their work, the high-dimensional latent space had a structure that was difficult to directly optimize, so the authors successfully modeled latent representations as the output of an untrained convolutional neural network whose parameters are estimated at test time. Their paper and ours raise several questions: Which generator architectures provide a good balance between low representation error, ease of training, and ease of inversion? Should a generative model be capable of producing all images in order to perform well on out-of-distribution images of interest? Are there cheaper architectures that perform comparably? These questions are quite important, as solving (1) in our $64 \times 64$ experiments took 15 GPU-minutes. New developments are needed on architectures and frameworks in between low-dimensional generative priors and fully invertible generative priors. Such methods could leverage the strengths of invertible models while being much cheaper to train and use.
- [Ardizzone et al., 2018] Ardizzone, L., Kruse, J., Wirkert, S., Rahner, D., Pellegrini, E. W., Klessen, R. S., Maier-Hein, L., Rother, C., and Köthe, U. (2018). Analyzing inverse problems with invertible neural networks. arXiv preprint arXiv:1808.04730.
- [Asim et al., 2018] Asim, M., Shamshad, F., and Ahmed, A. (2018). Blind image deconvolution using deep generative priors. arXiv preprint arXiv:1802.04073.
- [Athar et al., 2018] Athar, S., Burnaev, E., and Lempitsky, V. (2018). Latent convolutional models. arXiv preprint arXiv:1806.06284.
- [Bora et al., 2017] Bora, A., Jalal, A., Price, E., and Dimakis, A. G. (2017). Compressed sensing using generative models. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 537–546. JMLR.org.
- [Chen et al., 2018] Chen, Y., Shi, F., Christodoulou, A. G., Xie, Y., Zhou, Z., and Li, D. (2018). Efficient and accurate mri super-resolution using a generative adversarial network and 3d multi-level densely connected network. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 91–99. Springer.
- [Dabov et al., 2007] Dabov, K., Foi, A., and Egiazarian, K. (2007). Video denoising by sparse 3d transform-domain collaborative filtering. In 2007 15th European Signal Processing Conference, pages 145–149. IEEE.
- [Dinh et al., 2016] Dinh, L., Sohl-Dickstein, J., and Bengio, S. (2016). Density estimation using real nvp. arXiv preprint arXiv:1605.08803.
- [Gomez et al., 2017] Gomez, A. N., Ren, M., Urtasun, R., and Grosse, R. B. (2017). The reversible residual network: Backpropagation without storing activations. In Advances in Neural Information Processing Systems, pages 2214–2224.
- [Goodfellow et al., 2014] Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative adversarial networks. arXiv preprint arXiv:1406.2661.
- [Hand et al., 2018] Hand, P., Leong, O., and Voroninski, V. (2018). Phase retrieval under a generative prior. In Advances in Neural Information Processing Systems, pages 9136–9146.
- [Hand and Voroninski, 2017] Hand, P. and Voroninski, V. (2017). Global guarantees for enforcing deep generative priors by empirical risk. arXiv preprint arXiv:1705.07576.
- [Heckel and Hand, 2018] Heckel, R. and Hand, P. (2018). Deep decoder: Concise image representations from untrained non-convolutional networks. arXiv preprint arXiv:1810.03982.
- [Jacobsen et al., 2018] Jacobsen, J.-H., Smeulders, A., and Oyallon, E. (2018). i-revnet: Deep invertible networks. arXiv preprint arXiv:1802.07088.
- [Karras et al., 2017] Karras, T., Aila, T., Laine, S., and Lehtinen, J. (2017). Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196.
- [Kingma and Ba, 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- [Kingma and Dhariwal, 2018] Kingma, D. P. and Dhariwal, P. (2018). Glow: Generative flow with invertible 1x1 convolutions. In Advances in Neural Information Processing Systems, pages 10215–10224.
- [Kingma and Welling, 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.
- [Liu et al., 2015] Liu, Z., Luo, P., Wang, X., and Tang, X. (2015). Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision, pages 3730–3738.
- [Minaee and Abdolrashidi, 2018] Minaee, S. and Abdolrashidi, A. (2018). Finger-gan: Generating realistic fingerprint images using connectivity imposed gan. arXiv preprint arXiv:1812.10482.
- [Nilsback and Zisserman, 2008] Nilsback, M.-E. and Zisserman, A. (2008). Automated flower classification over a large number of classes. In 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing, pages 722–729. IEEE.
- [Radford et al., 2015] Radford, A., Metz, L., and Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
- [Rezende et al., 2014] Rezende, D. J., Mohamed, S., and Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. arXiv preprint arXiv:1401.4082.
- [Shin et al., 2018] Shin, H.-C., Tenenholtz, N. A., Rogers, J. K., Schwarz, C. G., Senjem, M. L., Gunter, J. L., Andriole, K. P., and Michalski, M. (2018). Medical image synthesis for data augmentation and anonymization using generative adversarial networks. In International Workshop on Simulation and Synthesis in Medical Imaging, pages 1–11. Springer.
- [Ulyanov et al., 2018] Ulyanov, D., Vedaldi, A., and Lempitsky, V. (2018). Deep image prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9446–9454.
- [Wah et al., 2011] Wah, C., Branson, S., Welinder, P., Perona, P., and Belongie, S. (2011). The Caltech-UCSD Birds-200-2011 Dataset. (CNS-TR-2011-001).
6.1 Experimental Setup
Simulations were completed mainly on the CelebA-HQ dataset, used in [Kingma and Dhariwal, 2018]; it has 30,000 color images that were resized to $64 \times 64$ for computational reasons, and were split into 27,000 training and 3,000 test images. We also provide some additional experiments on the Flowers [Nilsback and Zisserman, 2008] and Birds [Wah et al., 2011] datasets. The Flowers dataset contains 8,189 color images resized to , out of which images are set aside for testing. The Birds dataset contains a total of 11,788 images, which were center aligned and resized to , out of which images are set aside for testing.
We specifically model our invertible networks after the recently proposed Glow [Kingma and Dhariwal, 2018] architecture, which consists of multiple flow steps. Each flow step comprises an activation normalization layer, a 1x1 convolutional layer, and an affine coupling layer, each of which is invertible. Let $K$ be the number of steps of flow before a splitting layer, and $L$ be the number of times the splitting is performed. To train over CelebA, we choose the network to have , and affine coupling, and train it with a learning rate , and a batch size at resolution . The model was trained over bit images with 10,000 warmup iterations as in [Kingma and Dhariwal, 2018], but when solving inverse problems using Glow, the original bit images were used. We refer the reader to [Kingma and Dhariwal, 2018] for specific details on the operations performed in each of the network layers.
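To illustrate why an affine coupling layer is invertible by construction, the sketch below implements a generic coupling transform on a flat vector. It is a simplified illustration, not the Glow implementation: the exponential scale parameterization, the toy functions `scale_net` and `shift_net`, and the helper names are assumptions, and a real layer would use learned convolutional networks acting on image channels.

```python
import numpy as np

def coupling_forward(x, scale_net, shift_net):
    """Affine coupling: keep the first half, transform the second half."""
    xa, xb = np.split(x, 2)
    s = np.exp(scale_net(xa))          # strictly positive scale
    yb = s * xb + shift_net(xa)
    log_det = np.sum(np.log(s))        # log |det Jacobian| of the layer
    return np.concatenate([xa, yb]), log_det

def coupling_inverse(y, scale_net, shift_net):
    """Exact inverse: the untouched half determines the scale and shift."""
    ya, yb = np.split(y, 2)
    s = np.exp(scale_net(ya))
    xb = (yb - shift_net(ya)) / s
    return np.concatenate([ya, xb])

# Hypothetical toy "networks"; they need not be invertible themselves.
scale_net = lambda a: np.tanh(a)
shift_net = lambda a: a ** 2

x = np.arange(8, dtype=float) / 4.0
y, logdet = coupling_forward(x, scale_net, shift_net)
x_rec = coupling_inverse(y, scale_net, shift_net)
```

Because the first half passes through unchanged, the inverse can recompute the same scale and shift and undo the transform exactly. The log-determinant is a simple sum, which is what makes direct likelihood evaluation cheap in flow-based models.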
We use LBFGS to solve the inverse problem. For best performance, we set the number of iterations and learning rate for denoising, compressed sensing, and inpainting to be , ; , ; and , ; respectively. We use PyTorch to implement Glow network training and to solve the inverse problem. Glow training was conducted on a single Titan Xp GPU using a maximum allowable (under given computational constraints) batch size of 6. In the case of CS, recovering a single image on a Titan Xp using the LBFGS solver with steps takes seconds ( minutes). However, we can solve inverse problems in parallel on the given hardware platform.
Unless specified otherwise, the inverse problem under the Glow prior is always initialized with $z_0 = 0$, whereas under the DCGAN prior, we initialize with and report the average over three random restarts. In all the quantitative experiments, the reported quality metrics, such as PSNR and reconstruction error, are averaged over 12 randomly drawn test set images.
6.2 Denoising: Additional Experiments
We present additional quantitative experiments on image denoising here. The complete set of experiments on average PSNR over 12 CelebA (within-distribution) test set images versus the penalization parameter $\gamma$ under noise levels and is presented in Figure 11 below. The central message is that the Glow prior outperforms the DCGAN prior uniformly across all $\gamma$ due to the representation limit of DCGAN. In addition, striking the right balance between the misfit term and the penalization term by appropriately choosing $\gamma$ improves the performance of Glow. It approaches the state-of-the-art BM3D algorithm at low noise levels, and the improvement is clearly visible at higher noise; for example, at a noise level of , the Glow prior improves upon BM3D by 2 dB. Visually, the results of the Glow prior are clearly superior to the BM3D recoveries, which are generally blurry and oversmoothed, as can be seen in the qualitative results below. To avoid fitting the noisy image using the Glow model, we force the recoveries to be natural by choosing a large enough $\gamma$. (Footnote 4: The redundant ’within distribution’ phrase is added to emphasize that the test set images are drawn from the same distribution as the train set. We do this to avoid confusion with the out-of-distribution recoveries also presented in this paper.)
Recall that we are solving a regularized empirical risk minimization program
In general, one can instead solve , where is a monotonically increasing function. Figure 12 compares the two most common choices, linear (already used in the rest of the paper) and quadratic, in the context of denoising. We find that the highest achievable PSNR is the same in both cases; however, the penalization parameter has to be adjusted accordingly.
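The role of the monotonically increasing function can be illustrated with a small sketch. It assumes that the objective has the form of a misfit term plus the penalization parameter times a function of the latent norm; the names `penalized_objective`, `linear`, `quadratic`, and the toy misfit below are hypothetical and are not part of our implementation.

```python
import math


def latent_norm(z):
    """Euclidean norm of a latent code given as a list of floats."""
    return math.sqrt(sum(v * v for v in z))


def linear(t):
    """Linear penalty on the latent norm."""
    return t


def quadratic(t):
    """Quadratic penalty on the latent norm."""
    return t * t


def penalized_objective(z, misfit, gamma, phi):
    """Assumed form: misfit(z) + gamma * phi(||z||), with phi monotonically increasing."""
    return misfit(z) + gamma * phi(latent_norm(z))


def toy_misfit(z):
    """Toy stand-in for the data misfit term; a real misfit would involve the generator."""
    return 1.0


z = [3.0, 4.0]  # latent code with norm 5
value_linear = penalized_objective(z, toy_misfit, 0.1, linear)        # 1.0 + 0.1 * 5
value_quadratic = penalized_objective(z, toy_misfit, 0.1, quadratic)  # 1.0 + 0.1 * 25
```

For the same penalization parameter, the two choices weight the same latent code differently (here 0.5 versus 2.5), which is consistent with the observation that the parameter has to be re-tuned when the penalty function changes.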
We also trained a Glow model on the Flowers dataset. Below we present its qualitative denoising performance against BM3D on Flowers test set images. We also show the effect of varying the penalization parameter: a smaller value leads to overfitting, and vice versa.
6.3 Compressed Sensing: Additional Experiments
Some additional quantitative image recovery results on the CelebA test set are presented in Figure 16, which compares the Glow prior, the DCGAN prior, LASSO-DCT, and LASSO-WVT at compressed sensing. We plot the reconstruction error : = , where is the recovered image and is the number of pixels in the CelebA images. Glow uniformly outperforms DCGAN and LASSO across the entire range of the number of measurements. LASSO-DCT and LASSO-WVT eventually catch up to Glow, but only when the observed measurements are a significant fraction of the total number of pixels. On the other hand, DCGAN is initially better than LASSO but saturates prematurely due to its limited representation capacity.
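A per-pixel reconstruction error of this kind can be sketched as follows. The exact normalization is an assumption: the sketch uses the squared Euclidean distance between the recovered and reference images divided by the number of pixels, and the names `reconstruction_error` and `mean_error_by_measurements` are hypothetical.

```python
def reconstruction_error(recovered, reference):
    """Assumed form: squared l2 distance between two images divided by the pixel count."""
    if len(recovered) != len(reference):
        raise ValueError("images must have the same number of pixels")
    n = len(reference)
    return sum((a - b) ** 2 for a, b in zip(recovered, reference)) / n


def mean_error_by_measurements(results):
    """Average the error per number of measurements.

    `results` maps a number of measurements to a list of
    (recovered, reference) image pairs; keys are returned in sorted order.
    """
    return {
        m: sum(reconstruction_error(r, x) for r, x in pairs) / len(pairs)
        for m, pairs in sorted(results.items())
    }
```

The second helper produces one averaged value per number of measurements, which is the kind of curve plotted for each method.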
Recall that natural face images correspond to latent codes of smaller norm. In Figure 20, we plot the norm of the latent codes of the iterates of each algorithm versus the number of iterations. The central message is that initializing with a smaller norm tends to yield natural recoveries, that is, recoveries with smaller latent representations. This is one explanation for why, in compressed sensing, one is able to obtain the true solution out of the affine space of solutions without penalizing the unnaturalness of the recoveries.
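A linear toy analogue may help to illustrate this effect. The sketch below is not the Glow model and uses plain gradient descent rather than LBFGS: it minimizes a least-squares misfit for an underdetermined linear system, where it is a standard fact that gradient descent started at zero converges to the minimum-norm solution among all solutions that fit the measurements. The matrix, measurements, step size, and function names are illustrative assumptions.

```python
import math


def matvec(A, z):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, z)) for row in A]


def gradient_descent(A, y, z0, step=0.05, iters=200):
    """Minimize 0.5 * ||A z - y||^2; return the final z and the norm of every iterate."""
    z = list(z0)
    norms = [math.sqrt(sum(v * v for v in z))]
    for _ in range(iters):
        residual = [p - t for p, t in zip(matvec(A, z), y)]
        # The gradient of the misfit is A^T (A z - y).
        grad = [
            sum(A[i][j] * residual[i] for i in range(len(A)))
            for j in range(len(z))
        ]
        z = [v - step * g for v, g in zip(z, grad)]
        norms.append(math.sqrt(sum(v * v for v in z)))
    return z, norms


# One measurement of a three-dimensional unknown: infinitely many exact solutions.
A = [[1.0, 2.0, 2.0]]
y = [9.0]
z_zero, norms_zero = gradient_descent(A, y, [0.0, 0.0, 0.0])
z_far, norms_far = gradient_descent(A, y, [2.0, -1.0, 0.0])
```

Both runs fit the measurement, but the run started at zero ends at the solution of smallest norm, while the run started away from zero retains the component of its initialization that the measurements cannot see.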
We now present visual recovery results on test images from the CelebA dataset under a varying number of measurements in compressed sensing. We compare recoveries under the Glow prior, the DCGAN prior, LASSO-DCT, and LASSO-WVT.
6.3.1 Compressed Sensing on Flower and Bird Dataset
We also performed compressed sensing experiments, similar to those on the CelebA dataset above, on the Birds and Flowers datasets. We trained a Glow invertible network for each dataset, and present below the quantitative and qualitative recoveries for each.
6.3.2 Compressed Sensing on Out of Distribution Images
The lack of representation error in invertible nets leads us to an important and interesting question: does the trained network fit related natural images that are underrepresented or even unrepresented in the training dataset? Specifically, can a Glow network trained on CelebA faces be a good prior on other faces, for example, those with a dark skin tone, faces with glasses or facial hair, or even animated faces? In general, our experiments show that the Glow prior performs very well on such out-of-distribution images that are semantically similar to celebrity faces but not representative of the CelebA dataset. In particular, we have been able to recover faces of darker skin tone, older people with beards, eastern women, men with hats, and animated characters such as Shrek, from compressed measurements under the Glow prior. Recoveries under the Glow prior convincingly beat those under the DCGAN prior, which shows a definite bias due to training. Moreover, the Glow prior also outperforms unbiased methods such as LASSO-DCT and LASSO-WVT.
Can we expect the Glow prior to continue to be an effective proxy for arbitrarily out-of-distribution images? To answer this question, we tested arbitrary natural images, such as a car, a house door, and butterfly wings, that are semantically unrelated to CelebA images. In general, we found that Glow is an effective prior for compressed sensing of those out-of-distribution natural images that are assigned a high likelihood score (small-normed latent representations). On these images, Glow also outperforms LASSO.
Recoveries of natural images that are assigned very low likelihood scores by the Glow model generally run into instability issues. During training, invertible nets learn to assign high likelihood scores to the training images. All the network parameters, such as the scaling in the coupling layers of the Glow network, are learned so that the network behaves stably on such high-likelihood representations. However, on very low-likelihood representations, unseen during the training process, the network becomes unstable and its outputs begin to diverge to very large values; this may be due to several reasons, such as normalization (scaling) layers not being tuned to the unseen representations. An LBFGS search for the solution of an inverse problem to recover a low-likelihood image leads the iterates into neighborhoods of low-likelihood representations, where the network may become unstable.
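One possible mechanism behind such divergence can be sketched with a single affine coupling layer. The sketch is illustrative only: the function `scale_and_shift` is a hypothetical stand-in for the learned network inside a coupling layer, and the exponential scale parameterization is one common choice that is assumed here, not necessarily the one used in our trained model.

```python
import math


def scale_and_shift(x_a):
    """Hypothetical stand-in for the learned map from one half to (log-scale, shift)."""
    log_s = [0.5 * v for v in x_a]
    t = [0.1 * v for v in x_a]
    return log_s, t


def coupling_forward(x_a, x_b):
    """Affine coupling: x_a passes through, x_b is scaled and shifted using x_a."""
    log_s, t = scale_and_shift(x_a)
    y_b = [b * math.exp(s) + sh for b, s, sh in zip(x_b, log_s, t)]
    return x_a, y_b


def coupling_inverse(y_a, y_b):
    """Exact inverse of coupling_forward, recomputing the same scale and shift."""
    log_s, t = scale_and_shift(y_a)
    x_b = [(b - sh) * math.exp(-s) for b, s, sh in zip(y_b, log_s, t)]
    return y_a, x_b


# Moderate inputs: the layer is well behaved and exactly invertible.
_, y_b = coupling_forward([0.3, -0.2], [1.0, 2.0])
_, x_b = coupling_inverse([0.3, -0.2], y_b)

# An input far outside the moderate range: the scale grows like exp(10).
_, y_large = coupling_forward([20.0], [1.0])
```

In this toy layer the scale depends exponentially on the conditioning half, so inputs far from the range on which the layer is well behaved are amplified by several orders of magnitude; stacking many such layers would compound the effect.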
We find that the Glow network tends to assign high likelihood scores to arbitrarily out-of-distribution natural images. This means that invertible networks have at least partially learned something more general about natural images from the CelebA dataset, perhaps some high-level features that face images share with other natural images, such as smooth regions followed by discontinuities. This allows the Glow prior to extend its effectiveness to other natural images beyond just the training set.