1 Introduction
Recently the field of inverse problems has seen a growing usage of mathematically only partially understood learned (e.g. Lunz et al. (2018); Adler and Öktem (2018); Hauptmann et al. (2018); Yang et al. (2018); Bora et al. (2017)) and nonlearned (e.g. Venkatakrishnan et al. (2013); Romano et al. (2017); Ulyanov et al. (2018); Veen et al. (2018); Mataev et al. (2019); Dittmer et al. (2018)) priors.
A key challenge of many of these approaches, especially the learned ones, lies in the fact that it is often hard or even impossible to guarantee data consistency for them i.e. that the reconstruction is consistent with the measurement. More formally: Given a continuous forward operator and a noisy measurement , where some noise, could a computed reconstruction have created the measurement , given the noise level . We call the set of reconstructions that fulfill this property the set of valid solutions and denote it by .
In this paper we present an approach to reconstruction problems that guarantees data consistency by design. We start by reexamining the core challenge of reconstruction problems. We then use the gained insights to propose the projectional method (PM) as a very general framework to tackle inverse problems. A key challenge of the PM lies in the calculation of the projection into the set of valid solutions, . We analyze and derive a projection algorithm to calculate
(1) 
for being a linear continuous operator. This class of operators includes a wide variety of forward operators like blurring, the Radon transform which plays a prominent role in medical imaging, denoising (the identity), compressed sensing, etc., see e.g. Engl et al. (1996); Eldar and Kutyniok (2012). We also compute the derivative, , of via the implicit function theorem, thereby allowing for the usage of as a layer within an endtoend trained model.
We then, in Section 3, demonstrate how the PM not only compares to the recently proposed Regularization by Denoising (RED) (Romano et al. (2017)), but also how RED can be seen as a relaxation of PM. In Section 4 we demonstrate how one can apply the projectional approach to Deep Image Prior (DIP) (Ulyanov et al. (2018)) to avoid the often observed degradation of its reconstructions over time, alleviating the need for early stopping. Finally, in Section 5
, we present a novel neural network architecture based on the PM. The architecture allows for a principled incorporation of prior knowledge from data, while still guaranteeing data consistency – despite an endtoend training. Summary of contributions in order:

A projectional approach to reconstruction that guarantees data consistency.

The derivation of a neural network layer that projects into the set of valid solutions.

Interpretation of RED as an approximation to the projectional method (PM).

Numerical comparison/application of the approach to RED and DIP.

A novel neural network architecture that allows for a principled incorporation of prior knowledge, while still guaranteeing data consistency.
2 A Projectional Ansatz
2.1 Motivation and Idea
We begin with some definitions and notation which we will use throughout the paper:

Let and denote real Hilbert spaces and a continuous linear operator.

For a given we define .

Let further be a given noise level and be a noisy measurement, where the classical assumption is with some random noise (Engl et al. (1996)).

Let be a nonempty set, called the set of plausible solutions (e.g. sparse or smooth elements or even natural images). For all measurements we will assume .

As discussed above, we informally define the set of valid solutions as the set such that the elements of would “explain” the measurement , given the operator and the noise level . A formal discussion will follow in Section 2.2.
Given these definitions we can define the central object of this paper: the set , which we call the set of valid and plausible solutions. The main goal of this paper is to find an element of , which we assume to be nonempty.
If we assume and to be closed and convex or to have one of many much weaker properties, see Gubin et al. (1967); Bauschke and Borwein (1993); Lewis and Malick (2008); Lewis et al. (2009); Drusvyatskiy et al. (2015), etc., we can use von Neumann’s alternating projection algorithm (Bauschke and Borwein (1993)) to accomplish the task of finding an element in . The algorithm is given by the alternating projections into the two sets and and returns in a point in their intersection, i.e.
(2) 
for all , we always simply set . For a visual representation of the algorithm see Figure 7 in the appendix. In this paper we use this alternating pattern to tackle reconstruction problems. Therefore we rely on having the projections and . Since is task dependent, we begin by analyzing the set of valid solutions and its projection .
2.2 The Set of Valid Solutions
In this subsection we motivate a definition of the set of valid solutions and analyze it. The classical assumption for inverse problems is that one has for some noise and a noise level . This could motivate the definition of the set of valid solutions as
(3) 
But, considering the fact that is usually assumed to be “unstructured noise” (e.g. Gaussian) and is usually assumed to be high (e.g. images) or even infinitedimensional (e.g. functions), we can utilize the principle of concentration of measure (Talagrand (1996)) to conclude that , with increasing accuracy for higher dimensions. This motivates us to define
(4) 
Note that this set is closed and convex since is continuous. Furthermore, since is linear, is also convex. This makes a projection onto well defined, as well as onto up to a null set. We can rewrite the set as
(5) 
where is positive semidefinite. If was a positive definite operator could be characterized by a hyperellipsoid. The ”semi“ causes it to be a product space of a dimensional hyperellipsoid with a
dimensional vector space
– think of an elliptical dimensional cookie cutter cutting a dimensional cookie, for an illustration see Figure 7 in the appendix.We now discuss how to calculate the projection and turn it into a neural network layer.
2.3 Projection into the Valid Solutions as a Layer
In this subsection we first discuss how to calculate and then how to calculate its derivative. The calculation of its derivative is of interest, since we want to use the projection within a neural network. It would not be feasible to calculate the derivative via some automatic differentiation package, since the calculation of the projection is iterative, which could quickly cause the process to exceed the memory of the machine.
We begin by deriving an algorithm to calculate the projection into i.e. how to solve the constrained optimization problem given by Expression (1). We can use the method of Lagrangian multipliers to rewrite the expression as the optimization problem
(6) 
where
(7) 
Since not only depends on but also on via the equality
(8) 
we propose the following alternating algorithm:

Set an initial .

Solve .

Adjust via Newton step on .

Repeat 1. and 2. until some stopping criterion is reached.
To solve step 1. we can calculate
(9) 
which leads to
(10) 
This can be nicely solved via the efficient conjugate gradient method (CGmethod) (Hestenes and Stiefel (1952)), which does not have to calculate or even hold it in memory. This can, especially for shortfat matrices (like used in compressed sensing (Eldar and Kutyniok (2012))), be a significant computational advantage.
To solve step 2. we need to calculate
(11) 
where we can set and use
(12) 
to calculate
Also all these applications of can efficiently be calculated with the CGmethod. The above calculations can be used to flesh out the projection algorithm in more detail, see Algorithm 1. In practice we stop the computation for being , where the current reconstruction. For a complexity analysis of the algorithm see Figure 15 in the appendix.
The algorithm allows us, given the projection , to solve the inverse problem in the sense of finding a valid and plausible solution via Expression (2).
To use
as a layer in a neural network and to incorporate it in a backpropagation training process we have to be able to calculate its derivative. As already mentioned, using automatic differentiation may not be feasible, due to the possibility huge memory requirements caused by the iterative nature of Algorithm
1 (since this effectively would have to be realized via several subsequent layers).To overcome this problem we now utilize the implicit function theorem (Krantz and Parks (2012)) to calculate the derivative of in a way that is also agnostic of the specific algorithm used to calculate the projection itself. This allows the backpropagation procedure to ignore the inner calculations of and treat it essentially as a black box procedure. To be more specific, given and , we want to calculate .
Since, due to Expression (9), we have
(13)  
(14) 
we can calculate the relevant via
(15) 
completely agnostic of the specific algorithm used in the forward propagation to calculate .
We can now use the implicit function theorem and, considering Expression (14), obtain
(16) 
We are now able to calculate , which is necessary of the backpropagation, where is the gradient flowing downwards from the layer above again – like in Algorithm 1 – this can be done efficiently via the CGmethod, since is symmetric and positively definite.
2.4 Interpretation
In this subsection we discuss interpretations of the PM and its parts and how it relates to variations of the classical regularization
(17) 
for some regularization parameter and some linear continuous operator , often simply (Engl et al. (1996)).
We start by interpreting in the context. Usually the regularization parameter is chosen according to the Morozov’s Discrepancy Principle (Morozov (2012)), which, in its classical form states that should be chosen such that (Scherzer (1993))
(18) 
This leads to the nice interpretation that, for , we have
(19) 
i.e. is chosen according to Morozov.
A further similarity to the regularization becomes apparent, when stating the PM via the expression
(20) 
where . This looks similar to a nonlinear version of the nonstationary iterated Tikhonov regularization which is given via
(21) 
where also and some predetermined sequence (Hank and Groetsch (1998)).
3 Relation to Regularization by Denoising (RED)
In this section we want to discuss the connection of the PM with Regularization by Denoising (RED) (Romano et al. (2017)). It could be argued, that the RED (as a direct “descendent” of PlugandPlay priors (Venkatakrishnan et al. (2013))) and Deep Image Prior (DIP), which we discuss in Section 4, are two of the most relevant recent approaches to regularization.
RED is given by the minimization of the somewhat strange, for technical reasons chosen, functional
(22) 
where is an arbitrary denoiser such that

is (locally) positively homogeneous of degree , i.e. for and

is strongly passive, i.e. the spectral radius of .
Both of these properties one associates with linear projections.
An algorithm to solve the minimization problem is given in Romano et al. (2017) and can be expressed via
(23) 
for an initial and some fixed .
This means that, if we associate with , RED as stated in Expression (23), can be seen as a relaxation (setting constant) of the PM, as stated in Expression (20)).
We compared RED and PM of the course of the reconstruction process, see Figure 1, where we use the BM3D denoiser (Dabov et al. (2006)) as and as an approximation for . For our numerical experiments we set to be the (underdetermined) Radon transform (Helgason and Helgason (1999)) with angles, use the classical SheppLogan phantom (Shepp and Logan (1974)) as the ground truth and set our noise level to , and of the norm of . We ran each of the experiment times (different noise), with similar results to the ones shown in Figure 1. We see that the PM does not outperform RED for all noise levels in the Peak signaltonoise ratio (PSNR), but in the Structural Similarity Index (SSIM) and also in the error of vs , where the reconstruction. Additionally the reconstruction process runs smoother for the PM than for RED and does not seem to degrade at high noise levels over time. The regularization parameter for RED were determined via an exponential line search for the best final reconstruction. The PM required no parameter. The iterations for RED and for the PM took both on the order of one hour on an Intel i7 CPU with 2.6GHz. You can find the last reconstructions for a noise level of in Figure 3 and comprehensive comparison of reconstructions in in the appendix, Figure 9, 10 and 11.
4 Application to Deep Image Prior (DIP)
In this section we discuss how one can implement the central idea of finding an element in for the case where is given by all possible reconstructions from a Deep Image Prior (DIP).
The idea of using DIP to solve inverse problems, as described by Veen et al. (2018), is to minimize the functional
(24) 
with regard to , where is an untrained neural network with a fixed random input and parameters . The reconsturction is then given via , i.e. the output of the network.
We propose the following simple modification to the DIP functional based on the idea to find a valid and plausible reconstruction. Specifically we propose
(25) 
Here the reconstruction is given by after the minimization.
We now numerically compare this modified functional with in the same setup as used in Section 3 for RED and the PM based on BM3D. As the DIP network we use the “skip net” UNet described in the original DIP paper (Ulyanov et al. (2018)) and Adam (Kingma and Ba (2014)) with standard settings (of PyTorch), which we found to work best for the DIP reconstructions. The results can be found in Figure 4. Like in Section 3 we compare the PSNR, SSIM and , where is the noise free measurement of the ground truth and the noise free measurement of the reconstruction. We plot the reconstruction over time, not over the number of iterations, to clearly show that even over relativly long time scales the solutions of do not exhibit signs of much deterioration. We ran the experiments in parallel on two Nvidia GeForce GTX 1080 ti. You can find the last reconstructions for a noise level of in Figure 3 and comprehensive comparison of reconstructions in in the appendix, Figure 12, 13 and 14. All experiments reached approximately iterations over the course of the hours of the experiment. We find that the modified functional shows much less deterioration of the reconstruction overtime and outperforms the vanilla DIP (except for short spikes) in all our metrics.
5 von Neumann Projection Architecture
In this section we discuss how one can learn, in an endtoend fashion, a replacement for , where an arbitrary set. For this we heavily rely on the fact that we can calculate the gradients of and can therefore use it as a layer (see Section 2.3).
Based on Expression (2) we propose the following neural network architecture:
(26) 
where the are neural networks, e.g. UNets (Ronneberger et al. (2015)
) or autoencoders (
Ng (2011)), for simplicity we set .This type of alternating architecture, which we call von Neumann projection architecture (vNPA), allows one to use the PM, while incorporating prior information from data in a highly customized manner. This is in stark contrast to the use of usually quite general PlugandPlay priors (Venkatakrishnan et al. (2013); Sreehari et al. (2016)), which are not specifically adapted to the reconstruction task. Unlike other endtoend approaches this architecture, using , guarantees the data consistency of the reconstruction.
We now demonstrate the approach on a toy example. For that we set to be a set of low frequency functions , which for some random turn continuously into linear functions such that , see Figure 8 in the appendix for example plots and Python code that creates random instances of these functions in the form of vectors of the length . One could try to design a handcrafted prior for these arbitrarily chosen functions, but it would be easier to simply learn one. As the forward operator,
, we use a typical compressed sensing operator i.e. the entries are samples from a standard Gaussian distribution.
We use the vNPA as described in Expression (26), with . Each is a separate vanilla autoencoder with the widths
, where all layers up to the last one are vanilla ReLU layer, the last one being a vanilla Sigmoid layer. We trained it with a batch size of
and Adam (Kingma and Ba (2014)) at learning rate of for batches over the course of three hours on an Nvidia GeForce GTX 960M.Example reconstructions via the network compared to reconstructions (the regularization parameter optimized such that it minimizes the error to the ground truth) can be found in Figure 5. Each took less than half a second to compute. The network produces clearly better reconstruction than the optimized regularization.
6 Conclusion
We proposed a projectional approach of optimizing for an element that is valid and plausible given the operator, measurement and noise level. We find that the approach is fruitful and widely applicable. Specifically we demonstrated its applicability on the one hand via very general PlugandPlay priors like BM3D and DIP and on the other hand via highly task specific learned priors via the von Neumann projection architecture.
We also show how the approach can be connected to the well studied iterated Tikhonov reconstructions, how it allows for an interpretation of the somewhat strange, but highly effective RED functional and how it can be used to stabilize and improve DIP reconstructions.
References
 Adler and Öktem [2018] J. Adler and O. Öktem. Learned primaldual reconstruction. IEEE transactions on medical imaging, 2018.
 Anonymous [2019] Anonymous. Projection layer implementation, 2019.
 Bauschke and Borwein [1993] H. H. Bauschke and J. M. Borwein. On the convergence of von neumann’s alternating projection algorithm for two sets. SetValued Analysis, 1993.

Bora et al. [2017]
A. Bora, A. Jalal, E. Price, and A. G. Dimakis.
Compressed sensing using generative models.
In
Proceedings of the 34th International Conference on Machine LearningVolume 70
. JMLR. org, 2017.  Dabov et al. [2006] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising with blockmatching and 3d filtering. In Image Processing: Algorithms and Systems, Neural Networks, and Machine Learning. International Society for Optics and Photonics, 2006.
 Dittmer et al. [2018] S. Dittmer, T. Kluth, P. Maass, and D. Oterio Baguer. Regularization by architecture: A deep prior approach for inverse problems. arXiv preprint arXiv:1812.03889, 2018.
 Drusvyatskiy et al. [2015] D. Drusvyatskiy, A. D. Ioffe, and A. D. Lewis. Transversality and alternating projections for nonconvex sets. Foundations of Computational Mathematics, 2015.
 Eldar and Kutyniok [2012] Y. C. Eldar and G. Kutyniok. Compressed sensing: theory and applications. Cambridge University Press, 2012.
 Engl et al. [1996] H. W. Engl, M. Hanke, and A. Neubauer. Regularization of inverse problems. Springer Science & Business Media, 1996.
 Gubin et al. [1967] L. G. Gubin, B. T. Polyak, and E. V. Raik. The method of projections for finding the common point of convex sets. USSR Computational Mathematics and Mathematical Physics, 1967.
 Hank and Groetsch [1998] M. Hank and C. W. Groetsch. Nonstationary iterated tikhonov regularization. Journal of Optimization Theory and Applications, 1998.
 Hauptmann et al. [2018] A. Hauptmann, F. Lucka, M. Betcke, N. Huynh, J. Adler, B. Cox, P. Beard, S. Ourselin, and S. Arridge. Modelbased learning for accelerated, limitedview 3d photoacoustic tomography. IEEE transactions on medical imaging, 2018.
 Heath [2018] M. T. Heath. Scientific computing: an introductory survey. SIAM, 2018.
 Helgason and Helgason [1999] S. Helgason and S. Helgason. The radon transform. Springer, 1999.
 Hestenes and Stiefel [1952] M. R. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. NBS Washington, DC, 1952.
 Kingma and Ba [2014] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 Krantz and Parks [2012] S. G. Krantz and H. R. Parks. The implicit function theorem: history, theory, and applications. Springer Science & Business Media, 2012.
 Lewis and Malick [2008] A. S. Lewis and J. Malick. Alternating projections on manifolds. Mathematics of Operations Research, 2008.
 Lewis et al. [2009] A. S. Lewis, D. R. Luke, and J. Malick. Local linear convergence for alternating and averaged nonconvex projections. Foundations of Computational Mathematics, 2009.
 Lunz et al. [2018] S. Lunz, C. Schoenlieb, and O. Öktem. Adversarial regularizers in inverse problems. In Advances in Neural Information Processing Systems, 2018.
 Mataev et al. [2019] G. Mataev, M. Elad, and P. Milanfar. Deepred: Deep image prior powered by red. arXiv preprint arXiv:1903.10176, 2019.
 Morozov [2012] V. A. Morozov. Methods for solving incorrectly posed problems. Springer Science & Business Media, 2012.
 Ng [2011] A. Ng. Sparse autoencoder. CS294A Lecture notes, 2011.
 Paszke et al. [2017] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, A. Luca, and A. Lerer. Automatic differentiation in PyTorch. In NIPS Autodiff Workshop, 2017.
 Romano et al. [2017] Y. Romano, M. Elad, and P. Milanfar. The little engine that could: Regularization by denoising (red). SIAM Journal on Imaging Sciences, 2017.
 Ronneberger et al. [2015] O. Ronneberger, P. Fischer, and T. Brox. Unet: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computerassisted intervention. Springer, 2015.
 Saad [2003] Y. Saad. Iterative methods for sparse linear systems. siam, 2003.
 Scherzer [1993] O. Scherzer. The use of morozov’s discrepancy principle for tikhonov regularization for solving nonlinear illposed problems. Computing, 1993.
 Shepp and Logan [1974] L. A. Shepp and B. F. Logan. The fourier reconstruction of a head section. IEEE Transactions on nuclear science, 1974.

Sreehari et al. [2016]
S. Sreehari, S. V. Venkatakrishnan, B. Wohlberg, G. T. Buzzard, L. F. Drummy,
J. P. Simmons, and C. A. Bouman.
Plugandplay priors for bright field electron tomography and sparse interpolation.
IEEE Transactions on Computational Imaging, 2016. 
Talagrand [1996]
M. Talagrand.
A new look at independence.
The Annals of probability
, 1996. 
Ulyanov et al. [2018]
D. Ulyanov, A. Vedaldi, and V. Lempitsky.
Deep image prior.
In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
, 2018.  Veen et al. [2018] D. Van Veen, A. Jalal, E. Price, S. Vishwanath, and A. G. Dimakis. Compressed sensing with deep image prior and learned regularization. arXiv preprint arXiv:1806.06438, 2018.
 Venkatakrishnan et al. [2013] S. V. Venkatakrishnan, C. A. Bouman B., and Wohlberg. Plugandplay priors for model based reconstruction. In 2013 IEEE Global Conference on Signal and Information Processing. IEEE, 2013.
 Yang et al. [2018] G. Yang, S. Yu, H. Dong, G. Slabaugh, P. L. Dragotti, X. Ye, F. Liu, S. Arridge, J. Keegan, and Y. Guo. Dagan: deep dealiasing generative adversarial networks for fast compressed sensing mri reconstruction. IEEE transactions on medical imaging, 2018.
Appendix
The complexity of Algorithm 1
, can simply be estimated via the complexity of the Newton method,
(Heath [2018]), where is the required precision and is the complexity of the CGmethod applied to . Assuming is the condition number of the complexity of the CGmethod is (Saad [2003]). This results in an overall complexity of(27) 