1 Introduction
In this paper, we are interested in neural network based solutions of inverse problems of the form

(1) $y_\delta = \mathbf{F}(x) + \xi_\delta$.

Here $\mathbf{F} \colon \operatorname{dom}(\mathbf{F}) \subseteq \mathbb{X} \to \mathbb{Y}$ is a potentially nonlinear operator between Banach spaces $\mathbb{X}$ and $\mathbb{Y}$, $y_\delta \in \mathbb{Y}$ are the given noisy data, $x \in \mathbb{X}$ is the unknown to be recovered, $\xi_\delta \in \mathbb{Y}$ is the unknown noise perturbation and $\delta \ge 0$ indicates the noise level. Numerous image reconstruction problems, parameter identification tasks and geophysical applications can be stated as such inverse problems [8, 27, 19, 34]. Particular challenges in solving inverse problems are the non-uniqueness of the solutions and the instability of the solutions with respect to the given data. To overcome these issues, regularization methods are needed; they serve as criteria for selecting specific solutions and at the same time stabilize the inversion process.
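The instability mentioned above can be seen in a small numerical experiment. The following numpy sketch uses a synthetic, ill-conditioned forward matrix (a placeholder, not one of the cited applications) and compares naive inversion with classical Tikhonov regularization:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic ill-posed problem: a forward matrix with rapidly decaying singular
# values, so that naive inversion amplifies the data noise dramatically.
n = 30
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 0.5 ** np.arange(n)                      # decaying singular values
F = U @ np.diag(s) @ V.T

x_true = V[:, 0]                             # unknown to be recovered
delta = 1e-3                                 # noise level
y_delta = F @ x_true + delta * rng.standard_normal(n)

x_naive = np.linalg.solve(F, y_delta)        # unregularized inversion
alpha = delta                                # simple a priori parameter choice
x_tik = np.linalg.solve(F.T @ F + alpha * np.eye(n), F.T @ y_delta)

err_naive = np.linalg.norm(x_naive - x_true)
err_tik = np.linalg.norm(x_tik - x_true)     # much smaller than err_naive
```

The regularization term trades a small bias for a dramatic reduction of the noise amplification caused by the small singular values.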
Reconstruction with learned regularizers
One of the most established classes of methods for solving inverse problems is variational regularization, where regularized solutions are defined as minimizers of the Tikhonov functional [27, 17, 29]

(2) $\mathcal{T}_{\alpha, y_\delta}(x) \coloneqq \mathcal{D}(\mathbf{F}(x), y_\delta) + \alpha \mathcal{R}(x)$.

Here $\mathcal{D} \colon \mathbb{Y} \times \mathbb{Y} \to [0, \infty]$ is a distance-like function measuring closeness to the data, $\mathcal{R} \colon \mathbb{X} \to [0, \infty]$ a regularization term enforcing regularity of the minimizer, and $\alpha > 0$ is the regularization parameter. In the case that $\mathcal{D}$ and the regularizer $\mathcal{R}$ are defined by Hilbert space norms, (2) is classical Tikhonov regularization, for which the theory is quite complete [8, 11]. In particular, in this case convergence rates, which are quantitative estimates for the distance between the true and regularized solutions, are well known. Convergence rates for nonconvex regularizers are derived in [9]. Typical regularization techniques are based on simple hand-crafted regularization terms such as the total variation or quadratic Sobolev norms on some function space. However, these regularizers are quite simplistic and might not reflect well the actual complexity of the underlying class of functions. Therefore, it has recently been proposed [25] and analyzed in [14] to use machine learning to construct regularizers in a data-driven manner. The strategy in [14] is to construct a data-driven regularizer via the following consecutive steps:
(T1) Choose a family of desired reconstructions $(x_i)_{i=1}^{N} \in \mathbb{X}^N$.

(T2) For some $i \in \{1, \dots, N\}$, construct undesired reconstructions $z_i \in \mathbb{X}$.

(T3) Choose a class of functions (networks) $(\boldsymbol{\Phi}_\theta)_{\theta \in \Theta}$.

(T4) Determine $\theta^* \in \Theta$ with $\boldsymbol{\Phi}_{\theta^*}(x_i) \approx x_i$ and $\boldsymbol{\Phi}_{\theta^*}(z_i) \approx x_i$.

(T5) Define $\mathcal{R}(x) \coloneqq \psi(x, \boldsymbol{\Phi}_{\theta^*}(x))$ for some functional $\psi$.
For imaging applications, the function class $(\boldsymbol{\Phi}_\theta)_{\theta \in \Theta}$ can be chosen as convolutional neural networks, which have been demonstrated to give powerful classes of mappings between image spaces. The function $\psi$ measures the distance between a potential reconstruction and the output of the network, and possibly adds additional regularization. According to the training strategy in item (T4), the value of the regularizer will be small if the reconstruction is similar to elements in $(x_i)_i$ and large for elements in $(z_i)_i$. A simple example that we will use for our numerical results is the learned regularizer $\mathcal{R}(x) = \|x - \boldsymbol{\Phi}_{\theta^*}(x)\|^2$. Convergence analysis and convergence rates for NETT as well as training strategies have been established in [10, 14, 20]. A different training strategy for learning a regularizer has been proposed in [15, 18]. Note that learning the regularizer first and then minimizing the Tikhonov functional is different from variational and iterative networks [2, 3, 7, 12, 33], where an iterative scheme is applied to unroll the functional, which is then trained in an end-to-end fashion. Training the regularizer first has the advantage of being more modular; further, the network training is independent of the forward operator $\mathbf{F}$. Moreover, it enables a convergence analysis as the noise level tends to zero and therefore comes with theoretical recovery guarantees.
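As an illustration of the training strategy above, the following numpy sketch trains a toy linear "network" $\Phi(x) = Wx$ so that the learned regularizer $\|x - \Phi(x)\|^2$ is small on desired and large on undesired reconstructions. The data, the linear architecture and the training loop are illustrative placeholders, not the setup used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the consecutive steps: desired reconstructions x_i,
# undesired reconstructions z_i (x_i corrupted by noise), and a linear
# "network" Phi(x) = W x trained so that Phi(x_i) ~ x_i and Phi(z_i) ~ x_i.
d, N = 16, 200
X = rng.standard_normal((d, N))        # desired reconstructions
Z = X + rng.standard_normal((d, N))    # undesired reconstructions

W = np.zeros((d, d))                   # parameters of the network class
lr = 1e-3
for _ in range(500):                   # training: plain gradient descent
    grad = (W @ X - X) @ X.T + (W @ Z - X) @ Z.T
    W -= lr / N * grad

def regularizer(x):
    # Learned regularizer: R(x) = ||x - Phi(x)||^2.
    return np.sum((x - W @ x) ** 2)

r_desired = np.mean([regularizer(X[:, i]) for i in range(N)])
r_undesired = np.mean([regularizer(Z[:, i]) for i in range(N)])
# r_desired comes out clearly smaller than r_undesired
```

The point of the sketch is only the mechanism: after training, the regularizer penalizes reconstructions that resemble the undesired class more heavily than those resembling the desired class.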
Discrete NETT
The existing analysis of NETT considers minimizers of the Tikhonov functional (2) with a regularizer of the above form before discretization, typically in an infinite dimensional setting. However, in practice, only finite dimensional approximations of the unknown, the operator and the neural network are given. To address these issues, in this paper we study discrete NETT regularization, which considers minimizers of

(3) $\mathcal{T}_{\alpha, n, y_\delta}(x) \coloneqq \mathcal{D}(\mathbf{F}_n(x), y_\delta) + \alpha \mathcal{R}_n(x) \quad \text{for } x \in \mathbb{X}_n$.

Here $(\mathbb{X}_n)_{n \in \mathbb{N}}$, $(\mathbf{F}_n)_{n \in \mathbb{N}}$ and $(\mathcal{R}_n)_{n \in \mathbb{N}}$ are families of subspaces $\mathbb{X}_n \subseteq \mathbb{X}$, mappings $\mathbf{F}_n \colon \mathbb{X}_n \to \mathbb{Y}$ and regularizers $\mathcal{R}_n \colon \mathbb{X}_n \to [0, \infty]$, respectively, which reflect the discretization of all involved operations. We present a full convergence analysis as the noise level $\delta$ converges to zero and $n$ and $\alpha$ are chosen accordingly. A related convergence analysis has been presented in [24] for the case that $\mathcal{D}$ is the norm distance in a Hilbert space and the convex regularizer is taken fixed. However, in the case of discrete NETT regularization it is natural to consider the case where the regularizer depends on the discretization, as it is learned in a discretized setting based on actual data.
Outline
The convergence analysis, including convergence rates, is presented in Section 2. In Section 3 we present numerical results for a nonstandard limited data problem in photoacoustic tomography that can be considered as a simultaneous inpainting and artifact removal problem. We conclude the paper with a short summary and conclusions in Section 4.
2 Convergence analysis
In this section we study the convergence of (3) and derive convergence rates.
2.1 Wellposedness
First we state the assumptions that we will use for wellposedness (existence and stability of minimizers of the NETT functional).
Assumptions 2.1 (Conditions for wellposedness).

(W1) $\mathbb{X}$ and $\mathbb{Y}$ are Banach spaces, $\mathbb{X}$ is reflexive, and $\operatorname{dom}(\mathbf{F}) \subseteq \mathbb{X}$ is weakly sequentially closed.

(W2) The distance measure $\mathcal{D} \colon \mathbb{Y} \times \mathbb{Y} \to [0, \infty]$ satisfies:

(W2a) $\exists C \ge 1\ \forall y_0, y_1, y_2 \in \mathbb{Y} \colon \mathcal{D}(y_0, y_2) \le C\,(\mathcal{D}(y_0, y_1) + \mathcal{D}(y_1, y_2))$.

(W2b) $\forall y_0, y_1 \in \mathbb{Y} \colon \mathcal{D}(y_0, y_1) = 0 \Leftrightarrow y_0 = y_1$.

(W2c) $\forall y \in \mathbb{Y}\ \forall (y_k)_{k \in \mathbb{N}} \colon y_k \to y \Rightarrow \mathcal{D}(y, y_k) \to 0$.

(W2d) $\forall y \in \mathbb{Y}\ \forall (y_k)_{k \in \mathbb{N}} \colon \mathcal{D}(y, y_k) \to 0 \Rightarrow y_k \to y$.

(W2e) $\mathcal{D}(\cdot, y)$ is weakly sequentially lower semicontinuous (wslsc) for all $y \in \mathbb{Y}$.

(W3) $\mathcal{R} \colon \mathbb{X} \to [0, \infty]$ is proper and wslsc.

(W4) $\mathbf{F} \colon \operatorname{dom}(\mathbf{F}) \subseteq \mathbb{X} \to \mathbb{Y}$ is weakly sequentially continuous.

(W5) For all $y \in \mathbb{Y}$, $\alpha > 0$ and $C > 0$, the set $\{x \colon \mathcal{T}_{\alpha, y}(x) \le C\}$ is nonempty and bounded.

(W6) $(\mathbb{X}_n)_{n \in \mathbb{N}}$ is a sequence of subspaces of $\mathbb{X}$.

(W7) $(\mathbf{F}_n)_{n \in \mathbb{N}}$ is a family of weakly sequentially continuous mappings $\mathbf{F}_n \colon \mathbb{X}_n \to \mathbb{Y}$.

(W8) $(\mathcal{R}_n)_{n \in \mathbb{N}}$ is a family of proper wslsc regularizers $\mathcal{R}_n \colon \mathbb{X}_n \to [0, \infty]$.

(W9) For all $y \in \mathbb{Y}$, $\alpha > 0$, $C > 0$ and $n \in \mathbb{N}$, the set $\{x \in \mathbb{X}_n \colon \mathcal{T}_{\alpha, n, y}(x) \le C\}$ is nonempty and bounded.
Conditions (W2)–(W5) are quite standard for Tikhonov regularization in Banach spaces to guarantee the existence and stability of minimizers of the Tikhonov functional, and the given conditions are similar to [9, 10, 14, 20, 23, 27, 30]. In particular, (W2) describes the properties that the distance measure $\mathcal{D}$ should have. Clearly, the norm distance $\mathcal{D}(y_0, y_1) = \|y_0 - y_1\|$ on $\mathbb{Y}$ fulfills these properties. Item (W2c) is the continuity of $\mathcal{D}$, while (W2d) is a converse continuity condition at vanishing distance. While (W2c) is not needed for existence and convergence of NETT, it is required for the stability result, as shown in [20, Example 2.7]. Assumption (W5) is a coercivity condition; see [14, Remark 2.4f.] on how to achieve this for a regularizer defined by neural networks. Note that for convergence and convergence rates we will require additional conditions that concern the discretization of the reconstruction space, the forward operator and the regularizer.
The references [9, 14, 20, 23] all consider general distance measures and allow nonconvex regularizers. However, existence and stability of minimizers of (2) are shown there under assumptions slightly different from (W1)–(W5). Below we therefore give a short proof of the existence and stability results.
Theorem 2.2 (Existence and Stability).
Let Assumption 2.1 hold, and let $\alpha > 0$, $n \in \mathbb{N}$ and $y \in \mathbb{Y}$. Then the functionals $\mathcal{T}_{\alpha, y}$ and $\mathcal{T}_{\alpha, n, y}$ have at least one minimizer. Moreover, if $y_k \to y$ and $x_k \in \arg\min \mathcal{T}_{\alpha, n, y_k}$, then $(x_k)_{k \in \mathbb{N}}$ has a weakly convergent subsequence, and every weak accumulation point of $(x_k)_{k \in \mathbb{N}}$ is a minimizer of $\mathcal{T}_{\alpha, n, y}$.
Proof.
Since (W1), (W6)–(W9) for fixed $n$ give the same assumptions as (W1), (W3)–(W5) for the nondiscrete counterpart, it is sufficient to verify the claim in the nondiscrete setting. Existence of minimizers follows from (W1), (W2e), (W3)–(W5), because these items imply that $\mathcal{T}_{\alpha, y}$ is a wslsc coercive functional defined on a nonempty weakly sequentially closed subset of a reflexive Banach space. To show stability one notes that, according to (W2a), for all $x$ we have
$\mathcal{T}_{\alpha, y_k}(x) \le C\,\bigl(\mathcal{D}(\mathbf{F}(x), y) + \mathcal{D}(y, y_k)\bigr) + \alpha \mathcal{R}(x)$.
According to (W2c), (W2d), (W5) there exists $C_0 > 0$ such that the right hand side is bounded by $C_0$, which by (W5) shows that $(x_k)_{k \in \mathbb{N}}$ has a weak accumulation point. Following the standard proof [27, Theorem 3.23] shows that the weak accumulation points satisfy the claimed properties. ∎
In the following we write $x_{\alpha, n, y} \in \arg\min \mathcal{T}_{\alpha, n, y}$ for minimizers of the discrete NETT functional. For $y \in \mathbf{F}(\operatorname{dom}(\mathbf{F}))$ we call $x^+ \in \arg\min \{\mathcal{R}(x) \colon \mathbf{F}(x) = y\}$ an $\mathcal{R}$-minimizing solution of $\mathbf{F}(x) = y$.
Lemma 2.3 (Existence of minimizing solutions).
Let Assumption 2.1 hold. For any $y \in \mathbf{F}(\operatorname{dom}(\mathbf{F}))$, an $\mathcal{R}$-minimizing solution of $\mathbf{F}(x) = y$ exists. Likewise, if $n \in \mathbb{N}$ and $y \in \mathbf{F}_n(\mathbb{X}_n)$, an $\mathcal{R}_n$-minimizing solution of $\mathbf{F}_n(x) = y$ exists.
Proof.
Again it is sufficient to verify the claim for $\mathcal{R}$-minimizing solutions. Because $y \in \mathbf{F}(\operatorname{dom}(\mathbf{F}))$, the set $\{x \colon \mathbf{F}(x) = y\}$ is nonempty. Hence we can choose a sequence $(x_k)_{k \in \mathbb{N}}$ in $\{x \colon \mathbf{F}(x) = y\}$ with $\mathcal{R}(x_k) \to \inf\{\mathcal{R}(x) \colon \mathbf{F}(x) = y\}$. Due to (W2b), $(x_k)_{k \in \mathbb{N}}$ is contained in $\{x \colon \mathcal{T}_{\alpha, y}(x) \le C\}$ for some $C > 0$, which is bounded according to (W5). By (W1), $\mathbb{X}$ is reflexive and therefore $(x_k)_{k \in \mathbb{N}}$ has a weak accumulation point $x^+$. From (W1), (W4), (W3) we conclude that $x^+$ is an $\mathcal{R}$-minimizing solution of $\mathbf{F}(x) = y$. The case of $\mathcal{R}_n$-minimizing solutions follows analogously. ∎
2.2 Convergence
Next we prove that discrete NETT converges as the noise level goes to zero and the discretization as well as the regularization parameter are chosen properly. We formulate the following approximation conditions for obtaining convergence.
Assumptions 2.4 (Conditions for convergence).
The element $x^+ \in \mathbb{X}$ satisfies the following for all $C > 0$:

(C1) There exist $x_n^+ \in \mathbb{X}_n$ with $x_n^+ \to x^+$ and $\rho_n \coloneqq \mathcal{D}(\mathbf{F}(x_n^+), \mathbf{F}(x^+)) \to 0$.

(C2) $\gamma_n(C) \coloneqq \sup\{\mathcal{D}(\mathbf{F}_n(x), \mathbf{F}(x)) \colon x \in \mathbb{X}_n \wedge \|x\| \le C\} \to 0$.

(C3) $\mathcal{R}_n(x_n^+) \to \mathcal{R}(x^+)$.

(C4) $\eta_n(C) \coloneqq \sup\{|\mathcal{R}_n(x) - \mathcal{R}(x)| \colon x \in \mathbb{X}_n \wedge \|x\| \le C\} \to 0$.
Conditions (C1) and (C3) concern the approximation of the true unknown with elements of the discretization spaces that is compatible with the discretization of the forward operator and the regularizer. Conditions (C2) and (C4) are uniform approximation properties of the operator and the regularizer on bounded sets.
Theorem 2.5 (Convergence).
Let (W1)–(W9) hold, let $y \in \mathbf{F}(\operatorname{dom}(\mathbf{F}))$ and let $x^+$ be an $\mathcal{R}$-minimizing solution of $\mathbf{F}(x) = y$ that satisfies (C1)–(C4). Moreover, suppose $(\delta_k)_{k \in \mathbb{N}}$ converges to zero and $(y_k)_{k \in \mathbb{N}}$ satisfies $\mathcal{D}(y, y_k) \le \delta_k$. Choose $(n_k)_{k \in \mathbb{N}}$ and $(\alpha_k)_{k \in \mathbb{N}}$ such that, as $k \to \infty$, we have

(4) $\alpha_k \to 0$,

(5) $\delta_k / \alpha_k \to 0$,

(6) $(\rho_{n_k} + \gamma_{n_k}(C)) / \alpha_k \to 0$ for all $C > 0$.
Then, for $x_k \in \arg\min \mathcal{T}_{\alpha_k, n_k, y_k}$, the following hold:

(a) $(x_k)_{k \in \mathbb{N}}$ has a weakly convergent subsequence.

(b) The weak limit of every weakly convergent subsequence of $(x_k)_{k \in \mathbb{N}}$ is an $\mathcal{R}$-minimizing solution of $\mathbf{F}(x) = y$.

(c) $\mathcal{R}_{n_k}(x_k) \to \mathcal{R}(x^*)$, where $x^*$ is the weak limit of a subsequence of $(x_k)_{k \in \mathbb{N}}$.

(d) If the $\mathcal{R}$-minimizing solution of $\mathbf{F}(x) = y$ is unique, then $x_k \rightharpoonup x^+$.
Proof.
For convenience and with some abuse of notation we use the abbreviations $\rho_k \coloneqq \rho_{n_k}$, $\gamma_k \coloneqq \gamma_{n_k}(C_0)$ for sufficiently large $C_0 > 0$, $x_k^+ \coloneqq x_{n_k}^+$, $\mathcal{R}_k \coloneqq \mathcal{R}_{n_k}$ and $\mathbf{F}_k \coloneqq \mathbf{F}_{n_k}$. Because $x_k$ is a minimizer of the discrete NETT functional, by (W2a) we have

(7) $\mathcal{D}(\mathbf{F}_k(x_k), y_k) \le \mathcal{T}_{\alpha_k, n_k, y_k}(x_k^+) \le C^2 (\rho_k + \gamma_k) + C \delta_k + \alpha_k \mathcal{R}_k(x_k^+)$,

(8) $\mathcal{R}_k(x_k) \le \dfrac{C^2 (\rho_k + \gamma_k) + C \delta_k}{\alpha_k} + \mathcal{R}_k(x_k^+)$.

According to (C1), (C3), (5), (6) the right hand side in (7) converges to zero and the right hand side in (8) converges to $\mathcal{R}(x^+)$. Together with (W2d) we obtain $\mathbf{F}_k(x_k) \to y$ and $\limsup_k \mathcal{R}_k(x_k) \le \mathcal{R}(x^+)$. This shows that $(x_k)_{k \in \mathbb{N}}$ is bounded, and by (W1), (W9) there exists a weakly convergent subsequence $(x_{k(j)})_{j \in \mathbb{N}}$. We denote the weak limit by $x^*$. From (W2d), (W4) we obtain $\mathbf{F}(x^*) = y$. The weak lower semicontinuity of $\mathcal{R}$ assumed in (W3) shows
$\mathcal{R}(x^*) \le \liminf_j \mathcal{R}_{k(j)}(x_{k(j)}) \le \limsup_k \mathcal{R}_k(x_k) \le \mathcal{R}(x^+)$.
2.3 Convergence rates
Next we derive quantitative error estimates (convergence rates) in terms of the absolute Bregman distance. Recall that a functional $\mathcal{R} \colon \mathbb{X} \to \mathbb{R}$ is Gâteaux differentiable at some $x \in \mathbb{X}$ if the directional derivative $\mathcal{R}'(x)(h) \coloneqq \lim_{t \to 0} (\mathcal{R}(x + t h) - \mathcal{R}(x))/t$ exists for every $h \in \mathbb{X}$. We denote by $\mathcal{R}'(x)$ the Gâteaux derivative of $\mathcal{R}$ at $x$. In [14] we introduced the absolute Bregman distance of a Gâteaux differentiable functional $\mathcal{R}$ at $x \in \mathbb{X}$ with respect to $\tilde{x} \in \mathbb{X}$, defined by

(9) $B_{\mathcal{R}}(\tilde{x}, x) \coloneqq \bigl| \mathcal{R}(\tilde{x}) - \mathcal{R}(x) - \mathcal{R}'(x)(\tilde{x} - x) \bigr|$.

Convergence rates in terms of the Bregman distance are derived under a smoothness assumption on the true solution in the form of a certain variational inequality. More precisely, we assume the following:
Assumptions 2.6 (Conditions for convergence rates).
For a concave, continuous and strictly increasing $\phi \colon [0, \infty) \to [0, \infty)$ with $\phi(0) = 0$, the $\mathcal{R}$-minimizing solution $x^+$ satisfies the variational inequality
$\forall x \in \mathbb{X} \colon\ B_{\mathcal{R}}(x, x^+) \le |\mathcal{R}(x) - \mathcal{R}(x^+)| + \phi\bigl(\mathcal{D}(\mathbf{F}(x), \mathbf{F}(x^+))\bigr)$.
According to these properties, the inverse function $\phi^{-1}$ exists and is convex. We denote by $(\phi^{-1})^*$ its Fenchel conjugate.
Proposition 2.7 (Error estimates).
Proof.
Remark 2.8.
The error estimate (10) includes the approximation quality of the discrete or inexact forward operator and of the discrete or inexact regularizer, described by $\gamma_n$ and $\eta_n$, respectively. What might be unexpected at first is the inclusion of two new parameters $\rho_n$ and $\varepsilon_n \coloneqq |\mathcal{R}_n(x_n^+) - \mathcal{R}(x^+)|$. These factors both arise from the approximation of $x^+$ by the finite dimensional spaces $\mathbb{X}_n$, where $\rho_n$ reflects the approximation accuracy in the image of the operator and $\varepsilon_n$ the approximation accuracy with respect to the true regularization functional $\mathcal{R}$. Note that in the case where the forward operator, the regularizer and the solution space are given precisely, we have $\rho_n = \gamma_n = \eta_n = \varepsilon_n = 0$. In this particular case we recover the estimate derived for NETT in [14].
Theorem 2.9 (Convergence rates).
Let the assumptions of Proposition 2.7 hold, consider the parameter choice rule $\alpha(\delta) \sim \delta / \phi(\delta)$ and let the approximation errors satisfy $\rho_n, \gamma_n = O(\delta)$ and $\eta_n, \varepsilon_n = O(\phi(\delta))$. Then we have the convergence rate

(11) $B_{\mathcal{R}}(x_{\alpha(\delta), n, y_\delta}, x^+) = O(\phi(\delta))$.
Proof.
Noting that $\delta / \alpha(\delta)$ remains bounded as $\delta \to 0$, this directly follows from Proposition 2.7. ∎
Next we verify that a variational inequality of the form required in Assumptions 2.6 is satisfied with $\phi(t) \sim \sqrt{t}$ under a typical source-like condition.
Lemma 2.10 (Variational inequality under source condition).
Let $\mathbb{X}$ and $\mathbb{Y}$ be Hilbert spaces, let $\mathcal{R}$ and $\mathbf{F}$ be Gâteaux differentiable at $x^+$, consider the distance measure $\mathcal{D}(y_0, y_1) \coloneqq \|y_0 - y_1\|^2$ and assume there exist $\omega \in \mathbb{Y}$, $c \ge 0$ and $\epsilon > 0$ such that for all $x \in \mathbb{X}$ with $\|x - x^+\| \le \epsilon$ we have

(12) $\mathcal{R}'(x^+) = \mathbf{F}'(x^+)^* \omega \quad \text{and} \quad \|\mathbf{F}(x) - \mathbf{F}(x^+) - \mathbf{F}'(x^+)(x - x^+)\| \le c\,\|\mathbf{F}(x) - \mathbf{F}(x^+)\|$.

Then the variational inequality of Assumptions 2.6 holds with $\phi(t) = (1 + c)\,\|\omega\|\,\sqrt{t}$.
Proof.
Let $x \in \mathbb{X}$ with $\|x - x^+\| \le \epsilon$. Using the Cauchy–Schwarz inequality and equation (12), we can estimate
$|\mathcal{R}'(x^+)(x - x^+)| = |\langle \omega, \mathbf{F}'(x^+)(x - x^+) \rangle| \le \|\omega\| \bigl( \|\mathbf{F}(x) - \mathbf{F}(x^+)\| + \|\mathbf{F}(x) - \mathbf{F}(x^+) - \mathbf{F}'(x^+)(x - x^+)\| \bigr) \le (1 + c)\,\|\omega\|\,\|\mathbf{F}(x) - \mathbf{F}(x^+)\|$.
Additionally, if $\mathcal{R}(x) - \mathcal{R}(x^+) \ge \mathcal{R}'(x^+)(x - x^+)$, we have $B_{\mathcal{R}}(x, x^+) \le |\mathcal{R}(x) - \mathcal{R}(x^+)| + |\mathcal{R}'(x^+)(x - x^+)|$, and on the other hand, if $\mathcal{R}(x) - \mathcal{R}(x^+) < \mathcal{R}'(x^+)(x - x^+)$, we have $B_{\mathcal{R}}(x, x^+) \le |\mathcal{R}'(x^+)(x - x^+)| + |\mathcal{R}(x) - \mathcal{R}(x^+)|$ as well. Putting this together we get
$B_{\mathcal{R}}(x, x^+) \le |\mathcal{R}(x) - \mathcal{R}(x^+)| + (1 + c)\,\|\omega\|\,\sqrt{\mathcal{D}(\mathbf{F}(x), \mathbf{F}(x^+))}$,
and thus the variational inequality holds with $\phi(t) = (1 + c)\,\|\omega\|\,\sqrt{t}$. ∎
Corollary 2.11 (Convergence rates under source condition).
Proof.
In Corollary 2.11, the approximation quality of the discrete operator and the discrete and inexact regularization functional need to be of the same order.
3 Application to a limited data problem in PAT
Photoacoustic tomography (PAT) is an emerging non-invasive coupled-physics biomedical imaging technique with high contrast and high spatial resolution [13, 21]. It works by illuminating a semi-transparent sample with short optical pulses, which causes heating of the sample followed by expansion and the subsequent emission of an acoustic wave. Sensors on the outside of the sample measure the acoustic wave, and these measurements are then used to reconstruct the initial pressure $f$, which provides information about the interior of the object. The cases $d = 2$ and $d = 3$ are relevant for applications in PAT. Here we only consider the case $d = 2$ and assume a circular measurement geometry. The 2D case arises, for example, when using integrating line detectors in PAT [21].
3.1 Discrete forward operator
The pressure data $p$ satisfies the wave equation with initial data $p(0, \cdot) = f$ and $\partial_t p(0, \cdot) = 0$. In the case of circular measurement geometry one assumes that $f$ vanishes outside the unit disc $\mathbb{D}$ and that the measurement sensors are located on the boundary $\partial\mathbb{D}$. We assume that the phantom will not generate any data for some region $M$ of the data domain, for example when the acoustic pressure generated inside is too small to be recorded. This masked PAT problem consists in the recovery of the function $f$ from sampled noisy measurements of $(1 - \mathbb{1}_M)\,\mathbf{W} f$, where $\mathbf{W}$ denotes the solution operator of the wave equation and $\mathbb{1}_M$ the indicator function of $M$. Note that the resulting inverse problem can be seen as the combination of an inpainting problem and an inverse problem for the wave equation.
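The structure of the masked data model can be mimicked in a toy discrete setting. The following numpy sketch uses a generic random matrix as a stand-in for the discretized wave operator (not the Kaiser–Bessel implementation described below), masks part of the data, and computes a basic reconstruction with a truncated-SVD pseudoinverse; all sizes, the mask shape and the truncation threshold are placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-in for the masked PAT model: a generic forward matrix A
# (placeholder for the discretized wave-equation solution operator), a 0/1
# mask acting on the data, and a truncated-SVD pseudoinverse.
n_x, n_y = 64, 96                       # number of unknowns / data samples
A = rng.standard_normal((n_y, n_x)) / np.sqrt(n_y)
mask = np.ones(n_y)
mask[40:56] = 0.0                       # masked region: no data recorded here
M = np.diag(mask)

x_true = rng.standard_normal(n_x)
y = M @ A @ x_true                      # masked, noise-free data

# Truncated-SVD pseudoinverse of the masked operator M A.
U, s, Vt = np.linalg.svd(M @ A, full_matrices=False)
tau = 1e-6                              # truncation threshold (placeholder)
s_inv = np.where(s > tau, 1.0 / s, 0.0)
x_basic = Vt.T @ (s_inv * (U.T @ y))

residual = np.linalg.norm(M @ A @ x_basic - y)   # data misfit of the result
```

Since the noise-free data lie in the range of the masked operator, the pseudoinverse reconstruction reproduces them up to floating point accuracy; the masked components of the data carry no information and must be compensated by regularization.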
In order to implement the PAT forward operator we use a basis ansatz $f = \sum_{i=1}^{N} c_i\, \varphi(\cdot - z_i)$, where $c_i$ are basis coefficients and $\varphi$ is a generalized Kaiser–Bessel (KB) function translated to grid points $z_i$. The generalized KB functions are popular in tomographic inverse problems [16, 28, 31, 32] and denote radially symmetric functions with support in $B_R(0)$ defined by

(14) $\varphi(x) \coloneqq \dfrac{\bigl(\sqrt{1 - \|x\|^2 / R^2}\bigr)^m \, I_m\bigl(\gamma \sqrt{1 - \|x\|^2 / R^2}\bigr)}{I_m(\gamma)} \quad \text{for } \|x\| \le R$.

Here $I_m$ is the modified Bessel function of the first kind of order $m$, and the parameters $\gamma$ and $R$ denote the window taper and support radius, respectively. Since $\mathbf{W}$ is linear, we have $\mathbf{W} f = \sum_{i=1}^{N} c_i\, \mathbf{W}\varphi(\cdot - z_i)$. For convenience we use a pseudo-3D approach where we use the 3D solution of the wave equation for the KB function, for which there exists an analytical representation [31]. Denote by $(s_j)_j$ uniformly spaced sensor locations on $\partial\mathbb{D}$ and by $(t_k)_k$ uniformly sampled measurement times. Define the model matrix $\mathbf{A}$ by $\mathbf{A}_{(j,k),i} \coloneqq (\mathbf{W}\varphi(\cdot - z_i))(s_j, t_k)$ and the diagonal matrix $\mathbf{M}$ by $\mathbf{M}_{(j,k),(j,k)} \coloneqq 1$ if $(s_j, t_k) \notin M$ and zero otherwise. Let
$\mathbf{M}\mathbf{A} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^\mathsf{T}$
be the singular value decomposition. We then consider the discrete forward matrix
$\mathbf{A}_n \coloneqq \mathbf{U} \boldsymbol{\Sigma}_\tau \mathbf{V}^\mathsf{T}$,
where $\boldsymbol{\Sigma}_\tau$ is the diagonal matrix derived from $\boldsymbol{\Sigma}$ by setting singular values smaller than some $\tau > 0$ to zero. In our experiments we use a fixed truncation threshold $\tau$ and take the masked region $M$ fixed as a diagonal stripe in the data domain.

3.2 Discrete NETT
We consider the discrete NETT with discrepancy term and regularizer combined in the functional

(15) $\mathcal{T}(x) \coloneqq \tfrac{1}{2}\,\|\mathbf{A}_n x - y_\delta\|_2^2 + \alpha \bigl( \|x - \boldsymbol{\Phi}_n(x)\|_2^2 + \beta\, \|x\|_{\mathrm{TV}, \varepsilon} \bigr)$,

where $\|\cdot\|_{\mathrm{TV}, \varepsilon}$ with $\varepsilon > 0$ is a smooth version of the total variation [1] and $\boldsymbol{\Phi}_n$ is a learnable network. We take $\boldsymbol{\Phi}_n$ as the U-Net [26] with residual connection, which has first been applied to PAT image reconstruction in [4]. We generate training data that consist of square shaped rings with random profile and random location. See Figure 1 for an example of one such phantom (note that all plots in signal space use the same colorbar) and the corresponding data. We get a set of phantoms $x_i$ and corresponding basic reconstructions $z_i \coloneqq \mathbf{A}_n^+ (\mathbf{A}_n x_i + \xi_i)$, where $\mathbf{A}_n^+$ is the pseudoinverse and $\xi_i$ is Gaussian noise. The networks are trained by minimizing $\sum_i \|\boldsymbol{\Phi}_n(x_i) - x_i\|_2^2 + \|\boldsymbol{\Phi}_n(z_i) - x_i\|_2^2$, where we used the Adam optimizer with learning rate 0.01. The idea behind the considered loss is that we want the trained regularizer to give small values for the phantoms $x_i$ and large values for the basic reconstructions $z_i$. The strategy is similar to [14], but we use the final output of the network for the regularizer, as proposed in [5]. To minimize (15) we use Algorithm 1, which implements a forward-backward scheme [6].

3.3 Numerical results
For the numerical results we train two regularizers as described in Section 3.2. The networks are implemented using PyTorch [22]. We also use PyTorch in order to calculate the gradient of the regularizer. We take fixed values for the parameters in Algorithm 1 and compute the required matrix inverse only once and then use it for all examples. We set the noise level for the noise-free, low noise and high noise cases, respectively, and selected a fixed regularization parameter $\alpha$. We expect that the NETT functional will yield better results due to data consistency, which is mainly helpful outside the masked center diagonal. First we use the phantom from the test data shown in Figure 1. The results using post-processing and NETT are shown in Figure 2. One sees that all results with higher noise than used during training are not very good. This indicates that one should use noise similar to that of the later application even for the NETT. Figure 3 shows the average error using 10 test phantoms similar to the one in Figure 1. A careful comparison of the numerical convergence rates with the theoretical results of Corollary 2.11 is an interesting aspect of further research. To investigate the stability of our method with respect to phantoms that are different from the training data, we create a phantom with different structures, as seen in Figure 4. As expected, the post-processing network is not really able to reconstruct the circles object, since it is quite different from the training data, but it also does not break down completely. On the other hand, the NETT approach yields good results due to data consistency.
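A simplified stand-in for the reconstruction procedure used in these experiments — forward-backward minimization of a NETT-type functional with a smoothed total variation term and a fixed linear stand-in for the trained network — can be sketched as follows. All operators, parameters and step sizes are illustrative placeholders, not Algorithm 1 or the tuned values used above; since the smoothed regularizer is differentiable, the backward step reduces to an explicit gradient step:

```python
import numpy as np

rng = np.random.default_rng(3)

def tv_eps(x, eps=1e-3):
    """Smoothed 1D total variation: sum_i sqrt((x[i+1]-x[i])^2 + eps)."""
    d = np.diff(x)
    return np.sum(np.sqrt(d ** 2 + eps))

def grad_tv_eps(x, eps=1e-3):
    d = np.diff(x)
    w = d / np.sqrt(d ** 2 + eps)
    g = np.zeros_like(x)
    g[:-1] -= w
    g[1:] += w
    return g

n, m = 32, 48
A = rng.standard_normal((m, n)) / np.sqrt(m)       # toy forward matrix
W = 0.9 * np.eye(n)                                # linear stand-in for Phi
B = np.eye(n) - W
x_true = np.repeat([0.0, 1.0, 0.0, -1.0], n // 4)  # piecewise constant signal
y = A @ x_true + 1e-2 * rng.standard_normal(m)

alpha, beta, step = 1e-3, 1e-1, 0.2
x = np.zeros(n)
for _ in range(2000):
    # forward step: gradient of the data term 0.5 * ||A x - y||^2
    g = A.T @ (A @ x - y)
    # "backward" step, explicit here since the regularizer is smooth:
    # gradient of alpha * (||x - W x||^2 + beta * TV_eps(x))
    g += alpha * (2 * (B.T @ (B @ x)) + beta * grad_tv_eps(x))
    x -= step * g

rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
```

For a nonsmooth regularizer the second step would instead use a proximal mapping, which is the situation the forward-backward splitting of [6] is designed for.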
4 Conclusion
We have analyzed the convergence of a discretized NETT approach and derived convergence rates under certain assumptions on the approximation quality of the involved operators. We performed numerical experiments using a limited data problem for PAT that is the combination of an inverse problem for the wave equation and an inpainting problem. To the best of our knowledge, this is the first such problem studied with deep learning. The NETT approach yields better results than post-processing for phantoms different from the training data. NETT still fails to recover some missing parts of the phantom in cases where the data contain more noise than the training data. This highlights the relevance of using different regularizers for different noise levels.
Acknowledgments
S.A. and M.H. acknowledge support of the Austrian Science Fund (FWF), project P 30747-N32.
References
 [1] R. Acar and C. R. Vogel. Analysis of bounded variation penalty methods for ill-posed problems. Inverse Probl., 10(6):1217, 1994.
 [2] J. Adler and O. Öktem. Solving ill-posed inverse problems using iterative deep neural networks. Inverse Probl., 33(12):124007, 2017.
 [3] H. K. Aggarwal, M. P. Mani, and M. Jacob. MoDL: model-based deep learning architecture for inverse problems. IEEE Trans. Med. Imaging, 38(2):394–405, 2018.
 [4] S. Antholzer, M. Haltmeier, and J. Schwab. Deep learning for photoacoustic tomography from sparse data. Inverse Probl. Sci. Eng., 27(7):987–1005, 2019.
 [5] S. Antholzer, J. Schwab, J. Bauer-Marschallinger, P. Burgholzer, and M. Haltmeier. NETT regularization for compressed sensing photoacoustic tomography. In Photons Plus Ultrasound: Imaging and Sensing 2019, volume 10878, page 108783B, 2019.
 [6] P. L. Combettes and J.-C. Pesquet. Proximal splitting methods in signal processing. In Fixed-point algorithms for inverse problems in science and engineering, pages 185–212. Springer, 2011.
 [7] M. V. de Hoop, M. Lassas, and C. A. Wong. Deep learning architectures for nonlinear operator functions and nonlinear inverse problems. arXiv:1912.11090, 2019.
 [8] H. W. Engl, M. Hanke, and A. Neubauer. Regularization of inverse problems, volume 375 of Mathematics and its Applications. Kluwer Academic Publishers Group, Dordrecht, 1996.
 [9] M. Grasmair. Generalized Bregman distances and convergence rates for nonconvex regularization methods. Inverse Probl., 26(11):115014, 2010.
 [10] M. Haltmeier and L. V. Nguyen. Regularization of inverse problems by neural networks. arXiv:2006.03972, 2020.
 [11] V. K. Ivanov, V. V. Vasin, and V. P. Tanana. Theory of linear ill-posed problems and its applications. Inverse and Ill-posed Problems Series. VSP, Utrecht, second edition, 2002.

 [12] E. Kobler, T. Klatzer, K. Hammernik, and T. Pock. Variational networks: connecting variational methods and deep learning. In German Conference on Pattern Recognition, pages 281–293. Springer, 2017.
 [13] R. Kruger, P. Lui, Y. Fang, and R. Appledorn. Photoacoustic ultrasound (PAUS) – reconstruction tomography. Med. Phys., 22(10):1605–1609, 1995.
 [14] H. Li, J. Schwab, S. Antholzer, and M. Haltmeier. NETT: solving inverse problems with deep neural networks. Inverse Probl., 36(6):065005, 2020.
 [15] S. Lunz, O. Öktem, and C.B. Schönlieb. Adversarial regularizers in inverse problems.