1 Introduction
Various applications in medical imaging, remote sensing and elsewhere require solving inverse problems of the form
(1.1) $y = \mathbf{A}x + \xi$,
where $\mathbf{A} \colon X \to Y$ is a linear operator between Hilbert spaces $X$, $Y$, and $\xi$ is the data distortion. Inverse problems are well analyzed and several established approaches for their solution exist, including filter-based methods and variational regularization techniques [1, 2].
In recent years, neural networks (NNs) and deep learning have appeared as new paradigms for solving inverse problems and demonstrate impressive performance. Several approaches have been developed, including two-step networks [3, 4, 5], variational networks [6], iterative networks [7, 8] and regularizing networks [9]. Standard deep learning approaches may lack data consistency for unknowns very different from the training images. To address this issue, in [10] a deep learning approach has been introduced where minimizers of
(1.2) $\mathcal{T}_{\alpha,y}(x) \coloneqq \lVert \mathbf{A}x - y \rVert^2 + \alpha\, \psi(\boldsymbol{\Phi}(x))$
are investigated. Here $\boldsymbol{\Phi} \colon X \to \Xi$ is a trained NN, $\Xi$ a Hilbert space, $\psi \colon \Xi \to [0,\infty]$ a functional, and $\alpha > 0$ the regularization parameter. The resulting reconstruction approach has been named NETT (for network Tikhonov regularization), as it is a generalized form of Tikhonov regularization using a NN as trained regularizer.
In [10] it is shown that, under suitable assumptions, NETT yields a convergent regularization method. Moreover, in that paper a training strategy has been proposed where $\boldsymbol{\Phi}$ is trained to favor artifact-free reconstructions selected from a set of training images from a certain data manifold $\mathcal{M} \subseteq X$; see [11] for a simplified training strategy.
Coercive variant of NETT
One of the main assumptions in the analysis of [10] is the coercivity of the regularizer $\psi \circ \boldsymbol{\Phi}$. For the general form used in (1.2), this requires special care in the design and training of the network. In order to overcome this limitation, in this paper we propose a modified form of the regularizer for which we are able to rigorously prove coercivity. More precisely, we consider
(1.3) $\mathcal{R}(x) \coloneqq \psi(\mathbf{E}(x)) + \lVert x - (\mathbf{D} \circ \mathbf{E})(x) \rVert^2.$
Here, $\mathbf{D} \circ \mathbf{E}$ is an encoder-decoder network trained such that for any $x \in \mathcal{M}$ we have $(\mathbf{D} \circ \mathbf{E})(x) \simeq x$ and that $\psi(\mathbf{E}(x))$ is small. The term $\psi(\mathbf{E}(x))$ implements learned prior knowledge. The additional term $\lVert x - (\mathbf{D} \circ \mathbf{E})(x) \rVert^2$ forces $x$ to be close to the data manifold $\mathcal{M}$ and, as we shall prove, also guarantees coercivity of the regularization functional.
In particular, in this paper we investigate the case where $\Xi = \ell^2(\Lambda)$ for some index set $\Lambda$ and $\psi$ is a weighted $\ell^q$-norm used as sparsity prior. To construct an appropriate network, we train a (modified) tight frame U-net [12] of the form $\mathbf{D} \circ \mathbf{E}$ using the $\ell^1$-norm of $\mathbf{E}(x)$ during training, and take the encoder part $\mathbf{E}$ as analysis network.
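To make the structure of (1.3) concrete, the following toy sketch evaluates such a regularizer with a hypothetical one-layer ReLU encoder and linear decoder; the matrices W1, W2, all dimensions, and the unit weights are illustrative stand-ins for a trained network, not the architecture used later in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 12                                   # illustrative dimensions
W1 = rng.standard_normal((m, n)) / np.sqrt(n)  # hypothetical encoder weights
W2 = rng.standard_normal((n, m)) / np.sqrt(m)  # hypothetical decoder weights

def encoder(x):
    """Toy one-layer encoder E with ReLU nonlinearity."""
    return np.maximum(W1 @ x, 0.0)

def decoder(xi):
    """Toy linear decoder D."""
    return W2 @ xi

def regularizer(x, v=1.0):
    """Augmented regularizer: learned sparsity prior plus distance term."""
    xi = encoder(x)
    sparsity = np.sum(v * np.abs(xi))           # psi(E(x)), here a weighted l1-norm
    closeness = np.sum((x - decoder(xi)) ** 2)  # ||x - (D o E)(x)||^2
    return sparsity + closeness
```

For x = 0 both terms vanish, while along any ray the closeness term keeps the functional from flattening out; this is the coercivity mechanism exploited in Section 2.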
Outline
This paper is organized as follows. In Section 2, we present a convergence analysis for the augmented NETT (see (2.1)). In particular, as the main auxiliary result, we establish the coercivity of the regularization term. In Section 3, we derive convergence rates which provide quantitative estimates for the reconstruction accuracy. In Section 4, we present a suggested network structure using a modified tight frame U-net and a corresponding training strategy. The paper concludes with a short summary and outlook given in Section 5.
2 Well-posedness and convergence
2.1 Augmented NETT
To solve the inverse problem (1.1) we propose and analyze the augmented NETT, which considers minimizers of
(2.1) $\mathcal{T}_{\alpha,y}(x) \coloneqq \lVert \mathbf{A}x - y \rVert^2 + \alpha \Bigl( \sum_{\lambda \in \Lambda} v_\lambda \lvert (\mathbf{E}(x))_\lambda \rvert^q + \beta \lVert x - (\mathbf{D} \circ \mathbf{E})(x) \rVert^2 \Bigr).$
Here $\alpha > 0$ is the regularization parameter, $\mathbf{E} \colon X \to \ell^2(\Lambda)$ is called the encoder network, $\mathbf{D} \colon \ell^2(\Lambda) \to X$ is called the decoder network, $\Lambda$ is a countable index set, $(v_\lambda)_{\lambda \in \Lambda}$ are positive weights, $\beta > 0$ is a tuning parameter, and $q \in [1,2]$ describes the used norm. The case $q = 1$ yields a sparsity promoting regularization term $\sum_{\lambda \in \Lambda} v_\lambda \lvert (\mathbf{E}(x))_\lambda \rvert$, frequently studied when $\mathbf{E}$ is given by a basis or frame [13, 14, 15, 16, 17, 18]. In the present paper, we allow $\mathbf{E}$ and $\mathbf{D}$ to be nonlinear mappings.
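In the special case $\mathbf{E} = \mathbf{D} = \operatorname{Id}$ (so the closeness term vanishes) with $q = 1$ and unit weights, (2.1) reduces to classical $\ell^1$-Tikhonov regularization, which can be minimized by iterative soft-thresholding. The sketch below, with an illustrative random forward matrix and hand-picked parameters, only shows this classical baseline; it is not the method proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 50
A = rng.standard_normal((m, n)) / np.sqrt(m)   # illustrative forward operator
x_true = np.zeros(n)
x_true[[3, 17, 41]] = [1.0, -2.0, 1.5]          # sparse ground truth
y = A @ x_true
alpha = 0.01                                    # regularization parameter

# ISTA for  T(x) = ||A x - y||^2 + alpha * ||x||_1
tau = 1.0 / (2 * np.linalg.norm(A, 2) ** 2)     # step size 1/L for the smooth part
x = np.zeros(n)
for _ in range(2000):
    z = x - tau * 2 * A.T @ (A @ x - y)         # gradient step on the data term
    x = np.sign(z) * np.maximum(np.abs(z) - tau * alpha, 0.0)  # soft-thresholding

T = np.sum((A @ x - y) ** 2) + alpha * np.sum(np.abs(x))
```

Since the step size is chosen as the reciprocal Lipschitz constant of the gradient of the data term, the iteration decreases the functional monotonically.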
For our convergence analysis, we use the following assumptions, which are supposed to hold throughout this section.
Condition 2.1 (Augmented NETT).
(A1) $\mathbf{A} \colon X \to Y$ is bounded linear;
(A2) $\mathbf{E} \colon X \to \ell^2(\Lambda)$ is weakly sequentially continuous;
(A3) $\mathbf{D} \colon \ell^2(\Lambda) \to X$ is weakly sequentially continuous;
(A4) $q \in [1,2]$, $\beta > 0$ and $\inf \{ v_\lambda : \lambda \in \Lambda \} > 0$.
The first term in the considered regularizer
(2.2) $\mathcal{R}(x) \coloneqq \sum_{\lambda \in \Lambda} v_\lambda \lvert (\mathbf{E}(x))_\lambda \rvert^q + \beta \lVert x - (\mathbf{D} \circ \mathbf{E})(x) \rVert^2$
was proposed in [10] to impose a sparsity condition on the signal $x$. In this paper, we add the extra term $\beta \lVert x - (\mathbf{D} \circ \mathbf{E})(x) \rVert^2$ forcing the minimizers of (2.1) to be close to the solution manifold $\mathcal{M}$. This term also allows us to prove the coercivity of $\mathcal{R}$ (see the argument in the proof of Theorem 2.2), which is essential to our analysis.
2.2 Well-posedness
Theorem 2.2 (Existence).
For all $\alpha, \beta > 0$ and all $y \in Y$, the augmented NETT functional (2.1) has at least one minimizer.
Proof.
Let us first prove that $\mathcal{R}$ is coercive. Indeed, assume that there exists a sequence $(x_k)_{k \in \mathbb{N}}$ such that $\lVert x_k \rVert \to \infty$ and $(\mathcal{R}(x_k))_{k \in \mathbb{N}}$ is bounded. Then $(\lVert x_k - (\mathbf{D} \circ \mathbf{E})(x_k) \rVert)_{k \in \mathbb{N}}$ is bounded and, because $\inf \{ v_\lambda : \lambda \in \Lambda \} > 0$, the sequence $(\mathbf{E}(x_k))_{k \in \mathbb{N}}$ is bounded in $\ell^q(\Lambda)$. Since $\lVert \cdot \rVert_{\ell^2} \le \lVert \cdot \rVert_{\ell^q}$ for $q \le 2$, we obtain that $(\mathbf{E}(x_k))_{k \in \mathbb{N}}$ is also bounded in $\ell^2(\Lambda)$. Now, since $\mathbf{D}$ is weakly sequentially continuous, this implies that also $((\mathbf{D} \circ \mathbf{E})(x_k))_{k \in \mathbb{N}}$ is a bounded sequence (any unbounded subsequence would contain a weakly convergent sub-subsequence whose image under $\mathbf{D}$ converges weakly and is thus bounded). From the estimate
$\lVert x_k \rVert \le \lVert x_k - (\mathbf{D} \circ \mathbf{E})(x_k) \rVert + \lVert (\mathbf{D} \circ \mathbf{E})(x_k) \rVert$
it follows that $(x_k)_{k \in \mathbb{N}}$ is a bounded sequence. This contradicts $\lVert x_k \rVert \to \infty$ and finishes the proof that $\mathcal{R}$ is coercive.
Because the network $\mathbf{D} \circ \mathbf{E}$ is weakly sequentially continuous, the functional $x \mapsto \lVert x - (\mathbf{D} \circ \mathbf{E})(x) \rVert^2$ is weakly sequentially lower semicontinuous. Therefore, $\mathcal{T}_{\alpha,y}$ is weakly sequentially lower semicontinuous, too. Since $\mathcal{T}_{\alpha,y}$ is bounded from below by $0$, it has an infimum $t^* \ge 0$. Let $(x_k)_{k \in \mathbb{N}}$ be a sequence such that $\mathcal{T}_{\alpha,y}(x_k) \to t^*$. Since $\mathcal{T}_{\alpha,y}$ is coercive, the sequence $(x_k)_{k \in \mathbb{N}}$ is bounded, and hence, has an accumulation point in the weak topology, denoted by $x^*$. Because $\mathcal{T}_{\alpha,y}$ is weakly sequentially lower semicontinuous, it follows that $\mathcal{T}_{\alpha,y}(x^*) \le t^*$. Therefore, $x^*$ is a minimizer of $\mathcal{T}_{\alpha,y}$. ∎
Theorem 2.3 (Stability).
Let $\alpha, \beta > 0$, let $(y_k)_{k \in \mathbb{N}}$ be a sequence in $Y$ with $y_k \to y$, and let $x_k \in \arg\min_x \mathcal{T}_{\alpha, y_k}(x)$. Then weak accumulation points of $(x_k)_{k \in \mathbb{N}}$ exist and are minimizers of $\mathcal{T}_{\alpha,y}$. For any weak accumulation point $x^*$ and subsequence $(x_{k_j})_{j \in \mathbb{N}}$ of $(x_k)_{k \in \mathbb{N}}$ with $x_{k_j} \rightharpoonup x^*$, it holds that $\mathcal{R}(x_{k_j}) \to \mathcal{R}(x^*)$.
Proof.
The proof follows the lines of [2, Theorem 3.23]. We note that the convexity of the regularizer assumed in [2] is not needed in that proof. For the sake of completeness, we sketch here a proof for the nonconvex regularizer $\mathcal{R}$. Fix $\bar{x} \in X$. Then, for all $k$, we have $\mathcal{T}_{\alpha, y_k}(x_k) \le \mathcal{T}_{\alpha, y_k}(\bar{x})$, which implies
$\lVert \mathbf{A} x_k - y_k \rVert^2 + \alpha \mathcal{R}(x_k) \le \lVert \mathbf{A} \bar{x} - y_k \rVert^2 + \alpha \mathcal{R}(\bar{x}) \le \sup_k \lVert \mathbf{A} \bar{x} - y_k \rVert^2 + \alpha \mathcal{R}(\bar{x}) < \infty.$
Consequently, $(x_k)_{k \in \mathbb{N}}$ is bounded by the coercivity of $\mathcal{R}$ and therefore has a weakly convergent subsequence $(x_{k_j})_{j \in \mathbb{N}}$ with limit $x^*$. Let us prove that each such accumulation point satisfies $\mathcal{T}_{\alpha,y}(x^*) = \min_x \mathcal{T}_{\alpha,y}(x)$. Indeed, given any $z \in X$, we have $\mathcal{T}_{\alpha, y_{k_j}}(x_{k_j}) \le \mathcal{T}_{\alpha, y_{k_j}}(z)$, which implies, using the weak sequential lower semicontinuity of $\mathcal{T}_{\alpha,y}$ and $y_{k_j} \to y$,
$\mathcal{T}_{\alpha,y}(x^*) \le \liminf_j \mathcal{T}_{\alpha, y_{k_j}}(x_{k_j}) \le \limsup_j \mathcal{T}_{\alpha, y_{k_j}}(z) = \mathcal{T}_{\alpha,y}(z),$
and therefore $\mathcal{T}_{\alpha,y}(x^*) \le \mathcal{T}_{\alpha,y}(z)$. Since this holds for all $z$, we obtain $\mathcal{T}_{\alpha,y}(x^*) = \min_x \mathcal{T}_{\alpha,y}(x)$. It now remains to prove $\mathcal{R}(x_{k_j}) \to \mathcal{R}(x^*)$. For that purpose, take $z = x^*$ above. Then $\lim_j \mathcal{T}_{\alpha, y_{k_j}}(x_{k_j}) = \mathcal{T}_{\alpha,y}(x^*)$, which implies
$\limsup_j \alpha \mathcal{R}(x_{k_j}) \le \mathcal{T}_{\alpha,y}(x^*) - \liminf_j \lVert \mathbf{A} x_{k_j} - y_{k_j} \rVert^2 \le \mathcal{T}_{\alpha,y}(x^*) - \lVert \mathbf{A} x^* - y \rVert^2 = \alpha \mathcal{R}(x^*).$
Together with the weak sequential lower semicontinuity of the regularizer $\mathcal{R}$, this yields $\mathcal{R}(x_{k_j}) \to \mathcal{R}(x^*)$ and concludes the proof. ∎
2.3 Convergence
We call $x^+ \in X$ an $\mathcal{R}$-minimizing solution of the equation $\mathbf{A}x = y$ if
$\mathbf{A}x^+ = y \quad \text{and} \quad \mathcal{R}(x^+) = \min \{ \mathcal{R}(x) : \mathbf{A}x = y \}.$
As in the convex case [2], one shows that an $\mathcal{R}$-minimizing solution exists whenever $\mathbf{A}x = y$ is solvable.
Theorem 2.4 (Weak Convergence).
Let $y \in \mathbf{A}(X)$, let $(y_k)_{k \in \mathbb{N}}$ satisfy $\lVert y - y_k \rVert \le \delta_k$ for some sequence $(\delta_k)_{k \in \mathbb{N}} \in (0, \infty)^{\mathbb{N}}$ with $\delta_k \to 0$, suppose $x_k \in \arg\min_x \mathcal{T}_{\alpha_k, y_k}(x)$, and let the parameter choice $(\alpha_k)_{k \in \mathbb{N}}$ satisfy
(2.3) $\lim_{k \to \infty} \alpha_k = \lim_{k \to \infty} \delta_k^2 / \alpha_k = 0.$
Then the following hold:
(a) $(x_k)_{k \in \mathbb{N}}$ has at least one weak accumulation point $x^+$;
(b) Every weak accumulation point of $(x_k)_{k \in \mathbb{N}}$ is an $\mathcal{R}$-minimizing solution of $\mathbf{A}x = y$;
(c) Every weakly convergent subsequence $(x_{k_j})_{j \in \mathbb{N}}$ with limit $x^+$ satisfies $\mathcal{R}(x_{k_j}) \to \mathcal{R}(x^+)$;
(d) If the $\mathcal{R}$-minimizing solution $x^+$ of $\mathbf{A}x = y$ is unique, then $x_k \rightharpoonup x^+$.
Proof.
This follows along the lines of [2, Theorem 3.26]. ∎
Next we derive strong convergence. For that purpose, let us recall the notions of the absolute Bregman distance and total nonlinearity, defined in [10].
Definition 2.5 (Absolute Bregman distance).
Let $\mathcal{F} \colon X \to \mathbb{R}$ be Gâteaux differentiable at $x \in X$. The absolute Bregman distance $\mathcal{B}_{\mathcal{F}}(\cdot, x) \colon X \to [0, \infty)$ at $x$ with respect to $\mathcal{F}$ is defined by
$\mathcal{B}_{\mathcal{F}}(\tilde{x}, x) \coloneqq \lvert \mathcal{F}(\tilde{x}) - \mathcal{F}(x) - \mathcal{F}'(x)(\tilde{x} - x) \rvert.$
Here $\mathcal{F}'(x)$ denotes the Gâteaux derivative of $\mathcal{F}$ at $x$.
Definition 2.6 (Total nonlinearity).
Let $\mathcal{F} \colon X \to \mathbb{R}$ be Gâteaux differentiable at $x \in X$. We define the modulus of total nonlinearity of $\mathcal{F}$ at $x$ as the function $\nu(x, \cdot) \colon (0, \infty) \to [0, \infty]$ given by
$\nu(x, t) \coloneqq \inf \{ \mathcal{B}_{\mathcal{F}}(\tilde{x}, x) : \lVert \tilde{x} - x \rVert = t \}.$
We call $\mathcal{F}$ totally nonlinear at $x$ if $\nu(x, t) > 0$ for all $t \in (0, \infty)$.
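As a small one-dimensional illustration (not taken from the paper), consider $\mathcal{F}(t) = t^4$. In one dimension the sphere of radius $t$ around $x$ contains just the two points $x \pm t$, so the modulus of total nonlinearity can be computed directly; at $x = 0$ one gets $\nu(0, t) = t^4 > 0$, so $\mathcal{F}$ is totally nonlinear at $0$, whereas any affine functional has $\nu \equiv 0$.

```python
def abs_bregman(F, dF, x_tilde, x):
    """Absolute Bregman distance |F(x~) - F(x) - F'(x)(x~ - x)|."""
    return abs(F(x_tilde) - F(x) - dF(x) * (x_tilde - x))

def modulus_1d(F, dF, x, t):
    """Modulus of total nonlinearity in 1-D: infimum over the two points at distance t."""
    return min(abs_bregman(F, dF, x + t, x), abs_bregman(F, dF, x - t, x))

quartic, d_quartic = (lambda s: s ** 4), (lambda s: 4 * s ** 3)  # totally nonlinear at 0
affine, d_affine = (lambda s: 2 * s + 1), (lambda s: 2.0)        # modulus identically 0
```

Note that the quartic is totally nonlinear at $0$ even though its second derivative vanishes there, which shows that total nonlinearity is weaker than strong convexity.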
The following convergence result in the norm topology holds.
Theorem 2.7 (Strong Convergence).
Assume that $\mathbf{A}x = y$ has a solution, let $\mathcal{R}$ be totally nonlinear at all $\mathcal{R}$-minimizing solutions of $\mathbf{A}x = y$, and let $(x_k)_{k \in \mathbb{N}}$, $(y_k)_{k \in \mathbb{N}}$, $(\delta_k)_{k \in \mathbb{N}}$, $(\alpha_k)_{k \in \mathbb{N}}$ be as in Theorem 2.4. Then there is a subsequence $(x_{k_j})_{j \in \mathbb{N}}$ of $(x_k)_{k \in \mathbb{N}}$ and an $\mathcal{R}$-minimizing solution $x^+$ of $\mathbf{A}x = y$ such that $\lVert x_{k_j} - x^+ \rVert \to 0$. Moreover, if the $\mathcal{R}$-minimizing solution of $\mathbf{A}x = y$ is unique, then $x_k \to x^+$ in the norm topology.
Proof.
Follows from [10, Theorem 2.8]. ∎
2.4 Example: Sparse analysis regularization with a dictionary
A simple application of the above results is the case where $\mathbf{E} = \mathbf{W} \colon X \to \ell^2(\Lambda)$ is a bounded linear operator with closed range. We can write $\mathbf{W}x = (\langle w_\lambda, x \rangle)_{\lambda \in \Lambda}$ for so-called atoms $w_\lambda \in X$ and interpret $(w_\lambda)_{\lambda \in \Lambda}$ as (analysis) dictionary. Moreover, we take the decoder network as the pseudoinverse $\mathbf{D} = \mathbf{W}^+$ of $\mathbf{W}$.
We have $x - \mathbf{W}^+ \mathbf{W} x = P_{\ker \mathbf{W}}\, x$, where $P_{\ker \mathbf{W}}$ denotes the orthogonal projection onto $\ker \mathbf{W}$, and the regularizer takes the form
(2.4) $\mathcal{R}(x) = \sum_{\lambda \in \Lambda} v_\lambda \lvert \langle w_\lambda, x \rangle \rvert^q + \beta \lVert P_{\ker \mathbf{W}}\, x \rVert^2.$
Clearly the conditions (A2), (A3) are satisfied, which implies that existence, stability and weak convergence for sparse analysis dictionary regularization with (2.4) hold. Following [2, Theorem 3.49] one also derives strong convergence.
Note that if $(w_\lambda)_{\lambda \in \Lambda}$ is a frame of $X$, then $\ker \mathbf{W} = \{0\}$, in which case (2.4) yields the standard sparse regularizer $\sum_{\lambda \in \Lambda} v_\lambda \lvert \langle w_\lambda, x \rangle \rvert^q$. However, for a general trained dictionary we will typically have $\ker \mathbf{W} \neq \{0\}$. This is even the case for overcomplete dictionaries, because the dictionary is only trained on elements in a small subset of $X$ which are supposed to satisfy a sparse analysis prior. In this case, the additional term in (2.4) ensures coercivity of the regularizer, which is essential for the convergence of Tikhonov regularization.
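The kernel mechanism behind (2.4) can be checked numerically with small random matrices standing in for the dictionary (the sizes are illustrative, and the matrices are not trained): for a full-column-rank analysis operator the closeness term vanishes identically, whereas an operator with nontrivial kernel leaves it strictly positive.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6

# Frame-like case: more atoms than dimensions and full column rank,
# hence pinv(W) @ W = Id and the closeness term vanishes.
W_frame = rng.standard_normal((10, n))
P_frame = np.linalg.pinv(W_frame) @ W_frame

# Dictionary with nontrivial kernel: fewer atoms than dimensions.
W_small = rng.standard_normal((4, n))
P_small = np.linalg.pinv(W_small) @ W_small   # projection onto (ker W)^perp

x = rng.standard_normal(n)
closeness_frame = np.sum((x - P_frame @ x) ** 2)  # = 0 up to rounding
closeness_small = np.sum((x - P_small @ x) ** 2)  # = ||P_ker x||^2 > 0
```

This mirrors the discussion above: only when the dictionary fails to be a frame does the additional penalty contribute, and precisely then it is needed for coercivity.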
3 Convergence rates
Let us now prove a convergence rate in the absolute Bregman distance. For that purpose, we consider general Tikhonov regularization
(3.1) $\mathcal{T}_{\alpha,y}(x) \coloneqq \lVert \mathbf{A}x - y \rVert^2 + \alpha \mathcal{R}(x).$
Here $\mathcal{R} \colon X \to [0, \infty]$ is a general, possibly nonconvex, regularizer, and $\mathbf{A} \colon X \to Y$ the linear forward operator.
The convergence rates will be derived under the following assumptions:
(B1) $\mathbf{A} \colon X \to Y$ is bounded linear with finite-dimensional range;
(B2) $\mathcal{R}$ is coercive and weakly sequentially lower semicontinuous;
(B3) $\mathcal{R}$ is Lipschitz;
(B4) $\mathcal{R}$ is Gâteaux differentiable.
Remark 3.1.
Note that the regularizer $\mathcal{R}$ of the augmented NETT (1.3) satisfies (B2)–(B4) as long as $q > 1$ and the activation functions of the encoder-decoder network $\mathbf{D} \circ \mathbf{E}$ are differentiable (such as the sigmoid function or smooth versions of ReLU). Condition (B3) can be relaxed to a local Lipschitz property. The main restriction in the above list of assumptions is that $\mathbf{A}$ has finite-dimensional range. However, this assumption holds true in practical applications such as sparse data tomography. Unlike [10], we avoid an assumption on the regularizer that is quite difficult to validate in practice. Modified provable conditions will be studied in future work.
We start our analysis with the following result.
Proposition 3.2.
Let $x^+$ be an $\mathcal{R}$-minimizing solution of $\mathbf{A}x = y$. Then there exists a constant $C > 0$ such that for all $x \in X$,
$\mathcal{B}_{\mathcal{R}}(x, x^+) \le \mathcal{R}(x) - \mathcal{R}(x^+) + C \lVert \mathbf{A}(x - x^+) \rVert.$
Proof.
Let us first prove that for some constant $c_1 > 0$ it holds that
(3.2) $\forall x \in X \colon \quad \mathcal{R}(x^+) - \mathcal{R}(x) \le c_1 \lVert \mathbf{A}(x - x^+) \rVert.$
Indeed, let $P$ be the orthogonal projection onto $(\ker \mathbf{A})^\perp$ and define $z \coloneqq x - P(x - x^+)$. Then, $\mathbf{A}z = \mathbf{A}x^+ = y$ and $z - x = -P(x - x^+)$. Since the restricted operator $\mathbf{A}|_{(\ker \mathbf{A})^\perp}$ is injective and has finite-dimensional range, it is bounded from below by a constant $c > 0$. Therefore,
(3.3) $\lVert P(x - x^+) \rVert \le c^{-1} \lVert \mathbf{A}(x - x^+) \rVert.$
On the other hand, since $x^+$ is the $\mathcal{R}$-minimizing solution of $\mathbf{A}x = y$ and $\mathcal{R}$ is Lipschitz with constant $L$, we have $\mathcal{R}(x^+) - \mathcal{R}(x) \le \mathcal{R}(z) - \mathcal{R}(x) \le L \lVert P(x - x^+) \rVert$. Together with (3.3) we obtain (3.2).
Next we prove that there is a constant $c_2 > 0$ such that
(3.4) $\forall x \in X \colon \quad \lvert \mathcal{R}'(x^+)(x - x^+) \rvert \le c_2 \lVert \mathbf{A}(x - x^+) \rVert.$
Since $x^+$ is an $\mathcal{R}$-minimizing solution of $\mathbf{A}x = y$ and $\mathcal{R}$ is Gâteaux differentiable, we obtain $\mathcal{R}'(x^+)(h) = 0$ for $h \in \ker \mathbf{A}$. On the other hand, if $h \in (\ker \mathbf{A})^\perp$, we have $\lVert h \rVert \le c^{-1} \lVert \mathbf{A}h \rVert$ and therefore $\lvert \mathcal{R}'(x^+)(h) \rvert \le \lVert \mathcal{R}'(x^+) \rVert\, c^{-1} \lVert \mathbf{A}h \rVert$. This finishes the proof of (3.4). The claim now follows by combining (3.2), (3.4) and the definition of the absolute Bregman distance. ∎
The following result is our main convergence rates result. It is similar to [10, Theorem 3.1], but uses different assumptions.
Theorem 3.3 (Convergence rates).
Let $x^+$ be an $\mathcal{R}$-minimizing solution of $\mathbf{A}x = y$, let $y_\delta \in Y$ satisfy $\lVert y_\delta - y \rVert \le \delta$, and let $x_\alpha^\delta \in \arg\min_x \mathcal{T}_{\alpha, y_\delta}(x)$. Then, for the parameter choice $\alpha \sim \delta$, we have $\mathcal{B}_{\mathcal{R}}(x_\alpha^\delta, x^+) = \mathcal{O}(\delta)$ as $\delta \to 0$.
Proof.
By the minimality of $x_\alpha^\delta$, we have $\lVert \mathbf{A}x_\alpha^\delta - y_\delta \rVert^2 + \alpha \mathcal{R}(x_\alpha^\delta) \le \delta^2 + \alpha \mathcal{R}(x^+)$. Together with (3.2) this yields $\lVert \mathbf{A}(x_\alpha^\delta - x^+) \rVert = \mathcal{O}(\alpha + \delta)$ and $\mathcal{R}(x_\alpha^\delta) - \mathcal{R}(x^+) \le \delta^2 / \alpha$. Inserting both estimates into Proposition 3.2 gives $\mathcal{B}_{\mathcal{R}}(x_\alpha^\delta, x^+) \le \delta^2 / \alpha + \mathcal{O}(\alpha + \delta)$, which is $\mathcal{O}(\delta)$ for $\alpha \sim \delta$. ∎
4 Network design and training
For the encoder-decoder type network required for the augmented regularizer (2.2) we propose a modified tight frame U-net together with a sparse training strategy.
The tight frame U-net has been introduced in [12] and is less smoothing than the classical U-net [19] in image reconstruction. The tight frame U-net of [12] uses a residual (or bypass) connection that is not well suited for our purpose. We therefore work with a modified tight frame U-net that has been used in [20] for sparse synthesis regularization with neural networks.
4.1 Modified tight frame U-net
For simplicity we assume that $X$ is already a finite-dimensional space containing 2D images of size $N_1 \times N_2$ with $c$ channels.
The architecture of the modified tight frame U-net is shown in Figure 4.1. It uses a hierarchical multiscale representation defined by
(4.1) $\mathbf{E}(x) \coloneqq (\xi_1, \dots, \xi_J, \bar{\xi}_J), \quad \text{where } \xi_j \coloneqq (\mathbf{H}_1, \mathbf{H}_2, \mathbf{H}_3)\, \mathbf{N}_j(x_{j-1}) \text{ and } x_j \coloneqq \mathbf{L}\, \mathbf{N}_j(x_{j-1}),$
with $x_0 \coloneqq x$ and $\bar{\xi}_J \coloneqq x_J$. Here
- $J$ is the number of used scales;
- $\mathbf{N}_1, \dots, \mathbf{N}_J$ and $\bar{\mathbf{N}}_1, \dots, \bar{\mathbf{N}}_J$ are convolutional layers followed by a nonlinearity;
- $\mathbf{H}_1, \mathbf{H}_2, \mathbf{H}_3$ are horizontal, vertical and diagonal highpass filters and $\mathbf{L}$ is a lowpass filter such that the tight frame property is satisfied,
(4.2) $\mathbf{L}^{\mathsf{T}} \mathbf{L} + \mathbf{H}_1^{\mathsf{T}} \mathbf{H}_1 + \mathbf{H}_2^{\mathsf{T}} \mathbf{H}_2 + \mathbf{H}_3^{\mathsf{T}} \mathbf{H}_3 = \operatorname{Id}.$
Following [12], we define the filters $\mathbf{H}_1, \mathbf{H}_2, \mathbf{H}_3, \mathbf{L}$ by applying the tensor products HH, HL, LH and LL of the Haar wavelet lowpass and highpass filters separately in each channel. We write the tight frame U-net defined by (4.1) in the form $\mathbf{D} \circ \mathbf{E}$, where $\mathbf{E}$ is the encoder and $\mathbf{D}$ the decoder part. The encoder part $\mathbf{E}$ maps the image $x$ to the high frequency parts at the $j$-th scale, denoted by $\xi_j$ for $j = 1, \dots, J$, and to the low frequency part at the coarsest scale, denoted by $\bar{\xi}_J$. The decoder $\mathbf{D}$ then synthesizes the image from $(\xi_1, \dots, \xi_J, \bar{\xi}_J)$ recursively via (4.2).
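The tight frame property (4.2) for the Haar tensor-product filters can be verified on a single scale: decomposing an image into its four subbands over non-overlapping 2 x 2 blocks and re-synthesizing recovers the image exactly and preserves its energy. The block-based implementation below is a simplified stand-in for the strided convolutions of the actual network.

```python
import numpy as np

def haar2d(x):
    """One-scale 2-D Haar analysis on non-overlapping 2x2 blocks."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2   # lowpass subband (LL)
    lh = (a - b + c - d) / 2   # highpass subband
    hl = (a + b - c - d) / 2   # highpass subband
    hh = (a - b - c + d) / 2   # highpass subband
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Corresponding synthesis; exact inverse because the filter bank is tight."""
    a = (ll + lh + hl + hh) / 2
    b = (ll - lh + hl - hh) / 2
    c = (ll + lh - hl - hh) / 2
    d = (ll - lh - hl + hh) / 2
    x = np.empty((2 * ll.shape[0], 2 * ll.shape[1]))
    x[0::2, 0::2], x[0::2, 1::2] = a, b
    x[1::2, 0::2], x[1::2, 1::2] = c, d
    return x
```

The underlying 4 x 4 block transform is orthogonal and symmetric, which is why analysis and synthesis use the same coefficient combinations.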
4.2 Sparse network training
To enforce sparsity in the encoded domain we will use a combination of the mean-squared-error and an $\ell^1$-penalty of the filter coefficients as loss function for training. The idea is to thereby enhance the sparsity of the highpass filtered images.
Given a set of training images $x_1, \dots, x_N \in X$ we aim for an encoder-decoder network satisfying $(\mathbf{D} \circ \mathbf{E})(x_n) \simeq x_n$. For that purpose, we take the encoder-decoder pair as a minimizer of the loss function (the empirical risk)
$\mathcal{L}(\theta_{\mathbf{E}}, \theta_{\mathbf{D}}) \coloneqq \sum_{n=1}^{N} \lVert \mathbf{D}(\mathbf{E}(x_n)) - x_n \rVert^2 + \mu \sum_{n=1}^{N} \lVert \mathbf{E}(x_n) \rVert_1 + \rho \bigl( \lVert \theta_{\mathbf{E}} \rVert^2 + \lVert \theta_{\mathbf{D}} \rVert^2 \bigr).$
Here $\theta_{\mathbf{E}}$ and $\theta_{\mathbf{D}}$ are the adjustable parameters in the tight frame U-net architecture (specifically, in the convolutional layers $\mathbf{N}_j$ and $\bar{\mathbf{N}}_j$). The first term of the loss function is supposed to train the network to reproduce the training images $x_n$. Following the sparse regularization strategy, the second term forces the network to learn convolutions such that the highpass filtered coefficients are sparse. The additional term $\rho ( \lVert \theta_{\mathbf{E}} \rVert^2 + \lVert \theta_{\mathbf{D}} \rVert^2 )$ ensures the coercivity of the loss function and balances the size of the weights $\theta_{\mathbf{E}}$ and $\theta_{\mathbf{D}}$.
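Using a toy ReLU encoder and linear decoder as illustrative stand-ins for the tight frame U-net (all dimensions, weights and the hyperparameters mu and rho below are hypothetical), such an empirical risk can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, N = 8, 16, 5
theta_E = 0.1 * rng.standard_normal((m, n))   # encoder parameters (illustrative)
theta_D = 0.1 * rng.standard_normal((n, m))   # decoder parameters (illustrative)
X = rng.standard_normal((N, n))               # training images x_1, ..., x_N
mu, rho = 0.1, 1e-3                           # hypothetical hyperparameters

def loss(theta_E, theta_D, X):
    recon, sparse = 0.0, 0.0
    for x in X:
        xi = np.maximum(theta_E @ x, 0.0)                # toy ReLU encoder E(x)
        recon += np.sum((theta_D @ xi - x) ** 2)         # mean-squared-error term
        sparse += np.sum(np.abs(xi))                     # l1 sparsity of coefficients
    decay = np.sum(theta_E ** 2) + np.sum(theta_D ** 2)  # coercivity / balancing term
    return recon + mu * sparse + rho * decay
```

In practice this objective would be minimized over the network parameters with a stochastic gradient method; the sketch only fixes the quantities entering the three terms.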
Results for the sparse network training described above can be found in [20]. Actual application of the trained network and the augmented NETT to limited data problems in CT and elsewhere is the subject of current work.
5 Conclusion and outlook
In this paper, we proposed and analyzed the augmented NETT, a regularization method using the encoder of an encoder-decoder network as sparsifying transform. In order to obtain coercivity of the regularizer, we augmented it with the additional penalty $\lVert x - (\mathbf{D} \circ \mathbf{E})(x) \rVert^2$, which can be seen as a measure for the distance of $x$ from the ideal data manifold. We were able to prove well-posedness and convergence of the augmented NETT and derived convergence rates in the absolute Bregman distance. We proposed the modified tight frame U-net for the network architecture together with an appropriate sparse training strategy.
Application to sparse data tomography is the subject of current work. A theoretical comparison with frame and dictionary based sparse regularization methods will also be studied. Moreover, we work on the derivation of additional provable convergence rates results for the augmented NETT.
Acknowledgments
D.O. and M.H. acknowledge support of the Austrian Science Fund (FWF), project P 30747-N32. The research of L.N. has been supported by the National Science Foundation (NSF) grants DMS 1212125 and DMS 1616904.
References
 [1] H. W. Engl, M. Hanke, and A. Neubauer, Regularization of inverse problems, ser. Mathematics and its Applications. Dordrecht: Kluwer Academic Publishers Group, 1996, vol. 375.
 [2] O. Scherzer, M. Grasmair, H. Grossauer, M. Haltmeier, and F. Lenzen, Variational methods in imaging, ser. Applied Mathematical Sciences. New York: Springer, 2009, vol. 167.
 [3] D. Lee, J. Yoo, and J. C. Ye, "Deep residual learning for compressed sensing MRI," in IEEE 14th International Symposium on Biomedical Imaging, 2017, pp. 15–18.
 [4] K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, "Deep convolutional neural network for inverse problems in imaging," IEEE Trans. Image Process., vol. 26, no. 9, pp. 4509–4522, 2017.
 [5] S. Antholzer, M. Haltmeier, and J. Schwab, "Deep learning for photoacoustic tomography from sparse data," Inverse Probl. Sci. Eng., in press, pp. 1–19, 2018.
 [6] E. Kobler, T. Klatzer, K. Hammernik, and T. Pock, "Variational networks: connecting variational methods and deep learning," in German Conference on Pattern Recognition. Springer, 2017, pp. 281–293.
 [7] J. R. Chang, C.-L. Li, B. Poczos, and B. V. Kumar, "One network to solve them all: solving linear inverse problems using deep projection models," in IEEE International Conference on Computer Vision (ICCV), 2017, pp. 5889–5898.
 [8] J. Adler and O. Öktem, "Solving ill-posed inverse problems using iterative deep neural networks," Inverse Probl., vol. 33, p. 124007, 2017.
 [9] J. Schwab, S. Antholzer, and M. Haltmeier, “Deep null space learning for inverse problems: convergence analysis and rates,” Inverse Probl., vol. 35, no. 2, p. 025008, 2019.
 [10] H. Li, J. Schwab, S. Antholzer, and M. Haltmeier, “NETT: Solving inverse problems with deep neural networks,” 2018, arXiv:1803.00092.
 [11] S. Antholzer, J. Schwab, J. BauerMarschallinger, P. Burgholzer, and M. Haltmeier, “NETT regularization for compressed sensing photoacoustic tomography,” in Photons Plus Ultrasound: Imaging and Sensing 2019, vol. 10878, 2019, p. 108783B.
 [12] Y. Han and J. C. Ye, "Framing U-Net via deep convolutional framelets: Application to sparse-view CT," IEEE Trans. Med. Imag., vol. 37, pp. 1418–1429, 2018.
 [13] I. Daubechies, M. Defrise, and C. De Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Comm. Pure Appl. Math., vol. 57, pp. 1413–1457, 2004.
 [14] R. Ramlau and G. Teschke, "A Tikhonov-based projection iteration for nonlinear ill-posed problems with sparsity constraints," Numerische Mathematik, vol. 104, no. 2, pp. 177–203, 2006.
 [15] M. Grasmair, M. Haltmeier, and O. Scherzer, "Sparse regularization with ℓq penalty term," Inverse Probl., vol. 24, no. 5, p. 055020, 2008.
 [16] ——, "Necessary and sufficient conditions for linear convergence of ℓ1-regularization," Comm. Pure Appl. Math., vol. 64, no. 2, pp. 161–182, 2011.
 [17] S. Vaiter, G. Peyré, C. Dossal, and J. Fadili, “Robust sparse analysis regularization,” IEEE Transactions on information theory, vol. 59, no. 4, pp. 2001–2016, 2012.
 [18] M. Burger, J. Flemming, and B. Hofmann, "Convergence rates in ℓ1-regularization if the sparsity assumption fails," Inverse Problems, vol. 29, no. 2, p. 025013, 2013.
 [19] O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
 [20] D. Obmann, J. Schwab, and M. Haltmeier, “Sparse synthesis regularization with deep neural networks,” arXiv:1902.00390, 2019.