Various applications in medical imaging, remote sensing and elsewhere require solving inverse problems of the form
\[ y_\delta = \mathbf{A}x + \xi_\delta \,, \tag{1.1} \]
where $\mathbf{A} \colon X \to Y$ is a linear operator between Hilbert spaces $X$, $Y$, and $\xi_\delta$ is the data distortion. Inverse problems are well analyzed and several established approaches for their solution exist, including filter-based methods and variational regularization techniques [1, 2]. In recent years, neural networks (NNs) and deep learning appeared as new paradigms for solving inverse problems and demonstrated impressive performance. Several approaches have been developed, including two-step networks [3, 4, 5], variational networks [6], iterative networks [7, 8] and regularizing networks [9].
Standard deep learning approaches may lack data consistency for unknowns very different from the training images. To address this issue, a deep learning approach has been introduced in [10] where minimizers of
\[ \tfrac{1}{2}\,\|\mathbf{A}x - y_\delta\|^2 + \alpha\,\psi(\Phi(x)) \]
are investigated. Here $\Phi \colon X \to \mathbb{X}$ is a trained NN, $\mathbb{X}$ a Hilbert space, $\psi \colon \mathbb{X} \to [0,\infty]$ a functional, and $\alpha > 0$ the regularization parameter. The resulting reconstruction approach has been named NETT (for network Tikhonov regularization), as it is a generalized form of Tikhonov regularization using a NN as trained regularizer.
In [10] it is shown that, under suitable assumptions, NETT yields a convergent regularization method. Moreover, in that paper a training strategy has been proposed, where the regularizer is trained to favor artifact-free reconstructions selected from a set of training images from a certain data manifold $\mathcal{M} \subseteq X$; see [11] for a simplified training strategy.
Coercive variant of NETT
One of the main assumptions in the analysis of [10] is the coercivity of the regularizer. For the general form used in (3.1), this requires special care in the design and training of the network. In order to overcome this limitation, in this paper we propose a modified form of the regularizer for which we are able to rigorously prove its coercivity. More precisely, we consider
\[ \mathcal{R}(x) \triangleq \psi(\mathbf{E}(x)) + \beta\,\|(\mathbf{D} \circ \mathbf{E})(x) - x\|^2 \,. \tag{1.3} \]
Here $\mathbf{D} \circ \mathbf{E}$ is an encoder-decoder network trained such that for any $x \in \mathcal{M}$ we have $(\mathbf{D} \circ \mathbf{E})(x) \simeq x$ and that $\psi(\mathbf{E}(x))$ is small. The term $\psi(\mathbf{E}(x))$ implements learned prior knowledge. The additional term $\beta\,\|(\mathbf{D} \circ \mathbf{E})(x) - x\|^2$ forces minimizers to be close to the data manifold and, as we shall prove, also guarantees coercivity of the regularization functional.
In particular, in this paper we investigate the case where $\psi(\xi) = \sum_{\lambda \in \Lambda} v_\lambda |\xi_\lambda|$ for some index set $\Lambda$, which is a weighted $\ell^1$-norm used as sparsity prior. To construct an appropriate network, we train a (modified) tight frame U-net [12] of the form $\mathbf{D} \circ \mathbf{E}$ using the $\ell^1$-norm of the encoder coefficients during training, and take the encoder part $\mathbf{E}$ as analysis network.
This paper is organized as follows. In Section 2, we present a convergence analysis for the augmented $\ell^q$-NETT (see (2.1)). In particular, as main auxiliary result, we establish the coercivity of the regularization term. In Section 3, we derive convergence rates which provide quantitative estimates for the reconstruction accuracy. In Section 4, we present a suggested network structure using a modified tight frame U-net and a corresponding training strategy. The paper concludes with a short summary and outlook given in Section 5.
2 Well-posedness and convergence
2.1 Augmented $\ell^q$-NETT
To solve the inverse problem (1.1) we propose and analyze the augmented $\ell^q$-NETT, which considers minimizers of
\[ \mathcal{T}_{\alpha;y_\delta}(x) \triangleq \tfrac{1}{2}\,\|\mathbf{A}x - y_\delta\|^2 + \alpha \Bigl( \sum_{\lambda \in \Lambda} v_\lambda\,|\mathbf{E}(x)_\lambda|^q + \beta\,\|(\mathbf{D} \circ \mathbf{E})(x) - x\|^2 \Bigr) \,. \tag{2.1} \]
Here $\alpha > 0$ is the regularization parameter, $\mathbf{E} \colon X \to \ell^2(\Lambda)$ is called encoder network, $\mathbf{D} \colon \ell^2(\Lambda) \to X$ is called decoder network, $\Lambda$ is a countable index set, $(v_\lambda)_{\lambda \in \Lambda}$ are positive weights, $\beta > 0$ is a tuning parameter, and $q \in [1,2]$ describes the used norm. The case $q = 1$ yields a sparsity promoting regularization term, frequently studied when $\mathbf{E}$ is the analysis operator of a basis or frame [13, 14, 15, 16, 17, 18]. In the present paper, we allow $\mathbf{E}$ and $\mathbf{D}$ to be non-linear mappings.
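For illustration, the objective (2.1) can be evaluated in finite dimensions as follows. This is a toy sketch: the matrices are hypothetical stand-ins, whereas in the paper $\mathbf{E}$ and $\mathbf{D}$ are trained, possibly non-linear networks.

```python
import numpy as np

# Toy finite-dimensional illustration of the augmented l^q-NETT objective (2.1).
# A is the forward matrix, E a linear encoder, D its pseudoinverse as decoder.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # forward operator X = R^4 -> Y = R^3
E = rng.standard_normal((6, 4))   # encoder X -> l^2(Lambda), |Lambda| = 6
D = np.linalg.pinv(E)             # decoder: pseudoinverse of the encoder

def augmented_nett(x, y_delta, alpha=0.1, beta=1.0, q=1, v=1.0):
    """0.5*||Ax - y||^2 + alpha*(sum_l v_l |(Ex)_l|^q + beta*||D(E(x)) - x||^2)."""
    data_fit = 0.5 * np.sum((A @ x - y_delta) ** 2)
    coeffs = E @ x                               # encoder coefficients E(x)
    sparsity = np.sum(v * np.abs(coeffs) ** q)   # weighted l^q term
    consistency = np.sum((D @ coeffs - x) ** 2)  # distance to D(E(x))
    return data_fit + alpha * (sparsity + beta * consistency)

x = rng.standard_normal(4)
y_delta = A @ x + 0.01 * rng.standard_normal(3)
print(augmented_nett(x, y_delta))
```

Minimizing this functional over $x$ (e.g. with a gradient or proximal method) then yields the regularized reconstruction.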
For our convergence analysis, we use the following assumptions, which we assume to be satisfied throughout this section.
Condition 2.1 (Augmented $\ell^q$-NETT).
(A1) $\mathbf{A} \colon X \to Y$ is bounded linear;
(A2) $\mathbf{E} \colon X \to \ell^2(\Lambda)$ is weakly sequentially continuous;
(A3) $\mathbf{D} \colon \ell^2(\Lambda) \to X$ is weakly sequentially continuous.
The first term in the considered regularizer
\[ \mathcal{R}(x) = \sum_{\lambda \in \Lambda} v_\lambda\,|\mathbf{E}(x)_\lambda|^q + \beta\,\|(\mathbf{D} \circ \mathbf{E})(x) - x\|^2 \tag{2.2} \]
was proposed in [20] to impose a sparsity condition on the signal $x$. In this paper, we add the extra term $\beta\,\|(\mathbf{D} \circ \mathbf{E})(x) - x\|^2$ forcing the minimizers to be close to the solution manifold $\mathcal{M}$. This term also allows us to prove the coercivity of $\mathcal{R}$ (see the argument in the proof of Theorem 2.2), which is essential to our analysis.
Theorem 2.2 (Existence).
For all $\alpha, \beta > 0$ and all $y_\delta \in Y$, the augmented $\ell^q$-NETT functional (2.1) has at least one minimizer.
Let us first prove that $\mathcal{R}$ is coercive. Indeed, assume that there exists a sequence $(x_k)_{k \in \mathbb{N}}$ such that $\|x_k\| \to \infty$ and $(\mathcal{R}(x_k))_{k \in \mathbb{N}}$ is bounded. Then $(\mathbf{E}(x_k))_{k \in \mathbb{N}}$ is bounded in $\ell^q(\Lambda)$. Since $q \le 2$, we obtain
\[ \|\mathbf{E}(x_k)\|_2 \le \|\mathbf{E}(x_k)\|_q \,. \]
Therefore, $(\mathbf{E}(x_k))_{k \in \mathbb{N}}$ is bounded in $\ell^2(\Lambda)$, too. Now, since $\mathbf{D}$ is weakly sequentially continuous, this implies that $((\mathbf{D} \circ \mathbf{E})(x_k))_{k \in \mathbb{N}}$ is a bounded sequence. From the estimate
\[ \|x_k\| \le \|(\mathbf{D} \circ \mathbf{E})(x_k) - x_k\| + \|(\mathbf{D} \circ \mathbf{E})(x_k)\| \]
it follows that $(x_k)_{k \in \mathbb{N}}$ is a bounded sequence. This contradicts $\|x_k\| \to \infty$ and finishes the proof that $\mathcal{R}$ is coercive.
Because the networks $\mathbf{E}$ and $\mathbf{D}$ are weakly sequentially continuous, the functional $\mathcal{R}$ is weakly sequentially lower semi-continuous. Therefore, $\mathcal{T}_{\alpha;y_\delta}$ is weakly sequentially lower semi-continuous, too. Since $\mathcal{T}_{\alpha;y_\delta}$ is bounded from below by $0$, it has an infimum $m \ge 0$. Let $(x_k)_{k \in \mathbb{N}}$ be a sequence such that $\mathcal{T}_{\alpha;y_\delta}(x_k) \to m$. Since $\mathcal{T}_{\alpha;y_\delta}$ is coercive, the sequence $(x_k)_{k \in \mathbb{N}}$ is bounded, and hence has an accumulation point in the weak topology, denoted by $x_*$. Because $\mathcal{T}_{\alpha;y_\delta}$ is weakly sequentially lower semi-continuous, it follows that $\mathcal{T}_{\alpha;y_\delta}(x_*) \le m$. Therefore, $x_*$ is a minimizer of $\mathcal{T}_{\alpha;y_\delta}$. ∎
Theorem 2.3 (Stability).
Let $\alpha > 0$, $y \in Y$, let $(y_k)_{k \in \mathbb{N}}$ be a sequence in $Y$ with $y_k \to y$, and let $x_k \in \operatorname{arg\,min}_x \mathcal{T}_{\alpha;y_k}(x)$. Then weak accumulation points of $(x_k)_{k \in \mathbb{N}}$ exist and are minimizers of $\mathcal{T}_{\alpha;y}$. For any weak accumulation point $x_*$ and subsequence $(x_{k(j)})_{j \in \mathbb{N}}$ of $(x_k)_{k \in \mathbb{N}}$ with $x_{k(j)} \rightharpoonup x_*$, it holds that $\mathcal{R}(x_{k(j)}) \to \mathcal{R}(x_*)$.
The proof follows the lines of [2, Theorem 3.23]. We note that the convexity of the regularizer assumed in [2] is not needed in that proof. For the sake of completeness, we sketch here a proof for the non-convex regularizer $\mathcal{R}$. Fix $z \in X$. Then, for all $k \in \mathbb{N}$, we have $\mathcal{T}_{\alpha;y_k}(x_k) \le \mathcal{T}_{\alpha;y_k}(z)$, which implies
\[ \tfrac{1}{2}\,\|\mathbf{A}x_k - y_k\|^2 + \alpha\,\mathcal{R}(x_k) \le \tfrac{1}{2}\,\|\mathbf{A}z - y_k\|^2 + \alpha\,\mathcal{R}(z) \,. \]
Consequently, $(\mathcal{T}_{\alpha;y_k}(x_k))_{k \in \mathbb{N}}$ is bounded; by the coercivity of $\mathcal{R}$, the sequence $(x_k)_{k \in \mathbb{N}}$ is bounded and therefore has a weakly convergent subsequence $x_{k(j)} \rightharpoonup x_*$. Let us prove that each such accumulation point satisfies $\mathcal{T}_{\alpha;y}(x_*) = \min_x \mathcal{T}_{\alpha;y}(x)$. Indeed, given any $z \in X$, we have $\mathcal{T}_{\alpha;y_{k(j)}}(x_{k(j)}) \le \mathcal{T}_{\alpha;y_{k(j)}}(z)$, which implies $\mathcal{T}_{\alpha;y}(x_*) \le \liminf_{j \to \infty} \mathcal{T}_{\alpha;y_{k(j)}}(x_{k(j)})$ and therefore $\mathcal{T}_{\alpha;y}(x_*) \le \lim_{j \to \infty} \mathcal{T}_{\alpha;y_{k(j)}}(z) = \mathcal{T}_{\alpha;y}(z)$. Since this holds for all $z \in X$, we obtain that $x_*$ is a minimizer of $\mathcal{T}_{\alpha;y}$. It now remains to prove $\mathcal{R}(x_{k(j)}) \to \mathcal{R}(x_*)$. For that purpose, take $z = x_*$ above. Then $\limsup_{j \to \infty} \mathcal{T}_{\alpha;y_{k(j)}}(x_{k(j)}) \le \mathcal{T}_{\alpha;y}(x_*)$, which implies
\[ \limsup_{j \to \infty}\, \alpha\,\mathcal{R}(x_{k(j)}) \le \mathcal{T}_{\alpha;y}(x_*) - \liminf_{j \to \infty}\, \tfrac{1}{2}\,\|\mathbf{A}x_{k(j)} - y_{k(j)}\|^2 \le \alpha\,\mathcal{R}(x_*) \,. \]
Together with the weak sequential lower semi-continuity of the regularizer $\mathcal{R}$, this yields $\mathcal{R}(x_{k(j)}) \to \mathcal{R}(x_*)$ and concludes the proof. ∎
We call $x_*$ an $\mathcal{R}$-minimizing solution of the equation $\mathbf{A}x = y$ if
\[ x_* \in \operatorname{arg\,min}\,\{\, \mathcal{R}(x) \mid \mathbf{A}x = y \,\} \,. \]
As in the convex case [2], one shows that an $\mathcal{R}$-minimizing solution exists whenever $\mathbf{A}x = y$ is solvable.
Theorem 2.4 (Weak Convergence).
Let $y \in \mathbf{A}(X)$, let $(y_k)_{k \in \mathbb{N}}$ satisfy $\|y - y_k\| \le \delta_k$ for some sequence $(\delta_k)_{k \in \mathbb{N}}$ with $\delta_k \to 0$, suppose $x_k \in \operatorname{arg\,min}_x \mathcal{T}_{\alpha_k;y_k}(x)$, and let the parameter choice satisfy
\[ \lim_{k \to \infty} \alpha_k = \lim_{k \to \infty} \frac{\delta_k^2}{\alpha_k} = 0 \,. \]
Then the following hold:
$(x_k)_{k \in \mathbb{N}}$ has at least one weak accumulation point $x_*$;
Every weak accumulation point of $(x_k)_{k \in \mathbb{N}}$ is an $\mathcal{R}$-minimizing solution of $\mathbf{A}x = y$;
Every weakly convergent subsequence $(x_{k(j)})_{j \in \mathbb{N}}$ with limit $x_*$ satisfies $\mathcal{R}(x_{k(j)}) \to \mathcal{R}(x_*)$;
If the $\mathcal{R}$-minimizing solution of $\mathbf{A}x = y$ is unique, then $x_k \rightharpoonup x_*$.
This follows along the lines of [2, Theorem 3.26]. ∎
Next we derive strong convergence. For that purpose, let us recall the notions of absolute Bregman distance and total nonlinearity, defined in [10].
Definition 2.5 (Absolute Bregman distance).
Let $\mathcal{R} \colon X \to [0,\infty]$ be Gâteaux differentiable at $x_* \in X$. The absolute Bregman distance $\mathcal{B}_{\mathcal{R}}(\,\cdot\,, x_*) \colon X \to [0,\infty]$ at $x_*$ with respect to $\mathcal{R}$ is defined by
\[ \mathcal{B}_{\mathcal{R}}(x, x_*) \triangleq \bigl|\, \mathcal{R}(x) - \mathcal{R}(x_*) - \mathcal{R}'(x_*)(x - x_*) \,\bigr| \,. \]
Here $\mathcal{R}'(x_*)$ denotes the Gâteaux derivative of $\mathcal{R}$ at $x_*$.
Definition 2.6 (Total nonlinearity).
Let $\mathcal{R} \colon X \to [0,\infty]$ be Gâteaux differentiable at $x_* \in X$. We define the modulus of total nonlinearity of $\mathcal{R}$ at $x_*$ as the function $\nu(x_*, \cdot) \colon (0,\infty) \to [0,\infty]$ given by
\[ \nu(x_*, t) \triangleq \inf\,\{\, \mathcal{B}_{\mathcal{R}}(x, x_*) \mid x \in X \wedge \|x - x_*\| = t \,\} \,. \]
We call $\mathcal{R}$ totally nonlinear at $x_*$ if $\nu(x_*, t) > 0$ for all $t \in (0,\infty)$.
The following convergence result in the norm topology holds.
Theorem 2.7 (Strong Convergence).
Assume that $\mathbf{A}x = y$ has a solution, let $\mathcal{R}$ be totally nonlinear at all $\mathcal{R}$-minimizing solutions of $\mathbf{A}x = y$, and let $(x_k)_{k \in \mathbb{N}}$, $(y_k)_{k \in \mathbb{N}}$, $(\delta_k)_{k \in \mathbb{N}}$, $(\alpha_k)_{k \in \mathbb{N}}$ be as in Theorem 2.4. Then there is a subsequence $(x_{k(j)})_{j \in \mathbb{N}}$ and an $\mathcal{R}$-minimizing solution $x_*$ of $\mathbf{A}x = y$ such that $\|x_{k(j)} - x_*\| \to 0$. Moreover, if the $\mathcal{R}$-minimizing solution of $\mathbf{A}x = y$ is unique, then $x_k \to x_*$ in the norm topology.
Follows from [10, Theorem 2.8]. ∎
2.4 Example: Sparse analysis regularization with a dictionary
A simple application of the above results is the case where the encoder $\mathbf{E} = \mathbf{U} \colon X \to \ell^2(\Lambda)$ is a bounded linear operator with closed range. We can write $\mathbf{U}x = (\langle u_\lambda, x \rangle)_{\lambda \in \Lambda}$ for so-called atoms $u_\lambda \in X$ and interpret $(u_\lambda)_{\lambda \in \Lambda}$ as (analysis) dictionary. Moreover, we take the decoder network as the pseudoinverse $\mathbf{D} \triangleq \mathbf{U}^+$.
We have $(\mathbf{D} \circ \mathbf{E})(x) = \mathbf{U}^+\mathbf{U}x$, where $\mathbf{U}^+\mathbf{U}$ is the orthogonal projection onto $\operatorname{ran}(\mathbf{U}^*)$, and the regularizer takes the form
\[ \mathcal{R}(x) = \sum_{\lambda \in \Lambda} v_\lambda\,|\langle u_\lambda, x \rangle|^q + \beta\,\|\mathbf{U}^+\mathbf{U}x - x\|^2 \,. \tag{2.4} \]
Clearly, conditions (A2) and (A3) are satisfied, which implies that existence, stability and weak convergence for sparse analysis dictionary regularization with (2.4) hold. Following [2, Theorem 3.49], one also derives strong convergence.
Note that if $(u_\lambda)_{\lambda \in \Lambda}$ is a frame of $X$, then $\mathbf{U}^+\mathbf{U} = \operatorname{Id}_X$, in which case (2.4) yields the standard sparse regularizer $\sum_{\lambda \in \Lambda} v_\lambda\,|\langle u_\lambda, x \rangle|^q$. However, for a general trained dictionary we will typically have $\mathbf{U}^+\mathbf{U} \neq \operatorname{Id}_X$. This is the case even for overcomplete dictionaries, because the dictionary is trained only on elements of a small subset of $X$ which are supposed to satisfy a sparse analysis prior. In this case, the additional term in (2.4) ensures coercivity of the regularizer, which is essential for the convergence of Tikhonov regularization.
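This dichotomy can be checked numerically with toy dictionaries (hypothetical examples, not from the paper): for a frame, $\mathbf{U}^+\mathbf{U}$ is the identity and the augmented term vanishes identically, while for a rank-deficient dictionary, $\mathbf{U}^+\mathbf{U}$ is a proper orthogonal projection and the augmented term penalizes the component of $x$ outside its range.

```python
import numpy as np

rng = np.random.default_rng(1)

# Case 1: five atoms forming a frame of R^3 (U injective) => U^+ U = Id,
# so the augmented term ||U^+ U x - x||^2 vanishes for every x.
U_frame = rng.standard_normal((5, 3))
P = np.linalg.pinv(U_frame) @ U_frame
assert np.allclose(P, np.eye(3))

# Case 2: a "trained" dictionary whose atoms only span a 2D subspace of R^3
# => U^+ U is the orthogonal projection onto that subspace, and the
# augmented term penalizes the part of x outside it.
U_sub = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))  # rank 2
P_sub = np.linalg.pinv(U_sub) @ U_sub
assert not np.allclose(P_sub, np.eye(3))
assert np.allclose(P_sub @ P_sub, P_sub)      # idempotent: a projection
x = rng.standard_normal(3)
print(np.sum((P_sub @ x - x) ** 2))           # positive for generic x
```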
3 Convergence rates
Let us now prove a convergence rate in the absolute Bregman distance. For that purpose, we consider general Tikhonov regularization
\[ \mathcal{T}_{\alpha;y_\delta}(x) \triangleq \tfrac{1}{2}\,\|\mathbf{A}x - y_\delta\|^2 + \alpha\,\mathcal{R}(x) \,. \tag{3.1} \]
Here $\mathcal{R} \colon X \to [0,\infty]$ is a general, possibly non-convex, regularizer, and $\mathbf{A} \colon X \to Y$ the linear forward operator.
The convergence rates will be derived under the following assumptions:
(B1) $\mathbf{A} \colon X \to Y$ is bounded linear with finite-dimensional range;
(B2) $\mathcal{R} \colon X \to [0,\infty]$ is coercive and weakly sequentially lower semi-continuous;
(B3) $\mathcal{R}$ is Gâteaux differentiable.
Note that the regularizer $\mathcal{R}$ of the augmented $\ell^q$-NETT (1.3) satisfies (B2) and (B3) as long as $q > 1$ and the activation functions of the encoder-decoder network are differentiable. The differentiability of the activation functions can be relaxed to a local Lipschitz property.
The main restriction in the above list of assumptions is that $\mathbf{A}$ has finite-dimensional range. However, this assumption holds true in practical applications such as sparse data tomography. Unlike [10], we avoid the structural assumption on the regularizer made there, which is quite difficult to validate in practice. Modified provable conditions will be studied in future work.
We start our analysis with the following result.
Let us first prove that for some constant $c_1 > 0$ it holds
\[ \forall x \in X \colon \quad \|x - \mathbf{P}x\| \le c_1\,\|\mathbf{A}x\| \,. \tag{3.3} \]
Indeed, let $\mathbf{P}$ be the orthogonal projection onto $\ker(\mathbf{A})$ and define $\mathbf{Q} \triangleq \operatorname{Id}_X - \mathbf{P}$. Then $x = \mathbf{P}x + \mathbf{Q}x$ and $\mathbf{A}x = \mathbf{A}\mathbf{Q}x$. Since the restricted operator $\mathbf{A}|_{\ker(\mathbf{A})^\perp}$ is injective and has finite-dimensional range, it is bounded from below by a constant $1/c_1 > 0$. Therefore,
\[ \|x - \mathbf{P}x\| = \|\mathbf{Q}x\| \le c_1\,\|\mathbf{A}\mathbf{Q}x\| = c_1\,\|\mathbf{A}x\| \,. \]
Next we prove that there is a constant $c_2 > 0$ such that
\[ \forall h \in X \colon \quad |\mathcal{R}'(x_*)(h)| \le c_2\,\|\mathbf{A}h\| \,. \tag{3.4} \]
Since $x_*$ is an $\mathcal{R}$-minimizing solution of $\mathbf{A}x = y$ and $\mathcal{R}$ is Gâteaux differentiable, we obtain $\mathcal{R}'(x_*)(h) = 0$ for $h \in \ker(\mathbf{A})$. On the other hand, if $h \in \ker(\mathbf{A})^\perp$, we have $\|h\| \le c_1\,\|\mathbf{A}h\|$ by (3.3), and hence $|\mathcal{R}'(x_*)(h)| \le c_2\,\|\mathbf{A}h\|$. This finishes the proof of (3.4).
The following result is our main convergence rates result. It is similar to [10, Theorem 3.1], but uses different assumptions.
From Proposition 3.2, we obtain
\[ \alpha\,\mathcal{B}_{\mathcal{R}}(x_\alpha^\delta, x_*) \le \tfrac{1}{2}\,\delta^2 + c\,\alpha\,\bigl(\|\mathbf{A}x_\alpha^\delta - y_\delta\| + \delta\bigr) - \tfrac{1}{2}\,\|\mathbf{A}x_\alpha^\delta - y_\delta\|^2 \,. \]
Cauchy's inequality gives $c\,\alpha\,\|\mathbf{A}x_\alpha^\delta - y_\delta\| \le \tfrac{1}{2}\,\|\mathbf{A}x_\alpha^\delta - y_\delta\|^2 + \tfrac{1}{2}\,c^2\alpha^2$. For $\alpha \sim \delta$, we easily conclude $\mathcal{B}_{\mathcal{R}}(x_\alpha^\delta, x_*) = \mathcal{O}(\delta)$. ∎
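In more detail, the $\mathcal{O}(\delta)$ bound can be sketched as follows, assuming for simplicity $\mathcal{R}(x_\alpha^\delta) \ge \mathcal{R}(x_*)$ and writing $c$ for the constant from Proposition 3.2:

```latex
\begin{aligned}
&\tfrac12\,\|\mathbf{A}x_\alpha^\delta - y_\delta\|^2
   + \alpha\bigl(\mathcal{R}(x_\alpha^\delta)-\mathcal{R}(x_*)\bigr)
   \le \tfrac12\,\delta^2
 &&\text{(minimality of } x_\alpha^\delta \text{)}\\
&\alpha\,\mathcal{B}_{\mathcal{R}}(x_\alpha^\delta,x_*)
   \le \alpha\bigl(\mathcal{R}(x_\alpha^\delta)-\mathcal{R}(x_*)\bigr)
   + c\,\alpha\bigl(\|\mathbf{A}x_\alpha^\delta - y_\delta\| + \delta\bigr)
 &&\text{(Proposition 3.2)}\\
&c\,\alpha\,\|\mathbf{A}x_\alpha^\delta - y_\delta\|
   \le \tfrac12\,\|\mathbf{A}x_\alpha^\delta - y_\delta\|^2
   + \tfrac12\,c^2\alpha^2
 &&\text{(Cauchy's inequality)}
\end{aligned}
```

Combining the three estimates gives $\alpha\,\mathcal{B}_{\mathcal{R}}(x_\alpha^\delta, x_*) \le \tfrac12\,\delta^2 + c\,\alpha\,\delta + \tfrac12\,c^2\alpha^2$, hence $\mathcal{B}_{\mathcal{R}}(x_\alpha^\delta, x_*) = \mathcal{O}(\delta)$ for the choice $\alpha \sim \delta$.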
4 Network design and training
For the encoder-decoder type network required for the augmented regularizer (2.2), we propose a modified tight frame U-net together with a sparse training strategy.
The tight frame U-net has been introduced in [12] and is less smoothing than the classical U-net [19] in image reconstruction. The tight frame U-net of [12] uses a residual (or by-pass) connection that is not well suited for our purpose. We therefore work with a modified tight frame U-net that has been used in [20] for sparse synthesis regularization with neural networks.
4.1 Modified tight-frame U-net
For simplicity we assume that $X = \mathbb{R}^{n \times n \times c}$ is already a finite-dimensional space containing 2D images of size $n \times n$ with $c$ channels.
The architecture of the modified tight frame U-net is shown in Figure 4.1. It uses a hierarchical multi-scale representation defined by the recursion (4.1) over the scales $\ell = 1, \dots, N$. Here
$N$ is the number of used scales;
$\mathbf{C}_\ell$ and $\mathbf{D}_\ell$ are convolutional layers followed by a nonlinearity;
$\mathbf{H}_1$, $\mathbf{H}_2$, $\mathbf{H}_3$ are horizontal, vertical and diagonal high-pass filters and $\mathbf{L}$ is a low-pass filter such that the tight frame property
\[ \mathbf{L}^{\mathsf{T}}\mathbf{L} + \sum_{i=1}^{3} \mathbf{H}_i^{\mathsf{T}}\mathbf{H}_i = \operatorname{Id} \]
is satisfied;
we define the filters by applying the tensor products HH, HL, LH and LL of the Haar wavelet low-pass and high-pass filters separately in each channel.
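As a quick numerical sanity check (not part of the paper's code), the four tensor-product Haar filters indeed satisfy the tight frame property: stacked as rows of a matrix, they form an orthonormal basis of the space of $2 \times 2$ blocks.

```python
import numpy as np

# The four 2x2 filters obtained as tensor products of the 1D Haar low-pass
# l = (1, 1)/sqrt(2) and high-pass h = (1, -1)/sqrt(2): LL (low-pass) and
# LH, HL, HH (horizontal, vertical, diagonal high-pass).
l = np.array([1.0, 1.0]) / np.sqrt(2.0)
h = np.array([1.0, -1.0]) / np.sqrt(2.0)
LL, LH, HL, HH = (np.outer(a, b) for a, b in
                  [(l, l), (l, h), (h, l), (h, h)])

# Tight frame (Parseval) property: the flattened filters are orthonormal,
# i.e. M M^T = M^T M = Id.
M = np.stack([f.ravel() for f in (LL, LH, HL, HH)])
assert np.allclose(M @ M.T, np.eye(4))
assert np.allclose(M.T @ M, np.eye(4))

# Consequently, analysis followed by synthesis reconstructs any block.
block = np.arange(4.0).reshape(2, 2)
coeffs = M @ block.ravel()
assert np.allclose((M.T @ coeffs).reshape(2, 2), block)
```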
We write the tight frame U-net defined by (4.1) in the form $\mathbf{D} \circ \mathbf{E}$, where $\mathbf{E}$ is the encoder and $\mathbf{D}$ the decoder part. The encoder part $\mathbf{E}$ maps the image $x$ to the high frequency parts at the $\ell$-th scale, denoted by $\xi_\ell$ for $\ell = 1, \dots, N$, and to the low frequency part at the coarsest scale, denoted by $\xi_{N+1}$. The decoder $\mathbf{D}$ then synthesizes the image from $(\xi_1, \dots, \xi_{N+1})$ recursively via (4.2).
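The encoder/decoder recursion can be sketched as follows. This is a minimal toy version in which the learned convolutional layers $\mathbf{C}_\ell$, $\mathbf{D}_\ell$ and the nonlinearities are omitted, so the pair reduces to a multi-scale block-Haar transform with exact perfect reconstruction; all function names are illustrative.

```python
import numpy as np

l = np.array([1.0, 1.0]) / np.sqrt(2.0)
h = np.array([1.0, -1.0]) / np.sqrt(2.0)
FILTERS = [np.outer(a, b) for a, b in [(l, l), (l, h), (h, l), (h, h)]]

def analyze(x):
    """One-scale block-Haar analysis: (2m x 2m) image -> (low, [lh, hl, hh])."""
    m = x.shape[0] // 2
    blocks = x.reshape(m, 2, m, 2).transpose(0, 2, 1, 3)   # (m, m, 2, 2)
    sub = [np.einsum('ijkl,kl->ij', blocks, f) for f in FILTERS]
    return sub[0], sub[1:]

def synthesize(low, high):
    """Inverse of analyze: recombine the four subbands into a (2m x 2m) image."""
    m = low.shape[0]
    blocks = sum(np.einsum('ij,kl->ijkl', c, f)
                 for c, f in zip([low] + high, FILTERS))
    return blocks.transpose(0, 2, 1, 3).reshape(2 * m, 2 * m)

def encoder(x, scales):
    """E: collect the high-pass parts per scale plus the coarsest low-pass part."""
    highs = []
    for _ in range(scales):
        x, hi = analyze(x)
        highs.append(hi)
    return highs, x

def decoder(highs, low):
    """D: synthesize the image recursively from coarse to fine."""
    for hi in reversed(highs):
        low = synthesize(low, hi)
    return low

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
highs, low = encoder(img, scales=3)
assert np.allclose(decoder(highs, low), img)   # perfect reconstruction
```

In the actual architecture, trained convolutions and nonlinearities are interleaved with these filter banks, so the encoded coefficients are learned rather than fixed wavelet coefficients.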
4.2 Sparse network training
To enforce sparsity in the encoded domain, we use a combination of the mean-squared error and an $\ell^1$-penalty of the filter coefficients as loss function for training. The idea is to thereby enhance the sparsity of the high-pass filtered images.
Given a set of training images $x_1, \dots, x_M \in \mathcal{M}$, we aim for an encoder-decoder network reproducing $\mathcal{M}$. For that purpose, we take the encoder-decoder pair as a minimizer of the loss function (the empirical risk)
\[ \frac{1}{M} \sum_{m=1}^{M} \Bigl( \|(\mathbf{D}_{\theta_D} \circ \mathbf{E}_{\theta_E})(x_m) - x_m\|_2^2 + \mu\,\|\mathbf{E}_{\theta_E}(x_m)\|_1 \Bigr) + \nu\,\bigl( \|\theta_E\|^2 + \|\theta_D\|^2 \bigr) \,, \]
where $\mu, \nu > 0$ are tuning parameters. Here $\theta_E$ and $\theta_D$ are the adjustable parameters of the tight frame U-net architecture (specifically, of the convolutional layers $\mathbf{C}_\ell$ and $\mathbf{D}_\ell$). The first term of the loss function trains the network to reproduce the training images $x_m$. Following the sparse regularization strategy, the second term forces the network to learn convolutions such that the high-pass filtered coefficients are sparse. The additional term ensures the coercivity of the loss function and balances the size of the weights $\theta_E$ and $\theta_D$.
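The empirical risk described above can be sketched as follows, with a linear toy autoencoder standing in for the tight frame U-net and hypothetical tuning parameters `mu`, `nu`:

```python
import numpy as np

def training_loss(theta_E, theta_D, images, mu=0.01, nu=1e-4):
    """MSE reconstruction term + l^1-penalty on encoder coefficients
    + weight decay balancing/coercivity term."""
    mse, l1 = 0.0, 0.0
    for x in images:
        coeffs = theta_E @ x           # encoder E(x)
        recon = theta_D @ coeffs       # decoder D(E(x))
        mse += np.sum((recon - x) ** 2)
        l1 += np.sum(np.abs(coeffs))
    n = len(images)
    weight_decay = np.sum(theta_E ** 2) + np.sum(theta_D ** 2)
    return mse / n + mu * l1 / n + nu * weight_decay

rng = np.random.default_rng(0)
E = rng.standard_normal((6, 4))
D = np.linalg.pinv(E)                  # near-perfect linear autoencoder
imgs = [rng.standard_normal(4) for _ in range(5)]
print(training_loss(E, D, imgs))
```

In practice, this loss would be minimized over $\theta_E$, $\theta_D$ with a stochastic gradient method; the pseudoinverse pair above merely makes the reconstruction term vanish by construction.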
Results for the sparse network training described above can be found in [20]. Actual application of the trained network and the augmented $\ell^q$-NETT to limited data problems in CT and elsewhere is the subject of current work.
5 Conclusion and outlook
In this paper, we proposed and analyzed a regularization method (called augmented $\ell^q$-NETT) using the encoder $\mathbf{E}$ of an encoder-decoder network $\mathbf{D} \circ \mathbf{E}$ as sparsifying transform. In order to obtain the coercivity of the regularizer, we augmented it with an additional penalty $\|(\mathbf{D} \circ \mathbf{E})(x) - x\|^2$, which can be seen as a measure for the distance of $x$ from the ideal data manifold. We proved well-posedness and convergence of the augmented $\ell^q$-NETT and derived convergence rates in the absolute Bregman distance. We proposed the modified tight frame U-net as network architecture, together with an appropriate sparse training strategy.
Application to sparse data tomography is the subject of current work. A theoretical comparison with frame- and dictionary-based sparse regularization methods will be studied. Moreover, we work on the derivation of additional provable convergence rates for the augmented $\ell^q$-NETT.
D.O. and M.H. acknowledge support of the Austrian Science Fund (FWF), project P 30747-N32. The research of L.N. has been supported by the National Science Foundation (NSF) Grants DMS 1212125 and DMS 1616904.
-  H. W. Engl, M. Hanke, and A. Neubauer, Regularization of inverse problems, ser. Mathematics and its Applications. Dordrecht: Kluwer Academic Publishers Group, 1996, vol. 375.
-  O. Scherzer, M. Grasmair, H. Grossauer, M. Haltmeier, and F. Lenzen, Variational methods in imaging, ser. Applied Mathematical Sciences. New York: Springer, 2009, vol. 167.
-  D. Lee, J. Yoo, and J. C. Ye, “Deep residual learning for compressed sensing MRI,” in IEEE 14th International Symposium on Biomedical Imaging, 2017, pp. 15–18.
-  K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Trans. Image Process., vol. 26, no. 9, pp. 4509–4522, 2017.
-  S. Antholzer, M. Haltmeier, and J. Schwab, “Deep learning for photoacoustic tomography from sparse data,” Inverse Probl. Sci. and Eng., vol. in press, pp. 1–19, 2018.
-  E. Kobler, T. Klatzer, K. Hammernik, and T. Pock, “Variational networks: connecting variational methods and deep learning,” in German Conference on Pattern Recognition. Springer, 2017, pp. 281–293.
-  J. R. Chang, C.-L. Li, B. Poczos, and B. V. Kumar, “One network to solve them all – solving linear inverse problems using deep projection models,” in IEEE International Conference on Computer Vision (ICCV), 2017, pp. 5889–5898.
-  J. Adler and O. Öktem, “Solving ill-posed inverse problems using iterative deep neural networks,” Inverse Probl., vol. 33, p. 124007, 2017.
-  J. Schwab, S. Antholzer, and M. Haltmeier, “Deep null space learning for inverse problems: convergence analysis and rates,” Inverse Probl., vol. 35, no. 2, p. 025008, 2019.
-  H. Li, J. Schwab, S. Antholzer, and M. Haltmeier, “NETT: Solving inverse problems with deep neural networks,” 2018, arXiv:1803.00092.
-  S. Antholzer, J. Schwab, J. Bauer-Marschallinger, P. Burgholzer, and M. Haltmeier, “NETT regularization for compressed sensing photoacoustic tomography,” in Photons Plus Ultrasound: Imaging and Sensing 2019, vol. 10878, 2019, p. 108783B.
-  Y. Han and J. C. Ye, “Framing U-Net via deep convolutional framelets: Application to sparse-view CT,” IEEE Trans. Med. Imag., vol. 37, pp. 1418–1429, 2018.
-  I. Daubechies, M. Defrise, and C. De Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Comm. Pure Appl. Math., vol. 57, pp. 1413–1457, 2004.
-  R. Ramlau and G. Teschke, “A Tikhonov-based projection iteration for nonlinear ill-posed problems with sparsity constraints,” Numerische Mathematik, vol. 104, no. 2, pp. 177–203, 2006.
-  M. Grasmair, M. Haltmeier, and O. Scherzer, “Sparse regularization with $\ell^q$ penalty term,” Inverse Probl., vol. 24, no. 5, p. 055020, 2008.
-  ——, “Necessary and sufficient conditions for linear convergence of $\ell^1$-regularization,” Comm. Pure Appl. Math., vol. 64, no. 2, pp. 161–182, 2011.
-  S. Vaiter, G. Peyré, C. Dossal, and J. Fadili, “Robust sparse analysis regularization,” IEEE Transactions on information theory, vol. 59, no. 4, pp. 2001–2016, 2012.
-  M. Burger, J. Flemming, and B. Hofmann, “Convergence rates in $\ell^1$-regularization if the sparsity assumption fails,” Inverse Probl., vol. 29, no. 2, p. 025013, 2013.
-  O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention. Springer, 2015, pp. 234–241.
-  D. Obmann, J. Schwab, and M. Haltmeier, “Sparse synthesis regularization with deep neural networks,” arXiv:1902.00390, 2019.