Various applications in medical imaging, remote sensing and elsewhere require solving inverse problems of the form $y^\delta = \mathbf{A}x + \xi$, where $\mathbf{A}\colon X \to Y$ is the forward operator and $\xi$ denotes the data perturbation. Recently, neural networks (NNs) and deep learning have appeared as a new paradigm for solving inverse problems and demonstrate impressive performance [3, 4, 5, 6].
In order to enforce data consistency, a deep learning approach named NETT (NETwork Tikhonov regularization) has been proposed and analyzed in [7], based on minimizing $\|\mathbf{A}x - y^\delta\|^2 + \alpha \mathcal{R}(x)$, where $\mathcal{R}$ is a trained network serving as regularizer. One of the main assumptions for the analysis of [7] is the coercivity of the regularizer, which requires special care in network design and training. In order to overcome this limitation, we introduce the sparse augmented NETT (aNETT), which considers minimizers of
$$\frac{1}{2}\,\|\mathbf{A}x - y^\delta\|^2 + \alpha \Big( \sum_{\lambda\in\Lambda} v_\lambda \,|(\mathbf{E}x)_\lambda| + \frac{\beta}{2}\,\|(\mathbf{D}\circ\mathbf{E})(x) - x\|^2 \Big). \tag{1.2}$$
Here $\mathbf{D}\circ\mathbf{E}$ is a sparse autoencoder network, $\mathbf{E}\colon X \to \ell^2(\Lambda)$ and $\mathbf{D}\colon \ell^2(\Lambda) \to X$ are the encoder and decoder networks, $\Lambda$ is a countable index set, and $\ell^2(\Lambda)$ is the latent Hilbert space of sparse codes. The weighted $\ell^1$-norm $\sum_{\lambda\in\Lambda} v_\lambda |(\mathbf{E}x)_\lambda|$ implements learned sparsity, and the augmented term $\|(\mathbf{D}\circ\mathbf{E})(x) - x\|^2$ forces $x$ to be close to the data manifold $\mathcal{M}$. Both terms together allow us to show coercivity of the regularizer. Based on this we derive stability, convergence and convergence rates for aNETT. Note that sparse regularization is well investigated for linear representations [8, 9] but so far has not been investigated for nonlinear deep autoencoders.
2 Sparse augmented NETT
2.1 Theoretical results
Throughout this section we assume the following.
$\mathbf{A}\colon X \to Y$ is linear and bounded.
$\mathbf{E}\colon X \to \ell^2(\Lambda)$ is weakly sequentially continuous.
$\mathbf{D}\colon \ell^2(\Lambda) \to X$ is weakly sequentially continuous.
Furthermore, we define $\|z\|_{1,v} \coloneqq \sum_{\lambda\in\Lambda} v_\lambda |z_\lambda|$ and choose the regularizer
$$\mathcal{R}(x) \coloneqq \|\mathbf{E}(x)\|_{1,v} + \frac{\beta}{2}\,\|(\mathbf{D}\circ\mathbf{E})(x) - x\|^2.$$
Under these assumptions, (1.2) has a minimizer for all $y^\delta \in Y$ and all $\alpha, \beta > 0$. Moreover, we have the following results.
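To make the regularizer concrete, here is a minimal NumPy sketch with a toy linear autoencoder; the networks `encode`/`decode`, the weights `v` and all sizes are hypothetical stand-ins for trained networks, not the architecture used in this paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 16, 8                                  # toy image and latent dimensions
W = rng.standard_normal((k, d)) / np.sqrt(d)  # stand-in encoder weights

def encode(x):
    # toy linear encoder E : X -> latent space
    return W @ x

def decode(z):
    # toy linear decoder D : latent space -> X
    return W.T @ z

def regularizer(x, v, beta):
    """R(x) = weighted l1-norm of the code plus the augmented term."""
    z = encode(x)
    sparsity = np.sum(v * np.abs(z))                        # learned sparsity term
    augmented = 0.5 * beta * np.sum((decode(z) - x) ** 2)   # distance to D(E(x))
    return sparsity + augmented

v = np.ones(k)
x = rng.standard_normal(d)
val = regularizer(x, v, beta=1.0)  # nonnegative by construction
```

Both terms vanish exactly when $x = 0$ for this linear toy model, and the sum of a norm and a squared distance is what yields the coercivity used in the analysis.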
Theorem 2.1 (Convergence)
Let $y \in \operatorname{ran}(\mathbf{A})$, let $y_k \in Y$ for $k \in \mathbb{N}$ satisfy $\|y_k - y\| \le \delta_k$, and suppose $\delta_k \to 0$, $\alpha_k \to 0$ and $\delta_k^2/\alpha_k \to 0$ as $k \to \infty$. Then, with $x_k$ denoting minimizers of (1.2) with data $y_k$ and regularization parameter $\alpha_k$, the following hold:
$(x_k)_{k\in\mathbb{N}}$ has at least one weak accumulation point.
Every weak accumulation point of $(x_k)_{k\in\mathbb{N}}$ is an $\mathcal{R}$-minimizing solution of $\mathbf{A}x = y$.
If $\mathbf{A}x = y$ has a unique $\mathcal{R}$-minimizing solution $x^\dagger$, then $(x_k)_{k\in\mathbb{N}}$ weakly converges to $x^\dagger$.
Theorem 2.2 (Convergence rate)
Let $\mathcal{R}$ be Gâteaux differentiable at an $\mathcal{R}$-minimizing solution $x^\dagger$, let $\mathbf{A}$ have finite-dimensional range, and consider minimizers $x_\alpha^\delta$ of (1.2) for data $y^\delta$ with $\|y^\delta - y\| \le \delta$. Then the parameter choice $\alpha \sim \delta$ implies the convergence rate $\mathcal{B}_{\mathcal{R}}(x_\alpha^\delta, x^\dagger) = \mathcal{O}(\delta)$ as $\delta \to 0$ in terms of the so-called absolute Bregman distance $\mathcal{B}_{\mathcal{R}}$.
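For the reader's convenience we recall the notion used in Theorem 2.2: with $\mathcal{R}'(x^\dagger)$ denoting the Gâteaux derivative of $\mathcal{R}$ at $x^\dagger$, the absolute Bregman distance, as used in the NETT framework [7], is

```latex
\mathcal{B}_{\mathcal{R}}(x, x^\dagger)
  \coloneqq \bigl|\, \mathcal{R}(x) - \mathcal{R}(x^\dagger)
  - \mathcal{R}'(x^\dagger)(x - x^\dagger) \,\bigr| .
```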
2.2 Trained autoencoder
First, an autoencoder $\mathbf{D}\circ\mathbf{E}$ is trained such that $(\mathbf{D}\circ\mathbf{E})(x)$ is close to $x$ and $\|\mathbf{E}(x)\|_{1,v}$ is small for any $x$ in a class $\mathcal{M}$ of images of interest. For that purpose, we add the regularizer $\gamma\,\|\mathbf{E}_\theta(\,\cdot\,)\|_{1,v}$ to the loss function for training $\mathbf{D}\circ\mathbf{E}$ as a denoising network. To be more specific, let $(\mathbf{D}_\theta \circ \mathbf{E}_\theta)_{\theta\in\Theta}$ be a family of autoencoder networks, where $\mathbf{E}_\theta$ are admissible (in the sense of the above assumptions) encoder networks and $\mathbf{D}_\theta$ admissible decoder networks. Moreover, suppose that $(x_n)_{n=1}^N$
is a training dataset. To select the particular autoencoder based on the training data, we consider the following training strategy for the sparse denoising autoencoder:
$$\theta^\star \in \operatorname*{arg\,min}_{\theta\in\Theta} \; \frac{1}{N} \sum_{n=1}^{N} \big\| (\mathbf{D}_\theta \circ \mathbf{E}_\theta)(x_n + \xi_n) - x_n \big\|^2 + \gamma\,\big\| \mathbf{E}_\theta(x_n + \xi_n) \big\|_{1,v} \tag{2.1}$$
and set $(\mathbf{D}, \mathbf{E}) \coloneqq (\mathbf{D}_{\theta^\star}, \mathbf{E}_{\theta^\star})$. Here $\xi_n$ are data perturbations and $\gamma > 0$ is a regularization parameter.
By training with perturbed data points $x_n + \xi_n$, we increase the robustness of the trained autoencoder. Note that the perturbations are chosen independently of the operator $\mathbf{A}$, such that the autoencoder can be used for each forward operator in a universal manner. Clearly, the autoencoder then depends on the specific manifold $\mathcal{M}$ of images of interest. As we shall see, however, as opposed to typical deep learning based reconstruction methods, which do not account for data consistency outside the training dataset, the sparse aNETT is robust against changes of the specific image manifold. Note that Thms. 2.1 and 2.2 hold true for the trained networks $(\mathbf{E}_{\theta^\star}, \mathbf{D}_{\theta^\star})$ in place of $(\mathbf{E}, \mathbf{D})$.
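The following is a minimal sketch of evaluating the empirical objective (2.1); all names and sizes are hypothetical stand-ins, and actual training would of course use a deep learning framework and stochastic optimization rather than a fixed linear model:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, N = 16, 8, 32
W = rng.standard_normal((k, d)) / np.sqrt(d)  # toy tied-weight autoencoder

def encode(x):
    return W @ x

def decode(z):
    return W.T @ z

def training_loss(X, gamma, noise_scale):
    """Empirical sparse denoising autoencoder objective of Eq. (2.1)."""
    total = 0.0
    for x in X:
        xi = noise_scale * rng.standard_normal(x.shape)  # operator-independent perturbation
        z = encode(x + xi)
        recon = np.sum((decode(z) - x) ** 2)   # denoising reconstruction error
        sparse = gamma * np.sum(np.abs(z))     # sparsity penalty on the code
        total += recon + sparse
    return total / len(X)

X = rng.standard_normal((N, d))
loss = training_loss(X, gamma=0.01, noise_scale=0.05)
```

Note that the perturbation `xi` depends only on the image, not on any forward operator, mirroring the universality of the trained autoencoder.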
2.3 Adaptation to specific forward models
The sparse aNETT (1.2) consists of a data consistency term, a sparsity term, and an augmented term enforcing $(\mathbf{D}\circ\mathbf{E})(x) \simeq x$. Ideally, the set of all approximately data consistent elements that are also approximate fixed points of $\mathbf{D}\circ\mathbf{E}$ is close to the image manifold $\mathcal{M}$. However, without adjusting the autoencoder to specific forward models, this is a challenging and maybe impossible task. Indeed, for the application we consider in this paper, namely sparse view CT, we observed that the autoencoder trained independently of the forward operator was not able to sufficiently well distinguish between data-consistent elements inside and outside the desired image class.
One way to increase the value of $\mathcal{R}$ for undesired but data consistent elements is to adopt the training strategy developed in [17] and to take the data perturbations in (2.1) as $\xi_n = \mathbf{B}(\mathbf{A}x_n) - x_n$, where $\mathbf{B}$ is a reconstruction operator approximating the Moore-Penrose inverse of $\mathbf{A}$, the $x_n$ are the artifact-free images and $\mathbf{B}(\mathbf{A}x_n)$ images with artefacts. In this case, the training dataset depends on the forward operator, and the autoencoder has to be retrained for every specific forward operator. Therefore, in this paper we follow a different approach. Instead of adjusting the autoencoder training, we compose the operator independent autoencoder with another network $\Phi$, which is trained to distinguish between the desired images and images with operator dependent artefacts. For that purpose we choose a network architecture $(\Phi_\theta)_{\theta\in\Theta}$ and select $\Phi \coloneqq \Phi_{\theta^\star}$, where $\theta^\star$ is a minimizer of
$$\frac{1}{N} \sum_{n=1}^{N} \big\| \Phi_\theta\big( \mathbf{B}(\mathbf{A}x_n) \big) - x_n \big\|^2. \tag{2.2}$$
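The selection criterion (2.2) can be illustrated on a toy linear model; here `A`, `B` and `phi` are hypothetical stand-ins, with `B` mimicking an approximate Moore-Penrose inverse:

```python
import numpy as np

rng = np.random.default_rng(2)
d, m, N = 16, 12, 8
A = rng.standard_normal((m, d))   # toy (undersampled) forward operator
B = np.linalg.pinv(A)             # approximate Moore-Penrose inverse
Theta = np.eye(d)                 # toy parameters of the problem adapted network

def phi(x, theta):
    # stand-in for the problem adapted network Phi
    return theta @ x

def post_processing_loss(X, theta):
    """Empirical loss of Eq. (2.2): Phi should map artefact images B(A x) back to x."""
    return np.mean([np.sum((phi(B @ (A @ x), theta) - x) ** 2) for x in X])

X = rng.standard_normal((N, d))
loss = post_processing_loss(X, Theta)
```

Because `A` undersamples (`m < d`), the artefact images `B @ (A @ x)` lose information, so even the identity network incurs a positive loss; training over `theta` amounts to learning to undo these operator dependent artefacts.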
3 Application to sparse view CT
For the numerical simulations we consider the problem of recovering an image from sparse view parallel-beam CT data with a reduced number of angles. For this problem, the forward operator is given by the angularly subsampled Radon transform
$$(\mathbf{A}x)(\theta_i, s) \coloneqq \int_{L(\theta_i, s)} x(u)\,\mathrm{d}u$$
for equidistant angles $\theta_i$ in $[0, \pi)$. Here
$$L(\theta, s) \coloneqq \{ u \in \mathbb{R}^2 \mid u \cdot (\cos\theta, \sin\theta) = s \}$$
is the line in the plane with normal vector $(\cos\theta, \sin\theta)$ and signed distance $s$ from the origin. Discretization of the Radon transform is done using the ODL library [11]. The data chosen for the numerical simulations are taken from the Low Dose CT Grand Challenge [12]. We consider the images at the slice thickness given in the dataset and take the first seven patients for training (4267 images), the next two patients for validation (1143 images) and the last patient for testing (526 images). Each of these images is rescaled to have pixel values in the interval $[0, 1]$.
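The line parametrization above can be verified numerically with a small sanity check (purely illustrative, not part of the reconstruction pipeline):

```python
import numpy as np

def line_point(theta, s, t):
    """Point on L(theta, s): offset s along the normal plus t along the line direction."""
    normal = np.array([np.cos(theta), np.sin(theta)])
    direction = np.array([-np.sin(theta), np.cos(theta)])
    return s * normal + t * direction

# every point of L(theta, s) has signed distance s from the origin
theta, s = 0.7, 2.5
p = line_point(theta, s, t=-3.0)
signed_distance = p @ np.array([np.cos(theta), np.sin(theta)])
```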
3.1 Network training
We first train $(\mathbf{E}, \mathbf{D})$ by minimizing (2.1) and subsequently train $\Phi$ by minimizing (2.2). The network architecture chosen for the problem adapted network $\Phi$ is the tight frame U-Net [13], and the autoencoder architecture is chosen as in [14]. The perturbations $\xi_n$ in (2.1) are taken as Gaussian noise whose standard deviation is uniformly sampled and scaled by the mean of the training images. The weights $v_\lambda$ in the $\ell^1$-term are chosen depending on the index of the downsampling step of the corresponding coefficient; see [14].
We train all networks using the Adam optimizer [15] with the recommended parameters and use only the best parameters encountered during training, namely those which give the smallest loss on the validation set. The regularization parameters $\gamma$, $\alpha$ and $\beta$ are chosen empirically to give the best results for our approach.
3.2 Solution of sparse aNETT
Note that in the splitting below we have only replaced $\mathbf{E}(x)$ by the auxiliary variable $z$ in the $\ell^1$-term but not in the augmented term. To solve the resulting constrained version of aNETT, we use the ADMM scheme with scaled dual variable $u$. This results in the update scheme
$$x_{k+1} \in \operatorname*{arg\,min}_{x} \; \tfrac{1}{2}\,\|\mathbf{A}x - y^\delta\|^2 + \tfrac{\alpha\beta}{2}\,\|(\mathbf{D}\circ\mathbf{E})(x) - x\|^2 + \tfrac{\rho}{2}\,\|\mathbf{E}(x) - z_k + u_k\|^2 \tag{3.1}$$
$$z_{k+1} \in \operatorname*{arg\,min}_{z} \; \alpha\,\|z\|_{1,v} + \tfrac{\rho}{2}\,\|\mathbf{E}(x_{k+1}) - z + u_k\|^2 \tag{3.2}$$
$$u_{k+1} = u_k + \mathbf{E}(x_{k+1}) - z_{k+1}, \tag{3.3}$$
where $\rho > 0$ is a scaling parameter. The strength of the splitting type iteration (3.1)-(3.3) is that the optimization problems involved in each iterative update are simpler and easier to solve than the original sparse aNETT minimization problem (1.2), which contains the non-differentiable $\ell^1$-norm as well as the non-linear augmented network term. In fact, the $z$-update (3.2) can be solved explicitly by soft-thresholding. Additionally, if we take $\mathbf{E}$ and $\mathbf{D}$ to be differentiable, the $x$-update (3.1) can be solved efficiently using gradient type iterative schemes.
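The soft-thresholding operator solving the $z$-update componentwise is standard and can be sketched as follows, with thresholds $\alpha v_\lambda / \rho$:

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal map of tau * |.|: shrink each entry of x towards zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def z_update(Ex, u, v, alpha, rho):
    # explicit solution of the z-update: componentwise shrinkage of E(x) + u
    return soft_threshold(Ex + u, alpha * v / rho)
```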
We minimize (3.1) using gradient descent with momentum. The ADMM iteration is initialized with $x_0 = \mathbf{B}(y^\delta)$, $z_0 = \mathbf{E}(x_0)$ and $u_0 = 0$, where $y^\delta$ are the given data. Here and below $\mathbf{B}$ denotes the filtered backprojection (FBP) operator. The parameter specifications for the minimization using (3.1)-(3.3) in various scenarios are shown in Table 3.1. All parameters were chosen empirically to give the best results. Here, outer refers to the total number of ADMM iterations, stepsize is the stepsize of the gradient descent, and inner is the maximal number of iterations for the $x$-update step (3.1).
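Putting the updates together, the scheme (3.1)-(3.3) can be prototyped on a toy problem with linear stand-in networks, in which case the $x$-update (3.1) reduces to a linear least squares problem (a small-scale illustrative sketch under these assumptions, not the CT setup of this paper):

```python
import numpy as np

rng = np.random.default_rng(3)
d, m = 16, 12
A = rng.standard_normal((m, d))                  # toy undersampled forward operator
Q = np.linalg.qr(rng.standard_normal((d, d)))[0]
E, D = Q, Q.T                                    # orthonormal linear stand-in networks

x_true = np.zeros(d); x_true[:3] = [1.0, -2.0, 0.5]   # sparse ground truth code
y = A @ x_true

alpha, beta, rho = 1e-3, 1.0, 1.0
v = np.ones(d)

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

x = np.zeros(d); z = E @ x; u = np.zeros(d)
M = D @ E - np.eye(d)    # residual map of the augmented term (zero for this toy model)
# normal equations matrix of the quadratic x-update (3.1)
H = A.T @ A + alpha * beta * (M.T @ M) + rho * (E.T @ E)
for _ in range(50):
    x = np.linalg.solve(H, A.T @ y + rho * E.T @ (z - u))  # (3.1), closed form here
    z = soft_threshold(E @ x + u, alpha * v / rho)         # (3.2), soft-thresholding
    u = u + E @ x - z                                      # (3.3), dual update
residual = np.linalg.norm(A @ x - y)
```

In the actual aNETT setting the $x$-update has no closed form because $\mathbf{D}\circ\mathbf{E}$ is nonlinear, which is why a gradient descent with momentum is used instead of the linear solve above.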
3.3 Numerical results
The first case we consider is that of noise-free data. Figure 3.1 shows the FBP reconstruction, the reconstruction with the full post-processing network $\Phi\circ\mathbf{B}$, and the aNETT reconstruction. Comparing the results we see that the output of the problem adapted network and the aNETT output are visually identical. This is because the test image is close to the training data, and therefore the considered training procedure implies that the post-processing output is close to a minimizer of the sparse aNETT. In comparison to the FBP we see that the aNETT was able to completely remove all the artefacts and yields an almost perfect reconstruction.
To simulate noisy data we add Gaussian noise to the measurement data, i.e. we use $y^\delta = y + \sigma \bar{y}\,\epsilon$, where $\bar{y}$ is the mean of the data $y$ and $\epsilon$
is a standard normally distributed noise term. Reconstructions using FBP, post-processing and the sparse aNETT are shown in Figure 3.2. We enhance the contrast in these images using the Python Pillow library [16] to make the differences more clearly visible. The post-processing reconstruction shows some noise-like structure in parts where the image should be mostly constant, e.g. in and around the orange square. We hypothesize that these noise-like structures occur because the problem adapted network has not been trained with noise in the data domain and hence has difficulties removing it. While we could add such noise to the training data, the network would then likely fail on different noise models, e.g. Poisson noise. Comparing this to the aNETT, we see that this noise-like structure has been greatly reduced, as we can rely on the sparsifying term of the regularization method to obtain noise-free reconstructions.
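The noise model can be sketched as follows; `sigma` is an illustrative noise level, not the value used in the experiments:

```python
import numpy as np

def add_measurement_noise(y, sigma, rng):
    """Gaussian noise scaled by the data mean: y_delta = y + sigma * mean(y) * eps."""
    eps = rng.standard_normal(y.shape)   # standard normally distributed noise term
    return y + sigma * np.mean(y) * eps

rng = np.random.default_rng(4)
y = np.full((8, 8), 10.0)                # toy sinogram with mean 10
y_delta = add_measurement_noise(y, sigma=0.02, rng=rng)
```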
3.4 Robustness to adversarial attack
One particular advantage of aNETT over post-processing is its increased robustness with respect to the type of image to be reconstructed. To highlight this advantage, as illustrated in the top left image in Figure 3.3, we add a high intensity disc to the CT image shown in Figure 3.1. The disc represents a clear low complexity structure, and its accurate reconstruction should easily be possible.
Figure 3.3 shows the reconstructions using the FBP, the post-processing network and the aNETT. Taking a look at the zoomed-in square in these images, we see that FBP reconstructs the circle well. The post-processing network output, however, has some dark spots close to the circle and generally shows data-inconsistent behaviour around it. With the aNETT, on the other hand, these problems do not occur. This improved accuracy is because aNETT takes into account the given data even for images different from the training data.
4 Conclusion
In this paper we introduced the sparse aNETT, a sparse reconstruction framework using a learned regularization term and founded on a solid mathematical fundament. As shown in our numerical experiments, the aNETT yields results similar to a post-processing network in the case of noise-free data and phantoms close to the training data. However, thanks to the included data consistency, the aNETT approach can deal much better with unseen phantom structures. While the chosen simple example might look artificial, it suggests that similar effects occur for more complex structures in a real scenario. In the case of noisy data, the aNETT is able to leverage the sparsifying term and increase robustness with respect to noise.
While the aNETT gives an overall more robust and stable reconstruction method, there is currently one major downside: our proposed approach relies on an iterative minimization scheme and is therefore substantially slower than reconstruction by a post-processing network. Therefore, the design of fast numerical schemes for minimizing the sparse aNETT functional is a main step of future research. Further, we plan comparisons with different reconstruction methods, including network cascades [17, 18], variational and iterative networks [5, 19, 20] and null space networks [21], in future work.
-  HW Engl, M Hanke, and A Neubauer, Regularization of inverse problems, vol. 375, Springer Science & Business Media, 1996.
-  O Scherzer, M Grasmair, H Grossauer, M Haltmeier, and F Lenzen, Variational methods in imaging, Springer, 2009.
-  D Lee, J Yoo, and JC Ye, “Deep residual learning for compressed sensing MRI,” in 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017). IEEE, 2017, pp. 15–18.
-  KH Jin, M McCann, E Froustey, and M Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Trans. Image Process., vol. 26, no. 9, pp. 4509–4522, 2017.
-  J Sun, H Li, and Z Xu, “Deep ADMM-Net for compressive sensing MRI,” in Advances in Neural Information Processing Systems, 2016, pp. 10–18.
-  G Wang, “A perspective on deep imaging,” IEEE Access, vol. 4, pp. 8914–8924, 2016.
-  H Li, J Schwab, S Antholzer, and M Haltmeier, “NETT: Solving inverse problems with deep neural networks,” Inverse Probl., 2020.
-  M Grasmair, M Haltmeier, and O Scherzer, “Sparse regularization with $\ell^q$ penalty term,” Inverse Probl., vol. 24, no. 5, pp. 055020, 2008.
-  I Daubechies, M Defrise, and C De Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Commun. Pur. Appl. Math., vol. 57, no. 11, pp. 1413–1457, 2004.
-  M Haltmeier, L Nguyen, D Obmann, and J Schwab, “Sparse -regularization of inverse problems with deep learning,” arXiv:1908.03006, 2019.
-  J Adler, H Kohr, and O Öktem, “Operator discretization library (ODL),” Software available from https://github.com/odlgroup/odl, 2017.
-  C McCollough, “TU-FG-207A-04: Overview of the low dose CT grand challenge,” Med. Phys., vol. 43, no. 6Part35, pp. 3759–3760, 2016.
-  Y Han and JC Ye, “Framing U-Net via deep convolutional framelets: Application to sparse-view CT,” IEEE Trans. Med. Imag., vol. 37, no. 6, pp. 1418–1429, 2018.
-  D Obmann, J Schwab, and M Haltmeier, “Deep synthesis regularization of inverse problems,” arXiv:2002.00155, 2020.
-  D Kingma and J Ba, “Adam: A method for stochastic optimization,” arXiv:1412.6980, 2014.
-  A Clark, “Pillow (pil fork) documentation,” 2015.
-  A Kofler, M Haltmeier, C Kolbitsch, M Kachelrieß, and M Dewey, “A U-Nets cascade for sparse view computed tomography,” in International Workshop on Machine Learning for Medical Image Reconstruction. Springer, 2018, pp. 91–99.
-  J Schlemper, J Caballero, J Hajnal, A Price, and D Rueckert, “A deep cascade of convolutional neural networks for dynamic MR image reconstruction,” IEEE Trans. Med. Imag., vol. 37, no. 2, pp. 491–503, 2017.
-  J Adler and O Öktem, “Solving ill-posed inverse problems using iterative deep neural networks,” Inverse Probl., vol. 33, no. 12, pp. 124007, 2017.
-  E Kobler, T Klatzer, K Hammernik, and T Pock, “Variational networks: connecting variational methods and deep learning,” in German Conference on Pattern Recognition. Springer, 2017, pp. 281–293.
-  J Schwab, S Antholzer, and M Haltmeier, “Deep null space learning for inverse problems: convergence analysis and rates,” Inverse Probl., vol. 35, no. 2, pp. 025008, 2019.