1 Introduction
Inverse problems naturally arise in a wide range of important imaging applications, ranging from computed tomography and remote sensing to image restoration. Such applications can be formulated as the task of reconstructing an unknown image $x \in X$ from data

(1) $y^\delta = K x + \eta^\delta$.

Here $K \colon X \to Y$ is a linear operator between Hilbert spaces $X$ and $Y$, and $\eta^\delta \in Y$ is the data distortion. Moreover, we assume $\|\eta^\delta\| \le \delta$, where the index $\delta \ge 0$ denotes the noise level. For $\delta = 0$ we call $Kx = y$ the noise-free equation.
1.1 Regularization
Inverse problems are typically ill-posed. This means that even in the noise-free case the solution of (1) is not unique or is unstable with respect to data perturbations. In order to overcome the ill-posedness, regularization methods have to be applied, which incorporate suitable prior information that acts as a selection criterion and at the same time stabilizes the reconstruction [scherzer2009variational, EngHanNeu96].
One of the most established stable reconstruction approaches is convex variational regularization. In this case one considers minimizers of the generalized Tikhonov functional

(2) $\mathcal{T}_{\alpha}(x) = \tfrac{1}{2} \| K x - y^\delta \|^2 + \alpha \mathcal{R}(x),$

where $\mathcal{R}$ is a convex regularizer on the signal space $X$ and $\alpha > 0$ is the regularization parameter. Several choices for the regularizer have been proposed and analysed. For example, the choice $\mathcal{R}(x) = \tfrac12 \|x\|^2$ leads to quadratic Tikhonov regularization, choosing the regularizer as the total variation (TV) seminorm yields TV-regularization, and choosing the $\ell^1$-norm of the coefficients with respect to a given orthonormal basis yields sparse $\ell^1$-regularization.
Convex variational regularization is built on a solid theoretical foundation. In particular, if $\mathcal{R}$ is convex, lower semicontinuous and coercive on $X$, then (2) is well-posed, stable and convergent as $\delta \to 0$. Moreover, for elements satisfying the so-called source condition $\operatorname{ran}(K^*) \cap \partial \mathcal{R}(x) \neq \emptyset$, convergence rates in the form of quantitative estimates between $x$ and solutions of (2) have been derived [scherzer2009variational, burger2004convergence].

1.2 Frame-based methods
Variational regularization (2) is based on the assumption that a small value of the regularizer is a good prior for the underlying signal class. However, it is often challenging to hand-craft an appropriate regularization term for a given class of images. Frame-based methods address this issue by adapting frames to the signal class and using a small value of a regularizer of the frame coefficients as image prior.
Let $(\phi_\lambda)_{\lambda \in \Lambda}$ be a frame of $X$ and denote by $D \colon \ell^2(\Lambda) \to X$ the synthesis operator that maps $\xi = (\xi_\lambda)_{\lambda \in \Lambda}$ to the synthesized signal $D\xi = \sum_{\lambda \in \Lambda} \xi_\lambda \phi_\lambda$. Its adjoint $D^* \colon X \to \ell^2(\Lambda)$ is the analysis operator and maps $x$ to the so-called analysis coefficients $(\langle x, \phi_\lambda \rangle)_{\lambda \in \Lambda}$. Two established frame-based approaches are the following frame synthesis and frame analysis regularization, respectively,

(3) $\min_{\xi \in \ell^2(\Lambda)} \tfrac12 \| K D \xi - y^\delta \|^2 + \alpha \mathcal{R}(\xi),$

(4) $\min_{x \in X} \tfrac12 \| K x - y^\delta \|^2 + \alpha \mathcal{R}(D^* x).$
Here $\mathcal{R}$ is a convex regularizer on the coefficient space $\ell^2(\Lambda)$ and $\alpha > 0$ the regularization parameter. Typical instances of frame analysis and frame synthesis regularization are obtained when the regularizer is taken as the weighted $\ell^1$-norm given by

(5) $\| \xi \|_{1,w} = \sum_{\lambda \in \Lambda} w_\lambda |\xi_\lambda|,$

where $w_\lambda \ge w_0 > 0$ for all $\lambda \in \Lambda$. In this case (4) enforces sparsity of the analysis coefficients $D^* x$, whereas (3) enforces sparsity of the synthesis coefficients $\xi$ of the signal. In the case of bases, the two approaches (3) and (4) are equivalent (if one of them is applied with the dual basis). In the redundant case, however, they are fundamentally different [elad2007analysis, frikel2019sparse].
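As a concrete illustration, the weighted $\ell^1$-norm (5) and its proximal operator (component-wise soft-thresholding), which is what drives sparsity in iterative minimization schemes, can be sketched in a few lines of numpy. The array sizes, weights and threshold below are arbitrary toy choices, not values from the paper:

```python
import numpy as np

def weighted_l1(xi, w):
    # ||xi||_{1,w} = sum_k w_k |xi_k|, cf. (5)
    return np.sum(w * np.abs(xi))

def prox_weighted_l1(xi, w, t):
    # Proximal map of t * ||.||_{1,w}: component-wise soft-thresholding.
    # Entries whose magnitude falls below t * w_k are set exactly to zero,
    # which is how the weighted l1 penalty promotes sparse coefficients.
    return np.sign(xi) * np.maximum(np.abs(xi) - t * w, 0.0)

xi = np.array([0.3, -2.0, 0.05, 1.0])
w = np.array([1.0, 1.0, 2.0, 0.5])
print(weighted_l1(xi, w))
print(prox_weighted_l1(xi, w, 0.2))
```

Note that larger weights $w_\lambda$ lead to stronger thresholding of the corresponding coefficients, so the weights control which coefficients are preferentially kept.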
Many different frame-based approaches for solving inverse problems have been analyzed [dong2013x, haltmeier2013stable, chaux2007variational, elad2007analysis, frikel2019sparse]. However, these methods rely on linear synthesis operators, which may not be appropriate for a given signal class. Recently, nonlinear deep learning and neural network based approaches have shown outstanding performance for various imaging applications. Inspired by such methods, in this paper we generalize the frame-based synthesis approach to allow neural networks as synthesis operators. Because these representations are nonlinear, the resulting approach requires new mathematical theory, which we develop in this paper.
1.3 Deep synthesis regularization
In this paper, we propose deep synthesis regularization (DESYRE), where we consider minimizers of

(6) $\mathcal{T}_{\alpha,\delta}(\xi) = \tfrac12 \| K D_\delta(\xi) - y^\delta \|^2 + \alpha \| \xi \|_{1,w}.$

Here, $D_\delta \colon \ell^2(\Lambda) \to X$ are possibly nonlinear synthesis mappings and $\|\cdot\|_{1,w}$ is the weighted $\ell^1$-norm defined by (5). The main theoretical results of this paper provide a complete convergence analysis for the DESYRE approach. The corresponding proofs closely follow [grasmair2008sparse, scherzer2009variational]. We point out, however, that the convergence results in these references cannot be directly applied to our setting because we allow the synthesis operators in (6) to depend on the noise level $\delta$. Using non-stationary synthesis mappings allows accounting for discretization as well as for approximate network training. In recent years, various deep learning based reconstruction methods have been derived which outperform classical variational regularization [han2018framing, lee2017deep, adler2017solving, jin2017deep]. However, rigorously analyzing them as regularization methods is challenging. DESYRE follows the deep learning strategy and, as we demonstrate in this paper, allows the derivation of results similar to classical sparse regularization.
In [li2018nett], a dual approach to (6) has been studied, where minimizers of the NETT functional $\tfrac12 \| K x - y^\delta \|^2 + \alpha \mathcal{R}(\Phi(x))$ have been considered, where $\Phi$ is a nonlinear analysis operator and $\mathcal{R}$ a regularizer. Due to the nonlinearity of $\Phi$, the penalty $\mathcal{R} \circ \Phi$ is typically nonconvex. One advantage of the deep synthesis method over the analysis counterpart is that the penalty term is still convex, which is beneficial for the theoretical analysis as well as the numerical minimization. Another strength of DESYRE is that the network can be trained without explicit knowledge of the operator $K$. Thus, the proposed approach has a certain universality, like [rick2017one, li2018nett, romano2017little, lunz2018adversarial, aggarwal2018modl], in the sense that the network is trained independently of the specific forward operator and can be used for different inverse problems without retraining.
1.4 Outline
The rest of the paper is organized as follows. Section 2 gives a theoretical analysis of the proposed method. In Section 3 we propose a learned synthesis mapping. We present numerical results and compare DESYRE to other reconstruction methods in Section 4. The paper ends with a summary and outlook presented in Section 5. This paper is a significantly changed and extended version of the proceedings paper [obmann2019sparse] presented at SampTA 2019 in Bordeaux. The analysis of the proposed method and all the numerical results are completely new.
2 Convergence analysis
This section gives a complete convergence analysis of DESYRE together with convergence rates. For the following analysis, consider the coefficient equation

(7) $K D_0(\xi) = y,$

where $K \colon X \to Y$ is the linear forward operator and $D_0 \colon \ell^2(\Lambda) \to X$ a possibly nonlinear synthesis operator. In the case of noisy data, we approach (7) by deep synthesis regularization (6), (5) with variable synthesis operators $D_\delta$. Whenever it is clear from the context which norm is used, we omit the subscripts.
2.1 Wellposedness
Throughout this paper we assume that the following assumption holds. For the following, let synthesis operators $D_\delta \colon \ell^2(\Lambda) \to X$ for $\delta \ge 0$ be given.
Assumption 2.1 (Deep synthesis regularization).

(R1) $X$ and $Y$ are Hilbert spaces;

(R2) $K \colon X \to Y$ is linear and bounded;

(R3) $\Lambda$ is an at most countable set;

(R4) $D_\delta \colon \ell^2(\Lambda) \to X$ is weakly sequentially continuous;

(R5) $(w_\lambda)_{\lambda \in \Lambda}$ satisfies $\inf_{\lambda \in \Lambda} w_\lambda \ge w_0 > 0$.
Assumption (R5) implies that $\|\cdot\|_{1,w}$ is coercive. To see this, let $\xi \in \ell^2(\Lambda)$ and assume $\|\xi\|_{1,w} < \infty$. Then, for all $\lambda \in \Lambda$, we have $w_\lambda \ge w_0$ and hence
$\|\xi\|_{1,w} = \sum_{\lambda \in \Lambda} w_\lambda |\xi_\lambda| \ge w_0 \sum_{\lambda \in \Lambda} |\xi_\lambda| \ge w_0 \|\xi\|_2.$
This yields the estimate $\|\xi\|_2 \le w_0^{-1} \|\xi\|_{1,w}$ and proves the coercivity of $\|\cdot\|_{1,w}$. Since the data-discrepancy term is nonnegative, this also shows the coercivity of the synthesis functional (6) for every $\alpha > 0$.
Remark 2.2.
The above proof relies on the fact that $\|\xi\|_2 \le \|\xi\|_1$ for $\xi \in \ell^1(\Lambda)$. Since the inequality $\|\xi\|_2 \le \|\xi\|_p$ on $\ell^p(\Lambda)$ holds for any $p \in [1,2]$, the above proof can be done with a weighted $\ell^p$-norm. Similarly, the following well-posedness and convergence results also hold for the (weighted) regularizer $\|\cdot\|_{p,w}$ with $p \in [1,2]$.
As the sum of nonnegative, convex and weakly continuous functionals, $\|\cdot\|_{1,w}$ is convex and weakly sequentially lower semicontinuous. Assumptions (R2) and (R4) imply that $K D_\delta$ is weakly sequentially continuous. Moreover, the norm on $Y$ is weakly sequentially lower semicontinuous. This shows that the functional (6) is weakly lower semicontinuous as a sum of weakly lower semicontinuous functionals. Coercivity and weak lower semicontinuity yield the following well-posedness results for deep synthesis regularization.
Theorem 2.3 (Wellposedness).
Let Assumption 2.1 be satisfied, $y \in Y$ and $\alpha > 0$. Then the following hold:

(a) Existence: The functional (6) with data $y$ has at least one minimizer.

(b) Stability: Let $(y_k)_{k \in \mathbb{N}}$ satisfy $y_k \to y$ and choose minimizers $\xi_k$ of (6) with $y_k$ in place of $y^\delta$. Then $(\xi_k)_{k \in \mathbb{N}}$ has a convergent subsequence, and the limit of every convergent subsequence is a minimizer of (6) with data $y$.
2.2 Convergence
An element in the set $\operatorname{argmin} \{ \|\xi\|_{1,w} : K D_0(\xi) = y \}$ is called a $\|\cdot\|_{1,w}$-minimizing solution of (7). Note that minimizing solutions exist whenever there is any solution $\xi$ of (7) with $\|\xi\|_{1,w} < \infty$; see [scherzer2009variational, Theorem 3.25]. For the following convergence results we make the additional assumption

(R6) $\sup_{\xi \in \ell^2(\Lambda)} \| D_\delta(\xi) - D_0(\xi) \| \to 0$ as $\delta \to 0$.

This assumption guarantees that $D_\delta$ arbitrarily well approximates $D_0$ as $\delta \to 0$.
Theorem 2.4 (Convergence).
Let Assumption 2.1 and (R6) hold, let $y \in \operatorname{ran}(K D_0)$, and suppose the parameter choice $\alpha \colon (0, \infty) \to (0, \infty)$ satisfies

(8) $\lim_{\delta \to 0} \alpha(\delta) = 0$ and $\lim_{\delta \to 0} \dfrac{\delta^2 + \sup_{\xi} \| D_\delta(\xi) - D_0(\xi) \|^2}{\alpha(\delta)} = 0.$

Let $(\delta_k)_{k \in \mathbb{N}}$ converge to $0$, let $y_k \in Y$ satisfy $\|y_k - y\| \le \delta_k$, and choose minimizers $\xi_k$ of (6) with data $y_k$ and $\alpha_k = \alpha(\delta_k)$. Then $(\xi_k)_{k \in \mathbb{N}}$ has a subsequence converging to a $\|\cdot\|_{1,w}$-minimizing solution of (7). If the $\|\cdot\|_{1,w}$-minimizing solution $\xi^\dagger$ of (7) is unique, then $\xi_k \to \xi^\dagger$.
Proof.
Let $\xi^\dagger$ be a $\|\cdot\|_{1,w}$-minimizing solution of (7) and write $\alpha_k = \alpha(\delta_k)$, $x_k = D_{\delta_k}(\xi_k)$. By definition of $\xi_k$, we have

(9) $\tfrac12 \| K D_{\delta_k}(\xi_k) - y_k \|^2 + \alpha_k \| \xi_k \|_{1,w} \le \tfrac12 \| K D_{\delta_k}(\xi^\dagger) - y_k \|^2 + \alpha_k \| \xi^\dagger \|_{1,w}.$

The right hand side in (9) tends to $0$, which together with the estimate $\| K D_{\delta_k}(\xi^\dagger) - y_k \| \le \|K\| \sup_{\xi} \| D_{\delta_k}(\xi) - D_0(\xi) \| + \delta_k$ and (8) yields

(10) $\lim_{k \to \infty} \| K D_{\delta_k}(\xi_k) - y_k \| = 0,$

(11) $\limsup_{k \to \infty} \| \xi_k \|_{1,w} \le \| \xi^\dagger \|_{1,w}.$

The coercivity of $\|\cdot\|_{1,w}$ and (11) in turn imply that there is some weakly convergent subsequence $(\xi_{k(j)})_{j \in \mathbb{N}}$. We denote its weak limit by $\xi^+$.

Using (10), (R6) and the weak sequential continuity of $K D_0$, we see that $\xi^+$ solves (7). The weak lower semicontinuity of $\|\cdot\|_{1,w}$ and (11) imply
$\| \xi^+ \|_{1,w} \le \liminf_{j \to \infty} \| \xi_{k(j)} \|_{1,w} \le \limsup_{j \to \infty} \| \xi_{k(j)} \|_{1,w} \le \| \xi^\dagger \|_{1,w}.$
Hence $\xi^+$ is a $\|\cdot\|_{1,w}$-minimizing solution of (7) and $\| \xi_{k(j)} \|_{1,w} \to \| \xi^+ \|_{1,w}$. According to [grasmair2008sparse, Lemma 2], weak convergence of $(\xi_{k(j)})_{j \in \mathbb{N}}$ together with the convergence of the norms implies $\xi_{k(j)} \to \xi^+$ strongly. Finally, if (7) has a unique $\|\cdot\|_{1,w}$-minimizing solution $\xi^\dagger$, then every subsequence of $(\xi_k)_{k \in \mathbb{N}}$ has a subsequence converging to $\xi^\dagger$, which implies $\xi_k \to \xi^\dagger$. ∎
The existence of a solution of (7) always implies the existence of a solution of the original noise-free equation. Moreover, Theorem 2.4 shows strong convergence of the regularized solutions in the coefficient space $\ell^2(\Lambda)$. By assuming that the synthesis mappings are uniformly Lipschitz continuous, we further obtain strong convergence of the regularized solutions in the signal space $X$.
Theorem 2.5 (Convergence in signal space).
Let the assumptions of Theorem 2.4 hold. Assume, additionally, that (7) has a unique $\|\cdot\|_{1,w}$-minimizing solution $\xi^\dagger$, and that the mappings $(D_\delta)_{\delta \ge 0}$ are uniformly Lipschitz. Consider $(\delta_k)_{k \in \mathbb{N}} \to 0$ and data $y_k$ with $\|y_k - y\| \le \delta_k$, and define minimizers $\xi_k$ of (6) with data $y_k$ and $\alpha_k = \alpha(\delta_k)$, as well as $x_k := D_{\delta_k}(\xi_k)$. Then $x_k \to D_0(\xi^\dagger)$.
Proof.
Let $L$ be a joint Lipschitz constant of the mappings $D_\delta$. By Theorem 2.4 we have $\xi_k \to \xi^\dagger$, and therefore
$\| x_k - D_0(\xi^\dagger) \| \le \| D_{\delta_k}(\xi_k) - D_{\delta_k}(\xi^\dagger) \| + \| D_{\delta_k}(\xi^\dagger) - D_0(\xi^\dagger) \| \le L \| \xi_k - \xi^\dagger \| + \sup_{\xi} \| D_{\delta_k}(\xi) - D_0(\xi) \| \to 0,$
where the second term tends to zero by (R6). ∎
Remark 2.6.
With the notation of Theorem 2.4, Theorem 2.4 states that $(\xi_k)_{k \in \mathbb{N}}$ has a subsequence converging to some $\|\cdot\|_{1,w}$-minimizing solution $\xi^+$ of (7). If we additionally assume that the mappings $D_\delta$ are uniformly Lipschitz continuous, then following the proof of Theorem 2.5 one shows that $(D_{\delta_k}(\xi_k))_{k \in \mathbb{N}}$ has a subsequence converging to $D_0(\xi^+)$. In particular, the limits are characterized as solutions of the noise-free equation having a representation $x = D_0(\xi^+)$ with a $\|\cdot\|_{1,w}$-minimizing solution $\xi^+$ of the coefficient problem (7).
A network is typically given as a composition of trained Lipschitz-continuous functions and given activation functions. The standard activation functions all satisfy a Lipschitz condition; for example, the ReLU $t \mapsto \max\{t, 0\}$ and the sigmoid function $t \mapsto 1/(1 + e^{-t})$ are both Lipschitz continuous. The uniform Lipschitz continuity assumption on the networks $D_\delta$ can therefore easily be fulfilled.

2.3 Convergence rates
Convergence rates provide quantitative error estimates between exact and regularized solutions. In order to derive such results, we have to make additional assumptions on the interplay between the regularization functional $\|\cdot\|_{1,w}$ and the operators $K$, $D_\delta$ and $D_0$.
Assumption 2.7.
Proposition 2.8 (Quantitative error estimates).
Proof.
By definition of the regularized minimizers, we have
Using Assumption 2.7 we get
Applying Young's inequality $ab \le \tfrac{a^2}{2} + \tfrac{b^2}{2}$ shows
and the assertion follows since the terms on the right hand side are nonnegative. ∎
In the following we write $\alpha(\delta) \sim \delta$ if $c_1 \delta \le \alpha(\delta) \le c_2 \delta$ for all $\delta > 0$ and some constants $c_1, c_2 > 0$.
Theorem 2.9 (Linear convergence rate).
Proof.
This is an immediate consequence of the first inequality in Proposition 2.8. ∎
Remark 2.10.
The convergence rate condition in Assumption 2.7 is the same as the one obtained by choosing $K \circ D_0$ for the forward operator in [grasmair2008sparse, Assumption 1]. As shown in [grasmair2008sparse], this assumption is satisfied if $K \circ D_0$ is linear and injective on certain finite dimensional subspaces, and the so-called source condition is satisfied. The latter condition is in particular satisfied if $\Lambda$ is finite and $K \circ D_0$ is injective.
3 Learned synthesis operator
In this section we propose a nonlinear learned synthesis operator which is part of a sparse encoder-decoder pair. We describe the modified U-net that we use as network architecture and give details on the network training.
3.1 Sparse encoderdecoder pair
Following the deep learning paradigm, we select the synthesis operator $D = D_\theta$ from a parametrized class of neural network functions by adjusting it to a set of training images $x_1, \dots, x_N$. Here the subscript $\theta$ refers to the parameters of the network, taken from a finite dimensional Hilbert space. The theoretical analysis presented in the previous section is based on the assumption that the images of interest can be represented as $x = D_\theta(\xi)$, where $\xi$ has a small value of the regularizer $\|\cdot\|_{1,w}$. To find a suitable synthesis network $D_\theta$, we write the corresponding coefficients in the form $\xi = E_\vartheta(x)$ with another neural network $E_\vartheta$ (the encoder) applied to $x$.
In order to enforce sparsity, a reasonable training strategy is to take the network parameters as solutions of the constrained minimization problem

(12) $\min_{\theta, \vartheta} \frac{1}{N} \sum_{n=1}^{N} \| E_\vartheta(x_n) \|_{1,w} + \rho(\theta, \vartheta) \quad \text{such that} \quad D_\theta(E_\vartheta(x_n)) = x_n \text{ for all } n.$

Here $\rho$ is a regularization term on the network parameters with associated regularization parameters. As described in the following Subsection 3.2, we use a modified tight-frame U-net as the actual architecture for the encoder-decoder pair. The weights in the regularizer are chosen per scale, where each weight corresponds to the coefficients in one downsampling step of the network architecture (see Figure 3.1).
3.2 Modified tight frame Unet
For the numerical results we take the encoder-decoder pair as a variation of the tight-frame U-net [han2018framing], where we do not include the bypass connections from the sequential layer to the concatenation layer. The input is passed to a sequential layer which consists of a convolution, batch normalization and a ReLU activation. The output of the first sequential layer is passed to another sequential layer where we reduce the number of channels to keep the output dimension of the encoder reasonable. The output of the second sequential layer is then passed to the downsampling step with filters given by the Haar wavelet low-pass and high-pass filters, respectively. The high-pass filtered output then serves as one set of inputs for the decoder. This downsampling step is recursively applied to the low-frequency output. The low-pass output of the last step of the network is also used as input for the decoder. In each downsampling step the dimensions are reduced by a factor of two, while the number of channels is increased. The upsampling is performed with the transposed filters of the downsampling step. The outputs of the upsampling step are then concatenated and two sequential layers are applied. The channel sizes of these sequential layers are the same as in the corresponding first sequential layer of the downsampling step. To obtain the final output we apply a convolution with no activation function. A visualization of one downsampling and upsampling step is depicted in Figure 3.1.
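The Haar filter bank underlying the downsampling and upsampling steps can be illustrated in numpy. The following is a generic single-level 2D Haar transform, not the authors' implementation; it shows why one analysis step halves both dimensions, produces one low-pass and three high-pass channels, and admits perfect reconstruction with the transposed filters (tight-frame property):

```python
import numpy as np

S = 1.0 / np.sqrt(2.0)  # orthonormal Haar normalization

def haar_step(x):
    # One separable 2D Haar analysis step on an image with even sides.
    # Rows first, then columns; dimensions halve in each direction.
    L = S * (x[0::2, :] + x[1::2, :])   # vertical low-pass
    H = S * (x[0::2, :] - x[1::2, :])   # vertical high-pass
    LL = S * (L[:, 0::2] + L[:, 1::2])  # low-pass channel
    LH = S * (L[:, 0::2] - L[:, 1::2])  # three high-pass channels:
    HL = S * (H[:, 0::2] + H[:, 1::2])  # LH, HL, HH
    HH = S * (H[:, 0::2] - H[:, 1::2])
    return LL, LH, HL, HH

def haar_step_inverse(LL, LH, HL, HH):
    # Synthesis with the transposed filters: exact inverse of haar_step.
    L = np.zeros((LL.shape[0], 2 * LL.shape[1]))
    H = np.zeros_like(L)
    L[:, 0::2] = S * (LL + LH)
    L[:, 1::2] = S * (LL - LH)
    H[:, 0::2] = S * (HL + HH)
    H[:, 1::2] = S * (HL - HH)
    x = np.zeros((2 * L.shape[0], L.shape[1]))
    x[0::2, :] = S * (L + H)
    x[1::2, :] = S * (L - H)
    return x
```

Recursively applying `haar_step` to the `LL` output mirrors the repeated downsampling in the architecture, with the high-pass channels at each scale handed to the decoder.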
Table 3.1 shows the dimensions of the feature outputs and the channel sizes for one step. Note that the number of channels for the high-pass filtered output is multiplied by three because the 2D Haar transform has three different high-pass filters.
Layer            Dimension   No. Channels
Sequential
Sequential       same
Low-pass (Haar)
High-pass (Haar)
Concatenation    same
Sequential       same
3.3 Network training
Finding minimizers of (12) might be unstable in practice and difficult to solve. Therefore, we consider a relaxed version where the constraint is added as a penalty. Hence we train the networks by instead minimizing the loss function

(13) $\mathcal{L}(\theta, \vartheta) = \frac{1}{N} \sum_{n=1}^{N} \tfrac12 \| D_\theta(E_\vartheta(x_n)) - x_n \|^2 + \alpha \| E_\vartheta(x_n) \|_{1,w} + \rho(\theta, \vartheta),$

where $\alpha$ is the regularization parameter in (6). In the numerical realization, the remaining parameters have been chosen empirically. We minimize (13) with the Adam optimizer [kingma2014adam] using the proposed hyperparameter settings for a fixed number of epochs and a fixed batch size.
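The structure of this relaxed objective can be sketched in numpy. The encoder and decoder below are linear toy stand-ins (not the tight-frame U-net), and all sizes and parameter values are arbitrary illustrations:

```python
import numpy as np

def desyre_loss(decode, encode, images, w, alpha):
    # Relaxed training objective: the exact constraint
    # decode(encode(x)) = x is replaced by a squared data-fit penalty,
    # plus the weighted l1 sparsity term on the codes.
    total = 0.0
    for x in images:
        xi = encode(x)
        total += 0.5 * np.sum((decode(xi) - x) ** 2) \
                 + alpha * np.sum(w * np.abs(xi))
    return total / len(images)

# Toy linear "autoencoder" pair standing in for the trained networks.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]) / np.sqrt(2.0)
decode = lambda xi: A @ xi
encode = lambda x: A.T @ x
```

Minimizing this loss over the network parameters trades off exact reconstruction against sparsity of the codes, with $\alpha$ controlling the balance exactly as in (6).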
As training data we use the Low Dose CT Grand Challenge dataset provided by the Mayo Clinic [mccollough2016tu]. The complete dataset consists of grayscale images, from which we selected a subset containing only lung slices. From the resulting dataset, the images of 8 patients were used for training purposes and those of the remaining 2 patients for testing purposes. Each image was scaled so that its pixel values lie in a fixed interval.
4 Numerical results
In this section we present numerical results for sparse-view CT, comparing the DESYRE functional (6) with wavelet synthesis regularization, TV-regularization and a post-processing network.
4.1 Sparse view CT
For the numerical results we consider the Radon transform [Nat86, helgason1999radon] with angular undersampling as the forward operator $K$. Formally, the Radon transform is given by

$(\mathcal{R} x)(\varphi, s) = \int_{\mathbb{R}} x \bigl( s\, \omega(\varphi) + t\, \omega(\varphi)^\perp \bigr) \, \mathrm{d}t,$

where $\omega(\varphi) = (\cos \varphi, \sin \varphi)$ and $(\varphi, s) \in [0, \pi) \times \mathbb{R}$. The discretization of the Radon transform was performed using the Operator Discretization Library [adler2017odl], using equidistant angles in $[0, \pi)$ and equidistant parallel beams. In the case of noisy data we simulate the noise by adding Gaussian noise to the measurement data. This data is visualized in Figure 4.1.
We minimize (6) using the FISTA algorithm [beck2009fast] with a constant step size; we found experimentally the largest step size that still guarantees stability. To obtain the initial coefficients, we apply the encoder to the FBP reconstruction. The regularization parameter for the DESYRE approach has been chosen empirically to give the best results, separately for the noise-free and the noisy case. Developing more efficient algorithms for minimizing the functional (6) is an important aspect of future work that is beyond the scope of the present article.
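For intuition, a minimal FISTA iteration for a synthesis functional of the form (6) can be sketched with a linear stand-in for the synthesis network; for the actual nonlinear network, the gradient of the data term would instead be computed by automatic differentiation. All sizes and parameters below are arbitrary toy choices:

```python
import numpy as np

def fista_synthesis(A, y, alpha, w, step, n_iter, xi0):
    # FISTA for 0.5*||A xi - y||^2 + alpha*||xi||_{1,w},
    # where A plays the role of K composed with a *linear* stand-in
    # for the synthesis network.
    xi, z, t = xi0.copy(), xi0.copy(), 1.0
    for _ in range(n_iter):
        v = z - step * (A.T @ (A @ z - y))          # gradient step
        xi_new = np.sign(v) * np.maximum(np.abs(v) - step * alpha * w, 0.0)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = xi_new + ((t - 1.0) / t_new) * (xi_new - xi)  # momentum
        xi, t = xi_new, t_new
    return xi

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
xi_true = np.zeros(50); xi_true[0] = 3.0              # sparse ground truth
y = A @ xi_true
step = 1.0 / np.linalg.norm(A, 2) ** 2                # 1/L for the data term
xi = fista_synthesis(A, y, 0.1, np.ones(50), step, 200, np.zeros(50))
```

The soft-thresholding step is exactly the proximal map of the weighted $\ell^1$ penalty, so the iterates remain sparse throughout.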
4.2 Comparison methods
We compare DESYRE to wavelet synthesis regularization, TV-regularization and a post-processing network. In wavelet synthesis regularization and TV-regularization we consider minimizers of

(14) $\tfrac12 \| K W \xi - y^\delta \|^2 + \alpha \| \xi \|_1,$

(15) $\tfrac12 \| K x - y^\delta \|^2 + \alpha \| x \|_{\mathrm{TV}},$

respectively. Here $W$ is the wavelet synthesis operator corresponding to the Haar wavelet basis and $\| x \|_{\mathrm{TV}}$ is the discrete total variation of $x$. In order to minimize (14) and (15) numerically, we use the FISTA algorithm [beck2009fast] and the primal-dual algorithm [chambolle2011first], respectively. The step sizes for both algorithms are chosen as the inverse of the operator norm of the discretized forward operator; TV-regularization needs noticeably more iterations to converge than wavelet synthesis regularization. For both methods the regularization parameter was chosen to give the best results, separately in the noise-free and the noisy case. We initialize each algorithm with the filtered backprojection (FBP) reconstruction.
As a post-processing network we use the tight-frame U-net of [han2018framing]. For a given set of training images, the network is trained to map the filtered backprojection reconstruction to the residual image, i.e., the difference between the FBP reconstruction and the ground truth. The reconstruction of the signal is then given by subtracting the network output from the FBP reconstruction. To obtain images with streaking artefacts, the FBP with the Hann filter was applied to the data. No noise was added for the training of the post-processing network. To regularize the parameters of the network, we add a penalty on the network weights with an empirically chosen regularization parameter. The network was then trained using the Adam optimizer [kingma2014adam] with the hyperparameters suggested in [kingma2014adam].
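The residual-learning structure of this baseline can be summarized in a couple of lines; the network itself is only a placeholder here:

```python
import numpy as np

def postprocess(x_fbp, residual_net):
    # The network is trained to predict the artefact image
    # x_fbp - x_true, so the reconstruction subtracts its output.
    return x_fbp - residual_net(x_fbp)
```

If the network predicted the artefacts perfectly, the output would equal the ground truth; in practice, the quality depends on the artefacts seen during training, which is the dependence on the forward operator discussed below.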
4.3 Results
Figure 4.2 shows an example of the different reconstruction methods. We can see that DESYRE is outperformed by the post-processing method. We hypothesize that this is because the post-processing network uses additional information about the inverse problem during training, whereas the training of DESYRE is independent of the operator. In comparison to the other regularization methods, DESYRE shows better performance, visually and quantitatively, in the case of noise-free data. When comparing DESYRE with wavelet synthesis regularization, we see that DESYRE does not suffer from the pixel-like structure even though it is also based on the Haar wavelets. Taking a look at the zoomed-in version of the plot, we see that TV-regularization merges small details, whereas DESYRE is still able to represent them. We also measured the time per iteration of wavelet synthesis regularization, TV-regularization and DESYRE.
Method            PSNR   NMSE
FBP
Wavelet
TV
Post-processing
DESYRE
To quantitatively compare the reconstructions, we compute the peak signal-to-noise ratio (PSNR) and the normalized mean squared error (NMSE), respectively,

$\mathrm{PSNR}(x, x^\star) = 10 \log_{10} \left( \frac{\max(x^\star)^2}{\frac{1}{N_{\mathrm{pix}}} \| x - x^\star \|_2^2} \right), \qquad \mathrm{NMSE}(x, x^\star) = \frac{\| x - x^\star \|_2^2}{\| x^\star \|_2^2},$

where $x^\star$ is the ground truth and $N_{\mathrm{pix}}$ the number of pixels. The results of these quantitative comparisons in the case of noise-free data can be seen in Table 4.1.
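The two error measures can be computed as follows, using the per-pixel mean squared error and the maximum of the ground truth as peak value (a common convention; the exact normalization used in the experiments may differ):

```python
import numpy as np

def psnr(x, x_true):
    # Peak signal-to-noise ratio in dB.
    mse = np.mean((x - x_true) ** 2)
    return 10.0 * np.log10(np.max(x_true) ** 2 / mse)

def nmse(x, x_true):
    # Normalized mean squared error.
    return np.sum((x - x_true) ** 2) / np.sum(x_true ** 2)
```

Higher PSNR and lower NMSE indicate better reconstructions.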
Figure 4.3 shows an example of the reconstructions using noisy data. In this case DESYRE is able to completely remove the noise from the image. However, it cannot satisfactorily recover some of the small detail features present in the image. Compared to TV-regularization, DESYRE yields a smoother but blurrier image and is outperformed by TV-regularization. This suggests that additional regularization of the image, e.g. a TV term, could be used to further enhance the image quality. Even though no noise was added during training, the post-processing approach still shows reasonable results. When comparing the two deep learning approaches, we see that DESYRE better removes the noise from the image, while the post-processing approach more clearly represents edges.
Lastly, we illustrate one practical advantage of DESYRE over a standard post-processing approach. To this end, we consider a similar problem as above, but with fewer projection views. While the underlying signal class does not change, the reconstruction of the signal is more difficult, and the post-processing network trained on the original number of views has been used. This results in reconstructions containing artefacts (Figure 4.4) which cannot be removed by the post-processing network. Additionally, the post-processing network produces some structures which are not present in the ground truth. While the deep synthesis regularization approach was not able to completely remove the artefacts, it greatly reduced them and did not introduce any additional artefacts. While we have not studied this behaviour in great detail, it suggests that the deep synthesis approach is more generally applicable without retraining the network.
5 Summary and outlook
We have introduced the deep synthesis approach for solving inverse problems. This approach relies on a neural network $D$ as a nonlinear synthesis operator, representing a signal via $x = D(\xi)$ such that $\| \xi \|_{1,w}$ is small. Using this representation, we proposed to solve inverse problems by minimizing the deep synthesis functional (6), generalizing linear frame-based methods. In Section 2 we proved that the method is indeed a regularization method, and we derived linear convergence rates. To find such nonlinear representations, we follow a data-driven approach in which we train a sparse encoder-decoder pair. We presented numerical results for the proposed method and compared it with other regularization methods and a deep learning approach.
Besides the theoretical benefits, a practical advantage of the deep synthesis approach over standard post-processing networks is that the training is independent of the forward operator and is thus more flexible with respect to changes of the forward operator. While the results for the sparse-view CT problem show that this approach is outperformed by a specifically trained post-processing network, the deep synthesis approach outperforms classical regularization methods for the considered problem. To further improve the quality of the image reconstruction, one could adapt the training strategy to include information about the inverse problem at hand. This could be achieved, for instance, by including images with artefacts in the training data and mapping these images to an output which does not represent the original image well, thus yielding a large data-discrepancy value in the minimization process.
Acknowledgements
D.O. and M.H. acknowledge support of the Austrian Science Fund (FWF), project P 30747-N32.