Deep synthesis regularization of inverse problems

Recently, a large number of efficient deep learning methods for solving inverse problems have been developed and show outstanding numerical performance. For these deep learning methods, however, a solid theoretical foundation in the form of reconstruction guarantees is missing. In contrast, for classical reconstruction methods, such as convex variational and frame-based regularization, theoretical convergence and convergence rate results are well established. In this paper, we introduce deep synthesis regularization (DESYRE), which uses neural networks as nonlinear synthesis operators and bridges the gap between these two worlds. The proposed method exploits the deep learning benefit of being well adjustable to available training data and, on the other hand, comes with a solid mathematical foundation. We present a complete convergence analysis with convergence rates for the proposed deep synthesis regularization. We further present a strategy for constructing a synthesis network as part of an analysis-synthesis sequence, together with an appropriate training strategy. Numerical results show the plausibility of our approach.


1 Introduction

Inverse problems naturally arise in a wide range of important imaging applications, ranging from computed tomography and remote sensing to image restoration. Such applications can be formulated as the task of reconstructing the unknown image $x \in X$ from data

$y_\delta = \mathbf{A} x + \eta_\delta \,. \qquad (1)$

Here $\mathbf{A} \colon X \to Y$ is a linear operator between Hilbert spaces $X$ and $Y$, and $\eta_\delta$ is the data distortion. Moreover, we assume $\|\eta_\delta\| \le \delta$, where the index $\delta \ge 0$ denotes the noise level. For $\delta = 0$ we call $y_0 = \mathbf{A} x$ the noise-free equation.

1.1 Regularization

Inverse problems are typically ill-posed. This means that even in the noise-free case the solution of (1) is not unique or is unstable with respect to data perturbations. In order to overcome the ill-posedness, regularization methods have to be applied, which incorporate suitable prior information that acts as a selection criterion and at the same time stabilizes the reconstruction [scherzer2009variational, EngHanNeu96].

One of the most established stable reconstruction approaches is convex variational regularization. In this case one considers minimizers of the generalized Tikhonov functional

$\mathcal{T}_{\alpha, y_\delta}(x) := \frac{1}{2}\|\mathbf{A} x - y_\delta\|^2 + \alpha \mathcal{R}(x) \,, \qquad (2)$

where $\mathcal{R} \colon X \to [0, \infty]$ is a convex regularizer on the signal space and $\alpha > 0$ is the regularization parameter. Several choices for the regularizer have been proposed and analysed. For example, the choice $\mathcal{R}(x) = \frac{1}{2}\|x\|^2$ leads to quadratic Tikhonov regularization, choosing the regularizer as the total variation (TV) semi-norm yields TV-regularization, and choosing the $\ell^1$-norm of the coefficients with respect to a given orthonormal basis yields sparse $\ell^1$-regularization.

Convex variational regularization is built on a solid theoretical fundament. In particular, if $\mathcal{R}$ is convex, lower semi-continuous and coercive on $X$, then (2) is well-posed, stable and convergent as $\delta \to 0$. Moreover, for elements satisfying the so-called source condition $\operatorname{ran}(\mathbf{A}^*) \cap \partial\mathcal{R}(x) \neq \emptyset$, convergence rates in the form of quantitative estimates between $x$ and solutions of (2) have been derived [scherzer2009variational, burger2004convergence].

1.2 Frame-based methods

Variational regularization (2) is based on the assumption that a small value of the regularizer is a good prior for the underlying signal class. However, it is often challenging to hand-craft an appropriate regularization term for a given class of images. Frame-based methods address this issue by adjusting frames to the signal class and using a small value of the regularizer of the frame coefficients as an image prior.

Let $(\phi_\lambda)_{\lambda \in \Lambda}$ be a frame of $X$ and denote by $\mathbf{D} \colon \ell^2(\Lambda) \to X$ the synthesis operator that maps coefficients $\xi = (\xi_\lambda)_{\lambda \in \Lambda}$ to the synthesized signal $\mathbf{D}\xi = \sum_{\lambda \in \Lambda} \xi_\lambda \phi_\lambda$. Its adjoint $\mathbf{D}^* \colon X \to \ell^2(\Lambda)$ is the analysis operator and maps $x$ to the so-called analysis coefficients $(\langle x, \phi_\lambda \rangle)_{\lambda \in \Lambda}$. Two established frame-based approaches are the following frame synthesis and frame analysis regularization, respectively,

$\min_{\xi \in \ell^2(\Lambda)} \ \frac{1}{2}\|\mathbf{A}\mathbf{D}\xi - y_\delta\|^2 + \alpha \mathcal{R}(\xi) \,, \qquad (3)$
$\min_{x \in X} \ \frac{1}{2}\|\mathbf{A}x - y_\delta\|^2 + \alpha \mathcal{R}(\mathbf{D}^* x) \,. \qquad (4)$

Here $\mathcal{R} \colon \ell^2(\Lambda) \to [0,\infty]$ is a convex regularizer on the coefficient space and $\alpha > 0$ the regularization parameter. Typical instances of frame analysis and frame synthesis regularization are obtained when the regularizer is taken as the weighted $\ell^1$-norm given by

$\|\xi\|_{1,w} := \sum_{\lambda \in \Lambda} w_\lambda |\xi_\lambda| \,, \qquad (5)$

where $w = (w_\lambda)_{\lambda \in \Lambda}$ are positive weights. In this case (4) enforces sparsity of the analysis coefficients $\mathbf{D}^* x$, whereas (3) enforces sparsity of the synthesis coefficients $\xi$ of the signal $\mathbf{D}\xi$. In the case of bases, the two approaches (3) and (4) are equivalent (if one of them is applied with the dual basis). In the redundant case, however, they are fundamentally different [elad2007analysis, frikel2019sparse].
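The weighted $\ell^1$-norm (5) and its proximal mapping are the computational building blocks of such sparse synthesis approaches. The following minimal Python sketch (function names and the toy data are illustrative, not taken from the paper) shows both; the proximal map is the component-wise soft-thresholding used by iterative thresholding schemes such as ISTA/FISTA applied to (3).

    import numpy as np

    def weighted_l1(xi, w):
        """Weighted l1-norm ||xi||_{1,w} = sum_lambda w_lambda * |xi_lambda|, cf. (5)."""
        return np.sum(w * np.abs(xi))

    def prox_weighted_l1(xi, w, tau):
        """Proximal map of tau * ||.||_{1,w}: component-wise soft-thresholding."""
        return np.sign(xi) * np.maximum(np.abs(xi) - tau * w, 0.0)

    # toy usage
    xi = np.array([0.3, -1.2, 0.05, 2.0])
    w = np.ones_like(xi)
    print(weighted_l1(xi, w), prox_weighted_l1(xi, w, tau=0.1))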

Many different frame-based approaches for solving inverse problems have been analyzed [dong2013x, haltmeier2013stable, chaux2007variational, elad2007analysis, frikel2019sparse]. However, these methods rely on linear synthesis operators which may not be appropriate for a given signal-class. Recently, non-linear deep learning and neural network based approaches showed outstanding performance for various imaging applications. Inspired by such methods, in this paper, we generalize the frame based synthesis approach to allow neural networks as synthesis operators. Because these representations are non-linear, the resulting approach requires new mathematical theory that we develop in this paper.

1.3 Deep synthesis regularization

In this paper, we propose deep synthesis regularization (DESYRE) where we consider minimizers of

$\mathcal{T}_{\alpha, y_\delta}(\xi) := \frac{1}{2}\|\mathbf{A}\mathbf{D}_\delta(\xi) - y_\delta\|^2 + \alpha \|\xi\|_{1,w} \,. \qquad (6)$

Here, $\mathbf{D}_\delta \colon \ell^2(\Lambda) \to X$ are possibly non-linear synthesis mappings and $\|\cdot\|_{1,w}$ is the weighted $\ell^1$-norm defined by (5). The main theoretical results of this paper provide a complete convergence analysis for the DESYRE approach. The corresponding proofs closely follow [grasmair2008sparse, scherzer2009variational]. We point out, however, that the convergence results in these references cannot be directly applied to our setting because we allow the synthesis operators $\mathbf{D}_\delta$ in (6) to depend on the noise level $\delta$. Using non-stationary synthesis mappings allows accounting for discretization as well as for approximate network training. In recent years various deep learning based reconstruction methods have been derived which outperform classical variational regularization [han2018framing, lee2017deep, adler2017solving, jin2017deep]. However, rigorously analyzing them as regularization methods is challenging. DESYRE follows the deep learning strategy and, as we demonstrate in this paper, allows the derivation of results similar to classical sparse regularization.
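For concreteness, the following short Python sketch evaluates the DESYRE functional (6) for a given forward operator and synthesis network; both are assumed to be differentiable callables (e.g. PyTorch modules), and all names are illustrative rather than taken from the paper's implementation.

    import torch

    def desyre_objective(xi, y, A, D, alpha, w):
        """DESYRE functional (6): 0.5*||A(D(xi)) - y||^2 + alpha*||xi||_{1,w}."""
        residual = A(D(xi)) - y
        data_term = 0.5 * torch.sum(residual ** 2)
        penalty = alpha * torch.sum(w * torch.abs(xi))
        return data_term + penalty

Since the data term is smooth in $\xi$ while the penalty is convex but non-smooth, proximal gradient schemes such as FISTA (used in Section 4) are a natural choice for the minimization.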

In [li2018nett], a somewhat dual approach to (6) has been studied, where minimizers of the NETT functional $\frac{1}{2}\|\mathbf{A}x - y_\delta\|^2 + \alpha \mathcal{R}(\mathbf{E}(x))$ have been considered, where $\mathbf{E}$ is a non-linear analysis operator and $\mathcal{R}$ a regularizer. Due to the non-linearity of $\mathbf{E}$, the penalty $\mathcal{R} \circ \mathbf{E}$ is typically non-convex. One advantage of the deep synthesis method over the analysis counterpart is that the penalty term is still convex, which is beneficial for the theoretical analysis as well as the numerical minimization. Another strength of DESYRE is that the network can be trained without explicit knowledge of the operator $\mathbf{A}$. Thus, the proposed approach has a kind of universality similar to [rick2017one, li2018nett, romano2017little, lunz2018adversarial, aggarwal2018modl] in the sense that the network is trained independently of the specific forward operator and can be used for different inverse problems without retraining.

1.4 Outline

The rest of the paper is organized as follows. Section 2 gives a theoretical analysis of the proposed method. In Section 3 we propose a learned synthesis mapping. We present numerical results and compare DESYRE to other reconstruction methods in Section 4. The paper ends with a summary and outlook presented in Section 5. This paper is a significantly changed and extended version of the proceedings [obmann2019sparse] presented at the SampTA 2019 in Bordeaux. The analysis of the proposed method and all the numerical results are completely new.

2 Convergence analysis

This section gives a complete convergence analysis of DESYRE together with convergence rates. For the following analysis consider the coefficient equation

$\mathbf{A}\mathbf{D}(\xi) = y \,, \qquad (7)$

where $\mathbf{A} \colon X \to Y$ is the linear forward operator and $\mathbf{D} \colon \ell^2(\Lambda) \to X$ a possibly non-linear synthesis operator. In the case of noisy data, we approach (7) by deep synthesis regularization (6), (5) with variable synthesis operators $\mathbf{D}_\delta$. Whenever it is clear from the context which norm is used, we omit the subscripts.

2.1 Well-posedness

Throughout this paper we assume that the following assumption holds. For the following let $\mathbf{D}_\delta \colon \ell^2(\Lambda) \to X$ for $\delta \ge 0$ be given.

Assumption 2.1 (Deep synthesis regularization).
  (R1) $X$ and $Y$ are Hilbert spaces;

  (R2) $\mathbf{A} \colon X \to Y$ is linear and bounded;

  (R3) $\Lambda$ is an at most countable set;

  (R4) $\mathbf{D}_\delta \colon \ell^2(\Lambda) \to X$ is weakly sequentially continuous;

  (R5) the weights satisfy $w_\lambda \ge w_0$ for all $\lambda \in \Lambda$ and some $w_0 > 0$.

Condition (R5) implies that $\|\cdot\|_{1,w}$ is coercive. To see this, let $\xi \in \ell^2(\Lambda)$ with $\|\xi\|_{1,w} < \infty$. Then, for all $\lambda \in \Lambda$, we have $w_\lambda \ge w_0$ and hence

$\|\xi\|_2 \le \|\xi\|_1 = \sum_{\lambda \in \Lambda} |\xi_\lambda| \le \frac{1}{w_0} \sum_{\lambda \in \Lambda} w_\lambda |\xi_\lambda| = \frac{1}{w_0}\,\|\xi\|_{1,w} \,.$

This yields the estimate $\|\xi\|_2 \le \|\xi\|_{1,w}/w_0$ and proves the coercivity of $\|\cdot\|_{1,w}$. Since the data-discrepancy term is non-negative, this also shows the coercivity of the synthesis functional $\mathcal{T}_{\alpha, y_\delta}$ for every $\alpha > 0$.

Remark 2.2.

The above proof relies on the fact that $\|\xi\|_2 \le \|\xi\|_q$ for $q \le 2$. Since this inequality on $\ell^2(\Lambda)$ holds for any $q \in [1,2]$, the above proof can be done with a weighted $\ell^q$-norm. Similarly, the following well-posedness and convergence results also hold for the (weighted) $\ell^q$-regularizer with $q \in [1,2]$.

As the sum of non-negative, convex and weakly continuous functionals, $\|\cdot\|_{1,w}$ is convex and weakly sequentially lower semi-continuous. Assumptions (R2) and (R4) imply that $\mathbf{A}\mathbf{D}_\delta$ is weakly sequentially continuous. Moreover, the norm on $Y$ is weakly sequentially lower semi-continuous. This shows that $\mathcal{T}_{\alpha, y_\delta}$ is weakly lower semi-continuous as a sum of weakly lower semi-continuous functionals. Coercivity and weak lower semi-continuity basically yield the following well-posedness results for deep synthesis regularization.

Theorem 2.3 (Well-posedness).

Let Assumption 2.1 be satisfied, $\alpha > 0$ and $y_\delta \in Y$. Then the following hold:

  (a) Existence: $\mathcal{T}_{\alpha, y_\delta}$ has at least one minimizer.

  (b) Stability: Let $(y_k)_{k \in \mathbb{N}}$ satisfy $y_k \to y_\delta$ and choose $\xi_k \in \arg\min \mathcal{T}_{\alpha, y_k}$. Then $(\xi_k)_{k \in \mathbb{N}}$ has a convergent subsequence and the limit of every convergent subsequence is a minimizer of $\mathcal{T}_{\alpha, y_\delta}$.

Proof.

Assumptions (R2) and (R4) imply that $\mathbf{A}\mathbf{D}_\delta$ is weakly sequentially continuous. Therefore the results follow from [grasmair2008sparse, Propositions 5 and 6]. ∎

2.2 Convergence

An element in the set $\arg\min \{\|\xi\|_{1,w} : \mathbf{A}\mathbf{D}(\xi) = y\}$ is called a $\|\cdot\|_{1,w}$-minimizing solution of (7). Note that $\|\cdot\|_{1,w}$-minimizing solutions exist whenever there is any solution $\xi$ of (7) with $\|\xi\|_{1,w} < \infty$, see [scherzer2009variational, Theorem 3.25]. For the following convergence results we make the following additional assumption:

  (R6) $\mathbf{D}_\delta \to \mathbf{D}$ as $\delta \to 0$, uniformly on $\|\cdot\|_{1,w}$-bounded sets.

This assumption guarantees that $\mathbf{D}_\delta$ arbitrarily well approximates $\mathbf{D}$ as $\delta \to 0$.

Theorem 2.4 (Convergence).

Let Assumption 2.1 and (R6) hold, let $y$ be exact data for (7), let $\xi^\dagger$ be a $\|\cdot\|_{1,w}$-minimizing solution of (7), and choose the parameter $\alpha = \alpha(\delta)$ such that

$\alpha(\delta) \to 0 \quad\text{and}\quad \frac{\delta^2 + \|\mathbf{D}_\delta(\xi^\dagger) - \mathbf{D}(\xi^\dagger)\|^2}{\alpha(\delta)} \to 0 \quad \text{as } \delta \to 0 \,. \qquad (8)$

Moreover, let $(\delta_k)_{k \in \mathbb{N}}$, $(y_k)_{k \in \mathbb{N}}$ satisfy $\delta_k \to 0$ and $\|y_k - y\| \le \delta_k$, and choose $\xi_k \in \arg\min \mathcal{T}_{\alpha(\delta_k), y_k}$. Then, the following hold:

  (a) $(\xi_k)_{k \in \mathbb{N}}$ has a convergent subsequence.

  (b) The limit of every convergent subsequence is a $\|\cdot\|_{1,w}$-minimizing solution of (7).

  (c) If the $\|\cdot\|_{1,w}$-minimizing solution of the coefficient equation (7) is unique, then $\xi_k \to \xi^\dagger$.

Proof.

Let $k \in \mathbb{N}$ and write $\alpha_k := \alpha(\delta_k)$, $\mathbf{D}_k := \mathbf{D}_{\delta_k}$. By the definition of $\xi_k$, we have

$\frac{1}{2}\|\mathbf{A}\mathbf{D}_k(\xi_k) - y_k\|^2 + \alpha_k \|\xi_k\|_{1,w} \le \frac{1}{2}\|\mathbf{A}\mathbf{D}_k(\xi^\dagger) - y_k\|^2 + \alpha_k \|\xi^\dagger\|_{1,w} \,. \qquad (9)$

The right hand side in (9) tends to $0$, which together with the estimate $\|\mathbf{A}\mathbf{D}_k(\xi^\dagger) - y_k\| \le \|\mathbf{A}\|\,\|\mathbf{D}_k(\xi^\dagger) - \mathbf{D}(\xi^\dagger)\| + \delta_k$ and (8) yields

$\lim_{k \to \infty} \|\mathbf{A}\mathbf{D}_k(\xi_k) - y_k\| = 0 \,, \qquad (10)$
$\limsup_{k \to \infty} \|\xi_k\|_{1,w} \le \|\xi^\dagger\|_{1,w} \,. \qquad (11)$

The coercivity of $\|\cdot\|_{1,w}$ and (11) in turn imply that there is some weakly convergent subsequence $(\xi_{k(j)})_{j \in \mathbb{N}}$. We denote its weak limit by $\bar\xi$.

Using (10) and (R6), we see that $\bar\xi$ solves (7). The weak lower semi-continuity of $\|\cdot\|_{1,w}$ and (11) imply

$\|\bar\xi\|_{1,w} \le \liminf_{j \to \infty} \|\xi_{k(j)}\|_{1,w} \le \limsup_{j \to \infty} \|\xi_{k(j)}\|_{1,w} \le \|\xi^\dagger\|_{1,w} \,.$

Hence $\bar\xi$ is a $\|\cdot\|_{1,w}$-minimizing solution of (7) and $\|\xi_{k(j)}\|_{1,w} \to \|\bar\xi\|_{1,w}$. According to [grasmair2008sparse, Lemma 2], weak convergence of $(\xi_{k(j)})_{j \in \mathbb{N}}$ together with the convergence of the norms implies $\xi_{k(j)} \to \bar\xi$. Finally, if (7) has a unique $\|\cdot\|_{1,w}$-minimizing solution $\xi^\dagger$, then every subsequence of $(\xi_k)_{k \in \mathbb{N}}$ has a subsequence converging to $\xi^\dagger$, which implies $\xi_k \to \xi^\dagger$. ∎

The existence of a solution to (7) always implies the existence of a solution to the original problem $\mathbf{A}x = y$. Moreover, Theorem 2.4 shows strong convergence of the regularized solutions in the coefficient space $\ell^2(\Lambda)$. By assuming that the synthesis mappings $\mathbf{D}_\delta$ are uniformly Lipschitz continuous we further get strong convergence of the regularized solutions in the signal space.

Theorem 2.5 (Convergence in signal space).

Let the assumptions of Theorem 2.4 hold. Assume, additionally, that (7) has a unique $\|\cdot\|_{1,w}$-minimizing solution $\xi^\dagger$, and that $(\mathbf{D}_\delta)_{\delta \ge 0}$ are uniformly Lipschitz. Consider $\xi_k \in \arg\min \mathcal{T}_{\alpha(\delta_k), y_k}$ as in Theorem 2.4, and define $x_k := \mathbf{D}_{\delta_k}(\xi_k)$, $x^\dagger := \mathbf{D}(\xi^\dagger)$. Then $x_k \to x^\dagger$.

Proof.

Following the notation of the proof of Theorem 2.4 we have

$\|x_k - x^\dagger\| \le \|\mathbf{D}_k(\xi_k) - \mathbf{D}_k(\xi^\dagger)\| + \|\mathbf{D}_k(\xi^\dagger) - \mathbf{D}(\xi^\dagger)\| \le L\,\|\xi_k - \xi^\dagger\| + \|\mathbf{D}_k(\xi^\dagger) - \mathbf{D}(\xi^\dagger)\| \,,$

where $L$ is a uniform Lipschitz constant for $(\mathbf{D}_\delta)_{\delta \ge 0}$. By assumption, $\|\mathbf{D}_k(\xi^\dagger) - \mathbf{D}(\xi^\dagger)\| \to 0$, and according to Theorem 2.4 we have $\xi_k \to \xi^\dagger$. This yields $x_k \to x^\dagger$ and concludes the proof. ∎

Remark 2.6.

By Theorem 2.4, $(\xi_k)_{k \in \mathbb{N}}$ has a subsequence converging to some $\|\cdot\|_{1,w}$-minimizing solution $\bar\xi$ of $\mathbf{A}\mathbf{D}(\xi) = y$. If we additionally assume that $(\mathbf{D}_\delta)_{\delta \ge 0}$ are uniformly Lipschitz-continuous, then following the proof of Theorem 2.5 one shows that $x_k = \mathbf{D}_{\delta_k}(\xi_k)$ has a subsequence converging to $\mathbf{D}(\bar\xi)$. In particular, the limits are characterized as solutions of $\mathbf{A}x = y$ having a representation $x = \mathbf{D}(\bar\xi)$ using a $\|\cdot\|_{1,w}$-minimizing solution $\bar\xi$ of the coefficient problem $\mathbf{A}\mathbf{D}(\xi) = y$.

A network is typically given as a composition of trained Lipschitz-continuous functions and given activation functions. The standard activation functions all satisfy a Lipschitz-continuity condition; e.g., the ReLU $x \mapsto \max\{x, 0\}$ or the sigmoid function are Lipschitz-continuous. The uniform Lipschitz-continuity assumption on the networks $\mathbf{D}_\delta$ can therefore easily be fulfilled.
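As an illustration (a standard estimate, not part of the original text): if the synthesis network is a composition of affine layers and 1-Lipschitz activations, its Lipschitz constant is bounded by the product of the layer norms,

$\operatorname{Lip}(\mathbf{D}) \le \prod_{i=1}^{L} \operatorname{Lip}(\sigma_i)\,\|W_i\| \le \prod_{i=1}^{L} \|W_i\| \quad \text{for} \quad \mathbf{D} = \sigma_L \circ A_L \circ \cdots \circ \sigma_1 \circ A_1, \quad A_i(x) = W_i x + b_i \,.$

Hence a uniform bound on the weight matrices of the trained networks yields the required uniform Lipschitz constant.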

2.3 Convergence rates

Convergence rates provide quantitative error estimates between exact and regularized solutions. In order to derive such results we have to make some additional assumptions on the interplay between the regularization functional $\|\cdot\|_{1,w}$ and the operators $\mathbf{A}$, $\mathbf{D}$ and $\mathbf{D}_\delta$.

Assumption 2.7.

Let $\xi^\dagger$ be a $\|\cdot\|_{1,w}$-minimizing solution of (7) with $\|\xi^\dagger\|_{1,w} < \infty$. Assume there are constants $\beta_1 > 0$ and $\beta_2 \ge 0$ such that

$\beta_1 \|\xi - \xi^\dagger\| \le \|\xi\|_{1,w} - \|\xi^\dagger\|_{1,w} + \beta_2 \|\mathbf{A}\mathbf{D}(\xi) - \mathbf{A}\mathbf{D}(\xi^\dagger)\|$

for all $\xi \in \ell^2(\Lambda)$ with sufficiently small $\|\xi\|_{1,w} - \|\xi^\dagger\|_{1,w}$ and $\|\mathbf{A}\mathbf{D}(\xi) - \mathbf{A}\mathbf{D}(\xi^\dagger)\|$.

Proposition 2.8 (Quantitative error estimates).

Let Assumptions 2.1, 2.7 and (R6) hold. Furthermore, assume that $y_\delta \in Y$ satisfies $\|y_\delta - y\| \le \delta$, choose $\alpha > 0$, and let $\xi_\alpha \in \arg\min \mathcal{T}_{\alpha, y_\delta}$. Then, the following quantitative estimates hold:

Proof.

By the definition of $\xi_\alpha$ as a minimizer of $\mathcal{T}_{\alpha, y_\delta}$ we have

$\frac{1}{2}\|\mathbf{A}\mathbf{D}_\delta(\xi_\alpha) - y_\delta\|^2 + \alpha \|\xi_\alpha\|_{1,w} \le \frac{1}{2}\|\mathbf{A}\mathbf{D}_\delta(\xi^\dagger) - y_\delta\|^2 + \alpha \|\xi^\dagger\|_{1,w} \,.$

Using Assumption 2.7 we can bound $\|\xi_\alpha\|_{1,w} - \|\xi^\dagger\|_{1,w}$ from below in terms of $\|\xi_\alpha - \xi^\dagger\|$ and $\|\mathbf{A}\mathbf{D}(\xi_\alpha) - \mathbf{A}\mathbf{D}(\xi^\dagger)\|$. Applying Young's inequality to the resulting mixed terms shows the claimed estimates, and the assertion follows since the terms on the right hand side are non-negative. ∎

In the following we write $a(\delta) = \mathcal{O}(b(\delta))$ if $a(\delta) \le C\, b(\delta)$ for all $\delta \le \delta_0$ and some constants $C, \delta_0 > 0$.

Theorem 2.9 (Linear convergence rate).

Let Assumptions 2.1, 2.7 and (R6) hold, let $y_\delta \in Y$ be such that $\|y_\delta - y\| \le \delta$ and choose $\alpha \sim \delta$. Then the following hold:

  (a) $\|\xi_\alpha - \xi^\dagger\| = \mathcal{O}(\delta)$, where $\xi_\alpha \in \arg\min \mathcal{T}_{\alpha, y_\delta}$;

  (b) Assume that $\mathbf{D}$ is Lipschitz and set $x_\alpha := \mathbf{D}(\xi_\alpha)$, $x^\dagger := \mathbf{D}(\xi^\dagger)$. Then $\|x_\alpha - x^\dagger\| = \mathcal{O}(\delta)$.

Proof.

This is an immediate consequence of the first inequality in Proposition 2.8. ∎

Remark 2.10.

The convergence rate condition in Assumption 2.7 is the same as the one obtained by taking $q = 1$ and choosing $\mathbf{A}\mathbf{D}$ as the forward operator in [grasmair2008sparse, Assumption 1]. As shown in [grasmair2008sparse], this assumption is satisfied if $\mathbf{A}\mathbf{D}$ is linear and injective on certain finite dimensional subspaces, and the so-called source condition is satisfied. The latter condition is in particular satisfied if $\Lambda$ is finite and $\mathbf{A}\mathbf{D}$ is injective.

3 Learned synthesis operator

In this section we propose a non-linear learned synthesis operator which is part of a sparse encoder-decoder pair. We describe a modified U-net that we use for the network architecture and give details on the network training.

3.1 Sparse encoder-decoder pair

Following the deep learning paradigm we select the synthesis operator $\mathbf{D} = \mathbf{D}_\theta$ from a parametrized class of neural network functions by adjusting it to certain training images $x_1, \dots, x_N$. Here the subscript $\theta$ refers to the parameters of the network, taken from a finite dimensional Hilbert space. The theoretical analysis presented in the previous section is based on the assumption that the images of interest can be represented as $x = \mathbf{D}_\theta(\xi)$ where $\xi$ has a small value of the regularizer $\|\xi\|_{1,w}$. To find a suitable synthesis network $\mathbf{D}_\theta$, we write the corresponding coefficients in the form $\xi = \mathbf{E}_\vartheta(x)$ with another neural network $\mathbf{E}_\vartheta$ (the encoder) applied to $x$.

In order to enforce sparsity, a reasonable training strategy is to take the network parameters $(\theta, \vartheta)$ as solutions of the constrained minimization problem

$\min_{\theta, \vartheta} \ \sum_{i=1}^{N} \|\mathbf{E}_\vartheta(x_i)\|_{1,w} + \mathcal{P}(\theta, \vartheta) \quad \text{subject to} \quad \mathbf{D}_\theta(\mathbf{E}_\vartheta(x_i)) = x_i \ \text{ for } i = 1, \dots, N \,. \qquad (12)$

Here $\mathcal{P}$ is a regularization term for the network parameters with associated regularization parameters. As described in the following Subsection 3.2, we use a modified tight-frame U-net as the actual architecture for $\mathbf{D}_\theta$. The weights in $\|\cdot\|_{1,w}$ are chosen per scale, where each index corresponds to the coefficients in one of the downsampling steps of the network architecture (see Figure 3.1).

3.2 Modified tight frame U-net

For the numerical results we take the synthesis network as a variation of the tight-frame U-net [han2018framing], where we do not include the bypass-connections from the sequential layer to the concatenation layer. The input is passed to a sequential layer which consists of convolution, batch normalization and a ReLU activation. We pass the output of the first sequential layer to another sequential layer where we reduce the number of channels in order to keep the output dimension of the encoder reasonable. The output of the second sequential layer is then passed to the downsampling step, whose low-pass and high-pass filters are given by the Haar wavelet filters. The high-pass filtered output then serves as one set of inputs for the decoder. This downsampling step is then recursively applied to the low-frequency output. The low-pass output of the last step of the network is also used as input for the decoder. In each downsampling step the spatial dimensions are reduced while the number of channels is increased.

The upsampling is performed with the transposed filters of the downsampling step. The outputs of the upsampling step and the corresponding high-pass coefficients from the encoder are then concatenated and two sequential layers are applied. The channel sizes of these sequential layers are the same as those of the corresponding first sequential layer of the downsampling step. To obtain the final output we apply a convolution with no activation function. The network uses several downsampling and upsampling steps. A visualization of one downsampling and upsampling step is depicted in Figure 3.1.
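The fixed downsampling and upsampling filters described above can be implemented with strided convolutions. The following PyTorch sketch shows one such Haar analysis/synthesis pair for a single-channel input; the sequential layers, channel handling and recursion are omitted, and all names are illustrative rather than taken from the authors' code.

    import torch
    import torch.nn.functional as F

    # 2-D Haar filters (low-pass LL and the three high-pass filters LH, HL, HH),
    # normalized so that the transform is orthonormal.
    h = torch.tensor([1.0, 1.0]) / 2 ** 0.5   # 1-D low-pass
    g = torch.tensor([1.0, -1.0]) / 2 ** 0.5  # 1-D high-pass
    filters = torch.stack(
        [torch.outer(a, b) for a in (h, g) for b in (h, g)]
    ).unsqueeze(1)                            # shape (4, 1, 2, 2)

    def haar_down(x):
        """Stride-2 Haar decomposition of a batch (N, 1, H, W); returns the
        low-pass part (N, 1, H/2, W/2) and the high-pass part (N, 3, H/2, W/2)."""
        coeffs = F.conv2d(x, filters, stride=2)
        return coeffs[:, :1], coeffs[:, 1:]

    def haar_up(low, high):
        """Inverse step using the transposed filters (perfect reconstruction)."""
        return F.conv_transpose2d(torch.cat([low, high], dim=1), filters, stride=2)

    x = torch.randn(1, 1, 8, 8)
    low, high = haar_down(x)
    print(torch.allclose(haar_up(low, high), x, atol=1e-6))  # True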

Figure 3.1: Illustration of the used synthesis network. We start by applying sequential layers which consist of convolution, batch normalization and activation. After that we use a fixed filter to downsample the features. The network is then recursively applied to the low-pass output. We continue by upsampling using the same filters as before and concatenating the features. Lastly we apply sequential layers again to obtain the output.

Table 3.1 shows the dimensions of the feature outputs and the channel sizes for one step of the network. Note that the number of channels of the high-pass and low-pass filtered outputs is multiplied accordingly, because we have three different high-pass filters.

Layer            Dimension    No. channels
Sequential
Sequential       same

Concatenation    same
Sequential       same

Table 3.1: Dimensions and channel sizes for one step of the network. 'Same' means that the same dimension as the one above is used.

3.3 Network training

Finding minimizers of (12) might be unstable in practice and difficult to solve. Therefore we consider a relaxed version where the constraint is added as a penalty. Hence we train the networks by instead minimizing the following loss function

$\mathcal{L}(\theta, \vartheta) := \sum_{i=1}^{N} \|\mathbf{D}_\theta(\mathbf{E}_\vartheta(x_i)) - x_i\|^2 + \alpha \|\mathbf{E}_\vartheta(x_i)\|_{1,w} + \mathcal{P}(\theta, \vartheta) \,, \qquad (13)$

where $\alpha$ is the regularization parameter in (6) and $\mathcal{P}$ the regularization of the network parameters with empirically chosen regularization parameters. We minimize (13) with the Adam optimizer [kingma2014adam] using the hyper-parameter settings proposed in [kingma2014adam], training for a fixed number of epochs with a fixed batch size.
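A minimal PyTorch sketch of this relaxed training objective is given below; it assumes the encoder and decoder are given as torch.nn.Module instances and uses a simple squared-norm penalty on the network parameters as a stand-in for the parameter regularization, so all names and the concrete penalty are illustrative.

    import torch

    def training_loss(decoder, encoder, x, alpha, w, beta):
        """Relaxed encoder-decoder loss in the spirit of (13)."""
        xi = encoder(x)                                   # coefficients E(x)
        recon = decoder(xi)                               # synthesized images D(E(x))
        fidelity = torch.sum((recon - x) ** 2)            # relaxed constraint D(E(x)) = x
        sparsity = alpha * torch.sum(w * torch.abs(xi))   # weighted l1 penalty
        params = list(decoder.parameters()) + list(encoder.parameters())
        weight_decay = beta * sum(torch.sum(p ** 2) for p in params)
        return fidelity + sparsity + weight_decay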

As training data we use the Low Dose CT Grand Challenge dataset provided by the Mayo Clinic [mccollough2016tu]. From the complete dataset of grayscale CT images we selected a subset containing only lung slices. Of the resulting images, those corresponding to 8 patients were used for training purposes and those corresponding to the remaining 2 patients were used for testing purposes. Each image was rescaled to a fixed intensity interval.

4 Numerical results

In this section we present numerical results for sparse view CT, comparing the DESYRE functional (6) with wavelet synthesis regularization, TV-regularization and a post-processing network.

4.1 Sparse view CT

For the numerical results we consider the Radon transform [Nat86, helgason1999radon] with undersampling in the angles as the forward operator $\mathbf{A}$. Formally, the Radon transform is given by

$\mathcal{R}x(\theta, s) = \int_{\mathbb{R}} x\bigl(s\theta + t\theta^\perp\bigr)\,\mathrm{d}t \,,$

where $\theta \in S^1$ is the projection direction, $\theta^\perp$ a unit vector orthogonal to $\theta$ and $s \in \mathbb{R}$ the signed distance to the origin. The discretization of the Radon transform was performed using the Operator Discretization Library [adler2017odl] using 60 equidistant angles in $[0, \pi)$ and equidistant parallel beams. In the case of noisy data we simulate the noise by adding Gaussian noise to the measurement data. This data is visualized in Figure 4.1.

Figure 4.1: Measurement-data using 60 projection views. Left: Noise-free data. Right: Noisy data with Gaussian noise added.

We minimize (6) using the FISTA algorithm [beck2009fast] with a constant step size. We found experimentally that the step size cannot be chosen too large if stability is to be guaranteed. For the following numerical results we use a fixed step size and a fixed number of iterations to minimize (6). To obtain the initial coefficients we apply the encoder to the FBP reconstruction. The regularization parameter for the DESYRE approach has been chosen empirically to give the best results, separately for the noise-free and the noisy case. Developing more efficient algorithms for minimizing the functional (6) is an important aspect of future work that is beyond the scope of the present article.
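A compact sketch of such a FISTA iteration for (6) is given below; the forward operator and the synthesis network are assumed to be differentiable callables (e.g. torch modules wrapping the ray transform and the decoder), the gradient of the smooth part is obtained via automatic differentiation, and all names and defaults are illustrative.

    import torch

    def fista_desyre(A, D, y, xi0, alpha, w, step, num_iters):
        """FISTA for 0.5*||A(D(xi)) - y||^2 + alpha*||xi||_{1,w} (sketch)."""
        xi, z, t = xi0.clone(), xi0.clone(), 1.0
        for _ in range(num_iters):
            # gradient of the smooth data term at the extrapolated point z
            z_var = z.detach().requires_grad_(True)
            loss = 0.5 * torch.sum((A(D(z_var)) - y) ** 2)
            grad, = torch.autograd.grad(loss, z_var)
            # proximal (soft-thresholding) step for the weighted l1 penalty
            v = z - step * grad
            xi_new = torch.sign(v) * torch.clamp(torch.abs(v) - step * alpha * w, min=0.0)
            # FISTA extrapolation
            t_new = (1.0 + (1.0 + 4.0 * t ** 2) ** 0.5) / 2.0
            z = xi_new + ((t - 1.0) / t_new) * (xi_new - xi)
            xi, t = xi_new, t_new
        return xi

Note that, since the synthesis network is non-linear, the data term is non-convex and FISTA is used here as a heuristic rather than with its convex convergence guarantees.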

4.2 Comparison methods

We compare DESYRE to wavelet synthesis regularization, TV-regularization and a post-processing network. In wavelet synthesis regularization and TV-regularization we consider elements

$\xi_\alpha \in \arg\min_{\xi} \ \frac{1}{2}\|\mathbf{A}\mathbf{W}\xi - y_\delta\|^2 + \alpha \|\xi\|_1 \,, \qquad (14)$
$x_\alpha \in \arg\min_{x} \ \frac{1}{2}\|\mathbf{A}x - y_\delta\|^2 + \alpha\,\mathrm{TV}(x) \,, \qquad (15)$

respectively. Here $\mathbf{W}$ is the wavelet synthesis operator corresponding to the Haar wavelet basis and $\mathrm{TV}(x)$ is the discrete total variation of $x$. In order to minimize (14) and (15) numerically we use the FISTA algorithm [beck2009fast] and the primal-dual algorithm [chambolle2011first], respectively. The step sizes for both algorithms are chosen as the inverse of the operator norm of the discretized forward operator. For the wavelet synthesis regularization we use a fixed number of FISTA iterations, whereas the TV-regularization is run until it has approximately converged. For both methods the regularization parameter was chosen empirically to give the best results, separately for noise-free and noisy data. We initialize each algorithm using the filtered back-projection (FBP) reconstruction.

As a post-processing network we use the tight frame U-Net of [han2018framing]. For a given set of training images $x_1, \dots, x_N$, the network $\mathbf{U}$ is trained to map the filtered back-projection reconstruction $x_{\mathrm{FBP}}$ to the residual image $x_{\mathrm{FBP}} - x$. The reconstruction of the signal is then given by $x_{\mathrm{FBP}} - \mathbf{U}(x_{\mathrm{FBP}})$. To obtain images with streaking artefacts, the FBP using the Hann filter was applied to the sparse-view data $\mathbf{A}x_i$. No noise was added for the training of the post-processing network. To regularize the parameters of the network we add a regularization term on the network parameters with a small regularization parameter. The network was then trained using the Adam optimizer [kingma2014adam] with hyper-parameters as suggested in [kingma2014adam].
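In code, the post-processing reconstruction amounts to a single forward pass and a subtraction; the sketch below assumes the residual convention described above (network output = artefact image) and a trained torch module named net, both of which are assumptions for illustration.

    import torch

    def postprocess_reconstruction(net, fbp_image):
        """Residual post-processing: subtract the predicted artefact image."""
        with torch.no_grad():
            return fbp_image - net(fbp_image)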

Figure 4.2: Reconstruction results for 60 projection views and noise-free data. The subplot in the lower right corner shows a zoomed-in version of the orange square.
Figure 4.3: Reconstruction results for 60 projection views and Gaussian noise. The subplot in the lower right corner shows a zoomed-in version of the orange square.

4.3 Results

Figure 4.2 shows an example of the different reconstruction methods. We can see that DESYRE is outperformed by the post-processing method. We hypothesize that this is because the post-processing network uses additional information about the inverse problem during training, whereas the training of DESYRE is independent of the operator. In comparison to the other regularization methods, DESYRE shows a better performance visually and quantitatively in the case of noise-free data. When comparing DESYRE with the wavelet synthesis regularization, we see that DESYRE does not suffer from the 'pixel-like' structure even though it is also based on the Haar wavelets. Taking a look at the zoomed-in version of the plot, we see that the TV-regularization tends to merge the details, whereas DESYRE is still able to represent these smaller details. The per-iteration run times of wavelet synthesis regularization, TV-regularization and DESYRE were also compared.

Method            PSNR    NMSE
FBP
Wavelet
TV
Post-processing
DESYRE

Table 4.1: Quantitative comparison of different reconstruction methods for noise-free data. Average results ± standard deviation over 160 different phantoms.

To quantitatively compare the reconstructions we compute the peak signal-to-noise ratio (PSNR) and the normalized mean squared error (NMSE), respectively,

$\mathrm{PSNR}(x, x^\ast) = 10 \log_{10}\!\left(\frac{\max(x^\ast)^2}{\tfrac{1}{n}\|x - x^\ast\|_2^2}\right), \qquad \mathrm{NMSE}(x, x^\ast) = \frac{\|x - x^\ast\|_2^2}{\|x^\ast\|_2^2} \,,$

where $x^\ast$ is the ground truth and $n$ the number of pixels. The results of these quantitative comparisons in the case of noise-free data can be seen in Table 4.1.
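Under these definitions (one common convention; the paper's exact normalization may differ), the metrics can be computed as follows, with all names being illustrative.

    import numpy as np

    def psnr(x, ground_truth):
        """Peak signal-to-noise ratio in dB."""
        mse = np.mean((x - ground_truth) ** 2)
        return 10.0 * np.log10(ground_truth.max() ** 2 / mse)

    def nmse(x, ground_truth):
        """Normalized mean squared error ||x - x*||^2 / ||x*||^2."""
        return np.sum((x - ground_truth) ** 2) / np.sum(ground_truth ** 2)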

Figure 4.3 shows an example of the reconstructions using noisy data. In this case DESYRE is able to completely remove the noise from the image. However, it cannot satisfactorily recover the small detail features which are present in the image. Compared to TV-regularization, DESYRE yields a blurrier image and is outperformed by TV-regularization. As in the noise-free case, DESYRE produces a smoother but blurrier image. This suggests that additional regularization of the image, e.g. TV-regularization, should be used to further enhance the image quality. Even though no noise was added during training, the post-processing approach still shows reasonable results. When comparing the two deep learning approaches, we see that DESYRE is able to better remove the noise from the image, while the post-processing approach is able to more clearly represent edges.

Lastly we illustrate one practical advantage of DESYRE over a standard post-processing approach. To this end, we consider a similar problem as above; however, this time we use fewer projection views. While the underlying signal class does not change, the reconstruction of the signal is more difficult, and the post-processing network trained on the 60-view data has been used. This results in reconstructions containing artefacts (Figure 4.4) which cannot be removed using the post-processing network. Additionally, the post-processing network produces some structures which are not present in the ground truth. While the deep synthesis regularization approach was not able to completely remove the artefacts, it was able to greatly reduce them and does not introduce any additional artefacts. While we have not studied this behaviour in great detail, this suggests that the deep synthesis approach is more generally applicable without retraining the network.

Figure 4.4: Reconstruction results for a reduced number of projection views and noise-free data. The subplot in the lower right corner shows a zoomed-in version of the orange square.

5 Summary and outlook

We have introduced the deep synthesis approach for solving inverse problems. This approach relies on a neural network $\mathbf{D}$ as a non-linear synthesis operator for representing a signal $x$ via $x = \mathbf{D}(\xi)$ such that $\|\xi\|_{1,w}$ is small. Using this representation we proposed to solve inverse problems by minimizing the deep synthesis functional (6), generalizing linear frame-based methods. In Section 2 we proved that the method is indeed a regularization method and we derived linear convergence rates. To find such non-linear representations we follow a data-driven approach where we train a sparse encoder-decoder pair. We give numerical results for the proposed method and compare it with other regularization methods and a deep learning approach.

Besides the theoretical benefits, a practical advantage of the deep synthesis approach over standard post-processing networks is that the training is independent of the forward operator and is thus more flexible with respect to changes of the forward operator. While the results for the sparse-view CT problem show that this approach is outperformed by a specifically trained post-processing network, the deep synthesis approach outperforms classical regularization methods for the considered problem. To further improve the quality of the image reconstruction one could adapt the training strategy to include information about the inverse problem at hand. This could be achieved, for instance, by including images with artefacts in the training data and mapping these images to outputs which do not represent the original image well, thus yielding a large data-discrepancy value in the minimization process.

Acknowledgements

D.O. and M.H. acknowledge support of the Austrian Science Fund (FWF), project P 30747-N32.

References