First order algorithms in variational image processing

12/13/2014 ∙ by Martin Burger, et al.

Variational methods in imaging are nowadays developing towards a quite universal and flexible tool, allowing for highly successful approaches on tasks like denoising, deblurring, inpainting, segmentation, super-resolution, disparity, and optical flow estimation. The overall structure of such approaches is of the form D(Ku) + α R(u) → min_u, where the functional D is a data fidelity term also depending on some input data f and measuring the deviation of Ku from such, and R is a regularization functional. Moreover, K is an (often linear) forward operator modeling the dependence of data on an underlying image, and α is a positive regularization parameter. While D is often smooth and (strictly) convex, the current practice almost exclusively uses nonsmooth regularization functionals. The majority of successful techniques uses nonsmooth and convex functionals like the total variation and generalizations thereof or ℓ_1-norms of coefficients arising from scalar products with some frame system. The efficient solution of such variational problems in imaging calls for appropriate algorithms. Taking into account the specific structure as a sum of two very different terms to be minimized, splitting algorithms are a quite canonical choice. Consequently this field has revived the interest in techniques like operator splittings or augmented Lagrangians. Here we shall provide an overview of methods currently developed and recent results as well as some computational studies providing a comparison of different methods and also illustrating their success in applications.

1 Introduction

Variational methods in imaging are nowadays developing towards a quite universal and flexible tool, allowing for highly successful approaches on tasks like denoising, deblurring, inpainting, segmentation, super-resolution, disparity, and optical flow estimation. The overall structure of such approaches is of the form

    D(Ku) + α R(u) → min_u,

where the functional D is a data fidelity term also depending on some input data f and measuring the deviation of Ku from such, and R is a regularization functional. Moreover, K is an (often linear) forward operator modeling the dependence of data on an underlying image u, and α is a positive regularization parameter. While D is often smooth and (strictly) convex, the current practice almost exclusively uses nonsmooth regularization functionals. The majority of successful techniques uses nonsmooth and convex functionals like the total variation and generalizations thereof, cf. [28, 31, 40], or ℓ_1-norms of coefficients arising from scalar products with some frame system, cf. [73] and references therein.

The efficient solution of such variational problems in imaging calls for appropriate algorithms. Taking into account the specific structure as a sum of two very different terms to be minimized, splitting algorithms are a quite canonical choice. Consequently, this field has revived the interest in techniques like operator splittings or augmented Lagrangians. In this chapter we shall provide an overview of currently developed methods and recent results, as well as some computational studies that compare different methods and illustrate their success in applications.

We start with a very general viewpoint in the first sections, discussing basic notation, properties of proximal maps, and firmly nonexpansive and averaged operators, which form the basis of further convergence arguments. Then we proceed to a discussion of several state-of-the-art algorithms and their (theoretical) convergence properties. After a section discussing issues related to the use of analogous iterative schemes for ill-posed problems, we present some practical convergence studies in numerical examples related to PET and spectral CT reconstruction.

2 Notation

In the following we summarize the notation and definitions that will be used throughout this chapter:

  • For x ∈ ℝ^N, x_+ := max{x, 0}, whereby the maximum operation has to be interpreted componentwise.

  • ι_C is the indicator function of a set C ⊆ ℝ^N given by ι_C(x) := 0 if x ∈ C and ι_C(x) := +∞ otherwise.

  • Γ_0(ℝ^N) is the set of proper, convex, and lower semi-continuous functions mapping from ℝ^N into the extended real numbers ℝ ∪ {+∞}.

  • dom f := {x ∈ ℝ^N : f(x) < +∞} denotes the effective domain of f.

  • ∂f(x₀) := {p ∈ ℝ^N : f(x) ≥ f(x₀) + ⟨p, x − x₀⟩ for all x ∈ ℝ^N} denotes the subdifferential of f at x₀ and is the set consisting of the subgradients of f at x₀. If f is differentiable at x₀, then ∂f(x₀) = {∇f(x₀)}. Conversely, if ∂f(x₀) contains only one element, then f is differentiable at x₀ and this element is just the gradient of f at x₀. By Fermat’s rule, x̂ is a global minimizer of f if and only if 0 ∈ ∂f(x̂).

  • f*(p) := sup_{x ∈ ℝ^N} {⟨p, x⟩ − f(x)} is the (Fenchel) conjugate of f. For proper f, we have f** = f if and only if f is convex and lower semi-continuous. If f is positively homogeneous, i.e., f(αx) = αf(x) for all α > 0, then f* is the indicator function of the closed convex set C_f := {p ∈ ℝ^N : ⟨p, x⟩ ≤ f(x) for all x ∈ ℝ^N}, i.e., f* = ι_{C_f}.

    In particular, the conjugate functions of the ℓ_p-norms, p ∈ [1, +∞], are given by

    (‖·‖_p)* = ι_{B_q(1)},     (1)

    where 1/p + 1/q = 1 and, as usual, p = 1 corresponds to q = +∞ and conversely, and B_q(λ) := {x ∈ ℝ^N : ‖x‖_q ≤ λ} denotes the ball of radius λ > 0 with respect to the ℓ_q-norm.

3 Proximal Operator

The algorithms proposed in this chapter to solve various problems in digital image analysis and restoration have in common that they basically reduce to the evaluation of a series of proximal problems. Therefore we start with this kind of problem. For a comprehensive overview on proximal algorithms we refer to [132].

3.1 Definition and Basic Properties

For λ ∈ ℝ_{>0} and f ∈ Γ_0(ℝ^N), the proximal operator prox_{λf} : ℝ^N → ℝ^N of λf is defined by

    prox_{λf}(x) := argmin_{y ∈ ℝ^N} { ½ ‖x − y‖²_2 + λ f(y) }.     (2)

It compromises between minimizing f and being near to x, where λ is the trade-off parameter between these terms. The Moreau envelope or Moreau–Yosida regularization ^λf is given by

    ^λf(x) := min_{y ∈ ℝ^N} { 1/(2λ) ‖x − y‖²_2 + f(y) }.

A straightforward calculation shows that ^λf(x) = 1/(2λ) ‖x − prox_{λf}(x)‖²_2 + f(prox_{λf}(x)). The Moreau envelope can be considered as a smooth approximation of f. The following theorem ensures that the minimizer in (2) exists, is unique and can be characterized by a variational inequality. For the proof we refer to [8].

Theorem 3.1.

Let f ∈ Γ_0(ℝ^N) and λ ∈ ℝ_{>0}. Then,

  • i) For any x ∈ ℝ^N, there exists a unique minimizer x̂ = prox_{λf}(x) of (2).

  • ii) The variational inequality

    ⟨x − x̂, y − x̂⟩ + λ f(x̂) ≤ λ f(y)   for all y ∈ ℝ^N     (3)

    is necessary and sufficient for x̂ to be the minimizer of (2).

  • iii) x̂ is a minimizer of f if and only if it is a fixed point of prox_{λf}, i.e., x̂ = prox_{λf}(x̂).

  • iv) The Moreau envelope ^λf is continuously differentiable with gradient

    ∇(^λf)(x) = (1/λ) (x − prox_{λf}(x)).     (4)

  • v) The sets of minimizers of f and of ^λf are the same.

Rewriting iv) as prox_{λf}(x) = x − λ ∇(^λf)(x), we can interpret prox_{λf}(x) as a gradient descent step with step size λ for minimizing ^λf.

Example 3.2.

Consider the univariate function f(x) := |x| and

    prox_{λf}(x) = argmin_{y ∈ ℝ} { ½ (x − y)² + λ |y| }.

Then, a straightforward computation yields that prox_{λf} is the soft-shrinkage function S_λ with threshold λ (see Fig. 1) defined by

    S_λ(x) := (x − λ)_+ − (−x − λ)_+ = { x − λ if x > λ;  0 if x ∈ [−λ, λ];  x + λ if x < −λ }.

Setting x̂ := prox_{λf}(x) = S_λ(x), we get the Moreau envelope

    ^λf(x) = 1/(2λ) (x − x̂)² + |x̂| = { x²/(2λ) if |x| ≤ λ;  |x| − λ/2 if |x| > λ }.

This function is known as the Huber function (see Fig. 1).

Figure 1: Left: soft-shrinkage function S_λ. Right: Moreau envelope ^λf (Huber function).
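
The soft-shrinkage formula and the resulting Huber-type envelope are easy to check numerically. The following minimal sketch (plain NumPy; the helper names soft_shrink, huber and envelope_bruteforce are ours, not from the chapter) compares the closed-form envelope against a brute-force minimization of the defining problem on a fine grid.

    import numpy as np

    def soft_shrink(x, lam):
        # componentwise soft-shrinkage S_lam(x), i.e. prox of lam*|.| applied entrywise
        return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

    def huber(x, lam):
        # closed-form Moreau envelope of f = |.| (Huber function)
        return np.where(np.abs(x) <= lam, x**2 / (2.0 * lam), np.abs(x) - lam / 2.0)

    def envelope_bruteforce(x, lam, grid=None):
        # brute-force evaluation of min_y 1/(2 lam) (x - y)^2 + |y| on a fine grid
        if grid is None:
            grid = np.linspace(-10.0, 10.0, 200001)
        return np.min((x - grid) ** 2 / (2.0 * lam) + np.abs(grid))

    lam = 1.0
    for x in (-2.3, -0.4, 0.0, 0.7, 3.1):
        assert abs(envelope_bruteforce(x, lam) - huber(x, lam)) < 1e-3
        print(x, soft_shrink(x, lam), huber(x, lam))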
Theorem 3.3 (Moreau decomposition).

For f ∈ Γ_0(ℝ^N) and λ ∈ ℝ_{>0} the following decomposition holds for all x ∈ ℝ^N:

    x = prox_{λf}(x) + λ prox_{λ^{-1} f*}(x/λ).

For a proof we refer to [141, Theorem 31.5].
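
For f = |·| the decomposition can be verified directly: the conjugate f* is the indicator of [−1, 1], so the second proximal operator is just the projection onto that interval. A small NumPy check (illustrative only):

    import numpy as np

    lam = 0.7
    x = np.linspace(-2.0, 2.0, 9)

    prox_l1 = np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)   # prox_{lam |.|}(x), soft-shrinkage
    proj = np.clip(x / lam, -1.0, 1.0)                        # prox_{(1/lam) f*}(x/lam) = projection onto [-1, 1]

    # Moreau decomposition: x = prox_{lam f}(x) + lam * prox_{(1/lam) f*}(x/lam)
    assert np.allclose(x, prox_l1 + lam * proj)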

Remark 3.4 (Proximal operator and resolvent).

The subdifferential operator ∂f : ℝ^N → 2^{ℝ^N} is a set-valued function. For f ∈ Γ_0(ℝ^N), we have by Fermat’s rule and subdifferential calculus that x̂ = prox_{λf}(x) if and only if

    0 ∈ x̂ − x + λ ∂f(x̂),   i.e.,   x ∈ (I + λ ∂f)(x̂),

which implies by the uniqueness of the proximum that x̂ = (I + λ ∂f)^{-1}(x). In particular, (I + λ ∂f)^{-1} is a single-valued operator which is called the resolvent of the set-valued operator λ ∂f. In summary, the proximal operator of λf coincides with the resolvent of λ ∂f, i.e.,

    prox_{λf} = (I + λ ∂f)^{-1}.

The proximal operator (2) and the proximal algorithms described in Section 5 can be generalized by introducing a symmetric, positive definite matrix Q ∈ ℝ^{N,N} as follows:

    prox^Q_{λf}(x) := argmin_{y ∈ ℝ^N} { ½ ‖x − y‖²_Q + λ f(y) },     (5)

where ‖x‖²_Q := x^T Q x, see, e.g., [49, 54, 181].
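
For a smooth quadratic data term the resolvent is a single linear solve, which illustrates the identity prox_{λf} = (I + λ∂f)^{-1}. A minimal sketch, assuming f(y) = ½‖Ay − b‖²_2 and using NumPy (the function name prox_quadratic is ours):

    import numpy as np

    def prox_quadratic(x, A, b, lam):
        # prox_{lam f}(x) for f(y) = 0.5*||A y - b||_2^2, computed as the resolvent
        # (I + lam A^T A)^{-1} applied to x + lam A^T b
        n = x.size
        return np.linalg.solve(np.eye(n) + lam * A.T @ A, x + lam * A.T @ b)

    rng = np.random.default_rng(1)
    A, b, x = rng.normal(size=(8, 5)), rng.normal(size=8), rng.normal(size=5)
    lam = 0.3
    p = prox_quadratic(x, A, b, lam)
    # optimality condition of (2): 0 = p - x + lam * grad f(p)
    assert np.allclose(p - x + lam * A.T @ (A @ p - b), 0.0)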

3.2 Special Proximal Operators

Algorithms involving the solution of proximal problems are only efficient if the corresponding proximal operators can be evaluated in an efficient way. In the following we collect frequently appearing proximal mappings in image processing. For epigraphical projections see [12, 47, 88].

3.2.1 Orthogonal Projections

The proximal operator generalizes the orthogonal projection operator. The orthogonal projection of x ∈ ℝ^N onto a non-empty, closed, convex set C ⊆ ℝ^N is given by

    Π_C(x) := argmin_{y ∈ C} ‖x − y‖_2

and can be rewritten for any λ ∈ ℝ_{>0} as

    Π_C(x) = prox_{λ ι_C}(x).

Some special sets are considered next.

Affine set

C := {y ∈ ℝ^N : Ay = b} with A ∈ ℝ^{m,N}, b ∈ ℝ^m.
In case of argmin_{y} ‖x − y‖_2 subject to Ay = b we substitute z := y − x, which leads to

    argmin_{z} ‖z‖_2   subject to   Az = b − Ax.

This can be directly solved, see [20], and leads after back-substitution to

    Π_C(x) = x + A^†(b − Ax),

where A^† denotes the Moore–Penrose inverse of A.
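
A minimal NumPy sketch of this projection (project_affine is an illustrative name), together with checks that the result is feasible and that the residual x − Π_C(x) is orthogonal to the null space of A:

    import numpy as np

    def project_affine(x, A, b):
        # orthogonal projection of x onto {y : A y = b} via the Moore-Penrose inverse
        return x + np.linalg.pinv(A) @ (b - A @ x)

    rng = np.random.default_rng(2)
    A, b, x = rng.normal(size=(3, 6)), rng.normal(size=3), rng.normal(size=6)
    p = project_affine(x, A, b)

    assert np.allclose(A @ p, b)                  # p is feasible
    _, _, Vt = np.linalg.svd(A)
    assert np.allclose(Vt[3:] @ (x - p), 0.0)     # x - p is orthogonal to the null space of A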

Halfspace

C := {y ∈ ℝ^N : a^T y ≤ b} with a ∈ ℝ^N \ {0}, b ∈ ℝ.
A straightforward computation gives

    Π_C(x) = x − ((a^T x − b)_+ / ‖a‖²_2) a.

Box and Nonnegative Orthant

C := [l, u] := {y ∈ ℝ^N : l ≤ y ≤ u} with l, u ∈ ℝ^N, l ≤ u, where the inequalities are meant componentwise.
The proximal operator can be applied componentwise and gives

    (Π_C(x))_k = { l_k if x_k < l_k;  x_k if x_k ∈ [l_k, u_k];  u_k if x_k > u_k },   k = 1, ..., N.

For l = 0 and u = +∞ we get the orthogonal projection onto the non-negative orthant

    Π_C(x) = x_+ = max{x, 0}.

Probability Simplex

C := {y ∈ ℝ^N : 1^T y = Σ_{k=1}^N y_k = 1, y ≥ 0}.
Here we have

    Π_C(x) = (x − μ 1)_+,

where μ ∈ ℝ has to be determined such that h(μ) := 1^T (x − μ 1)_+ = 1. Now μ can be found, e.g., by bisection with starting interval [max_k x_k − 1, max_k x_k], or by a method similar to those described in Subsection 3.2.2 for projections onto the ℓ_1-ball. Note that h is a piecewise linear spline function with knots x_1, ..., x_N, so that μ is completely determined if we know the two neighboring knots of μ.
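
One common way to determine μ is by sorting, along the lines of the ℓ_1-ball projections referred to above; a minimal NumPy sketch (project_simplex is an illustrative name):

    import numpy as np

    def project_simplex(x):
        # projection onto {y >= 0, sum(y) = 1}; mu is found from the sorted values of x
        u = np.sort(x)[::-1]                      # components sorted in decreasing order
        css = np.cumsum(u) - 1.0
        k = np.arange(1, x.size + 1)
        rho = np.nonzero(u - css / k > 0)[0][-1]  # last index with a positive entry
        mu = css[rho] / (rho + 1.0)
        return np.maximum(x - mu, 0.0)

    x = np.array([0.3, -0.2, 1.4, 0.05])
    p = project_simplex(x)
    print(p, p.sum())          # components are nonnegative and sum to one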

3.2.2 Vector Norms

We consider the proximal operators of f := ‖·‖_p, p ∈ [1, +∞]. By the Moreau decomposition in Theorem 3.3, regarding that the proximal operator of an indicator function is the corresponding orthogonal projection, and by (1) we obtain

    prox_{λ‖·‖_p}(x) = x − Π_{B_q(λ)}(x),

where 1/p + 1/q = 1. Thus the proximal operator can be simply computed by the projection onto the ℓ_q-ball B_q(λ). In particular, it follows for p = 1, 2, ∞:

p = 1, q = ∞:

    prox_{λ‖·‖_1}(x) = x − Π_{B_∞(λ)}(x) = S_λ(x),

where S_λ(x) := (S_λ(x_k))_{k=1}^N denotes the componentwise soft-shrinkage with threshold λ.

p = 2, q = 2:

    prox_{λ‖·‖_2}(x) = x − Π_{B_2(λ)}(x) = { x (1 − λ/‖x‖_2) if ‖x‖_2 > λ;  0 if ‖x‖_2 ≤ λ }.

p = ∞, q = 1:

    prox_{λ‖·‖_∞}(x) = x − Π_{B_1(λ)}(x)

and, for ‖x‖_1 > λ,

    Π_{B_1(λ)}(x) = S_μ(x)   with   μ := (Σ_{k=1}^{m̂} |x|_{(k)} − λ) / m̂,

where |x|_{(1)} ≥ |x|_{(2)} ≥ ... ≥ |x|_{(N)} are the sorted absolute values of the components of x and m̂ is the largest index m such that |x|_{(m)} − (Σ_{k=1}^{m} |x|_{(k)} − λ)/m is positive, see also [59, 62]. For ‖x‖_1 ≤ λ we simply have Π_{B_1(λ)}(x) = x. Another method follows similar lines as the projection onto the probability simplex in the previous subsection.
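
As an illustration, the projection onto B_1(λ) via the sorting procedure and the resulting proximal operator of λ‖·‖_∞ can be sketched as follows (NumPy; the helper names are ours):

    import numpy as np

    def project_l1_ball(x, radius):
        # projection onto B_1(radius) = {y : ||y||_1 <= radius} by the sorting procedure
        if np.abs(x).sum() <= radius:
            return x.copy()
        u = np.sort(np.abs(x))[::-1]
        css = np.cumsum(u) - radius
        k = np.arange(1, x.size + 1)
        rho = np.nonzero(u - css / k > 0)[0][-1]
        mu = css[rho] / (rho + 1.0)
        return np.sign(x) * np.maximum(np.abs(x) - mu, 0.0)

    def prox_linf(x, lam):
        # prox of lam*||.||_inf via the Moreau decomposition: x - P_{B_1(lam)}(x)
        return x - project_l1_ball(x, lam)

    x = np.array([2.0, -0.5, 1.2, 0.1])
    print(project_l1_ball(x, 1.0), prox_linf(x, 1.0))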


Further, grouped/mixed ℓ_2-ℓ_1-norms are defined for x = (x_1, ..., x_g) ∈ ℝ^{gd} with x_j := (x_{j1}, ..., x_{jd}) ∈ ℝ^d, j = 1, ..., g, by

    ‖x‖_{2,1} := Σ_{j=1}^g ‖x_j‖_2.

For the ℓ_2-ℓ_1-norm we see that

    prox_{λ‖·‖_{2,1}}(x) = argmin_{y ∈ ℝ^{gd}} { ½ ‖x − y‖²_2 + λ Σ_{j=1}^g ‖y_j‖_2 }

can be computed separately for each j, which results by the above considerations for the ℓ_2-norm for each x_j in

    (prox_{λ‖·‖_{2,1}}(x))_j = { x_j (1 − λ/‖x_j‖_2) if ‖x_j‖_2 > λ;  0 if ‖x_j‖_2 ≤ λ },   j = 1, ..., g.

The procedure for evaluating prox_{λ‖·‖_{2,1}} is sometimes called coupled or grouped shrinkage.
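
A minimal sketch of grouped shrinkage for explicitly given index groups (NumPy; group_shrink is an illustrative name):

    import numpy as np

    def group_shrink(x, lam, groups):
        # coupled/grouped shrinkage: prox of lam*||.||_{2,1} for the given index groups
        y = np.zeros_like(x)
        for idx in groups:
            norm = np.linalg.norm(x[idx])
            if norm > lam:
                y[idx] = x[idx] * (1.0 - lam / norm)
        return y

    x = np.array([3.0, 4.0, 0.1, -0.2])
    print(group_shrink(x, 1.0, [[0, 1], [2, 3]]))
    # first group has norm 5 and is scaled by 1 - 1/5; second group has norm < 1 and is set to zero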

Finally, we provide the following rule from [53, Prop. 3.6].

Lemma 3.5.

Let , where is differentiable at with . Then .

Example 3.6.

Consider the elastic net regularizer , see [183]. Setting the gradient in the proximal operator of to zero we obtain

The whole proximal operator can then be evaluated componentwise, and we see by Lemma 3.5 that

3.2.3 Matrix Norms

Next we deal with proximation problems involving matrix norms. For X ∈ ℝ^{m,n}, we are looking for

    prox_{λ‖·‖}(X) = argmin_{Y ∈ ℝ^{m,n}} { ½ ‖X − Y‖²_F + λ ‖Y‖ },     (6)

where ‖·‖_F is the Frobenius norm and ‖·‖ is any unitarily invariant matrix norm, i.e., ‖UXV^T‖ = ‖X‖ for all unitary matrices U ∈ ℝ^{m,m}, V ∈ ℝ^{n,n}. Von Neumann (1937) [169] has characterized the unitarily invariant matrix norms as those matrix norms which can be written in the form

    ‖X‖ = g(σ(X)),

where σ(X) := (σ_1(X), ..., σ_t(X)), t := min{m, n}, is the vector of singular values of X and g is a symmetric gauge function, see [175]. Recall that g is a symmetric gauge function if it is a positively homogeneous convex function which vanishes only at the origin and fulfills

    g(x) = g(ε_1 x_{k_1}, ..., ε_t x_{k_t})

for all ε_i ∈ {−1, 1} and all permutations k_1, ..., k_t of the indices. An analogous result was given by Davis [60] for symmetric matrices, where V is replaced by U

and the singular values by the eigenvalues.

We are interested in the Schatten-p norms for p = 1, 2, ∞, which are defined for X ∈ ℝ^{m,n} and t := min{m, n} by

    ‖X‖_{S_1} := Σ_{i=1}^t σ_i(X)   (nuclear norm),
    ‖X‖_{S_2} := (Σ_{i=1}^t σ_i(X)²)^{1/2} = ‖X‖_F   (Frobenius norm),
    ‖X‖_{S_∞} := max_{i=1,...,t} σ_i(X)   (spectral norm).

The following theorem shows that the solution of (6) reduces to a proximal problem for the vector norm of the singular values of X. Another proof for the special case of the nuclear norm can be found in [37].

Theorem 3.7.

Let X = U Σ_X V^T be the singular value decomposition of X and ‖·‖ a unitarily invariant matrix norm. Then prox_{λ‖·‖}(X) in (6) is given by X̂ = U Σ_{X̂} V^T, where the singular values σ(X̂) in Σ_{X̂} are determined by

    σ(X̂) := prox_{λg}(σ(X)) = argmin_{σ ∈ ℝ^t} { ½ ‖σ(X) − σ‖²_2 + λ g(σ) },     (7)

with g the symmetric gauge function corresponding to ‖·‖.

Proof.

By Fermat’s rule we know that the solution X̂ of (6) is determined by

    0 ∈ X̂ − X + λ ∂‖X̂‖     (8)

and from [175] that the subdifferential of the unitarily invariant norm ‖·‖ = g(σ(·)) is given by

    ∂‖X‖ = conv{U D V^T : X = U Σ_X V^T, D = diag(d), d ∈ ∂g(σ(X))}.     (9)

We now construct the unique solution of (8). Let σ̂ be the unique solution of (7). By Fermat’s rule σ̂ satisfies 0 ∈ σ̂ − σ(X) + λ ∂g(σ̂) and consequently there exists d ∈ ∂g(σ̂) such that

    σ(X) = σ̂ + λ d.

By (9) we see that X̂ := U diag(σ̂) V^T fulfills 0 ∈ X̂ − X + λ ∂‖X̂‖, i.e., it is a solution of (8). This completes the proof. ∎

For the special matrix norms considered above, we obtain by the previous subsection

    prox_{λ‖·‖_{S_p}}(X) = U diag(prox_{λ‖·‖_p}(σ(X))) V^T,   p = 1, 2, ∞;

in particular, for the nuclear norm the singular values of X are soft-shrunken by λ.
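
For the nuclear norm this amounts to soft-shrinkage of the singular values (singular value thresholding). A minimal NumPy sketch (prox_nuclear is an illustrative name):

    import numpy as np

    def prox_nuclear(X, lam):
        # prox of lam*||.||_{S_1}: soft-shrinkage of the singular values (Theorem 3.7)
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt

    rng = np.random.default_rng(3)
    X = rng.normal(size=(6, 4))
    Y = prox_nuclear(X, 0.5)
    print(np.linalg.svd(X, compute_uv=False))
    print(np.linalg.svd(Y, compute_uv=False))   # singular values reduced by 0.5 and clipped at zero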

4 Fixed Point Algorithms and Averaged Operators

An operator T : ℝ^N → ℝ^N is contractive if it is Lipschitz continuous with Lipschitz constant L < 1, i.e., there exists a norm ‖·‖ on ℝ^N such that

    ‖T x − T y‖ ≤ L ‖x − y‖   for all x, y ∈ ℝ^N.

In case L = 1, the operator is called nonexpansive. A function T : ℝ^N → ℝ^N is firmly nonexpansive if it fulfills for all x, y ∈ ℝ^N one of the following equivalent conditions [12]:

    ‖T x − T y‖²_2 ≤ ⟨T x − T y, x − y⟩,
    ‖T x − T y‖²_2 ≤ ‖x − y‖²_2 − ‖(I − T) x − (I − T) y‖²_2.     (10)

In particular we see, by the Cauchy–Schwarz inequality, that a firmly nonexpansive function is nonexpansive.

Lemma 4.1.

For f ∈ Γ_0(ℝ^N), the proximal operator prox_{λf} is firmly nonexpansive. In particular, the orthogonal projection onto non-empty, closed, convex sets is firmly nonexpansive.

Proof.

By Theorem 3.1 ii) we have for all x, z ∈ ℝ^N that

    ⟨x − prox_{λf}(x), y − prox_{λf}(x)⟩ + λ f(prox_{λf}(x)) ≤ λ f(y)   for all y ∈ ℝ^N.

With y := prox_{λf}(z) this gives

    ⟨x − prox_{λf}(x), prox_{λf}(z) − prox_{λf}(x)⟩ ≤ λ f(prox_{λf}(z)) − λ f(prox_{λf}(x)),

and similarly

    ⟨z − prox_{λf}(z), prox_{λf}(x) − prox_{λf}(z)⟩ ≤ λ f(prox_{λf}(x)) − λ f(prox_{λf}(z)).

Adding these inequalities we obtain

    ‖prox_{λf}(x) − prox_{λf}(z)‖²_2 ≤ ⟨prox_{λf}(x) − prox_{λf}(z), x − z⟩,

so that prox_{λf} is firmly nonexpansive. ∎
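
The firm nonexpansiveness of a proximal operator is also easy to observe numerically; the following sketch checks the first inequality in (10) for the soft-shrinkage operator on random pairs of points (NumPy; illustrative code, not from the chapter):

    import numpy as np

    def prox_l1(x, lam):
        # soft-shrinkage = prox of lam*||.||_1
        return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

    rng = np.random.default_rng(4)
    lam = 0.8
    for _ in range(1000):
        x, y = rng.normal(size=5), rng.normal(size=5)
        d = prox_l1(x, lam) - prox_l1(y, lam)
        # firm nonexpansiveness (10): ||Tx - Ty||_2^2 <= <Tx - Ty, x - y>
        assert d @ d <= d @ (x - y) + 1e-12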

The Banach fixed point theorem guarantees that a contraction T has a unique fixed point and that the Picard sequence

    x^{(r+1)} := T x^{(r)}     (11)

converges to this fixed point for every initial element x^{(0)}. However, in many applications the contraction property is too restrictive in the sense that we often do not have a unique fixed point. Indeed, it is quite natural in many cases that the reached fixed point depends on the starting value x^{(0)}. Note that if T is continuous and (x^{(r)})_r is convergent, then it converges to a fixed point of T. In the following, we denote by Fix(T) the set of fixed points of T. Unfortunately, we do not have convergence of (x^{(r)})_r just for nonexpansive operators as the following example shows.

Example 4.2.

In ℝ² we consider the reflection operator

    R(x_1, x_2) := (x_1, −x_2).

Obviously, R is nonexpansive and we only have convergence of (R^r x^{(0)})_r if x^{(0)} ∈ Fix(R) = {(x_1, 0) : x_1 ∈ ℝ}. A possibility to obtain a ’better’ operator is to average R, i.e., to build

    T := ½ (I + R),   T(x_1, x_2) = (x_1, 0).

By

    T x = x   ⟺   ½ (I + R) x = x   ⟺   R x = x     (12)

we see that R and T have the same fixed points. Moreover, since T is the orthogonal projection onto Fix(R), the sequence (T^r x^{(0)})_r converges to Π_{Fix(R)}(x^{(0)}) for every x^{(0)} ∈ ℝ².
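
A short numerical illustration of the example (NumPy; assuming the reflection about the x_1-axis as above): the Picard iterates of R keep oscillating, while those of the averaged operator T land in the common fixed point set immediately.

    import numpy as np

    R = np.array([[1.0, 0.0], [0.0, -1.0]])   # reflection about the x1-axis (nonexpansive)
    T = 0.5 * (np.eye(2) + R)                 # averaged operator, here the projection onto Fix(R)

    xR = xT = np.array([2.0, 3.0])
    for r in range(4):
        xR, xT = R @ xR, T @ xT
        print(r, xR, xT)
    # the iterates of R oscillate in the second coordinate, while the iterates of T
    # stay at the fixed point (2, 0) of both operators from the first step on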

An operator T : ℝ^N → ℝ^N is called averaged if there exists a nonexpansive mapping R and a constant α ∈ (0, 1) such that

    T = α R + (1 − α) I.

Following (12) we see that Fix(R) = Fix(T).

Historically, the concept of averaged mappings can be traced back to [106, 113, 149], where the name ’averaged’ was not used yet. Results on averaged operators can also be found, e.g., in [12, 36, 52].

Lemma 4.3 (Averaged, (Firmly) Nonexpansive and Contractive Operators).


  • Every averaged operator is nonexpansive.

  • A contractive operator T with Lipschitz constant L < 1 is averaged with respect to all parameters α ∈ [(1 + L)/2, 1).

  • An operator T is firmly nonexpansive if and only if it is averaged with parameter α = ½.

Proof.

i) Let T = α R + (1 − α) I be averaged. Then the first assertion follows by

    ‖T x − T y‖ ≤ α ‖R x − R y‖ + (1 − α) ‖x − y‖ ≤ ‖x − y‖.

ii) We define the operator R := (1/α) (T − (1 − α) I). It holds for all x, y ∈ ℝ^N that

    ‖R x − R y‖ = (1/α) ‖T x − T y − (1 − α)(x − y)‖ ≤ (1/α) (L + 1 − α) ‖x − y‖,

so R is nonexpansive if α ≥ (1 + L)/2.
iii) With R := 2T − I we obtain the following equalities

    ‖R x − R y‖²_2 = ‖2(T x − T y) − (x − y)‖²_2
                   = 4 ‖T x − T y‖²_2 − 4 ⟨T x − T y, x − y⟩ + ‖x − y‖²_2

and therefore after reordering

    ‖T x − T y‖²_2 − ⟨T x − T y, x − y⟩ = ¼ (‖R x − R y‖²_2 − ‖x − y‖²_2).

If R is nonexpansive, then the last expression is ≤ 0 and consequently (10) holds true so that T = ½(I + R) is firmly nonexpansive. Conversely, if T fulfills (10), then

    ‖R x − R y‖²_2 ≤ ‖x − y‖²_2,

so that R is nonexpansive. This completes the proof. ∎

By the following lemma averaged operators are closed under composition.

Lemma 4.4 (Composition of Averaged Operators).


  • Suppose that T is averaged with respect to α ∈ (0, 1). Then, it is also averaged with respect to any other parameter α̃ ∈ (α, 1).

  • Let T_1, T_2 : ℝ^N → ℝ^N be averaged operators. Then, T_2 ∘ T_1 is also averaged.

Proof.

i) By assumption, T = α R + (1 − α) I with R nonexpansive. We have

    T = α̃ R̃ + (1 − α̃) I   with   R̃ := (α/α̃) R + (1 − α/α̃) I,

and for all x, y ∈ ℝ^N it holds that

    ‖R̃ x − R̃ y‖ ≤ (α/α̃) ‖R x − R y‖ + (1 − α/α̃) ‖x − y‖ ≤ ‖x − y‖.

So, R̃ is nonexpansive.
ii) By assumption there exist nonexpansive operators R_1, R_2 and α_1, α_2 ∈ (0, 1) such that

    T_2(T_1 x) = α_2 R_2(T_1 x) + (1 − α_2) T_1 x
               = α_2 R_2(T_1 x) + (1 − α_2) (α_1 R_1 x + (1 − α_1) x)
               = α R x + (1 − α) x,   α := α_1 + α_2 − α_1 α_2,   R := (1/α) (α_2 R_2 ∘ T_1 + (1 − α_2) α_1 R_1).

The concatenation of two nonexpansive operators is nonexpansive. Finally, the convex combination of two nonexpansive operators is nonexpansive so that R is indeed nonexpansive. ∎

An operator T : ℝ^N → ℝ^N is called asymptotically regular if it holds for all x ∈ ℝ^N that

    ‖T^{r+1} x − T^{r} x‖ → 0   for r → ∞.

Note that this property does not imply convergence; even boundedness cannot be guaranteed. As an example consider the partial sums of the harmonic series: successive differences tend to zero, yet the sequence diverges.
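
A tiny illustration of this remark (plain Python): the partial sums of the harmonic series have successive differences tending to zero while the sums themselves grow without bound.

    # Partial sums of the harmonic series: the successive differences 1/(n+1) tend to zero,
    # yet the sums are unbounded, so asymptotic regularity alone gives no convergence.
    s = 0.0
    for n in range(1, 100001):
        s += 1.0 / n
        if n in (10, 1000, 100000):
            print(n, s, 1.0 / n)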

Theorem 4.5 (Asymptotic Regularity of Averaged Operators).

Let T : ℝ^N → ℝ^N be an averaged operator with respect to the nonexpansive mapping R and the parameter α ∈ (0, 1). Assume that Fix(T) ≠ ∅. Then, T is asymptotically regular.

Proof.

Let x̂ ∈ Fix(T) and x^{(r)} := T^r x^{(0)} for some starting element x^{(0)}. Since T is nonexpansive, i.e., ‖T x^{(r)} − x̂‖ ≤ ‖x^{(r)} − x̂‖, we obtain

    ‖x^{(r+1)} − x̂‖ ≤ ‖x^{(r)} − x̂‖ ≤ ... ≤ ‖x^{(0)} − x̂‖.     (13)

Using Fix(R) = Fix(T), i.e., R x̂ = x̂, it follows