Variational methods in imaging have developed into a quite universal and flexible tool, allowing for highly successful approaches to tasks such as denoising, deblurring, inpainting, segmentation, super-resolution, disparity, and optical flow estimation. The overall structure of such approaches is of the form
$$\hat u \in \operatorname*{argmin}_{u} \; D(Ku, f) + \alpha R(u),$$
where the functional $D$ is a data fidelity term, depending on some input data $f$ and measuring the deviation of $Ku$ from it, and $R$ is a regularization functional. Moreover, $K$ is an (often linear) forward operator modeling the dependence of the data on an underlying image $u$, and $\alpha$ is a positive regularization parameter. While $D$ is often smooth and (strictly) convex, current practice almost exclusively uses nonsmooth regularization functionals. The majority of successful techniques use nonsmooth convex functionals such as the total variation and generalizations thereof, cf. [28, 31, 40], or $\ell_1$-norms of coefficients arising from scalar products with some frame system, cf. the references therein.
The efficient solution of such variational problems in imaging demands appropriate algorithms. Taking into account their specific structure as a sum of two very different terms to be minimized, splitting algorithms are a quite canonical choice. Consequently, this field has revived interest in techniques like operator splittings and augmented Lagrangians. In this chapter we provide an overview of currently developed methods and recent results, as well as computational studies comparing different methods and illustrating their success in applications.
We start with a very general viewpoint in the first sections, discussing basic notation, properties of proximal maps, and firmly nonexpansive and averaged operators, which form the basis of the subsequent convergence arguments. Then we proceed to a discussion of several state-of-the-art algorithms and their (theoretical) convergence properties. After a section discussing issues related to the use of analogous iterative schemes for ill-posed problems, we present some practical convergence studies in numerical examples related to PET and spectral CT reconstruction.
In the following we summarize the notation and definitions that will be used throughout this chapter:
$x_+ := \max\{x, 0\}$ for $x \in \mathbb{R}^N$, whereby the maximum operation has to be interpreted componentwise.
$\iota_C$ is the indicator function of a set $C \subseteq \mathbb{R}^N$ given by
$$\iota_C(x) := \begin{cases} 0 & \text{if } x \in C, \\ +\infty & \text{otherwise.} \end{cases}$$
$\Gamma_0(\mathbb{R}^N)$ is the set of proper, convex, and lower semi-continuous functions mapping from $\mathbb{R}^N$ into the extended real numbers $(-\infty, +\infty]$.
$\operatorname{dom} f := \{x \in \mathbb{R}^N : f(x) < +\infty\}$ denotes the effective domain of $f$.
$\partial f(x) := \{p \in \mathbb{R}^N : f(y) \ge f(x) + \langle p, y - x \rangle \ \text{for all } y \in \mathbb{R}^N\}$ denotes the subdifferential of $f$ at $x$ and is the set consisting of the subgradients of $f$ at $x$. If $f$ is differentiable at $x$, then $\partial f(x) = \{\nabla f(x)\}$. Conversely, if $f$ is convex and $\partial f(x)$ contains only one element, then $f$ is differentiable at $x$ and this element is just the gradient of $f$ at $x$. By Fermat's rule, $\hat x$ is a global minimizer of $f$ if and only if $0 \in \partial f(\hat x)$.
$f^*(p) := \sup_{x \in \mathbb{R}^N} \{\langle p, x \rangle - f(x)\}$ is the (Fenchel) conjugate of $f$. For proper $f$, we have $f^{**} = f$ if and only if $f \in \Gamma_0(\mathbb{R}^N)$. If $f \in \Gamma_0(\mathbb{R}^N)$ is positively homogeneous, i.e., $f(\alpha x) = \alpha f(x)$ for all $\alpha > 0$, then
$$f^*(p) = \iota_{C_f}(p), \qquad C_f := \{p \in \mathbb{R}^N : \langle p, x \rangle \le f(x) \ \text{for all } x \in \mathbb{R}^N\}.$$
In particular, the conjugate functions of the $\ell_p$-norms, $p \in [1, +\infty]$, are given by
$$\|\cdot\|_p^*(x) = \iota_{B_q(1)}(x),$$
where $\frac{1}{p} + \frac{1}{q} = 1$ and, as usual, $p = 1$ corresponds to $q = \infty$ and conversely, and $B_q(\lambda) := \{x \in \mathbb{R}^N : \|x\|_q \le \lambda\}$ denotes the ball of radius $\lambda$ with respect to the $\ell_q$-norm.
3 Proximal Operator
The algorithms proposed in this chapter to solve various problems in digital image analysis and restoration have in common that they basically reduce to the evaluation of a series of proximal problems. Therefore we start with this kind of problem. For a comprehensive overview of proximal algorithms we refer to the literature.
3.1 Definition and Basic Properties
For $f \in \Gamma_0(\mathbb{R}^N)$ and $\lambda > 0$, the proximal operator $\operatorname{prox}_{\lambda f} : \mathbb{R}^N \to \mathbb{R}^N$ of $\lambda f$ is defined by
$$\operatorname{prox}_{\lambda f}(x) := \operatorname*{argmin}_{y \in \mathbb{R}^N} \Big\{ \frac{1}{2} \|x - y\|_2^2 + \lambda f(y) \Big\}. \qquad (2)$$
It compromises between minimizing $f$ and being near to $x$, where $\lambda$ is the trade-off parameter between these terms. The Moreau envelope or Moreau–Yosida regularization ${}^{\lambda}\!f : \mathbb{R}^N \to \mathbb{R}$ is given by
$${}^{\lambda}\!f(x) := \min_{y \in \mathbb{R}^N} \Big\{ \frac{1}{2\lambda} \|x - y\|_2^2 + f(y) \Big\}.$$
A straightforward calculation shows that the minimum in the Moreau envelope is attained at $\operatorname{prox}_{\lambda f}(x)$. The following theorem ensures that the minimizer in (2) exists, is unique, and can be characterized by a variational inequality. The Moreau envelope can be considered as a smooth approximation of $f$. For the proof we refer to the literature.
Theorem 3.1.
Let $f \in \Gamma_0(\mathbb{R}^N)$ and $\lambda > 0$. Then:
i) For any $x \in \mathbb{R}^N$, there exists a unique minimizer $\hat y = \operatorname{prox}_{\lambda f}(x)$ of (2).
ii) The variational inequality
$$\langle x - \hat y, \, y - \hat y \rangle + \lambda f(\hat y) \le \lambda f(y) \quad \text{for all } y \in \mathbb{R}^N$$
is necessary and sufficient for $\hat y$ to be the minimizer of (2).
iii) A point $\hat x$ is a minimizer of $f$ if and only if it is a fixed point of $\operatorname{prox}_{\lambda f}$, i.e.,
$$\hat x = \operatorname{prox}_{\lambda f}(\hat x).$$
iv) The Moreau envelope ${}^{\lambda}\!f$ is continuously differentiable with gradient
$$\nabla \big({}^{\lambda}\!f\big)(x) = \frac{1}{\lambda} \big( x - \operatorname{prox}_{\lambda f}(x) \big).$$
v) The sets of minimizers of $f$ and ${}^{\lambda}\!f$ are the same.
Rewriting iv) as
$$\operatorname{prox}_{\lambda f}(x) = x - \lambda \nabla \big({}^{\lambda}\!f\big)(x),$$
we can interpret $\operatorname{prox}_{\lambda f}(x)$ as a gradient descent step with step size $\lambda$ for minimizing ${}^{\lambda}\!f$.
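To make this concrete, here is a small numerical sketch (my own illustration, not from the chapter): it uses $f = \|\cdot\|_1$, whose proximal operator is the componentwise soft-shrinkage discussed in Subsection 3.2.2, and checks the closed form against the defining minimization problem (2) by brute force on a fine grid.

```python
import numpy as np

def prox_l1(x, lam):
    # closed-form prox of lam*||.||_1: componentwise soft-shrinkage
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

x, lam = 2.0, 0.5
# brute-force minimization of (1/2)(x - y)^2 + lam*|y| over a fine grid
ys = np.linspace(-4.0, 4.0, 80001)
y_grid = ys[np.argmin(0.5 * (x - ys) ** 2 + lam * np.abs(ys))]
assert abs(prox_l1(np.array([x]), lam)[0] - y_grid) < 1e-3  # both approx. 1.5
```

Both the closed form and the grid search locate the minimizer at $1.5$, as predicted by soft-shrinkage of $2$ with threshold $0.5$.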
Theorem 3.3 (Moreau decomposition).
For $f \in \Gamma_0(\mathbb{R}^N)$ and $\lambda > 0$ the following decomposition holds:
$$x = \operatorname{prox}_{\lambda f}(x) + \lambda \operatorname{prox}_{\lambda^{-1} f^*}(x / \lambda).$$
For a proof we refer to [141, Theorem 31.5].
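The decomposition is easy to verify numerically. The following sketch (an illustration of mine, with $\lambda = 1$) uses $f = \|\cdot\|_1$: its prox is soft-shrinkage, while the prox of the conjugate $f^* = \iota_{B_\infty(1)}$ is the projection onto the $\ell_\infty$ unit ball, i.e., componentwise clipping.

```python
import numpy as np

def prox_l1(x):
    # prox of ||.||_1: soft-shrinkage with threshold 1
    return np.sign(x) * np.maximum(np.abs(x) - 1.0, 0.0)

def prox_l1_conj(x):
    # prox of the conjugate, the indicator of the l_inf unit ball:
    # orthogonal projection, i.e., componentwise clipping to [-1, 1]
    return np.clip(x, -1.0, 1.0)

x = np.array([2.5, -0.3, 0.7, -4.0])
# Moreau decomposition with lambda = 1: x = prox_f(x) + prox_{f*}(x)
assert np.allclose(prox_l1(x) + prox_l1_conj(x), x)
```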
Remark 3.4 (Proximal operator and resolvent).
The subdifferential operator $\partial f : \mathbb{R}^N \to 2^{\mathbb{R}^N}$ is a set-valued function. For $\lambda > 0$, we have by Fermat's rule and subdifferential calculus that $\hat y = \operatorname{prox}_{\lambda f}(x)$ if and only if
$$0 \in \hat y - x + \lambda \partial f(\hat y),$$
i.e., $x \in (I + \lambda \partial f)(\hat y)$, which implies by the uniqueness of the proximum that $\hat y = (I + \lambda \partial f)^{-1}(x)$. In particular, $(I + \lambda \partial f)^{-1}$ is a single-valued operator, which is called the resolvent of the set-valued operator $\lambda \partial f$. In summary, the proximal operator of $\lambda f$ coincides with the resolvent of $\lambda \partial f$, i.e.,
$$\operatorname{prox}_{\lambda f} = (I + \lambda \partial f)^{-1}.$$
3.2 Special Proximal Operators
Algorithms involving the solution of proximal problems are only efficient if the corresponding proximal operators can be evaluated efficiently. In the following we collect frequently appearing proximal mappings in image processing. For epigraphical projections see [12, 47, 88].
3.2.1 Orthogonal Projections
The proximal operator generalizes the orthogonal projection operator. The orthogonal projection $\Pi_C(x)$ of $x \in \mathbb{R}^N$ onto a non-empty, closed, convex set $C \subseteq \mathbb{R}^N$ is given by
$$\Pi_C(x) := \operatorname*{argmin}_{y \in C} \|x - y\|_2$$
and can be rewritten for any $\lambda > 0$ as
$$\Pi_C(x) = \operatorname{prox}_{\lambda \iota_C}(x).$$
Some special sets are considered next.
Affine set $C := \{y \in \mathbb{R}^N : Ay = b\}$ with $A \in \mathbb{R}^{m \times N}$, $b \in \mathbb{R}^m$.
In case of $\operatorname*{argmin}_{y} \frac{1}{2}\|x - y\|_2^2$ subject to $Ay = b$ we substitute $z := x - y$, which leads to
$$\min_{z} \frac{1}{2} \|z\|_2^2 \quad \text{subject to} \quad Az = Ax - b.$$
This least-norm problem can be solved directly and leads after back-substitution to
$$\Pi_C(x) = x - A^{\dagger}(Ax - b),$$
where $A^{\dagger}$ denotes the Moore–Penrose inverse of $A$.
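As a sanity check, the projection formula via the Moore–Penrose inverse can be implemented in a few lines (my illustration; `numpy.linalg.pinv` computes the pseudoinverse):

```python
import numpy as np

def proj_affine(x, A, b):
    # projection onto {y : Ay = b} via the Moore-Penrose inverse
    return x - np.linalg.pinv(A) @ (A @ x - b)

A = np.array([[1.0, 1.0]])          # the line y_1 + y_2 = 1 in R^2
b = np.array([1.0])
y = proj_affine(np.array([0.0, 0.0]), A, b)
assert np.allclose(A @ y, b)        # feasibility
assert np.allclose(y, [0.5, 0.5])   # closest point on the line to the origin
```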
Halfspace $C := \{y \in \mathbb{R}^N : \langle a, y \rangle \le b\}$ with $a \in \mathbb{R}^N \setminus \{0\}$, $b \in \mathbb{R}$.
A straightforward computation gives
$$\Pi_C(x) = x - \frac{(\langle a, x \rangle - b)_+}{\|a\|_2^2} \, a.$$
Box and Nonnegative Orthant
For the box $C := \{y \in \mathbb{R}^N : l \le y \le u\}$ with $l < u$, the projection can be applied componentwise and gives
$$\big(\Pi_C(x)\big)_k = \begin{cases} l_k & \text{if } x_k < l_k, \\ x_k & \text{if } x_k \in [l_k, u_k], \\ u_k & \text{if } x_k > u_k. \end{cases}$$
For $l = 0$ and $u = +\infty$ we get the orthogonal projection onto the non-negative orthant
$$\Pi_C(x) = x_+ = \max\{x, 0\}.$$
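Both projections reduce to componentwise clipping; a minimal sketch (my own illustration):

```python
import numpy as np

def proj_box(x, l, u):
    # componentwise projection onto the box [l, u]
    return np.minimum(np.maximum(x, l), u)

def proj_nonneg(x):
    # special case l = 0, u = +inf: the non-negative orthant
    return np.maximum(x, 0.0)

assert np.allclose(proj_box(np.array([-2.0, 0.5, 3.0]), 0.0, 1.0), [0.0, 0.5, 1.0])
assert np.allclose(proj_nonneg(np.array([-1.0, 2.0])), [0.0, 2.0])
```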
Probability Simplex
For the probability simplex $C := \{y \in \mathbb{R}^N : \mathbf{1}^{\mathrm T} y = 1, \ y \ge 0\}$ we have
$$\Pi_C(x) = (x - \mu \mathbf{1})_+,$$
where $\mu \in \mathbb{R}$ has to be determined such that $h(\mu) := \mathbf{1}^{\mathrm T} (x - \mu \mathbf{1})_+ = 1$. Now $\mu$ can be found, e.g., by bisection with starting interval $[\max_k x_k - 1, \max_k x_k]$ or by a method similar to those described in Subsection 3.2.2 for projections onto the $\ell_1$-ball. Note that $h(\mu)$ is a linear spline function with knots $x_1, \dots, x_N$, so that $\mu$ is completely determined if we know the two neighboring knots.
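The bisection strategy can be sketched as follows (my illustration; since $h(\mu) = \mathbf{1}^{\mathrm T}(x - \mu\mathbf{1})_+$ is continuous and decreasing with $h(\max_k x_k) = 0 \le 1 \le h(\max_k x_k - 1)$, bisection on this interval converges):

```python
import numpy as np

def proj_simplex(x, iters=60):
    # find mu with sum(max(x - mu, 0)) = 1 by bisection, then shift and clip
    lo, hi = x.max() - 1.0, x.max()   # h(lo) >= 1 and h(hi) = 0
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if np.maximum(x - mu, 0.0).sum() >= 1.0:
            lo = mu                   # mu still too small
        else:
            hi = mu                   # mu too large
    return np.maximum(x - 0.5 * (lo + hi), 0.0)

y = proj_simplex(np.array([2.0, 0.0, -1.0]))
assert abs(y.sum() - 1.0) < 1e-9 and y.min() >= 0.0   # y approx. (1, 0, 0)
```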
3.2.2 Vector Norms
We consider $f := \|\cdot\|_p$, $p \in [1, +\infty]$. By the Moreau decomposition in Theorem 3.3 and the conjugates of the $\ell_p$-norms we obtain
$$\operatorname{prox}_{\lambda \|\cdot\|_p}(x) = x - \Pi_{B_q(\lambda)}(x),$$
where $\frac{1}{p} + \frac{1}{q} = 1$. Thus the proximal operator can be simply computed via the projection onto the $\ell_q$-ball $B_q(\lambda)$. In particular, it follows for $p = 1$:
$$\operatorname{prox}_{\lambda \|\cdot\|_1}(x) = x - \Pi_{B_\infty(\lambda)}(x) = S_\lambda(x),$$
where $S_\lambda(x) := \big( (|x_k| - \lambda)_+ \operatorname{sgn}(x_k) \big)_{k=1}^{N}$ denotes the componentwise soft-shrinkage with threshold $\lambda$.
For $p = \infty$ the projection onto the $\ell_1$-ball $B_1(\lambda)$ is required. Another method follows similar lines as the projection onto the probability simplex in the previous subsection.
Further, grouped/mixed $\ell_2$-$\ell_p$-norms are defined for $x = (x_1, \dots, x_n)^{\mathrm T} \in \mathbb{R}^{dn}$ with $x_j := (x_{jk})_{k=1}^{d} \in \mathbb{R}^d$, $j = 1, \dots, n$, by
$$\|x\|_{2,p} := \big\| \big( \|x_j\|_2 \big)_{j=1}^{n} \big\|_p.$$
For the $\ell_2$-$\ell_1$-norm we see that
$$\operatorname{prox}_{\lambda \|\cdot\|_{2,1}}(x) = \operatorname*{argmin}_{y \in \mathbb{R}^{dn}} \Big\{ \frac{1}{2} \|x - y\|_2^2 + \lambda \sum_{j=1}^{n} \|y_j\|_2 \Big\}$$
can be computed separately for each $x_j$, which results by the above considerations for the $\ell_2$-norm for each $j = 1, \dots, n$ in
$$\big( \operatorname{prox}_{\lambda \|\cdot\|_{2,1}}(x) \big)_j = x_j \Big( 1 - \frac{\lambda}{\|x_j\|_2} \Big)_+.$$
The procedure for evaluating $\operatorname{prox}_{\lambda \|\cdot\|_{2,1}}$ is sometimes called coupled or grouped shrinkage.
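A compact sketch of grouped shrinkage (my illustration, with the groups $x_j$ stored as the rows of a 2-D array):

```python
import numpy as np

def prox_group_l21(x, lam):
    # shrink each row x_j by the factor (1 - lam/||x_j||_2)_+
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0) * x

x = np.array([[3.0, 4.0],    # norm 5  -> scaled by (1 - 1/5) = 0.8
              [0.1, 0.0]])   # norm 0.1 < lam -> whole group set to zero
y = prox_group_l21(x, 1.0)
assert np.allclose(y, [[2.4, 3.2], [0.0, 0.0]])
```

Note that either a whole group survives (shrunk in norm, direction preserved) or it is set to zero, which is what makes this norm useful for structured sparsity.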
Finally, we provide the following rule from [53, Prop. 3.6].
Let $f := g \circ \|\cdot\|_2$ with $g \in \Gamma_0(\mathbb{R})$, where $g$ is differentiable at $0$ with $g'(0) = 0$. Then
$$\operatorname{prox}_{\lambda f}(x) = \operatorname{prox}_{\lambda g}(\|x\|_2) \, \frac{x}{\|x\|_2} \quad \text{for } x \neq 0, \qquad \operatorname{prox}_{\lambda f}(0) = 0.$$
3.2.3 Matrix Norms
Next we deal with proximation problems involving matrix norms. For $X \in \mathbb{R}^{m \times n}$, we are looking for
$$\operatorname{prox}_{\lambda \|\cdot\|}(X) = \operatorname*{argmin}_{Y \in \mathbb{R}^{m \times n}} \Big\{ \frac{1}{2} \|X - Y\|_F^2 + \lambda \|Y\| \Big\}, \qquad (6)$$
where $\|\cdot\|_F$ is the Frobenius norm and $\|\cdot\|$ is any unitarily invariant matrix norm, i.e., $\|U X V^{\mathrm T}\| = \|X\|$ for all unitary matrices $U \in \mathbb{R}^{m \times m}$, $V \in \mathbb{R}^{n \times n}$. Von Neumann (1937) has characterized the unitarily invariant matrix norms as those matrix norms which can be written in the form
$$\|X\| = g(\sigma(X)),$$
where $\sigma(X)$ is the vector of singular values of $X$ and $g$ is a symmetric gauge function. Recall that $g$ is a symmetric gauge function if it is a positively homogeneous convex function which vanishes at the origin and fulfills
$$g(x) = g(\epsilon_1 x_{k_1}, \dots, \epsilon_t x_{k_t})$$
for all $\epsilon_i \in \{-1, 1\}$ and all permutations $k_1, \dots, k_t$ of the indices. An analogous result was given by Davis for symmetric matrices $X = U \Lambda U^{\mathrm T}$, where $V^{\mathrm T}$ is replaced by $U^{\mathrm T}$ and the singular values by the eigenvalues.
We are interested in the Schatten-$p$ norms for $p = 1, 2, \infty$, which are defined for $X \in \mathbb{R}^{m \times n}$ and $t := \min\{m, n\}$ by
$$\|X\|_{\mathcal{S}_p} := \Big( \sum_{i=1}^{t} \sigma_i(X)^p \Big)^{1/p} \ (p < \infty), \qquad \|X\|_{\mathcal{S}_\infty} := \max_{i = 1, \dots, t} \sigma_i(X),$$
i.e., the nuclear norm for $p = 1$, the Frobenius norm for $p = 2$, and the spectral norm for $p = \infty$.
The following theorem shows that the solution of (6) reduces to a proximal problem for the vector norm $g$ applied to the singular values of $X$. Another proof for the special case of the nuclear norm can be found in the literature.
By Fermat's rule we know that the solution $\hat Y$ of (6) is determined by
$$0 \in \hat Y - X + \lambda \partial \|\hat Y\|,$$
and from the characterization of the subdifferential of unitarily invariant matrix norms it follows that, for the singular value decomposition $X = U \Sigma V^{\mathrm T}$,
$$\hat Y = U \operatorname{diag}\big( \operatorname{prox}_{\lambda g}(\sigma(X)) \big) V^{\mathrm T}.$$
For the special matrix norms considered above, we obtain by the previous subsection, e.g., for the nuclear norm,
$$\operatorname{prox}_{\lambda \|\cdot\|_{\mathcal{S}_1}}(X) = U \operatorname{diag}\big( S_\lambda(\sigma(X)) \big) V^{\mathrm T},$$
with the componentwise soft-shrinkage $S_\lambda$.
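For the nuclear norm this is the well-known singular value soft-thresholding; a minimal sketch (my own illustration):

```python
import numpy as np

def prox_nuclear(X, lam):
    # soft-shrink the singular values: Y = U diag((sigma - lam)_+) V^T
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt

X = np.diag([3.0, 1.0])
assert np.allclose(prox_nuclear(X, 1.0), np.diag([2.0, 0.0]))
```

Shrinking singular values lowers the rank once they fall below the threshold, which is why this operator appears in low-rank matrix recovery.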
4 Fixed Point Algorithms and Averaged Operators
An operator $T : \mathbb{R}^N \to \mathbb{R}^N$ is contractive if it is Lipschitz continuous with Lipschitz constant $L < 1$, i.e., there exists a norm $\|\cdot\|$ on $\mathbb{R}^N$ such that
$$\|Tx - Ty\| \le L \|x - y\| \quad \text{for all } x, y \in \mathbb{R}^N.$$
In case $L = 1$, the operator is called nonexpansive. A function $T : \mathbb{R}^N \to \mathbb{R}^N$ is firmly nonexpansive if it fulfills for all $x, y \in \mathbb{R}^N$ one of the following equivalent conditions:
$$\|Tx - Ty\|_2^2 \le \langle Tx - Ty, \, x - y \rangle,$$
$$\|Tx - Ty\|_2^2 \le \|x - y\|_2^2 - \|(I - T)x - (I - T)y\|_2^2.$$
In particular, by the Cauchy–Schwarz inequality, we see that a firmly nonexpansive function is nonexpansive.
For $f \in \Gamma_0(\mathbb{R}^N)$ and $\lambda > 0$, the proximal operator $\operatorname{prox}_{\lambda f}$ is firmly nonexpansive. In particular, the orthogonal projection onto non-empty, closed, convex sets is firmly nonexpansive.
By Theorem 3.1ii) we have that
$$\langle x - \operatorname{prox}_{\lambda f}(x), \, y - \operatorname{prox}_{\lambda f}(x) \rangle + \lambda f(\operatorname{prox}_{\lambda f}(x)) \le \lambda f(y) \quad \text{for all } y \in \mathbb{R}^N.$$
With $y = \operatorname{prox}_{\lambda f}(y)$ this gives
$$\langle x - \operatorname{prox}_{\lambda f}(x), \, \operatorname{prox}_{\lambda f}(y) - \operatorname{prox}_{\lambda f}(x) \rangle \le \lambda f(\operatorname{prox}_{\lambda f}(y)) - \lambda f(\operatorname{prox}_{\lambda f}(x)),$$
and analogously, with the roles of $x$ and $y$ interchanged,
$$\langle y - \operatorname{prox}_{\lambda f}(y), \, \operatorname{prox}_{\lambda f}(x) - \operatorname{prox}_{\lambda f}(y) \rangle \le \lambda f(\operatorname{prox}_{\lambda f}(x)) - \lambda f(\operatorname{prox}_{\lambda f}(y)).$$
Adding these inequalities we obtain
$$\|\operatorname{prox}_{\lambda f}(x) - \operatorname{prox}_{\lambda f}(y)\|_2^2 \le \langle x - y, \, \operatorname{prox}_{\lambda f}(x) - \operatorname{prox}_{\lambda f}(y) \rangle,$$
so $\operatorname{prox}_{\lambda f}$ is firmly nonexpansive. ∎
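The firm nonexpansiveness of a proximal operator is also easy to observe numerically; the following check (my illustration, using soft-shrinkage as the prox of $\lambda\|\cdot\|_1$) tests the first of the two equivalent conditions on random pairs:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 0.7

def prox_l1(x):
    # prox of lam*||.||_1: componentwise soft-shrinkage
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# check ||Tx - Ty||^2 <= <Tx - Ty, x - y> for many random pairs
for _ in range(1000):
    x, y = rng.normal(size=5), rng.normal(size=5)
    d = prox_l1(x) - prox_l1(y)
    assert d @ d <= d @ (x - y) + 1e-12
```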
The Banach fixed point theorem guarantees that a contraction has a unique fixed point and that the Picard sequence
$$x^{(k+1)} = T x^{(k)}, \quad k = 0, 1, \dots,$$
converges to this fixed point for every initial element $x^{(0)}$. However, in many applications the contraction property is too restrictive in the sense that we often do not have a unique fixed point. Indeed, it is quite natural in many cases that the reached fixed point depends on the starting value $x^{(0)}$. Note that if $T$ is continuous and the sequence $(x^{(k)})_k$ converges, then it converges to a fixed point of $T$. In the following, we denote by $\operatorname{Fix}(T)$ the set of fixed points of $T$. Unfortunately, the Picard sequence need not converge for merely nonexpansive operators, as the following example shows.
In $\mathbb{R}^2$ we consider the reflection operator
$$R(x_1, x_2) := (x_1, -x_2).$$
Obviously, $R$ is nonexpansive and the Picard sequence only converges if $x_2^{(0)} = 0$. A possibility to obtain a 'better' operator is to average $R$, i.e., to build
$$T := (1 - \alpha) I + \alpha R, \quad \alpha \in (0, 1).$$
For $T(x_1, x_2) = (x_1, (1 - 2\alpha) x_2)$ we see that $R$ and $T$ have the same fixed points. Moreover, since $|1 - 2\alpha| < 1$, the Picard sequence $(x^{(k)})_k$ converges to the fixed point $(x_1^{(0)}, 0)$ of $R$ for every $x^{(0)} \in \mathbb{R}^2$.
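A short numerical illustration of this example (my own sketch, assuming the reflection $R(x_1, x_2) = (x_1, -x_2)$ about the first axis and $\alpha = \frac{1}{2}$):

```python
import numpy as np

R = lambda x: np.array([x[0], -x[1]])   # reflection about the x_1-axis
T = lambda x: 0.5 * (x + R(x))          # averaged operator, alpha = 1/2

x0 = np.array([1.0, 2.0])
assert np.allclose(R(R(x0)), x0)        # Picard iteration of R oscillates

x = x0.copy()
for _ in range(10):                     # Picard iteration of T
    x = T(x)
assert np.allclose(x, [1.0, 0.0])       # converges to a fixed point of R
```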
An operator $T : \mathbb{R}^N \to \mathbb{R}^N$ is called averaged if there exists a nonexpansive mapping $R$ and a constant $\alpha \in (0, 1)$ such that
$$T = (1 - \alpha) I + \alpha R. \qquad (12)$$
Following (12) we see that the operator $T$ constructed in the preceding example is averaged.
Historically, the concept of averaged mappings can be traced back to [106, 113, 149], where the name ’averaged’ was not used yet. Results on averaged operators can also be found, e.g., in [12, 36, 52].
Lemma 4.3 (Averaged, (Firmly) Nonexpansive and Contractive Operators).
i) Every averaged operator is nonexpansive.
ii) A contractive operator $T$ with Lipschitz constant $L < 1$ is averaged with respect to all parameters $\alpha \in \big[\frac{1+L}{2}, 1\big)$.
iii) An operator is firmly nonexpansive if and only if it is averaged with $\alpha = \frac{1}{2}$.
i) Let $T = (1 - \alpha) I + \alpha R$ be averaged. Then the first assertion follows by
$$\|Tx - Ty\| \le (1 - \alpha) \|x - y\| + \alpha \|Rx - Ry\| \le \|x - y\|.$$
ii) We define the operator $R := \frac{1}{\alpha} \big( T - (1 - \alpha) I \big)$. It holds for all $x, y \in \mathbb{R}^N$ that
$$\|Rx - Ry\| \le \frac{1}{\alpha} \big( \|Tx - Ty\| + (1 - \alpha) \|x - y\| \big) \le \frac{L + 1 - \alpha}{\alpha} \|x - y\|,$$
so $R$ is nonexpansive if $\alpha \ge \frac{1+L}{2}$.
iii) With $R := 2T - I$ we obtain the following equalities
$$\|Rx - Ry\|_2^2 = \|2(Tx - Ty) - (x - y)\|_2^2 = 4 \|Tx - Ty\|_2^2 - 4 \langle Tx - Ty, x - y \rangle + \|x - y\|_2^2,$$
and therefore after reordering
$$\|Rx - Ry\|_2^2 - \|x - y\|_2^2 = 4 \big( \|Tx - Ty\|_2^2 - \langle Tx - Ty, x - y \rangle \big),$$
so that $T$ is firmly nonexpansive if and only if $R$ is nonexpansive, i.e., $T = \frac{1}{2} I + \frac{1}{2} R$ is averaged with $\alpha = \frac{1}{2}$. This completes the proof. ∎
By the following lemma averaged operators are closed under composition.
Lemma 4.4 (Composition of Averaged Operators).
i) Suppose that $T$ is averaged with respect to $\alpha \in (0, 1)$. Then it is also averaged with respect to any other parameter $\tilde\alpha \in (\alpha, 1)$.
ii) Let $T_1, T_2 : \mathbb{R}^N \to \mathbb{R}^N$ be averaged operators. Then $T_2 \circ T_1$ is also averaged.
i) By assumption, $T = (1 - \alpha) I + \alpha R$ with $R$ nonexpansive. For $\tilde\alpha \in (\alpha, 1)$ we have
$$T = (1 - \tilde\alpha) I + \tilde\alpha \tilde R, \qquad \tilde R := \Big( 1 - \frac{\alpha}{\tilde\alpha} \Big) I + \frac{\alpha}{\tilde\alpha} R,$$
and for all $x, y \in \mathbb{R}^N$ it holds that
$$\|\tilde R x - \tilde R y\| \le \Big( 1 - \frac{\alpha}{\tilde\alpha} \Big) \|x - y\| + \frac{\alpha}{\tilde\alpha} \|Rx - Ry\| \le \|x - y\|.$$
So $\tilde R$ is nonexpansive.
ii) By assumption there exist nonexpansive operators $R_1, R_2$ and constants $\alpha_1, \alpha_2 \in (0, 1)$ such that $T_i = (1 - \alpha_i) I + \alpha_i R_i$, $i = 1, 2$. With $\alpha := \alpha_1 + \alpha_2 - \alpha_1 \alpha_2 \in (0, 1)$ a direct computation gives
$$T_2 \circ T_1 = (1 - \alpha) I + \alpha \Big( \frac{\alpha_1 (1 - \alpha_2)}{\alpha} R_1 + \frac{\alpha_2}{\alpha} R_2 T_1 \Big).$$
The concatenation $R_2 T_1$ of two nonexpansive operators is nonexpansive, and the convex combination of two nonexpansive operators is nonexpansive, so the operator in brackets is indeed nonexpansive. ∎
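A classical consequence is the method of alternating projections: orthogonal projections are firmly nonexpansive, hence averaged with $\alpha = \frac{1}{2}$, so their composition is averaged, and its Picard iteration converges to a common point when one exists. A small sketch (my illustration, using two lines through the origin in $\mathbb{R}^2$):

```python
import numpy as np

def proj_line(x, a):
    # orthogonal projection onto span{a}
    a = a / np.linalg.norm(a)
    return (x @ a) * a

a1, a2 = np.array([1.0, 0.0]), np.array([1.0, 1.0])
x = np.array([3.0, -2.0])
for _ in range(200):
    x = proj_line(proj_line(x, a1), a2)  # composition of two averaged operators
# the only common point of the two lines is the origin
assert np.linalg.norm(x) < 1e-6
```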
An operator $T : \mathbb{R}^N \to \mathbb{R}^N$ is called asymptotically regular if it holds for all $x \in \mathbb{R}^N$ that
$$T^{k+1} x - T^{k} x \to 0 \quad \text{as } k \to \infty.$$
Note that this property does not imply convergence of $(T^k x)_k$; even boundedness cannot be guaranteed. As an example, consider iterates behaving like the partial sums of the harmonic series: the consecutive differences $\frac{1}{k+1}$ tend to zero while the iterates diverge.
Theorem 4.5 (Asymptotic Regularity of Averaged Operators).
Let $T = (1 - \alpha) I + \alpha R$ be an averaged operator with respect to the nonexpansive mapping $R$ and the parameter $\alpha \in (0, 1)$. Assume that $\operatorname{Fix}(T) \neq \emptyset$. Then $T$ is asymptotically regular.
Let $\hat x \in \operatorname{Fix}(T)$ and $x^{(k)} := T^k x^{(0)}$ for some starting element $x^{(0)}$. Since $T$ is nonexpansive, i.e.,
$$\|x^{(k+1)} - \hat x\| = \|T x^{(k)} - T \hat x\| \le \|x^{(k)} - \hat x\|,$$
we obtain that the sequence $\big( \|x^{(k)} - \hat x\| \big)_k$ is monotonically decreasing and hence convergent. Using $\operatorname{Fix}(T) = \operatorname{Fix}(R)$ it follows