Problems involving reconstruction tasks for functions with discontinuities appear in various biological and medical applications. Examples are the steps in the rotation of the bacterial flagella motor [77, 76, 68], the cross-hybridization of DNA [75, 29, 43], x-ray tomography , electron tomography  and SPECT [50, 91]. An engineering example is crack detection in brittle material in mechanics . Further examples may for instance be found in the papers [56, 57, 24, 21, 33] and the references therein. In general, signals with discontinuities appear in many applied problems. A central task is to restore the jumps, edges, change points or segments of the signals or images from the observed data. These observed data are usually indirectly measured. Furthermore, they consist of measurements on a discretized grid and are typically corrupted by noise.
In many scenarios, nonconvex nonsmooth variational methods are a suitable choice for the partitioning task, i.e., the task of finding the jumps/edges/change points; see for example [56, 68, 12]. In particular, methods based on piecewise constant Mumford-Shah functionals [60, 61] have been used in various different applications. The piecewise-constant Mumford-Shah model also appears in statistics and image processing where it is often called Potts model [35, 12, 13, 14, 70, 89]; this is a tribute to Renfrey B. Potts and his work in statistical mechanics . The variational formulation of the piecewise-constant Mumford-Shah/Potts model (with an indirect measurement term) is given by
Here, is a linear operator modeling the measurement process, e.g., the Radon transform in computed tomography (CT), or the point-spread function of the microscope in microscopy. Further, is an element of the data space, e.g., a sinogram or part of it in CT, or the blurred microscopy image in microscopy. The mathematically precise definition of the jump term in the general situation is rather technical. However, if is piecewise-constant and the discontinuity set of is sufficiently regular, say, a union of curves, then is just the total arc length of this union. In general, the gradient is given in the distributional sense and the boundary length is expressed in terms of the -dimensional Hausdorff measure. When is not piecewise-constant, the jump penalty is infinite . The second term measures the fidelity of a solution to the data The parameter controls the balance between data fidelity and jump penalty.
The piecewise-constant Mumford-Shah/Potts model can be interpreted in two ways. On the one hand, if the imaged object is (approximately) piecewise-constant, then the solution is an (approximate) reconstruction of the imaged object. On the other hand, since a piecewise-constant solution directly induces a partitioning of the image domain, it can be seen as joint reconstruction and segmentation. Executing reconstruction and segmentation jointly typically leads to better results than performing the two steps successively [50, 72, 73, 83]. We note that in order to deal with the discrete data, the energy functional is typically discretized; see Section 2.1. Some references concerning Mumford-Shah functionals are [2, 17, 73, 44, 8, 32, 65] and also the references therein; see also the book . The piecewise constant Mumford-Shah functionals are among the maybe most well-known representatives of the class of free-discontinuity problems introduced by De Giorgi .
The analysis of the nonsmooth and nonconvex problem (1) is rather involved. We discuss some analytic aspects. We first note that without additional assumptions the existence of minimizers of (1) is not guaranteed in a continuous domain setting [32, 31, 73, 82]. To ensure the existence of minimizers, additional penalty terms such as an () term of the form [72, 73] or pointwise boundedness constraints  have been considered. We note that the existence of minimizers is guaranteed in the discrete domain setup for typical discretizations [32, 82]. Another important topic is to verify that the Potts model is a regularization method in the sense of inverse problems. The first work dealing with this task is : The authors assume that the solution space consists of non-degenerate piecewise-constant functions with at most (arbitrary, but fixed) different values which are additionally bounded. Under relatively mild assumptions on the operator , they show stability. Further, by giving a suitable parameter choice rule, they show that the method is a regularizer in the sense of inverse problems. Related references are [49, 44] with the latter including (non-piecewise-constant) Mumford-Shah functionals. We note that Mumford-Shah approaches (including the piecewise constant Mumford-Shah variant) also regularize the boundaries of the discontinuity set of the underlying signal .
Solving the Potts problem is algorithmically challenging. For it is NP-hard for multivariate domains [85, 12], and, for general linear operators it is even NP-hard for univariate signals . Thus, finding a global minimizer within reasonable time seems to be unrealistic in general. Nevertheless, due to its importance, many approximative strategies for multivariate Potts problems with have been proposed. (We note that the case is important as well since it captures the partitioning problem in image processing.) For the Potts problem with general there are still some but not that many existing approaches, in particular in the multivariate situation. For a more detailed discussion, we refer to the paragraph on algorithms for piecewise constant Mumford-Shah problems below. A further discussion of methods for reconstructing piecewise constant signals may be found in . In , we have considered the univariate Potts problem for a general operator and have proposed a majorization-minimization strategy which we called iterative Potts minimization in analogy to iterative thresholding schemes. In this work, we will develop iterative Potts minimization schemes for the more demanding multivariate situation which is important for multivariate applications as appearing in imaging problems.
Existing algorithmic approaches to the piecewise constant Mumford-Shah problem and related problems.
We start to consider the Potts problem for general operator In , Bar et al. consider an Ambrosio-Tortorelli-type approximation. Kim et al. use a level-set based active contour method for deconvolution in . Ramlau and Ring  employ a related level-set approach for the joint reconstruction and segmentation of x-ray tomographic images; further applications are electron tomography  and SPECT . The authors of the present paper have proposed a strategy based on the alternating methods of multipliers in  for the univariate case and in  for the multivariate case.
Fornasier and Ward  rewrite Mumford-Shah problems as a pointwise penalized problem and derive generalized iterative thresholding algorithms for the rewritten problems in the univariate situation. Further, they show that their method converges to a local minimizer in the univariate case. Their approach principally carries over to the piecewise constant Mumford-Shah functional as explained in [82, 88] and then results in a sparsity problem. In the univariate situation, this NP-hard optimization problem is unconstrained and may be addressed by iterative hard thresholding algorithms for penalizations, analyzed by Blumensath and Davies in [9, 10]. (Note that related algorithms based on iterative soft thresholding for penalized problems have been considered by Daubechies, Defrise, and De Mol in .) Artina et al.  in particular consider the multivariate discrete Mumford-Shah model using the pointwise penalization approach of . In the multivariate setting, this results in a corresponding nonconvex and nonsmooth problem with linear constraints. The authors successively minimize local quadratic and strictly convex perturbations (depending on the previous iterate) of a (fixed) smoothed version of the objective by augmented Lagrangian iterations which themselves can be accomplished by iterative thresholding via a Lipschitz continuous thresholding function. They show that the accumulation points of the sequences produced by their algorithm are constraint critical points of the smoothed problem. In the multivariate situation, a similar approach for rewriting the Potts problem results in an sparsity problem with additional equality constraints. Algorithmic approaches for such sparsity problem with equality constraints are the penalty decomposition methods of [58, 59, 94]. The connection with iterative hard thresholding is that the inner loop of the employed two stage process usually is of iterative hard thresholding type. The difference of the hard thresholding based methods to our approach in this paper is that we do not have to deal with constraints and the full matrix but with the nonseparable regularizing term instead of its separable counterpart Hence we cannot use hard thresholding.
Another frequently appearing method in the context of restoration of piecewise constant images is total variation minimization . There the jump penalty is replaced by the total variation The arising minimization problem is convex and therefore numerically tractable with convex optimization techniques [20, 25]. Candès, Wakin, and Boyd  use iteratively reweighted total variation minimization for piecewise constant recovery problems. Results of compressed sensing type related to the Potts problem have been derived by Needell and Ward [63, 64]: under certain conditions, minimizers of the Potts functional agree with total variation minimizers. However, in the presence of noise, total variation minimizers might significantly differ from minimizers of the Potts problem. But the minimizers of the Potts problem are the results frequently desired in practice. Further, algorithms based on convex relaxations of the Potts problem (1) have gained a lot of interest in recent years; see, e.g., [54, 4, 19, 15, 36, 84].
We next discuss approaches for the multivariate Potts problem for the situation which is particularly interessting in image processing and for which there are some further approaches. The first class of approaches is the approach via graph cuts. Here, the range space of is a priori restricted to a relatively small number of values. The problem remains NP-hard, but it then allows for an approach by sequentially solving binary partitioning problems via minimal graph cut algorithms [12, 51, 11]. Another approach is to limit the number of different values which may take without discretizing the range space a priori. For active contours were used by Chan and Vese  to minimize the corresponding binary Potts model. They use a level set function to represent the partitions which evolves according to the Euler-Lagrange equations of the Potts model. A globally convergent strategy for the binary segmentation problem is presented in . The active contour method for was extended to larger in . Note that, for the problem is NP hard. We refer to  for an overview on level set segmentation. In [39, 40, 41], Hirschmüller proposes a non-iterative strategy for the Potts problem which is based on cost aggregation. It has lower computational cost, but comes with lower quality reconstructions compared with graph cuts. Due to the small number of potential values of these methods mainly appear in connection with image segmentation. Methods for restoring piecewise constant images without restricting the range space are proposed in Nikolova et al. [67, 66]. They use non-convex regularizers which are algorithmically approached using a graduated non-convexity approach. We note that the Potts problem (1) does not fall into the class of problems considered in [67, 66]. Last but not least, Xu et al.  proposed a piecewise constant model reminiscent of the Potts model that is approached by a half-quadratic splitting using a pixelwise iterative thresholding type technique. It was later extended to a method for blind image deconvolution .
The contributions of this paper are threefold: (i) We propose a new iterative minimization strategy for multivariate piecewise constant Mumford-Shah/Potts functionals as well as a (still NP-hard) quadratic penalty relaxation. (ii) We provide a convergence analysis of the proposed schemes. (iii) We show the applicability of our schemes in several experiments.
Concerning (i), we propose two schemes which are based on majorization-minimization or forward-backward splitting methods of Douglas-Rachford type . The one scheme addresses the Potts problem directly, whereas the other scheme treats a quadratic penalty relaxation. The solutions of the relaxed problem themselves are not feasible for the Potts problem but near to a feasible solution of the Potts problem where nearness can be quantified. In particular, when a given tolerance in applications is acceptable the relaxed scheme is applicable. In contrast to the approaches in [32, 9] and [58, 59] for sparsity problems which lead to thresholding algorithms, our approach leads to non-separable yet computationally tractable problems in the backward step.
Concerning (ii), we first analyze the proposed quadratic penalty relaxation scheme. In particular, we show convergence towards a local minimizer. Due to the NP hardness of the quadratic penalty relaxation, the convergence result is in the range of what can be expected best. Concerning the scheme for the non-relaxed Potts problem we also perform a convergence analysis. In particular, we obtain results on the convergence towards local minimizers on subsequences. The quality of the convergence results is comparable with the ones in [58, 59]. We note that compared with [58, 59] we face the additional challenge to deal with the non-separability of the backward step. (We note that in practice we observe convergence of the whole sequence, not on a subsequence.)
we consider problems with full and partial data. We begin to apply our algorithms to deconvolution problems. In particular, we consider deblurring and denoising Gaussian blur images and motion blur images, respectively. We further consider noisy and undersampled Radon data, together with the task of joint reconstruction, denoising and segmentation. Finally, we use our method in the situation of pure image partitioning (without blur) which is a widely considered problem in computer vision.
Organization of the paper.
2 Majorization-minimization algorithms for multivariate Potts problems
We use the following finite difference type discretization of the multivariate Potts problem (1) given by
where the come from a finite set of directions and the symbol denotes the directional difference with respect to the direction at the pixel . The symbol denotes the number of nonzero entries of
The simplest set of directions consists of the unit vectorsalong with unit weights. Unfortunately, when refining the grid, this discretization converges to a limit that measures the boundary in terms of the analogue of the Hausdorff measure . The practical consequences are unwanted block artifacts in the reconstruction (geometric staircasing). More isotropic results are obtained by adding the diagonals to the directions and a near isotropic discretization can be achieved by extending this system by the knight moves (The name is inspired by the possible moves of a knight in chess.) Weights for the system of coordinate directions and diagonal directions can be chosen as for the coordinate part and for diagonal part . When additionally adding knight-move directions, weights for the system can be chosen as for the coordinate part for diagonal part , and for diagonal part General schemes for weight design are proposed in  and in . We note that for the system of coordinate directions and diagonal directions the weights of  and in  coincide; the weights displayed for the knight-move case above are the ones derived by the scheme in . For further details we refer to these references.
We record that the considered problem (2) has a minimizer.
The discrete multivariate Potts problem (2) has a minimizer.
2.2 Derivation of the proposed algorithmic schemes
We start out with the discretization (2) of the multivariate Potts problem. We introduce versions of the target and link them via equality constraints in the following consensus form to obtain the problem
where the functional of the variables is given by
Note that solving (3) is equivalent to solving the discrete Potts problem (2). Further, note that we have overloaded the symbol which, for one argument denotes the discrete Potts functional of (2) and for arguments denotes the functional of (4); we have the relation .
A majorization-minimization approach to the quadratic penalty relaxation of the Potts problem.
The quadratic penalty relaxation of (4) is given by
Here, the soft constraints which replace the equalities are realized via the squared Euclidean norms where the nonnegative numbers denote weights (which may be set to zero if no direct coupling between the particular is desired.) The symbol denotes a positive penalty parameter promoting the soft constraint, i.e., increasing enforces the to be closer to each other w.r.t. the Euclidean distance. We note that we later analytically quantify the size of which is necessary to obtain an a priori prescribed tolerance in the see (18) below. Frequently, we use the short-hand notation
Typical choices of the are
i.e., the constant choice (), as well as the coupling between consecutive variables with constant parameter ( if and only if and otherwise.) We note that in these situations only one additional positive parameter appears, and that this parameter is tied to the tolerance one is willing to accept as a distance of the see Algorithm 1.
denotes the identity matrix and
the zero matrix; The matrixhas block columns and block rows. Further, we introduce the difference operator given by
which applies the difference w.r.t. the th direction to the th component of We employ the weights to define the quantity which counts the weighted number of jumps by
With all this comprehensive notation at hand, we may rewrite the functional of (5) as
Here denotes a constant which is chosen larger than the spectral norm of (i.e., the operator norm w.r.t. the norm.) This scaling is made to ensure that is contractive. In terms of and the penalties we require that
For the particular choice as on the left-hand side of (7) we can choose smaller, i.e., For only coupling neighboring with the same constant , i.e., the right-hand coupling of (7), we have where if is even, and if
is odd. These choices ensure thatis contractive by Lemma 8. Basics on surrogate functionals as we need them for this paper are gathered in Section 3.4. Further details on surrogate functionals can be found in [27, 9, 10].
Using elementary properties of the inner product shows that
where is a rest term which is irrelevant when minimizing w.r.t. for fixed Writing this down in terms of the original system matrix and the data yields
where is given by
for and for indices with The following algorithm computes a result fulfilling (18).
We consider the quadratic penalty relaxed Potts problem (5) and tolerance for the targets we are willing to accept. We propose the following algorithm for the relaxed Potts problem (5) (which yields a result with targets deviating from each other by at most ).
Initialize as discussed in the corresponding paragraph below, (e.g., for all )
Iterate until convergence:
1. 2. (19)
The relation between the Potts problem and its quadratic penalty relaxation and obtaining a feasible solution for the Potts problem (4) from the output of Algorithm 1.
As pointed out above, we show in Theorem 4 that Algorithm 1 produces a local minimizer of the quadratic penalty relaxation (5) of the Potts problem (4) and that the corresponding variables of a resulting solution are close up to an a priori prescribed tolerance. This may in practice be already enough. However, strictly speaking a local minimizer of the quadratic penalty relaxation (5) is not feasible for the Potts problem (4).
We will now explain a projection procedure to derive a feasible solution for the Potts problem (4) from a local minimizer of (5) with nearby variables (as produced by Algorithm 1.) Related theoretical results are stated as Theorem 5. In particular, we will see that in case the image operator is lower bounded, the projection procedure applied to the output of Algorithm 1 yields a feasible point which is close to a local minimizer of the original Potts problem (4).
In order to explain the averaging procedure, we need some notions on partitionings. Recall that a partitioning consists of a (finite number of) segments which are pairwise disjoint sets of pixel coordinates whose union equals the image domain i.e.,
Here, we assume that each segment is connected w.r.t. the neighborhood system in the sense that there is a path connecting any two elements in with steps in
We will need the following proposed notion of a directional partitioning. A directional partition w.r.t. a set of directions consists of a set of (discrete) intervals , where each interval is associated with exactly one of the directions here, an interval associated with the direction has to be of the form where and belongs to the discrete domain. (For each direction , the corresponding intervals form an ordinary partition.) We note that Algorithm 1 which produces output induces a directional partitioning as follows. We observe that each variable is associated with a direction For any we let each (maximal) interval of constance of be an interval in associated with
Each partitioning induces a directional partitioning by letting the intervals of be the stripes with direction obtained from segment for each direction and each segment Furthermore, each directional partitioning induces a partitioning by the following merging process.
We say that pixels are related, in symbols, , if there is a path connecting in the sense that for any consecutive members of the path there is an interval of the directional partitioning containing both
The relation obviously defines an equivalence relation and the corresponding equivalence classes yield a partitioning on We use the symbols
to denote the mappings assigning a partitioning a directional partitioning and vice versa, respectively.
As a final preparation we consider a function as produced by Algorithm 1 and a partitioning of and define the following projection to a function by
where the symbol denotes the number of elements in the segment Hence the projection defined via (22) averages w.r.t. all components of and all members of the segment and so produces a piecewise constant function w.r.t. the partitioning
Using these notions we propose the following projection procedure.
Procedure 1 (Projection Procedure).
We consider output of Algorithm 1 together with its induced directional partitioning
We notice that when having a partitioning solving the normal equation in the space of functions constant on would be an alternative to the above second step which, however, might be more expensive.
A penalty method for the Potts problem based on a majorization-minimization approach for its quadratic penalty relaxation.
Intuitively, increasing the parameters during the iterations should tie the closer together such that the constraint of (3) should be ultimately fulfilled which results in an approach for the initial Potts problem (2). Recall that was defined by (6), where the are nonnegative numbers weighting the constraints. We here increase while leaving the fixed during this process.
Let be a strictly increasing sequence (e.g., with ) and be a strictly decreasing sequence converging to zero (e.g., ) Further, let
Initialize as discussed in the corresponding paragraph below, (e.g., for all )
1. 2. (26)
This approach is inspired by  which considers quadratic penalty methods in the sparsity context. There, the authors are searching for a solution with only a few nonzero entries. The corresponding prior is separable. In contrast to this work, the present work considers a non-separable prior.
Although the initialization of Algorithm 1 and of Algorithm 2 is not relevant for its convergence properties (cf. Section 3), the choice of the initialization influences the final result. (Please note that this also might happen for convex but not strictly convex problems.) We discuss different initialization strategies. The simplest choice is the all-zero initialization Likewise, one can select the right hand side of the normal equations of the underlying least squares problem, that is . A third reasonable choice is the solution of the normal equation itself or an approximation of it. Using an approximation might in particular be reasonable to get a regularized approximation of the normal equation. A possible strategy to obtain such a regularized initialization is to apply a fixed number of Landweber iterations  or of the conjugate gradient method to the underlying least square problem. (In our experiments, we initialized Algorithm 1 with the result of 1000 Landweber iterations and Algorithm 2 with .)
2.3 A non-iterative algorithm for minimizing the Potts subproblem (16)
Both proposed algorithms require solving the Potts subproblem (16) in the backward step, see (19),(26). We first observe that (16) can be solved for each of the separately. The corresponding minimization problems are of the prototypical form
with given data , the jump penalty and the direction . As a next step, we see that (28) decomposes into univariate Potts problems for data along the paths in induced by , e.g., for those paths correspond to the rows of and we obtain a minimizer of (28) by determining each of its rows individually. The univariate Potts problem amounts to minimizing
where the data is given by the restriction of to the pixels in of the form i.e., Here the offset is fixed when solving each univariate problem, but varied afterwards to get all lines in the image with direction The target to optimize is denoted by and, in the resulting univariate situation, denotes the number of jumps of .
It is well-known that the univariate direct problem (29) has a unique minimizer. Further these particular problems can be solved exactly by dynamic programming [34, 90, 17, 60, 61] which we briefly describe in the following. For further details we refer to [34, 80]. Assume we have computed minimizers of (29) for partial data for each , . Then the minimum value of (29) for can be found by
where we let be the empty vector, and be the quadratic deviation of from its mean. By denoting the minimizing argument in (30) by the minimizer is given by
where is the mean value of . Thus, we obtain a minimizer for full data by successively computing for each
. By precomputing the first and second moments of dataand storing only jump locations the described method can be implemented in , . Another way to achieve
is based on the QR decomposition of the design matrix by means of Givens rotations, see. Furthermore, the search space can be pruned to speed up computations [46, 81].
3.1 Analytic results
In the course of the derivation of the proposed algorithms above, we consider the quadratic penalty relaxation (5) of the multivariate Potts problem. Although it is more straight-forward to access algorithmically via our approach, we first note that this problem is still NP hard (as is the original problem).
Finding a (global) minimizer of the quadratic penalty relaxation (5) of the multivariate Potts problem is an NP hard problem.
The proof is given in Section 3.3 below. In Section 2.2 we have proposed Algorithm 1 to approach the quadratic penalty relaxation of the multivariate Potts problem. We show that the proposed algorithm converges to a local minimizer and that a feasible point of the original multivariate Potts problem is nearby.
We have the following relation between local minimizers , global minimizers and the fixed points of the iteration of Algorithm 1,
Assume a tolerance we are willing to accept for the distance between the i.e.,
Running Algorithm 1 with the choice of the parameter by
(where is the smallest non-zero eigenvalue of with given by (49); for the particular choice of the coupling given by (7), and respectively) yields a local minimizer of the quadratic penalty relaxation (5) such that the are close up to i.e., (33) is fulfilled.
The proof is given in Section 3.5 below. A solution of Algorithm 1 is not a feasible point for the initial Potts problem (3). However, we see below that it produces a -approximative solution in the sense that there is and a partitioning such that
where is given by (53) below. In this context note that the conditions for a local minimizer are given by and the Lagrange multiplier condition So (35) intuitively means that both the constraint and the Lagrange multiplier condition are approximately fulfilled for the partitioning induced by .
The projection procedure (Procedure 1) proposed in Section 2.2 applied to the solution of Algorithm 1 produces a feasible image (together with a valid partitioning) for the Potts problem (3) which is close to in the sense that
where quantifies the deviation between the Here where the symbol denotes the number of elements in If the imaging operator is lower bounded, i.e., there is a constant such that , a local minimizer of the Potts problem (3) is nearby, i.e.,
The proof of Theorem 5 can be found at the end of Section 3.4, where most relevant statements are already shown in Section 3.3. Theorem 5 theoretically underpins the fact that, on the application side, we may use Algorithm 1 for the Potts problem (3) (accepting some arbitrary small tolerance we may fix in advance).