# Iterative Potts minimization for the recovery of signals with discontinuities from indirect measurements -- the multivariate case

Signals and images with discontinuities appear in many problems in such diverse areas as biology, medicine, mechanics, and electrical engineering. The concrete data are often discrete, indirect and noisy measurements of some quantities describing the signal under consideration. A frequent task is to find the segments of the signal or image, which corresponds to finding the discontinuities or jumps in the data. Methods based on minimizing the piecewise constant Mumford-Shah functional -- whose discretized version is known as the Potts functional -- are advantageous in this scenario, in particular in connection with segmentation. However, due to their non-convexity, minimization of such functionals is challenging. In this paper we propose a new iterative minimization strategy for the multivariate Potts functional dealing with indirect, noisy measurements. We provide a convergence analysis and underpin our findings with numerical experiments.


## 1 Introduction

Problems involving reconstruction tasks for functions with discontinuities appear in various biological and medical applications. Examples are the steps in the rotation of the bacterial flagellar motor [77, 76, 68], the cross-hybridization of DNA [75, 29, 43], x-ray tomography [72], electron tomography [48] and SPECT [50, 91]. An engineering example is crack detection in brittle material in mechanics [3]. Further examples may for instance be found in the papers [56, 57, 24, 21, 33] and the references therein. In general, signals with discontinuities appear in many applied problems. A central task is to restore the jumps, edges, change points or segments of the signals or images from the observed data. These observed data are usually indirectly measured. Furthermore, they consist of measurements on a discretized grid and are typically corrupted by noise.

In many scenarios, nonconvex nonsmooth variational methods are a suitable choice for the partitioning task, i.e., the task of finding the jumps/edges/change points; see for example [56, 68, 12]. In particular, methods based on piecewise constant Mumford-Shah functionals [60, 61] have been used in various different applications. The piecewise-constant Mumford-Shah model also appears in statistics and image processing where it is often called Potts model [35, 12, 13, 14, 70, 89]; this is a tribute to Renfrey B. Potts and his work in statistical mechanics [71]. The variational formulation of the piecewise-constant Mumford-Shah/Potts model (with an indirect measurement term) is given by

 argminu γ∥∇u∥0+∥Au−f∥22. (1)

Here, A is a linear operator modeling the measurement process, e.g., the Radon transform in computed tomography (CT), or the point-spread function of the microscope in microscopy. Further, f is an element of the data space, e.g., a sinogram or part of it in CT, or the blurred microscopy image in microscopy. The mathematically precise definition of the jump term ∥∇u∥0 in the general situation is rather technical. However, if u is piecewise-constant and the discontinuity set of u is sufficiently regular, say, a union of curves, then ∥∇u∥0 is just the total arc length of this union. In general, the gradient is given in the distributional sense and the boundary length is expressed in terms of the Hausdorff measure of the discontinuity set. When u is not piecewise-constant, the jump penalty is infinite [73]. The second term ∥Au−f∥22 measures the fidelity of a solution u to the data f. The parameter γ>0 controls the balance between data fidelity and jump penalty.
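To make the objective concrete, the discrete counterpart of (1) can be evaluated in a few lines. The following sketch is illustrative only: the matrix A, data f, and the jump count along the two coordinate directions are our own toy choices, not the discretization introduced later in Section 2.1.

```python
import numpy as np

def potts_energy(u, A, f, gamma):
    """Evaluate a simple discretization of the Potts objective
    gamma * ||grad u||_0 + ||A u - f||_2^2 for a 2D image u,
    counting jumps along the two coordinate directions only."""
    jumps = np.count_nonzero(np.diff(u, axis=0)) \
          + np.count_nonzero(np.diff(u, axis=1))
    residual = A @ u.ravel() - f
    return gamma * jumps + residual @ residual

# Toy example: identity measurement operator, piecewise constant image
# with one vertical boundary (one jump in each of the two rows).
u = np.array([[1.0, 1.0, 2.0],
              [1.0, 1.0, 2.0]])
A = np.eye(6)
f = u.ravel()
print(potts_energy(u, A, f, gamma=0.5))  # prints 1.0: exact data fit, 2 jumps * 0.5
```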

The piecewise-constant Mumford-Shah/Potts model can be interpreted in two ways. On the one hand, if the imaged object is (approximately) piecewise-constant, then the solution is an (approximate) reconstruction of the imaged object. On the other hand, since a piecewise-constant solution directly induces a partitioning of the image domain, it can be seen as joint reconstruction and segmentation. Executing reconstruction and segmentation jointly typically leads to better results than performing the two steps successively [50, 72, 73, 83]. We note that in order to deal with the discrete data, the energy functional is typically discretized; see Section 2.1. Some references concerning Mumford-Shah functionals are [2, 17, 73, 44, 8, 32, 65] and also the references therein; see also the book [1]. The piecewise constant Mumford-Shah functionals are among the most well-known representatives of the class of free-discontinuity problems introduced by De Giorgi [28].

The analysis of the nonsmooth and nonconvex problem (1) is rather involved. We discuss some analytic aspects. We first note that, without additional assumptions, the existence of minimizers of (1) is not guaranteed in a continuous domain setting [32, 31, 73, 82]. To ensure the existence of minimizers, additional penalty terms [72, 73] or pointwise boundedness constraints [44] have been considered. We note that the existence of minimizers is guaranteed in the discrete domain setup for typical discretizations [32, 82]. Another important topic is to verify that the Potts model is a regularization method in the sense of inverse problems. The first work dealing with this task is [73]: the authors assume that the solution space consists of non-degenerate piecewise-constant functions with an arbitrary, but fixed, maximal number of different values which are additionally bounded. Under relatively mild assumptions on the operator A, they show stability. Further, by giving a suitable parameter choice rule, they show that the method is a regularizer in the sense of inverse problems. Related references are [49, 44], with the latter including (non-piecewise-constant) Mumford-Shah functionals. We note that Mumford-Shah approaches (including the piecewise constant variant) also regularize the boundaries of the discontinuity set of the underlying signal [44].

Solving the Potts problem is algorithmically challenging: it is NP-hard for multivariate domains already for the identity operator A = id [85, 12], and, for general linear operators A, it is even NP-hard for univariate signals [82]. Thus, finding a global minimizer within reasonable time seems to be unrealistic in general. Nevertheless, due to its importance, many approximative strategies for multivariate Potts problems with A = id have been proposed. (We note that the case A = id is important as well since it captures the partitioning problem in image processing.) For the Potts problem with general A there are some, but not that many, existing approaches, in particular in the multivariate situation. For a more detailed discussion, we refer to the paragraph on algorithms for piecewise constant Mumford-Shah problems below. A further discussion of methods for reconstructing piecewise constant signals may be found in [57]. In [88], we have considered the univariate Potts problem for a general operator A and have proposed a majorization-minimization strategy which we called iterative Potts minimization in analogy to iterative thresholding schemes. In this work, we develop iterative Potts minimization schemes for the more demanding multivariate situation, which is important for multivariate applications as appearing in imaging problems.

#### Existing algorithmic approaches to the piecewise constant Mumford-Shah problem and related problems.

We start by considering the Potts problem for a general operator A. In [5], Bar et al. consider an Ambrosio-Tortorelli-type approximation. Kim et al. use a level-set based active contour method for deconvolution in [47]. Ramlau and Ring [72] employ a related level-set approach for the joint reconstruction and segmentation of x-ray tomographic images; further applications are electron tomography [48] and SPECT [50]. The authors of the present paper have proposed a strategy based on the alternating direction method of multipliers in [82] for the univariate case and in [83] for the multivariate case.

Fornasier and Ward [32] rewrite Mumford-Shah problems as pointwise penalized problems and derive generalized iterative thresholding algorithms for the rewritten problems in the univariate situation. Further, they show that their method converges to a local minimizer in the univariate case. Their approach principally carries over to the piecewise constant Mumford-Shah functional as explained in [82, 88] and then results in a sparsity problem. In the univariate situation, this NP-hard optimization problem is unconstrained and may be addressed by iterative hard thresholding algorithms for ℓ0 penalizations, analyzed by Blumensath and Davies in [9, 10]. (Note that related algorithms based on iterative soft thresholding for ℓ1 penalized problems have been considered by Daubechies, Defrise, and De Mol in [27].) Artina et al. [3] in particular consider the multivariate discrete Mumford-Shah model using the pointwise penalization approach of [32]. In the multivariate setting, this results in a corresponding nonconvex and nonsmooth problem with linear constraints. The authors successively minimize local quadratic and strictly convex perturbations (depending on the previous iterate) of a (fixed) smoothed version of the objective by augmented Lagrangian iterations, which themselves can be accomplished by iterative thresholding via a Lipschitz continuous thresholding function. They show that the accumulation points of the sequences produced by their algorithm are constrained critical points of the smoothed problem. In the multivariate situation, a similar approach for rewriting the Potts problem results in an ℓ0 sparsity problem with additional equality constraints. Algorithmic approaches for such ℓ0 sparsity problems with equality constraints are the penalty decomposition methods of [58, 59, 94]. The connection with iterative hard thresholding is that the inner loop of the employed two-stage process usually is of iterative hard thresholding type.
The difference between the hard thresholding based methods and our approach in this paper is that we do not have to deal with constraints and the full matrix, but with the nonseparable regularizing term ∥∇u∥0 instead of its separable counterpart ∥u∥0. Hence we cannot use hard thresholding.

Another frequently appearing method in the context of the restoration of piecewise constant images is total variation minimization [74]. There, the jump penalty ∥∇u∥0 is replaced by the total variation ∥∇u∥1. The arising minimization problem is convex and therefore numerically tractable with convex optimization techniques [20, 25]. Candès, Wakin, and Boyd [16] use iteratively reweighted total variation minimization for piecewise constant recovery problems. Results of compressed sensing type related to the Potts problem have been derived by Needell and Ward [63, 64]: under certain conditions, minimizers of the Potts functional agree with total variation minimizers. However, in the presence of noise, total variation minimizers might differ significantly from minimizers of the Potts problem, and the minimizers of the Potts problem are the results frequently desired in practice. Further, algorithms based on convex relaxations of the Potts problem (1) have gained a lot of interest in recent years; see, e.g., [54, 4, 19, 15, 36, 84].

We next discuss approaches for the multivariate Potts problem in the situation A = id, which is particularly interesting in image processing and for which there are some further approaches. The first class of approaches proceeds via graph cuts. Here, the range space of u is a priori restricted to a relatively small number of values. The problem remains NP-hard, but it then allows for an approach by sequentially solving binary partitioning problems via minimal graph cut algorithms [12, 51, 11]. Another approach is to limit the number of different values which u may take without discretizing the range space a priori. For two segments, active contours were used by Chan and Vese [23] to minimize the corresponding binary Potts model. They use a level set function to represent the partitions, which evolves according to the Euler-Lagrange equations of the Potts model. A globally convergent strategy for the binary segmentation problem is presented in [22]. The active contour method was extended to a larger number of segments in [86]. Note that, for more than two segments, the problem is NP-hard. We refer to [26] for an overview of level set segmentation. In [39, 40, 41], Hirschmüller proposes a non-iterative strategy for the Potts problem which is based on cost aggregation. It has lower computational cost, but comes with lower quality reconstructions compared with graph cuts. Due to the small number of potential values of u, these methods mainly appear in connection with image segmentation. Methods for restoring piecewise constant images without restricting the range space are proposed by Nikolova et al. [67, 66]. They use non-convex regularizers which are algorithmically approached using a graduated non-convexity approach. We note that the Potts problem (1) does not fall into the class of problems considered in [67, 66]. Last but not least, Xu et al. [92] proposed a piecewise constant model reminiscent of the Potts model that is approached by a half-quadratic splitting using a pixelwise iterative thresholding type technique.
It was later extended to a method for blind image deconvolution [93].

#### Contributions.

The contributions of this paper are threefold: (i) We propose a new iterative minimization strategy for multivariate piecewise constant Mumford-Shah/Potts functionals as well as a (still NP-hard) quadratic penalty relaxation. (ii) We provide a convergence analysis of the proposed schemes. (iii) We show the applicability of our schemes in several experiments.

Concerning (i), we propose two schemes which are based on majorization-minimization or forward-backward splitting methods [55]. One scheme addresses the Potts problem directly, whereas the other treats a quadratic penalty relaxation. The solutions of the relaxed problem are themselves not feasible for the Potts problem, but they are near a feasible solution of the Potts problem, where nearness can be quantified. In particular, whenever a given tolerance is acceptable in an application, the relaxed scheme is applicable. In contrast to the approaches in [32, 9] and [58, 59] for sparsity problems, which lead to thresholding algorithms, our approach leads to non-separable yet computationally tractable problems in the backward step.

Concerning (ii), we first analyze the proposed quadratic penalty relaxation scheme. In particular, we show convergence towards a local minimizer. In view of the NP-hardness of the quadratic penalty relaxation, this convergence result is the best that can reasonably be expected. For the scheme addressing the non-relaxed Potts problem we also perform a convergence analysis; in particular, we obtain results on the convergence towards local minimizers on subsequences. The quality of the convergence results is comparable with the ones in [58, 59]. We note that, compared with [58, 59], we face the additional challenge of dealing with the non-separability of the backward step. (In practice, we observe convergence of the whole sequence, not only on a subsequence.)

Concerning (iii), we consider problems with full and partial data. We begin by applying our algorithms to deconvolution problems; in particular, we consider the deblurring and denoising of images corrupted by Gaussian blur and by motion blur, respectively. We further consider noisy and undersampled Radon data, together with the task of joint reconstruction, denoising and segmentation. Finally, we use our method for pure image partitioning (without blur), which is a widely considered problem in computer vision.

#### Organization of the paper.

In Section 2 we derive the proposed algorithmic schemes. In Section 3 we provide a convergence analysis for the proposed schemes. In Section 4 we apply the algorithms derived in the present paper to concrete reconstruction problems. In Section 5 we draw conclusions.

## 2 Majorization-minimization algorithms for multivariate Potts problems

### 2.1 Discretization

We use the following finite difference type discretization of the multivariate Potts problem (1) given by

 Pγ(u)=∥Au−f∥22+γS∑s=1ωs∥∇asu∥0, (2)

where the directions as come from a finite set, the symbol ∇asu denotes the directional difference of u with respect to the direction as at the pixel under consideration, and the ωs are nonnegative weights. The symbol ∥⋅∥0 denotes the number of nonzero entries of its argument.

The simplest set of directions consists of the unit vectors e1 and e2, along with unit weights. Unfortunately, when refining the grid, this discretization converges to a limit that measures the boundary in terms of the ℓ1-analogue of the Hausdorff measure [17]. The practical consequences are unwanted block artifacts in the reconstruction (geometric staircasing). More isotropic results are obtained by adding the diagonal directions, and a near isotropic discretization can be achieved by extending this system by the knight-move directions. (The name is inspired by the possible moves of a knight in chess.) Suitable weights for the coordinate and diagonal parts, and additionally for the knight-move part, have to be chosen accordingly. General schemes for weight design are proposed in [18] and in [83]. We note that for the system of coordinate and diagonal directions the weights of [18] and of [83] coincide; for the knight-move case we use the weights derived by the scheme in [83]. For further details we refer to these references.
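As an illustration, the directional differences for such direction systems can be computed by array shifts. The weights below are placeholder unit weights for demonstration only, not the designed weights of [18, 83], and the helper names are ours.

```python
import numpy as np

# Direction systems for the discrete jump penalty (2); the knight-move
# system yields a near-isotropic approximation of boundary length.
AXIAL    = [(1, 0), (0, 1)]
DIAGONAL = [(1, 1), (1, -1)]
KNIGHT   = [(2, 1), (2, -1), (1, 2), (1, -2)]

def directional_jumps(u, a):
    """Count the pixels x where u(x + a) differs from u(x)."""
    da, db = a
    rows, cols = u.shape
    core = u[max(da, 0):rows + min(da, 0), max(db, 0):cols + min(db, 0)]
    shft = u[max(-da, 0):rows + min(-da, 0), max(-db, 0):cols + min(-db, 0)]
    return int(np.count_nonzero(core - shft))

def weighted_jump_penalty(u, directions, weights):
    """Weighted jump count, as in the sum of directional terms in (2)."""
    return sum(w * directional_jumps(u, a) for a, w in zip(directions, weights))

u = np.array([[1.0, 1.0],
              [1.0, 2.0]])
print(weighted_jump_penalty(u, AXIAL, [1, 1]))     # prints 2: one jump per axis
print(weighted_jump_penalty(u, DIAGONAL, [1, 1]))  # prints 1: only the main diagonal jumps
```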

We record that the considered problem (2) has a minimizer.

###### Theorem 1.

The discrete multivariate Potts problem (2) has a minimizer.

The validity of Theorem 1 can be seen by following the lines of the proof of [42, Theorem 2.1] where an analogous statement is shown for the (non-piecewise constant) Mumford-Shah problem.

### 2.2 Derivation of the proposed algorithmic schemes

We start out with the discretization (2) of the multivariate Potts problem. We introduce S copies u1,…,uS of the target u and link them via equality constraints in the following consensus form to obtain the problem

 Pγ(u1,…,uS)→min, s.t. u1=…=uS, (3)

where the functional Pγ of the variables u1,…,uS is given by

 Pγ(u1,…,uS)=S∑s=11S∥Aus−f∥22+γS∑s=1ωs∥∇asus∥0. (4)

Note that solving (3) is equivalent to solving the discrete Potts problem (2). Further, note that we have overloaded the symbol Pγ which, for one argument, denotes the discrete Potts functional of (2) and, for S arguments, denotes the functional of (4); we have the relation Pγ(u,…,u)=Pγ(u).

#### A majorization-minimization approach to the quadratic penalty relaxation of the Potts problem.

The quadratic penalty relaxation of (4) is given by

 Pγ,ρ(u1,…,uS)=S∑s=11S∥Aus−f∥22+γS∑s=1ωs∥∇asus∥0+ρ∑1≤s<s′≤Scs,s′∥us−us′∥22. (5)

Here, the soft constraints which replace the equalities are realized via the squared Euclidean norms ∥us−us′∥22, where the nonnegative numbers cs,s′ denote weights (which may be set to zero if no direct coupling between the particular us, us′ is desired). The symbol ρ denotes a positive penalty parameter promoting the soft constraints, i.e., increasing ρ enforces the us to be closer to each other w.r.t. the Euclidean distance. We note that we later analytically quantify the size of ρ which is necessary to obtain an a priori prescribed tolerance between the us; see (18) below. Frequently, we use the short-hand notation

 ρs,s′=ρ cs,s′. (6)

Typical choices of the ρs,s′ are

 ρs,s′=ρ for all s,s′, or ρs,s′=ρ δ(s+1) mod S,s′, (7)

i.e., the constant choice (cs,s′=1 for all s,s′), as well as the coupling between consecutive variables with constant parameter (cs,s′=1 if and only if s′=(s+1) mod S, and cs,s′=0 otherwise). We note that in these situations only one additional positive parameter ρ appears, and that this parameter is tied to the tolerance one is willing to accept as a distance between the us; see Algorithm 1.

For the majorization-minimization approach, we derive a surrogate functional [27] of the functional Pγ,ρ of (5). For this purpose, we introduce the block matrix B and the vector g given by

 B=(S−1/2 diag(A,…,A)C), g=(S−1/2f,…,S−1/2f,0,…,0)T, (8)

where diag(A,…,A) denotes the block diagonal matrix with S copies of S−1/2A on its diagonal, and where C consists of one block row for each pair 1≤s<s′≤S, with the entry ρ1/2s,s′I in block column s, the entry −ρ1/2s,s′I in block column s′, and zeros elsewhere.

Here, I denotes the identity matrix and 0 the zero matrix; the matrix B has S block columns and S+S(S−1)/2 block rows. Further, we introduce the difference operator D given by

 D(u1,…,uS)=⎛⎜ ⎜⎝∇a1u1⋮∇aSuS⎞⎟ ⎟⎠ (9)

which applies the difference w.r.t. the sth direction as to the sth component of (u1,…,uS). We employ the weights ωs to define the quantity ∥⋅∥0,ω which counts the weighted number of jumps by

 ∥D(u1,…,uS)∥0,ω=S∑s=1ωs∥∇asus∥0. (10)

With this notation at hand, we may rewrite the functional Pγ,ρ of (5) as

 Pγ,ρ(u1,…,uS)=∥∥B(u1,…,uS)T−g∥∥22+γ ∥∥ D(u1,…,uS) ∥∥0,ω. (11)

Using the representation (11), the surrogate functional in the sense of [27] of the functional Pγ,ρ is given by

 Psurrγ,ρ (u1,…,uS,v1,…,vS)=1L2ρ∥∥B(u1,…,uS)T−g∥∥22+γL2ρ ∥∥ D(u1,…,uS) ∥∥0,ω (12) −1L2ρ∥∥B(u1,…,uS)T−B(v1,…,vS)T∥∥22+∥∥(u1,…,uS)T−(v1,…,vS)T∥∥22.

Here, Lρ denotes a constant which is chosen larger than the spectral norm of B (i.e., the operator norm w.r.t. the Euclidean norm). This scaling is made to ensure that B/Lρ is contractive. In terms of A and the penalties ρs,s′ we require that

 L2ρ>∥A∥22/S+2maxs∈{1,…,S}S∑s′:s′≠sρs,s′. (13)
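Condition (13) can be checked numerically; the following sketch does so for a small example (the helper name and the safety factor 1.01, used to enforce the strict inequality, are our own choices; the bound itself is (13)):

```python
import numpy as np

def choose_L(A, rho, S):
    """Pick a constant L_rho satisfying the general bound (13):
    L_rho^2 > ||A||_2^2 / S + 2 * max_s sum_{s' != s} rho[s, s'].
    Here rho is an S x S matrix of coupling parameters with zero diagonal."""
    specnorm = np.linalg.norm(A, 2)          # spectral norm of A
    coupling = 2 * np.max(rho.sum(axis=1))   # worst-case coupling row sum
    return np.sqrt(specnorm**2 / S + coupling) * 1.01  # strict inequality

# Example: S = 2 splitting variables, constant coupling rho_{s,s'} = 0.5.
A = np.eye(4)
rho = 0.5 * (np.ones((2, 2)) - np.eye(2))
L = choose_L(A, rho, S=2)
```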

For the particular constant choice on the left-hand side of (7) we can choose Lρ smaller. For only coupling neighboring us with the same constant ρ, i.e., the right-hand coupling of (7), the admissible Lρ depends on whether S is even or odd. These choices ensure that B/Lρ is contractive by Lemma 8. Basics on surrogate functionals as we need them for this paper are gathered in Section 3.4. Further details on surrogate functionals can be found in [27, 9, 10].

Using elementary properties of the inner product, we see that

 Psurrγ,ρ(u1,…,uS,v1,…,vS)= ∥∥(u1,…,uS)T−((v1,…,vS)T−1L2ρBT(B(v1,…,vS)T−g))∥∥22+γL2ρ ∥∥D(u1,…,uS)∥∥0,ω+R(v1,…,vS), (14)

where R(v1,…,vS) is a rest term which is irrelevant when minimizing w.r.t. (u1,…,uS) for fixed (v1,…,vS). Writing this down in terms of the original system matrix A and the data f yields

 Psurrγ,ρ (u1,…,uS,v1,…,vS) (15) = S∑s=1⎡⎢⎣∥∥ ∥∥us−⎛⎝vs+1SL2ρA∗f−1SL2ρA∗Avs−∑s≠s′ρs,s′L2ρ(vs−vs′)⎞⎠∥∥ ∥∥22+γωsL2ρ∥∇asus∥0⎤⎥⎦+R(v).

For the quadratic penalty relaxation of the Potts problem, i.e., for minimizing the problem (5), we propose to use the surrogate iteration, i.e., minimizing (12) w.r.t. (u1,…,uS) with (v1,…,vS) given by the previous iterate. Applied to (15), this surrogate iteration reads

 (u(n+1)1,…,u(n+1)S)∈argminu1,…,uSS∑s=1[∥∥us−h(n)s∥∥22+γωsL2ρ∥∇asus∥0] (16)

where h(n)s is given by

 h(n)s=u(n)s+1SL2ρA∗f−1SL2ρA∗Au(n)s−∑s′:s′≠sρs,s′L2ρ(u(n)s−u(n)s′), for all s∈{1,…,S}. (17)

Note that in Section 2.3 below, we derive an efficient algorithm which computes an exact minimizer of (16). Now assume that we are willing to accept a small deviation between the us, i.e.,

 ∥us−us′∥22=∑i,j|(us)ij−(us′)ij|2<ε2cs,s′, (18)

for some ε>0 and for all indices s,s′ with cs,s′>0. The following algorithm computes a result fulfilling (18).

###### Algorithm 1.

We consider the quadratic penalty relaxed Potts problem (5) and a tolerance ε>0 for the deviation between the targets us which we are willing to accept. We propose the following algorithm for the relaxed Potts problem (5) (which yields a result with targets deviating from each other by at most ε).

• Set ρ according to (34), and set Lρ according to (13) (or, in the special cases of (7), as below (34) and (13)).

• Initialize u(0)1,…,u(0)S as discussed in the corresponding paragraph below (e.g., the all-zero initialization u(0)s=0 for all s).

• Iterate until convergence:

 1 h(n)s=u(n)s+1SL2ρA∗f−1SL2ρA∗Au(n)s−∑s′:s′≠sρs,s′L2ρ(u(n)s−u(n)s′),s=1,…,S, 2 (u(n+1)1,…,u(n+1)S)∈argminu1,…,uSS∑s=1[∥∥us−h(n)s∥∥22+γωsL2ρ∥∇asus∥0]. (19)

We will see in Theorem 4 that this algorithm converges to a local minimizer of the quadratic penalty relaxation (5) and that the us are ε-close, i.e., (18) is fulfilled.
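For illustration, the forward step (17) and a stopping test in the spirit of (18) can be sketched as follows. The helper names, the variable L2 standing for L²ρ, and the weighting of ε² by cs,s′ in the test are our reading of the formulas and should be treated as assumptions; the exact backward step (19), line 2, is deliberately omitted here.

```python
import numpy as np

def forward_step(us, A, f, rho, L2):
    """Forward step (17): a gradient step on the data term plus the
    coupling toward the other splitting variables; L2 plays the role
    of L_rho^2 and rho is the matrix of the rho_{s,s'}."""
    S = len(us)
    hs = []
    for s, u in enumerate(us):
        h = u + (A.T @ f) / (S * L2) - (A.T @ (A @ u)) / (S * L2)
        for sp, up in enumerate(us):
            if sp != s:
                h = h - rho[s, sp] / L2 * (u - up)
        hs.append(h)
    return hs

def within_tolerance(us, c, eps):
    """Stopping test modeled on (18): all coupled pairs are eps-close."""
    S = len(us)
    return all(np.sum((us[s] - us[t])**2) < eps**2 * c[s, t]
               for s in range(S) for t in range(S)
               if t != s and c[s, t] > 0)

# Fixed-point sanity check: for A = I and u_s = f the step is stationary,
# with L2 = 2.0 satisfying (13) here (1 / 2 + 2 * 0.5 = 1.5 < 2).
f = np.array([1.0, 2.0, 3.0])
A = np.eye(3)
rho = 0.5 * (np.ones((2, 2)) - np.eye(2))
hs = forward_step([f.copy(), f.copy()], A, f, rho, L2=2.0)
```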

#### The relation between the Potts problem and its quadratic penalty relaxation and obtaining a feasible solution for the Potts problem (4) from the output of Algorithm 1.

As pointed out above, we show in Theorem 4 that Algorithm 1 produces a local minimizer of the quadratic penalty relaxation (5) of the Potts problem (4) and that the corresponding variables us of a resulting solution are close up to an a priori prescribed tolerance. In practice, this may already be enough. However, strictly speaking, a local minimizer of the quadratic penalty relaxation (5) is not feasible for the Potts problem (4).

We will now explain a projection procedure to derive a feasible solution for the Potts problem (4) from a local minimizer of (5) with nearby variables (as produced by Algorithm 1). Related theoretical results are stated as Theorem 5. In particular, we will see that, in case the imaging operator A is lower bounded, the projection procedure applied to the output of Algorithm 1 yields a feasible point which is close to a local minimizer of the original Potts problem (4).

In order to explain the averaging procedure, we need some notions on partitionings. Recall that a partitioning P consists of a finite number of segments Pi, which are pairwise disjoint sets of pixel coordinates whose union equals the image domain Ω, i.e.,

 Ω=⋃iPi, Pi∩Pj=∅ for i≠j. (20)

Here, we assume that each segment Pi is connected w.r.t. the underlying neighborhood system in the sense that there is a path connecting any two elements in Pi with steps in the neighborhood system.

We will need the following proposed notion of a directional partitioning. A directional partition w.r.t. a set of directions a1,…,aS consists of a set of (discrete) intervals, where each interval is associated with exactly one of the directions as; here, an interval associated with the direction as has to be of the form x, x+as, …, x+kas, where k∈ℕ0 and each of these points belongs to the discrete domain. (For each direction as, the corresponding intervals form an ordinary partition.) We note that Algorithm 1, which produces output u1,…,uS, induces a directional partitioning as follows. We observe that each variable us is associated with a direction as. For any s, we let each (maximal) interval of constancy of us be an interval associated with as.

Each partitioning P induces a directional partitioning by letting the intervals be the stripes with direction as obtained from segment Pi, for each direction as and each segment Pi. Furthermore, each directional partitioning induces a partitioning by the following merging process.

###### Definition 2.

We say that pixels x,y are related, in symbols x∼y, if there is a path connecting x and y in the sense that, for any consecutive members of the path, there is an interval of the directional partitioning containing both.

The relation ∼ obviously defines an equivalence relation, and the corresponding equivalence classes yield a partitioning of Ω. We use the symbols

 (21)

to denote the mappings assigning to a partitioning a directional partitioning, and vice versa, respectively.

As a final preparation, we consider functions u1,…,uS as produced by Algorithm 1 and a partitioning P of Ω, and define the following projection to a function πP(u) which is constant on each segment Pi by

 πP(u)|Pi=∑x∈Pi∑Ss=1us(x)S #Pi, (22)

where the symbol #Pi denotes the number of elements in the segment Pi. Hence, the projection defined via (22) averages w.r.t. all components us and all members of the segment Pi and so produces a piecewise constant function w.r.t. the partitioning P.

Using these notions we propose the following projection procedure.

###### Procedure 1 (Projection Procedure).

We consider the output u1,…,uS of Algorithm 1 together with its induced directional partitioning.

• Compute the partitioning induced by the directional partitioning as explained above (21).

• Project u1,…,uS to πP(u) using (22) for the partitioning P, and return πP(u) as output.

We notice that, when having a partitioning P, solving the normal equation in the space of functions constant on the segments of P would be an alternative to the above second step; this alternative, however, might be more expensive.
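The merging of Definition 2 and the averaging (22) can be sketched with a small union-find structure. The interval representation (lists of pixel coordinates) and all helper names are our own illustrative choices.

```python
import numpy as np

def merge_partition(shape, intervals):
    """Merge a directional partitioning into a partitioning (Definition 2):
    pixels sharing an interval become related; connected components of
    this relation are the segments.  Uses union-find with path halving."""
    n = shape[0] * shape[1]
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    def union(i, j):
        parent[find(i)] = find(j)
    idx = lambda x: x[0] * shape[1] + x[1]
    for interval in intervals:              # each interval: list of pixel coords
        for a, b in zip(interval, interval[1:]):
            union(idx(a), idx(b))
    labels = [find(i) for i in range(n)]
    return np.array(labels).reshape(shape)

def project(us, labels):
    """Averaging step (22): mean over all components u_s and all pixels
    of each segment, yielding a piecewise constant image."""
    stack = np.stack(us)
    out = np.empty_like(us[0], dtype=float)
    for lab in np.unique(labels):
        mask = labels == lab
        out[mask] = stack[:, mask].mean()
    return out

# Two horizontal intervals on a 2 x 2 grid give two row segments.
labels = merge_partition((2, 2), [[(0, 0), (0, 1)], [(1, 0), (1, 1)]])
u_proj = project([np.array([[1.0, 3.0], [5.0, 7.0]])] * 2, labels)
```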

#### A penalty method for the Potts problem based on a majorization-minimization approach for its quadratic penalty relaxation.

Intuitively, increasing the parameters ρs,s′ during the iterations should tie the us closer together, such that the constraint of (3) is ultimately fulfilled; this results in an approach for the initial Potts problem (2). Recall that ρs,s′ was defined by (6), where the cs,s′ are nonnegative numbers weighting the constraints. We increase ρ while leaving the cs,s′ fixed during this process.

###### Algorithm 2.

We consider the Potts problem (3) in the variables u1,…,uS (which is equivalent to (2) as explained above). We propose the following algorithm for the Potts problem (3).

• Let (ρk)k∈ℕ be a strictly increasing sequence and (δk)k∈ℕ a strictly decreasing sequence converging to zero. Further, let

 t>2σ−1/21S−1/2∥A∥ ∥f∥, (23)

where σ1 is the smallest non-zero eigenvalue of the matrix given by (49). For the particular choices of coupling given by the left-hand and right-hand side of (7) we let

 t>2S∥A∥ ∥f∥, and t>2(2−2cos(2π/S))−1/2S−1/2∥A∥ ∥f∥, respectively. (24)
• Initialize u(0)1,…,u(0)S as discussed in the corresponding paragraph below (e.g., the all-zero initialization u(0)s=0 for all s).

• Set ρ to the first element of the sequence (ρk)k, and set Lρ according to (13) (or, in the special cases of (7), as explained below (13)).

• While

 ∥∥u(k,n)s−u(k,n)s′∥∥>tρ√cs,s′, or ∥∥u(k,n)s−u(k,n−1)s∥∥>δLρ (25)

do

 1 h(k,n)s=u(k,n)s+1SL2ρA∗f−1SL2ρA∗Au(k,n)s−∑s′:s′≠sρs,s′L2ρ(u(k,n)s−u(k,n)s′),s=1,…,S, 2 (u(k,n+1)1,…,u(k,n+1)S)∈argminu1,…,uSS∑s=1[∥∥us−h(k,n)s∥∥22+γωsL2ρ∥∇asus∥0], (26)

and set n←n+1.

• Set

 u(k+1)s=u(k+1,0)s=u(k,n)s, (27)

set k←k+1 and n=0, increase ρ to the next element of the sequence (ρk)k, set Lρ according to (13) (or, in the special cases of (7), as below (13)), and continue with the while loop.

This approach is inspired by [58], which considers quadratic penalty methods in the sparsity context. There, the authors search for a solution with only a few nonzero entries; the corresponding prior is separable. In contrast, the present work considers a non-separable prior.

#### Initialization.

Although the initialization of Algorithm 1 and of Algorithm 2 is not relevant for their convergence properties (cf. Section 3), the choice of the initialization influences the final result. (Please note that this may also happen for convex but not strictly convex problems.) We discuss different initialization strategies. The simplest choice is the all-zero initialization. Likewise, one can select the right-hand side of the normal equations of the underlying least squares problem, that is, A∗f. A third reasonable choice is the solution of the normal equations itself, or an approximation of it; using an approximation might in particular be reasonable to obtain a regularized approximation of the solution of the normal equations. A possible strategy to obtain such a regularized initialization is to apply a fixed number of Landweber iterations [53] or of the conjugate gradient method to the underlying least squares problem. (In our experiments, we initialized Algorithm 1 with the result of 1000 Landweber iterations.)

### 2.3 A non-iterative algorithm for minimizing the Potts subproblem (16)

Both proposed algorithms require solving the Potts subproblem (16) in the backward step, see (19), (26). We first observe that (16) can be solved for each of the $u_s$ separately. The corresponding minimization problems are of the prototypical form

 $\operatorname*{argmin}_{u_s : \Omega \to \mathbb{R}} \; \|u_s - f\|_2^2 + \gamma'_s \, \|\nabla_{a_s} u_s\|_0$ (28)

with given data $f$, the jump penalty $\gamma'_s$ and the direction $a_s$. As a next step, we see that (28) decomposes into univariate Potts problems for data along the paths in $\Omega$ induced by $a_s$; e.g., if $a_s$ is the horizontal direction, those paths correspond to the rows of $u_s$, and we obtain a minimizer of (28) by determining each of its rows individually. The univariate Potts problem amounts to minimizing

 $P^{\gamma}_{\mathrm{id},1d}(x) = \|x - g\|_2^2 + \gamma\,\|\nabla x\|_0 \to \min,$ (29)

where the data $g$ is given by the restriction of the image data to the pixels in $\Omega$ lying on the line under consideration. The offset of the line is fixed when solving each univariate problem, but varied afterwards to obtain all lines in the image with direction $a_s$. The target to optimize is denoted by $x$ and, in the resulting univariate situation, $\|\nabla x\|_0$ denotes the number of jumps of $x$.

It is well known that the univariate problem (29) has a unique minimizer. Further, these particular problems can be solved exactly by dynamic programming [34, 90, 17, 60, 61], which we briefly describe in the following; for further details we refer to [34, 80]. Assume we have computed minimizers $x^l$ of (29) for the partial data $(g_1, \ldots, g_l)$ for each $l = 1, \ldots, r$. Then the minimal value of (29) for the data $(g_1, \ldots, g_{r+1})$ can be found by

 $P^{\gamma}_{\mathrm{id},1d}(x^{r+1}) = \min_{l=1,\ldots,r+1} \; P^{\gamma}_{\mathrm{id},1d}(x^{l-1}) + \gamma + E_{l:r+1},$ (30)

where we let $x^0$ be the empty vector, and $E_{l:r+1}$ be the quadratic deviation of the data $(g_l, \ldots, g_{r+1})$ from its mean. Denoting the minimizing argument in (30) by $l^*$, the minimizer $x^{r+1}$ is given by

 $x^{r+1} = \bigl(x^{l^*-1}, \mu_{[l^*, r+1]}, \ldots, \mu_{[l^*, r+1]}\bigr),$ (31)

where $\mu_{[l^*, r+1]}$ is the mean value of $(g_{l^*}, \ldots, g_{r+1})$. Thus, we obtain a minimizer for the full data by successively computing $x^{r+1}$ for each $r = 0, \ldots, n-1$. By precomputing the first and second moments of the data and storing only jump locations, the described method can be implemented in $O(n^2)$ time [34]. Another way to achieve this complexity is based on the QR decomposition of the design matrix by means of Givens rotations, see [80]. Furthermore, the search space can be pruned to speed up computations [46, 81].
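For concreteness, the dynamic program (30)-(31) with precomputed moments can be sketched in Python as follows. This is a sketch, not the authors' implementation; in particular, the boundary convention `B[0] = -gamma`, which makes a constant signal carry no jump penalty, is our choice.

```python
import numpy as np

def univariate_potts(g, gamma):
    """Exact minimizer of ||x - g||_2^2 + gamma * (number of jumps of x)
    via the dynamic program (30)-(31); O(n^2) using precomputed moments."""
    g = np.asarray(g, dtype=float)
    n = len(g)
    # Prefix sums give O(1) segment means and quadratic deviations E_{l:r}.
    M1 = np.concatenate(([0.0], np.cumsum(g)))       # first moments
    M2 = np.concatenate(([0.0], np.cumsum(g ** 2)))  # second moments

    def dev(l, r):  # quadratic deviation of (g_l, ..., g_r) from its mean
        s1 = M1[r] - M1[l - 1]
        s2 = M2[r] - M2[l - 1]
        return s2 - s1 * s1 / (r - l + 1)

    B = np.empty(n + 1)             # B[r] = optimal value for data g_1..g_r
    B[0] = -gamma                   # so the first segment carries no jump penalty
    L = np.zeros(n + 1, dtype=int)  # left boundary of the last segment
    for r in range(1, n + 1):
        best, arg = np.inf, 1
        for l in range(1, r + 1):
            val = B[l - 1] + gamma + dev(l, r)
            if val < best:
                best, arg = val, l
        B[r], L[r] = best, arg

    # Backtrack the jump locations and fill each segment with its mean (31).
    x = np.empty(n)
    r = n
    while r > 0:
        l = L[r]
        x[l - 1:r] = (M1[r] - M1[l - 1]) / (r - l + 1)
        r = l - 1
    return x
```

Only the left boundaries `L` need to be stored for the backtracking, which is what keeps the memory footprint linear.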

## 3 Analysis

### 3.1 Analytic results

In the course of the derivation of the proposed algorithms above, we considered the quadratic penalty relaxation (5) of the multivariate Potts problem. Although it is more straightforward to access algorithmically via our approach, we first note that this problem is still NP-hard (as is the original problem).

###### Theorem 3.

Finding a (global) minimizer of the quadratic penalty relaxation (5) of the multivariate Potts problem is NP-hard.

The proof is given in Section 3.3 below. In Section 2.2 we have proposed Algorithm 1 to approach the quadratic penalty relaxation of the multivariate Potts problem. We show that the proposed algorithm converges to a local minimizer and that a feasible point of the original multivariate Potts problem is nearby.

###### Theorem 4.

We consider the iterative Potts minimization Algorithm 1 for the quadratic penalty relaxation (5) of the multivariate Potts problem.

1. Algorithm 1 computes a local minimizer of the quadratic penalty relaxation (5) of the multivariate Potts problem for any starting point. The convergence rate is linear.

2. We have the following relation between the set of local minimizers $\mathcal{L}$, the set of global minimizers $\mathcal{G}$, and the set of fixed points $\mathrm{Fix}(\mathcal{I})$ of the iteration $\mathcal{I}$ of Algorithm 1:

 $\mathcal{G} \subset \mathrm{Fix}(\mathcal{I}) \subset \mathcal{L}.$ (32)
3. Assume a tolerance $\varepsilon > 0$ we are willing to accept for the distance between the $u_s$, i.e.,

 $\sum_{s,s'} c_{s,s'}\,\|u_s - u_{s'}\|_2^2 = \sum_{s,s'} c_{s,s'} \sum_{i,j} \bigl|(u_s)_{ij} - (u_{s'})_{ij}\bigr|^2 \leq \varepsilon^2.$ (33)

Running Algorithm 1 with the penalty parameter $\rho$ chosen such that

 $\rho > 2\,\varepsilon^{-1}\,\sigma^{-1/2}\,S^{-1/2}\,\|A\|\,\|f\|$ (34)

(where $\sigma$ is the smallest non-zero eigenvalue of the coupling matrix given by (49); for the particular choices of the coupling given by (7), the resulting bounds are analogous to (24)) yields a local minimizer of the quadratic penalty relaxation (5) such that the $u_s$ are close up to the tolerance $\varepsilon$, i.e., (33) is fulfilled.
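The factor $2 - 2\cos(2\pi/S)$ appearing in (24) is exactly the smallest non-zero eigenvalue of the graph Laplacian of a cycle on $S$ nodes; assuming the corresponding variant of the coupling (7) couples the $u_s$ cyclically (our reading, not stated here), this can be checked numerically:

```python
import numpy as np

S = 5
# Graph Laplacian of a cycle on S nodes: degree 2 on the diagonal,
# -1 between cyclic neighbours.
I = np.eye(S)
Lap = 2 * I - np.roll(I, 1, axis=0) - np.roll(I, -1, axis=0)
eigvals = np.sort(np.linalg.eigvalsh(Lap))
sigma = eigvals[1]  # smallest non-zero eigenvalue (eigvals[0] is ~0)
assert np.isclose(sigma, 2 - 2 * np.cos(2 * np.pi / S))
```

Since $2 - 2\cos(2\pi/S) \approx (2\pi/S)^2$ for large $S$, the coupling eigenvalue shrinks as more split variables are used, which tightens the required bound on $t$.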

The proof is given in Section 3.5 below. A solution $u^* = (u_1^*, \ldots, u_S^*)$ of Algorithm 1 is not a feasible point for the initial Potts problem (3). However, we see below that it produces a $\delta$-approximative solution in the sense that there is a multiplier $\mu^*$ and a partitioning such that

 $\sum_{s,s'} c_{s,s'}\,\|u_s^* - u_{s'}^*\|_2^2 < \delta, \quad\text{and}\quad L(\mu^*) < \delta,$ (35)

where $L$ is given by (53) below. In this context, note that the conditions for a local minimizer are given by the coupling constraint together with the Lagrange multiplier condition. So (35) intuitively means that both the constraint and the Lagrange multiplier condition are approximately fulfilled for the partitioning induced by $u^*$.

Further, given a solution of Algorithm 1, we find a feasible point for the Potts problem (3) (or, equivalently, (2)) which is nearby, as detailed in the following theorem.

###### Theorem 5.

We consider the iterative Potts minimization Algorithm 1 for the quadratic penalty relaxation (5) in connection with the (non-relaxed) Potts problem (3).

1. Algorithm 1 produces an approximative solution in the sense of (35) of the Potts problem (3).

2. The projection procedure (Procedure 1) proposed in Section 2.2, applied to the solution of Algorithm 1, produces a feasible image $\hat u$ (together with a valid partitioning) for the Potts problem (3) which is close to the solution in the sense that

 $\|u'_s - \hat u\| \leq C_1\,\varepsilon \quad \text{for all } s \in \{1, \ldots, S\},$ (36)

where $\varepsilon$ quantifies the deviation between the $u'_s$, and $C_1$ is a constant whose explicit form involves the number of elements of the underlying partitioning. If the imaging operator $A$ is lower bounded, i.e., there is a constant $c > 0$ such that $\|Au\| \geq c\,\|u\|$ for all $u$, a local minimizer $u^*$ of the Potts problem (3) is nearby, i.e.,

 $\|u^* - \hat u\| \leq \sqrt{\eta}\,/\,c,$ (37)

where

 $\eta := \bigl(\|A\|^2\, C_1^2\, \varepsilon + 2\,\|A\|\, C_1\, \|f\|_2\bigr)\,\varepsilon.$ (38)

The proof of Theorem 5 can be found at the end of Section 3.4; most of the relevant statements are already shown in Section 3.3. Theorem 5 theoretically underpins the fact that, on the application side, we may use Algorithm 1 for the Potts problem (3) (accepting some arbitrarily small tolerance which we may fix in advance).

In addition, in Section 2.2, we have proposed Algorithm 2 to approach the Potts problem (3). We first show that Algorithm 2 is well-defined.

###### Theorem 6.

Algorithm 2 is well-defined in the sense that the inner iteration governed by (25) terminates, i.e., for any outer iteration $k$ there is an $n$ such that the termination criterion given by (25) holds.

The proof of Theorem 6 is given in Section 3.6. Concerning the convergence properties of Algorithm 2 we obtain the following results.

###### Theorem 7.

We consider the iterative Potts minimization algorithm (Algorithm 2) for the Potts problem (3).

• Any cluster point of the sequence