1 Introduction and Background
1.1 Convex Relaxations of Partitioning Problems
In this work, we will be concerned with a class of variational problems used in image processing and analysis for formulating multiclass image partitioning problems, which are of the form
(1) $\min_{u \in \mathcal{C}_{\mathcal{E}}} f(u), \qquad f(u) := \int_\Omega \langle u(x), s(x)\rangle\, dx + J(u),$
(2) $J(u) := \int_\Omega \Psi(Du),$
(3) $\mathcal{C}_{\mathcal{E}} := \{ u \in \mathrm{BV}(\Omega; \mathbb{R}^l) \mid u(x) \in \mathcal{E} \text{ for a.e. } x \in \Omega \},$
(4) $\mathcal{E} := \{ e^1, \ldots, e^l \}.$
The labeling function $u$ assigns to each point $x$ in the image domain $\Omega \subset \mathbb{R}^d$ a label $i \in \{1, \ldots, l\}$, which is represented by one of the $l$-dimensional unit vectors $e^1, \ldots, e^l$. Since $u$ is piecewise constant and therefore cannot be assumed to be differentiable, the problem is formulated as a free discontinuity problem in the space $\mathrm{BV}$ of functions of bounded variation; we refer to Ambrosio2000 for a general overview. The objective function consists of a data term and a regularizer. The data term is given in terms of the function $s: \Omega \to \mathbb{R}^l$, and assigns to the choice $u(x) = e^i$ the "penalty" $s_i(x)$, in the sense that
(5) $\int_\Omega \langle u(x), s(x)\rangle\, dx = \sum_{i=1}^l \int_{\Omega_i} s_i(x)\, dx,$
where $\Omega_i := \{ x \in \Omega \mid u(x) = e^i \}$ is the class region for label $i$, i.e., the set of points that are assigned the $i$-th label. The data term generally depends on the input data – such as color values of a recorded image, depth measurements, or other features – and promotes a good fit of the minimizer to the input data. While it is purely local, it is subject to no further restrictions such as continuity or convexity; it therefore covers many interesting applications such as segmentation, multiview reconstruction, stitching, and inpainting Paragios2006.
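As a concrete illustration, one common choice of data term (an assumption for illustration; the model itself does not prescribe it, and the function name below is ours) is the squared color distance $s_i(x) = \|I(x) - c^i\|_2^2$ to a set of prototype colors $c^i$:

```python
import numpy as np

def color_data_term(image, prototypes):
    """Compute s_i(x) = ||I(x) - c^i||^2 for an image of shape (H, W, 3)
    and prototype colors of shape (l, 3); returns an array of shape (H, W, l)."""
    diff = image[:, :, None, :] - prototypes[None, None, :, :]
    return np.sum(diff ** 2, axis=-1)
```

A small data term suggests a good fit of label $i$ at $x$; any other purely local penalty could be substituted without changing the structure of the model.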
1.2 Convex Regularizers
The regularizer is defined by the positively homogeneous, continuous and convex function $\Psi: \mathbb{R}^{d \times l} \to \mathbb{R}_{\geq 0}$ acting on the distributional derivative $Du$ of $u$, and incorporates additional prior knowledge about the "typical" appearance of the desired output. For piecewise constant $u$, it can be seen that the definition in (1) amounts to a weighted penalization of the discontinuities of $u$:
(6) $J(u) = \int_{J_u} \Psi\big(\nu_u (u^+ - u^-)^\top\big)\, d\mathcal{H}^{d-1},$
where $J_u$ is the jump set of $u$, i.e., the set of points where $u$ has well-defined right-hand and left-hand limits $u^+$ and $u^-$ and (in an infinitesimal sense) jumps between the values $u^+$ and $u^-$ across a hyperplane with normal $\nu_u$ (see Ambrosio2000 for the precise definitions). A particular case is to set $\Psi = \lambda \|\cdot\|_F$, i.e., the scaled Frobenius norm. In this case $J$ is just the (scaled) total variation of $u$, and, since $u^+$ and $u^-$ assume values in $\mathcal{E}$ and cannot be equal on the jump set $J_u$, it holds that
(7) $J(u) = \lambda \int_{J_u} \|u^+ - u^-\|_F\, d\mathcal{H}^{d-1}$
(8) $= \lambda \sqrt{2}\, \mathcal{H}^{d-1}(J_u).$
Therefore, for $\lambda = 1/\sqrt{2}$ the regularizer just amounts to penalizing the total length of the interfaces between class regions as measured by the $(d-1)$-dimensional Hausdorff measure $\mathcal{H}^{d-1}$, which is known as the uniform metric or Potts regularizer.
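On a regular pixel grid, a discretized counterpart of the Potts regularizer simply counts label changes between neighboring pixels. The following minimal sketch (4-connectivity with forward differences; one of many possible discretizations, not the one used in this work) makes the "boundary length" interpretation concrete:

```python
import numpy as np

def potts_energy(labels, lam=1.0):
    """Discrete Potts energy on a 2-D label grid: lam times the number
    of horizontal and vertical neighbor pairs with differing labels."""
    h = np.count_nonzero(labels[:, 1:] != labels[:, :-1])
    v = np.count_nonzero(labels[1:, :] != labels[:-1, :])
    return lam * (h + v)
```

For a piecewise constant labeling, the counted neighbor pairs approximate the interface length, which is exactly the quantity penalized by the continuous Potts regularizer above.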
A general regularizer was proposed in Lellmann2011, based on Chambolle2008: Given a metric (distance) $d: \{1, \ldots, l\}^2 \to \mathbb{R}_{\geq 0}$ (not to be confused with the ambient space dimension), define
(9) $\Psi_d(z) := \sup_{v \in \mathcal{D}_{\mathrm{loc}}} \langle z, v \rangle,$
(10) $\mathcal{D}_{\mathrm{loc}} := \{ v = (v^1, \ldots, v^l) \in \mathbb{R}^{d \times l} \mid \|v^i - v^j\|_2 \leq d(i,j) \text{ for all } i, j \}.$
It was then shown that, for piecewise constant $u$ jumping between the values $e^i$ and $e^j$,
(11) $J(u) = \int_{J_u} d(i, j)\, d\mathcal{H}^{d-1},$
therefore in view of (6) the corresponding regularizer is non-uniform: the boundary between the class regions $\Omega_i$ and $\Omega_j$ is penalized by its length, multiplied by the weight $d(i,j)$ depending on the labels of both regions.
However, even for the comparably simple regularizer (7), the model (1) is a (spatially continuous) combinatorial problem due to the integral nature of the constraint set $\mathcal{C}_{\mathcal{E}}$; optimization is therefore nontrivial. In the context of multiclass image partitioning, a first approach can be found in Lysaker2006, where the problem was posed in a level-set formulation in terms of a scalar labeling function with values in $\{1, \ldots, l\}$, which is subsequently relaxed to the interval $[1, l]$. The indicator functions are then replaced by polynomials in the labeling function, which coincide with the indicator functions wherever the labeling function assumes integral values. However, the numerical approach involves several nonlinearities and requires solving a sequence of nontrivial subproblems.
In contrast, representation (1) directly suggests a more straightforward relaxation to a convex problem: replace $\mathcal{E}$ by its convex hull $\Delta_l := \mathrm{conv}\, \mathcal{E}$, which is just the unit simplex in $l$ dimensions, and solve the relaxed problem
(13) $\min_{u \in \mathcal{C}} f(u),$
(14) $\mathcal{C} := \{ u \in \mathrm{BV}(\Omega; \mathbb{R}^l) \mid u(x) \in \Delta_l \text{ for a.e. } x \in \Omega \},$
(15) $\Delta_l := \{ a \in \mathbb{R}^l \mid a_i \geq 0,\ \textstyle\sum_{i=1}^l a_i = 1 \}.$
Sparked by a series of papers Zach2008; Chambolle2008; Lellmann2009, there has recently been much interest in problems of this form: although generally nonsmooth, they are convex and can therefore be solved to global optimality, e.g., using primal-dual techniques. The approach has proven useful for a wide range of applications Kolev2009; Goldstein2009a; Delaunoy2009; Yuan2010.
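First-order primal-dual methods for the relaxed problem typically require, at each point, the Euclidean projection onto the unit simplex $\Delta_l$. A minimal sketch of the standard sort-based projection (the function name is ours; this is one of several equivalent algorithms):

```python
import numpy as np

def project_simplex(a):
    """Euclidean projection of a vector a onto the unit simplex
    {x : x >= 0, sum(x) = 1}, via the standard sort-based algorithm."""
    u = np.sort(a)[::-1]                      # components in decreasing order
    css = np.cumsum(u)
    # Largest k with u_k + (1 - sum_{j<=k} u_j) / k > 0.
    rho = np.nonzero(u + (1 - css) / (np.arange(len(a)) + 1) > 0)[0][-1]
    theta = (1 - css[rho]) / (rho + 1)        # shift making the result feasible
    return np.maximum(a + theta, 0)
```

Applying this projection pixelwise keeps the iterates of a projected-gradient or primal-dual scheme inside the constraint set $\mathcal{C}$ after discretization.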
1.3 Finite-Dimensional vs. Continuous Approaches
Many of these applications have been tackled before in a finite-dimensional setting, where they can be formulated as combinatorial problems on a grid graph and solved using combinatorial optimization methods such as $\alpha$-expansion and related integer linear programming (ILP) methods Boykov2001; Komodakis2007. These methods have been shown to yield an integral labeling $\bar u$ with the a priori bound
(16) $f(\bar u) \leq 2\, \frac{\max_{i \neq j} d(i,j)}{\min_{i \neq j} d(i,j)}\, f(u^*_{\mathcal{E}}),$
where $u^*_{\mathcal{E}}$ is the (unknown) solution of the integral problem (1). They therefore make it possible to compute a suboptimal solution of the – originally NP-hard Boykov2001 – combinatorial problem with an upper bound on the objective. No such bound has so far been available for methods based on the spatially continuous problem (13).
Despite these strong theoretical and practical results for the finite-dimensional combinatorial energies, the function-based, spatially continuous formulation (1) has several unique advantages:

The energy (1) is truly isotropic, in the sense that for a proper choice of $\Psi$ it is invariant under rotation of the coordinate system. Pursuing finite-dimensional "discretize-first" approaches generally introduces artifacts due to the inherent anisotropy, which can only be avoided by increasing the neighborhood size, thereby reducing sparsity and severely slowing down the graph cut-based methods. In contrast, properly discretizing the relaxed problem (13) and solving it as a convex problem with subsequent thresholding yields much better results without compromising sparsity (Fig. 1 and 2, Klodt2008). This can be attributed to the fact that solving the discretized problem as a combinatorial problem in effect discards much of the information about the problem structure that is contained in the nonlinear terms of the discretized objective.

Present combinatorial optimization methods Boykov2001; Komodakis2007 are inherently sequential and difficult to parallelize. On the other hand, parallelizing primal-dual methods for solving the relaxed problem (13) is straightforward, and GPU implementations have been shown to outperform state-of-the-art graph cut methods Zach2008.

Analyzing the problem in a fully functional-analytic setting gives valuable insight into the problem structure, and is of theoretical interest in itself.
1.4 Optimality Bounds
However, one possible drawback of the spatially continuous approach is that the solution $u^*$ of the relaxed problem (13) may assume fractional values, i.e., values in $\Delta_l \setminus \mathcal{E}$. Therefore, in applications that require a true partition of $\Omega$, some rounding process is needed in order to generate an integral labeling $\bar u^* \in \mathcal{C}_{\mathcal{E}}$. This may increase the objective, and lead to a suboptimal solution of the original problem (1).
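The simplest deterministic rounding assigns each point the label of its largest component, i.e., the nearest simplex vertex. A minimal sketch (for illustration only; this is not the probabilistic scheme analyzed later, and the function name is ours):

```python
import numpy as np

def round_to_vertex(u):
    """Round a simplex-valued labeling u (shape: npoints x l) to unit
    vectors by selecting the largest component at each point."""
    labels = u.argmax(axis=1)
    u_bar = np.zeros_like(u)
    u_bar[np.arange(u.shape[0]), labels] = 1.0
    return u_bar
```

The rounded labeling is integral by construction, but in general its objective value is larger than that of the fractional minimizer.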
The regularizer as defined in (9) enjoys the property that it majorizes all other regularizers that can be written in integral form and satisfy (11). It is therefore in a sense "optimal", since it introduces as few fractional solutions as possible. In practice, this forces solutions of the relaxed problem to assume integral values at most points, so that rounding is only required in small regions. However, the rounding step may still increase the objective and generate suboptimal integral solutions. The question therefore arises whether this approach allows one to recover "good" integral solutions of the original problem (1).
In the following, we are concerned with the question of whether it is possible to obtain, using the convex relaxation (13), integral solutions with an upper bound on the objective. Specifically, we focus on inequalities of the form
(17) $f(\bar u^*) \leq C\, f(u^*_{\mathcal{E}})$
for some constant $C \geq 1$, which provide an upper bound on the objective of the rounded integral solution $\bar u^*$ with respect to the objective of the (unknown) optimal integral solution $u^*_{\mathcal{E}}$ of (1). Note that (17) cannot generally be expected to hold with $C = 1$. The reverse inequality
(18) $f(u^*_{\mathcal{E}}) \leq f(\bar u^*)$
always holds, since $\bar u^* \in \mathcal{C}_{\mathcal{E}}$ and $u^*_{\mathcal{E}}$ is an optimal integral solution. The specific form (17) can be attributed to the alternative interpretation
(19) $\frac{f(\bar u^*) - f(u^*_{\mathcal{E}})}{f(u^*_{\mathcal{E}})} \leq C - 1,$
which provides a bound for the relative gap to the optimal objective of the combinatorial problem. Such constants can be obtained a posteriori by actually computing (or approximating) a minimizer $u^*$ and a dual feasible point: Assume that a feasible primal-dual pair $(u^*, v^*)$ is known, where $v^*$ approximates a dual solution, and assume that some integral feasible $\bar u^* \in \mathcal{C}_{\mathcal{E}}$ has been obtained from $u^*$ by a rounding process. Then the pair $(\bar u^*, v^*)$ is feasible as well, since $\mathcal{C}_{\mathcal{E}} \subseteq \mathcal{C}$, and we obtain an a posteriori optimality bound of the form (19) with respect to the optimal integral solution $u^*_{\mathcal{E}}$:
(20) $f(\bar u^*) \leq \frac{f(\bar u^*)}{f_D(v^*)}\, f(u^*_{\mathcal{E}}).$
However, this requires that the primal and dual objectives $f(\bar u^*)$ and $f_D(v^*)$ can be accurately evaluated, and it requires computing a minimizer $u^*$ of the problem for the specific input data, which is generally difficult, especially in the spatially continuous formulation.
In contrast, true a priori bounds do not require knowledge of a solution and apply uniformly to all problems of a class, irrespective of the particular input. When considering rounding methods, one generally has to discriminate between

deterministic vs. probabilistic methods, and

spatially discrete (finite-dimensional) vs. spatially continuous methods.
Most known a priori approximation results only hold in the finite-dimensional setting, and are usually proven using graph-based pairwise formulations. In contrast, we adopt an "optimize first" perspective for the reasons outlined in the introduction. Unfortunately, the proofs of the finite-dimensional results often rely on pointwise arguments that cannot directly be transferred to the continuous setting. Deriving similar results for continuous problems therefore requires considerable additional work.
1.5 Contribution and Main Results
In this work we prove that, using the regularizer (9), the a priori bound (16) can be carried over to the spatially continuous setting. Preliminary versions of these results, with excerpts of the proofs, were announced in the conference proceedings Lellmann2011. We extend these results to provide the exact bound (16), and supply the full proofs.
As the main result, we show that it is possible to construct a rounding method parametrized by a parameter $\gamma \in \Gamma$, where $\Gamma$ is an appropriate parameter space:
(22) $(u, \gamma) \in \mathcal{C} \times \Gamma \mapsto \bar u_\gamma,$
such that for a suitable probability distribution on $\Gamma$, the following theorem holds for the expectation $\mathbb{E}_\gamma$:
Theorem 1.1
Let $u \in \mathcal{C}$, $s \in L^1(\Omega)^l$, and let $\Psi$ be positively homogeneous, convex and continuous. Assume there exists a lower bound $\lambda_l > 0$ such that, for all $\nu \in \mathbb{R}^d$ and $i \neq j$,
(23) $\Psi\big(\nu (e^i - e^j)^\top\big) \geq \lambda_l\, \big\|\nu (e^i - e^j)^\top\big\|_F.$
Moreover, assume there exists an upper bound $\lambda_u < \infty$ such that, for every $z \in \mathbb{R}^{d \times l}$,
(24) $\Psi(z) \leq \lambda_u \|z\|_F.$
Then Alg. 1 (see below) generates an integral labeling $\bar u_\gamma \in \mathcal{C}_{\mathcal{E}}$ almost surely, and
(25) $\mathbb{E}_\gamma\, f(\bar u_\gamma) \leq 2\, \frac{\lambda_u}{\lambda_l}\, f(u).$
Note that $\lambda_l \leq \lambda_u$ always holds if both bounds are defined, since (23) implies, for $z = \nu (e^i - e^j)^\top$ with $\nu \neq 0$,
(26) $\lambda_l \|z\|_F \leq \Psi(z) \leq \lambda_u \|z\|_F.$
The proof of Thm. 1.1 (Sect. 4) is based on the work of Kleinberg and Tardos Kleinberg1999, which is set in an LP relaxation framework. However, their results are restricted in that they assume a graph-based representation and rely extensively on the finite dimensionality. In contrast, our results hold in the continuous setting without assuming a particular problem discretization.
Theorem 1.1 guarantees that, in a probabilistic sense, the rounding process may only increase the energy in a controlled way, with an upper bound depending on $\Psi$. An immediate consequence is
Corollary 1
Under the conditions of Thm. 1.1, if $u^*$ minimizes $f$ over $\mathcal{C}$, $u^*_{\mathcal{E}}$ minimizes $f$ over $\mathcal{C}_{\mathcal{E}}$, and $\bar u^*_\gamma$ denotes the output of Alg. 1 applied to $u^*$, then
(27) $\mathbb{E}_\gamma\, f(\bar u^*_\gamma) \leq 2\, \frac{\lambda_u}{\lambda_l}\, f(u^*_{\mathcal{E}}).$
Therefore the proposed approach allows us to recover, from the solution $u^*$ of the convex relaxed problem (13), an approximate integral solution of the nonconvex original problem (1) with an upper bound on the objective.
In particular, for the tight relaxation of the regularizer as in (9), we obtain (cf. Prop. 10)
(28) $\mathbb{E}_\gamma\, f(\bar u^*_\gamma) \leq 2\, \frac{\max_{i \neq j} d(i,j)}{\min_{i \neq j} d(i,j)}\, f(u^*_{\mathcal{E}}),$
which is exactly the same bound as has been proven for the combinatorial $\alpha$-expansion method (16).
To our knowledge, this is the first bound available for the fully spatially continuous convex relaxed problem (13). Related is the work of Olsson et al. Olsson2009; Olsson2009a, where the authors consider a continuous analogue of the $\alpha$-expansion method known as continuous binary fusion Trobin2008, and claim that a bound similar to (16) holds for the corresponding fixed points when using the separable regularizer
(29) $\Psi(z) = \lambda \sum_{i=1}^l \|z^i\|_1$
for some $\lambda > 0$, which implements an anisotropic variant of the uniform metric. However, a rigorous proof in the BV framework was not given.
In Bae2011, the authors propose to solve the problem (1) by considering the dual problem to (13), consisting of coupled maximum-flow problems, which are solved using a log-sum-exp smoothing technique and gradient descent. In case the dual solution allows one to unambiguously recover an (integral) primal solution, the latter is necessarily the unique minimizer of $f$, and therefore a global integral minimizer of the combinatorial problem (1). This provides an a posteriori bound, which applies if a dual solution can be computed. While useful in practice as a certificate for global optimality, in the spatially continuous setting it requires explicit knowledge of a dual solution, which is rarely available, since it depends on the regularizer as well as the input data $s$.
In contrast, the a priori bound (27) holds uniformly over all problem instances, does not require knowledge of any primal or dual solutions, and also covers non-uniform regularizers.
2 A Probabilistic View of the Coarea Formula
2.1 The Two-Class Case
As a motivation for the following sections, we first provide a probabilistic interpretation of a tool often used in geometric measure theory, the coarea formula (cf. Ambrosio2000). Assuming $u' \in \mathrm{BV}(\Omega)$ and $u'(x) \in [0, 1]$ for a.e. $x \in \Omega$, the coarea formula states that the total variation of $u'$ can be represented by summing the boundary lengths of its superlevel sets:
(30) $\mathrm{TV}(u') = \int_0^1 \mathrm{TV}\big(\chi_{\{u' > \alpha\}}\big)\, d\alpha.$
Here $\chi_S$ denotes the characteristic function of a set $S \subseteq \Omega$, i.e., $\chi_S(x) = 1$ iff $x \in S$ and $\chi_S(x) = 0$ otherwise. The coarea formula provides a connection between problem (1) and the relaxation (13) in the two-class case, where $l = 2$, $u = (u', 1 - u')$, and $\Psi = \|\cdot\|_F$: As noted in Lellmann2009a,
(31) $\mathrm{TV}(u) = \sqrt{2}\, \mathrm{TV}(u'),$
therefore the coarea formula (30) can be rewritten as
(32) $\mathrm{TV}(u) = \sqrt{2} \int_0^1 \mathrm{TV}\big(\chi_{\{u' > \alpha\}}\big)\, d\alpha$
(33) $= \int_0^1 \mathrm{TV}\big(\big(\chi_{\{u' > \alpha\}},\, 1 - \chi_{\{u' > \alpha\}}\big)\big)\, d\alpha$
(35) $= \int_0^1 \mathrm{TV}(\bar u_\alpha)\, d\alpha, \qquad \bar u_\alpha := \big(\chi_{\{u' > \alpha\}},\, 1 - \chi_{\{u' > \alpha\}}\big).$
Consequently, the total variation of $u$ can be expressed as the mean over the total variations of a set of integral labelings $\bar u_\alpha$, obtained by rounding $u$ at different thresholds $\alpha$. We now adopt a probabilistic view of (35): We regard the mapping
(36) $(u, \alpha) \mapsto \bar u_\alpha$
as a parametrized, deterministic rounding algorithm that depends on $u$ and on an additional parameter $\alpha$. From this we obtain a probabilistic (randomized) rounding algorithm by assuming $\alpha$ to be a uniformly distributed random variable on $[0, 1]$. Under these assumptions the coarea formula (35) can be written as
(37) $\mathrm{TV}(u) = \mathbb{E}_\alpha\, \mathrm{TV}(\bar u_\alpha).$
This has the probabilistic interpretation that applying the probabilistic rounding to the (arbitrary, but fixed) $u$ does – in a probabilistic sense, i.e., in the mean – not change the objective. It can be shown that this property extends to the full functional $f$ in (13): In the two-class case, the "coarea-like" property
(38) $f(u) = \mathbb{E}_\alpha\, f(\bar u_\alpha)$
holds. Functions with property (38) are also known as levelable functions Darbon2006; Darbon2006a or discrete total variations Chambolle2009 and have been studied in Strandmark2009. A well-known implication is that if $u = u^*$, i.e., $u$ minimizes the relaxed problem (13), then in the two-class case almost every $\bar u^*_\alpha$ is an integral minimizer of the original problem (1), i.e., the optimality bound (17) holds with $C = 1$ Nikolova2006.
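The identity (37) can be checked numerically on a discretized one-dimensional signal. The following minimal sketch (forward-difference TV, Monte Carlo estimate of the expectation; discretization choices are ours) compares the two sides:

```python
import numpy as np

def tv(w):
    """Discrete 1-D total variation: sum of absolute forward differences."""
    return np.abs(np.diff(w)).sum()

# A fixed relaxed labeling u' with values in [0, 1].
u1 = np.array([0.0, 0.2, 0.7, 0.7, 1.0, 0.4])

# Monte Carlo estimate of E_alpha[ TV(chi_{u' > alpha}) ] for alpha ~ U[0, 1].
rng = np.random.default_rng(0)
alphas = rng.uniform(0.0, 1.0, 20000)
est = np.mean([tv((u1 > a).astype(float)) for a in alphas])

# The coarea formula predicts that est matches TV(u') up to sampling error.
print(tv(u1), est)
```

Here the discrete TV satisfies the coarea property exactly, so the Monte Carlo average converges to the total variation of the relaxed signal.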
2.2 The Multi-Class Case and Generalized Coarea Formulas
Generalizing these observations to more than two labels hinges on a property similar to (38) that holds for vector-valued $u$. In a general setting, the question is whether there exist

a probability space $(\Gamma, \mu)$, and

a parametrized rounding method, i.e., for almost every $\gamma \in \Gamma$ a mapping
(40) $u \in \mathcal{C} \mapsto \bar u_\gamma \in \mathcal{C}_{\mathcal{E}}$, satisfying $\bar u_\gamma = u$ for all $u \in \mathcal{C}_{\mathcal{E}}$,
such that a "multi-class coarea-like property" (or generalized coarea formula)
(41) $f(u) = \int_\Gamma f(\bar u_\gamma)\, d\mu(\gamma)$
holds. In a probabilistic sense this corresponds to
(42) $f(u) = \mathbb{E}_\gamma\, f(\bar u_\gamma).$
For $l = 2$ and $\Psi = \|\cdot\|_F$, (37) shows that (42) holds with $\Gamma = [0, 1]$, $\mu$ the uniform distribution on $\Gamma$, and $\bar u_\gamma = \bar u_\alpha$ as defined in (36). Unfortunately, property (37) is intrinsically restricted to the two-class case with total variation regularizer.
In the multiclass case, the difficulty lies in providing a suitable combination of a probability space and a parametrized rounding step $(u, \gamma) \mapsto \bar u_\gamma$. Unfortunately, obtaining a relation such as (37) for the full functional (1) is unlikely, as it would mean that solutions to the (after discretization) NP-hard problem (1) could be obtained by solving the convex relaxation (13) and subsequent rounding, both of which can be achieved in polynomial time.
Therefore we restrict ourselves to an approximate variant of the generalized coarea formula:
(43) $\mathbb{E}_\gamma\, f(\bar u_\gamma) \leq C\, f(u).$
While (43) is not sufficient to provide a bound on $f(\bar u_\gamma)$ for a particular $\gamma$, it permits a probabilistic bound: for any minimizer $u^*$ of the relaxed problem (13), eq. (43) implies
(44) $\mathbb{E}_\gamma\, f(\bar u^*_\gamma) \leq C\, f(u^*) \leq C\, f(u^*_{\mathcal{E}}),$
i.e., the ratio between the expected objective of the rounded relaxed solution and the objective of the optimal integral solution is bounded – in a probabilistic sense – by $C$.
In the following sections we construct a suitable parametrized rounding method and probability space in order to obtain an approximate generalized coarea formula of the form (43).
3 Probabilistic Rounding for Multiclass Image Partitions
3.1 Approach
We consider the probabilistic rounding approach based on Kleinberg1999 as defined in Alg. 1.
The algorithm proceeds in a number of phases. At each iteration $k$, a label $i_k \in \{1, \ldots, l\}$ and a threshold $\alpha_k \in [0, 1]$ are randomly chosen (step 3), and label $i_k$ is assigned to all yet unassigned points $x$ where $u_{i_k}(x) \geq \alpha_k$ holds (step 5). In contrast to the two-class case considered above, the randomness is provided by a sequence of uniformly distributed random variables, i.e., $\gamma = (\gamma_1, \gamma_2, \ldots)$ with $\gamma_k = (i_k, \alpha_k)$.
After iteration $k$, all points in the set $U^k$ are still unassigned, while all points in $\Omega \setminus U^k$ have been assigned an (integral) label in iteration $k$ or in a previous iteration. Iteration $k$ potentially modifies points only in the set $U^{k-1}$. The variable $M^k_i$ stores the lowest threshold chosen for label $i$ up to and including iteration $k$, and is only required for the proofs.
While the algorithm is defined using pointwise operations, it is well-defined in the sense that for fixed $\gamma$, the sequence $(u^k)$, viewed as elements of $L^1(\Omega)^l$, does not depend on the specific representative of $u$ in its equivalence class in $L^1$. The sequences $(U^k)$ and $(M^k)$ depend on the representative, but are unique up to negligible sets.
In an actual implementation, the algorithm could be terminated as soon as all points in $\Omega$ have been assigned a label, i.e., $U^k = \emptyset$. However, in the framework we use for the analysis the algorithm never terminates explicitly. Instead, for fixed input $u$ we regard the algorithm as a mapping between sequences of parameters $(\gamma_k)$ (or instances of random variables) and sequences of states $(u^k)$, $(U^k)$ and $(M^k)$. We drop the subscript $k$ if it does not create ambiguities. The elements of the sequence $\gamma$ are independently uniformly distributed, and by the Kolmogorov extension theorem (Oksendal2003, Thm. 2.1.5) there exists a probability space and a stochastic process on the set of sequences with compatible marginal distributions.
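On a discrete grid, the iteration described above can be sketched as follows (array layout, the function name, and the explicit termination check are our implementation choices; the analysis works with the non-terminating formulation):

```python
import numpy as np

def probabilistic_rounding(u, rng, max_iter=10000):
    """Probabilistic rounding in the style of Alg. 1: repeatedly draw a
    random label and threshold, and assign that label to all still
    unassigned points whose corresponding component meets the threshold.

    u: array of shape (npoints, l) with rows in the unit simplex.
    Returns one integral label index per point (-1 = unassigned)."""
    npoints, l = u.shape
    labels = np.full(npoints, -1)           # -1 marks unassigned points
    for _ in range(max_iter):
        i = rng.integers(l)                 # random label i_k
        alpha = rng.uniform()               # random threshold alpha_k
        newly = (labels == -1) & (u[:, i] >= alpha)
        labels[newly] = i
        if (labels >= 0).all():             # all points assigned
            break
    return labels
```

Since every row of the simplex-valued input has a component of size at least $1/l$, each point is assigned after a geometrically distributed number of draws, so the early termination is reached quickly in practice.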
In order to define the parametrized rounding step $u \mapsto \bar u_\gamma$, we observe that once $U^{k'} = \emptyset$ occurs for some $k'$, the sequence $(u^k)$ becomes stationary at $u^{k'}$. In this case the algorithm may be terminated, with output $u^{k'}$:
Definition 1
Let $u \in \mathcal{C}$ and $\gamma \in \Gamma$. If $U^{k'} = \emptyset$ holds in Alg. 1 for some $k' \in \mathbb{N}$, we denote $k(\gamma) := \min \{ k' \mid U^{k'} = \emptyset \}$. We define
(45) $\bar u_\gamma := \begin{cases} u^{k(\gamma)}, & \text{if } U^{k'} = \emptyset \text{ for some } k' \in \mathbb{N}, \\ u, & \text{otherwise.} \end{cases}$
We denote by $\bar u_\gamma$ also the corresponding random variable induced by assuming $\gamma$ to be uniformly distributed on $\Gamma$.
As indicated above, $\bar u_\gamma$ is well-defined: if $U^{k'} = \emptyset$ for some $k'$, then $u^k = u^{k'}$ for all $k \geq k'$. Instead of focusing on local properties of the random sequence as in the proofs for the finite-dimensional case, we derive our results directly for the sequence $(f(u^k))$. In particular, we show that the expectation of $f(\bar u_\gamma)$ over all sequences $\gamma$ can be bounded according to
(46) $\mathbb{E}_\gamma\, f(\bar u_\gamma) \leq C\, f(u)$
for some $C \geq 1$, cf. (43). Consequently, the rounding process may only increase the average objective in a controlled way.
3.2 Termination Properties
Theoretically, the algorithm may produce a sequence $(u^k)$ that does not become stationary, or becomes stationary with a limit that is not an element of $\mathcal{C}_{\mathcal{E}}$. In Thm. 3.1 below we show that this happens only with zero probability, i.e., almost surely Alg. 1 generates (in a finite number of iterations) an integral labeling function $\bar u_\gamma \in \mathcal{C}_{\mathcal{E}}$. The following two propositions are required for the proof.
Proposition 1
For the sequence $(U^k)$ generated by Algorithm 1,
(47) $\mathbb{P}(x \in U^k) \leq \left(1 - \tfrac{1}{l}\right)^{\!k} \quad \text{for a.e. } x \in \Omega$
holds. In particular,
(48) $\lim_{k \to \infty} \mathbb{E}\big[\mathcal{L}^d(U^k)\big] = 0.$
Proof
Denote by $n_i$ the number of $k' \leq k$ such that $i_{k'} = i$, i.e., the number of times label $i$ was selected up to and including the $k$-th step. Then each sequence of label choices $(i_1, \ldots, i_k)$ occurs with probability
(49) $\mathbb{P}(i_1, \ldots, i_k) = l^{-k},$
i.e., the probability of a specific instance $(n_1, \ldots, n_l)$ is
(50) $\mathbb{P}(n_1, \ldots, n_l) = \binom{k}{n_1\ \cdots\ n_l}\, l^{-k}.$
Therefore,
(52) $\mathbb{P}(x \in U^k) = \sum_{n_1 + \ldots + n_l = k} \mathbb{P}(n_1, \ldots, n_l)\, \mathbb{P}(x \in U^k \mid n_1, \ldots, n_l).$
Since $x \in U^k$ is a sufficient condition for $M^k_i \geq u_i(x)$ for all $i$, we may bound the probability according to
(53) $\mathbb{P}(x \in U^k \mid n_1, \ldots, n_l) \leq \mathbb{P}\big(M^k_i \geq u_i(x)\ \forall i \,\big|\, n_1, \ldots, n_l\big).$
We now consider the distributions of the components of $M^k$ conditioned on the vector $(n_1, \ldots, n_l)$. Given $(n_1, \ldots, n_l)$, the probability of $M^k_i \geq u_i(x)$ is the probability that in each of the $n_i$ steps where label $i$ was selected the threshold was randomly chosen to be at least as large as $u_i(x)$. We conclude
(54) $\mathbb{P}\big(M^k_i \geq u_i(x) \,\big|\, n_1, \ldots, n_l\big) = \prod_{k' \,:\, i_{k'} = i} \mathbb{P}\big(\alpha_{k'} \geq u_i(x)\big)$
(55) $= \mathbb{P}\big(\alpha_1 \geq u_i(x)\big)^{n_i}$
(56) $= (1 - u_i(x))^{n_i}.$
The above formulation also covers the case $n_i = 0$ (note that we assumed $u_i(x) \in [0, 1]$). For fixed $k$ the distributions of the $M^k_i$ are independent when conditioned on $(n_1, \ldots, n_l)$. Therefore we obtain from (53) and (56)
(58) $\mathbb{P}(x \in U^k) \leq \sum_{n_1 + \ldots + n_l = k} \mathbb{P}(n_1, \ldots, n_l) \prod_{i=1}^l (1 - u_i(x))^{n_i}.$
Expanding the product and swapping the summation order, we derive
(61) $\mathbb{P}(x \in U^k) \leq l^{-k} \sum_{n_1 + \ldots + n_l = k} \binom{k}{n_1\ \cdots\ n_l} \prod_{i=1}^l (1 - u_i(x))^{n_i}.$
Using the multinomial summation formula, we conclude
(62) $\mathbb{P}(x \in U^k) \leq \left( \frac{1}{l} \sum_{i=1}^l (1 - u_i(x)) \right)^{\!k} = \left(1 - \frac{1}{l}\right)^{\!k},$
which proves (47); here we used $\sum_{i=1}^l u_i(x) = 1$, and note that in (62) the $n_i$ do not occur explicitly anymore. To show the second assertion (48), we use the fact that, for a.e. $x \in \Omega$, $\mathbb{P}(x \in U^k)$ can be bounded by $(1 - 1/l)^k$. Therefore
(63) $\mathbb{E}\big[\mathcal{L}^d(U^k)\big] = \int_\Omega \mathbb{P}(x \in U^k)\, dx$
(64) $\leq \left(1 - \frac{1}{l}\right)^{\!k} \mathcal{L}^d(\Omega)$
(65) $\xrightarrow{k \to \infty} 0,$
which proves (48).
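The decay predicted by Proposition 1 can be observed numerically. The following small simulation (discrete point set, uniform label and threshold draws as in Alg. 1; sizes and seed are arbitrary choices) compares the average unassigned fraction after $k$ steps with the bound $(1 - 1/l)^k$:

```python
import numpy as np

rng = np.random.default_rng(1)
npoints, l, k, trials = 200, 3, 10, 2000

# Random simplex-valued labeling: rows normalized to sum to one.
u = rng.random((npoints, l))
u /= u.sum(axis=1, keepdims=True)

# Estimate P(x in U^k): the fraction of points still unassigned after
# k random (label, threshold) draws, averaged over independent trials.
unassigned_frac = 0.0
for _ in range(trials):
    assigned = np.zeros(npoints, dtype=bool)
    for _ in range(k):
        i = rng.integers(l)
        alpha = rng.uniform()
        assigned |= u[:, i] >= alpha
    unassigned_frac += np.mean(~assigned) / trials

bound = (1 - 1 / l) ** k  # right-hand side of (47)
print(unassigned_frac, bound)
```

Since the rows sum to one, a fixed point is assigned at each draw with probability exactly $1/l$, so the estimate matches the bound up to Monte Carlo error.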
We now show that Alg. 1 generates a sequence in $\mathrm{BV}$ almost surely. The perimeter of a set $S$ is defined as the total variation of its characteristic function $\chi_S$.
Proposition 2
For the sequences $(u^k)$, $(U^k)$ generated by Alg. 1, define
(66) 
Then
(67) 
If $U^{k'}$ has finite perimeter for all $k'$, then $u^{k'} \in \mathrm{BV}(\Omega)^l$ for all $k'$ as well. Moreover,
(68) 
i.e., the algorithm almost surely generates a sequence of functions $u^k$ of bounded variation and a sequence of sets $U^k$ of finite perimeter.
Proof
We first show that if $U^{k'}$ has finite perimeter for all $k'$, then $u^{k'} \in \mathrm{BV}(\Omega)^l$ for all $k'$ as well. For $k' = 0$, the assertion holds, since $u^0 = u \in \mathrm{BV}(\Omega)^l$ by assumption. For $k' \geq 1$,
(69) $u^{k'} = \chi_{\Omega \setminus (U^{k'-1} \setminus U^{k'})}\, u^{k'-1} + \chi_{U^{k'-1} \setminus U^{k'}}\, e^{i_{k'}}.$
Since $U^{k'-1}$ and $U^{k'}$ are assumed to have finite perimeter, $U^{k'-1} \setminus U^{k'}$ also has finite perimeter. Applying (Ambrosio2000, Thm. 3.84) together with the boundedness of $u^{k'-1}$ and $e^{i_{k'}}$ by induction then provides $u^{k'} \in \mathrm{BV}(\Omega)^l$.
We now denote the event that all sets $U^k$ have finite perimeter by
(70) $E := \{ \gamma \in \Gamma \mid \mathrm{Per}(U^k) < \infty \text{ for all } k \in \mathbb{N}_0 \},$
and the event that the first set with non-finite perimeter is encountered at step $k$ by
(71) $E_k := \{ \gamma \in \Gamma \mid \mathrm{Per}(U^k) = \infty,\ \mathrm{Per}(U^{k'}) < \infty \text{ for all } k' < k \}.$
Then
(72) $\Gamma \setminus E = \bigcup_{k \in \mathbb{N}_0} E_k.$
As the sets $E_k$ are pairwise disjoint, and due to the countable additivity of the probability measure, we have
(73) $\mathbb{P}(\Gamma \setminus E) = \sum_{k \in \mathbb{N}_0} \mathbb{P}(E_k).$
Now $U^0 = \Omega$ has finite perimeter, therefore $E_0 = \emptyset$ and $\mathbb{P}(E_0) = 0$. For $k \geq 1$, we have
(74)
By the argument from the beginning of the proof, we know that under the condition on the perimeter $u^{k-1} \in \mathrm{BV}(\Omega)^l$; therefore from (Ambrosio2000, Thm. 3.40) we conclude that the perimeter of the thresholded set is finite for