Sublabel-Accurate Discretization of Nonconvex Free-Discontinuity Problems

11/21/2016 · Thomas Möllenhoff et al. · Technische Universität München

In this work we show how sublabel-accurate multilabeling approaches can be derived by approximating a classical label-continuous convex relaxation of nonconvex free-discontinuity problems. This insight allows us to extend these sublabel-accurate approaches from total variation to general convex and nonconvex regularizations. Furthermore, it leads to a systematic approach to the discretization of continuous convex relaxations. We study the relationship to existing discretizations and to discrete-continuous MRFs. Finally, we apply the proposed approach to obtain a sublabel-accurate and convex solution to the vectorial Mumford-Shah functional and show in several experiments that it leads to more precise solutions using fewer labels.


1 Introduction

1.1 A class of continuous optimization problems

Many tasks, particularly in low-level computer vision, can be formulated as optimization problems over mappings u : Ω → Γ between a domain Ω and a range Γ. The energy functional is usually designed in such a way that the minimizing argument corresponds to a mapping with the desired solution properties. In classical discrete Markov random field (MRF) approaches, which we refer to as fully discrete optimization, Ω is typically a set of nodes (e.g., pixels or superpixels) and Γ a finite set of labels.

However, in many problems such as image denoising, stereo matching or optical flow, where Γ is naturally modeled as a continuum, this discretization into labels either entails unreasonably high memory demands when using a fine sampling, or it leads to a strong label bias when using a coarser sampling, see Figure 3. Furthermore, as jump discontinuities are ubiquitous in low-level vision (e.g., caused by object edges, occlusion boundaries, changes in albedo, shadows, etc.), it is important to model them in a meaningful manner. By restricting either Ω or Γ to a discrete set, one loses the ability to mathematically distinguish between continuous and discontinuous mappings.

Figure 3: The classical way to discretize continuous convex relaxations such as the vectorial Mumford-Shah functional [26] leads to solutions (shown in (b), top left) with a strong bias towards the chosen labels (here an equidistant sampling of the RGB space). This can be seen in the bottom-left part of the image, where the green color is truncated to the nearest label, which is gray. The proposed sublabel-accurate approximation of the continuous relaxation leads to bias-free solutions (shown in (b), bottom right).

Motivated by these two points, we consider fully-continuous optimization approaches, where the idea is to postpone the discretization of Ω and Γ as long as possible. The prototypical class of continuous optimization problems which we consider in this work are nonconvex free-discontinuity problems, inspired by the celebrated Mumford-Shah functional [4, 19]:

min_{u : Ω → Γ}  ∫_{Ω∖J_u} f(x, u(x), ∇u(x)) dx  +  ∫_{J_u} d(x, u⁻(x), u⁺(x), ν_u(x)) dH^{d−1}(x)    (1)

The first integral is defined on the region where u is continuous. The integrand f can be thought of as a combined data term and regularizer, where the regularizer can penalize variations in terms of the (weak) gradient ∇u. The second integral is defined on the (d−1)-dimensional discontinuity set J_u and penalizes jumps from u⁻ to u⁺ in unit normal direction ν_u. The appropriate function space for (1) are the special functions of bounded variation, SBV(Ω). These are functions of bounded variation (cf. Section 2 for a definition) whose distributional derivative can be decomposed into a continuous part and a jump part in the spirit of (1):

Du = ∇u · L^d + (u⁺ − u⁻) ν_u · H^{d−1}⌞J_u,    (2)

where L^d denotes the d-dimensional Lebesgue measure and H^{d−1}⌞J_u the (d−1)-dimensional Hausdorff measure restricted to the jump set J_u. For an introduction to functions of bounded variation and the study of existence of minimizers of (1) we refer the interested reader to [2].

Note that due to the possible nonconvexity of the integrand in its first two arguments, a surprisingly large class of low-level vision problems fits the general framework of (1). While (1) is a difficult nonconvex optimization problem, the state of the art are convex relaxations [1, 6, 9]. We give an overview of the idea behind the convex reformulation in Section 3.

Extensions to the vectorial setting, i.e., to vector-valued ranges, have been studied by Strekalovskiy in various works [12, 26, 27] and recently using the theory of currents by Windheuser and Cremers [29]. The case where the range is a manifold has been considered by Lellmann [17]. These advances have allowed a wide range of difficult vectorial and joint optimization problems to be solved within a convex framework.

1.2 Related work

The first practical implementation of (1) was proposed by Pock et al. [20], using a simple finite differencing scheme in both the domain and the range, which has remained the standard way to discretize convex relaxations. This leads to a strong label bias (see Figure 3 (b), top left) despite the initially label-continuous formulation.

In the MRF community, a related approach to overcome this label bias are discrete-continuous models (discrete domain, continuous range), pioneered by Zach [30, 31]. Most similar to the present work is the approach of Fix and Agarwal [11]. They derive the discrete-continuous approaches as a discretization of an infinite dimensional dual linear program. Their approach differs from ours, as we start from a different (nonlinear) infinite-dimensional optimization problem and consider a representation of the dual variables which enforces continuity. The recent work of Bach [3] extends the concept of submodularity from discrete to continuous domains, along with complexity estimates.

There are also continuous-discrete models, i.e., models where the range is discretized into labels but the domain is kept continuous [10, 16]. Recently, these spatially continuous multilabeling models have been extended to allow for so-called sublabel-accurate solutions [15, 18], i.e., solutions which lie between two labels. These are, however, limited to total variation regularization, due to the separate convexification of data term and regularizer. We show in this work that for general regularizers a joint convex relaxation is crucial.

Finally, while not the focus of this work, there are of course also fully-discrete approaches, among many [14, 25, 28], which inspired some of the continuous formulations.

1.3 Contribution

In this work, we propose an approximation strategy for fully-continuous relaxations which keeps the range continuous even after discretization (see Figure 3 (b), bottom right). We summarize our contributions as follows:

  • We generalize the work [18] from total variation to general convex and nonconvex regularization.

  • We prove (see Prop. 2 and Prop. 4) that different approximations to a convex relaxation of (1) give rise to existing relaxations [20] and [18]. We investigate the relationship to discrete-continuous MRFs in Prop. 5.

  • On the example of the vectorial Mumford-Shah functional [26] we show that our framework also yields sublabel-accurate formulations of extensions of (1).

2 Notation and preliminaries

We denote the Iverson bracket by ⟦·⟧. Indicator functions from convex analysis, which take on the values 0 and ∞, are denoted by δ. We denote by f* the convex conjugate of a function f. Let Ω ⊂ ℝ^d be a bounded open set. For a function u ∈ L¹(Ω), its total variation is defined by

TV(u) = sup { ∫_Ω u div φ dx : φ ∈ C_c^∞(Ω; ℝ^d), ‖φ‖_∞ ≤ 1 }.    (3)

The space of functions of bounded variation, i.e., of functions u ∈ L¹(Ω) for which TV(u) < ∞ (or, equivalently, for which the distributional derivative Du is a finite Radon measure), is denoted by BV(Ω) [2]. We write SBV(Ω) for functions whose distributional derivative admits the decomposition (2). For the rest of this work, we will make the following simplifying assumptions:

  • The Lagrangian f in (1) is separable, i.e.,

    f(x, u(x), ∇u(x)) = ρ(x, u(x)) + η(∇u(x)),    (4)

    for a possibly nonconvex data term ρ and a regularizer η which is convex in the gradient.

  • The jump regularizer d in (1) is isotropic and induced by a concave function κ:

    d(x, u⁻, u⁺, ν) = κ(|u⁺ − u⁻|),    (5)

    with κ(0) = 0.

  • The range Γ ⊂ ℝ is a compact interval. A minimal sketch of energies satisfying these assumptions is given below.
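
As a small illustration (not taken from the paper), the following defines a truncated-quadratic data term, a quadratic regularizer and a concave jump penalizer that satisfy the above assumptions; all function names and parameter values are hypothetical:

```python
import numpy as np

# A minimal sketch of energies satisfying the stated assumptions: a separable
# Lagrangian f(x, u, g) = rho(x, u) + eta(g) with a possibly nonconvex data
# term rho, a convex regularizer eta, and a concave jump penalizer kappa with
# kappa(0) = 0.  Names and parameters are illustrative only.

def rho(u, image_value, tau=0.1):
    """Nonconvex (truncated quadratic) data term rho(x, u)."""
    return np.minimum((u - image_value) ** 2, tau)

def eta(g, alpha=25.0):
    """Convex regularizer eta(grad u) acting on the weak gradient."""
    return alpha * g ** 2

def kappa(jump_height, lam=5.0, nu=0.5):
    """Concave jump penalizer kappa(|u^+ - u^-|) with kappa(0) = 0."""
    return np.minimum(lam * np.abs(jump_height), nu)
```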

3 The convex relaxation

Figure 5: The central idea behind the convex relaxation for problem (1) is to reformulate the functional in terms of the complete graph of u in the product space Ω × Γ. This procedure is often referred to as “lifting”, as one lifts the dimensionality of the problem.

In [1, 6, 9] the authors propose a convex relaxation for the problem (1). Their basic idea is to reformulate the energy (1) in terms of the complete graph of u, i.e., to lift the problem to one dimension higher as illustrated in Figure 5. The complete graph is defined as the (measure-theoretic) boundary of the characteristic function of the subgraph of u, given by:

𝟙_u(x, t) = 1 if u(x) > t, and 0 otherwise.    (6)

Furthermore, we write ν_G for the inner unit normal to the complete graph. It is shown in [1] that for u ∈ SBV(Ω) one has

(7)

with constraints on the dual variables φ = (φ_x, φ_t) given by

(8)
(9)

The functional (7) can be interpreted as the maximal flux of admissible vector fields φ through the cut given by the complete graph. The constraint set can be seen as capacity constraints on the flux field φ. This is reminiscent of constructions from the discrete optimization community [14]. The constraints (8) correspond to the first integral in (1) and the non-local constraints (9) to the jump penalization.

Using the fact that the distributional derivative of the subgraph indicator function can be written as

(10)

one can rewrite the energy (7) as

(11)

A convex formulation is then obtained by relaxing the set of admissible primal variables to a convex set:

(12)

This set can be thought of as the convex hull of the subgraph indicator functions. The final optimization problem is then a convex-concave saddle point problem given by:

(13)

In dimension one, this convex relaxation is tight [8, 9]. In more general settings, global optimality can be guaranteed by means of a thresholding theorem in the case of total variation regularization [7, 21]. If the primal solution to (13) is binary, the global optimum of (1) can be recovered simply by pointwise thresholding. If it is not binary, it is in general not clear how to obtain the globally optimal solution from the relaxed solution; a small sketch of the thresholding step is given at the end of this section. An a posteriori optimality bound relative to the global optimum of (1) for the thresholded solution can be computed by:

(14)

Using that bound, it has been observed that solutions are usually near globally optimal [26]. In the following section, we show how different discretizations of the continuous problem (13) lead to various existing lifting approaches and to generalizations of the recent sublabel-accurate continuous multilabeling approach [18].
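
As a small illustration of the pointwise thresholding step mentioned above, the following sketch assumes (hypothetically) that the relaxed lifted variable is stored as an array over pixels and labels, with values close to 1 below the graph and close to 0 above it; the array layout and the helper name are not from the paper:

```python
import numpy as np

def threshold_lifted_solution(v, labels, theta=0.5):
    """Pointwise thresholding of a relaxed lifted solution.

    Assumes (hypothetically) that v has shape (num_pixels, num_labels) and
    stores the relaxed subgraph indicator: v[p, i] ~ 1 where label i lies
    below the graph of u at pixel p and ~ 0 above it.
    """
    labels = np.asarray(labels)
    # index of the last label still below the thresholded graph, per pixel
    idx = np.clip((v > theta).sum(axis=1) - 1, 0, len(labels) - 1)
    return labels[idx]

# tiny usage example with 3 pixels and 5 labels
labels = np.linspace(0.0, 1.0, 5)
v = np.array([[1, 1, 0.8, 0.1, 0],     # graph between labels 2 and 3
              [1, 0.4, 0, 0, 0],       # graph between labels 0 and 1
              [1, 1, 1, 1, 0.6]])      # graph near the last label
print(threshold_lifted_solution(v, labels))
```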

4 Sublabel-accurate discretization

Figure 9: Overview of the notation and proposed finite dimensional approximation spaces.
Figure 12: Left: piecewise constant dual variables lead to a linear approximation (shown in black) of the original cost function (shown in red). The unaries are determined through min-pooling of the continuous cost in the Voronoi cells around the labels. Right: continuous piecewise linear dual variables convexify the costs on each interval.

4.1 Choice of primal and dual mesh

In order to discretize the relaxation (13), we partition the range Γ into intervals between consecutive labels γ_1 < γ_2 < … < γ_k. The individual intervals form a one-dimensional simplicial complex (see, e.g., [13]), and their union is Γ. The points γ_i are also referred to as labels. We assume that the labels are equidistantly spaced with label distance h. The theory also generalizes to non-uniformly spaced labels, as long as the spacing is homogeneous in the spatial variable.

The mesh for the dual variables is given by the dual complex, which is formed by intervals around each label with nodes at the midpoints of the primal intervals. An overview of the notation and the considered finite dimensional approximations is given in Figure 9; a small numerical sketch of the two meshes follows below.
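
The following snippet illustrates one plausible construction of the primal and dual label meshes described above (the clipping of the boundary dual cells to Γ is an assumption for illustration, not necessarily the paper's exact convention):

```python
import numpy as np

# A sketch of the primal and dual 1D meshes, assuming an equidistant label
# grid on Gamma = [gamma_min, gamma_max].
gamma_min, gamma_max, k = 0.0, 1.0, 5
labels = np.linspace(gamma_min, gamma_max, k)      # gamma_1, ..., gamma_k
h = labels[1] - labels[0]                          # label distance

# primal complex: intervals [gamma_i, gamma_{i+1}]
primal_cells = np.stack([labels[:-1], labels[1:]], axis=1)

# dual complex: cells centered at the labels, clipped to Gamma at the boundary
dual_nodes = 0.5 * (labels[:-1] + labels[1:])      # midpoints of primal cells
dual_cells = np.stack([np.concatenate(([gamma_min], dual_nodes)),
                       np.concatenate((dual_nodes, [gamma_max]))], axis=1)

print(primal_cells)
print(dual_cells)
```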

4.2 Representation of the primal variable

As the subgraph indicator is a discontinuous jump function, we consider a piecewise constant approximation for the lifted primal variable,

(15)

see Figure 9 (a). Due to the boundary conditions in Eq. (12), we fix the primal variable outside of Γ to its two constant boundary values. Note that the non-decreasing constraint in the label direction is implicitly realized, as the dual variable can be arbitrarily large.

For the coefficients of this representation we have

(16)

As an example of this representation, consider the approximation of u at a point x as shown in Figure 5:

(17)

This leads to the sublabel-accurate representation also considered in [18]. In that work, the representation from the above example (17) encodes a convex combination between two neighboring labels with an interpolation factor between 0 and 1. Here it is motivated from a different perspective: we take a finite dimensional subspace approximation of the infinite dimensional optimization problem (13).
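
A concrete numerical sketch of this lifting of a scalar value into the piecewise constant sublabel representation (the function name and the clipping-based construction are illustrative assumptions, not the paper's code):

```python
import numpy as np

def lift_value(u_val, labels):
    """Lift a scalar value u(x) into the piecewise constant (sublabel)
    representation: coefficients are 1 for label cells entirely below u(x),
    0 for cells above, and a fractional value on the cell containing u(x).
    Illustrative sketch following the convex-combination reading of [18]."""
    labels = np.asarray(labels)
    h = labels[1] - labels[0]
    v = np.zeros(len(labels) - 1)
    for i in range(len(labels) - 1):
        # fraction of the interval [labels[i], labels[i+1]] lying below u_val
        v[i] = np.clip((u_val - labels[i]) / h, 0.0, 1.0)
    return v

labels = np.linspace(0.0, 1.0, 5)   # labels 0, 0.25, 0.5, 0.75, 1
print(lift_value(0.6, labels))      # -> [1, 1, 0.4, 0]
```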

4.3 Representation of the dual variables

4.3.1 Piecewise constant

The simplest discretization of the dual variable is to pick a piecewise constant approximation on the dual intervals, as shown in Figure 9 (b). The basis functions are given by

(18)

As the dual variable is a vector field on Ω × Γ, the basis functions vanish outside of Γ. For the coefficient functions of the spatial and the label component we have:

(19)

To avoid notational clutter, we dropped the spatial dependence in (19) and will do so also in the following derivations. Note that for the spatial component we chose the same piecewise constant approximation as for the label component, as we keep the model continuous in the spatial variable and ultimately discretize it using finite differences.

Discretization of the constraints

In the following, we plug the finite dimensional approximations into the constraints of the admissible dual set. We start by reformulating the constraints in (8). Taking the infimum over the gradient variable, they can be equivalently written as:

(20)

Plugging the approximation (19) into the above leads to the following constraints on the coefficients:

(21)

These constraints can be seen as a min-pooling of the continuous unary potentials in a symmetric region centered on each label. To see this more easily, assume a one-homogeneous regularizer, so that its convex conjugate vanishes on its domain. Then two consecutive constraints from (21) can be combined into one in which the infimum of the data term is taken over an interval centered at the label. This leads to capacity constraints for the flow in the vertical direction of the form

(22)

as well as similar constraints on the remaining dual coefficients. The effect of this on a nonconvex energy is shown in Figure 12 on the left; a small numerical sketch of the min-pooling is given below. The constraints (21) are convex inequality constraints, which can be implemented using standard proximal optimization methods and orthogonal projections onto the epigraph as described in [21, Section 5.3].
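
A numerical sketch of the min-pooled unaries described above; the infimum over each Voronoi cell is approximated here by dense sampling, and the example data term is hypothetical:

```python
import numpy as np

def min_pooled_unaries(rho, labels, samples_per_cell=64):
    """Min-pooling of a continuous data term rho over symmetric regions
    around each label (the Voronoi cells of the labels, clipped to Gamma).
    The infimum is approximated numerically by dense sampling."""
    labels = np.asarray(labels)
    mids = 0.5 * (labels[:-1] + labels[1:])
    lo = np.concatenate(([labels[0]], mids))
    hi = np.concatenate((mids, [labels[-1]]))
    pooled = np.empty(len(labels))
    for i in range(len(labels)):
        t = np.linspace(lo[i], hi[i], samples_per_cell)
        pooled[i] = rho(t).min()
    return pooled

# example: nonconvex data term with a minimum that falls between two labels
rho = lambda t: np.minimum((t - 0.37) ** 2, 0.05)
labels = np.linspace(0.0, 1.0, 5)
print(min_pooled_unaries(rho, labels))
```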

For the second part of the constraint set (9) we insert again the finite-dimensional representation (19) to arrive at:

(23)

These are infinitely many constraints, but, similarly to [18], they can be implemented with finitely many constraints.

Proposition 1.

For concave κ with κ(0) = 0, the constraints (23) are equivalent to

(24)
Proof.

Proofs are given in the appendix. ∎

This proposition reveals that only information from the labels enters the jump regularizer. For jumps smaller than the label distance we therefore expect all regularizers to behave like the total variation. A small sketch of the resulting finitely many constraints is given below.
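
To illustrate Prop. 1, the following sketch checks a finite family of pairwise label constraints of the kind described above; the exact index ranges and scaling in the paper may differ, so this is purely illustrative:

```python
import numpy as np

def jump_constraints_satisfied(phi_x, labels, kappa, tol=1e-9):
    """Sketch of the finitely many jump constraints suggested by Prop. 1:
    for every pair of labels i < j, the (piecewise constant) integral of the
    spatial dual component over [gamma_i, gamma_j] must not exceed
    kappa(gamma_j - gamma_i).  Illustrative only."""
    labels = np.asarray(labels)
    h = labels[1] - labels[0]
    k = len(labels)
    for i in range(k):
        for j in range(i + 1, k):
            integral = h * np.sum(phi_x[i:j])
            if abs(integral) > kappa(labels[j] - labels[i]) + tol:
                return False
    return True

labels = np.linspace(0.0, 1.0, 5)
kappa = lambda a: np.minimum(2.0 * a, 0.4)          # concave, kappa(0) = 0
phi_x = np.array([0.3, 0.3, 0.3, 0.3])              # one coefficient per cell
print(jump_constraints_satisfied(phi_x, labels, kappa))
```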

Discretization of the energy

For the discretization of the saddle point energy (13) we apply the divergence theorem

(25)

and then discretize the divergence by inserting the piecewise constant representations of the primal and dual variables:

(26)

The discretization of the remaining parts of the divergence is given as follows:

(27)

where the spatial derivatives are ultimately discretized using standard finite differences. It turns out that the above discretization can be related to the one from [20]:

Proposition 2.

For a convex one-homogeneous regularizer, the discretization with piecewise constant primal and dual variables leads to the traditional discretization as proposed in [20], except with min-pooled instead of sampled unaries.
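
The standard finite differences mentioned before Prop. 2 can be sketched as follows; this is a generic forward-difference scheme with its adjoint divergence, not tied to the paper's exact stencil:

```python
import numpy as np

def grad(u):
    """Forward differences with Neumann boundary conditions (generic sketch)."""
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def divergence(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    div = np.zeros_like(px)
    div[0, :]  += px[0, :];  div[1:-1, :] += px[1:-1, :] - px[:-2, :]
    div[-1, :] += -px[-2, :]
    div[:, 0]  += py[:, 0];  div[:, 1:-1] += py[:, 1:-1] - py[:, :-2]
    div[:, -1] += -py[:, -2]
    return div

# adjoint check: <grad u, p> == -<u, div p>
rng = np.random.default_rng(0)
u = rng.standard_normal((6, 7)); px, py = rng.standard_normal((2, 6, 7))
gx, gy = grad(u)
print(np.allclose((gx * px + gy * py).sum(), -(u * divergence(px, py)).sum()))
```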

4.3.2 Piecewise linear

As the dual variables in the admissible set are continuous vector fields, a more faithful approximation is given by a continuous piecewise linear representation of the label component:

(28)

These basis functions are shown in Figure 9 (c), and we set:

(29)

Note that the piecewise linear dual representation considered by Fix and Agarwal in [11] differs in this point, as they do not ensure a continuous representation. Unlike the proposed approach, their approximation is therefore not a true subspace of the original infinite dimensional function space.

Discretization of the constraints

We start from the reformulation (20) of the original constraints (8). With the representation (29) for the label component and (19) for the spatial component of the dual variable, we have:

(30)

While the constraints (30) seem difficult to implement, they can be reformulated in a simpler way involving convex conjugates of the data term.

Proposition 3.

The constraints (30) can be equivalently reformulated by introducing additional auxiliary variables:

(31)

with .

The constraints (31) are implemented by projections onto the epigraphs of the involved conjugate functions, as they can be written as:

(32)

Epigraphical projections for quadratic and piecewise linear data terms are described in [18]. In Section 5.1 we describe how to implement piecewise quadratic data terms. As the convex conjugate of the data term restricted to each label interval enters the constraints, it becomes clear that this discretization only sees the convexified unaries on each interval, see also the right part of Figure 12.
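
One elementary building block for such epigraphical projections is the Euclidean projection onto the epigraph of a one-dimensional quadratic; the following sketch (an illustration under stated assumptions, not the implementation used in [18] or prost) solves the resulting cubic optimality condition numerically:

```python
import numpy as np

def project_epigraph_parabola(x0, y0, a):
    """Euclidean projection of (x0, y0) onto {(x, y) : y >= a * x**2}, a > 0.
    The optimality condition reduces to a cubic in the x-coordinate of the
    projected point.  Illustrative sketch only."""
    if y0 >= a * x0 ** 2:
        return x0, y0                       # already feasible
    # minimize (x - x0)^2 + (a x^2 - y0)^2  ->  2a^2 x^3 + (1 - 2a y0) x - x0 = 0
    roots = np.roots([2 * a ** 2, 0.0, 1.0 - 2 * a * y0, -x0])
    real = roots[np.abs(roots.imag) < 1e-8].real
    dists = (real - x0) ** 2 + (a * real ** 2 - y0) ** 2
    x = real[np.argmin(dists)]
    return x, a * x ** 2

print(project_epigraph_parabola(1.0, -1.0, a=1.0))
```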

Discretization of the energy

It turns out that the piecewise linear representation of the label component leads to the same discrete bilinear saddle point term as (26). The other term remains unchanged, as we pick the same representation of the spatial component.

Relation to existing approaches

In the following we point out the relationship of the approximation with piecewise linear dual variables to the sublabel-accurate multilabeling approaches [18] and to the discrete-continuous MRFs [31].

Proposition 4.

The discretization with a piecewise linear label component and a piecewise constant spatial component of the dual variable, together with the choice of total variation regularization, is equivalent to the relaxation [18].

Thus we extend the relaxation proposed in [18] to more general regularizations. The relaxation [18] was derived starting from a discrete label space and involved a separate relaxation of data term and regularizer. To see this, first note that the convex conjugate of a convex one-homogeneous function is the indicator function of a convex set [23, Corollary 13.2.1]. Then the constraints (8) can be written as

(33)
(34)

where (33) is the data term and (34) the regularizer. This provides an intuition for why the separate convex relaxation of data term and regularizer in [18] worked well. However, for general choices of the regularizer, a joint relaxation of data term and regularizer as in (30) is crucial. The next proposition establishes the relationship between the data term from [31] and the one from [18].

Proposition 5.

The data term from [18] (which is in turn a special case of the discretization with a piecewise linear label component of the dual variable) can be (pointwise) brought into the primal form

(35)

where the linear operator in the constraint is a discretized integration operator.

The data term of Zach and Kohli [31] is precisely given by (35), except that the optimization is performed directly on the discrete-continuous variables. The first variable can be interpreted as a 1-sparse indicator of the active interval and the second as a sublabel offset. The constraint connects this representation to the subgraph representation via the discretized integration operator (see appendix for the definition). For general regularizers, the discretization with piecewise linear dual variables differs from [18], as we perform a joint convexification of data term and regularizer, and from [31], as we consider the spatially continuous setting. Another important question to ask is which primal formulation is actually optimized after discretization with piecewise linear dual variables. In particular, the distinction between jump and smooth regularization only makes sense for continuous label spaces, so it is interesting to see what is optimized after discretizing the label space.

Proposition 6.

Let the number of labels be minimal, i.e., consider only two labels. The approximation of the continuous optimization problem (13) with a piecewise linear label component and a piecewise constant spatial component of the dual variable is then equivalent to

(36)

where □ denotes the infimal convolution (cf. [23, Section 5]).

From Proposition 6 we see that the minimal discretization amounts to approximating problem (1) by globally convexifying the data term. Furthermore, we can see that Mumford-Shah (truncated quadratic) regularization is approximated by a convex Huber regularizer in this case. This is because the infimal convolution between a quadratic and a scaled absolute value corresponds to the Huber function, as recalled below. While even with the minimal number of labels this is a reasonable approximation to the original model (1), we can gradually increase the number of labels to obtain an increasingly faithful approximation of the original nonconvex problem.
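
For completeness, the standard identity behind this observation (the symbols α and λ are chosen here for illustration, not taken from the paper):

```latex
% Infimal convolution of a quadratic and a scaled absolute value yields the
% Huber function:
\left(\tfrac{\alpha}{2}(\cdot)^2 \,\Box\, \lambda|\cdot|\right)(t)
  = \min_{s}\ \tfrac{\alpha}{2}s^2 + \lambda|t-s|
  = \begin{cases}
      \tfrac{\alpha}{2}t^2, & |t| \le \lambda/\alpha,\\[2pt]
      \lambda|t| - \tfrac{\lambda^2}{2\alpha}, & |t| > \lambda/\alpha.
    \end{cases}
```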

4.3.3 Piecewise quadratic

For a piecewise quadratic label component of the dual variable, the main difficulty lies in the constraints in (20). For a piecewise linear component, the infimum over a linear function plus the data term leads to (minus) the convex conjugate of the data term. Quadratic dual variables instead lead to so-called generalized conjugates [24, Example 11.66]. Such conjugates were also considered theoretically in the recent work [11] for discrete-continuous MRFs; however, an efficient implementation seems challenging. The advantage of this representation would be that one could avoid the convexification of the unaries on each interval and thus obtain a tighter approximation. While in principle the resulting constraints could be implemented using techniques from convex algebraic geometry and semi-definite programming [5], we leave this direction open to future work.

5 Implementation and extensions

Figure 21: To verify the tightness of the approximation, we optimize a convex problem (quadratic data term with quadratic regularization). The discretization with piecewise linear dual variables recovers the exact solution with the minimal number of labels and remains tight (numerically) as the number of labels increases, while the traditional discretization from [21] leads to a strong label bias.
Figure 27: Joint segmentation and stereo matching. (b), (c) Using the proposed discretization we arrive at smooth solutions using a moderate discretization of the four-dimensional RGB-D label space. (d), (e) When using such a coarse sampling of the label space, the classical discretization used in [26] leads to a strong label bias. Note that with the proposed approach, a piecewise constant segmentation as in (d) could also be obtained by increasing the smoothness parameter.

5.1 Piecewise quadratic unaries

In some applications such as robust fusion of depth maps, the data term has a piecewise quadratic form:

(37)

The intervals on which the above function is quadratic are determined by its breakpoints. In order to optimize this data term within our framework, we need to compute its convex conjugate on each label interval, see Eq. (31). We can write the data term (37) on each label interval as

(38)

where the number of pieces and their subintervals are determined by the breakpoints and the interval boundaries. The convex conjugate is then given by the pointwise maximum of the conjugates of the individual quadratic pieces. As the epigraph of a pointwise maximum is the intersection of the epigraphs, the constraints for the data term can be broken down into:

(39)

The projection onto the epigraphs of the individual conjugates is carried out as described in [18]. Such a convexified piecewise quadratic function is shown on the right in Figure 12; a small sketch of the per-piece conjugates is given below.
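
A minimal numerical sketch of these per-piece conjugates (the function signature and the example breakpoints are hypothetical):

```python
import numpy as np

def conjugate_quadratic_on_interval(v, a, b, c, lo, hi):
    """Convex conjugate of q(t) = a t^2 + b t + c (a > 0) restricted to [lo, hi]:
    q*(v) = max_{t in [lo, hi]} v t - q(t).  The unconstrained maximizer
    t = (v - b) / (2a) is simply clipped to the interval."""
    t = np.clip((v - b) / (2.0 * a), lo, hi)
    return v * t - (a * t ** 2 + b * t + c)

# the conjugate of the full piecewise quadratic data term on one label interval
# is then the pointwise maximum of the per-piece conjugates
v = np.linspace(-2.0, 2.0, 5)
pieces = [(1.0, 0.0, 0.0, 0.0, 0.5), (1.0, -1.0, 0.25, 0.5, 1.0)]
conj = np.max([conjugate_quadratic_on_interval(v, *p) for p in pieces], axis=0)
print(conj)
```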

Figure 35: Denoising of a synthetic piecewise smooth image degraded with Gaussian noise. The standard discretization of the vectorial Mumford-Shah functional shows a strong bias towards the chosen labels (see also Figure 38), while the proposed discretization has no bias and leads to the highest overall peak signal-to-noise ratio (PSNR).
Figure 38: A 1D slice through the resulting image in Figure 35. The discretization [26] (left) shows a strong bias towards the labels, while the proposed discretization (right) yields a sublabel-accurate solution.

5.2 The vectorial Mumford-Shah functional

Recently, the free-discontinuity problem (1) has been generalized to vectorial functions by Strekalovskiy et al. [26]. The model they propose is

(40)

which consists of a separable data term and a separable regularization of the continuous part. The individual channels are coupled through the jump regularizer, which penalizes the joint jump set across all channels. Using the same strategy as in Section 4, applied to the relaxation described in [26, Section 3], a sublabel-accurate representation of the vectorial Mumford-Shah functional can be obtained.

5.3 Numerical solution

We solve the final finite dimensional optimization problem, after a finite-difference discretization in the spatial direction, using the primal-dual algorithm [20] implemented in the convex optimization framework prost (https://github.com/tum-vision/prost). The structure of such a primal-dual iteration is sketched below on a toy problem.
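
To convey the structure of the primal-dual iterations without the lifting machinery, here is a generic Chambolle-Pock type sketch on a 1D total variation denoising toy problem; this is not the prost implementation, and the step sizes and toy energy are illustrative:

```python
import numpy as np

def pdhg_tv_denoise_1d(f, lam, iters=500):
    """Primal-dual sketch for min_u 0.5*||u - f||^2 + lam*||D u||_1 in 1D,
    illustrating the kind of saddle point iterations used above."""
    n = len(f)
    u = f.copy()
    p = np.zeros(n - 1)                  # dual variable for D u
    u_bar = u.copy()
    L = 2.0                              # upper bound on ||D||
    tau = sigma = 1.0 / L
    for _ in range(iters):
        # dual ascent + projection onto the l_inf ball of radius lam
        p = np.clip(p + sigma * np.diff(u_bar), -lam, lam)
        # primal descent: prox of 0.5*||u - f||^2, with D^T p applied explicitly
        dtp = np.concatenate(([-p[0]], p[:-1] - p[1:], [p[-1]]))
        u_new = (u - tau * dtp + tau * f) / (1.0 + tau)
        u_bar = 2 * u_new - u
        u = u_new
    return u

rng = np.random.default_rng(1)
f = np.concatenate((np.zeros(20), np.ones(20))) + 0.1 * rng.standard_normal(40)
print(pdhg_tv_denoise_1d(f, lam=0.5)[:5])
```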

6 Experiments

6.1 Exactness in the convex case

We validate our discretization in Figure 21 on a convex problem with a quadratic data term and quadratic regularization. The global minimizer of this problem can be computed exactly by solving the corresponding linear optimality system; a sketch of such a reference solve is given below. With piecewise linear dual variables we recover the exact solution using only the minimal number of labels, and remain (experimentally) exact as we increase the number of labels. The discretization from [21] shows a strong label bias due to the piecewise constant dual variables. Even with many labels, its solution differs from the ground truth energy.
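
A sketch of how such an exact reference solution can be computed for a convex quadratic-quadratic toy problem; the 1D setup and parameter values here are illustrative and need not match the experiment:

```python
import numpy as np

def exact_quadratic_minimizer_1d(f, alpha):
    """Exact global minimizer of min_u 0.5*||u - f||^2 + 0.5*alpha*||D u||^2
    in 1D, obtained by solving the linear optimality system
    (I + alpha * D^T D) u = f."""
    n = len(f)
    D = np.diff(np.eye(n), axis=0)            # forward-difference matrix
    A = np.eye(n) + alpha * D.T @ D
    return np.linalg.solve(A, f)

f = np.sin(np.linspace(0, np.pi, 50))
print(exact_quadratic_minimizer_1d(f, alpha=10.0)[:5])
```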

6.2 The vectorial Mumford-Shah functional

Joint depth fusion and segmentation

We consider the problem of joint image segmentation and robust depth fusion from [22] using the vectorial Mumford-Shah functional from Section 5.2. The data term for the depth channel is given by (37), built from the input depth hypotheses, a per-hypothesis depth confidence and a truncation parameter that provides robustness towards outliers. For the segmentation, we use a quadratic difference data term in RGB space. For Figure 27 we computed multiple depth hypotheses on a stereo pair using different matching costs (sum of absolute (gradient) differences, and normalized cross correlation) with varying patch radii. Even for a moderately sized label space we observe no label discretization artifacts.

The piecewise linear approximation of the unaries in [26] leads to an almost piecewise constant segmentation of the image. To highlight the sublabel accuracy of the proposed approach, we chose a small smoothness parameter, which leads to a piecewise smooth segmentation; with a higher smoothness term or a different choice of unaries, a piecewise constant segmentation could also be obtained.

Piecewise-smooth approximations

In Figure 35 we compare the discretizations for the vectorial Mumford-Shah functional. We see that the approach of [26] shows a strong label bias (see also Figures 38 and 3), while the discretization with piecewise linear duals leads to a sublabel-accurate result.

7 Conclusion

We proposed a framework to numerically solve fully-continuous convex relaxations in a sublabel-accurate fashion. The key idea is to implement the dual variables using a piecewise linear approximation. We prove that different choices of approximations for the dual variables give rise to various existing relaxations: in particular, piecewise constant duals lead to the traditional lifting [20] (with min-pooling of the unary costs), whereas piecewise linear duals lead to the sublabel lifting that was recently proposed for total variation regularized problems [18]. While the latter method is not easily generalized to other regularizers due to the separate convexification of data term and regularizer, the proposed representation generalizes to arbitrary convex and nonconvex regularizers, such as those of the scalar and the vectorial Mumford-Shah problems. The proposed approach provides a systematic technique to derive sublabel-accurate discretizations for continuous convex relaxation approaches, thereby boosting their memory and runtime efficiency for challenging large-scale applications.

Appendix

Proposition 1.

For concave κ with κ(0) = 0, the constraints

(41)

are equivalent to

(42)
Proof.

The implication (41) ⇒ (42) clearly holds. Let us now assume that the constraints (42) are fulfilled. We first show that the constraints (41) also hold in two special cases, starting with the first one:

(43)

In the same way, it can be shown that in the second special case we have:

(44)

We have shown the constraints to hold in both special cases. Finally, we show that they also hold in the remaining case:

(45)
(46)

Noticing that (42) is precisely (41) evaluated at pairs of labels completes the proof. ∎

Proposition 2.

For a convex one-homogeneous regularizer, the discretization with piecewise constant primal and dual variables leads to the traditional discretization as proposed in [20], except with min-pooled instead of sampled unaries.

Proof.

The constraints in [20, Eq. 18] have the form

(47)
(48)

The constraints (48) are equivalent to (42) up to a rescaling of the dual variables by the label distance. For the constraints (47) (cf. [20, Eq. 18]), the unaries are sampled at the labels. The discretization with piecewise constant duals leads to a similar form, except for a min-pooling over the dual intervals:

(49)

The similarity between (49) and (47) becomes more evident by assuming a convex one-homogeneous regularizer. Then (49) reduces to the following:

(50)

as well as

(51)

Proposition 3.

The constraints

(52)

can be equivalently reformulated by introducing additional auxiliary variables:

(53)

with .

Proof.

Rewriting the infimum in (52) as minus a convex conjugate, and rearranging the inequality, the constraints become:

(54)

To show that (54) and (53) are equivalent, we prove that they imply each other. Assume (54) holds. Then without loss of generality set