1 Introduction
1.1 A class of continuous optimization problems
Many tasks particularly in lowlevel computer vision can be formulated as optimization problems over mappings
between sets and . The energy functional is usually designed in such a way that the minimizing argument corresponds to a mapping with the desired solution properties. In classical discrete Markov random field (MRF) approaches, which we refer to as fully discrete optimization, is typically a set of nodes (e.g., pixels or superpixels) and a set of labels .However, in many problems such as image denoising, stereo matching or optical flow where is naturally modeled as a continuum, this discretization into labels can entail unreasonably high demands in memory when using a fine sampling, or it leads to a strong label bias when using a coarser sampling, see Figure 3
. Furthermore, as jump discontinuities are ubiquitous in lowlevel vision (e.g., caused by object edges, occlusion boundaries, changes in albedo, shadows, etc.), it is important to model them in a meaningful manner. By restricting either
or to a discrete set, one loses the ability to mathematically distinguish between continuous and discontinuous mappings.Motivated by these two points we consider fullycontinuous optimization approaches, where the idea is to postpone the discretization of and as long as possible. The prototypical class of continuous optimization problems which we consider in this work are nonconvex freediscontinuity problems, inspired by the celebrated MumfordShah functional [4, 19]:
(1)  
The first integral is defined on the region where is continuous. The integrand can be thought of as a combined data term and regularizer, where the regularizer can penalize variations in terms of the (weak) gradient . The second integral is defined on the dimensional discontinuity set and penalizes jumps from to in unit direction . The appropriate function space for (1) are the special functions of bounded variation. These are functions of bounded variation (cf. Section 2 for a defintion) whose distributional derivative can be decomposed into a continuous part and a jump part in the spirit of (1):
(2) 
where denotes the dimensional Lebesgue measure and the dimensional Hausdorff measure restricted to the jump set . For an introduction to functions of bounded variation and the study of existence of minimizers to (1) we refer the interested reader to [2].
Note that due to the possible nonconvexity of in the first two variables a surprisingly large class of lowlevel vision problems fits the general framework of (1). While (1) is a difficult nonconvex optimization problem, the stateoftheart are convex relaxations [1, 6, 9]. We give an overview of the idea behind the convex reformulation in Section 3.
Extensions to the vectorial setting, i.e., , have been studied by Strekalovskiy in various works [12, 26, 27] and recently using the theory of currents by Windheuser and Cremers [29]. The case when is a manifold has been considered by Lellmann [17]. These advances have allowed for a wide range of difficult vectorial and joint optimization problems to be solved within a convex framework.
1.2 Related work
The first practical implementation of (1) was proposed by Pock [20], using a simple finite differencing scheme in both and which has remained the standard way to discretize convex relaxations. This leads to a strong label bias (see Figure 3b), topleft) despite the initially labelcontinuous formulation.
In the MRF community, a related approach to overcome this labelbias are discretecontinuous models (discrete and continuous ), pioneered by Zach [30, 31]. Most similar to the present work is the approach of Fix and Agarwal [11]
. They derive the discretecontinuous approaches as a discretization of an infinite dimensional dual linear program. Their approach differs from ours, as we start from a different (nonlinear) infinitedimensional optimization problem and consider a representation of the dual variables which enforces continuity. The recent work of Bach
[3] extends the concept of submodularity from discrete to continuousalong with complexity estimates.
There are also continuousdiscrete models, i.e. the range is discretized into labels but is kept continuous [10, 16]. Recently, these spatially continuous multilabeling models have been extended to allow for socalled sublabel accurate solutions [15, 18], i.e., solutions which lie between two labels. These are, however, limited to total variation regularization, due to the separate convexification of data term and regularizer. We show in this work that for general regularizers a joint convex relaxation is crucial.
1.3 Contribution
2 Notation and preliminaries
We denote the Iverson bracket as . Indicator functions from convex analysis which take on values and are denoted by . We denote by the convex conjugate of . Let be a bounded open set. For a function its total variation is defined by
(3) 
The space of functions of bounded variation, i.e., for which (or equivalently for which the distributional derivative is a finite Radon measure) is denoted by [2]. We write for functions whose distributional derivative admits the decomposition (2). For the rest of this work, we will make the following simplifying assumptions:

The Lagrangian in (1) is separable, i.e.,
(4) for possibly nonconvex and regularizers which are convex in .

The range is a compact interval.
3 The convex relaxation
In [1, 6, 9] the authors propose a convex relaxation for the problem (1). Their basic idea is to reformulate the energy (1) in terms of the complete graph of , i.e. lifting the problem to one dimension higher as illustrated in Figure 5. The complete graph
is defined as the (measuretheoretic) boundary of the characteristic function of the subgraph
given by:(6) 
Furthermore we denote the inner unit normal to with . It is shown in [1] that for one has
(7) 
with constraints on the dual variables given by
(8)  
(9) 
The functional (7
) can be interpreted as the maximum flux of admissible vector fields
through the cut given by the complete graph . The set can be seen as capacity constraints on the flux field . This is reminiscent to constructions from the discrete optimization community [14]. The constraints (8) correspond to the first integral in (1) and the nonlocal constraints (9) to the jump penalization.Using the fact that the distributional derivative of the subgraph indicator function can be written as
(10) 
one can rewrite the energy (7) as
(11) 
A convex formulation is then obtained by relaxing the set of admissible primal variables to a convex set:
(12)  
This set can be thought of as the convex hull of the subgraph functions . The final optimization problem is then a convexconcave saddle point problem given by:
(13) 
In dimension one (), this convex relaxation is tight [8, 9]. For global optimality can be guaranteed by means of a thresholding theorem in case [7, 21]. If the primal solution to (13) is binary, the global optimum of (1) can be recovered simply by pointwise thresholding . If is not binary, in the general setting it is not clear how to obtain the global optimal solution from the relaxed solution. An a posteriori optimality bound to the global optimum of (1) for the thresholded solution can be computed by:
(14) 
Using that bound, it has been observed that solutions are usually near globally optimal [26]. In the following section, we show how different discretizations of the continuous problem (13) lead to various existing lifting approaches and to generalizations of the recent sublabelaccurate continuous multilabeling approach [18].
4 Sublabelaccurate discretization





4.1 Choice of primal and dual mesh
In order to discretize the relaxation (13), we partition the range into intervals. The individual intervals form a one dimensional simplicial complex (see e.g., [13]), and we have . The points are also referred to as labels. We assume that the labels are equidistantly spaced with label distance . The theory generalizes also to nonuniformly spaced labels, as long as the spacing is homogeneous in . Furthermore, we define and .
The mesh for dual variables is given by dual complex, which is formed by the intervals with nodes . An overview of the notation and the considered finite dimensional approximations is given in Figure 9.
4.2 Representation of the primal variable
As is a discontinuous jump function, we consider a piecewise constant approximation for ,
(15) 
see Figure 9a). Due to the boundary conditions in Eq. (12), we set outside of to on the left and on the right. Note that the nondecreasing constraint in is implicitly realized as can be arbitrarily large.
For coefficients we have
(16) 
As an example of this representation, consider the approximation of at point shown in Figure 5:
(17)  
This leads to the sublabelaccurate representation also considered in [18]. In that work, the representation from the above example (17) encodes a convex combination between the labels and
with interpolation factor
. Here it is motivated from a different perspective: we take a finite dimensional subspace approximation of the infinite dimensional optimization problem (13).4.3 Representation of the dual variables
4.3.1 Piecewise constant
The simplest discretization of the dual variable is to pick a piecewise constant approximation on the dual intervals as shown in Figure 9b): The functions are given by
(18) 
As is a vector field in , the functions vanish outside of . For coefficient functions and we have:
(19) 
To avoid notational clutter, we dropped in (19) and will do so also in the following derivations. Note that for we chose the same piecewise constant approximation as for , as we keep the model continuous in , and ultimately discretize it using finite differences in .
Discretization of the constraints
In the following, we will plug in the finite dimensional approximations into the constraints from the set . We start by reformulating the constraints in (8). Taking the infimum over they can be equivalently written as:
(20) 
Plugging in the approximation (19) into the above leads to the following constraints for :
(21)  
These constraints can be seen as minpooling of the continuous unary potentials in a symmetric region centered on the label . To see that more easily, assume onehomogeneous regularization so that on its domain. Then two consecutive constraints from (21) can be combined into one where the infimum of is taken over centered the label . This leads to capacity constraints for the flow in vertical direction of the form
(22) 
as well as similar constraints on and . The effect of this on a nonconvex energy is shown in Figure 12 on the left. The constraints (21) are convex inequality constraints, which can be implemented using standard proximal optimization methods and orthogonal projections onto the epigraph as described in [21, Section 5.3].
For the second part of the constraint set (9) we insert again the finitedimensional representation (19) to arrive at:
(23)  
where . These are infinitely many constraints, but similar to [18] these can be implemented with finitely many constraints.
Proposition 1.
For concave with , the constraints (23) are equivalent to
(24) 
Proof.
Proofs are given in the appendix. ∎
This proposition reveals that only information from the labels enters into the jump regularizer . For we expect all regularizers to behave like the total variation.
Discretization of the energy
For the discretization of the saddle point energy (13) we apply the divergence theorem
(25) 
and then discretize the divergence by inserting the piecewise constant representations of and :
(26)  
The discretization of the other parts of the divergence are given as the following:
(27) 
where the spatial derivatives are ultimately discretized using standard finite differences. It turns out that the above discretization can be related to the one from [20]:
Proposition 2.
For convex onehomogeneous the discretization with piecewise constant and leads to the traditional discretization as proposed in [20], except with minpooled instead of sampled unaries.
4.3.2 Piecewise linear
As the dual variables in are continuous vector fields, a more faithful approximation is given by a continuous piecewise linear approximation, given for as:
(28) 
They are shown in Figure 9c), and we set:
(29) 
Note that the piecewise linear dual representation considered by Fix in [11] differs in this point, as they do not ensure a continuous representation. Unlike the proposed approach their approximation does not take a true subspace of the original infinite dimensional function space.
Discretization of the constraints
We start from the reformulation (20) of the original constraints (8). With (29) for and (19) for , we have for :
(30)  
While the constraints (30) seem difficult to implement, they can be reformulated in a simpler way involving .
Proposition 3.
The constraints (30) can be equivalently reformulated by introducing additional variables , , where :
(31)  
with .
The constraints (31) are implemented by projections onto the epigraphs of and , as they can be written as:
(32) 
Epigraphical projections for quadratic and piecewise linear are described in [18]. In Section 5.1 we describe how to implement piecewise quadratic . As the convex conjugate of enters into the constraints, it becomes clear that this discretization only sees the convexified unaries on each interval, see also the right part of Figure 12.
Discretization of the energy
It turns out that the piecewise linear representation of leads to the same discrete bilinear saddle point term as (26). The other term remains unchanged, as we pick the same representation of .
Relation to existing approaches
In the following we point out the relationship of the approximation with piecewise linear to the sublabelaccurate multilabeling approaches [18] and the discretecontinuous MRFs [31].
Proposition 4.
The discretization with piecewise linear and piecewise constant , together with the choice and is equivalent to the relaxation [18].
Thus we extend the relaxation proposed in [18] to more general regularizations. The relaxation [18] was derived starting from a discrete label space and involved a separate relaxation of data term and regularizer. To see this, first note that the convex conjugate of a convex onehomogeneous function is the indicator function of a convex set [23, Corollary 13.2.1]. Then the constraints (8) can be written as
(33)  
(34) 
where (33) is the data term and (34) the regularizer. This provides an intuition why the separate convex relaxation of data term and regularizer in [18] worked well. However, for general choices of a joint relaxation of data term and regularizer as in (30) is crucial. The next proposition establishes the relationship between the data term from [31] and the one from [18].
Proposition 5.
The data term from [18] (which is in turn a special case of the discretization with piecewise linear ) can be (pointwise) brought into the primal form
(35) 
where is a discretized integration operator.
The data term of Zach and Kohli [31] is precisely given by (35) except that the optimization is directly performed on . The variable can be interpreted as 1sparse indicator of the interval and as a sublabel offset. The constraint connects this representation to the subgraph representation via the operator (see appendix for the definition). For general regularizers , the discretization with piecewise linear differs from [18] as we perform a joint convexification of data term and regularizer and from [31] as we consider the spatially continuous setting. Another important question to ask is which primal formulation is actually optimized after discretization with piecewise linear . In particular the distinction between jump and smooth regularization only makes sense for continuous label spaces, so it is interesting to see what is optimized after discretizing the label space.
Proposition 6.
From Proposition 6 we see that the minimal discretization with amounts to approximating problem (1) by globally convexifying the data term. Furthermore, we can see that MumfordShah (truncated quadratic) regularization (, ) is approximated by a convex Huber regularizer in case . This is because the infimal convolution between and corresponds to the Huber function. While even for this is a reasonable approximation to the original model (1), we can gradually increase the number of labels to get an increasingly faithful approximation of the original nonconvex problem.
4.3.3 Piecewise quadratic
For piecewise quadratic the main difficulty are the constraints in (20). For piecewise linear the infimum over a linear function plus lead to (minus) the convex conjugate of . Quadratic dual variables lead to so called generalized conjugates [24, Chapter 11L*, Example 11.66]. Such conjugates were also theoretically considered in the recent work [11] for discretecontinuous MRFs, however an efficient implementation seems challenging. The advantage of this representation would be that one can avoid convexification of the unaries on each interval and thus obtain a tighter approximation. While in principle the resulting constraints could be implemented using techniques from convex algebraic geometry and semidefinite programming [5] we leave this direction open to future work.
5 Implementation and extensions













5.1 Piecewise quadratic unaries
In some applications such as robust fusion of depth maps, the data term has a piecewise quadratic form:
(37) 
The intervals on which the above function is a quadratic are formed by the breakpoints . In order to optimize this within our framework, we need to compute the convex conjugate of on the intervals , see Eq. (31). We can write the data term (37) on each as
(38) 
where denotes the number of pieces and the intervals are given by the breakpoints and . The convex conjugate is then given by . As the epigraph of the maximum is the intersection of the epigraphs, , the constraints for the data term , can be broken down:
(39) 
The projection onto the epigraphs of the are carried out as described in [18]. Such a convexified piecewise quadratic function is shown on the right in Figure 12.







5.2 The vectorial MumfordShah functional
Recently, the freediscontinuity problem (1) has been generalized to vectorial functions by Strekalovskiy [26]. The model they propose is
(40) 
which consists of a separable data term and separable regularization on the continuous part. The individual channels are coupled through the jump part regularizer of the joint jump set across all channels. Using the same strategy as in Section 4, applied to the relaxation described in [26, Section 3], a sublabelaccurate representation of the vectorial MumfordShah functional can be obtained.
5.3 Numerical solution
We solve the final finite dimensional optimization problem after finitedifference discretization in spatial direction using the primaldual algorithm [20] implemented in the convex optimization framework prost ^{1}^{1}1https://github.com/tumvision/prost.
6 Experiments
6.1 Exactness in the convex case
We validate our discretization in Figure 21 on the convex problem , . The global minimizer of the problem is obtained by solving . For piecewise linear we recover the exact solution using only labels, and remain (experimentally) exact as we increase the number of labels. The discretization from [21] shows a strong label bias due to the piecewise constant dual variable . Even with labels their solution is different from the ground truth energy.
6.2 The vectorial MumfordShah functional
Joint depth fusion and segmentation
We consider the problem of joint image segmentation and robust depth fusion from [22] using the vectorial MumfordShah functional from Section 5.2. The data term for the depth channel is given by (37), where are the input depth hypotheses, is a depth confidence and
is a truncation parameter to be robust towards outliers. For the segmentation, we use a quadratic difference dataterm in RGB space. For Figure
27 we computed multiple depth hypotheses on a stereo pair using different matching costs (sum of absolute (gradient) differences, and normalized cross correlation) with varying patch radii ( to ). Even for a moderate label space of we have no label discretization artifacts.The piecewise linear approximation of the unaries in [26] leads to an almost piecewise constant segmentation of the image. To highlight the sublabelaccuracy of the proposed approach we chose a small smoothness parameter which leads to a piecewise smooth segmentation, but with a higher smoothness term or different choice of unaries a piecewise constant segmentation could also be obtained.
Piecewisesmooth approximations
7 Conclusion
We proposed a framework to numerically solve fullycontinuous convex relaxations in a sublabelaccurate fashion. The key idea is to implement the dual variables using a piecewise linear approximation. We prove that different choices of approximations for the dual variables give rise to various existing relaxations: in particular piecewise constant duals lead to the traditional lifting [20] (with minpooling of the unary costs), whereas piecewise linear duals lead to the sublabel lifting that was recently proposed for total variation regularized problems [18]. While the latter method is not easily generalized to other regularizers due to the separate convexification of data term and regularizer, the proposed representation generalizes to arbitrary convex and nonconvex regularizers such as the scalar and the vectorial MumfordShah problem. The proposed approach provides a systematic technique to derive sublabelaccurate discretizations for continuous convex relaxation approaches, thereby boosting their memory and runtime efficiency for challenging largescale applications.
Appendix
Proposition 1.
For concave with , the constraints
(41)  
are equivalent to
(42) 
Proof.
The implication (41) (42) clearly holds. Let us now assume the constraints (42) are fulfilled. First we show that the constraints (41) also hold for and . First, we start with :
(43)  
In the same way, it can be shown that for we have:
(44) 
We have shown the constraints to hold for and . Finally we show they also hold for :
(45)  
(46)  
Noticing that (42) is precisely (41) for (as ) completes the proof. ∎
Proposition 2.
For convex onehomogeneous the discretization with piecewise constant and leads to the traditional discretization as proposed in [20], except with minpooled instead of sampled unaries.
Proof.
The constraints in [20, Eq. 18] have the form
(47)  
(48) 
with , and . The constraints (48) are equivalent to (42) up to a rescaling of with . For the constraints (47) (cf. [20, Eq. 18]), the unaries are sampled at the labels . The discretization with piecewise constant duals leads to a similar form, except for a minpooling on dual intervals, :
(49)  
The similarity between (49) and (47) becomes more evident by assuming convex onehomogeneous . Then (49) reduces to the following:
(50)  
as well as
(51) 
∎
Proposition 3.
The constraints
(52)  
can be equivalently reformulated by introducing additional variables , , where :
(53)  
with .