# Sublabel-Accurate Convex Relaxation of Vectorial Multilabel Energies

Convex relaxations of nonconvex multilabel problems have been demonstrated to produce superior (provably optimal or near-optimal) solutions to a variety of classical computer vision problems. Yet, they are of limited practical use as they require a fine discretization of the label space, entailing a huge demand in memory and runtime. In this work, we propose the first sublabel accurate convex relaxation for vectorial multilabel problems. The key idea is that we approximate the dataterm of the vectorial labeling problem in a piecewise convex (rather than piecewise linear) manner. As a result we have a more faithful approximation of the original cost function that provides a meaningful interpretation for the fractional solutions of the relaxed convex problem. In numerous experiments on large-displacement optical flow estimation and on color image denoising we demonstrate that the computed solutions have superior quality while requiring much lower memory and runtime.

## Authors

• 4 publications
• 10 publications
• 23 publications
• 14 publications
• 107 publications
• ### Sublabel-Accurate Discretization of Nonconvex Free-Discontinuity Problems

In this work we show how sublabel-accurate multilabeling approaches can ...
11/21/2016 ∙ by Thomas Möllenhoff, et al. ∙ 0

• ### Sublabel-Accurate Relaxation of Nonconvex Energies

We propose a novel spatially continuous framework for convex relaxations...
12/04/2015 ∙ by Thomas Möllenhoff, et al. ∙ 0

• ### Fast Convex Relaxations using Graph Discretizations

Matching and partitioning problems are fundamentals of computer vision a...
04/23/2020 ∙ by Jonas Geiping, et al. ∙ 1

• ### Tighter Lifting-Free Convex Relaxations for Quadratic Matching Problems

In this work we study convex relaxations of quadratic optimisation probl...
11/29/2017 ∙ by Florian Bernard, et al. ∙ 0

• ### Noisy Matrix Completion: Understanding Statistical Guarantees for Convex Relaxation via Nonconvex Optimization

This paper studies noisy low-rank matrix completion: given partial and c...
02/20/2019 ∙ by Yuxin Chen, et al. ∙ 0

• ### Semantic 3D Reconstruction with Finite Element Bases

We propose a novel framework for the discretisation of multi-label probl...
10/04/2017 ∙ by Audrey Richard, et al. ∙ 0

• ### The Impacts of Convex Piecewise Linear Cost Formulations on AC Optimal Power Flow

Despite strong connections through shared application areas, research ef...
05/28/2020 ∙ by Carleton Coffrin, et al. ∙ 0

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

### 1.1 Nonconvex Vectorial Problems

In this paper, we derive a sublabel-accurate convex relaxation for vectorial optimization problems of the form

 (1)

where , and denotes a generally nonconvex pointwise dataterm. As regularization we focus on the total variation defined as:

 TV(u)=supq∈C∞c(Ω,Rn×d),∥q(x)∥S∞≤1 ∫Ω⟨u,Divq⟩ dx, (2)

where is the Schatten- norm on

, i.e., the largest singular value. For differentiable functions

we can integrate (2) by parts to find

 TV(u)=∫Ω∥∇u(x)∥S1 dx, (3)

where the dual norm penalizes the sum of the singular values of the Jacobian, which encourages the individual components of to jump in the same direction. This type of regularization is part of the framework of Sapiro and Ringach [19].

### 1.2 Related Work

Due to its nonconvexity the optimization of (1) is challenging. For the scalar case (), Ishikawa [9] proposed a pioneering technique to obtain globally optimal solutions in a spatially discrete setting, given by the minimum s-t-cut of a graph representing the space . A continuous formulation was introduced by Pock et al. [15] exhibiting several advantages such as less grid bias and parallelizability.

In a series of papers [16, 14], connections of the above approaches were made to the mathematical theory of cartesian currents [6] and the calibration method for the Mumford-Shah functional [1], leading to a generalization of the convex relaxation framework [15] to more general (in particular nonconvex) regularizers.

In the following, researchers have strived to generalize the concept of functional lifting and convex relaxation to the vectorial setting (). If the dataterm and the regularizer are both separable in the label dimension, one can simply apply the above convex relaxation approach in a channel-wise manner to each component separately. But when either the dataterm or the regularizer couple the label components, the situation becomes more complex [8, 20].

The approach which is most closely related to our work, and which we consider as a baseline method, is the one by Lellmann et al. [11]. They consider coupled dataterms with coupled total variation regularization of the form (2).

A drawback shared by all mentioned papers is that ultimately one has to discretize the label space. While Lellmann et al. [11] propose a sublabel-accurate regularizer, we show that their dataterm leads to solutions which still have a strong bias towards the label grid. For the scalar-valued setting, continuous label spaces have been considered in the MRF community by Zach et al. [22] and Fix et al. [5]. The paper [21] proposes a method for mixed continuous and discrete vectorial label spaces, where everything is derived in the spatially discrete MRF setting. Möllenhoff et al. [12] recently proposed a novel formulation of the scalar-valued case which retains fully continuous label spaces even after discretization. The contribution of this work is to extend [12] to vectorial label spaces, thereby complementing [11] with a sublabel-accurate dataterm.

### 1.3 Contribution

In this work we propose the first sublabel-accurate convex formulation of vectorial labeling problems. It generalizes the formulation for scalar-valued labeling problems [12] and thus includes important applications such as optical flow estimation or color image denoising. We show that our method, derived in a spatially continuous setting, has a variety of interesting theoretical properties as well as practical advantages over the existing labeling approaches:

• We generalize existing functional lifting approaches (see Sec. 2.2).

• We show that our method is the best convex under-approximation (in a local sense), see Prop. 1 and Prop. 2.

• Due to its sublabel-accuracy our method requires only a small amount of labels to produce good results which leads to a drastic reduction in memory. We believe that this is a vital step towards the real-time capability of lifting and convex relaxation methods. Moreover, our method eliminates the label bias, that previous lifting methods suffer from, even for many labels.

• In Sec. 2.3 we propose a regularizer that couples the different label components by enforcing a joint jump normal. This is in contrast to [8], where the components are regularized separately.

• For convex dataterms, our method is equivalent to the unlifted problem – see Prop. 4. Therefore, it allows a seamless transition between direct optimization and convex relaxation approaches.

### 1.4 Notation

We write for the standard inner product on or the Frobenius product if are matrices. Similarly without any subscript denotes the usual Euclidean norm, respectively the Frobenius norm for matrices.

We denote the convex conjugate of a function by . It is an important tool for devising convex relaxations, as the biconjugate is the largest lower-semicontinuous (lsc.) convex function below . For the indicator function of a set we write , i.e., if and otherwise. stands for the unit -simplex.

## 2 Convex Formulation

### 2.1 Lifted Representation

Motivated by Fig. 5, we construct an equivalent representation of (1) in a higher dimensional space, before taking the convex envelope.

Let be a compact and convex set. We partition into a set of -simplices so that is a disjoint union of up to a set of measure zero. Let be the -th vertex of and denote by the union of all vertices, referred to as labels, with , and . For , we refer to as a sublabel. Any sublabel can be written as a convex combination of the vertices of a simplex with for appropriate barycentric coordinates :

 u(x)=Tiα:=n+1∑j=1αjtij, Ti:=(ti1, ti2, …, tin+1)∈Rn×n+1. (4)

By encoding the vertices using a one-of- representation we can identify any

with a sparse vector

containing at least many zeros and vice versa:

 u(x)=Eiα:=n+1∑j=1αjeij, Ei:=(ei1, ei2,…, ein+1)∈R|V|×n+1,u(x)=|V|∑k=1tkuk(x), α∈ΔUn, 1≤i≤|T|. (5)

The entries of the vector are zero except for the -th entry, which is equal to one. We refer to as the lifted representation of . This one-to-one-correspondence between and is shown in Fig. 6. Note that both, and depend on . However, for notational convenience we drop the dependence on whenever we consider a fixed point .

### 2.2 Convexifying the Dataterm

Let for now the weight of the regularizer in (1) be zero. Then, at each point we minimize a generally nonconvex energy over a compact set :

 minu∈Γ ρ(u). (6)

We set up the lifted energy so that it attains finite values if and only if the argument is a sparse representation of a sublabel :

 ρ(u)=min1≤i≤|T| ρi(u),ρi(u)={ρ(Tiα),if  u=Eiα, α∈ΔUn,∞,otherwise. (7)

Problems (6) and (7) are equivalent due to the one-to-one correspondence of and . However, energy (7) is finite on a nonconvex set only. In order to make optimization tractable, we minimize its convex envelope.

###### Proposition 1

The convex envelope of (7) is given as:

 ρ∗∗(u)=supv∈R|V|⟨u,v⟩−max1≤i≤|T| ρ∗i(v),ρ∗i(v)=⟨Eibi,v⟩+ρ∗i(A⊤iE⊤iv),ρi:=ρ+δΔi. (8)

and are given as , , where are the columns of the matrix .

###### Proof

Follows from a calculation starting at the definition of . See Appendix 0.A for a detailed derivation.

The geometric intuition of this construction is depicted in Fig. 11. Note that if one prescribes the value of in (7) only on the vertices of the unit simplices , i.e., if and otherwise, one obtains the linear biconjugate on the feasible set. This coincides with the standard relaxation of the dataterm used in [16, 10, 4, 11]. In that sense, our approach can be seen as a relaxing the dataterm in a more precise way, by incorporating the true value of not only on the finite set of labels , but also everywhere in between, i.e., on every sublabel.

### 2.3 Lifting the Vectorial Total Variation

We define the lifted vectorial total variation as

 TV(u)=∫ΩΨ(Du), (9)

where denotes the distributional derivative of and is positively one-homogeneous, i.e., . For such functions, the meaning of (9) can be made fully precise using the polar decomposition of the Radon measure [2, Cor. 1.29, Thm. 2.38]. However, in the following we restrict ourselves to an intuitive motivation for the derivation of for smooth functions.

Our goal is to find so that whenever corresponds to some , in the sense that whenever . In order for the equality to hold, it must in particular hold for all that are classically differentiable, i.e., , and whose Jacobian is of rank 1, i.e., for some . This rank 1 constraint enforces the different components of to have the same jump normal, which is desirable in many applications. In that case, we observe

 TV(u)=∫Ω∥Tiα−Tjβ∥⋅∥ν(x)∥dx. (10)

For the corresponding lifted representation , we have . Therefore it is natural to require in order to achieve the goal . Motivated by these observations, we define

 Ψ(p):={∥Tiα−Tjβ∥⋅∥ν∥if  p=(Eiα−Ejβ)⊗ν,∞otherwise, (11)

where , and . Since the convex envelope of (9) is intractable, we derive a “locally” tight convex underapproximation:

 R(u)=supq:Ω→Rd×|V|∫Ω⟨u,Divq⟩−Ψ∗(q) dx. (12)
###### Proposition 2

The convex conjugate of is

 Ψ∗(q)=δK(q) (13)

with convex set

 K=⋂1≤i,j≤|T|{q∈Rd×|V|∣∣∥Qiα−Qjβ∥≤∥Tiα−Tjβ∥, α,β∈ΔUn+1}, (14)

and . are the columns of .

###### Proof

Follows from a calculation starting at the definition of the convex conjugate . See Appendix 0.A.

Interestingly, although in its original formulation (14) the set has infinitely many constraints, one can equivalently represent by finitely many.

###### Proposition 3

The set in equation (14) is the same as

 K={q∈Rd×|V|∣∥∥Diq∥∥S∞≤1, 1≤i≤|T|}, Diq=QiD(TiD)−1, (15)

where the matrices and are given as

 QiD:=(qi1−qin+1, …, qin−qin+1), TiD:=(ti1−tin+1, …, tin−tin+1).
###### Proof

Similar to the analysis in [11], equation (14) basically states the Lipschitz continuity of a piecewise linear function defined by the matrices . Therefore, one can expect that the Lipschitz constraint is equivalent to a bound on the derivative. For the complete proof, see Appendix 0.A.

### 2.4 Lifting the Overall Optimization Problem

Combining dataterm and regularizer, the overall optimization problem is given

 minu:Ω→R|V|supq:Ω→K∫Ωρ∗∗(u)+⟨u,Divq⟩ dx. (16)

A highly desirable property is that, opposed to any other vectorial lifting approach from the literature, our method with just one simplex applied to a convex problem yields the same solution as the unlifted problem.

###### Proposition 4

If the triangulation contains only 1 simplex, , i.e., , then the proposed optimization problem (16) is equivalent to

 minu:Ω→Δ ∫Ω(ρ+δΔ)∗∗(x,u(x)) dx+λTV(u), (17)

which is (1) with a globally convexified dataterm on .

###### Proof

For the substitution into and yields the result. For a complete proof, see Appendix 0.A.

## 3 Numerical Optimization

### 3.1 Discretization

For now assume that is a -dimensional Cartesian grid and let denote a finite-difference divergence operator with . Then the relaxed energy minimization problem becomes

 minu:Ω→R|V|maxq:Ω→K ∑x∈Ωρ∗∗(x,u(x))+⟨Divq,u⟩. (18)

In order to get rid of the pointwise maximum over in Eq. (8), we introduce additional variables and additional constraints , so that attains the value of the pointwise maximum:

 minu:Ω→R|V|max(v,w):Ω→Cq:Ω→K ∑x∈Ω⟨u(x),v(x)⟩−w(x)+⟨Divq,u⟩, (19)

where the set is given as

 C=⋂1≤i≤|T|Ci,Ci:={(x,y)∈R|V|+1∣ρ∗i(x)≤y}. (20)

For numerical optimization we use a GPU-based implementation of a first-order primal-dual method [14]. The algorithm requires the orthogonal projections of the dual variables onto the sets respectively in every iteration. However, the projection onto an epigraph of dimension is difficult for large values of . We rewrite the constraints , , as -dimensional epigraph constraints introducing variables , :

 ρ∗i(ri(x))≤si(x),ri(x)=A⊤iE⊤iv(x),si(x)=w(x)−⟨Eibi,v(x)⟩. (21)

These equality constraints can be implemented using Lagrange multipliers. For the projection onto the set we use an approach similar to [7, Figure 7].

### 3.2 Epigraphical Projections

Computing the Euclidean projection onto the epigraph of is a central part of the numerical implementation of the presented method. However, for this is nontrivial. Therefore we provide a detailed explanation of the projection methods used for different classes of . We will consider quadratic, truncated quadratic and piecewise linear .

Let be of the form . A direct projection onto the epigraph of for is difficult. However, the epigraph can be decomposed into separate epigraphs for which it is easier to project onto: For proper, convex, lsc. functions the epigraph of is the Minkowski sum of the epigraphs of and (cf. [17, Exercise 1.28, Theorem 11.23a]). This means that it suffices to compute the projections onto the epigraphs of a quadratic function and a convex, piecewise linear function by rewriting constraint (21) as

 ρ∗(rf)≤sf, δΔi∗(cg)≤dg  s.t. (r,s)=(rf,sf)+(cg,dg). (22)

For the projection onto the epigraph of a -dimensional quadratic function we use the method described in [20, Appendix B.2]. The projection onto a piecewise linear function is described in the last paragraph of this section.

Let be of the form as it is the case for the nonconvex robust ROF with a truncated quadratic dataterm in Sec. 4.2. Again, a direct projection onto the epigraph of is difficult. However, a decomposition of the epigraph into simpler epigraphs is possible as the epigraph of is the intersection of the epigraphs of and . Hence, one can separately project onto the epigraphs of and . Both of these projections can be handled using the methods from the other paragraphs.

#### Piecewise linear case:

In case is piecewise linear on each , i.e., attains finite values at a discrete set of sampled sublabels and interpolates linearly between them, we have that

 (ρ+δΔi)∗(v)=maxτ∈Vi ⟨τ,v⟩−ρ(τ). (23)

Again this is a convex, piecewise linear function. For the projection onto the epigraph of such a function, a quadratic program of the form

 min(x,y)∈Rn+1 12∥x−c∥2+12∥y−d∥2  s.t. ⟨τ,x⟩−ρ(τ)≤y,∀τ∈Vi (24)

needs to be solved. We implemented the primal active-set method described in [13, Algorithm 16.3], and found it solves the program in a few (usually ) iterations for a moderate number of constraints.

## 4 Experiments

### 4.1 Vectorial ROF Denoising

In order to validate experimentally, that our model is exact for convex dataterms, we evaluate it on the Rudin-Osher-Fatemi [18] (ROF) model with vectorial TV (2). In our model this corresponds to defining . As expected based on Prop. 4 the energy of the solution of the unlifted problem is equal to the energy of the projected solution of our method for up to machine precision, as can be seen in Fig. 15 and Fig. 21. We point out, that the sole purpose of this experiment is a proof of concept as our method introduces an overhead and convex problems can be solved via direct optimization. It can be seen in Fig. 15 and Fig. 21, that the baseline method [11] has a strong label bias.

### 4.2 Denoising with Truncated Quadratic Dataterm

For images degraded with both, Gaussian and salt-and-pepper noise we define the dataterm as . We solve the problem using the epigraph decomposition described in the second paragraph of Sec. 3.2. It can be seen, that increasing the number of labels leads to lower energies and at the same time to a reduced effect of the TV. This occurs as we always compute a piecewise convex underapproximation of the original nonconvex dataterm, that gets tighter with a growing number of labels. The baseline method [11] again produces strong discretization artefacts even for a large number of labels .

### 4.3 Optical Flow

We compute the optical flow between two input images . The label space is chosen according to the estimated maximum displacement between the images. The dataterm is , and is based on the norm of the image gradient .

In Fig. 43 we compare the proposed method to the product space approach [8]. Note that we implemented the product space dataterm using Lagrange multipliers, also referred to as the global approach in [8]. While this increases the memory consumption, it comes with lower computation time and guaranteed convergence. For our method, we sample the label space on sublabels and subsequently convexify the energy on each triangle using the quickhull algorithm [3]. For the product space approach we sample the label space at equidistant labels, from to . As the regularizer from the product space approach is different from the proposed one, we chose differently for each method. For the proposed method, we set and for the product space and baseline approach . We can see in Fig. 43, our method outperforms the product space approach w.r.t. the average end-point error. Our method outperforms previous lifting approaches: In Fig. 47 we compare our method on large displacement optical flow to the baseline [11]. To obtain competitive results on the Middlebury benchmark, one would need to engineer a better dataterm.

## 5 Conclusions

We proposed the first sublabel-accurate convex relaxation of vectorial multilabel problems. To this end, we approximate the generally nonconvex dataterm in a piecewise convex manner as opposed to the piecewise linear approximation done in the traditional functional lifting approaches. This assures a more faithful approximation of the original cost function and provides a meaningful interpretation for the non-integral solutions of the relaxed convex problem. In experimental validations on large-displacement optical flow estimation and color image denoising, we show that the computed solutions have superior quality to the traditional convex relaxation methods while requiring substantially less memory and runtime.

## Appendix 0.A Theory

###### Proof (Proof of Proposition 1)

By definition the biconjugate of is given as

 ρ∗∗(u)=supv∈R|V|⟨u,v⟩−(min1≤i≤|T|ρi(v))∗=supv∈R|V|⟨u,v⟩−max1≤i≤|T|ρ∗i(v). (25)

We proceed computing the conjugate of :

 ρ∗i(v)=supu∈R|V|⟨u,v⟩−ρi(u)=supα∈ΔUn+1⟨Eiα,v⟩−ρ(Tiα), (26)

We introduce the substitution and obtain

 α=K−1i(r1),Ki:=(Ti1⊤)∈Rn+1×n+1, (27)

since is invertible for being a non-degenerate triangulation and . With this we can further rewrite the conjugate as

 …=supr∈Δi⟨Air+bi,E⊤iv⟩−ρ(r)=⟨Eibi,v⟩+supr∈Rn⟨r,A⊤iE⊤iv⟩−ρ(r)−δΔi(r)=⟨Eibi,v⟩+ρ∗i(A⊤iE⊤iv). (28)
###### Proof (Proof of Proposition 2)

Define as

 Ψi,j(p):={∥Tiα−Tjβ∥⋅∥ν∥if  p=(Eiα−Ejβ)ν⊤, α,β∈ΔUn+1, ν∈Rd,∞otherwise. (29)

Then, can be rewritten as a pointwise minimum over the individual

 (30)

We begin computing the conjugate of

 Ψ∗i,j(q)=supp∈Rd×|V|⟨p,q⟩−Ψi,j(p)=supα,β∈ΔUn+1supν∈Rd⟨Qiα−Qjβ,ν⟩−∥Tiα−Tjβ∥⋅∥ν∥=supα,β∈ΔUn+1(∥Tiα−Tjβ∥⋅∥⋅∥)∗(Qiα−Qjβ)=δKi,j(q), (31)

with the set being defined as

 Ki,j:={q∈Rd×|V|∣∣∥Qiα−Qjβ∥≤∥Tiα−Tjβ∥, α,β∈ΔUn+1}. (32)

Since the maximum over indicator functions of sets is equal to the indicator function of the intersection of the sets we obtain for

 Ψ∗(q)=max1≤i,j≤|T|Ψ∗i,j(q)=δK(q). (33)
###### Proof (Proof of Proposition 3)

Let s.t. for all and . For any define

 fi:Rn→Rn,(α1,...,αn)↦n∑l=1αltil+(1−n∑l=1αl)tin+1=Tiα, (34)

and analogously

 gi:Rn→R|V|(α1,...,αn)↦n∑l=1αlqil+(1−n∑l=1αl)qin+1=Qiα. (35)

Let us choose an such that , . Then for all and implies that

 ∥gi(α)−gi(α−h)∥≤∥fi(α)−fi(α−h)∥, (36)

holds for all vectors with sufficiently small entries. Inserting the definitions of and we find that

 ∥QiDh∥≤∥TiDh∥ (37)

holds for all with sufficiently small entries. For a non-degenerate triangle, is invertible and a simple substitution yields that

 ∥QiD(TiD)−1~h∥2≤∥~h∥, (38)

holds for all with sufficiently small entries. This means that the operator norm of induced by the norm, i.e. the norm, is bounded by one.

Let us now show the other direction. For s.t. , note that inverting the above computation immediately yields that

 ∥Qkα−Qkβ∥≤∥Tkα−Tkβ∥ (39)

holds for all , . Our goal is to show that having this inequality on each simplex is sufficient to extend it to arbitrary pairs of simplices. The overall idea of this part of the proof is illustrated in Fig. 48.

Let and with , be given. Consider the line segment

 c(γ):[0,1]→Rdγ↦γTjβ+(1−γ)Tiα. (40)

Since the triangulated domain is convex, there exist and functions such that for , one can write for some . The continuity of implies that , i.e. these points correspond to both simplices, and . Note that this also means that . The intuition of this construction is that the are located on the boundaries of adjacent simplices on the line segment. We find

 ∥Tiα−Tjβ∥=r−1∑l=0(al+1−al)∥Tiα−Tjβ∥=r−1∑l=0∥(al+1−al)(Tiα−Tjβ)∥=r−1∑l=0∥al+1Tiα−alTiα−al+1Tjβ+alTjβ∥=r−1∑l=0∥∥alTjβ+(1−al)Tiα−(al+1Tjβ+(1−al+1)Tiα)∥∥=r−1∑l=0∥∥Tklαl(al)−Tklαl(al+1)∥∥r−1∑l=0∥∥Qklαl(al)−Qklαl(al+1)∥∥≥∥∥ ∥∥r−1∑l=0(Qklαl(al)−Qklαl(al+1))∥∥ ∥∥=∥∥ ∥∥r−1∑l=0(Qklαl(al)−Qkl+1αl+1(al+1))∥∥ ∥∥=∥∥Qk0α0(a0)−Qkrαr(ar)∥∥=∥∥Qiα−Qjβ∥∥, (41)

which yields the assertion.

###### Proof (Proof of Proposition 4)

Let be given by affinely independent vertices . We show that our lifting approach applied to the label space solves the convexified unlifted problem, where the dataterm was replaced by its convex hull on . Let the matrices and be defined through

 T=(t1,…,tn+1), D=⎛⎜ ⎜ ⎜ ⎜ ⎜⎝1⋱1−1…−1⎞⎟ ⎟ ⎟ ⎟ ⎟⎠, TD=(t1−tn+1,…,tn−tn+1), (42)

The transformation maps to . Now consider the following lifted function parametrized through :

 u(x)=(~u1(x),…,~un(x),1−∑nj=1~uj(x)). (43)

Consider a fixed . Plugging this lifted representation into the biconjugate of the lifted dataterm yields:

 ρ∗∗(u)