# A Message Passing Algorithm for the Minimum Cost Multicut Problem

We propose a dual decomposition and linear program relaxation of the NP-hard minimum cost multicut problem. Unlike other polyhedral relaxations of the multicut polytope, it is amenable to efficient optimization by message passing. Like other polyhedral relaxations, it can be tightened efficiently by cutting planes. We define an algorithm that alternates between message passing and efficient separation of cycle- and odd-wheel inequalities. This algorithm is more efficient than state-of-the-art algorithms based on linear programming, including algorithms written in the framework of leading commercial software, as we show in experiments with large instances of the problem from applications in computer vision, biomedical image analysis and data mining.


## 1 Introduction

Decomposing a graph into meaningful clusters is a fundamental primitive in computer vision, biomedical image analysis and data mining. In settings where no information is given about the number or size of clusters, and information is given only about the pairwise similarity or dissimilarity of nodes, a canonical mathematical abstraction is the minimum cost multicut (or correlation clustering) problem [14]. The feasible solutions of this problem, multicuts, relate one-to-one to the decompositions of the graph. A multicut is the set of edges that straddle distinct clusters. The cost of a multicut is the sum of costs attributed to its edges.

In the field of computer vision, the minimum cost multicut problem has been applied in [3, 4, 39, 6] to the task of unsupervised image segmentation defined by the BSDS data sets and benchmarks [30]. In the field of biomedical image analysis, the minimum cost multicut problem has been applied to an image segmentation task for connectomics [5]. In the field of data mining, applications include [7, 33, 12, 13]. As the minimum cost multicut problem is NP-hard [9, 16], even for planar graphs [8], large and complex instances with millions of edges, especially those from connectomics, pose a challenge for existing algorithms.

Related Work. Due to the importance of multicuts for applications, many algorithms for the minimum cost multicut problem have been proposed. They are grouped below into three categories: primal feasible local search algorithms, linear programming algorithms and fusion algorithms.

Primal feasible local search algorithms [35, 31, 20, 18, 19] attempt to improve an initial feasible solution by means of local transformations from a set that can be indexed or searched efficiently. Local search algorithms are practical for large instances, as the cost of all operations is small compared to the cost of solving the entire problem at once. On the downside, the feasible solution that is output typically depends on the initialization. And even if a solution is found, optimality is not certified, as no lower bound is computed.

Linear programming algorithms [24, 25, 27, 32, 38] operate on an outer polyhedral relaxation of the feasible set. Their output is independent of the initialization and provides a lower bound. This lower bound can be used directly inside a branch-and-bound search for certified optimal solutions. Alternatively, the LP relaxation can be tightened by cutting planes. Several classes of inequalities are known that define facets of the multicut polytope and can be separated efficiently [14]. On the downside, algorithms for general LPs that are agnostic to the structure of the multicut problem scale super-linearly with the size of the instance.

Fusion algorithms attempt to combine feasible solutions of subproblems obtained by combinatorial or random procedures into successively better multicuts. The fusion process can either rely on column generation [39], binary quadratic programming [11] or any algorithm for solving integer LPs [10]. In particular, [39] provides dual lower bounds but is restricted to planar graphs. [11, 10] explore the primal solution space in a clever way, but do not output dual information.

Outline. Below, a discussion of preliminaries (Sec. 2) is followed by the definition of our proposed decomposition (Sec. 3) and algorithm (Sec. 4) for the minimum cost multicut problem. Our approach combines the efficiency of local search with the lower bounds of LPs and the subproblems of fusion, as we show in experiments with large and diverse instances of the problem (Sec. 5). All code and data will be made publicly available upon acceptance of the paper.

## 2 Preliminaries

### 2.1 Minimum Cost Multicut Problem

A decomposition (or clustering) of a graph G = (V, E) is a partition of the node set V such that every cluster is non-empty and connected. The multicut induced by a decomposition is the subset of those edges that straddle distinct clusters (cf. Fig. 1). Such edges are said to be cut. Every multicut induced by any decomposition of G is called a multicut of G. We denote by M_G the set of all multicuts of G.

Given, for every edge uv ∈ E, a cost θ_uv ∈ ℝ of this edge being cut, the instance of the minimum cost multicut problem w.r.t. these costs is the optimization problem (1) whose feasible solutions are all multicuts of G. For any edge uv ∈ E, a negative cost favours the nodes u and v to lie in distinct components; a positive cost favours these nodes to lie in the same component.

$$
\min_{M \in \mathcal{M}_G} \; \sum_{e \in M} \theta_e \qquad (1)
$$
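For concreteness, the objective (1) can be sketched in a few lines of Python. The graph, costs, and names below are illustrative, not taken from the paper:

```python
def induced_multicut(edges, partition):
    """Return the multicut induced by a decomposition: the set of edges
    whose endpoints lie in distinct clusters."""
    cluster = {v: i for i, block in enumerate(partition) for v in block}
    return {e for e in edges if cluster[e[0]] != cluster[e[1]]}

def multicut_cost(theta, multicut):
    """Cost of a multicut: the sum of the costs of its (cut) edges."""
    return sum(theta[e] for e in multicut)

# hypothetical toy instance: a triangle with one repulsive (negative) edge
edges = [("u", "v"), ("v", "w"), ("u", "w")]
theta = {("u", "v"): -2.0, ("v", "w"): 1.5, ("u", "w"): 1.0}

# separating u from {v, w} cuts the edges uv and uw, at cost -2.0 + 1.0
M = induced_multicut(edges, [{"u"}, {"v", "w"}])
```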

This problem is NP-hard [9, 16], even for planar graphs [8]. Below, we recapitulate its formulation as a binary LP and then turn to LP relaxations: For any 01-labeling x ∈ {0,1}^E of the edges of G, the subset x⁻¹(1) of those edges labeled 1 is a multicut of G if and only if x satisfies the system (3) of cycle inequalities [14]. Hence, (1) can be stated equivalently in the form of the binary LP (2)–(4).

$$
\begin{aligned}
\min_{x \in \mathbb{R}^E} \quad & \sum_{e \in E} \theta_e x_e && (2) \\
\text{subject to} \quad & \forall C \in \text{cycles}(G)\ \forall e \in C: \; x_e \le \sum_{e' \in C \setminus \{e\}} x_{e'} && (3) \\
& x \in \{0,1\}^E && (4)
\end{aligned}
$$

An LP relaxation is obtained by replacing the integrality constraints (4) by x ∈ [0,1]^E. This results in an outer relaxation of the multicut polytope, which is the convex hull of the characteristic functions of all multicuts of G. The LP relaxation obtained with only the cycle inequalities (3) will not, in general, be tight.
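The role of the cycle inequalities (3) can be illustrated with a short sketch (the labelings and edge names are illustrative): a labeling that cuts exactly one edge of a cycle violates (3), whereas a labeling induced by a decomposition satisfies it:

```python
def violates_cycle_inequality(x, cycle):
    """Check whether edge labels x (dict: edge -> 0/1) violate a cycle
    inequality (3) on the given cycle (a list of its edges): is there a
    cut edge e with x_e greater than the sum over the remaining edges?"""
    total = sum(x[e] for e in cycle)
    return any(x[e] > total - x[e] for e in cycle)

# illustrative triangle cycle
triangle = [("u", "v"), ("v", "w"), ("u", "w")]
x_bad = {("u", "v"): 1, ("v", "w"): 0, ("u", "w"): 0}  # cuts one edge only
x_ok = {("u", "v"): 1, ("v", "w"): 0, ("u", "w"): 1}   # multicut of {u} | {v, w}
```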

A tighter LP relaxation is obtained by enforcing also the odd wheel inequalities [14]. A k-wheel is a cycle in G with nodes v_1, …, v_k, all of which are connected to an additional node u that is not in the cycle and is called the center of the k-wheel (cf. Fig. 3). For any odd number k ≥ 3 and any k-wheel of G with cycle nodes v_1, …, v_k and center u, every characteristic function x of a multicut of G satisfies the odd wheel inequality

$$
\sum_{i=1}^{k} x_{v_i v_{i+1}} - \sum_{i=1}^{k} x_{u v_i} \;\le\; \left\lfloor \frac{k}{2} \right\rfloor \qquad (5)
$$

where the cycle indices are taken modulo k.

For completeness, we note that other inequalities known to further tighten the LP relaxation can be included in our algorithm, e.g., the bicycle inequalities [14] defined on graphs as in Fig. 3. We, however, do not consider inequalities other than cycles and odd wheels in the algorithm we propose.
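As a sanity check, the odd wheel inequality (5) can be verified by brute force over all node clusterings of the 3-wheel (the complete graph K4). The sketch below assumes the form of (5) stated above, with cut variables:

```python
from itertools import product

# 3-wheel: center u plus cycle nodes v1, v2, v3 (the complete graph K4)
k = 3
best = -k  # largest left-hand side of (5) observed over all clusterings
for labels in product(range(4), repeat=4):     # labels[0] is the cluster of u
    u, v = labels[0], labels[1:]
    x_spoke = [int(u != v[i]) for i in range(k)]               # cut spokes
    x_cycle = [int(v[i] != v[(i + 1) % k]) for i in range(k)]  # cut cycle edges
    lhs = sum(x_cycle) - sum(x_spoke)
    assert lhs <= k // 2   # the odd wheel inequality (5) holds
    best = max(best, lhs)
# best == k // 2: the inequality is tight for some decomposition
```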

### 2.2 Integer relaxed pairwise separable LPs

LP relaxations of the multicut problem can in principle be solved with algorithms for general LPs which are available in excellent software such as CPlex [2] and Gurobi [22]. However, these algorithms scale super-linearly with the size of the problem and are hence impractical for large instances.

We define in Sec. 3 an LP relaxation of the multicut problem in the form of an IRPS-LP (Def. 1). IRPS-LPs are a special case of dual decomposition [21]. In Def. 1, every node j ∈ V defines a subproblem, and every edge {j, k} ∈ E defines a dependency between subproblems. Def. 1 is more specific in that, firstly, the subproblems are binary and, secondly, the linear constraints (9) that describe the dependence of subproblems are defined by 01-matrices that map 01-vectors to 01-vectors. IRPS-LPs are amenable to efficient optimization by message passing in the framework of [36].

###### Definition 1 (IRPS-LP [36]).

Let n ∈ ℕ and let G = (V, E) be a graph with V = {1, …, n}. For every j ∈ V, let d_j ∈ ℕ, let X_j ⊆ {0,1}^{d_j}, and let θ_j ∈ ℝ^{d_j}. Let Λ := conv(X_1) × ⋯ × conv(X_n). For every e = {j, k} ∈ E, let m_e ∈ ℕ, A(j,k) ∈ {0,1}^{m_e × d_j} and A(k,j) ∈ {0,1}^{m_e × d_k} such that

$$
\begin{aligned}
\forall x \in X_j:\; & A(j,k)\,x \in \{0,1\}^{m_e} && (6) \\
\forall x \in X_k:\; & A(k,j)\,x \in \{0,1\}^{m_e} && (7)
\end{aligned}
$$

Then, the LP written below is called integer relaxed pairwise separable (IRPS-LP) w.r.t. the graph G.

$$
\begin{aligned}
\min_{\mu \in \Lambda}\; & \sum_{j \in V} \sum_{k=1}^{d_j} \theta_{jk}\, \mu_{jk} && (8) \\
\text{subject to}\; & \forall \{j,k\} \in E:\; A(j,k)\,\mu_j = A(k,j)\,\mu_k && (9)
\end{aligned}
$$

## 3 Dual Decomposition

A straightforward decomposition of the minimum cost multicut problem (2)–(4) in the form of an IRPS-LP (Def. 1) consists of one subproblem for every edge, one subproblem for every cycle inequality and one subproblem for every odd-wheel inequality. From a computational perspective, it is however advantageous to triangulate cycles and odd wheels, and to consider the resulting smaller subproblems. Below, three classes of subproblems are defined rigorously.

##### Edge Subproblems.

For every edge uv ∈ E, we consider a subproblem with the feasible set X_uv = {0, 1}, encoding whether the edge uv is cut (1) or uncut (0).

##### Triangle Subproblems

For every cycle v_1, …, v_k of G, we consider the triangles v_1 v_2 v_3 to v_1 v_{k−1} v_k, as depicted in Fig. 4. If some edge of a triangle is not in E, we add it to E with cost zero, i.e., we triangulate the cycle in G. For each triangle uvw, we introduce a subproblem whose feasible set X_uvw consists of the five feasible multicuts of the triangle, i.e., X_uvw = {(0,0,0), (1,1,0), (1,0,1), (0,1,1), (1,1,1)}.

##### Lollipop Subproblems

For every odd number k ≥ 3 and every k-wheel of G consisting of a center node u and cycle nodes v_1, …, v_k, we introduce two classes of subproblems. For the 5-wheel depicted in Fig. 3, these subproblems are depicted in Fig. 5.

For every i ∈ {1, …, k}, we add the triangle subproblem for the triangle u v_i v_{i+1}, as described in the previous section.

For every i ∈ {1, …, k}, we add a subproblem for the lollipop graph that consists of the triangle u v_i v_{i+1} and one additional edge of the wheel. The feasible set of a lollipop subproblem has ten elements: the five feasible multicuts of the triangle times two states of the additional edge.

### 3.1 Dependencies

The dependency between a triangle subproblem uvw and its edge subproblems is expressed below in the form of a linear system. It fits the form (9) of an IRPS-LP.

$$
\begin{aligned}
\mu_{uv} &= \mu_{uvw}(1,1,0) + \mu_{uvw}(1,0,1) + \mu_{uvw}(1,1,1) \\
\mu_{uw} &= \mu_{uvw}(1,1,0) + \mu_{uvw}(0,1,1) + \mu_{uvw}(1,1,1) \\
\mu_{vw} &= \mu_{uvw}(1,0,1) + \mu_{uvw}(0,1,1) + \mu_{uvw}(1,1,1)
\end{aligned}
$$
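These coupling constraints can be read as marginalization. The following hypothetical sketch recovers the edge variables μ_uv, μ_uw, μ_vw from a distribution over the five feasible triangle labelings, ordered (x_uv, x_uw, x_vw); all names are illustrative:

```python
# the five feasible multicuts of a triangle uvw, as labels (x_uv, x_uw, x_vw)
TRIANGLE_MULTICUTS = [(0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]

def edge_marginals(mu_uvw):
    """Given a distribution mu_uvw over the five triangle labelings,
    return the induced edge marginals (mu_uv, mu_uw, mu_vw): each edge
    variable sums the mass of the labelings in which that edge is cut."""
    mu = [0.0, 0.0, 0.0]
    for p, x in zip(mu_uvw, TRIANGLE_MULTICUTS):
        for i in range(3):
            mu[i] += p * x[i]
    return tuple(mu)
```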

The dependency between a lollipop subproblem L and a triangle subproblem T is stated below as a linear system with sums over the labelings of edges not shared between L and T. This linear system has the form (9) of an IRPS-LP.

$$
\forall x_{L \cap T}: \quad \sum_{x_{L \setminus T}} \mu_L(x_{L \cap T},\, x_{L \setminus T}) \;=\; \sum_{x_{T \setminus L}} \mu_T(x_{T \cap L},\, x_{T \setminus L})
$$

### 3.2 Remarks

Remark 1. The triangulation of cycles can be understood as the construction of a junction tree [37] in such a way that the minimum cost multicut problem over the cycle can be solved by dynamic programming. The triangulation of cycles can also be understood as a tightening of an outer polyhedral relaxation of the multicut polytope: A cycle inequality (3) defines a facet of the multicut polytope if and only if the cycle is chordless [14]. By triangulating a cycle, we obtain a set of minimal chordless cycles (triangles) whose cycle inequalities together imply that of the entire cycle.

Remark 2. Technically, we would not have needed to include triangle subproblems for odd wheels. Instead, we could have introduced dependencies between lollipops directly in the form of an IRPS-LP. However, by introducing triangle factors in addition and by expressing dependencies between lollipops and triangles, we couple lollipop factors from different odd wheels more tightly whenever they share the same triangles.

## 4 Algorithm

We now define an algorithm for the minimum cost multicut problem (2)–(4). This algorithm takes an instance of the problem as input and alternates for a fixed number of iterations between two main procedures.

The first procedure, defined in Sec. 4.1, solves an instance of a dual of the IRPS-LP relaxation defined in the previous section. The output consists of a lower bound and a re-parameterization of the instance of the minimum cost multicut problem given as input. The second procedure tightens the IRPS-LP relaxation by adding subproblems for cycle inequalities (3) and odd wheel inequalities (5) violated by the current solution. Separation procedures for finding such violated inequalities, more efficiently than in cutting plane algorithms for the primal [24, 25, 27], are defined in Sec. 4.2.

To find feasible solutions of the instance of the minimum cost multicut problem given as input, we apply a state-of-the-art local search algorithm on the computed re-parameterizations, a procedure commonly referred to as rounding (Sec. 4.3).

### 4.1 Message Passing

Like other algorithms based on dual decomposition, the algorithm we propose does not solve the IRPS-LP directly, in the primal domain, but optimizes a dual of (8)–(9). Specifically, it operates on a space of re-parametrizations of the problem defined below: For any two dependent subproblems j and k, we can change the costs θ_j and θ_k by an arbitrary vector Δ according to the update rules

$$
\begin{aligned}
\theta'_j &:= \theta_j + A(j,k)^{\top} \Delta && (10) \\
\theta'_k &:= \theta_k - A(k,j)^{\top} \Delta && (11)
\end{aligned}
$$

We refer to any update of the costs θ according to the rules (10)–(11) as message passing. Message passing does not change the cost of any primal feasible solution, as

$$
\begin{aligned}
\langle \theta'_j, \mu_j \rangle + \langle \theta'_k, \mu_k \rangle
&= \langle \theta_j + A(j,k)^{\top}\Delta, \mu_j \rangle + \langle \theta_k - A(k,j)^{\top}\Delta, \mu_k \rangle && (12) \\
&= \langle \theta_j, \mu_j \rangle + \langle \theta_k, \mu_k \rangle + \langle \Delta, A(j,k)\mu_j - A(k,j)\mu_k \rangle && (13) \\
&= \langle \theta_j, \mu_j \rangle + \langle \theta_k, \mu_k \rangle, && (14)
\end{aligned}
$$

where the last equality follows from the coupling constraints (9).

Message passing does, however, change the dual lower bound to (8) given by

$$
L(\theta) := \sum_{j \in V} \min_{x_j \in X_j} \langle \theta_j, x_j \rangle \qquad (15)
$$

The maximum of L(θ) over all costs obtainable by message passing is equal to the minimum of (8), by linear programming duality. We seek to alter the costs by means of message passing so as to maximize the lower bound L(θ). For the general IRPS-LP, a framework of algorithms to achieve this goal is defined in [36]. For the minimum cost multicut problem, we define and implement Alg. 1 within this framework. The specifics of this algorithm for the minimum cost multicut problem are discussed below. General properties of message passing for IRPS-LPs are discussed in [36].

##### Factor Order.

Alg. 1 iterates through all edge and triangle subproblems. The order is specified as follows: We assume that a node order is given. With respect to this node order, edges are ordered lexicographically. For every triangle and its edge set {e_1, e_2, e_3} with e_1 < e_2 < e_3, we define the ordering constraint e_1 < T < e_3, i.e., a triangle is visited after its first and before its last edge. For every lollipop graph, an analogous ordering constraint is defined with respect to its edge set. The strict partial order defined by these constraints is extended to a total order by topological sorting.

##### Message Passing Description.

When an edge subproblem uv is visited, Alg. 1 receives messages from all dependent triangle subproblems. Having received a message from a triangle uvw, the costs satisfy the condition

$$
\min_{x_{uw},\, x_{vw}} \theta_{uvw}(0, x_{uw}, x_{vw}) \;=\; \min_{x_{uw},\, x_{vw}} \theta_{uvw}(1, x_{uw}, x_{vw}).
$$

In other words, the cost of the triangle factor has no preference for either x_uv = 0 or x_uv = 1. Sending messages from uv is analogous: Having sent messages from uv, we have θ_uv = 0, i.e., there is again no preference for either x_uv = 0 or x_uv = 1.

When we visit a triangle subproblem, we proceed analogously with all dependent lollipop subproblems: Once messages have been received, the lollipop subproblems have no preference for incident edges. Once messages have been sent, this holds true for the triangle subproblem.

Once Alg. 1 has visited all subproblems and terminates, we reverse the order of subproblems and invoke Alg. 1 again. This double call of Alg. 1 is repeated for a fixed number of iterations that is a parameter of our algorithm.
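To illustrate the effect of the updates (10)–(11), the following simplified sketch absorbs the costs of three edge subproblems into a single triangle subproblem. It is not Alg. 1 itself (which also sends min-marginals back), and all instance data is illustrative; the point is that the cost of every feasible labeling is preserved while the lower bound (15) improves:

```python
# the five feasible multicuts of a triangle uvw, ordered (x_uv, x_uw, x_vw)
TRIANGLE_MULTICUTS = [(0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]

def lower_bound(theta_edge, theta_tri):
    # each edge subproblem contributes min(0, theta_e), the triangle its minimum
    return sum(min(0.0, c) for c in theta_edge) + min(theta_tri.values())

def total_cost(theta_edge, theta_tri, x):
    # cost of a labeling x consistent across all subproblems
    return sum(t * xi for t, xi in zip(theta_edge, x)) + theta_tri[x]

def receive_messages(theta_edge, theta_tri):
    """Edge -> triangle messages: absorb each edge cost into the triangle,
    a special case of the reparametrization (10)-(11)."""
    for i in range(3):
        for x in TRIANGLE_MULTICUTS:
            if x[i] == 1:
                theta_tri[x] += theta_edge[i]
        theta_edge[i] = 0.0

theta_edge = [-2.0, 1.0, 1.0]                      # illustrative costs uv, uw, vw
theta_tri = {x: 0.0 for x in TRIANGLE_MULTICUTS}   # triangle costs start at zero

lb_before = receive_before = lower_bound(theta_edge, theta_tri)  # -2.0
receive_messages(theta_edge, theta_tri)
lb_after = lower_bound(theta_edge, theta_tri)                    # -1.0
```

For this toy instance the bound rises from the sum of negative edge costs to the true optimum, which cuts uv together with one positive edge.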

### 4.2 Separation

Applying Alg. 1 with all cycles and all odd wheels of a graph is impractical, as the number of triangles arising from cycle inequalities (3) is cubic, and the number of lollipop graphs arising from odd wheel inequalities (5) is quartic, in the number of nodes. In order to arrive at a practical algorithm, we take a cutting plane approach in which we periodically separate and add subproblems for violated cycle and odd wheel inequalities. Initially, the IRPS-LP contains only one subproblem for every edge of G, and the set of triangle and lollipop subproblems is empty.

In the primal, given some fractional x, it is common to look for maximally violated inequalities (3) and (5). This is possible in polynomial time via shortest path computations [14, 17]. In our dual formulation, we have no primal solution in which to search for violated inequalities. Here, a suitable criterion is to consider those additional triangle or lollipop subproblems that necessarily increase the dual lower bound by some constant ε > 0. Among these subproblems, we choose those for which the increase is maximal and add them to the IRPS-LP. A similar dual cutting plane approach has been shown to be useful for graphical models in [34]. As we discuss below, separation is more efficient in the dual than in the primal.

#### 4.2.1 Cycle Inequalities

We characterize those cycles whose subproblem increases the dual lower bound by at least ε.

###### Proposition 1.

Let C be a cycle of G with θ_ē ≤ −ε for one edge ē ∈ C and θ_e ≥ ε for every other edge e ∈ C ∖ {ē}. Then, the dual lower bound L(θ) can be increased by ε by including a triangulation of C.

In order to find such cycles, we apply Alg. 2. This algorithm first records in a disjoint-set data structure which nodes are connected via edges of cost at least ε. Then, it visits every edge ē with θ_ē ≤ −ε. If the endpoints of ē are connected by a path along which all edges have cost at least ε, it searches for a shortest such path by means of breadth-first search.
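A minimal sketch of this separation procedure follows, assuming the costs θ are given as a dict over node pairs. The instance and names are illustrative, and Alg. 2 in the paper is more elaborate:

```python
from collections import deque

class DisjointSet:
    def __init__(self, nodes):
        self.parent = {v: v for v in nodes}
    def find(self, v):
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def separate_cycles(nodes, theta, eps):
    """For each edge of cost <= -eps whose endpoints are connected by edges
    of cost >= eps, return a shortest such connecting path (found by BFS),
    which closes a cycle whose subproblem raises the bound by eps."""
    strong = {v: [] for v in nodes}    # adjacency over edges with cost >= eps
    ds = DisjointSet(nodes)
    for (u, v), c in theta.items():
        if c >= eps:
            strong[u].append(v)
            strong[v].append(u)
            ds.union(u, v)
    cycles = []
    for (u, v), c in theta.items():
        if c <= -eps and ds.find(u) == ds.find(v):
            pred = {u: None}           # BFS for a shortest strong u-v path
            queue = deque([u])
            while queue:
                w = queue.popleft()
                if w == v:
                    break
                for n in strong[w]:
                    if n not in pred:
                        pred[n] = w
                        queue.append(n)
            path = [v]
            while pred[path[-1]] is not None:
                path.append(pred[path[-1]])
            cycles.append(path[::-1])  # u, ..., v; closed by the negative edge
    return cycles

# toy instance: the negative edge {1, 3} is closed by the strong path 1-2-3
theta = {(1, 2): 1.0, (2, 3): 1.0, (3, 4): 1.0, (1, 3): -1.0}
found = separate_cycles([1, 2, 3, 4], theta, eps=0.5)
```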

In the primal, finding a maximally violated cycle inequality (3) is more expensive, requiring, for every edge uv ∈ E, the search for a uv-path of minimum cost [14] by, e.g., Dijkstra's algorithm.

#### 4.2.2 Odd Wheel Inequalities

We characterize those odd wheels whose lollipop subproblems increase the lower bound by at least ε.

###### Proposition 2.

Let u be the center node and v_1, …, v_k the cycle nodes of an odd wheel. Adding the lollipop subproblems for i ∈ {1, …, k} increases L(θ) by at least ε if the costs of each triangle u v_i v_{i+1} are such that the minimal cost of any edge labeling of the triangle cutting precisely one edge incident to u is smaller by ε than the minimal cost of any edge labeling of the triangle cutting zero or two edges incident to u. That is:

$$
\min_{\{x \,:\, x_{u v_i} + x_{u v_{i+1}} = 1\}} \theta_{u v_i v_{i+1}}(x) + \epsilon \;\le\; \min_{\{x \,:\, x_{u v_i} + x_{u v_{i+1}} \neq 1\}} \theta_{u v_i v_{i+1}}(x). \qquad (16)
$$

In order to find such odd wheels, we apply Alg. 3. This algorithm builds on our observation that we need to look only at triangles whose subproblem has already been added. Hence, Alg. 3 visits each node u and builds a bipartite graph G′ = (V′, E′) as follows. (An example is depicted in Fig. 6 for a 5-wheel with (16) holding true for all triangles of the wheel.) For each triangle u v_i v_j such that (16) holds true, four nodes v_i′, v_i′′, v_j′, v_j′′, two copies of each cycle node, are added to V′ and joined by the edges {v_i′, v_j′′} and {v_i′′, v_j′}. If a path from v′ to v′′ exists in G′ for some node v, we have found a violated odd wheel inequality (5). As G′ is bipartite, a v′-v′′-path in G′ corresponds to an odd cycle around u in G. As before, the search for paths is accelerated by connectivity tests via a disjoint set data structure and is carried out by breadth-first search.
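The bipartite double-cover search can be sketched as follows, for a fixed center u and the pairs of adjacent cycle nodes whose triangle satisfies (16). This is a strong simplification of Alg. 3, with illustrative names:

```python
from collections import deque

def find_odd_wheel_cycle(cycle_nodes, good_pairs):
    """Given the pairs (v_i, v_j) of adjacent cycle nodes whose triangle with
    the center satisfies (16), build the bipartite double cover (two copies
    0 and 1 of every node, crossing edges between copies) and search for a
    path between the two copies of some node: such a path corresponds to an
    odd cycle around the center. Returns the node sequence, or None."""
    adj = {(v, s): [] for v in cycle_nodes for s in (0, 1)}
    for (a, b) in good_pairs:
        for s in (0, 1):               # edges always cross between the copies
            adj[(a, s)].append((b, 1 - s))
            adj[(b, s)].append((a, 1 - s))
    for v in cycle_nodes:
        start, goal = (v, 0), (v, 1)
        pred = {start: None}
        queue = deque([start])
        while queue:
            w = queue.popleft()
            if w == goal:              # reached the other copy: odd cycle found
                path = [goal]
                while pred[path[-1]] is not None:
                    path.append(pred[path[-1]])
                return [n for n, _ in path[::-1]]
            for n in adj[w]:
                if n not in pred:
                    pred[n] = w
                    queue.append(n)
    return None
```

With all three triangle pairs of a 3-wheel satisfying (16), the search returns the odd cycle 1-2-3 (closed); dropping one pair makes the cover bipartite-disconnected and no odd cycle exists.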

In the primal, finding a maximally violated odd wheel inequality (5) entails the same construction of a bipartite graph for each node [17]. However, a shortest path search w.r.t. fractional edge costs needs to be carried out by Dijkstra's algorithm instead of breadth-first search. A further complication in the primal is that the separation algorithm needs to visit every node copy v′ in order to compute the shortest v′-v′′-path in G′.

### 4.3 Rounding

Our message passing Alg. 1 improves a dual lower bound on (2), but does not provide a feasible solution of (2)–(4). In order to obtain a feasible multicut, we apply a local search algorithm defined in [26], namely greedy additive edge contraction (GAEC), followed by Kernighan-Lin with joins (KLj). GAEC computes a multicut by greedily contracting those edges for which the join decreases the cost maximally. It stops as soon as no contraction of any edge strictly decreases the cost. KLj attempts to improve a given multicut recursively by applying transformations from three classes: (1) moving nodes between two components, (2) moving nodes from a given component to a newly forming one or (3) joining two components. GAEC and KLj are local search algorithms that output a feasible multicut that need not be optimal.
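GAEC can be sketched compactly under the sign convention of (1), where a positive cost favours joining the endpoints (cutting the edge would incur that cost). This is a simplified quadratic-time variant with illustrative names, not the implementation of [26]:

```python
def gaec(nodes, theta):
    """Greedy additive edge contraction: repeatedly contract the edge with
    the largest positive cost (joining its endpoints removes that cost from
    the objective), merging parallel edges by summing their costs. Stops as
    soon as all remaining costs are non-positive; returns the partition."""
    clusters = {v: {v} for v in nodes}
    costs = {frozenset(e): c for e, c in theta.items()}
    while costs:
        e, c = max(costs.items(), key=lambda item: item[1])
        if c <= 0:
            break                       # no contraction strictly decreases cost
        a, b = tuple(e)
        del costs[e]
        for f in list(costs):           # redirect edges incident to b onto a
            if b in f:
                other = next(iter(f - {b}))
                g = frozenset({a, other})
                costs[g] = costs.get(g, 0.0) + costs.pop(f)
        clusters[a] |= clusters.pop(b)  # contract b into a
    return list(clusters.values())
```

On a toy instance with two attractive edges separated by repulsive ones, the sketch joins {1, 2} and {3, 4} and then stops.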

We apply GAEC and KLj not only to the instance of the minimum cost multicut problem given as input but also to the re-parameterization of this instance output by Alg. 1. The rationale for doing so comes from LP duality:

###### Proposition 3.

Assume θ maximizes the dual lower bound L and the relaxation is tight, i.e.

$$
L(\theta) = \min_{\{x \in \{0,1\}^E \,:\, x^{-1}(1) \in \mathcal{M}_G\}} \langle \theta, x \rangle. \qquad (17)
$$

Moreover, let x̂ ∈ {0,1}^E such that x̂⁻¹(1) is an optimal multicut w.r.t. θ. Then,

$$
\theta_e \;\begin{cases} \le 0 & \text{if } \hat{x}_e = 1 \\ \ge 0 & \text{if } \hat{x}_e = 0 \end{cases} \qquad (18)
$$

Having run Alg. 1 for a while, we expect θ to fulfill the sign condition of Prop. 3 approximately. Therefore, the sign of θ_e will be a good hint as to whether the edge e is cut. Thus, informally, we expect local search algorithms operating on the re-parameterized instance of the problem to yield better feasible multicuts than local search algorithms operating on the given instance.

For MAP-inference in discrete graphical models, it is known from [28, 29] that primal rounding can be improved greatly when applied to cost functions re-parameterized by message passing.

## 5 Experiments

##### Solvers

We compare against several state-of-the-art algorithms.

• The algorithm MC-ILP [25] is an efficient implementation of a cutting plane algorithm that solves (2)–(4) by separating cycle inequalities (3). CPlex [2] is used to solve the underlying ILP problems. The integrality conditions (4) are given directly to the solver. According to [25], this is beneficial due to the excellent branch-and-cut capabilities of CPlex [2].

• Cut, Glue & Cut [11], abbreviated as CGC, is a move making algorithm using planar max-cut subproblems to improve multicuts.

• Fusion moves for correlation clustering [10], abbreviated as CC-Fusion, fuses multicuts generated by various proposal generators with the help of auxiliary multicut problems, solved in turn by MC-ILP. We use randomized hierarchical clustering and randomized watersheds as proposal generators, identified by the suffixes -RHC and -RWS. We use parameters for the proposal generators as recommended by the authors [10].

• MP-C denotes Algorithm 1 when we only separate for cycle inequalities (3) by Algorithm 2, while MP-COW denotes that we additionally separate for odd wheel inequalities (5) by Algorithm 3. We search for triangles and lollipops to add every 10th iteration.

• KL is the GAEC and KLj implementation [26] described in Section 4.3 for computing multicuts. We let KL run every 100th iteration of MP-C and MP-COW on the current reparametrized edge costs.

MC-ILP, CGC and CC-Fusion are implemented as part of the OpenGM suite [23]. Only MC-ILP and our solvers MP-C and MP-COW generate non-trivial dual lower bounds. CGC also outputs dual lower bounds, but these are equivalent to the trivial lower bound Σ_{e∈E} min(0, θ_e), where the edge costs θ are as given by the problem. It has been shown that CGC, CC-Fusion and KL outperform other primal heuristics [10], hence we do not compare to any other heuristic algorithm. MC-ILP also outperforms the LP-based solver [32], due to the latter using the slower COIN-OR CLP [15] solver internally, hence we exclude it from the comparison as well.

All solvers were run on a laptop computer with an i5-5200 CPU with 2.2 GHz and 8 GB RAM.

##### Datasets

We compare on 8 datasets of diverse origin.

• image-seg consists of images of the Berkeley segmentation dataset [30], presegmented with superpixels, for which pairwise affinity values have been computed as in [4].

• The knott-3d-{150|300|450|550} datasets come from a neural circuit reconstruction problem on volumes of tissue [5] with 150³, 300³, 450³ and 550³ voxels, respectively. The data is presegmented into supervoxels.

• modularity clustering aims to cluster a social network into subgroups based on the affinity between individual persons.

• CREMI-{small|large} datasets were constructed as part of the CREMI challenge [1], which aims to reconstruct neural circuits of the adult fly brain. The images are taken by electron microscopy. The -small instances are cropped versions of the -large ones. To our knowledge, the CREMI-large dataset contains the largest multicut problems approached with LP-based methods.

The image-seg, knott-3d and modularity clustering datasets were taken from the OpenGM benchmark [23], while the CREMI datasets were kindly provided by their authors and are not yet published.

The datasets consist of 100, 8, 8, 8, 8, 6, 3 and 3 instances, respectively, 144 in total. Dataset details can be found in Table 1.

##### Evaluation

We set a time limit of one hour for all algorithms. In Table 1, results averaged over all instances of each dataset are reported. In Figure 7, the primal solution energy and the dual lower bound (where applicable), averaged over all instances of each dataset, are plotted against runtime.

As can be seen from Table 1, except for the dataset CREMI-large, our solver MP-COW gives dual bounds that are within 0.0045%, 1.9%, 0.0061%, 0.0068%, 0.0017%, 0.0007% and 0.0083% of the dual lower bound obtained by MC-ILP, which uses the advanced branch-and-cut facilities of CPlex [2]. For CREMI-large, only our solvers MP-C and MP-COW output dual lower bounds at all, as MC-ILP did not finish a single iteration within one hour. As can be seen from Fig. 7, our lower bound usually converges faster than MC-ILP's. We conjecture that MP-C and MP-COW inside a branch-and-bound solver can significantly extend the reach of exact methods for the multicut problem.

Surprisingly, KL does not perform well on image-seg, even though the lower bounds we achieve with MP-C and MP-COW are not far from the optimal values computed by MC-ILP. On the other hand, MP-C and MP-COW give much better dual and primal results for modularity-clustering early on. Generally, compared to MC-ILP's primal convergence, we obtain much lower objective values early on, and for the large-scale datasets knott-3d-550, CREMI-small and CREMI-large, MC-ILP's primal solutions are no longer useful.

Unlike MC-ILP, our reparametrized costs can be used to improve heuristic primal algorithms. An example of this can be seen in Fig. 8, where reparametrized costs improve KL’s solutions.

##### Conclusion

We have shown that LP-based methods are feasible for solving large-scale multicut problems on commodity hardware, so one does not have to resort to purely primal heuristic algorithms. We achieve dual bounds very close to those computed by state-of-the-art branch-and-cut solvers. Additionally, our method usually gives much faster dual bound convergence, resulting in superior solutions when terminated early. Moreover, the primal heuristic GAEC + KLj can be improved when run on costs computed by our method.

It remains an interesting task to integrate primal heuristics more tightly into our message passing approach and to further improve the dual lower bound by, e.g., embedding our solver into branch and cut.

## 6 Acknowledgments

The authors would like to thank Vladimir Kolmogorov for helpful discussions. This work is partially funded by the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013)/ERC grant agreement no 616160.

## References

• [1] CREMI MICCAI Challenge on circuit reconstruction from Electron Microscopy Images.
• [2] IBM ILOG CPLEX Optimizer.
• [3] A. Alush and J. Goldberger. Break and conquer: Efficient correlation clustering for image segmentation. In E. R. Hancock and M. Pelillo, editors, SIMBAD, volume 7953 of Lecture Notes in Computer Science, pages 134–147. Springer, 2013.
• [4] B. Andres, J. H. Kappes, T. Beier, U. Köthe, and F. A. Hamprecht. Probabilistic image segmentation with closedness constraints. In D. N. Metaxas, L. Quan, A. Sanfeliu, and L. J. V. Gool, editors, ICCV, pages 2611–2618. IEEE Computer Society, 2011.
• [5] B. Andres, T. Kröger, K. L. Briggman, W. Denk, N. Korogod, G. Knott, U. Köthe, and F. A. Hamprecht. Globally optimal closed-surface segmentation for connectomics. In A. W. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato, and C. Schmid, editors, ECCV (3), volume 7574 of Lecture Notes in Computer Science, pages 778–791. Springer, 2012.
• [6] B. Andres, J. Yarkony, B. S. Manjunath, S. Kirchhoff, E. Turetken, C. C. Fowlkes, and H. Pfister. Segmenting planar superpixel adjacency graphs w.r.t. non-planar superpixel affinity graphs. In Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR), 2013.
• [7] A. Arasu, C. Ré, and D. Suciu. Large-scale deduplication with constraints using dedupalog. In Y. E. Ioannidis, D. L. Lee, and R. T. Ng, editors, ICDE, pages 952–963. IEEE Computer Society, 2009.
• [8] Y. Bachrach, P. Kohli, V. Kolmogorov, and M. Zadimoghaddam. Optimal coalition structure generation in cooperative graph games. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013.
• [9] N. Bansal, A. Blum, and S. Chawla. Correlation clustering. Machine Learning, 56(1):89–113, 2004.
• [10] T. Beier, F. A. Hamprecht, and J. H. Kappes. Fusion moves for correlation clustering. In CVPR, pages 3507–3516. IEEE Computer Society, 2015.
• [11] T. Beier, T. Kröger, J. H. Kappes, U. Köthe, and F. A. Hamprecht. Cut, glue & cut: A fast, approximate solver for multicut partitioning. In CVPR. Proceedings, 2014.
• [12] Y. Chen, S. Sanghavi, and H. Xu. Clustering sparse graphs. In P. L. Bartlett, F. C. N. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, NIPS, pages 2213–2221, 2012.
• [13] F. Chierichetti, N. Dalvi, and R. Kumar. Correlation clustering in mapreduce. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, pages 641–650, New York, NY, USA, 2014. ACM.
• [14] S. Chopra and M. R. Rao. The partition problem. Mathematical Programming, 59(1):87–115, 1993.
• [15] COIN-OR CLP, 2016.
• [16] E. D. Demaine, D. Emanuel, A. Fiat, and N. Immorlica. Correlation clustering in general weighted graphs. Theor. Comput. Sci., 361(2):172–187, Sept. 2006.
• [17] M. M. Deza and M. Laurent. Geometry of Cuts and Metrics. Springer Publishing Company, Incorporated, 1st edition, 2009.
• [18] M. Elsner and E. Charniak. You talking to me? a corpus and algorithm for conversation disentanglement. In K. McKeown, J. D. Moore, S. Teufel, J. Allan, and S. Furui, editors, ACL, pages 834–842. The Association for Computer Linguistics, 2008.
• [19] M. Elsner and W. Schudy. Bounding and comparing methods for correlation clustering beyond ILP. In Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing, ILP ’09, pages 19–27, Stroudsburg, PA, USA, 2009. Association for Computational Linguistics.
• [20] A. Gionis, H. Mannila, and P. Tsaparas. Clustering aggregation. ACM Trans. Knowl. Discov. Data, 1(1):4, 2007.
• [21] M. Guignard and S. Kim. Lagrangean decomposition for integer programming: theory and applications. Revue française d’automatique, d’informatique et de recherche opérationnelle. Recherche opérationnelle, 21(4):307–323, 1987.
• [22] Gurobi Optimization, Inc., 2015.
• [23] J. H. Kappes, B. Andres, F. A. Hamprecht, C. Schnörr, S. Nowozin, D. Batra, S. Kim, B. X. Kausler, T. Kröger, J. Lellmann, N. Komodakis, B. Savchynskyy, and C. Rother. A comparative study of modern inference techniques for structured discrete energy minimization problems. International Journal of Computer Vision, 115(2):155–184, 2015.
• [24] J. H. Kappes, M. Speth, B. Andres, G. Reinelt, and C. Schnörr. Globally optimal image partitioning by multicuts. In EMMCVPR. Springer, Springer, 2011.
• [25] J. H. Kappes, M. Speth, G. Reinelt, and C. Schnörr. Higher-order segmentation via multicuts. CoRR, abs/1305.6387, 2013.
• [26] M. Keuper, E. Levinkov, N. Bonneel, G. Lavoué, T. Brox, and B. Andres. Efficient decomposition of image and mesh graphs by lifted multicuts. In ICCV, 2015.
• [27] S. Kim, S. Nowozin, P. Kohli, and C. D. Yoo. Higher-order correlation clustering for image segmentation. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. C. N. Pereira, and K. Q. Weinberger, editors, NIPS, pages 1530–1538, 2011.
• [28] V. Kolmogorov. Convergent tree-reweighted message passing for energy minimization. IEEE Trans. Pattern Anal. Mach. Intell., 28(10):1568–1583, 2006.
• [29] V. Kolmogorov. A new look at reweighted message passing. IEEE Trans. Pattern Anal. Mach. Intell., 37(5):919–930, 2015.
• [30] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, volume 2, pages 416–423 vol.2, 2001.
• [31] V. Ng and C. Cardie. Improving machine learning approaches to coreference resolution. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL '02), 2002.
• [32] S. Nowozin and S. Jegelka. Solution stability in linear programming relaxations: graph partitioning and unsupervised learning. In A. P. Danyluk, L. Bottou, and M. L. Littman, editors, ICML, volume 382 of ACM International Conference Proceeding Series, pages 769–776. ACM, 2009.
• [33] E. Sadikov, J. Madhavan, L. Wang, and A. Halevy. Clustering query refinements by user intent. In World Wide Web Conference (WWW). ACM Press, April 2010.
• [34] D. Sontag, D. K. Choe, and Y. Li. Efficiently searching for frustrated cycles in MAP inference. In UAI, pages 795–804. AUAI Press, 2012.
• [35] W. M. Soon, H. T. Ng, and D. C. Y. Lim. A machine learning approach to coreference resolution of noun phrases. Computational Linguistics, 27(4):521–544, 2001.
• [36] P. Swoboda, J. Kuske, and B. Savchynskyy. A dual ascent framework for Lagrangean decomposition of combinatorial problems. CoRR, 2016.
• [37] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1–305, 2008.
• [38] J. Yarkony, T. Beier, P. Baldi, and F. A. Hamprecht. Parallel multicut segmentation via dual decomposition. In New Frontiers in Mining Complex Patterns - Third International Workshop, NFMCP 2014, Held in Conjunction with ECML-PKDD 2014, Nancy, France, September 19, 2014, Revised Selected Papers, pages 56–68, 2014.
• [39] J. Yarkony, A. Ihler, and C. C. Fowlkes. Fast Planar Correlation Clustering for Image Segmentation, pages 568–581. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012.

## 7 Appendix

### 7.1 Proofs

##### Proof of Proposition 1
###### Proposition.

Let $C$ be a cycle with edges $e_1,\ldots,e_k$ such that $\theta_{e_1} \le -\epsilon$ and $\theta_{e_i} \ge \epsilon$ for $i = 2,\ldots,k$. Then, the dual lower bound can be increased by $\epsilon$ by including a triangulation of $C$.

###### Proof.

Let cycle $C$ have vertices $v_1,\ldots,v_k$ and edges $e_i = v_iv_{i+1}$, where we set $v_{k+1} = v_1$ for notational purposes. After triangulation, triangle factors on the vertex triples $(v_1, v_i, v_{i+1})$, $i = 2,\ldots,k-1$, will be present in the model. Let the current reparametrization be $\theta$.

The triangle factors corresponding to cycle $C$ will enforce the cycle inequality (3)

$$x_{e_1} \le \sum_{i=2}^{k} x_{e_i}. \tag{19}$$

It holds that

$$\theta_{e_1} \;=\; \min_{x \in [0,1]^k} \sum_{i=1}^{k} \theta_{e_i} x_{e_i} \;\le\; -\epsilon + \min_{\substack{x \in [0,1]^k \\ \text{s.t.\ } (19)}} \sum_{i=1}^{k} \theta_{e_i} x_{e_i} \;\le\; -\epsilon + \max_{\substack{\theta_{e_1},\ldots,\theta_{e_k},\, \theta_{v_1v_2v_3},\ldots,\theta_{v_1v_{k-1}v_k} \\ \text{a reparametrization}}} L_C(\theta) \tag{20}$$

where $L_C(\theta)$ denotes the dual lower bound on cycle $C$. The first inequality holds because in the optimal solution either $x_{e_1} = 0$, giving cost at least $0 \ge \theta_{e_1} + \epsilon$, or $x_{e_1} = 1$ and some $x_{e_i}$, $i \in \{2,\ldots,k\}$, must be one due to (19), giving cost at least $\theta_{e_1} + \epsilon$. The second inequality is due to the fact that (i) the maximum over reparametrizations of $L_C(\theta)$ equals the minimum of the triangulated relaxation by linear programming duality and (ii) the triangle factors enforce more inequalities than only (19). ∎
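
The counting in the proof can be verified by brute force on a small example. The costs below are made-up and only chosen to satisfy the premise of the proposition, i.e. $\theta_{e_1} \le -\epsilon$ and $\theta_{e_i} \ge \epsilon$ for $i \ge 2$:

```python
import itertools

# Hypothetical reparametrized costs on a cycle with k = 4 edges e_1, ..., e_4:
# theta[0] <= -eps and theta[i] >= eps for i >= 1, with eps = 0.5.
theta = [-2.0, 0.5, 0.7, 1.2]
eps = min([-theta[0]] + theta[1:])  # eps = 0.5
k = len(theta)

def min_cost(enforce_cycle_ineq):
    """Minimize sum_i theta_i * x_i over binary labelings x, optionally
    subject to the cycle inequality (19): x_1 <= x_2 + ... + x_k."""
    best = 0.0  # x = 0 is always feasible
    for x in itertools.product([0, 1], repeat=k):
        if enforce_cycle_ineq and x[0] > sum(x[1:]):
            continue
        best = min(best, sum(t * xi for t, xi in zip(theta, x)))
    return best

unconstrained = min_cost(False)  # attained by cutting only e_1: theta[0] = -2.0
constrained = min_cost(True)     # must also cut a second edge: -2.0 + 0.5 = -1.5

print(unconstrained, constrained)          # -2.0 -1.5
print(constrained - unconstrained == eps)  # True: bound improves by exactly eps
```

Enforcing (19) rules out the labeling that cuts only $e_1$, so the minimum rises by exactly $\epsilon$, matching the claimed increase of the dual lower bound.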

##### Proof of Proposition 2
###### Proposition.

Let $W$ be an odd wheel with center node $u$ and cycle nodes $v_1,\ldots,v_k$. Adding the lollipop subproblems for $W$ increases the dual lower bound $L(\theta)$ by at least $\epsilon$ if the costs of each triangle $uv_iv_{i+1}$ are such that the minimal cost of any edge labeling of the triangle cutting precisely one edge incident to $u$ is smaller by $\epsilon$ than the minimal cost of any edge labeling of the triangle cutting zero or two edges incident to $u$. That is:

$$\min_{\{x \,:\, x_{uv_i} + x_{uv_{i+1}} = 1\}} \theta_{uv_iv_{i+1}}(x) + \epsilon \;\le\; \min_{\{x \,:\, x_{uv_i} + x_{uv_{i+1}} \neq 1\}} \theta_{uv_iv_{i+1}}(x). \tag{21}$$
###### Proof.

Condition (21) means that in all triangles of the odd wheel $W$, the minimal assignment with regard to the current reparametrization cuts exactly one edge incident to $u$. All other assignments have cost greater by at least $\epsilon$. As $k$ is odd, there is no possibility to combine those local assignments into a global assignment on $W$.

On the other hand, our construction of lollipop factors ensures exactness on odd wheels. Hence, at least one triangle must take an assignment that is not locally optimal, whose cost is larger by at least $\epsilon$ than its minimal reparametrized cost, and the result follows. ∎
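
The parity obstruction used above can be checked by enumeration: label each spoke edge $uv_i$ with $x_{uv_i} \in \{0,1\}$ and ask that every triangle cut exactly one of its two spokes, i.e. $x_{uv_i} + x_{uv_{i+1}} = 1$ for all $i$ (indices modulo $k$). This is an illustrative script, not part of the algorithm:

```python
import itertools

def consistent_spoke_labelings(k):
    """Count binary labelings x of the k spoke edges u v_1, ..., u v_k such
    that every triangle u v_i v_{i+1} cuts exactly one of its two spokes,
    i.e. x[i] + x[(i+1) % k] == 1 for all i."""
    return sum(
        1
        for x in itertools.product([0, 1], repeat=k)
        if all(x[i] + x[(i + 1) % k] == 1 for i in range(k))
    )

# For odd k no global labeling satisfies all triangles (the alternation
# x_i != x_{i+1} cannot close an odd cycle); for even k there are two.
print(consistent_spoke_labelings(5))  # 0
print(consistent_spoke_labelings(6))  # 2
```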

##### Proof of Proposition 3
###### Proposition.

Assume $\theta$ maximizes the dual lower bound and the relaxation is tight, i.e.

$$L(\theta) = \min_{\{x \in \{0,1\}^E \,:\, x^{-1}(1) \in M_G\}} \langle \theta, x \rangle. \tag{22}$$

Moreover, let $\hat{x} \in \{0,1\}^E$ be such that $\hat{x}^{-1}(1)$ is an optimal multicut of $G$. Then,

$$\theta_e \begin{cases} \le 0, & \text{if } \hat{x}_e = 1 \\ \ge 0, & \text{if } \hat{x}_e = 0 \end{cases} \tag{23}$$
###### Proof.

Follows from the complementary slackness conditions in linear programming duality. ∎
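
A numerical reading of the proposition: under the sign condition (23), the labeling $\hat{x}$ attains the minimum of $\langle \theta, x \rangle$ even over all of $\{0,1\}^E$, so no multicut constraint is active at the optimum. The costs and labeling below are made-up values chosen to satisfy (23):

```python
import itertools

theta = [-1.5, -0.2, 0.0, 0.8, 2.1]  # hypothetical reparametrized edge costs
xhat = [1, 1, 0, 0, 0]               # hypothetical optimal multicut indicator

# Sign condition (23): theta_e <= 0 where xhat_e = 1, theta_e >= 0 where xhat_e = 0.
assert all(t <= 0 for t, x in zip(theta, xhat) if x == 1)
assert all(t >= 0 for t, x in zip(theta, xhat) if x == 0)

cost_xhat = sum(t * x for t, x in zip(theta, xhat))
best = min(sum(t * xi for t, xi in zip(theta, x))
           for x in itertools.product([0, 1], repeat=len(theta)))

# xhat minimizes <theta, x> over all binary labelings, multicut or not.
print(cost_xhat == best)  # True
```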

### 7.2 Detailed experimental evaluation

Table 2 provides a detailed per-instance evaluation of all algorithms considered in the experimental section.