1 Introduction
Decomposing a graph into meaningful clusters is a fundamental primitive in computer vision, biomedical image analysis and data mining. In settings where no information is given about the number or size of clusters, and information is given only about the pairwise similarity or dissimilarity of nodes, a canonical mathematical abstraction is the minimum cost multicut (or correlation clustering) problem [14]. The feasible solutions of this problem, multicuts, relate one-to-one to the decompositions of the graph. A multicut is the set of edges that straddle distinct clusters. The cost of a multicut is the sum of costs attributed to its edges.
In the field of computer vision, the minimum cost multicut problem has been applied in [3, 4, 39, 6] to the task of unsupervised image segmentation defined by the BSDS data sets and benchmarks [30]. In the field of biomedical image analysis, the minimum cost multicut problem has been applied to an image segmentation task for connectomics [5]. In the field of data mining, applications include [7, 33, 12, 13]. As the minimum cost multicut problem is NP-hard [9, 16], even for planar graphs [8], large and complex instances with millions of edges, especially those for connectomics, pose a challenge for existing algorithms.
Related Work. Due to the importance of multicuts for applications, many algorithms for the minimum cost multicut problem have been proposed. They are grouped below into three categories: primal feasible local search algorithms, linear programming algorithms and fusion algorithms.
Primal feasible local search algorithms [35, 31, 20, 18, 19] attempt to improve an initial feasible solution by means of local transformations from a set that can be indexed or searched efficiently. Local search algorithms are practical for large instances, as the cost of all operations is small compared to the cost of solving the entire problem at once. On the downside, the feasible solution that is output typically depends on the initialization. Moreover, even if an optimal solution is found, its optimality is not certified, as no lower bound is computed.
Linear programming algorithms [24, 25, 27, 32, 38] operate on an outer polyhedral relaxation of the feasible set. Their output is independent of their initialization and provides a lower bound. This lower bound can be used directly inside a branch-and-bound search for certified optimal solutions. Alternatively, the LP relaxation can be tightened by cutting planes. Several classes of planes are known that define a facet of the multicut polytope and can be separated efficiently [14]. On the downside, algorithms for general LPs that are agnostic to the structure of the multicut problem scale superlinearly with the size of the instance.
Fusion algorithms attempt to combine feasible solutions of subproblems obtained by combinatorial or random procedures into successively better multicuts. The fusion process can either rely on column generation [39], binary quadratic programming [11] or any algorithm for solving integer LPs [10]. In particular, [39] provides dual lower bounds but is restricted to planar graphs. [11, 10] explore the primal solution space in a clever way, but do not output dual information.
Outline. Below, a discussion of preliminaries (Sec. 2) is followed by the definition of our proposed decomposition (Sec. 3) and algorithm (Sec. 4) for the minimum cost multicut problem. Our approach combines the efficiency of local search with the lower bounds of LPs and the subproblems of fusion, as we show in experiments with large and diverse instances of the problem (Sec. 5). All code and data will be made publicly available upon acceptance of the paper.
2 Preliminaries
2.1 Minimum Cost Multicut Problem
A decomposition (or clustering) of a graph $G = (V, E)$ is a partition $V_1 \cup \dots \cup V_k = V$ of the node set such that $V_i \cap V_j = \emptyset$ for all $i \neq j$ and every cluster $V_i$, $i = 1, \dots, k$, is connected. The multicut induced by a decomposition is the subset of those edges that straddle distinct clusters (cf. Fig. 1). Such edges are said to be cut. Every multicut induced by any decomposition of $G$ is called a multicut of $G$. We denote by $\mathcal{M}_G$ the set of all multicuts of $G$.
Given, for every edge $e \in E$, a cost $\theta_e \in \mathbb{R}$ of this edge being cut, the instance of the minimum cost multicut problem w.r.t. these costs is the optimization problem (1) whose feasible solutions are all multicuts of $G$. For any edge $e = vw \in E$, negative costs $\theta_e < 0$ favour the nodes $v$ and $w$ to be in distinct components. Positive costs $\theta_e > 0$ favour these nodes to lie in the same component.
$$\min_{M \in \mathcal{M}_G} \sum_{e \in M} \theta_e \tag{1}$$
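As a small illustration of the objective (1), the following sketch computes the multicut induced by a decomposition and its cost. The function names and the toy instance are hypothetical and only serve the example; edges are assumed to be given as node pairs and clusters as node sets.

```python
def induced_multicut(edges, clusters):
    """Return the set of edges whose endpoints lie in distinct clusters."""
    label = {v: k for k, cluster in enumerate(clusters) for v in cluster}
    return {(v, w) for (v, w) in edges if label[v] != label[w]}


def multicut_cost(costs, multicut):
    """Sum of the costs of all cut edges, i.e. the objective of (1)."""
    return sum(costs[e] for e in multicut)


# Hypothetical toy instance: a 4-cycle a-b-c-d with the chord a-c.
costs = {("a", "b"): 2.0, ("b", "c"): -1.0, ("c", "d"): 3.0,
         ("d", "a"): -2.0, ("a", "c"): -0.5}
clusters = [{"a", "b"}, {"c", "d"}]  # both clusters are connected

cut = induced_multicut(costs.keys(), clusters)
print(sorted(cut))                # [('a', 'c'), ('b', 'c'), ('d', 'a')]
print(multicut_cost(costs, cut))  # -3.5
```

Only the three edges with negative cost are cut here, so this decomposition collects every negative cost without paying any positive one.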
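This problem is NP-hard [9, 16], even for planar graphs [8]. Below, we recapitulate its formulation as a binary LP and then turn to LP relaxations: For any 01-labeling $x \in \{0,1\}^E$ of the edges of the graph $G = (V, E)$, the subset $x^{-1}(1)$ of those edges labeled 1 is a multicut of $G$ if and only if $x$ satisfies the system (3) of cycle inequalities [14]. Hence, with edge costs $\theta_e$, (1) can be stated equivalently in the form of the binary LP (2)–(4).
$$\min_{x \in \mathbb{R}^E} \sum_{e \in E} \theta_e x_e \tag{2}$$
$$\text{subject to} \quad \forall C \in \mathrm{cycles}(G)\ \forall e \in C: \quad x_e \leq \sum_{e' \in C \setminus \{e\}} x_{e'} \tag{3}$$
$$x \in \{0,1\}^E \tag{4}$$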
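For a 01-labeling, the cycle inequalities (3) are violated exactly when some edge labeled 1 has endpoints that are connected by a path of edges labeled 0. The sketch below tests this with a disjoint set structure instead of enumerating cycles. The function name `is_multicut` and the input format are assumptions made for this example.

```python
def is_multicut(nodes, labeling):
    """Check whether a 01-labeling of edges is the indicator of a multicut.

    labeling maps an edge (v, w) to 0 (uncut) or 1 (cut).
    """
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Merge the endpoints of every uncut edge.
    for (v, w), x in labeling.items():
        if x == 0:
            parent[find(v)] = find(w)

    # A cut edge inside one component closes a cycle with exactly one cut edge.
    return all(find(v) != find(w)
               for (v, w), x in labeling.items() if x == 1)


triangle = ["a", "b", "c"]
print(is_multicut(triangle, {("a", "b"): 1, ("b", "c"): 0, ("a", "c"): 0}))  # False
print(is_multicut(triangle, {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 0}))  # True
```

The first labeling cuts a single edge of a triangle and therefore violates a cycle inequality, while the second one separates node b from the other two.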
An LP relaxation is obtained by relaxing the integrality constraints (4). This results in an outer relaxation of the multicut polytope, which is the convex hull of the characteristic functions of all multicuts of the graph $G$. The LP relaxation with only the cycle inequalities will not in general be tight. A tighter LP relaxation is obtained by enforcing also the odd wheel inequalities [14]. A wheel is a cycle in $G$ with $d$ nodes $v_1, \dots, v_d$, all of which are connected to an additional node $u$ that is not in the cycle and is called the center of the wheel (cf. Fig. 3). For any odd number $d \geq 3$, any such wheel of $G$, its cycle $v_1 v_2, \dots, v_d v_1$ and its center $u$, every characteristic function $x$ of a multicut of $G$ satisfies the odd wheel inequality
$$\sum_{i=1}^{d} x_{v_i v_{i+1}} - \sum_{i=1}^{d} x_{u v_i} \leq \left\lfloor \tfrac{d}{2} \right\rfloor, \qquad v_{d+1} := v_1 \tag{5}$$
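To see why odd wheel inequalities tighten the relaxation, consider the following sketch. It assumes the standard form of the inequality from [14], namely that the sum over the cycle edges minus the sum over the edges incident to the center is at most the floor of d/2. On a hypothetical 3-wheel it builds a fractional point that satisfies all cycle inequalities but violates the odd wheel inequality. All names are illustrative.

```python
from itertools import combinations


def key(v, w):
    """Undirected edge key."""
    return frozenset((v, w))


def triangle_inequalities_hold(nodes, x, tol=1e-9):
    """In a complete graph, triangle inequalities imply all cycle inequalities."""
    for a, b, c in combinations(nodes, 3):
        e = [x[key(a, b)], x[key(b, c)], x[key(a, c)]]
        if any(e[i] > e[(i + 1) % 3] + e[(i + 2) % 3] + tol for i in range(3)):
            return False
    return True


def odd_wheel_violation(x, center, rim):
    """Left-hand side minus right-hand side of the odd wheel inequality."""
    d = len(rim)
    assert d % 2 == 1, "the wheel must have an odd number of cycle nodes"
    rim_sum = sum(x[key(rim[i], rim[(i + 1) % d])] for i in range(d))
    spoke_sum = sum(x[key(center, v)] for v in rim)
    return rim_sum - spoke_sum - d // 2


# Hypothetical fractional point on the 3-wheel (the complete graph on 4 nodes).
center, rim = 0, [1, 2, 3]
x = {key(center, v): 0.5 for v in rim}
x.update({key(rim[i], rim[(i + 1) % 3]): 1.0 for i in range(3)})

print(triangle_inequalities_hold([0, 1, 2, 3], x))  # True
print(odd_wheel_violation(x, center, rim))          # 0.5
```

A positive return value of `odd_wheel_violation` means the point is cut off by the odd wheel inequality even though it is feasible for the cycle relaxation.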
2.2 Integer relaxed pairwise separable LPs
LP relaxations of the multicut problem can in principle be solved with algorithms for general LPs which are available in excellent software such as CPlex [2] and Gurobi [22]. However, these algorithms scale superlinearly with the size of the problem and are hence impractical for large instances.
We define in Sec. 3 an LP relaxation of the multicut problem in the form of an IRPS-LP (Def. 1). IRPS-LPs are a special case of dual decomposition [21]. In Def. 1, every defines a subproblem, and every edge defines a dependency of subproblems. Def. 1 is more specific in that, firstly, the subproblems are binary and, secondly, the linear constraints (9) that describe the dependence of subproblems are defined by 01-matrices that map 01-vectors to 01-vectors. IRPS-LPs are amenable to efficient optimization by message passing in the framework of [36].
Definition 1 (IRPS-LP [36]).
Let and let be a graph with . For every , let , let , and let . Let . For every , let , and such that
(6)  
(7) 
Then, the LP written below is called integer relaxed pairwise separable w.r.t. the graph .
(8)  
subject to  (9) 
3 Dual Decomposition
A straightforward decomposition of the minimum cost multicut problem (2)–(4) in the form of an IRPS-LP (Def. 1) consists of one subproblem for every edge, one subproblem for every cycle inequality and one subproblem for every odd wheel inequality. From a computational perspective, it is however advantageous to triangulate cycles and odd wheels, and to consider the resulting smaller subproblems. Below, three classes of subproblems are defined rigorously.
Edge Subproblems.
For every edge , we consider a subproblem with the feasible set , encoding whether edge is cut (1) or uncut (0).
Triangle Subproblems
For every cycle , we consider the triangles to , as depicted in Fig. 4. If some edge of a triangle is not in , we add it to with cost zero, i.e., we triangulate the cycle in . For each triangle , we introduce a subproblem whose feasible set consists of the five feasible multicuts of the triangle, i.e., .
Lollipop Subproblems
For every odd number and every wheel of consisting of a center node and cycle nodes , we introduce two classes of subproblems. For the 5-wheel depicted in Fig. 3, these subproblems are depicted in Fig. 5.
For every , we add the triangle subproblem , as described in the previous section.
For every , we add the subproblem for the lollipop graph that consists of the triangle and the additional edge . The feasible set of a lollipop graph has ten elements, five feasible multicuts of the triangle times two for accounting for the additional edge.
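The sizes of the feasible sets stated above can be checked by enumeration. The sketch below lists the labelings of a triangle that satisfy its three cycle inequalities and extends them by a free label for the additional lollipop edge. The tuple encoding (three triangle edges, then the additional edge) is an assumption made for this example.

```python
from itertools import product


def triangle_multicuts():
    """All 01-labelings of a triangle's edges in which no single edge is cut alone."""
    return [x for x in product((0, 1), repeat=3)
            if all(x[i] <= x[(i + 1) % 3] + x[(i + 2) % 3] for i in range(3))]


def lollipop_multicuts():
    """Triangle labelings extended by a free label for the additional edge."""
    return [t + (s,) for t in triangle_multicuts() for s in (0, 1)]


print(triangle_multicuts())
# [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
print(len(lollipop_multicuts()))  # 10
```

The additional edge is a bridge of the lollipop graph, which is why its label is unconstrained and the count simply doubles.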
3.1 Dependencies
The dependencies between triangle subproblems and edge subproblems are expressed below in the form of a linear system. It fits into the form (9) of an IRPS-LP.
The dependency between a lollipop subproblem with edge set and a triangle subproblem with edge set is stated below as a linear system with sums over edges not shared between and . This linear system has the form (9) of an IRPS-LP.
3.2 Remarks
Remark 1. The triangulation of cycles can be understood as the construction of a junction tree [37] in such a way that the minimum cost multicut problem over the cycle can be solved by dynamic programming. The triangulation of cycles can also be understood as a tightening of an outer polyhedral relaxation of the multicut polytope: A cycle inequality (3) defines a facet of the multicut polytope if and only if the cycle is chordless [14]. By triangulating a cycle, we obtain a set of minimal chordless cycles (triangles) whose cycle inequalities together imply that of the entire cycle.
Remark 2. Technically, we would not have needed to include triangle subproblems for odd wheels. Instead, we could have introduced dependencies between lollipops directly in the form of an IRPS-LP. However, by introducing triangle factors in addition and by expressing dependencies between lollipops and triangles, we couple lollipop factors from different odd wheels more tightly whenever they share the same triangles.
4 Algorithm
We now define an algorithm for the minimum cost multicut problem (2)–(4). This algorithm takes an instance of the problem as input and alternates for a fixed number of iterations between two main procedures.
The first procedure, defined in Sec. 4.1, solves an instance of a dual of the IRPS-LP relaxation defined in the previous section. The output consists of a lower bound and a reparameterization of the instance of the minimum cost multicut problem given as input. The second procedure tightens the IRPS-LP relaxation by adding subproblems for cycle inequalities (3) and odd wheel inequalities (5) violated by the current solution. Separation procedures for finding such violated inequalities, more efficiently than in cutting plane algorithms for the primal [24, 25, 27], are defined in Sec. 4.2.
To find feasible solutions of the instance of the minimum cost multicut problem given as input, we apply a state-of-the-art local search algorithm on the computed reparameterizations, a procedure commonly referred to as rounding (Sec. 4.3).
4.1 Message Passing
Like other algorithms based on dual decomposition, the algorithm we propose does not solve the IRPS-LP directly, in the primal domain, but optimizes a dual of (8)–(9). Specifically, it operates on a space of reparameterizations of the problem defined below: For any two dependent subproblems , we can change the costs and by an arbitrary vector according to the update rules
(10)  
(11) 
We refer to any update of according to the rules (10)–(11) as message passing. Message passing does not change the cost of any primal feasible solution, as
(12)  
(13)  
(14) 
Message passing does, however, change the dual lower bound to (8) given by
(15) 
The maximum of over all costs obtainable by message passing is equal to the minimum of (8), by linear programming duality. We seek to alter the costs by means of message passing so as to maximize the lower bound . For the general IRPS-LP, a framework of algorithms to achieve this goal is defined in [36]. For the minimum cost multicut problem, we define and implement Alg. 1 within this framework. The specifics of this algorithm for the minimum cost multicut problem are discussed below. General properties of message passing for IRPS-LPs are discussed in [36].
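The effect of message passing can be illustrated on a single triangle. The sketch below is not Alg. 1. It assumes that a triangle subproblem carries one cost per edge, that the lower bound is the sum of the independently minimized subproblems, and that a message subtracts a value from an edge cost and adds it to the corresponding triangle cost (this sign convention is an assumption). All names and numbers are hypothetical.

```python
from itertools import product

# The five feasible labelings of a triangle (no edge is cut alone).
TRIANGLE = [x for x in product((0, 1), repeat=3)
            if all(x[i] <= x[(i + 1) % 3] + x[(i + 2) % 3] for i in range(3))]


def lower_bound(edge_costs, tri_costs):
    """Sum of independently minimized edge and triangle subproblems."""
    edges = sum(min(0.0, c) for c in edge_costs)
    tri = min(sum(c * xi for c, xi in zip(tri_costs, x)) for x in TRIANGLE)
    return edges + tri


def primal_cost(edge_costs, tri_costs, x):
    """Cost of one labeling used consistently by all subproblems."""
    return sum((ce + ct) * xi for ce, ct, xi in zip(edge_costs, tri_costs, x))


def send_message(edge_costs, tri_costs, i, delta):
    """Move delta from the cost of edge i to the triangle cost of edge i."""
    e, t = list(edge_costs), list(tri_costs)
    e[i] -= delta
    t[i] += delta
    return e, t


edge_costs, tri_costs = [2.0, 2.0, -3.0], [0.0, 0.0, 0.0]
print(lower_bound(edge_costs, tri_costs))  # -3.0

# Send the full cost of every edge to the triangle subproblem.
for i in range(3):
    edge_costs, tri_costs = send_message(edge_costs, tri_costs, i, edge_costs[i])
print(lower_bound(edge_costs, tri_costs))  # -1.0
```

In this toy instance the bound rises from -3 to -1, which equals the cost of the best feasible multicut of the triangle, while `primal_cost` returns the same value for every labeling before and after the messages.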
Factor Order.
Alg. 1 iterates through all edge and triangle subproblems. The order is specified as follows: We assume that a node order is given. With respect to this node order, edges are ordered lexicographically. For every triangle and its edge set with , we define the ordering constraint . For every lollipop graph and its edge set with , we define the ordering constraint . The strict partial order defined by these constraints is extended to a total order by topological sorting.
Message Passing Description.
When an edge subproblem is visited, Alg. 1 receives messages from all dependent triangle subproblems. Having received a message from triangle , the costs satisfy the condition
In other words, the cost of the triangle factor has no preference for either or . Sending messages from is analogous: Having sent messages from , we have , i.e., there is again no preference for either or .
When we visit a triangle subproblem , we proceed analogously with all dependent lollipop subproblems: Once messages have been received, lollipop subproblems have no preference for incident edges. Once messages have been sent, this holds true for the triangle subproblems.
4.2 Separation
Applying Alg. 1 with all cycles and all odd wheels of a graph is impractical, as the number of triangles for cycle inequalities (3) is cubic, and the number of lollipop graphs for odd wheels (5) is quartic in . In order to arrive at a practical algorithm, we take a cutting plane approach in which we separate and add subproblems for violated cycle and odd wheel inequalities periodically. Initially, contains only one element for every edge , and is empty.
In the primal, given some fractional , it is common to look for maximally violated inequalities (3) and (5). This is possible in polynomial time via shortest path computations [14, 17]. In our dual formulation, we have no primal solution to search for violated inequalities. Here, a suitable criterion is to consider those additional triangle or lollipop subproblems that necessarily increase the dual lower bound by some constant . Among these subproblems, we choose those for which the increase is maximal and add them to the graph . A similar dual cutting plane approach has been shown to be useful for graphical models in [34]. As we discuss below, separation is more efficient in the dual than in the primal.
4.2.1 Cycle Inequalities
We characterize those cycles whose subproblem increases the dual lower bound by at least $\epsilon$.
Proposition 1.
Let $C$ be a cycle with $\theta_f \leq -\epsilon$ for one edge $f \in C$ and $\theta_e \geq \epsilon$ for all $e \in C \setminus \{f\}$. Then, the dual lower bound can be increased by $\epsilon$ by including a triangulation of $C$.
In order to find such cycles, we apply Alg. 2. This algorithm first records in a disjoint set data structure whether distinct nodes are connected via edges with weight at least $\epsilon$. Then, it visits all edges $f$ with $\theta_f \leq -\epsilon$. If the endpoints of $f$ are connected by a path along which all edges have weight at least $\epsilon$, it searches for a shortest such path by means of breadth-first search.
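The separation of cycles described above can be sketched as follows. This is not the authors' Alg. 2 but a minimal reading of its description, assuming that the cycle consists of one edge with cost at most minus a threshold `eps` and a path of edges with cost at least `eps`. Function names and the toy instance are hypothetical.

```python
from collections import deque


def bfs_path(adj, source, target):
    """Shortest path (fewest edges) from source to target; target must be reachable."""
    pred = {source: None}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            break
        for w in adj[v]:
            if w not in pred:
                pred[w] = v
                queue.append(w)
    path = [target]
    while pred[path[-1]] is not None:
        path.append(pred[path[-1]])
    return path[::-1]


def separate_cycles(nodes, costs, eps):
    """Return pairs (repulsive edge, path of attractive edges closing a cycle)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    adj = {v: [] for v in nodes}
    for (v, w), c in costs.items():
        if c >= eps:  # attractive edge: record connectivity
            parent[find(v)] = find(w)
            adj[v].append(w)
            adj[w].append(v)

    found = []
    for (v, w), c in costs.items():
        # The cheap connectivity test avoids a search that cannot succeed.
        if c <= -eps and find(v) == find(w):
            found.append(((v, w), bfs_path(adj, v, w)))
    return found


costs = {(0, 1): 1.0, (1, 2): 1.5, (2, 3): 0.2, (0, 2): -2.0, (0, 3): -1.0}
print(separate_cycles(range(4), costs, eps=0.5))  # [((0, 2), [0, 1, 2])]
```

Edge (0, 3) is repulsive as well, but node 3 is only reachable through an edge of cost 0.2, which is below the threshold, so no cycle is reported for it.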
4.2.2 Odd Wheel Inequalities
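We characterize those odd wheels whose lollipop subproblems increase the lower bound by at least $\epsilon$.
Proposition 2.
Let $W$ be an odd wheel with center node $u$ and cycle nodes $v_1, \dots, v_d$. Adding the lollipop subproblems for $W$ increases the dual lower bound by at least $\epsilon$ if the costs of each triangle $u v_i v_{i+1}$ are such that the minimal cost of any edge labeling of the triangle cutting precisely one edge incident to $u$ is smaller by $\epsilon$ than the minimal cost of any edge labeling of the triangle cutting 0 or 2 edges incident to $u$. That is, with $\theta_t(x)$ denoting the current cost of a feasible labeling $x$ of the triangle $t = u v_i v_{i+1}$:
$$\min_{x:\ x_{u v_i} + x_{u v_{i+1}} = 1} \theta_t(x) + \epsilon \ \leq \min_{x:\ x_{u v_i} + x_{u v_{i+1}} \neq 1} \theta_t(x) \tag{16}$$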
In order to find such odd wheels, we apply Alg. 3. This algorithm builds on our observation that we need to look only at triangles whose subproblem has already been added. Hence, Alg. 3 visits each node and builds a bipartite graph as follows. (An example is depicted in Fig. 6 for a 5-wheel and (16) holding true for all triangles of the wheel.) For each triangle such that (16) holds true, four nodes are added to , two copies of each original node. These are joined by edges . If a path from to exists in , we have found a violated odd wheel inequality (5). As is bipartite, a path in corresponds to an odd cycle in . As before, the search for paths is accelerated by connectivity tests via a disjoint set data structure and is carried out by breadth-first search.
In the primal, finding a maximally violated odd wheel inequality (5) entails the same construction of the bipartite graph for each node [17]. However, a shortest path search w.r.t. edge costs needs to be carried out by Dijkstra’s algorithm instead of breadth-first search. Further complication in the primal comes from the fact that a separation algorithm needs to visit all in order to compute the shortest path in .
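The bipartite construction used for odd wheels can be sketched in isolation. The code below is not Alg. 3. It assumes that the cycle edges of all qualifying triangles around one fixed center node have already been collected, duplicates every node into two copies, and searches by breadth-first search for a path between the two copies of a start node. Such a path corresponds to a closed walk with an odd number of edges. The function name and input format are hypothetical.

```python
from collections import deque


def odd_closed_walk(rim_edges, start):
    """Return an odd closed walk through `start` as a node list, or None.

    Every node v is split into copies (v, 0) and (v, 1); every edge vw links
    (v, 0)-(w, 1) and (v, 1)-(w, 0), so each step flips the parity bit.
    """
    adj = {}
    for v, w in rim_edges:
        for a, b in ((v, w), (w, v)):
            adj.setdefault((a, 0), []).append((b, 1))
            adj.setdefault((a, 1), []).append((b, 0))

    source, target = (start, 0), (start, 1)
    pred = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            walk = []
            while node is not None:
                walk.append(node[0])
                node = pred[node]
            return walk[::-1]
        for nxt in adj.get(node, []):
            if nxt not in pred:
                pred[nxt] = node
                queue.append(nxt)
    return None


five_cycle = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]
walk = odd_closed_walk(five_cycle, 1)
print(len(walk) - 1)  # 5 edges, i.e. the cycle nodes of a 5-wheel

four_cycle = [(1, 2), (2, 3), (3, 4), (4, 1)]
print(odd_closed_walk(four_cycle, 1))  # None
```

In general, the walk returned for a given start node need not be a simple cycle; a complete separation routine would additionally extract a simple odd cycle from it.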
4.3 Rounding
Our message passing Alg. 1 improves a dual lower bound on (2), but does not provide a feasible solution of (2)–(4). In order to obtain a feasible multicut, we apply a local search algorithm defined in [26], namely greedy additive edge contraction (GAEC), followed by Kernighan-Lin with joins (KLj). GAEC computes a multicut by greedily contracting those edges for which the join decreases the cost maximally. It stops as soon as no contraction of any edge strictly decreases the cost. KLj attempts to improve a given multicut recursively by applying transformations from three classes: (1) moving nodes between two components, (2) moving nodes from a given component to a newly forming one or (3) joining two components. GAEC and KLj are local search algorithms that output a feasible multicut that need not be optimal.
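The greedy contraction step can be sketched as follows. This is a simple quadratic-time illustration of the idea, not the implementation of [26], which uses a priority queue. It assumes that positive costs are attractive, that parallel edges arising from a contraction are merged by adding their costs, and that the input has no self-loops. All names are hypothetical.

```python
def gaec(nodes, costs):
    """Greedy additive edge contraction; returns a map node -> cluster representative."""
    weight = {}
    for (a, b), c in costs.items():
        k = frozenset((a, b))
        weight[k] = weight.get(k, 0.0) + c

    rep = {v: v for v in nodes}
    while weight:
        best = max(weight, key=weight.get)
        if weight[best] <= 0:
            break  # no contraction strictly decreases the cost
        keep, drop = tuple(best)
        del weight[best]
        # Re-attach all edges of `drop` to `keep`, adding costs of parallel edges.
        for k in [k for k in weight if drop in k]:
            (other,) = k - {drop}
            c = weight.pop(k)
            merged = frozenset((keep, other))
            weight[merged] = weight.get(merged, 0.0) + c
        for v in rep:
            if rep[v] == drop:
                rep[v] = keep
    return rep


costs = {("a", "b"): 3.0, ("b", "c"): 1.0, ("a", "c"): -2.0, ("c", "d"): 2.0}
rep = gaec("abcd", costs)
print(rep["a"] == rep["b"], rep["c"] == rep["d"], rep["a"] == rep["c"])
# True True False
```

After contracting a-b, the edges b-c and a-c merge into one edge of cost -1, so the two resulting clusters {a, b} and {c, d} are not joined.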
We apply GAEC and KLj not only to the instance of the minimum cost multicut problem given as input but also to the reparameterization of this instance output by Alg. 1. The rationale for doing so comes from LP duality:
Proposition 3.
Assume maximizes the dual lower bound and the relaxation is tight, i.e.
(17) 
Moreover, let such that is an optimal multicut of . Then,
(18) 
Having run Alg. 1 for a while, we expect to fulfill the sign condition of Prop. 3 approximately. Therefore, the sign of will be a good hint of the edge being cut. Thus, informally, we expect local search algorithms operating on the reparameterized instance of the problem to yield better feasible multicuts than local search algorithms operating on the given instance.
5 Experiments
Solvers
We compare against several state-of-the-art algorithms.

The algorithm MCILP [25] is an efficient implementation of a cutting plane algorithm that solves (2) by separating cycle inequalities (3). CPlex [2] is used to solve the underlying ILP problems. The integrality conditions (4) are passed directly to the solver. According to [25], this is beneficial due to the excellent branch-and-cut capabilities of CPlex [2].

Cut, Glue & Cut [11], abbreviated as CGC, is a move-making algorithm that uses planar max-cut subproblems to improve multicuts.

Fusion moves for correlation clustering [10], abbreviated as CCFusion, fuses multicuts generated by various proposal generators with the help of auxiliary multicut problems, solved in turn by MCILP. We use randomized hierarchical clustering and randomized watersheds as proposal generators, identified by the suffixes RHC and RWS. We use parameters for the proposal generators as recommended by the authors [10].
MCILP, CGC and CCFusion are implemented as part of the OpenGM suite [23]. Only MCILP and our solvers MPC and MPCOW generate non-trivial dual lower bounds. CGC also outputs dual lower bounds, but these are equivalent to the trivial lower bound $\sum_{e \in E} \min(0, \theta_e)$, where the edge weights $\theta_e$ are as given by the problem. It has been shown that CGC, CCFusion and KL outperform other primal heuristics [10], hence we do not compare to any other heuristic algorithm. Also, MCILP outperforms the LP-based solver [32], due to the latter using the slower COIN-OR CLP [15] solver internally, hence we exclude it from the comparison as well.
All solvers were run on a laptop computer with an i5-5200 CPU with 2.2 GHz and 8 GB RAM.
Datasets
We compare on 8 datasets of diverse origin.

The knott3d{150,300,450,550} datasets come from a neural circuit reconstruction problem on tissue volumes [5] with $150^3$, $300^3$, $450^3$ and $550^3$ voxels. The data is presegmented into supervoxels.

modularity clustering aims to cluster a social network into subgroups based on affinity between individual persons.

The CREMI{small,large} datasets were constructed as part of the CREMI [1] challenge, which aims to reconstruct neural circuits of the adult fly brain. The images are taken by electron microscopy. The small instances are cropped versions of the large ones. To our knowledge, the CREMIlarge dataset contains the largest multicut problems approached with LP-based methods.
The imageseg, knott3d and modularity clustering datasets were taken from the OpenGM benchmark [23], while the CREMI datasets were kindly provided by their authors and are not yet published.
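The datasets (imageseg, the four knott3d datasets, modularity clustering, CREMIsmall and CREMIlarge) consist of 100, 8, 8, 8, 8, 6, 3 and 3 instances, respectively, 144 in total. Dataset details can be found in Table 1.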
Evaluation
We have set a time limit of one hour for all algorithms. Table 1 reports results averaged over all instances of each dataset. Figure 7 plots the primal solution energy and the dual lower bound (where applicable), averaged over all instances of each dataset, against runtime.
As can be seen from Table 1, except for dataset CREMIlarge, our solver MPCOW gives dual bounds that are within 0.0045%, 1.9%, 0.0061%, 0.0068%, 0.0017%, 0.0007% and 0.0083% of the dual lower bound obtained by MCILP, which uses the advanced branch-and-cut facilities CPlex [2] provides. For CREMIlarge, only our solvers MPC and MPCOW output dual lower bounds, as MCILP did not finish a single iteration within one hour. As can be seen from Fig. 7, our lower bound usually converges faster than MCILP’s. We conjecture that MPC and MPCOW inside a branch-and-bound solver can significantly extend the reach of exact methods for the multicut problem.
Surprisingly, KL does not perform well on imageseg, even though the lower bounds we achieve with MPC and MPCOW are not far from the optimal lower bounds computed by MCILP. On the other hand, MPC and MPCOW give much better dual and primal results for modularity clustering early on. Generally, when compared to MCILP’s primal convergence, we obtain much lower values early on, and for the large-scale datasets knott3d550, CREMIsmall and CREMIlarge, MCILP’s primal solutions are no longer useful.
Unlike MCILP, our method yields reparameterized costs that can be used to improve heuristic primal algorithms. An example of this can be seen in Fig. 8, where reparameterized costs improve KL’s solutions.
Dataset / Algorithm  MPC  MPCOW  CGC  MCILP  CCFusionRWS  CCFusionRHC  

imageseg  #I  100  UB  4730.66  4732.66  4600.81  4434.91  4447.06  4436.33 
#V  LB  4434.69  4434.71  4129.70  4434.91  
#E  time(s)  21.92  41.35  0.14  11.89  1.19  1.30  
modularity clustering  #I  6  UB  0.49  0.49  0.30  0.44  0.00  0.44 
#V  LB  0.53  0.53  0.79  0.52  
#E  time(s)  0.68  0.80  0.15  2911.10  0.00  17.78  
knott3d150  #I  8  UB  4570.61  4570.26  4220.66  4571.69  4534.76  4552.51 
#V  LB  4572.65  4571.97  4855.18  4571.69  
#E  time(s)  0.84  3.81  0.04  2.37  0.26  0.53  
knott3d300  #I  8  UB  27285.41  27285.15  24864.59  27302.78  27242.03  27247.29 
#V  LB  27307.36  27304.64  28901.58  27302.78  
#E  time(s)  475.63  747.81  2.73  227.33  2.96  8.15  
knott3d450  #I  8  UB  78426.70  78426.70  70865.27  78391.32  78386.14  78381.06 
#V  LB  78527.23  78523.89  83272.85  78522.51  
#E  time(s)  3649.05  3640.66  31.56  1840.47  16.52  119.23  
knott3d550  #I  8  UB  136439.78  136439.78  123841.47  135766.90  136464.05  136395.89 
#V  LB  136814.88  136803.34  144703.64  136755.36  
#E  time(s)  3783.33  3745.76  102.42  3683.22  72.94  594.60  
CREMIsmall  #I  3  UB  213167.45  213167.45  194616.60  209594.49  168905.17  213117.84 
#V  LB  213225.65  213225.67  215473.98  213208.94  
#E  time(s)  3645.51  3661.71  319.01  2775.81  3543.61  2555.48  
CREMIlarge  #I  3  UB  3886840.98  3886840.98  3772597.37  3619190.20  
#V  LB  3892753.21  3893090.30  
#E  time(s)  3667.67  3806.33  5978.08  23139.40 
Conclusion
We have shown that LP-based methods are feasible for solving large-scale multicut problems on commodity hardware and that one does not have to resort to heuristic primal algorithms. We achieve dual bounds very close to those computed by state-of-the-art branch-and-cut solvers. Additionally, our method usually gives much faster dual bound convergence, resulting in superior solutions when terminated early. Also, the primal heuristic GAEC + KLj can be improved when run on costs as computed by our method.
It remains an interesting task to integrate primal heuristics more tightly into our message passing approach and to further improve the dual lower bound, e.g., by embedding our solver into branch-and-cut.
6 Acknowledgments
The authors would like to thank Vladimir Kolmogorov for helpful discussions. This work is partially funded by the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013)/ERC grant agreement no 616160.
References
 [1] CREMI MICCAI Challenge on circuit reconstruction from Electron Microscopy Images. https://cremi.org.
 [2] IBM ILOG CPLEX Optimizer. http://www-01.ibm.com/software/integration/optimization/cplex-optimizer/.
 [3] A. Alush and J. Goldberger. Break and conquer: Efficient correlation clustering for image segmentation. In E. R. Hancock and M. Pelillo, editors, SIMBAD, volume 7953 of Lecture Notes in Computer Science, pages 134–147. Springer, 2013.
 [4] B. Andres, J. H. Kappes, T. Beier, U. Köthe, and F. A. Hamprecht. Probabilistic image segmentation with closedness constraints. In D. N. Metaxas, L. Quan, A. Sanfeliu, and L. J. V. Gool, editors, ICCV, pages 2611–2618. IEEE Computer Society, 2011.
 [5] B. Andres, T. Kröger, K. L. Briggman, W. Denk, N. Korogod, G. Knott, U. Köthe, and F. A. Hamprecht. Globally optimal closed-surface segmentation for connectomics. In A. W. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato, and C. Schmid, editors, ECCV (3), volume 7574 of Lecture Notes in Computer Science, pages 778–791. Springer, 2012.
 [6] B. Andres, J. Yarkony, B. S. Manjunath, S. Kirchhoff, E. Turetken, C. C. Fowlkes, and H. Pfister. Segmenting planar superpixel adjacency graphs w.r.t. non-planar superpixel affinity graphs. In Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR), 2013.
 [7] A. Arasu, C. Ré, and D. Suciu. Large-scale deduplication with constraints using dedupalog. In Y. E. Ioannidis, D. L. Lee, and R. T. Ng, editors, ICDE, pages 952–963. IEEE Computer Society, 2009.
 [8] Y. Bachrach, P. Kohli, V. Kolmogorov, and M. Zadimoghaddam. Optimal coalition structure generation in cooperative graph games. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, July 14–18, 2013, Bellevue, Washington, USA, 2013.
 [9] N. Bansal, A. Blum, and S. Chawla. Correlation clustering. Machine Learning, 56(1):89–113, 2004.
 [10] T. Beier, F. A. Hamprecht, and J. H. Kappes. Fusion moves for correlation clustering. In CVPR, pages 3507–3516. IEEE Computer Society, 2015.
 [11] T. Beier, T. Kröger, J. H. Kappes, U. Köthe, and F. A. Hamprecht. Cut, glue & cut: A fast, approximate solver for multicut partitioning. In CVPR. Proceedings, 2014.
 [12] Y. Chen, S. Sanghavi, and H. Xu. Clustering sparse graphs. In P. L. Bartlett, F. C. N. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, NIPS, pages 2213–2221, 2012.
 [13] F. Chierichetti, N. Dalvi, and R. Kumar. Correlation clustering in mapreduce. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, pages 641–650, New York, NY, USA, 2014. ACM.
 [14] S. Chopra and M. R. Rao. The partition problem. Mathematical Programming, 59(1):87–115, 1993.
 [15] COIN-OR CLP, 2016. http://www.coin-or.org/projects/Clp.xml.
 [16] E. D. Demaine, D. Emanuel, A. Fiat, and N. Immorlica. Correlation clustering in general weighted graphs. Theor. Comput. Sci., 361(2):172–187, Sept. 2006.
 [17] M. M. Deza and M. Laurent. Geometry of Cuts and Metrics. Springer Publishing Company, Incorporated, 1st edition, 2009.
 [18] M. Elsner and E. Charniak. You talking to me? a corpus and algorithm for conversation disentanglement. In K. McKeown, J. D. Moore, S. Teufel, J. Allan, and S. Furui, editors, ACL, pages 834–842. The Association for Computer Linguistics, 2008.
 [19] M. Elsner and W. Schudy. Bounding and comparing methods for correlation clustering beyond ILP. In Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing, ILP ’09, pages 19–27, Stroudsburg, PA, USA, 2009. Association for Computational Linguistics.
 [20] A. Gionis, H. Mannila, and P. Tsaparas. Clustering aggregation. ACM Trans. Knowl. Discov. Data, 1(1):4, 2007.
 [21] M. Guignard and S. Kim. Lagrangean decomposition for integer programming: theory and applications. Revue française d’automatique, d’informatique et de recherche opérationnelle. Recherche opérationnelle, 21(4):307–323, 1987.
 [22] Gurobi Optimization, Inc., 2015. http://www.gurobi.com.
 [23] J. H. Kappes, B. Andres, F. A. Hamprecht, C. Schnörr, S. Nowozin, D. Batra, S. Kim, B. X. Kausler, T. Kröger, J. Lellmann, N. Komodakis, B. Savchynskyy, and C. Rother. A comparative study of modern inference techniques for structured discrete energy minimization problems. International Journal of Computer Vision, 115(2):155–184, 2015.
 [24] J. H. Kappes, M. Speth, B. Andres, G. Reinelt, and C. Schnörr. Globally optimal image partitioning by multicuts. In EMMCVPR. Springer, 2011.
 [25] J. H. Kappes, M. Speth, G. Reinelt, and C. Schnörr. Higher-order segmentation via multicuts. CoRR, abs/1305.6387, 2013.
 [26] M. Keuper, E. Levinkov, N. Bonneel, G. Lavoué, T. Brox, and B. Andres. Efficient decomposition of image and mesh graphs by lifted multicuts. In ICCV, 2015.
 [27] S. Kim, S. Nowozin, P. Kohli, and C. D. Yoo. Higher-order correlation clustering for image segmentation. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. C. N. Pereira, and K. Q. Weinberger, editors, NIPS, pages 1530–1538, 2011.
 [28] V. Kolmogorov. Convergent tree-reweighted message passing for energy minimization. IEEE Trans. Pattern Anal. Mach. Intell., 28(10):1568–1583, 2006.
 [29] V. Kolmogorov. A new look at reweighted message passing. IEEE Trans. Pattern Anal. Mach. Intell., 37(5):919–930, 2015.
 [30] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, volume 2, pages 416–423 vol.2, 2001.
 [31] V. Ng and C. Cardie. Improving machine learning approaches to coreference resolution. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics - ACL ’02, (July):104, 2001.
 [32] S. Nowozin and S. Jegelka. Solution stability in linear programming relaxations: graph partitioning and unsupervised learning. In A. P. Danyluk, L. Bottou, and M. L. Littman, editors, ICML, volume 382 of ACM International Conference Proceeding Series, pages 769–776. ACM, 2009.
 [33] E. Sadikov, J. Madhavan, L. Wang, and A. Halevy. Clustering query refinements by user intent. In World Wide Web Conference (WWW). ACM Press, April 2010.
 [34] D. Sontag, D. K. Choe, and Y. Li. Efficiently searching for frustrated cycles in MAP inference. In UAI, pages 795–804. AUAI Press, 2012.
 [35] W. M. Soon, H. T. Ng, and D. C. Y. Lim. A machine learning approach to coreference resolution of noun phrases. Computational Linguistics, 27(4):521–544, 2001.
 [36] P. Swoboda, J. Kuske, and B. Savchynskyy. A dual ascent framework for Lagrangean decomposition of combinatorial problems. CoRR, 2016.
 [37] M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1–305, 2008.
 [38] J. Yarkony, T. Beier, P. Baldi, and F. A. Hamprecht. Parallel multicut segmentation via dual decomposition. In New Frontiers in Mining Complex Patterns - Third International Workshop, NFMCP 2014, Held in Conjunction with ECML-PKDD 2014, Nancy, France, September 19, 2014, Revised Selected Papers, pages 56–68, 2014.
 [39] J. Yarkony, A. Ihler, and C. C. Fowlkes. Fast Planar Correlation Clustering for Image Segmentation, pages 568–581. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012.
7 Appendix
7.1 Proofs
Proof of Proposition 1
Proposition.
Let be a cycle with and for . Then, the dual lower bound can be increased by by including a triangulation of .
Proof.
Let cycle have vertices and assume that with for notational purposes. After triangulation, triangle factors on vertices will be present in the model. Let the current reparametrization be .
The triangle factors corresponding to cycle will enforce the cycle inequality (3)
(19) 
It holds that
(20) 
where the dual lower bound on cycle . The first inequality above is due to either in the optimal solution or one being one due to (19). The second inequality is due to the fact that (i) by linear programming duality and (ii) the triangle factors enforce more inequalities than only (19). ∎
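The effect described in Proposition 1 can be checked numerically on a toy instance. The sketch below is illustrative only and rests on assumptions that are not taken from the text: the cycle has exactly one edge with negative cost, label 1 means "cut", and the bound before triangulation is the edge-wise bound in which every edge independently picks its cheaper label. The function names are hypothetical. On a chordless cycle, the cycle inequalities exclude exactly those labelings that cut a single edge, so the tightened bound can be found by brute force.

```python
from itertools import product

def unconstrained_bound(costs):
    """Edge-wise lower bound: every edge independently picks its cheaper label."""
    return sum(min(0.0, c) for c in costs)

def min_multicut_on_cycle(costs):
    """Minimum cost over edge labelings of a chordless cycle that satisfy all
    cycle inequalities, i.e. labelings that do not cut exactly one edge."""
    best = float("inf")
    for x in product((0, 1), repeat=len(costs)):
        if sum(x) == 1:  # a single cut edge violates a cycle inequality
            continue
        best = min(best, sum(c * xi for c, xi in zip(costs, x)))
    return best

def bound_increase(costs):
    """Gap between the cycle-consistent optimum and the edge-wise bound."""
    return min_multicut_on_cycle(costs) - unconstrained_bound(costs)

# Assumed toy costs: one negative edge, all others positive.
costs = [-3.0, 2.0, 5.0, 4.0]
print(bound_increase(costs))
```

In this toy setting the gap equals the smaller of the magnitude of the negative cost and the smallest positive cost on the cycle: either the negative edge stays uncut, or it is cut together with the cheapest positive edge.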
Proof of Proposition 2
Proposition.
Let be an odd wheel with center node and cycle nodes . Adding the lollipop subproblems for increases by at least if the costs of each triangle are such that the minimal cost of any edge labeling of the triangle cutting precisely one edge incident to the center node is smaller by than the minimal cost of any edge labeling of the triangle cutting zero or two edges incident to the center node. That is:
(21) 
Proof.
Condition (16) means that in all triangles in the odd wheel , the minimal assignment with regard to the current reparametrization cuts exactly one edge incident to the center node. All other assignments have cost greater by at least . As is odd, there is no possibility to combine those local assignments into a global assignment on : every triangle cuts exactly one of its two edges incident to the center node, so cut and uncut edges incident to the center would have to alternate around the cycle, which is impossible for a cycle of odd length.
On the other hand, our construction of lollipop factors ensures exactness on odd wheels. As at least one triangle must then be assigned a labeling that is not locally optimal and whose cost is larger by than its minimal reparametrized cost, the result follows. ∎
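The parity argument used in this proof can be illustrated by enumeration. The following sketch is an illustration under assumptions, not the authors' construction: it looks only at the edges incident to the center node (called spokes here, a hypothetical name), indexes them 0 to n-1 around the cycle, and lists all labelings in which every triangle formed by the center and two consecutive cycle nodes cuts exactly one of its two spokes.

```python
from itertools import product

def locally_optimal_spoke_labelings(n):
    """Return all 0/1 labelings of the n spokes of a wheel in which every
    triangle (center, v_i, v_{i+1}) cuts exactly one of its two spokes."""
    return [
        s for s in product((0, 1), repeat=n)
        if all(s[i] + s[(i + 1) % n] == 1 for i in range(n))
    ]

# Count the consistent labelings for wheels over cycles of length 3 to 7.
for n in (3, 4, 5, 6, 7):
    print(n, len(locally_optimal_spoke_labelings(n)))
```

For odd n the list is empty, so at least one triangle has to deviate from its locally optimal labeling. For even n the two alternating labelings remain.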
Proof of Proposition 3
Proposition.
Assume maximizes the dual lower bound and the relaxation is tight, i.e.
(22) 
Moreover, let such that is an optimal multicut of . Then,
(23) 
Proof.
Follows from the complementary slackness conditions in linear programming duality. ∎
7.2 Detailed experimental evaluation
Table 2 gives a detailed per-instance evaluation of all algorithms considered in the experimental section.
| Instance | Measure | MPC | MCCOW | CGC | MCILP | CCFusionRWS | CCFusionRHC |
|---|---|---|---|---|---|---|---|
| imageseg | | | | | | | |
| 101087.bmp | UB | 2906.16 | 2906.16 | 2853.56 | 2789.90 | 2800.22 | 2789.90 |
| | LB | 2788.69 | 2788.95 | 2622.38 | 2789.90 | | |
| | runtime(s) | 0.31 | 0.59 | 0.02 | 5.11 | 0.63 | 0.81 |
| 102061.bmp | UB | 3017.57 | 3017.57 | 3090.33 | 2943.77 | 2963.42 | 2944.46 |
| | LB | 2932.80 | 2933.39 | 2750.99 | 2943.77 | | |
| | runtime(s) | 3.26 | 5.83 | 0.05 | 8.75 | 0.61 | 0.96 |
| 103070.bmp | UB | 4437.16 | 4444.27 | 4457.13 | 4199.38 | 4205.03 | 4200.64 |
| | LB | 4196.88 | 4196.94 | 3842.84 | 4199.38 | | |
| | runtime(s) | 11.84 | 15.93 | 0.15 | 6.97 | 1.32 | 0.98 |
| 105025.bmp | UB | 6332.73 | 6333.88 | 6290.71 | 6055.33 | 6070.84 | 6061.05 |
| | LB | 6045.21 | 6046.08 | 5506.01 | 6055.33 | | |
| | runtime(s) | 33.75 | 55.69 | 0.35 | 32.15 | 1.91 | 1.59 |
| 106024.bmp | UB | 1832.16 | 1832.16 | 1654.27 | 1599.25 | 1618.18 | 1599.83 |
| | LB | 1597.07 | 1597.21 | 1466.60 | 1599.25 | | |
| | runtime(s) | 0.64 | 4.09 | 0.02 | 4.76 | 0.18 | 0.25 |
| 108005.bmp | UB | 6839.76 | 6841.08 | 6855.33 | 6578.03 | 6584.62 | 6578.18 |
| | LB | 6567.48 | 6569.63 | 6151.29 | 6578.03 | | |
| | runtime(s) | 4.58 | 22.38 | 0.26 | 10.86 | 2.49 | 1.54 |
| 108070.bmp | UB | 9082.04 | 9083.42 | 8612.32 | 8422.24 | 8445.73 | 8424.36 |
| | LB | 8414.64 | 8414.99 | 7818.70 | 8422.24 | | |
| | runtime(s) | 28.23 | 52.32 | 0.41 | 26.67 | 3.35 | 1.75 |
| 108082.bmp | UB | 5125.72 | 5127.99 | 5090.88 | 4800.15 | 4815.78 | 4806.04 |
| | LB | 4786.46 | 4789.19 | 4380.66 | 4800.15 | | |
| | runtime(s) | 6.85 | 25.07 | 0.16 | 13.51 | 1.20 | 1.89 |
| 109053.bmp | UB | 4575.76 | 4579.97 | 4616.61 | 4421.13 | 4424.05 | 4421.13 |
| | LB | 4412.45 | 4413.08 | 4021.22 | 4421.13 | | |
| | runtime(s) | 20.74 | 71.75 | 0.17 | 9.64 | 1.50 | 0.84 |
| 119082.bmp | UB | 4512.04 | 4512.04 | 4642.96 | 4530.71 | 4535.85 | 4532.29 |
| | LB | 4526.91 | 4526.91 | 4346.24 | 4530.71 | | |
| | runtime(s) | 0.10 | 0.10 | 0.02 | 0.48 | 1.53 | 1.57 |
| 12084.bmp | UB | 7502.34 | 7502.34 | 7443.28 | 7284.45 | 7301.17 | 7287.68 |
| | LB | 7276.26 | 7277.38 | 6941.02 | 7284.45 | | |
| | runtime(s) | 3.48 | 8.10 | 0.15 | 2.47 | 2.20 | 3.99 |
| 123074.bmp | UB | 4200.21 | 4204.24 | 4031.03 | 3842.74 | 3856.82 | 3847.83 |
| | LB | 3829.51 | 3829.04 | 3439.47 | 3842.74 | | |
| | runtime(s) | 35.27 | 24.55 | 0.14 | 23.01 | 0.47 | 0.86 |
| 126007.bmp | UB | 2756.86 | 2760.64 | 2747.72 | 2684.83 | 2706.76 | 2685.26 |
| | LB | 2677.02 | 2677.08 | 2512.08 | 2684.83 | | |
| | runtime(s) | 0.23 | 0.53 | 0.01 | 0.79 | 0.38 | 0.82 |
| 130026.bmp | UB | 6066.91 | 6066.91 | 5580.58 | 5350.83 | 5369.95 | 5354.31 |
| | LB | 5331.00 | 5336.93 | 4828.82 | 5350.83 | | |
| | runtime(s) | 36.91 | 167.54 | 0.26 | 19.66 | 0.99 | 1.40 |
| 134035.bmp | UB | 6840.69 | 6840.69 | 6679.89 | 6578.98 | 6595.87 | 6579.62 |
| | LB | 6562.20 | 6565.03 | 6166.95 | 6578.98 | | |
| | runtime(s) | 8.44 | 43.34 | 0.19 | 28.82 | 1.81 | 1.33 |
| 14037.bmp | UB | 1566.34 | 1582.13 | 1431.56 | 1383.14 | 1393.66 | 1383.14 |
| | LB | 1375.55 | 1375.55 | 1274.27 | 1383.14 | | |
| | runtime(s) | 0.09 | 0.17 | 0.01 | 0.25 | 0.13 | 0.22 |
| 143090.bmp | UB | 1728.22 | 1728.22 | 1807.41 | 1714.38 | 1725.88 | 1715.76 |
| | LB | 1712.44 | 1712.26 | 1595.54 | 1714.38 | | |
| | runtime(s) | 0.43 | 2.49 | 0.01 | 0.56 | 0.44 | 0.34 |
| 145086.bmp | UB | 3806.23 | 3808.75 | 3407.83 | 3322.21 | 3329.14 | 3322.59 |
| | LB | 3319.36 | 3319.34 | 3197.53 | 3322.21 | | |
| | runtime(s) | 0.32 | 0.45 | 0.01 | 0.83 | 0.41 | 1.59 |
| 147091.bmp | UB | 4092.25 | 4092.25 | 4129.72 | 3973.71 | 3982.30 | 3975.15 |
| | LB | 3968.35 | 3970.58 | 3734.67 | 3973.71 | | |
| | runtime(s) | 1.50 | 10.96 | 0.10 | 13.24 | 0.90 | 0.83 |
| 148026.bmp | UB | 8411.55 | 8411.55 | 8436.70 | 8205.98 | 8226.20 | 8207.72 |
| | LB | 8198.62 | 8199.84 | 7780.68 | 8205.98 | | |
| | runtime(s) | 6.48 | 29.65 | 0.15 | 4.49 | 3.10 | 2.73 |
| 148089.bmp | UB | 6680.96 | 6682.82 | 6666.83 | 6439.58 | 6455.48 | 6440.33 |
| | LB | 6432.06 | 6431.52 | 6030.94 | 6439.58 | | |
| | runtime(s) | 10.44 | 15.57 | 0.17 | 18.64 | 2.00 | 1.53 |
| 156065.bmp | UB | 5798.42 | 5801.59 | 5429.45 | 5234.15 | 5248.18 | 5234.76 |
| | LB | 5224.22 | 5225.10 | 4857.14 | 5234.15 | | |
| | runtime(s) | 10.19 | 35.24 | 0.15 | 18.51 | 1.49 | 1.12 |
| 157055.bmp | UB | 4797.86 | 4798.32 | 4768.87 | 4685.17 | 4696.42 | 4685.17 |
| | LB | 4679.04 | 4679.39 | 4472.39 | 4685.17 | | |
| | runtime(s) | 0.49 | 3.10 | 0.02 | 2.60 | 1.37 | 1.48 |
| 159008.bmp | UB | 4688.87 | 4694.40 | 4814.85 | 4540.87 | 4569.12 | 4541.22 |
| | LB | 4534.88 | 4535.76 | 4217.95 | 4540.87 | | |
| | runtime(s) | 3.53 | 21.00 | 0.10 | 16.64 | 0.83 | 1.29 |
| 160068.bmp | UB | 3263.91 | 3265.02 | 3264.31 | 3089.32 | 3103.23 | 3089.32 |
| | LB | 3088.23 | 3088.17 | 2866.45 | 3089.32 | | |
| | runtime(s) | 1.18 | 2.28 | 0.05 | 1.80 | 0.61 | 1.07 |
| 16077.bmp | UB | 4440.51 | 4443.05 | 4408.62 | 4227.88 | 4236.46 | 4228.78 |
| | LB | 4224.10 | 4224.60 | 3921.75 | 4227.88 | | |
| | runtime(s) | 4.37 | 13.81 | 0.07 | 3.22 | 1.18 | 1.63 |
| 163085.bmp | UB | 4493.17 | 4493.17 | 4577.62 | 4381.13 | 4406.98 | 4384.59 |
| | LB | 4370.36 | 4371.00 | 3983.52 | 4381.13 | | |
| | runtime(s) | 21.91 | 35.30 | 0.15 | 9.82 | 0.91 | 1.30 |
| 167062.bmp | UB | 1623.20 | 1623.20 | 1281.48 | 1273.72 | 1275.87 | 1275.67 |
| | LB | 1273.01 | 1273.29 | 1233.39 | 1273.72 | | |
| | runtime(s) | 0.10 | 0.66 | 0.00 | 1.00 | 0.11 | 0.17 |
| 167083.bmp | UB | 8979.42 | 8979.42 | 8545.37 | 8331.63 | 8344.35 | 8331.90 |
| | LB | 8325.80 | 8328.03 | 7921.06 | 8331.63 | | |
| | runtime(s) | 9.83 | 53.28 | 0.23 | 11.74 | 2.27 | 1.80 |
| 170057.bmp | UB | 3602.31 | 3602.31 | 3355.95 | 3266.17 | 3273.20 | 3266.73 |
| | LB | 3260.19 | 3260.98 | 2989.38 | 3266.17 | | |
| | runtime(s) | 21.07 | 26.30 | 0.07 | 18.20 | 0.53 | 0.67 |
| 175032.bmp | UB | 11863.40 | 11863.40 | 11926.00 | 11542.63 | 11566.74 | 11547.67 |
| | LB | 11525.63 | 11526.57 | 10543.16 | 11542.63 | | |
| | runtime(s) | 623.66 | 632.98 | 1.27 | 165.83 | 3.25 | 3.98 |
| 175043.bmp | UB | 8022.65 | 8033.50 | 8224.34 | 7816.92 | 7844.49 | 7822.01 |
| | LB | 7809.60 | 7811.17 | 7136.44 | 7816.92 | | |