1 Introduction
Image segmentation literature distinguishes semantic segmentation, i.e. associating each pixel with a class label, and instance segmentation, i.e. detecting and segmenting individual objects while ignoring the background. The joint task of simultaneously assigning a class label to each pixel and grouping pixels into instances has been addressed under different names, including semantic instance segmentation, scene parsing [39], image parsing [40], holistic scene understanding [43] or instance-separating semantic segmentation [28]. Recently, a new metric and evaluation approach to such problems has been introduced under the name of panoptic segmentation [19].
From a graph theory perspective, semantic instance segmentation corresponds to the simultaneous partitioning and labeling of a graph. Most greedy graph partitioning algorithms are defined on graphs encoding attractive interactions only. Clusters are then formed through agglomeration or division until a user-defined termination criterion is met (often a threshold or a desired number of clusters). These algorithms perform pure instance segmentation; the semantic labels for the segmented instances need to be generated independently.
If repulsive, as well as attractive, forces are defined between the nodes of the graph, partitioning can be formulated as a Multicut problem [2]. In this formulation, clusters emerge naturally without the need for a termination criterion. Furthermore, the Multicut problem can be extended to include the labeling of the graph, delivering a semantic instance segmentation from a joint optimization of partitioning and labeling [24]. The main drawback of this formulation is that the Multicut problem is NP-hard.
We propose to solve the joint partitioning and labeling problem by an efficient algorithm which we term Semantic Mutex Watershed (SMW), inspired by the Mutex Watershed [41]. In more detail, in this contribution we:


propose a fast algorithm for joint graph partitioning and labeling

prove that the algorithm minimizes (exactly) an objective function closely related to the Symmetric Multiway Cut objective

demonstrate competitive performance on natural and biological images.
2 Related Work
Semantic segmentation.
State-of-the-art semantic segmentation algorithms are based on convolutional neural networks (CNNs) which are trained end-to-end. The networks commonly follow the design principles of image classification networks (e.g. [16, 38, 23]), replacing the fully connected layers at the end with convolutional layers to form a fully convolutional network [30]. This architecture can be further extended to include encoder-decoder paths [37], dilated or atrous convolutions [45, 6] and pyramid pooling modules [7, 46].
Instance segmentation.
Many instance segmentation methods use a detection or a region proposal framework as their basis; object segmentation masks are then predicted inside region proposals. A cascade of multiple networks is employed by [12], each solving a specific subtask to find the instance labeling. Mask R-CNN [15] builds on the bounding box prediction capabilities of Faster R-CNN [36] to simultaneously produce masks and class predictions. An extension of this method with an additional semantic segmentation branch has been proposed in [18] as a single network for semantic instance segmentation.
Graph-based segmentation.
Graph-based methods, used independently or in combination with machine learning on pixels, form another popular basis for image segmentation algorithms [14]. In this case, the graph is built from pixels or superpixels of the image and the instance segmentation problem is formulated as graph partitioning. When the number of instances is not known in advance and repulsive interactions are present between the graph nodes, graph partitioning can in turn be formulated as a Multicut or correlation clustering problem [2]. This NP-hard problem can be solved reasonably fast for small problem sizes with integer linear programming solvers [1] or approximate algorithms [34, 5]. A modified Multicut objective is introduced by [41] together with the Mutex Watershed, an efficient clustering algorithm for its optimization.
The Multicut objective can be extended to solve a joint graph partitioning and labeling problem [17, 24] for simultaneous instance and semantic segmentation. In practice, the computational complexity of the joint problem only allows for approximate solutions [28], possibly combined with reducing the problem size by oversegmentation into superpixels. This formulation has been applied to natural images by [20] and to biological images by [22].
3 The Semantic Mutex Watershed
The centerpiece of this paper, the Semantic Mutex Watershed algorithm, solves the semantic instance segmentation problem by jointly finding a graph partitioning and labeling. In this section we present the graphbased formulation of the semantic instance segmentation problem and define an objective function related to the Symmetric Multiway Cut problem [24]. Then we introduce the Semantic Mutex Watershed algorithm and prove that it can optimize this objective efficiently. Finally we show that the proposed objective constitutes a generalization of the Mutex Watershed Objective introduced in [41].
3.1 Joint Partitioning and Labeling of Graphs
Similar to instance segmentation algorithms, we build a graph of image pixels (voxels) or superpixels and formulate the semantic instance segmentation problem as joint partitioning and labeling of the graph.
Weighted graph with terminal nodes.
For an undirected weighted graph G = (V, E) we refer to the nodes V as internal nodes and the edges E as internal edges. We differentiate between the attractive edges E+ and the repulsive edges E− that make up the internal edges E = E+ ∪ E−. Each edge e ∈ E is associated with a real-valued positive weight w_e. The weights encode the attraction and repulsion between the incident nodes u and v of each edge. A large attractive weight encodes a high tendency for the nodes u and v to belong to the same partition element. Equivalently, a large repulsive weight indicates a strong inclination of u and v to belong to separate clusters.
Semantic instance segmentation can be achieved by clustering the internal nodes and assigning a semantic label to each cluster. We extend G by a set of terminal nodes T, where each t ∈ T is associated with a semantic label. Every internal node v ∈ V is connected to every t ∈ T by a weighted semantic edge. Here, a large semantic weight implies a strong association of the internal node v with the label of the terminal node t. Writing E_T for the set of semantic edges, the extended graph thus becomes G† = (V ∪ T, E ∪ E_T). Figure 1(a) shows a simple example of such an extended graph.
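To make the construction concrete, the following sketch assembles such an extended graph as a flat edge list; the data layout (weight-first tuples, terminal nodes named `t_<label>`) is our illustrative choice, not the authors' implementation.

```python
def build_extended_graph(n_nodes, attractive, repulsive, semantic_probs):
    """Build the extended graph as a flat edge list.

    attractive / repulsive: dicts mapping node pairs (u, v) -> weight > 0.
    semantic_probs: for each internal node, a dict label -> semantic weight.
    Terminal nodes are represented as strings 't_<label>'.
    """
    edges = []
    for (u, v), w in attractive.items():
        edges.append((w, u, v, "attractive"))
    for (u, v), w in repulsive.items():
        edges.append((w, u, v, "repulsive"))
    for u in range(n_nodes):
        for label, w in semantic_probs[u].items():
            edges.append((w, u, "t_%s" % label, "semantic"))
    # Sort in descending order of weight, as required by the algorithm.
    edges.sort(key=lambda e: -e[0])
    return edges
```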
Symmetric Multiway Cut.
In the Symmetric Multiway Cut an optimal semantic instance segmentation of such a graph is formulated as a constrained energy minimization/integer linear program (ILP) [24]:
(1)  
(2)  
(3)  
(4)  
(5)  
(6) 
where p is a power applied to the edge weights; its role is discussed below. The segmentation consistency is ensured by the cycle inequalities (2) and (3). Equation (4) enforces that every node is uniquely assigned to a terminal node. Equations (5) and (6) ensure the consistency between partition labeling and semantic labeling. A detailed discussion of the objective's properties will follow in section 3.3, and the relation to the objective in [24] is derived in appendix A. Although this is in general a hard optimization problem, we will show that for sufficiently large p this objective can be solved exactly and efficiently by the algorithm introduced in the next section.
3.2 The Semantic Mutex Watershed Algorithm.
We will now introduce a simple algorithm that greedily constructs a solution to eq. (1). Although this will in general not be an optimal solution to the NP-hard Symmetric Multiway Cut, we will show in section 3.3 that it becomes optimal when the power p is sufficiently large.
The clustering and label assignment of G† is described by a set of active edges A ⊆ E ∪ E_T, chosen by the algorithm; its attractive, repulsive and semantic members encode merges, mutual exclusions and label assignments, respectively. In order to restrict A to a consistent partitioning and labeling we make the following definitions:
We define two internal nodes as connected if they are connected by active attractive edges, i.e.
(7) 
Here, π_{u→v} denotes a path from node u to node v. We also define the mutual exclusion between two nodes as
(8) 
Two nodes u and v are thus mutually exclusive if they are connected by a path from u to v with exactly one repulsive edge. Furthermore, a label is assigned to a node if this node is connected to the corresponding terminal node by attractive and semantic edges:
(9) 
For unlabeled nodes v we use the notation l(v) = ∅.
Algorithm.
The Semantic Mutex Watershed algorithm is an extension of the Mutex Watershed algorithm introduced by [42]. It augments the partitioning of the latter with a consistent labeling. The algorithm is shown in algorithm 1 with the additions to [42] highlighted. In the following we explain the syntax and procedure of the shown pseudocode.
All edges are sorted in descending order of their weight and put into a priority queue. While traversing the queue, the decision to add an edge to the active set A is made depending on the type of the edge:
Attractive edges: The edge is added if the incident nodes are not mutually exclusive and not labeled differently.
Repulsive edges: The edge is added if the incident nodes are not connected.
Semantic edges: The edge is added if the node is either unlabeled or already has the same label as the edge’s terminal node.
Following these rules, the attractive edges in the final active set form clusters in the graph, each of which is connected to a single terminal node indicating its label. Figure 1(b) shows a simple example of such an active set.
Efficient Implementation with MaximumSpanningTrees.
The SMW is closely related to Kruskal's efficient maximum spanning tree algorithm [25] and can feasibly be applied to pixel graphs of large images and even image volumes. Our implementation uses an efficient union-find data structure; mutex relations are stored and queried through a hash table.
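A compact sketch of this procedure is given below: union-find with path compression for the clusters, a label field per root, and per-root mutex sets standing in for the hash table. All names are ours and the bookkeeping is deliberately naive; it illustrates the three edge rules rather than reproducing the authors' implementation.

```python
class SemanticMutexWatershed:
    def __init__(self, n_nodes):
        self.parent = list(range(n_nodes))
        self.label = [None] * n_nodes                  # semantic label per cluster root
        self.mutex = [set() for _ in range(n_nodes)]   # root -> mutually excluded roots

    def find(self, u):
        while self.parent[u] != u:
            self.parent[u] = self.parent[self.parent[u]]  # path compression
            u = self.parent[u]
        return u

    def merge(self, ru, rv):
        self.parent[rv] = ru
        self.mutex[ru] |= self.mutex[rv]
        for r in self.mutex[rv]:                       # repoint references to the old root
            self.mutex[r].discard(rv)
            self.mutex[r].add(ru)
        if self.label[ru] is None:
            self.label[ru] = self.label[rv]

    def run(self, edges):
        # edges: (weight, u, v, kind); for kind == 'semantic', v is the label itself.
        for _, u, v, kind in sorted(edges, key=lambda e: -e[0]):
            ru = self.find(u)
            if kind == "semantic":
                if self.label[ru] in (None, v):        # unlabeled or same label
                    self.label[ru] = v
                continue
            rv = self.find(v)
            if ru == rv:
                continue                               # already connected
            if kind == "attractive":
                lu, lv = self.label[ru], self.label[rv]
                if rv not in self.mutex[ru] and (lu is None or lv is None or lu == lv):
                    self.merge(ru, rv)                 # not exclusive, labels compatible
            else:                                      # repulsive
                self.mutex[ru].add(rv)                 # add mutual exclusion
                self.mutex[rv].add(ru)
```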
Mutex Watershed as Special Case.
The Mutex Watershed algorithm is embedded in the Semantic Mutex Watershed as the special case in which there is at most one label (|T| ≤ 1).
3.3 The Semantic Mutex Watershed Objective
In this section we prove that the Semantic Mutex Watershed algorithm solves the ILP objective in eq. (1) for sufficiently large (dominant) powers p. To this end, we will extend the proof of [41] to include semantic edges. First, we review the definitions of dominant powers and mutex constraints. Second, we introduce an additional set of constraints acting on semantic edges and use it to define the Semantic Mutex Watershed Objective as a relaxed version of the Symmetric Multiway Cut. Finally, we prove that the solution found by the SMW is indeed optimal.
Dominant Power.
Let G = (V, E) be an edge-weighted graph with unique weights w_e. We call p ∈ N a dominant power if:
(10)  w_e^p > Σ_{t ∈ E : w_t < w_e} w_t^p   for all e ∈ E
Note that there exists a dominant power for any finite set of edges, since for any e we can divide (10) by w_e^p and observe that each normalized weight w_t / w_e is smaller than one, so its p-th power (and any finite sum of such terms) converges to 0 as p tends to infinity.
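For a finite weight set, the smallest integer dominant power can also be found by brute force. The helper below is an illustrative sketch (names are ours) of the dominance condition: each weight raised to the p-th power must exceed the sum of the p-th powers of all strictly smaller weights.

```python
def is_dominant(weights, p):
    """Check the dominance condition for a given integer power p."""
    ws = sorted(set(weights))
    for i, w in enumerate(ws):
        # w**p must exceed the sum over all strictly smaller weights.
        if w ** p <= sum(v ** p for v in ws[:i]):
            return False
    return True

def find_dominant_power(weights):
    """Smallest integer dominant power for a finite set of unique weights."""
    p = 1
    while not is_dominant(weights, p):
        p += 1
    return p
```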
Semantic Mutex Watershed Constraints.
To formalize the rules of the algorithm defined above, we define special subsets of the active set A. First, the set of all cycles containing exactly one repulsive edge is defined as
(11) 
We define the mutex constraint as requiring that the active set A contain no such cycle, which is exactly the rule that two mutually exclusive nodes must not be connected.
Furthermore, we define the set of all paths that connect two distinct terminal nodes through attractive and semantic edges:
(12) 
The algorithm must never connect two terminal nodes through such a path; thus we define the label constraint as requiring that A contain no such path. This ensures the consistency between the partitioning and the labeling. The mutex and label constraints are necessary but not sufficient to fulfill the linear constraints in eqs. (2) to (6) (the formal derivation can be found in appendix B).
Lemma 3.1 (Optimality of the Semantic Mutex Watershed).
Let G† be an edge-weighted graph extended by terminal nodes T as defined in section 3.1, with unique weights w_e, and let p be a dominant power. Then the Semantic Mutex Watershed algorithm 1 finds the optimal solution to the integer linear program
(13)  
s.t.  (14)  
(15)  
with  (16) 
Proof.
Wolf et al. [41] show that for a dominant power p the Mutex Watershed finds the optimal solution because it enjoys the properties of greedy choice and optimal substructure. Their proof of optimal substructure does not rely on the specific constraints in the ILP. Thus it can also be applied with the additional constraint in eq. (15), giving the ILP of eqs. (13) to (16) optimal substructure.
In every iteration the SMW adds the feasible edge with the largest weight to the active set. Due to the dominant power, its energy contribution is larger than that of any combination of edges with smaller weights. Thus, the SMW has the greedy choice property [11]. It follows by induction that the SMW algorithm finds the globally optimal solution to the SMW objective.
∎
We can now finally observe that the SMW algorithm always yields a consistent graph partitioning and labeling which fulfills the Symmetric Multiway Cut constraints. Thus, the Semantic Mutex Watershed algorithm returns an optimal solution of eqs. (1) to (6) if p is set to a dominant power. In particular, if p is dominant then the SMW solution is also an optimal solution to the Symmetric Multiway Cut.
4 Experiments



Figure 2: Results on Cityscapes using semantic unaries (Deeplab 3+ network) and affinities derived from Mask R-CNN foreground probability. Colors indicate predicted semantic classes, with variations for separate instances. Rightmost column: results for the sponge dataset; cell bodies are colored in blue, microvilli in green and flagella in red.
We will now demonstrate how to apply the SMW algorithm to semantic instance segmentation of 2D and 3D images. We start by showing how existing CNNs can be used as graph weight estimators and compare different sources of edge weights on the Cityscapes dataset. Additionally, we apply the SMW algorithm to a 3D electron microscopy volume and demonstrate its efficiency and scalability.
4.1 Affinity Generation with Neural Networks
The only input to the SMW are the graph weights; it does not require any hyperparameters such as thresholds. Consequently, its segmentation quality relies on good estimates of the graph weights w_e. In this section we present how state-of-the-art CNNs can be used as sources for these weights.
Affinity Learning.
Affinities are commonly used in instance segmentation; many modern algorithms train CNNs to directly predict pixel affinities. A universal approach is to employ a stencil pattern that describes for each pixel which neighbours to consider for the affinity computation. Regularly spaced, multiscale stencil patterns are widely used for natural images [31, 29] and biomedical data [42, 27].
The predicted affinities usually lie in the interval [0, 1] and can be interpreted as pseudo-probabilities. We use these affinities directly as weights for the attractive edges and invert them to get the repulsive edge weights.
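A minimal sketch of this weight derivation, assuming affinities are stored edge-wise alongside a boolean mask marking which stencil edges are treated as repulsive (both names are illustrative):

```python
import numpy as np

def affinities_to_weights(affinities, repulsive_mask):
    """Split predicted affinities in [0, 1] into attractive and repulsive weights.

    affinities: array with one value per stencil edge.
    repulsive_mask: boolean array marking the edges treated as repulsive
    (typically the long-range ones).
    """
    attractive = np.where(~repulsive_mask, affinities, 0.0)
    repulsive = np.where(repulsive_mask, 1.0 - affinities, 0.0)  # inverted
    return attractive, repulsive
```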
Mask R-CNN
produces overlapping masks that have to be resolved for a consistent panoptic segmentation. We achieve this with the SMW by deriving affinities from the foreground probabilities of each mask. A straightforward approach is to compute the (attractive) affinity of two pixels as their joint foreground probability, weighted by the mask's classification score.
We find that sparse repulsive edges work well in practice, as they lead to faster inference and reduced oversegmentation on the instance boundaries. For this reason, we sample random points from all pairs of masks and add (repulsive) edges with weight proportional to a soft intersection over union of the two masks:
(17) 
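One common way to realize a soft intersection over union of two foreground probability maps is the elementwise product over the probabilistic union; the sketch below uses this definition, which may differ in detail from eq. (17):

```python
import numpy as np

def soft_iou(prob_a, prob_b):
    """Soft intersection-over-union of two foreground probability maps.

    Uses the product as soft intersection and a + b - a*b as soft union;
    this is one common definition, not necessarily the paper's eq. (17).
    """
    inter = (prob_a * prob_b).sum()
    union = (prob_a + prob_b - prob_a * prob_b).sum()
    return inter / union if union > 0 else 0.0
```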
Semantic Segmentation CNNs.
achieve high-quality results on semantic segmentation tasks. The output of the final softmax layer of these networks can be interpreted as the normalized probability of each pixel belonging to each class. Thus, we can use these predictions directly as semantic weights.
Additionally, we derive affinities from the stuff class probabilities: we treat each stuff class separately and again compute the affinity of two pixels as their joint probability of belonging to that stuff class. This cannot be done for thing classes, since they can have multiple instances.
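As a sketch, the softmax vector of a pixel can be used directly as its semantic edge weights, while the per-class stuff affinity of two pixels is their joint probability (all names below are illustrative):

```python
import numpy as np

def stuff_affinity(softmax_i, softmax_j, stuff_classes):
    """Per-class stuff affinities between two pixels.

    softmax_i / softmax_j: class-probability vectors of the two pixels
    (these same vectors serve directly as semantic edge weights).
    Returns, for each stuff class c, the joint probability p_c(i) * p_c(j).
    """
    return {c: float(softmax_i[c] * softmax_j[c]) for c in stuff_classes}
```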
4.2 Panoptic Segmentation on Cityscapes
We apply the SMW to the challenging task of panoptic segmentation on the Cityscapes dataset [10]. We illustrate how the different sources of affinities can be used and combined, and show their respective strengths and weaknesses.
Dataset.
The Cityscapes dataset consists of urban street scene images taken from a driver's perspective. It has 5k densely annotated images separated into train (2975), val (500) and test (1525) sets. Since there is no public evaluation server for panoptic segmentation on the test set, we report all results on the validation set. There are 19 classes: 11 stuff classes and 8 thing classes.
Implementation Details.
We employ and combine multiple sources of graph weights to build the SMW graph. We train a Deeplab 3+ [8] network for semantic edge weight and affinity prediction following [29]. We employ the Mask R-CNN [15] implementation provided by [32] and train a model on Cityscapes following the training configuration of [15]. Further implementation details can be found in section C.1.
Study of Affinity Sources.
We evaluate the semantic instance segmentation performance of the SMW in terms of the panoptic quality (PQ) metric [19] using different combinations of the graph weight sources discussed above. In table 2 we compare the PQ on the Cityscapes dataset.
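For reference, PQ as defined in [19] is computed per class from the prediction/ground-truth segment pairs that match with IoU > 0.5; a minimal sketch given the matched IoUs and the false-positive/false-negative counts:

```python
def panoptic_quality(matched_ious, n_fp, n_fn):
    """Panoptic quality for one class (Kirillov et al. [19]).

    matched_ious: IoUs of matched prediction/ground-truth pairs (each > 0.5,
    so every ground-truth segment matches at most one prediction).
    PQ = sum of matched IoUs / (|TP| + 0.5 |FP| + 0.5 |FN|).
    """
    tp = len(matched_ious)
    denom = tp + 0.5 * n_fp + 0.5 * n_fn
    return sum(matched_ious) / denom if denom > 0 else 0.0
```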
The best performance is achieved with a combination of Mask R-CNN affinities and Deeplab 3+ semantic predictions, outperforming the strong baseline of [18] listed in table 2 and shown in fig. 2 and the supplementary fig. 3. Inspecting the images, we find that Mask R-CNN affinities are more reliable in detecting small objects as well as in connecting fragmented instances. Note that PQ mostly measures detection quality, which is then weighted by the segmentation quality of the found instances; hence the detection strength of Mask R-CNN shines through.
We observe that using all sources together leads to a performance drop of 10 percentage points below the best result. We attribute this to the greedy nature of the SMW, which always selects the strongest of all provided edges. This example demonstrates how important it is to carefully select and train the algorithm's input.
4.3 Semantic Instance Segmentation of 3D EM Volumes
Semantic instance segmentation is an important task in biomedical image analysis where classes naturally arise through cellular structure. We use a 3D EM image dataset to compare the SMW to algorithms that separately optimize instance segmentation and semantic class assignment.
Dataset.
The dataset consists of two FIB-SEM volumes of a sponge choanocyte chamber. The data was acquired in [33] to investigate proto-neural cells in sponges using the segmentation approach introduced in [35]. These cells filter nutrients from water by creating a flow with the beating of a flagellum and absorbing the nutrients through microvilli that surround the flagellum in a collar [26] (see fig. 2). In order to investigate this process in detail, a precise semantic instance segmentation of the cell bodies, flagella and microvilli is needed. The dataset consists of three EM image volumes of size pixel ( m).
Implementation Details.
We predict affinities with two separate 3D U-Nets [9] to derive graph edge weights and semantic class probabilities, respectively. We adopt the training procedure of [42], which uses the Dice coefficient as the loss function. We use two volumes for training and one for testing.
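A generic soft Dice loss of this kind can be sketched as follows (our formulation; the exact variant used in [42] may differ in its normalization):

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2*sum(p*t) / (sum(p^2) + sum(t^2) + eps).

    pred and target are arrays of per-pixel probabilities / binary labels;
    eps guards against division by zero for empty targets.
    """
    inter = 2.0 * (pred * target).sum()
    denom = (pred ** 2).sum() + (target ** 2).sum() + eps
    return 1.0 - inter / denom
```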
We also implement baseline approaches which start from the same network predictions but do not perform joint labeling and partitioning. First, we compare to instance segmentation with the Mutex Watershed, followed by assigning each instance the semantic label of its strongest semantic edge (MWS-MAX). In addition, we compute connected components of the semantic predictions and of the short-range affinities.
Results.
The PQ values in table 2 show that the SMW outperforms the baseline approaches that separately optimize instance segmentation and semantic class assignment. An additional analysis can be found in appendix fig. 4, where we measure the runtimes for different volume sizes and observe almost linear scaling behavior.
5 Conclusion
We have introduced a new method for joint partitioning and labeling of weighted graphs as a generalization of the Mutex Watershed algorithm. We have shown that it optimally solves an objective function closely related to the objective of the Symmetric Multiway Cut problem. Our experiments demonstrate that the SMW with graph edge weights predicted by convolutional neural networks outperforms strong baselines on natural and biological images. Any improvement in CNN performance will translate directly into an improvement of the SMW results. However, we also observe that the extreme value selection used by the SMW to assign edges to the active set can lead to suboptimal performance when diverse sources of edge weights are combined. Empirically, the algorithm scales almost linearly with the number of graph edges, making it applicable to large images and volumes without prior oversegmentation into superpixels. The source code will be made available upon publication.
References
 [1] Bjoern Andres, Kevin L. Briggman, Natalya Korogod, Graham Knott, Ullrich Koethe, and Fred A. Hamprecht. Globally Optimal Closed-Surface Segmentation for Connectomics. In Computer Vision – ECCV 2012, volume 7574, pages 778–791. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012.
 [2] Bjoern Andres, Jörg H. Kappes, Thorsten Beier, Ullrich Köthe, and Fred A. Hamprecht. Probabilistic image segmentation with closedness constraints. In Computer Vision (ICCV), 2011 IEEE International Conference On, pages 2611–2618. IEEE, 2011.

 [3] Anurag Arnab and Philip HS Torr. Pixelwise instance segmentation with a dynamically instantiated network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 441–450, 2017.
 [4] Min Bai and Raquel Urtasun. Deep watershed transform for instance segmentation. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2858–2866. IEEE, 2017.
 [5] Thorsten Beier, Constantin Pape, Nasim Rahaman, Timo Prange, Stuart Berg, Davi D Bock, Albert Cardona, Graham W Knott, Stephen M Plaza, Louis K Scheffer, et al. Multicut brings automated neurite segmentation closer to human performance. Nature Methods, 14(2):101, 2017.
 [6] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. In ICLR, 2016.
 [7] Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587, 2017.
 [8] Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. arXiv preprint arXiv:1802.02611, 2018.
 [9] Özgün Çiçek, Ahmed Abdulkadir, Soeren S Lienkamp, Thomas Brox, and Olaf Ronneberger. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 424–432. Springer, 2016.
 [10] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The Cityscapes Dataset for Semantic Urban Scene Understanding. arXiv:1604.01685 [cs], Apr. 2016.
 [11] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Third Edition. The MIT Press, 3rd edition, 2009.
 [12] Jifeng Dai, Kaiming He, and Jian Sun. Instance-Aware Semantic Segmentation via Multi-task Network Cascades. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3150–3158, Las Vegas, NV, USA, June 2016. IEEE.
 [13] Alireza Fathi, Zbigniew Wojna, Vivek Rathod, Peng Wang, Hyun Oh Song, Sergio Guadarrama, and Kevin P. Murphy. Semantic Instance Segmentation via Deep Metric Learning. arXiv:1703.10277 [cs], Mar. 2017.
 [14] Pedro F. Felzenszwalb and Daniel P. Huttenlocher. Efficient Graph-Based Image Segmentation. International Journal of Computer Vision, 59(2):167–181, Sept. 2004.
 [15] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. arXiv:1703.06870 [cs], Mar. 2017.
 [16] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. arXiv:1512.03385 [cs], Dec. 2015.
 [17] Jörg H. Kappes, Markus Speth, Björn Andres, Gerhard Reinelt, and Christoph Schnörr. Globally optimal image partitioning by multicuts. In Yuri Boykov, Fredrik Kahl, Victor Lempitsky, and Frank R. Schmidt, editors, Energy Minimization Methods in Computer Vision and Pattern Recognition, volume 6819, pages 31–44. Springer Berlin Heidelberg, 2011.
 [18] Alexander Kirillov, Ross Girshick, Kaiming He, and Piotr Dollár. Panoptic Feature Pyramid Networks. arXiv:1901.02446 [cs], Jan. 2019.
 [19] Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Dollár. Panoptic Segmentation. arXiv:1801.00868 [cs], Jan. 2018.
 [20] Alexander Kirillov, Evgeny Levinkov, Bjoern Andres, Bogdan Savchynskyy, and Carsten Rother. InstanceCut: From edges to instances with multicut. In CVPR, volume 3, page 9, 2017.
 [21] Shu Kong and Charless Fowlkes. Recurrent Pixel Embedding for Instance Grouping. arXiv preprint arXiv:1712.08273, 2017.
 [22] N. E. Krasowski, T. Beier, G. W. Knott, U. Köthe, F. A. Hamprecht, and A. Kreshuk. Neuron Segmentation With High-Level Biological Priors. IEEE Transactions on Medical Imaging, 37(4):829–839, Apr. 2018.
 [23] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6):84–90, May 2017.
 [24] Thorben Kroeger, Jörg H. Kappes, Thorsten Beier, Ullrich Koethe, and Fred A. Hamprecht. Asymmetric Cuts: Joint Image Labeling and Partitioning. In Pattern Recognition, volume 8753, pages 199–211. Springer International Publishing, Cham, 2014.
 [25] Joseph B Kruskal. On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem. Proceedings of the American Mathematical Society, page 3, 1956.
 [26] Paul-Friedrich Langenbruch and Norbert Weissenfels. Canal systems and choanocyte chambers in freshwater sponges (Porifera, Spongillidae). Zoomorphology, 107(1):11–16, 1987.
 [27] Kisuk Lee, Jonathan Zung, Peter Li, Viren Jain, and H Sebastian Seung. Superhuman Accuracy on the SNEMI3D Connectomics Challenge. arXiv preprint arXiv:1706.00120, 2017.
 [28] Evgeny Levinkov, Jonas Uhrig, Siyu Tang, Mohamed Omran, Eldar Insafutdinov, Alexander Kirillov, Carsten Rother, Thomas Brox, Bernt Schiele, and Bjoern Andres. Joint Graph Decomposition & Node Labeling: Problem, Algorithms, Applications. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1904–1912, Honolulu, HI, July 2017. IEEE.
 [29] Yiding Liu, Siyu Yang, Bin Li, Wengang Zhou, JiZeng Xu, Houqiang Li, and Yan Lu. Affinity Derivation and Graph Merge for Instance Segmentation. In The European Conference on Computer Vision (ECCV), page 18, Sept. 2018.
 [30] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440, 2015.
 [31] Michael Maire, Takuya Narihira, and Stella X. Yu. Affinity CNN: Learning pixel-centric pairwise relations for figure/ground embedding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 174–182, 2016.

 [32] Francisco Massa and Ross Girshick. maskrcnn-benchmark: Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch, 2018.
 [33] Jacob M Musser, Klaske J Schippers, Michael Nickel, Giulia Mizzon, Andrea B Kohn, Constantin Pape, Jörg U Hammel, Florian Wolf, Cong Liang, Ana Hernández-Plaza, et al. Profiling cellular diversity in sponges informs animal cell type and nervous system evolution. BioRxiv, page 758276, 2019.
 [34] Constantin Pape, Thorsten Beier, Peter Li, Viren Jain, Davi D. Bock, and Anna Kreshuk. Solving Large Multicut Problems for Connectomics via Domain Decomposition. In 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), pages 1–10, Venice, Oct. 2017. IEEE.
 [35] Constantin Pape, Alex Matskevych, Adrian Wolny, Julian Hennies, Giulia Mizzon, Marion Louveaux, Jacob Musser, Alexis Maizel, Detlev Arendt, and Anna Kreshuk. Leveraging domain knowledge to improve microscopy image segmentation with lifted multicuts. Frontiers in Computer Science, 1:6, 2019.
 [36] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv:1506.01497 [cs], June 2015.
 [37] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
 [38] Karen Simonyan and Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556 [cs], Sept. 2014.
 [39] Joseph Tighe, Marc Niethammer, and Svetlana Lazebnik. Scene Parsing with Object Instance Inference Using Regions and Per-exemplar Detectors. International Journal of Computer Vision, 112(2):150–171, Apr. 2015.
 [40] Zhuowen Tu, Xiangrong Chen, Alan L. Yuille, and Song-Chun Zhu. Image Parsing: Unifying Segmentation, Detection, and Recognition. International Journal of Computer Vision, 63(2):113–140, July 2005.
 [41] Steffen Wolf, Alberto Bailoni, Constantin Pape, Nasim Rahaman, Anna Kreshuk, Ullrich Köthe, and Fred A Hamprecht. The Mutex Watershed and its Objective: Efficient, Parameter-Free Image Partitioning. arXiv preprint arXiv:1904.12654, May 2019.
 [42] Steffen Wolf, Constantin Pape, Nasim Rahaman, Anna Kreshuk, Ullrich Köthe, and Fred Hamprecht. The Mutex Watershed: Efficient, Parameter-Free Image Partitioning. In The European Conference on Computer Vision (ECCV), page 17, Sept. 2018.
 [43] Jian Yao, S. Fidler, and R. Urtasun. Describing the scene as a whole: Joint object detection, scene classification and semantic segmentation. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 702–709, Providence, RI, June 2012. IEEE.
 [44] Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, and Nong Sang. Learning a Discriminative Feature Network for Semantic Segmentation. arXiv:1804.09337 [cs], Apr. 2018.
 [45] Fisher Yu and Vladlen Koltun. Multi-Scale Context Aggregation by Dilated Convolutions. arXiv:1511.07122 [cs], Nov. 2015.
 [46] Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. Pyramid scene parsing network. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages 2881–2890, 2017.
Appendix A Symmetric Multiway Cut in Related Literature
We will now relate the Symmetric Multiway Cut definition in eqs. (1) to (6) with the objective given in [24]. In contrast to this work, Kroeger et al. [24] do not split the set of edges into attractive and repulsive edges. Instead, they model repulsion with negative weights and formulate the SMWC as the following constrained energy minimization/integer linear program (ILP):
(A.18) 
The variables y_e ∈ {0, 1} are indicators for cuts in the graph, i.e. y_e = 1 when the edge e is cut, and the feasible set is the polytope of consistent solutions defined by the following linear constraints:
(A.19)  
(A.20)  
(A.21)  
(A.22) 
The cycle constraints (A.19) form the so-called Multicut polytope [2]; they forbid dangling edges, so that all non-cut internal edges form clusters on the graph. Equation (A.20) ensures that each internal node is connected to exactly one terminal node. Finally, (A.21) and (A.22) are cycle constraints on all cycles with one terminal node; they enforce that an internal edge is always cut when its incident nodes are connected to different terminal nodes. This ensures that the resulting partitioning and labeling is always consistent. Note that an edge between two nodes connected to the same terminal is allowed to be cut, so two instances of the same class may touch.
We will now transform the objective given in eq. A.18 and introduce an additional parameter $\alpha$ that weights the terminal edges. Instead of finding a small-weight set of edges to cut from the graph, we look for a large-weight set of edges to keep in the graph.
First, we split the internal edges into repulsive ($E^- := \{ e \in E \mid w_e < 0 \}$) and attractive ($E^+ := \{ e \in E \mid w_e \ge 0 \}$) edges, so that the energy function becomes
(A.23)   $\min_{y \in Y_{\mathrm{MWC}}} \; \sum_{e \in E^+} w_e \, y_e + \sum_{e \in E^-} w_e \, y_e + \alpha \sum_{e \in E_T} w_e \, y_e$
For $\alpha = 1$ this ILP corresponds to the Symmetric Multiway Cut. Subtracting the constant sum of all positive edge weights and using $w_e = -\lvert w_e \rvert$ for $e \in E^-$ yields
(A.24)   $\min_{y \in Y_{\mathrm{MWC}}} \; -\left( \sum_{e \in E^+} \lvert w_e \rvert \left( 1 - y_e \right) + \sum_{e \in E^-} \lvert w_e \rvert \, y_e + \alpha \sum_{e \in E_T} \lvert w_e \rvert \left( 1 - y_e \right) \right)$
Finally, by substituting
(A.25)   $\tilde{y}_e := \begin{cases} 1 - y_e & \text{if } e \in E^+ \cup E_T \\ y_e & \text{if } e \in E^- \end{cases}$
we obtain the equivalent objective
(A.26)   $\max_{\tilde{y} \in \tilde{Y}} \; \sum_{e \in E^+} \lvert w_e \rvert \, \tilde{y}_e + \sum_{e \in E^-} \lvert w_e \rvert \, \tilde{y}_e + \alpha \sum_{e \in E_T} \lvert w_e \rvert \, \tilde{y}_e$
Here, $\tilde{Y}$ is the polytope formed by eqs. 2–6. Since all weights are positive in the SMW graph, the absolute value is omitted in eq. 1.
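The transformation from (A.23) to (A.26) only subtracts a constant and flips signs, so for every fixed cut assignment the two objectives must agree up to that constant. A minimal numerical sanity check with hypothetical random weights (all names are ours, not from the paper):

```python
import random

random.seed(0)
# hypothetical signed edge weights
w_pos = [random.uniform(0, 1) for _ in range(4)]    # attractive: w >= 0
w_neg = [random.uniform(-1, 0) for _ in range(3)]   # repulsive:  w < 0
w_sem = [random.uniform(0, 1) for _ in range(2)]    # terminal edges
alpha = 1.0

# an arbitrary cut indicator assignment; the identity below holds for
# every assignment, feasible or not
y_pos = [random.randint(0, 1) for _ in w_pos]
y_neg = [random.randint(0, 1) for _ in w_neg]
y_sem = [random.randint(0, 1) for _ in w_sem]

# (A.23): signed minimization objective over cut edges
E_cut = (sum(w * y for w, y in zip(w_pos, y_pos))
         + sum(w * y for w, y in zip(w_neg, y_neg))
         + alpha * sum(w * y for w, y in zip(w_sem, y_sem)))

# constant sum of all positive edge weights
C = sum(w_pos) + alpha * sum(w_sem)

# (A.26): positive-weight maximization objective over kept edges
E_keep = (sum(abs(w) * (1 - y) for w, y in zip(w_pos, y_pos))
          + sum(abs(w) * y for w, y in zip(w_neg, y_neg))
          + alpha * sum(abs(w) * (1 - y) for w, y in zip(w_sem, y_sem)))

# minimizing E_cut is therefore equivalent to maximizing E_keep
assert abs((E_cut - C) + E_keep) < 1e-12
```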
Appendix B Mutex and Label Constraints
We will now formally show that the constraints in eqs. 14 and 15 are necessary for the Symmetric Multiway Cut constraints eqs. A.19–A.22, i.e. that every solution satisfying eqs. A.19–A.22 also satisfies them. Wolf et al. [41] show that eq. 14 is necessary for the multicut constraints eq. A.19. Therefore, it remains to show that
(A.27)   $\text{(A.20)} \wedge \text{(A.21)} \wedge \text{(A.22)} \;\Rightarrow\; \forall t, t' \in T, \, t \ne t': \text{ there is no path of non-cut edges from } t \text{ to } t'$
The right-hand side is the label constraint, which we now show to be implied by the constraints eqs. A.20–A.22 (here on the left).
First we show, by contradiction and using eq. A.20, that an internal node $v$ can only be connected to a single terminal $t$: assume that there is a second terminal $t' \ne t$ which is also connected to $v$; then we have $y_{(v,t)} = 0$ and $y_{(v,t')} = 0$. Now rewrite eq. A.20 and insert these two variables, so that we get the contradiction
(A.28)   $1 = \sum_{t'' \in T} \left( 1 - y_{(v,t'')} \right) \ge \left( 1 - y_{(v,t)} \right) + \left( 1 - y_{(v,t')} \right) = 2$
We further show that two connected nodes $u$ and $v$ are always connected to the same terminal node $t$. Without loss of generality we assume that $u$ and $v$ are connected and that $u$ and $t$ are connected, i.e. there is a path $P \subseteq E$ from $u$ to $v$ with $y_e = 0$ for all $e \in P$, and $y_{(u,t)} = 0$. Then eqs. A.21 and A.22 give us
(A.29)   $y_{(v,t)} \le \sum_{e \in P} y_e + y_{(u,t)} = 0 \;\Rightarrow\; y_{(v,t)} = 0$
Finally, we can prove eq. A.27: any path starting from a terminal node $t$ begins with an edge to some internal node $v$; all nodes connected to $v$ (and $v$ itself) are connected to $t$ and to no other terminal node. Therefore, there cannot be any path of non-cut edges from $t$ to another terminal node $t' \ne t$, which establishes the label constraint.
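The implication (A.27) can also be verified exhaustively on a small example. The sketch below (plain Python; the encoding follows the constraint descriptions above, and all names are ours) enumerates every cut assignment on a three-node path graph with two terminals and checks that each feasible assignment labels all nodes joined by an uncut path with the same terminal:

```python
from itertools import product

V = [0, 1, 2]                     # internal nodes on a path 0-1-2
E = [(0, 1), (1, 2)]              # internal edges
T = ["A", "B"]                    # terminal nodes
TERM_EDGES = [(v, t) for v in V for t in T]
# all simple internal paths between node pairs: (u, v, edges on path)
PATHS = [(0, 1, [(0, 1)]), (1, 2, [(1, 2)]), (0, 2, [(0, 1), (1, 2)])]

def feasible(ye, yt):
    # (A.20): each internal node keeps exactly one terminal edge
    if any(sum(1 - yt[(v, t)] for t in T) != 1 for v in V):
        return False
    # (A.21)/(A.22): cycle constraints through a single terminal node
    for u, v, path in PATHS:
        cut = sum(ye[e] for e in path)
        for t in T:
            if yt[(u, t)] > cut + yt[(v, t)] or yt[(v, t)] > cut + yt[(u, t)]:
                return False
    return True

def label(yt, v):
    # unique by (A.20)
    return next(t for t in T if yt[(v, t)] == 0)

checked = 0
for bits in product([0, 1], repeat=len(E) + len(TERM_EDGES)):
    ye = dict(zip(E, bits[:len(E)]))
    yt = dict(zip(TERM_EDGES, bits[len(E):]))
    if not feasible(ye, yt):
        continue
    checked += 1
    # label constraint: an uncut path implies identical labels, hence
    # no non-cut connection between two distinct terminal nodes
    for u, v, path in PATHS:
        if all(ye[e] == 0 for e in path):
            assert label(yt, u) == label(yt, v)
print(checked, "feasible assignments satisfy the label constraint")
```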
Appendix C Additional Details of the Cityscapes Experiments
C.1 Implementation Details
We use the class probabilities from a DeepLabv3+ [8] as semantic edge weights, based on a trained model provided by TensorFlow. We employ the Mask R-CNN [15] implementation provided by [32] and trained a model on Cityscapes following the training configuration of [15]. The graph weights are derived as explained above. We derive graph weights for different offsets: for attractive edges we use (1) an 8-neighbourhood with distances of {1, 2, 4} pixels and (2) random pairs inside each bounding box. For repulsive edges we sample 5 random pixel pairs for each mask and compute the soft IoU (eq. 17). [29] trained a DeepLabv3+ to predict affinities for their graph-clustering algorithm. They kindly provided their trained models, allowing us to use the same affinities. Since their clustering utilizes a threshold, we treat the threshold as the splitting point between attractive and repulsive edge weights; affinities below the threshold are inverted and scaled to [0, 1]. In addition to the model by GMIS, which is trained on scaled bounding boxes, we train a DeepLabv3+ for affinity prediction on the full images. Because [29] only tackle instance segmentation, their model does not predict affinities for stuff classes. We train the network with the Sørensen-Dice loss and the same stencil pattern as [29]. The training protocol follows the settings in [8], using a batch size of 12 and 70k training iterations. We do not employ any test-time augmentations.

C.2 Additional Images


