
The Semantic Mutex Watershed for Efficient Bottom-Up Semantic Instance Segmentation

12/29/2019
by   Steffen Wolf, et al.

Semantic instance segmentation is the task of simultaneously partitioning an image into distinct segments while associating each pixel with a class label. In commonly used pipelines, segmentation and label assignment are solved separately since joint optimization is computationally expensive. We propose a greedy algorithm for joint graph partitioning and labeling derived from the efficient Mutex Watershed partitioning algorithm. It optimizes an objective function closely related to the Symmetric Multiway Cut objective and empirically shows efficient scaling behavior. Due to the algorithm's efficiency it can operate directly on pixels without prior over-segmentation of the image into superpixels. We evaluate the performance on the Cityscapes dataset (2D urban scenes) and on a 3D microscopy volume. In urban scenes, the proposed algorithm combined with current deep neural networks outperforms the strong baseline of 'Panoptic Feature Pyramid Networks' by Kirillov et al. (2019). In the 3D electron microscopy images, we show explicitly that our joint formulation outperforms a separate optimization of the partitioning and labeling problems.



1 Introduction

Image segmentation literature distinguishes between semantic segmentation, i.e. associating each pixel with a class label, and instance segmentation, i.e. detecting and segmenting individual objects while ignoring the background. The joint task of simultaneously assigning a class label to each pixel and grouping pixels into instances has been addressed under different names, including semantic instance segmentation, scene parsing [39], image parsing [40], holistic scene understanding [43] or instance-separating semantic segmentation [28]. Recently, a new metric and evaluation approach to such problems has been introduced under the name of panoptic segmentation [19].

From a graph theory perspective, semantic instance segmentation corresponds to the simultaneous partitioning and labeling of a graph. Most greedy graph partitioning algorithms are defined on graphs encoding attractive interactions only. Clusters are then formed through agglomeration or division until a user-defined termination criterion is met (often a threshold or a desired number of clusters). These algorithms perform pure instance segmentation. The semantic labels for the segmented instances need to be generated independently.

If repulsive as well as attractive forces are defined between the nodes of the graph, partitioning can be formulated as a Multicut problem [2]. In this formulation, clusters emerge naturally without the need for a termination criterion. Furthermore, the Multicut problem can be extended to include the labeling of the graph, delivering a semantic instance segmentation from a joint optimization of partitioning and labeling [24]. The main drawback of this formulation is that the Multicut problem is NP-hard.

We propose to solve the joint partitioning and labeling problem by an efficient algorithm which we term Semantic Mutex Watershed (SMW), inspired by the Mutex Watershed [41]. In more detail, in this contribution we:


  • propose a fast algorithm for joint graph partitioning and labeling

  • prove that the algorithm minimizes (exactly) an objective function closely related to the Symmetric Multiway Cut objective

  • demonstrate competitive performance on natural and biological images.

2 Related Work

Semantic segmentation.

State-of-the-art semantic segmentation algorithms are based on convolutional neural networks (CNNs) which are trained end-to-end. The networks commonly follow the design principles of image classification networks (e.g. [16, 38, 23]), replacing the fully connected layers at the end with convolutional layers to form a fully convolutional network [30]. This architecture can be further extended to include encoder-decoder paths [37], dilated or atrous convolutions [45, 6] and pyramid pooling modules [7, 46].

Instance segmentation.

Many instance segmentation methods use a detection or a region proposal framework as their basis; object segmentation masks are then predicted inside region proposals. A cascade of multiple networks is employed by [12], each solving a specific subtask to find the instance labeling. Mask-RCNN [15] builds on the bounding box prediction capabilities of Faster-RCNN [36] to simultaneously produce masks and class predictions. An extension of this method with an additional semantic segmentation branch has been proposed in [18] as a single network for semantic instance segmentation.

In contrast to the region-based methods, proposal-free algorithms often start with a pixel-wise representation which is then clustered into instances [44, 21, 13]. Alternatively, the distance transform of instance masks can be predicted and clustered by thresholding [4].

Graph-based segmentation.

Graph-based methods, used independently or in combination with machine learning on pixels, form another popular basis for image segmentation algorithms [14]. In this case, the graph is built from pixels or superpixels of the image and the instance segmentation problem is formulated as graph partitioning. When the number of instances is not known in advance and repulsive interactions are present between the graph nodes, graph partitioning can in turn be formulated as a Multicut or correlation clustering problem [2]. This NP-hard problem can be solved reasonably fast for small problem sizes with integer linear programming solvers [1] or approximate algorithms [34, 5]. A modified Multicut objective is introduced by [41] together with the Mutex Watershed, an efficient clustering algorithm for its optimization.

The Multicut objective can be extended to solve a joint graph partitioning and labeling problem [17, 24] for simultaneous instance and semantic segmentation. In practice, the computational complexity of the joint problem only allows for approximate solutions  [28], possibly combined with reducing the problem size by over-segmentation into superpixels. This formulation has been applied to natural images by [20] and to biological images by [22].

Similar to the semantic segmentation use case, CNNs can be used to predict pixel and superpixel affinities which serve as edge weights in the graph partitioning problem  [27, 31, 29].

3 The Semantic Mutex Watershed

The centerpiece of this paper, the Semantic Mutex Watershed algorithm, solves the semantic instance segmentation problem by jointly finding a graph partitioning and labeling. In this section we present the graph-based formulation of the semantic instance segmentation problem and define an objective function related to the Symmetric Multiway Cut problem [24]. Then we introduce the Semantic Mutex Watershed algorithm and prove that it can optimize this objective efficiently. Finally we show that the proposed objective constitutes a generalization of the Mutex Watershed Objective introduced in [41].

3.1 Joint Partitioning and Labeling of Graphs

Similar to instance segmentation algorithms, we build a graph of image pixels (voxels) or superpixels and formulate the semantic instance segmentation problem as joint partitioning and labeling of the graph.

Weighted graph with terminal nodes.

For an undirected weighted graph $\mathcal{G}(V, E, w)$ we refer to the nodes $V$ as internal nodes and the edges $E$ as internal edges. We differentiate between attractive edges $E^+$ and repulsive edges $E^-$ that make up the internal edges $E = E^+ \cup E^-$. Each edge $e$ is associated with a real-valued positive weight $w_e \in \mathbb{R}^+$. The weights encode the attraction and repulsion between the incident nodes of each edge. A large attractive weight $w_{ij}$, $(i, j) \in E^+$, encodes a high tendency for the nodes $i$ and $j$ to belong to the same partition element. Conversely, a large repulsive weight $w_{ij}$, $(i, j) \in E^-$, indicates a strong inclination of $i$ and $j$ to belong to separate clusters.

Semantic instance segmentation can be achieved by clustering the internal nodes and assigning a semantic label to each cluster. We extend $\mathcal{G}$ by terminal nodes $T = \{t_1, \dots, t_K\}$ where each $t_k$ is associated with a label $k$. Every internal node $i \in V$ is connected to every $t_k \in T$ by a weighted semantic edge $(i, t_k) \in E^S$. Here, a large semantic weight $w_{it_k}$ implies a strong association of internal node $i$ with the label $k$ of the terminal node $t_k$. The extended graph thus becomes $\mathcal{G}^T(V^T, E^T, w)$ with $V^T = V \cup T$ and $E^T = E \cup E^S$, where $E^S = V \times T$. Figure 1(a) shows a simple example of such an extended graph.

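Input: weighted graph G^T(V ∪ T, E⁺ ∪ E⁻ ∪ E^S, w)
Output: clusters and labeling defined by the active set A
Initialization: A ← ∅
for e = (i, j) ∈ E^T in descending order of w_e do
    if e ∈ E⁺ then
        if not mutex(i, j) and (class(i) = ∅ or class(j) = ∅ or class(i) = class(j)) then
            merge(i, j): A ← A ∪ {e}
    else if e ∈ E⁻ then
        if not connected(i, j) then
            addMutex(i, j): A ← A ∪ {e}
    else if e ∈ E^S (with j = t_k) then
        if class(i) = ∅ or class(i) = k then
            assignLabel(i, t_k): A ← A ∪ {e}
return A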
Algorithm 1: The Semantic Mutex Watershed algorithm. Compared to the Mutex Watershed, the additions are the label check for attractive edges and the branch for semantic edges.


Figure 1: (a) Example of an extended graph. Nodes on top are terminal nodes with each color representing a label class. The associated semantic edges are colored correspondingly. The internal nodes are on the bottom with attractive (green) and repulsive (red) edges between them. (b) Semantic instance segmentation. Edges that are part of the active set are shown in bold. Note that two adjacent nodes with the same label are not necessarily clustered together.

Symmetric Multiway Cut.

In the Symmetric Multiway Cut an optimal semantic instance segmentation of such a graph is formulated as a constrained energy minimization/integer linear program (ILP) [24]:

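$$\min_{a \in \{0,1\}^{|E^T|}} \; -\sum_{e \in E^T} a_e\, w_e^p \tag{1}$$

$$\text{s.t.} \quad 1 - a_e \le \sum_{e' \in (C \cap E^+) \setminus \{e\}} (1 - a_{e'}) + \sum_{e' \in C \cap E^-} a_{e'} \qquad \forall C \in \mathcal{C}(\mathcal{G}),\ \forall e \in C \cap E^+ \tag{2}$$

$$a_e \le \sum_{e' \in C \cap E^+} (1 - a_{e'}) + \sum_{e' \in (C \cap E^-) \setminus \{e\}} a_{e'} \qquad \forall C \in \mathcal{C}(\mathcal{G}),\ \forall e \in C \cap E^- \tag{3}$$

$$\sum_{t \in T} a_{it} = 1 \qquad \forall i \in V \tag{4}$$

$$|a_{it} - a_{jt}| \le 1 - a_{ij} \qquad \forall (i, j) \in E^+,\ \forall t \in T \tag{5}$$

$$|a_{it} - a_{jt}| \le a_{ij} \qquad \forall (i, j) \in E^-,\ \forall t \in T \tag{6}$$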

where $\mathcal{C}(\mathcal{G})$ denotes the set of cycles of the internal graph $\mathcal{G}$, $a_e = 1$ marks edge $e$ as active (an attractive edge is kept, a repulsive edge separates its nodes, a semantic edge assigns a label) and $p > 0$ is a power applied to the edge weights. The segmentation consistency is ensured by the cycle inequalities (2) and (3). Equation 4 enforces that every node is uniquely assigned to a terminal node. Equations 5 and 6 ensure consistency between the partitioning and the semantic labeling. A detailed discussion of the objective's properties follows in section 3.3, and the relation to the objective in [24] is derived in appendix A. Although this is in general a hard optimization problem, we will show that for sufficiently large $p$ this objective can be solved exactly and efficiently by the algorithm introduced in the next section.

3.2 The Semantic Mutex Watershed Algorithm.

We now introduce a simple algorithm that greedily constructs a solution to eq. 1. Although this will most likely not be an optimal solution to the NP-hard Symmetric Multiway Cut in general, we will show in section 3.3 that it becomes optimal when $p$ is large.

The clustering and label assignment of $\mathcal{G}^T$ is described by a set of active edges $A \subseteq E^T$ chosen by the algorithm: $A = A^+ \cup A^- \cup A^S$, where $A^+ = A \cap E^+$, $A^- = A \cap E^-$ and $A^S = A \cap E^S$ encode merges, mutual exclusions and label assignments, respectively. In order to restrict $A$ to a consistent partitioning and labeling, we make the following definitions:

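We define two internal nodes $i, j \in V$ as connected if they are connected by a path of active attractive edges, i.e.

$$\text{connected}(i, j) \;\Leftrightarrow\; \exists\, \pi(i, j) \subseteq A \cap E^+ \tag{7}$$

Here $\pi(i, j)$ denotes a path from node $i$ to node $j$. We also define the mutual exclusion between two nodes as

$$\text{mutex}(i, j) \;\Leftrightarrow\; \exists\, \pi(i, j) \subseteq A \cap E : |\pi(i, j) \cap E^-| = 1 \tag{8}$$

Two nodes are thus mutually exclusive if they are connected by a path of active edges from $i$ to $j$ with exactly one repulsive edge. Furthermore, label $k$ is assigned to a node $i$ if this node is connected to the corresponding terminal node $t_k$ by active attractive and semantic edges:

$$\text{class}(i) = k \;\Leftrightarrow\; \exists\, \pi(i, t_k) \subseteq A \cap (E^+ \cup E^S) \tag{9}$$

For unlabeled nodes we use the notation $\text{class}(i) = \emptyset$.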

Algorithm.

The Semantic Mutex Watershed algorithm is an extension of the Mutex Watershed algorithm introduced by [42]. It augments the partitioning of the latter with a consistent labeling. The algorithm is shown in algorithm 1 with the additions to [42] highlighted. In the following we explain the syntax and procedure of the shown pseudocode.

All edges are sorted in descending order of their weight and put in a priority queue. While traversing the queue, the decision to add an edge to the active set $A$ depends on the type of edge:

Attractive edges: The edge is added if the incident nodes are not mutually exclusive and not labeled differently.

Repulsive edges: The edge is added if the incident nodes are not connected.

Semantic edges: The edge is added if the node is either unlabeled or already has the same label as the edge's terminal node.

Following these rules, the attractive edges in the final active set form clusters in the graph, each of which is connected to a single terminal node that indicates its label. Figure 1(b) shows a simple example of such an active set.

Efficient Implementation with Maximum-Spanning-Trees.

The SMW is similar to Kruskal's efficient maximum spanning tree algorithm [25] and can feasibly be applied to pixel graphs of large images and even image volumes. Our implementation uses an efficient union-find data structure; mutex relations are stored and queried through a hash table.
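As a rough illustration of how union-find and hash-set bookkeeping can realize Algorithm 1, the following Python sketch processes an edge list in descending weight order. It is a minimal sketch, not the authors' implementation: the edge encoding `(weight, kind, i, j)` with kinds `"att"`, `"rep"` and `"sem"` (where `j` is a label index for semantic edges) and the function name `semantic_mutex_watershed` are assumptions made for this example. Python sets play the role of the hash table for mutex relations.

```python
def semantic_mutex_watershed(num_nodes, edges):
    """Greedy joint partitioning and labeling in the spirit of Algorithm 1.

    edges: iterable of (weight, kind, i, j) with kind in {"att", "rep", "sem"};
    for "sem" edges, j is a label index (the terminal node t_j).
    Returns (cluster root per node, label per node or None, active edges).
    """
    parent = list(range(num_nodes))
    size = [1] * num_nodes
    mutex = {i: set() for i in range(num_nodes)}  # mutex sets keyed by cluster root
    label = [None] * num_nodes  # label of a cluster, stored at its root
    active = []

    def find(i):
        # union-find lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def merge(ri, rj):
        # union by size; the surviving root inherits mutexes and label
        if size[ri] < size[rj]:
            ri, rj = rj, ri
        parent[rj] = ri
        size[ri] += size[rj]
        for m in mutex.pop(rj):
            mutex[m].discard(rj)
            mutex[m].add(ri)
            mutex[ri].add(m)
        if label[ri] is None:
            label[ri] = label[rj]

    for edge in sorted(edges, key=lambda e: e[0], reverse=True):
        _, kind, i, j = edge
        ri = find(i)
        if kind == "att":
            rj = find(j)
            li, lj = label[ri], label[rj]
            labels_clash = li is not None and lj is not None and li != lj
            if rj not in mutex[ri] and not labels_clash:
                if ri != rj:
                    merge(ri, rj)
                active.append(edge)
        elif kind == "rep":
            rj = find(j)
            if ri != rj:  # only clusters that are not connected can repel
                mutex[ri].add(rj)
                mutex[rj].add(ri)
                active.append(edge)
        elif kind == "sem":
            if label[ri] is None or label[ri] == j:
                label[ri] = j
                active.append(edge)

    roots = [find(i) for i in range(num_nodes)]
    return roots, [label[r] for r in roots], active


# Toy graph: 3 internal nodes, 2 labels.
edges = [
    (0.9, "att", 0, 1),
    (0.8, "rep", 1, 2),
    (0.7, "att", 0, 2),  # rejected: 0 and 2 are mutually exclusive
    (0.6, "sem", 0, 0),
    (0.5, "sem", 2, 0),
    (0.4, "sem", 1, 1),  # rejected: cluster {0, 1} already has label 0
    (0.3, "sem", 2, 1),  # rejected: node 2 already has label 0
]
roots, labels, active = semantic_mutex_watershed(3, edges)
# roots == [0, 0, 2], labels == [0, 0, 0]
```

In this toy graph nodes 0 and 2 share an attractive edge and end up with the same label, yet stay in separate instances because of the mutex between their clusters, mirroring the remark in the caption of fig. 1. Sorting dominates the cost, so the sketch runs in roughly $O(|E| \log |E|)$ time.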

Mutex Watershed as Special Case.

The Mutex Watershed algorithm is embedded in the Semantic Mutex Watershed as the special case with zero or one label ($|T| \le 1$).

3.3 The Semantic Mutex Watershed Objective

In this section we prove that the Semantic Mutex Watershed algorithm solves the ILP objective in eq. 1 for sufficiently large (dominant) powers $p$. To this end, we extend the proof of [41] by semantic edges. First, we review the definitions of dominant powers and mutex constraints. Second, we introduce an additional set of constraints acting on semantic edges and use it to define the Semantic Mutex Watershed objective as a relaxed version of the Symmetric Multiway Cut. Finally, we prove that the solution found by the SMW is indeed optimal.

Dominant Power.

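Let $\mathcal{G}(V, E, w)$ be an edge-weighted graph with unique weights $w_e \in \mathbb{R}^+$. We call $p \in \mathbb{R}^+$ a dominant power if:

$$w_e^p > \sum_{e' \in E,\ w_{e'} < w_e} w_{e'}^p \qquad \forall e \in E \tag{10}$$

Note that a dominant power exists for any finite set of edges: for any $e$ we can divide (10) by $w_e^p$ and observe that the normalized weights $(w_{e'}/w_e)^p < 1$ (and any finite sum of these weights) converge to 0 as $p$ tends to infinity.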
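Condition (10) can be checked directly by sorting the weights and comparing each power against the running sum of all smaller powers. The helper names below are hypothetical, and the search only tries integer powers, which is a simplifying assumption.

```python
def is_dominant_power(weights, p):
    """Check condition (10): every w**p exceeds the sum of all smaller w'**p."""
    smaller_sum = 0.0
    for w in sorted(weights):
        wp = w ** p
        if wp <= smaller_sum:  # also catches duplicate weights
            return False
        smaller_sum += wp
    return True


def smallest_dominant_power(weights, p_max=1000):
    """Smallest integer p in [1, p_max] that is dominant, or None."""
    for p in range(1, p_max + 1):
        if is_dominant_power(weights, p):
            return p
    return None


print(smallest_dominant_power([0.5, 0.6, 0.9, 1.0]))  # 4
```

For $p = 3$ the check fails because $0.5^3 + 0.6^3 + 0.9^3 = 1.07 > 1$, while for $p = 4$ the smaller powers sum to $0.8482 < 1$. With weights close to zero and very large $p$, floating-point underflow can make the check fail spuriously; comparing in log space would be more robust.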

Semantic Mutex Watershed Constraints.

To formalize the rules of the algorithm defined above, we first define special subsets of the active set $A$. First, the set of all cycles of active internal edges containing exactly one repulsive edge is defined as

$$\mathcal{C}_1^-(A) := \{ C \in \mathcal{C}(A \cap E) \mid |C \cap E^-| = 1 \} \tag{11}$$

We define the mutex constraint as requiring $\mathcal{C}_1^-(A) = \emptyset$, which is exactly the rule that two mutually exclusive nodes must not be connected.

Furthermore, we define the set of all paths that connect two distinct terminal nodes through active attractive and semantic edges:

$$\mathcal{P}^T(A) := \{ \pi(t, t') \subseteq A \cap (E^+ \cup E^S) \mid t, t' \in T,\ t \neq t' \} \tag{12}$$

The algorithm must never connect two terminal nodes through such a path, thus we define the label constraint $\mathcal{P}^T(A) = \emptyset$. This ensures consistency between the partitioning and the labeling. The mutex and label constraints are necessary but not sufficient to fulfill the linear constraints in eqs. 2 to 6 (the formal derivation can be found in appendix B).
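Both constraints can be checked on a candidate active set with one union-find pass over the active attractive edges. A cycle of internal edges with exactly one repulsive edge exists if and only if an active repulsive edge joins two nodes of the same attractive component, and a path between two distinct terminals exists if and only if some component holds active semantic edges to two different terminals. The sketch below assumes a hypothetical `(weight, kind, i, j)` edge encoding, where `j` is a label index for semantic edges; `check_smw_constraints` is an illustrative name.

```python
def check_smw_constraints(num_nodes, active):
    """Return (mutex_ok, label_ok) for an active edge set.

    active: iterable of (weight, kind, i, j), kind in {"att", "rep", "sem"};
    for "sem" edges, j is the label index of the terminal node.
    """
    parent = list(range(num_nodes))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    active = list(active)
    # attractive components of the active set
    for _, kind, i, j in active:
        if kind == "att":
            parent[find(i)] = find(j)
    # mutex constraint (11): no active repulsive edge inside an attractive component
    mutex_ok = all(find(i) != find(j) for _, kind, i, j in active if kind == "rep")
    # label constraint (12): every component touches at most one terminal
    terminal_of = {}
    label_ok = True
    for _, kind, i, t in active:
        if kind == "sem" and terminal_of.setdefault(find(i), t) != t:
            label_ok = False
    return mutex_ok, label_ok


base = [(0.9, "att", 0, 1), (0.8, "rep", 1, 2), (0.6, "sem", 0, 0), (0.5, "sem", 2, 0)]
print(check_smw_constraints(3, base))                          # (True, True)
print(check_smw_constraints(3, base + [(0.7, "att", 0, 2)]))   # (False, True)
```

Semantic edges are deliberately excluded from the cycle check, so two mutually exclusive nodes may still carry the same label.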

Lemma 3.1 (Optimality of the Semantic Mutex Watershed).

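Let $\mathcal{G}(V, E, w)$ be an edge-weighted graph extended by terminal nodes $T$ to $\mathcal{G}^T(V^T, E^T, w)$, with unique weights $w_e \in \mathbb{R}^+$ and $p$ a dominant power. Then the Semantic Mutex Watershed Algorithm 1 finds the optimal solution to the integer linear program

$$\min_{a \in \{0,1\}^{|E^T|}} \; -\sum_{e \in E^T} a_e\, w_e^p \tag{13}$$

$$\text{s.t.} \quad \mathcal{C}_1^-(A) = \emptyset, \tag{14}$$

$$\mathcal{P}^T(A) = \emptyset, \tag{15}$$

$$\text{with} \quad A := \{ e \in E^T \mid a_e = 1 \}. \tag{16}$$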
Proof.

[41] show that for $T = \emptyset$ the SMW (which then reduces to the Mutex Watershed) finds the optimal solution because it has the greedy-choice and optimal-substructure properties. Their proof of optimal substructure does not rely on the specific constraints in the ILP. Thus it also applies with the additional constraint in eq. 15, giving the ILP in eqs. 13 to 16 optimal substructure.

In every iteration the SMW adds the feasible edge $e$ with the largest remaining weight to the active set. Due to the dominant power, its energy contribution $w_e^p$ is larger than that of any combination of edges $e'$ with $w_{e'} < w_e$. Thus, the SMW has the greedy-choice property [11]. It follows by induction that the SMW algorithm finds the globally optimal solution to the SMW objective.

Finally, we observe that the SMW algorithm always yields a consistent graph partitioning and labeling which fulfills the Symmetric Multiway Cut constraints. Thus, the Semantic Mutex Watershed algorithm returns an optimal solution of eqs. 1 to 6 if $p$ is set to a dominant power. In particular, if $p = 1$ is dominant, then the SMW solution is also an optimal solution to the Symmetric Multiway Cut.

4 Experiments

Figure 2: Semantic instance segmentation. First three columns: results on Cityscapes using semantic unaries (Deeplab 3+ network) and affinities derived from Mask-RCNN foreground probability. Colors indicate predicted semantic classes with variations for separate instances. Rightmost column: results for the sponge dataset. Cell bodies are colored in blue, microvilli in green and flagella in red.

We now demonstrate how to apply the SMW algorithm to semantic instance segmentation of 2D and 3D images. We start by showing how existing CNNs can be used to estimate graph weights and compare different sources of edge weights on the Cityscapes dataset. Additionally, we apply the SMW algorithm to a 3D electron microscopy volume and demonstrate its efficiency and scalability.

4.1 Affinity Generation with Neural Networks

The only input to the SMW are the graph weights; it does not require any hyperparameters such as thresholds. Consequently, its segmentation quality relies on good estimates of the graph weights $w$. In this section we present how state-of-the-art CNNs can be used as sources for these weights.

Affinity Learning.

Affinities are commonly used in instance segmentation; many modern algorithms train CNNs to directly predict pixel affinities. A universal approach is to employ a stencil pattern that describes for each pixel which neighbours to consider for the affinity computation. Regularly spaced, multi-scale stencil patterns are widely used for natural images [31, 29] and bio-medical data [42, 27].

The predicted affinities $a_{ij}$ usually lie in the interval $[0, 1]$ and can be interpreted as pseudo-probabilities. We use these affinities directly as weights for the attractive edges and invert them, $w_{ij} = 1 - a_{ij}$, to obtain the repulsive edge weights.
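A minimal NumPy sketch of turning stencil-based affinity maps into SMW edges: each affinity becomes an attractive weight and its inverse `1 - a` a repulsive weight for the same pixel pair. The array layout `(n_offsets, H, W)`, the pixel indexing `y * W + x`, the `(weight, kind, u, v)` tuples and the choice to emit both edge types for every offset are assumptions for illustration; which offsets contribute attractive or repulsive edges is a design decision the text does not fix.

```python
import numpy as np


def affinities_to_edges(affinities, offsets):
    """Turn affinity maps into SMW edges (weight, kind, u, v).

    affinities[k, y, x] is the predicted affinity between pixel (y, x) and
    pixel (y + dy, x + dx) for offsets[k] = (dy, dx); values lie in [0, 1].
    Pixels are indexed as u = y * W + x.
    """
    _, H, W = affinities.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    edges = []
    for k, (dy, dx) in enumerate(offsets):
        y2, x2 = ys + dy, xs + dx
        valid = (y2 >= 0) & (y2 < H) & (x2 >= 0) & (x2 < W)  # neighbour inside image
        u = (ys * W + xs)[valid]
        v = (y2 * W + x2)[valid]
        a = affinities[k][valid]
        for ui, vi, ai in zip(u.tolist(), v.tolist(), a.tolist()):
            edges.append((ai, "att", ui, vi))        # affinity as attractive weight
            edges.append((1.0 - ai, "rep", ui, vi))  # inverted affinity as repulsive weight
    return edges


edges = affinities_to_edges(np.array([[[0.8, 0.3]]]), [(0, 1)])
# one valid pixel pair (0, 1): an attractive edge of weight 0.8 and a repulsive edge of weight ~0.2
```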

Mask-RCNN produces overlapping masks that have to be resolved for a consistent panoptic segmentation. We achieve this with the SMW by deriving affinities from the foreground probabilities of each mask. A straightforward approach is to compute the (attractive) affinity of two pixels $i$ and $j$ as their joint foreground probability under mask $m$, weighted by the classification score $s_m$: $w_{ij} = s_m\, p_m(i)\, p_m(j)$.

We find that sparse repulsive edges work well in practice, as they lead to faster inference and reduced over-segmentation at instance boundaries. For this reason, for every pair of masks we sample random pixel pairs and add (repulsive) edges with weight proportional to a soft intersection over union of the two masks $m_a$ and $m_b$:

(17)
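The attractive Mask-RCNN weights described above can be sketched as follows, assuming that the joint foreground probability of two pixels is the product of their individual probabilities (an independence assumption) and that pixel pairs are drawn uniformly inside the mask's bounding box. `mask_affinities` and `sample_pairs_in_box` are hypothetical helpers; the repulsive soft-IoU edges of eq. 17 are not covered.

```python
import numpy as np


def mask_affinities(prob, score, pairs):
    """Attractive weights s_m * p_m(i) * p_m(j) for pixel pairs of one mask.

    prob: (H, W) foreground probability of mask m; score: classification score s_m;
    pairs: int array (N, 4) of pixel pairs (y_i, x_i, y_j, x_j).
    """
    pairs = np.asarray(pairs)
    p_i = prob[pairs[:, 0], pairs[:, 1]]
    p_j = prob[pairs[:, 2], pairs[:, 3]]
    return score * p_i * p_j


def sample_pairs_in_box(box, n, rng):
    """Draw n random pixel pairs inside box = (top, left, bottom, right), end-exclusive."""
    top, left, bottom, right = box
    ys = rng.integers(top, bottom, size=(n, 2))
    xs = rng.integers(left, right, size=(n, 2))
    return np.stack([ys[:, 0], xs[:, 0], ys[:, 1], xs[:, 1]], axis=1)


prob = np.array([[0.5, 1.0], [0.0, 0.8]])
w = mask_affinities(prob, 0.9, [[0, 0, 0, 1], [0, 1, 1, 1], [0, 0, 1, 0]])
# w is approximately [0.45, 0.72, 0.0]
```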

Semantic Segmentation CNNs.

State-of-the-art CNNs [8, 46] achieve high-quality results on semantic segmentation tasks. The output of the final softmax layer of these networks can be interpreted as the normalized probability $p_k(i)$ of each pixel $i$ belonging to each class $k$. Thus, we can use these predictions directly as semantic weights $w_{it_k} = p_k(i)$.

Additionally, we derive affinities from the stuff class probabilities: we treat each stuff class $k$ separately and again compute the affinity of two pixels $i$ and $j$ as their joint probability of belonging to stuff class $k$, i.e. $w_{ij} = p_k(i)\, p_k(j)$. This cannot be done for thing classes since they can have multiple instances.
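Under the same reading, semantic edges and stuff affinities can be derived from a `(C, H, W)` array of softmax probabilities, as sketched below. The helper names, the `(weight, "sem", pixel, class)` tuples, the NaN convention for pairs leaving the image and the `(dy, dx)` offset format are assumptions of this example.

```python
import numpy as np


def semantic_edges(probs):
    """One semantic edge (p_k(i), "sem", i, k) per pixel i and class k; probs: (C, H, W)."""
    C = probs.shape[0]
    flat = probs.reshape(C, -1)  # row-major, so pixel index i = y * W + x
    return [(float(flat[k, i]), "sem", i, k) for k in range(C) for i in range(flat.shape[1])]


def stuff_affinities(probs, stuff_classes, offset):
    """Affinity p_k(i) * p_k(j) between pixel i and j = i + offset, per stuff class k.

    Returns a dict class -> (H, W) array; entries whose neighbour leaves the image are NaN.
    """
    _, H, W = probs.shape
    dy, dx = offset
    src = (slice(max(0, -dy), min(H, H - dy)), slice(max(0, -dx), min(W, W - dx)))
    dst = (slice(max(0, dy), min(H, H + dy)), slice(max(0, dx), min(W, W + dx)))
    out = {}
    for k in stuff_classes:
        aff = np.full((H, W), np.nan)
        aff[src] = probs[k][src] * probs[k][dst]  # joint probability of the pixel pair
        out[k] = aff
    return out


probs = np.array([[[0.9, 0.8]], [[0.1, 0.2]]])  # 2 classes, 1 x 2 image
aff = stuff_affinities(probs, [0], (0, 1))
# aff[0][0, 0] is approximately 0.72; aff[0][0, 1] is NaN
```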

4.2 Panoptic Segmentation on Cityscapes

We apply the SMW on the challenging task of panoptic segmentation on the Cityscapes dataset [10]. We illustrate how the different sources of affinities can be used and combined and show their different strengths and weaknesses.

Dataset.

The Cityscapes dataset consists of urban street scene images taken from a driver’s perspective. It has 5k densely annotated images separated into train (2975), val (500) and test (1525) set. Since there is no public evaluation server for panoptic segmentation on the test set, we report all results on the validation set. There are 19 classes with 11 stuff classes and 8 thing classes.

Implementation Details.

We employ and combine multiple sources of graph weights to build the SMW graph. We train a Deeplab 3+ [8] network for semantic edge weight and affinities prediction following [29]. We employ the Mask-RCNN [15] implementation provided by [32] and train a model on Cityscapes following [15]’s training configuration. Further implementation details can be found in section C.1.

Cityscapes: SMW with different combinations of attractive (att), repulsive (rep) and semantic (sem) edge sources from MRCNN [15] (att, rep), GMIS [29] (att, rep) and Deeplab [8] (att, rep, sem)

PQ    PQTh  PQSt
59.3  50.6  65.7
58.6  48.8  65.7
56.1  42.8  65.7
48.7  38.7  55.9
47.3  35.5  55.9
46.3  33.1  56.0

Table 1: Panoptic segmentation quality PQ of the SMW on top of diverse sources of graph weights.

Cityscapes   PQ    PQTh  PQSt
SMW          59.3  50.6  65.7
PFPN [18]    58.1  52.0  62.5
DIN [3]      53.8  42.5  62.1

Sponge       PQ    PQTh  PQSt
SMW          51.6  62.1  20.0
MWS-MAX      48.1  56.2  23.8
CC           43.4  55.6   6.7
CC           24.3  27.7  13.9

Table 2: Comparison to other segmentation strategies.

Study of Affinity Sources.

We evaluate the semantic instance segmentation performance of the SMW in terms of the panoptic quality metric using different combinations of the graph weight sources discussed above. In table 1 we compare the PQ metric on the Cityscapes dataset.

The best performance is achieved by combining Mask-RCNN affinities with Deeplab 3+ semantic predictions, which outperforms the strong baseline of [18] listed in table 2 (see also fig. 2 and the supplementary fig. 3). Visual inspection shows that Mask-RCNN affinities are more reliable in detecting small objects as well as in connecting fragmented instances. Note that PQ mostly measures detection quality, which is then weighted by the segmentation quality of the found instances; hence the detection strength of Mask-RCNN shines through.

We observe that using all sources together leads to a performance drop of 10 percentage points below the best result. We believe this is due to the greedy nature of the SMW which selects the strongest of all provided edges. This example demonstrates how important it is to carefully select/train the algorithm input.

4.3 Semantic Instance Segmentation of 3D EM Volumes

Semantic instance segmentation is an important task in bio-medical image analysis where classes naturally arise through cellular structure. We use a 3D EM image dataset to compare the SMW to algorithms that separately optimize instance segmentation and semantic class assignment.

Dataset.

The dataset consists of three FIBSEM volumes of a sponge choanocyte chamber. The data was acquired in [33] to investigate proto-neural cells in sponges using the segmentation approach introduced in [35]. These cells filter nutrients from water by creating a flow with the beating of a flagellum and absorbing the nutrients through microvilli that surround the flagellum in a collar [26] (see fig. 2). In order to investigate this process in detail, a precise semantic instance segmentation of the cell bodies, flagella and microvilli is needed.

Implementation Details.

We predict affinities with two separate 3D U-Nets [9] to derive graph edge weights and semantic class probabilities, respectively. We adopt the training procedure of [42], which uses the Dice coefficient as the loss function. We use two volumes for training and one for testing.

We also implement baseline approaches which start from the same network predictions but do not perform joint labeling and partitioning. First, we compare to instance segmentation with the Mutex Watershed, followed by assigning each instance the semantic label of its strongest semantic edge (MWS-MAX). In addition, we compute connected components (CC) of the semantic predictions and of the short-range affinities.
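The MWS-MAX baseline only needs an instance segmentation and the semantic weights. A minimal sketch, assuming that "strongest semantic edge" means the single largest class probability over all pixels of an instance (rather than, for example, an average); `mws_max_labels` is a hypothetical name and the instance segmentation is taken as given.

```python
import numpy as np


def mws_max_labels(segmentation, probs):
    """Assign each instance the class of its single strongest semantic edge.

    segmentation: (H, W) int instance ids (e.g. from the Mutex Watershed);
    probs: (C, H, W) semantic weights. Returns dict instance id -> class.
    """
    labels = {}
    for inst in np.unique(segmentation):
        mask = segmentation == inst
        inst_probs = probs[:, mask]  # (C, n_pixels) semantic edges of this instance
        k, _ = np.unravel_index(np.argmax(inst_probs), inst_probs.shape)
        labels[int(inst)] = int(k)
    return labels


seg = np.array([[0, 0, 1]])
probs = np.array([[[0.6, 0.4, 0.3]], [[0.4, 0.55, 0.7]]])
print(mws_max_labels(seg, probs))  # {0: 0, 1: 1}
```

Unlike the SMW, this decision is made after the partitioning is fixed, so a wrong merge cannot be undone by the semantic evidence.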

Results.

The PQ values in table 2 show that the SMW outperforms the baseline approaches that separately optimize instance segmentation and semantic class assignment in overall PQ and PQTh. An additional analysis can be found in appendix fig. 4, where we measure the runtimes for different volume sizes and observe almost linear scaling behavior.

5 Conclusion

We have introduced a new method for joint partitioning and labeling of weighted graphs as a generalization of the Mutex Watershed algorithm. We have shown that it optimally solves an objective function closely related to the objective of the Symmetric Multiway Cut problem. Our experiments demonstrate that the SMW, with graph edge weights predicted by convolutional neural networks, outperforms strong baselines on natural and biological images. Any improvement in CNN performance will translate directly into an improvement of the SMW results. However, we also observe that the extreme value selection used by the SMW to assign edges to the active set can lead to sub-optimal performance when diverse edge weight sources are combined. Empirically, the algorithm scales almost linearly with the number of graph edges, making it applicable to large images and volumes without prior over-segmentation into superpixels. The source code will be made available upon publication.

References

Appendix A Symmetric Multiway Cut in Related Literature

We now relate the Symmetric Multiway Cut definition in eqs. 1 to 6 to the objective given in [24]. In contrast to this work, Kroeger et al. [24] do not split the set of edges into attractive and repulsive edges. Instead, they model repulsion with negative weights and formulate the Symmetric Multiway Cut as the following constrained energy minimization/integer linear program (ILP):

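$$\min_{x \in \mathcal{P}} \; \sum_{e \in E \cup E^S} w_e\, x_e \tag{A.18}$$

The variables $x_e \in \{0, 1\}$ are indicators for cuts in the graph, i.e. $x_e = 1$ when the edge $e$ is cut, and $\mathcal{P}$ is the polytope of consistent solutions defined by the linear constraints:

$$x_e \le \sum_{e' \in C \setminus \{e\}} x_{e'} \qquad \forall C \in \mathcal{C}(\mathcal{G}),\ \forall e \in C \tag{A.19}$$

$$\sum_{t \in T} x_{it} = |T| - 1 \qquad \forall i \in V \tag{A.20}$$

$$x_{it} \le x_{ij} + x_{jt} \qquad \forall (i, j) \in E,\ \forall t \in T \tag{A.21}$$

$$x_{jt} \le x_{ij} + x_{it} \qquad \forall (i, j) \in E,\ \forall t \in T \tag{A.22}$$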

The cycle constraints (A.19) form the so called Multicut polytope [2]; they forbid dangling edges thus all non-cut internal edges form clusters on the graph . Equation A.20 ensures that each internal node is connected to exactly one terminal node. Finally, (A.21) and (A.22) are cycle constraints on all cycles with one terminal node; they enforce that an internal edge is always cut when its incident nodes are connected to different terminal nodes. This ensures that the resulting partitioning and labeling is always consistent. Note that an edge between two nodes connected to the same terminal is allowed to be cut, so two instances of the same class may touch.

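We now transform the objective given in eq. A.18 and introduce an additional parameter $p$. Instead of finding a small-weight set to cut from the graph, we try to find a large-weight set to keep in the graph.

First, we split the internal edges into repulsive ($E^-$, $w_e < 0$) and attractive ($E^+$, $w_e > 0$) edges, so the energy function becomes

$$\min_{x \in \mathcal{P}} \; \sum_{e \in E^+ \cup E^S} |w_e|^p\, x_e - \sum_{e \in E^-} |w_e|^p\, x_e \tag{A.23}$$

For $p = 1$ the ILP corresponds to the Symmetric Multiway Cut. Subtracting the constant sum of all positive edge weights, $\sum_{e \in E^+ \cup E^S} |w_e|^p$, yields

$$\min_{x \in \mathcal{P}} \; -\sum_{e \in E^+ \cup E^S} |w_e|^p\, (1 - x_e) - \sum_{e \in E^-} |w_e|^p\, x_e \tag{A.24}$$

Finally, by substituting

$$a_e = \begin{cases} 1 - x_e & e \in E^+ \cup E^S \\ x_e & e \in E^- \end{cases} \tag{A.25}$$

we obtain the equivalent objective

$$\min_{a \in \mathcal{P}'} \; -\sum_{e \in E^T} a_e\, |w_e|^p \tag{A.26}$$

Here, $\mathcal{P}'$ is the polytope formed by eqs. 2 to 6. Since all weights are positive in the SMW graph (a repulsive edge carries the weight $|w_e|$), the absolute value is omitted in eq. 1.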
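The step from eq. A.23 to eq. A.26 can be sanity-checked numerically: for every cut vector $x$, the two objectives should differ only by the constant sum of positive (attractive and semantic) weights. The tiny edge set, its string keys and the power $p = 3$ below are arbitrary choices for illustration.

```python
from itertools import product


def energy_a23(x, w, kind, p):
    """Cut formulation (A.23): cutting attractive/semantic edges costs, cutting repulsive edges gains."""
    return sum((-1 if kind[e] == "rep" else 1) * abs(w[e]) ** p * x[e] for e in w)


def to_active(x, kind):
    """Substitution (A.25): a_e = 1 - x_e for attractive/semantic edges, a_e = x_e for repulsive ones."""
    return {e: x[e] if kind[e] == "rep" else 1 - x[e] for e in x}


def energy_a26(a, w, p):
    """Active-set formulation (A.26)."""
    return -sum(a[e] * abs(w[e]) ** p for e in w)


w = {"ij": 0.7, "jk": -0.4, "it": 0.9, "jt": 0.2}
kind = {"ij": "att", "jk": "rep", "it": "sem", "jt": "sem"}
p = 3
const = sum(abs(w[e]) ** p for e in w if kind[e] != "rep")  # 0.343 + 0.729 + 0.008
diffs = {
    round(energy_a23(x, w, kind, p) - energy_a26(to_active(x, kind), w, p), 12)
    for x in (dict(zip(w, bits)) for bits in product((0, 1), repeat=len(w)))
}
# all 16 cut vectors give the same difference, equal to const
```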

Appendix B Mutex and Label Constraints

We now formally derive that the constraints in eqs. 14 and 15 are necessary for the Symmetric Multiway Cut constraints in eqs. A.19 to A.22. Wolf et al. [41] show that eq. 14 is necessary for the multicut constraints in eq. A.19. Therefore, it is left to show that

$$x \text{ satisfies (A.20), (A.21) and (A.22)} \;\Rightarrow\; \mathcal{P}^T(A) = \emptyset \tag{A.27}$$

The right-hand side is the label constraint; we show that every solution satisfying eqs. A.20, A.21 and A.22 (the left-hand side) also satisfies it.

First we show by contradiction, using eq. A.20, that an internal node $i$ can only be connected to a single terminal $t$. Assume that there is a $t' \neq t$ which is also connected to $i$; then $x_{it} = 0$ and $x_{it'} = 0$. Rewriting eq. A.20 and inserting these two variables yields the contradiction

$$|T| - 1 = \sum_{t'' \in T} x_{it''} = \underbrace{x_{it}}_{=0} + \underbrace{x_{it'}}_{=0} + \sum_{t'' \in T \setminus \{t, t'\}} x_{it''} \le |T| - 2 \tag{A.28}$$

We further show that two connected nodes $i$ and $j$ are always connected to the same terminal node $t$. Without loss of generality, we assume that $i$ and $j$ are connected and that $i$ and $t$ are connected, i.e. $x_{ij} = 0$ and $x_{it} = 0$. Then eqs. A.22 and A.21 give us

$$0 \le x_{jt} \le x_{ij} + x_{it} = 0 \;\Rightarrow\; x_{jt} = 0 \tag{A.29}$$

Finally, we can prove eq. A.27: any path starting from a terminal $t$ begins with an edge to some node $i$; all nodes connected to $i$ (and $i$ itself) are connected to $t$ and to no other terminal node. Therefore there cannot be any such path from $t$ to another terminal node $t'$, i.e. the label constraint $\mathcal{P}^T(A) = \emptyset$ is satisfied.

Appendix C Additional Details of the Cityscapes Experiments

C.1 Implementation Details

We use the class probabilities from a Deeplab 3+ [8] model as semantic edge weights, using a trained model provided by TensorFlow. We employ the Mask-RCNN [15] implementation provided by [32] and train a model on Cityscapes following [15]'s training configuration. The graph weights are derived as explained above. We derive graph weights for different offsets: for attractive edges we use (1) an 8-neighbourhood with distances of {1, 2, 4} pixels and (2) random pairs inside each bounding box. For repulsive edges we sample 5 random pixel pairs for each mask and compute the soft IoU (eq. 17). [29] trained a Deeplab 3+ to predict affinities for their graph-clustering algorithm. They kindly provided their trained models, allowing us to use the same affinities. Since their clustering uses a threshold, we treat the threshold as the splitting point between attractive and repulsive edge weights; affinities below the threshold are inverted and scaled to [0, 1]. In addition to the GMIS model, which is trained on scaled bounding boxes, we train a Deeplab 3+ for affinity prediction on the full images. Because [29] only tackle instance segmentation, their model does not predict affinities for stuff classes. We train the network with the Sørensen-Dice loss and the same stencil pattern as [29]. The training protocol follows the settings in [8], using a batch size of 12 and 70k training iterations. We do not employ any test-time augmentations.
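The stencil and the threshold split described above can be sketched as follows. The offset convention (one direction per undirected pixel pair) and the exact rescaling are assumptions: the text only states that affinities below the threshold are inverted and scaled to [0, 1], so the linear map `(tau - a) / tau` is one plausible choice, and attractive affinities are kept unscaled here. Both function names are hypothetical.

```python
def stencil_offsets(distances=(1, 2, 4)):
    """8-neighbourhood at each distance, keeping one direction per undirected pixel pair."""
    offsets = []
    for d in distances:
        for dy in (-d, 0, d):
            for dx in (-d, 0, d):
                if (dy, dx) > (0, 0):  # lexicographic: drops (0, 0) and mirrored duplicates
                    offsets.append((dy, dx))
    return offsets


def split_by_threshold(a, tau):
    """Assumed mapping of a thresholded affinity a to an SMW edge (kind, weight)."""
    if a >= tau:
        return ("att", a)               # above the threshold: attractive, kept as-is
    return ("rep", (tau - a) / tau)     # below: inverted and scaled to [0, 1]


print(stencil_offsets((1,)))  # [(0, 1), (1, -1), (1, 0), (1, 1)]
```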

C.2 Additional images

Figure 3: Further examples of panoptic results on Cityscapes using semantic unaries (Deeplab 3+ network) and affinities derived from Mask-RCNN foreground probability. Prediction errors are highlighted in green.

Appendix D Scaling Behavior

Figure 4: Runtime scaling of the SMW. We evaluate the runtime of the SMW for different volume sizes of the 3D Sponge dataset. We find an almost linear relation between runtime and number of voxels.