1 Introduction
In computer vision, the partitioning of weighted graphs has been successfully applied to tasks such as image segmentation, object tracking and pose estimation. Most graph clustering methods work with positive edge weights only, which can be interpreted as similarities or distances between the nodes. These methods are parameter-based and require users to specify the desired number of clusters or a termination criterion (e.g. spectral clustering or iterated normalized cuts), or even to add a seed for each object (e.g. seeded watershed or random walker).
Other graph clustering methods work with so-called signed graphs, which include both positive and negative edge weights corresponding to attraction and repulsion between nodes. The advantage of signed graphs over positive-weighted graphs is that balancing attraction and repulsion allows the clustering to be performed without defining additional parameters. This can be done optimally by solving the so-called multicut optimization or correlation clustering problem kappes2011globally ; chopra1991multiway . This problem is NP-hard, but approximate solvers have been proposed beier2016efficient . Moreover, the general problem of graph partitioning can be solved approximately by greedy agglomerative clustering keuper2015efficient ; levinkov2017comparative ; wolf2018mutex ; kardoostsolving .
Agglomerative clustering algorithms for signed graphs have clear advantages: they are parameter-free and efficient. Although a variety of these algorithms exists, no overarching study has so far compared their robustness and efficiency or provided guidelines for matching an algorithm to the partitioning problem at hand.
In this paper, we propose a novel theoretical framework that generalizes over agglomerative algorithms for signed graphs by linking them to hierarchical agglomerative clustering on positive-weighted graphs lance1967general . This framework defines an underlying basic algorithm and allows us to explore its combinations with different linkage criteria and cannot-link constraints. We then formally prove that some of the combinations correspond to existing clustering algorithms and introduce new algorithms for combinations that have not yet been explored.
We evaluate and compare these algorithms on instance segmentation, the computer vision task of assigning each pixel of an image to an object instance. We use a CNN to predict the edge weights of a graph whose nodes represent pixels of the image, similarly to liu2018affinity ; lee2017superhuman ; wolf2018mutex , and provide these weights as input to the algorithms in our framework (see Fig. 1).
With our comparison experiments, performed both on 2D urban scenes from the CityScapes dataset and on 3D electron microscopy image volumes of neurons, we evaluate the properties of the algorithms in our framework, focusing on their efficiency, robustness and tendency to over- or under-cluster. We show that one of the new algorithms derived from our framework, based on an average linkage criterion, outperforms the previously known agglomeration methods expressed in the framework. It also achieves competitive performance on the challenging CREMI 2016 segmentation benchmark and represents the best-performing proposal-free method on CityScapes. Our code is available at https://github.com/abailoni/GASP.

2 Related work
Proposal-based methods have been highly successful in instance segmentation competitions such as MS COCO lin2014microsoft , Pascal VOC2012 everingham2010pascal and CityScapes cordts2016cityscapes . They decompose the instance segmentation task into two steps: generating object proposals and assigning a class and a binary segmentation mask to each bounding box he2017mask ; liu2018path ; yang2012layered ; li2017fully ; ladicky2010and ; hariharan2014simultaneous ; chen2015multi ; dai2016instance ; liang2016reversible . They commonly rely on Faster R-CNN ren2015faster and can be trained end-to-end using non-maximum suppression. Other methods instead use recurrent models to sequentially generate instances one by one romera2016recurrent ; ren2017end .
Proposal-free methods adopt a bottom-up approach by directly grouping pixels into instances. Recently, there has been growing interest in such methods that do not involve object detection, since in certain types of data object instances cannot be approximated by bounding boxes. For example, the approach proposed in kirillov2017instancecut uses a combinatorial framework for instance segmentation; SGN liu2017sgn sequentially groups pixels into lines and then instances; a watershed transform is learned in bai2017deep by also predicting its gradient direction, whereas the template matching of uhrig2016pixel deploys scene depth information. Others use metric learning to predict high-dimensional associative pixel embeddings that map pixels of the same instance close to each other, while mapping pixels belonging to different instances further apart fathi2017semantic ; newell2017associative ; de2017semantic ; kulikov2018instance . Final instances are then retrieved by applying a clustering algorithm, as in the end-to-end trainable mean-shift pipeline of kong2018recurrentPix .
Edge detection has also experienced recent progress thanks to deep learning, both on natural images xie2015holistically ; kokkinos2015pushing and on biological data lee2017superhuman ; schmidt2018cell ; meirovitch2016multi ; ciresan2012deep . In neuron segmentation for connectomics, a field of neuroscience we also address in our experiments, boundaries are converted to final instances by subsequent postprocessing and superpixel merging: some approaches use loopy graphs kaynig2015large ; krasowski2015improving or trees meirovitch2016multi ; liu2016sshmt ; liu2014modular ; funke2015learning ; uzunbas2016efficient to represent the region-merging hierarchy; the lifted multicut beier2017multicut formulates the problem in a combinatorial framework, while flood-filling networks januszewski2018high eliminate superpixels by training a recurrent CNN to perform region growing one region at a time. A structured learning approach was also proposed in funke2018large ; turaga2009maximin .
Agglomerative graph clustering has often been applied to instance segmentation ren2013image ; liu2016image ; salembier2000binary because of its efficiency compared to other top-down approaches like graph cuts. Novel termination criteria and merging strategies have often been proposed: the agglomeration in malmberg2011generalized deploys fixed sets of merge constraints; ultrametric contour maps arbelaez2011contour combine an oriented watershed transform with an edge detector, so that superpixels are merged until the ultrametric distance exceeds a learned threshold; the popular graph-based method felzenszwalb2004efficient stops the agglomeration when the merge costs exceed a measure of quality for the current clusters. The optimization approach in kiran2014global performs greedy merge decisions that minimize a certain energy, while other pipelines use classical HAC linkage criteria, e.g. average linkage liu2018affinity ; lee2017superhuman , the median funke2018large , or a linkage learned by a random forest classifier nunez2013machine ; knowles2016rhoananet .
Clustering of signed graphs has the goal of partitioning a graph with both attractive and repulsive cues. Finding an optimally balanced partitioning has a long history in combinatorial optimization grotschel1989cutting ; grotschel1990facets ; chopra1993partition . NP-hardness of the correlation clustering problem was shown in bansal2004correlation , while the connection with graph multicuts was made by demaine2006correlation . Modern integer linear programming solvers can tackle problems of considerable size andres2012globally , but accurate approximations pape2017solving ; beier2016efficient ; yarkony2012fast , greedy agglomerative algorithms levinkov2017comparative ; wolf2019mutex ; keuper2015efficient ; kardoostsolving and persistence criteria lange2018partial ; lange2018combinatorial have been proposed for even larger graphs.
This work reformulates the clustering algorithms of levinkov2017comparative ; wolf2018mutex ; keuper2015efficient in a generalized framework and adopts ideas from the proposal-free methods liu2018affinity ; wolf2018mutex ; lee2017superhuman to predict long-range relationships between pixels.
3 Generalized framework for agglomerative clustering of signed graphs
In this section, we first define notation and then introduce one of our main contributions: a signed graph partitioning algorithm (Sec. 3.2) that can be seen as a generalization of several existing and new clustering algorithms (Sec. 3.3).
3.1 Notation and graph formalism
We consider an undirected simple edge-weighted graph $\mathcal{G}(V, E, w^+, w^-)$ with both attractive and repulsive edge attributes. In computer vision applications, the nodes $V$ can represent either pixels, superpixels or voxels. We call a set $\Pi = \{S_1, \dots, S_K\}$ a clustering or partitioning with $K$ clusters if $\bigcup_k S_k = V$, $S_i \cap S_j = \emptyset$ for different clusters $i \neq j$, and every cluster $S_k$ induces a connected subgraph of $\mathcal{G}$. We also denote as $\Pi(v)$ the cluster associated with node $v \in V$. The weight function $w^+ : E \to \mathbb{R}^+$ associates to every edge $e \in E$ a positive scalar attribute $w_e^+$ representing a merge affinity or a similarity measure: the higher this number, the higher the inclination of the two incident vertices to be assigned to the same cluster. (Note that other formalisms for positively weighted graphs associate distances to the edges; there, the lower the edge weight, the higher the attraction between the two linked nodes, contrary to our definition of $w^+$.) On the other hand, $w^- : E \to \mathbb{R}^+$ associates to each edge a split tendency $w_e^-$: the higher this weight, the more the incident vertices would like to be in different clusters. Graphs of the type $\mathcal{G}(V, E, w^+, w^-)$ are also often defined as signed graphs $\mathcal{G}(V, E, w)$, featuring both positive and negative edge weights $w_e \in \mathbb{R}$. Following the theoretical considerations in lange2018partial , we define these signed weights as $w_e = w_e^+ - w_e^-$. Some approaches directly compute $w_e$, whereas others compute $w_e^+$ and $w_e^-$ separately. In this formalism, graphs with purely attractive interactions are a special case of $\mathcal{G}(V, E, w^+, w^-)$ with $w^- = 0$.
Inter-cluster interaction. We call two clusters $S_u, S_v \in \Pi$ adjacent if there exists at least one edge $e \in E$ connecting a node $u \in S_u$ to a node $v \in S_v$. In hierarchical agglomerative clustering, the interaction between the two clusters is usually defined by a function $\mathcal{W}(S_u, S_v)$, named linkage criterion, depending on the weights of all edges connecting clusters $S_u$ and $S_v$, i.e. $E_{uv} = \{(u, v) \in E \mid u \in S_u,\ v \in S_v\}$. All the linkage criteria tested in this article are listed and defined in Table 1.
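To make the linkage criteria from Table 1 concrete, the following is a minimal Python sketch of how an inter-cluster interaction can be evaluated from the signed weights $w_e = w_e^+ - w_e^-$ of the connecting edges. The function name and interface are ours, for illustration only; they do not come from the GASP code base.

```python
def linkage(weights, criterion):
    """Interaction between two adjacent clusters, given the list of
    signed weights w_e of all edges that connect them (Table 1)."""
    if criterion == "sum":
        return sum(weights)
    if criterion == "average":
        return sum(weights) / len(weights)
    if criterion == "abs_max":
        # signed weight with the largest magnitude
        return max(weights, key=abs)
    if criterion == "max":
        return max(weights)
    if criterion == "min":
        return min(weights)
    raise ValueError(f"unknown criterion: {criterion}")
```

For example, with connecting weights `[0.8, 0.5, -0.9]` the Sum linkage gives a weakly attractive interaction (0.4), the Average a weaker one (0.133), while Abs Max picks the single most decisive edge (-0.9) and declares the pair repulsive — a first hint of why the criteria behave so differently.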
3.2 GASP: generalized algorithm for signed graph partitioning
In Algorithm 1, we provide simplified pseudocode for the proposed GASP algorithm. GASP implements a bottomup approach that starts by assigning each node to its own cluster and then iteratively merges pairs of adjacent clusters. The algorithm has two variants. The first one, with addCannotLinkConstraints=False, starts by merging clusters with the strongest attractive interaction and stops once the remaining clusters share only mutual repulsive interactions (see iterations on toy graphs in block 4 of Fig. 1). After each merging iteration, the interaction between the merged cluster and its neighbors is updated according to one of the linkage criteria listed in Table 1.
In the second variant, with addCannotLinkConstraints=True, Algorithm 1 also introduces cannot-link constraints, which represent mutual exclusion relationships between pairs of nodes that cannot be assigned to the same cluster in the final clustering. This variant selects the pair of clusters with the highest absolute interaction $|\mathcal{W}(S_u, S_v)|$, so that the most attractive and the most repulsive pairs are analyzed first (see example in Fig. 1(b)). If the interaction is repulsive, then the two clusters are constrained and their members can never merge in subsequent steps. If the interaction is attractive, then the clusters are merged, provided that they were not previously constrained. The algorithm terminates when all the remaining clusters are constrained.
In Appendix 7.1, we comment on the algorithm’s computational complexity and present our implementation given by the edge contraction Algorithm 2 based on a priority queue.
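To illustrate the bottom-up scheme, here is a self-contained Python sketch of the unconstrained GASP variant, using a union-find structure and a priority queue with lazy invalidation. It is our simplified reading of Algorithm 1 under an explicit edge-list data layout, not the paper's optimized edge-contraction implementation (Algorithm 2); all names are illustrative.

```python
import heapq
import itertools
from collections import defaultdict

def gasp(num_nodes, edges, criterion="average"):
    """Sketch of GASP without cannot-link constraints.
    `edges` is a list of (u, v, signed_weight) tuples; returns a cluster
    label per node. Merges the pair with the strongest attractive
    interaction until only repulsive interactions remain."""
    parent = list(range(num_nodes))          # union-find over clusters

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    def interaction(ws):                     # linkage criteria of Table 1
        if criterion == "sum":
            return sum(ws)
        if criterion == "average":
            return sum(ws) / len(ws)
        if criterion == "abs_max":
            return max(ws, key=abs)
        raise ValueError(criterion)

    # signed weights of all edges currently connecting two clusters
    between = defaultdict(list)
    for u, v, w in edges:
        if u != v:
            between[frozenset((u, v))].append(w)

    pq, tick, latest = [], itertools.count(), {}

    def push(pair):                          # (re)queue a cluster pair
        entry = (-interaction(between[pair]), next(tick), pair)
        latest[pair] = entry                 # older entries become stale
        heapq.heappush(pq, entry)

    for pair in list(between):
        push(pair)

    while pq:
        entry = heapq.heappop(pq)
        neg_w, _, pair = entry
        if latest.get(pair) is not entry or pair not in between:
            continue                         # stale queue entry, skip
        if -neg_w <= 0:
            break                            # only repulsion left: stop
        a, b = tuple(pair)
        ra, rb = find(a), find(b)
        parent[rb] = ra                      # contract: absorb rb into ra
        del between[pair]
        # reroute rb's remaining interactions to ra, update priorities
        for old in [p for p in list(between) if rb in p]:
            (c,) = old - {rb}
            new = frozenset((ra, c))
            between[new].extend(between.pop(old))
            push(new)

    return [find(v) for v in range(num_nodes)]
```

The constrained variant would instead order pairs by $|\mathcal{W}|$ and mark repulsive pairs as mutually exclusive rather than stopping. For example, on a four-node graph with strong attraction inside `{0,1}` and `{2,3}` but repulsion between the groups, the sketch returns exactly those two clusters.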
GASP linkage criteria | Unsigned graphs | Signed graphs (no constraints) | Signed graphs (with constraints)
Sum: $\sum_{e \in E_{uv}} w_e$ | Sum Linkage Hier. Aggl. Clust. | GAEC keuper2015efficient | Greedy Fixation levinkov2017comparative
Absolute Max: $w_{\tilde{e}}$ with $\tilde{e} = \arg\max_{e \in E_{uv}} |w_e|$ | Single Linkage Hier. Aggl. Clust. | Mutex Watershed wolf2018mutex | Mutex Watershed wolf2018mutex
Average: $\sum_{e \in E_{uv}} w_e \,/\, |E_{uv}|$ | Average Linkage Hier. Aggl. Clust. | NEW | NEW
Max: $\max_{e \in E_{uv}} w_e$ | Single Linkage Hier. Aggl. Clust. | NEW | NEW
Min: $\min_{e \in E_{uv}} w_e$ | Complete Linkage Hier. Aggl. Clust. | NEW | NEW
3.3 GASP with different linkage criteria: new and existing algorithms
Our main contribution is the generalized algorithm for signed graph partitioning (GASP for short), which encompasses the several known and novel agglomerative algorithms displayed in Table 1. In our framework, individual algorithms are differentiated by the linkage criterion employed. We review them in the following paragraphs.
In the special case of an unsigned graph with only positive interactions, i.e. $w^- = 0$ and $w_e = w_e^+$, the algorithm performs standard agglomerative hierarchical clustering, returning a single cluster together with a hierarchy of clusters defined by the order in which they are merged (see Table 1, unsigned graphs).
Given a graph with both attractive and repulsive cues, an edge contraction algorithm with a sum update rule was pioneered in levinkov2017comparative ; keuper2015efficient (Table 1, Sum linkage). The authors present both a version with cannot-link constraints and one without, and compare them with other greedy local-search algorithms approximating the multicut optimization problem. The Mutex Watershed wolf2018mutex is another signed graph partitioning algorithm that introduces dynamic cannot-link constraints. In Proposition 7.1 (see Appendix 7.2) we prove that, surprisingly, it can also be seen as an efficient implementation of GASP with Absolute Maximum linkage (defined in Table 1). Moreover, in Proposition 7.2 we prove that GASP with Abs Max linkage returns the same clustering with or without enforcing cannot-link constraints. On the other hand, to our knowledge, the Average, Max and Min linkage criteria have never been used in signed graph agglomerative algorithms or been combined with cannot-link constraints.
Apart from the linkage criteria defined in Table 1, additional ones have been proposed in the literature: nunez2013machine , for example, uses a learned approach where a random forest classifier updates the cluster interactions depending on predefined edge and node features; other approaches introduce a weight regularization depending on the size of the clusters felzenszwalb2004efficient ; kardoostsolving , whereas funke2018large uses a quantile linkage criterion, populating a histogram for each inter-cluster interaction. In our experiments, we focus on the linkage criteria listed in Table 1, since they represent the most common options.
4 Experiments on neuron segmentation
We first evaluate and compare the agglomerative clustering algorithms described in the generalized framework on the task of neuron segmentation in electron microscopy (EM) image volumes. This application is of key interest in connectomics, a field of neuroscience with the goal of reconstructing neural wiring diagrams spanning complete central nervous systems. Currently, only proofreading or manual tracing yields sufficient accuracy for correct circuit reconstruction schlegel2017learning , thus further progress is required in automated reconstruction methods.
EM segmentation is commonly performed by first predicting boundary pixels beier2017multicut ; ciresan2012deep or undirected affinities wolf2018mutex ; lee2017superhuman ; funke2018large , which represent how likely it is for a pair of pixels to belong to the same neuron segment. The affinities need not be limited to immediately adjacent pixels. Thus, similarly to lee2017superhuman , we train a CNN to predict both short- and long-range affinities and use them as edge weights of a 3D grid graph, where each node represents a pixel/voxel of the image volume.
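The conversion from affinity maps to graph edges can be sketched as follows in numpy. Each affinity channel corresponds to a pixel offset; an offset of one voxel yields short-range grid edges, larger offsets yield long-range edges. The mapping from affinities in $[0, 1]$ to signed weights used here ($w_e = a_e - 0.5$) and the function name are illustrative assumptions, not necessarily the exact transformation used in the paper.

```python
import numpy as np

def grid_graph_edges(affinities, offsets):
    """Turn a stack of affinity maps into signed edges of a grid graph.
    `affinities` has shape (len(offsets), D, H, W); each offset is a
    tuple of non-negative voxel displacements, e.g. (0, 0, 1) for a
    short-range edge along x. Returns an array of (u, v, w) rows."""
    shape = affinities.shape[1:]
    node_ids = np.arange(np.prod(shape)).reshape(shape)
    edges = []
    for k, off in enumerate(offsets):
        # slices selecting valid source/target voxels for this offset
        src = tuple(slice(None, -o if o else None) for o in off)
        dst = tuple(slice(o, None) for o in off)
        u = node_ids[src].ravel()
        v = node_ids[dst].ravel()
        # one simple affinity-to-signed-weight mapping (assumption)
        w = affinities[k][src].ravel() - 0.5
        edges.append(np.stack([u, v, w], axis=1))
    return np.concatenate(edges, axis=0)
```

On a toy 1x2x2 volume with one x-offset and one y-offset, this produces the four expected grid edges, all attractive when the affinities are above 0.5.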
4.1 Data: CREMI challenge
We evaluate all algorithms in the proposed framework on the competitive CREMI 2016 EM Segmentation Challenge cremiChallenge , currently the neuron segmentation challenge with the largest amount of training data available. The dataset comes from serial-section EM of Drosophila fruit-fly tissue and consists of six volumes of 1250x1250x125 voxels at a resolution of 4x4x40 nm, three of which come with publicly available training ground truth. Results submitted to the leaderboard are evaluated using the CREMI score (https://cremi.org/leaderboard/), based on the Adapted Rand Score and the Variation of Information arganda2015crowdsourcing . In Appendix 7.4, we provide more details about the training of our CNN model, inspired by the work of lee2017superhuman ; funke2018large .
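For readers unfamiliar with the second evaluation metric, a minimal sketch of the textbook Variation of Information between two label images is given below, computed from the joint label histogram as $\mathrm{VI} = H(A|B) + H(B|A)$. Note this is only the generic definition (here with base-2 logarithms); the CREMI score combines it with the Adapted Rand Score under its own conventions, which we do not reproduce.

```python
import numpy as np

def variation_of_information(seg_a, seg_b):
    """VI = H(A|B) + H(B|A) between two segmentations of equal shape.
    Lower is better; 0 means the partitions are identical."""
    a = np.asarray(seg_a).ravel()
    b = np.asarray(seg_b).ravel()
    n = a.size
    # joint distribution over (label_a, label_b) pairs
    _, joint = np.unique(np.stack([a, b]), axis=1, return_counts=True)
    p_ab = joint / n
    _, ca = np.unique(a, return_counts=True)
    _, cb = np.unique(b, return_counts=True)
    h_a = -np.sum((ca / n) * np.log2(ca / n))     # marginal entropies
    h_b = -np.sum((cb / n) * np.log2(cb / n))
    h_ab = -np.sum(p_ab * np.log2(p_ab))          # joint entropy
    return (h_ab - h_a) + (h_ab - h_b)            # H(A|B) + H(B|A)
```

Identical segmentations score 0, while statistically independent labelings score the sum of the two conditional entropies.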
4.2 Results and discussion
Comparison of linkage criteria. Table 2 shows how the agglomerative algorithms derived from our framework compare to each other. As a simple baseline, we also include a segmentation produced by thresholding the affinity predictions (THRESH). GASP with Average linkage, one of the new algorithms derived from our generalized framework, significantly outperformed all previously proposed agglomerative methods such as GAEC (GASP Sum) keuper2015efficient , Greedy Fixation (GASP Sum + Constraints) levinkov2017comparative and the Mutex Watershed (GASP Abs Max) wolf2018mutex . The competitive performance of this simple parameter-free algorithm is also reported in Table 3, which shows the current leaderboard of the challenge: all entries apart from GASP employ superpixel-based postprocessing pipelines, several of which rely on the lifted multicut formulation of beier2017multicut , which uses several random forests to predict graph edge weights from information derived not only from affinity maps but also from raw data and shape. Note that the test volumes contain several imaging artifacts that make segmentation particularly challenging and might favor the more robust edge statistics of superpixel-based approaches. On the other hand, the fact that our algorithm can operate on pixels directly removes the parameter tuning necessary to obtain good superpixels and avoids errors from wrong superpixels that cannot be fixed during later agglomeration. In Appendix 7.5, we provide more details about how we scaled GASP up to the full datasets. Appendix Table 5 lists the performances and runtimes of all tested GASP linkage criteria.
[Truncated figure caption: "... for more details). For each experiment, some of the long-range CNN predictions were randomly selected with a given probability and added as long-range edges to the pixel grid-graph. Experiments are performed on a crop of CREMI training sample B."]
Noise experiments. Additionally, we conduct a set of experiments in which the CNN predictions are perturbed by structured noise, in order to highlight the properties of each GASP variant and perform an in-depth comparison that is as quantitative as possible. Appendix 7.6 introduces the type of spatially correlated noise that allowed us to perturb the CNN outputs by introducing simulated artifacts such as missing or false-positive boundary evidence. Fig. 4 summarizes our 12000 noise experiments: we focus on the best-performing linkage criteria, i.e. Average, Sum and Abs Max, and test them with different amounts of noise. In these experiments, we also assess how beneficial it is to use long-range CNN predictions in the agglomeration. Thus, we perform one set of simulations without adding long-range connections to the grid-graph and another where we introduce them with 10% probability. (We also performed experiments adding all the long-range predictions given by the CNN model, but did not note major differences compared to using only 10% of them; adding this fraction is usually sufficient to improve the scores.)
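The paper's exact perturbation scheme is described in its Appendix 7.6; as a hedged illustration of what "spatially correlated noise" can mean, the sketch below draws noise on a coarse grid and upsamples it, so that nearby pixels receive similar perturbations (mimicking blob-like artifacts rather than per-pixel jitter). All parameter names and values here are our own illustrative choices.

```python
import numpy as np

def perturb_affinities(affinities, scale=0.1, block=4, seed=0):
    """Add spatially correlated noise to affinity maps in [0, 1]:
    sample Gaussian noise on a grid coarsened by `block`, upsample it
    by repetition, and clip the result back into [0, 1]."""
    rng = np.random.default_rng(seed)
    coarse_shape = tuple(-(-s // block) for s in affinities.shape)
    noise = rng.standard_normal(coarse_shape) * scale
    for axis in range(noise.ndim):
        noise = np.repeat(noise, block, axis=axis)   # blocky upsampling
    noise = noise[tuple(slice(0, s) for s in affinities.shape)]
    return np.clip(affinities + noise, 0.0, 1.0)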
Average and Abs Max linkage. Our findings confirm that GASP with Average linkage is the most robust algorithm tested and the one that benefits most from the long-range CNN predictions. It is not a surprise that the Abs Max statistic proposed by wolf2018mutex is less robust to noise than the Average linkage but, as we show in Appendix Table 5, Abs Max remains a valid and considerably faster option. Adding long-range connections to the graph is generally helpful, but when many of them carry repulsive weights, GASP with cannot-link constraints shows a clear tendency to over-cluster.
Sum linkage. All our experiments show that GASP with Sum linkage is the algorithm with the highest tendency to under-cluster and incorrectly merge segments (see Fig. 3 for an example). This property is related to the empirical observation that the Sum statistic tends to grow clusters one after the other, as shown in Fig. 1 by the quite unique agglomeration order of the Sum statistic. An intuitive explanation is the following: initially, most intra-cluster nodes present similar attractive interactions with each other. When the two nodes sharing the most attractive interaction are merged, there is a high chance that they both share an attractive interaction with a common neighboring node, so the new interaction with this common neighbor, given by the sum of two high weights, is immediately assigned a high priority in the agglomeration. This usually starts a "chain reaction" in which only a single cluster is agglomerated at the beginning. As we also see in Fig. 1, other linkage criteria like Average or Abs Max instead grow clusters of similar sizes in parallel and thereby accumulate much more reliable inter-cluster statistics.
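The chain reaction can be reduced to a two-line arithmetic example: take two nodes $a$ and $b$, each attached to a common neighbor $c$ with signed weight 0.8, and merge them.

```python
# After merging a and b, the Sum linkage doubles the interaction with
# the common neighbour c, pushing the merged cluster straight back to
# the top of the priority queue; the Average linkage leaves the
# priority unchanged.
edges_to_c = [0.8, 0.8]
sum_interaction = sum(edges_to_c)                    # 1.6
avg_interaction = sum(edges_to_c) / len(edges_to_c)  # 0.8
print(sum_interaction > max(edges_to_c))   # True: Sum boosts the priority
print(avg_interaction == max(edges_to_c))  # True: Average does not
```

Under Sum linkage, the merged cluster's interaction (1.6) now outranks every single edge of comparable strength elsewhere in the graph, so the same cluster keeps winning the next merges; under Average it competes on equal terms.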


5 Experiments on CityScapes
We also evaluate the performance of GASP on the CityScapes dataset cordts2016cityscapes , which consists of 5000 street-scene images: 2975 for training, 500 for validation and 1525 for testing. We used the pipeline proposed in GMIS liu2018affinity , the proposal-free method performing best on this dataset. The pipeline consists of two CNNs with similar structures, one predicting pixel-level semantic scores and the other pixel affinities between instances. The code and the model are publicly available, so we provided the output affinities as input to GASP. In Appendix 7.7, we describe how we fine-tuned the model using a Sørensen-Dice loss, similarly to wolf2018mutex .
Results are summarized in Table 4 and Fig. 5: the best scores are achieved by PANet liu2018path , a proposal-based method strongly related to Mask R-CNN. GASP with Average linkage achieves competitive results and outperforms all previously proposed proposal-free methods. As in the neuron segmentation experiments, other linkage criteria tend to over-cluster, like Abs Max, or to under-cluster and merge instances, like Sum. The graph-merging algorithm proposed by liu2018affinity (MultiStepHAC) requires the user to tune several threshold parameters; when we applied it to the affinities predicted by our fine-tuned model, it achieved an AP score of 33.0 on the validation set, worse than the value of 34.1 originally reported in liu2018affinity . This is probably because the MultiStepHAC agglomeration was tailored to the output affinities of the original model. Table 6 in the Appendix includes the scores of all other tested GASP algorithms.
6 Conclusion
We have presented a novel unifying framework for the agglomerative clustering of graphs with both positive and negative edge weights, and we have shown that several existing clustering algorithms, e.g. the Mutex Watershed wolf2018mutex , can be reformulated as special cases of one underlying agglomerative algorithm. This framework also allowed us to introduce new algorithms, one of which, based on an Average linkage criterion, outperformed all the others: it proved to be a simple and remarkably robust approach for processing the short- and long-range predictions of a CNN applied to an instance segmentation task. On biological images, this simple average agglomeration algorithm can be a valuable choice for users not willing to spend much time tuning complex task-dependent pipelines based on superpixels. In future work, we plan to extend the comparison to other types of non-image graphs and to explore common theoretical properties of the algorithms included in the framework.
References
 [1] N. Ailon, M. Charikar, and A. Newman. Aggregating inconsistent information: ranking and clustering. Journal of the ACM (JACM), 55(5):23, 2008.
 [2] B. Andres, T. Kroeger, K. L. Briggman, W. Denk, N. Korogod, G. Knott, U. Koethe, and F. A. Hamprecht. Globally optimal closed-surface segmentation for connectomics. In European Conference on Computer Vision, pages 778–791. Springer, 2012.
 [3] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik. Contour detection and hierarchical image segmentation. IEEE transactions on pattern analysis and machine intelligence, 33(5):898–916, 2011.
 [4] I. Arganda-Carreras, S. C. Turaga, D. R. Berger, D. Cireşan, A. Giusti, L. M. Gambardella, J. Schmidhuber, D. Laptev, S. Dwivedi, J. M. Buhmann, et al. Crowdsourcing the creation of image segmentation algorithms for connectomics. Frontiers in neuroanatomy, 9:142, 2015.

 [5] A. Arnab and P. H. Torr. Pixelwise instance segmentation with a dynamically instantiated network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 441–450, 2017.
 [6] M. Bai and R. Urtasun. Deep watershed transform for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5221–5229, 2017.
 [7] N. Bansal, A. Blum, and S. Chawla. Correlation clustering. Machine learning, 56(1-3):89–113, 2004.
 [8] T. Beier, B. Andres, U. Köthe, and F. A. Hamprecht. An efficient fusion move algorithm for the minimum cost lifted multicut problem. In European Conference on Computer Vision, pages 715–730. Springer, 2016.
 [9] T. Beier, C. Pape, N. Rahaman, T. Prange, S. Berg, D. D. Bock, A. Cardona, G. W. Knott, S. M. Plaza, L. K. Scheffer, et al. Multicut brings automated neurite segmentation closer to human performance. Nature Methods, 14(2):101, 2017.
 [10] Y.T. Chen, X. Liu, and M.H. Yang. Multiinstance object segmentation with occlusion handling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3470–3478, 2015.
 [11] S. Chopra and M. R. Rao. On the multiway cut polyhedron. Networks, 21(1):51–89, 1991.
 [12] S. Chopra and M. R. Rao. The partition problem. Mathematical Programming, 59(1-3):87–115, 1993.
 [13] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger. 3D U-Net: learning dense volumetric segmentation from sparse annotation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 424–432. Springer, 2016.

 [14] D. Ciresan, A. Giusti, L. M. Gambardella, and J. Schmidhuber. Deep neural networks segment neuronal membranes in electron microscopy images. In Advances in Neural Information Processing Systems, pages 2843–2851, 2012.
 [15] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3213–3223, 2016.
 [16] J. Dai, K. He, and J. Sun. Instance-aware semantic segmentation via multi-task network cascades. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3150–3158, 2016.
 [17] B. De Brabandere, D. Neven, and L. Van Gool. Semantic instance segmentation with a discriminative loss function. arXiv preprint arXiv:1708.02551, 2017.
 [18] E. D. Demaine, D. Emanuel, A. Fiat, and N. Immorlica. Correlation clustering in general weighted graphs. Theoretical Computer Science, 361(2-3):172–187, 2006.
 [19] L. R. Dice. Measures of the amount of ecologic association between species. Ecology, 26(3):297–302, 1945.
 [20] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The pascal visual object classes (voc) challenge. International journal of computer vision, 88(2):303–338, 2010.
 [21] A. Fathi, Z. Wojna, V. Rathod, P. Wang, H. O. Song, S. Guadarrama, and K. P. Murphy. Semantic instance segmentation via deep metric learning. arXiv preprint arXiv:1703.10277, 2017.
 [22] P. F. Felzenszwalb and D. P. Huttenlocher. Efficient graphbased image segmentation. International journal of computer vision, 59(2):167–181, 2004.
 [23] J. R. Finkel and C. D. Manning. Enforcing transitivity in coreference resolution. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, pages 45–48. Association for Computational Linguistics, 2008.
 [24] J. Funke, F. A. Hamprecht, and C. Zhang. Learning to segment: training hierarchical segmentation under a topological loss. In International Conference on Medical Image Computing and ComputerAssisted Intervention, pages 268–275. Springer, 2015.
 [25] J. Funke, S. Saalfeld, D. Bock, S. Turaga, and E. Perlman. Cremi challenge. https://cremi.org, 2016. Accessed: 2019-05-15.
 [26] J. Funke, F. D. Tschopp, W. Grisaitis, A. Sheridan, C. Singh, S. Saalfeld, and S. C. Turaga. Large scale image segmentation with structured loss based deep learning for connectome reconstruction. IEEE transactions on pattern analysis and machine intelligence, 2018.
 [27] M. Grötschel and Y. Wakabayashi. A cutting plane algorithm for a clustering problem. Mathematical Programming, 45(1-3):59–96, 1989.
 [28] M. Grötschel and Y. Wakabayashi. Facets of the clique partitioning polytope. Mathematical Programming, 47(1-3):367–387, 1990.
 [29] B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik. Simultaneous detection and segmentation. In European Conference on Computer Vision, pages 297–312. Springer, 2014.
 [30] K. He, G. Gkioxari, P. Dollár, and R. Girshick. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 2961–2969, 2017.
 [31] M. Januszewski, J. Kornfeld, P. H. Li, A. Pope, T. Blakely, L. Lindsey, J. Maitin-Shepard, M. Tyka, W. Denk, and V. Jain. High-precision automated reconstruction of neurons with flood-filling networks. Nature Methods, 15(8):605, 2018.
 [32] J. H. Kappes, M. Speth, B. Andres, G. Reinelt, and C. Schnörr. Globally optimal image partitioning by multicuts. In International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition, pages 31–44. Springer, 2011.
 [33] A. Kardoost and M. Keuper. Solving minimum cost lifted multicut problems by node agglomeration. In ACCV 2018, 14th Asian Conference on Computer Vision, Perth, Australia, 2018.
 [34] V. Kaynig, A. VazquezReina, S. KnowlesBarley, M. Roberts, T. R. Jones, N. Kasthuri, E. Miller, J. Lichtman, and H. Pfister. Largescale automatic reconstruction of neuronal processes from electron microscopy images. Medical image analysis, 22(1):77–88, 2015.

 [35] B. W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal, 49(2):291–307, 1970.
 [36] M. Keuper, E. Levinkov, N. Bonneel, G. Lavoué, T. Brox, and B. Andres. Efficient decomposition of image and mesh graphs by lifted multicuts. In Proceedings of the IEEE International Conference on Computer Vision, pages 1751–1759, 2015.
 [37] B. R. Kiran and J. Serra. Global–local optimizations by hierarchical cuts and climbing energies. Pattern Recognition, 47(1):12–24, 2014.
 [38] A. Kirillov, E. Levinkov, B. Andres, B. Savchynskyy, and C. Rother. Instancecut: from edges to instances with multicut. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5008–5017, 2017.
 [39] S. Knowles-Barley, V. Kaynig, T. R. Jones, A. Wilson, J. Morgan, D. Lee, D. Berger, N. Kasthuri, J. W. Lichtman, and H. Pfister. RhoanaNet pipeline: Dense automatic neural annotation. arXiv preprint arXiv:1611.06973, 2016.
 [40] I. Kokkinos. Pushing the boundaries of boundary detection using deep learning. arXiv preprint arXiv:1511.07386, 2015.
 [41] S. Kong and C. C. Fowlkes. Recurrent pixel embedding for instance grouping. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9018–9028, 2018.
 [42] N. Krasowski, T. Beier, G. W. Knott, U. Koethe, F. A. Hamprecht, and A. Kreshuk. Improving 3d em data segmentation by joint optimization over boundary evidence and biological priors. In 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI), pages 536–539. IEEE, 2015.
 [43] V. Kulikov, V. Yurchenko, and V. Lempitsky. Instance segmentation by deep coloring. arXiv preprint arXiv:1807.10007, 2018.
 [44] L. Ladický, P. Sturgess, K. Alahari, C. Russell, and P. H. Torr. What, where and how many? combining object detectors and CRFs. In European conference on computer vision, pages 424–437. Springer, 2010.
 [45] G. N. Lance and W. T. Williams. A general theory of classificatory sorting strategies: 1. hierarchical systems. The computer journal, 9(4):373–380, 1967.
 [46] J.-H. Lange, B. Andres, and P. Swoboda. Combinatorial persistency criteria for multicut and max-cut. arXiv preprint arXiv:1812.01426, 2018.
 [47] J.H. Lange, A. Karrenbauer, and B. Andres. Partial optimality and fast lower bounds for weighted correlation clustering. In International Conference on Machine Learning, pages 2898–2907, 2018.
 [48] K. Lee, J. Zung, P. Li, V. Jain, and H. S. Seung. Superhuman accuracy on the snemi3d connectomics challenge. arXiv preprint arXiv:1706.00120, 2017.
 [49] E. Levinkov, A. Kirillov, and B. Andres. A comparative study of local search algorithms for correlation clustering. In German Conference on Pattern Recognition, pages 103–114. Springer, 2017.
 [50] Y. Li, H. Qi, J. Dai, X. Ji, and Y. Wei. Fully convolutional instance-aware semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2359–2367, 2017.
 [51] X. Liang, Y. Wei, X. Shen, Z. Jie, J. Feng, L. Lin, and S. Yan. Reversible recursive instance-level object segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 633–641, 2016.
 [52] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In European conference on computer vision, pages 740–755. Springer, 2014.
 [53] S. Liu, J. Jia, S. Fidler, and R. Urtasun. SGN: Sequential grouping networks for instance segmentation. In Proceedings of the IEEE International Conference on Computer Vision, pages 3496–3504, 2017.
 [54] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8759–8768, 2018.
 [55] T. Liu, C. Jones, M. Seyedhosseini, and T. Tasdizen. A modular hierarchical approach to 3d electron microscopy image segmentation. Journal of neuroscience methods, 226:88–102, 2014.
 [56] T. Liu, M. Seyedhosseini, and T. Tasdizen. Image segmentation using hierarchical merge tree. IEEE transactions on image processing, 25(10):4596–4607, 2016.
 [57] T. Liu, M. Zhang, M. Javanmardi, N. Ramesh, and T. Tasdizen. SSHMT: Semi-supervised hierarchical merge tree for electron microscopy image segmentation. In European Conference on Computer Vision, pages 144–159. Springer, 2016.
 [58] Y. Liu, S. Yang, B. Li, W. Zhou, J. Xu, H. Li, and Y. Lu. Affinity derivation and graph merge for instance segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), pages 686–703, 2018.
 [59] F. Malmberg, R. Strand, and I. Nyström. Generalized hard constraints for graph segmentation. In Scandinavian Conference on Image Analysis, pages 36–47. Springer, 2011.
 [60] Y. Meirovitch, A. Matveev, H. Saribekyan, D. Budden, D. Rolnick, G. Odor, S. Knowles-Barley, T. R. Jones, H. Pfister, J. W. Lichtman, et al. A multi-pass approach to large-scale connectomics. arXiv preprint arXiv:1612.02120, 2016.
 [61] A. Newell, Z. Huang, and J. Deng. Associative embedding: End-to-end learning for joint detection and grouping. In Advances in Neural Information Processing Systems, pages 2277–2287, 2017.
 [62] J. NunezIglesias, R. Kennedy, T. Parag, J. Shi, and D. B. Chklovskii. Machine learning of hierarchical clustering to segment 2d and 3d images. PloS one, 8(8):e71715, 2013.
 [63] C. Pape, T. Beier, P. Li, V. Jain, D. D. Bock, and A. Kreshuk. Solving large multicut problems for connectomics via domain decomposition. In Proceedings of the IEEE International Conference on Computer Vision, pages 1–10, 2017.
 [64] T. Parag, F. Tschopp, W. Grisaitis, S. C. Turaga, X. Zhang, B. Matejek, L. Kamentsky, J. W. Lichtman, and H. Pfister. Anisotropic em segmentation by 3d affinity learning and agglomeration. arXiv preprint arXiv:1707.08935, 2017.
 [65] K. Perlin. An image synthesizer. ACM Siggraph Computer Graphics, 19(3):287–296, 1985.
 [66] K. Perlin. Noise hardware. Real-Time Shading SIGGRAPH Course Notes, 2001.
 [67] M. Ren and R. S. Zemel. End-to-end instance segmentation with recurrent attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6656–6664, 2017.
 [68] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems, pages 91–99, 2015.
 [69] Z. Ren and G. Shakhnarovich. Image segmentation by cascaded region agglomeration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2011–2018, 2013.
 [70] B. RomeraParedes and P. H. S. Torr. Recurrent instance segmentation. In European conference on computer vision, pages 312–329. Springer, 2016.
 [71] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pages 234–241. Springer, 2015.
 [72] S. Saalfeld, R. Fetter, A. Cardona, and P. Tomancak. Elastic volume reconstruction from series of ultrathin microscopy sections. Nature methods, 9(7):717, 2012.
 [73] P. Salembier and L. Garrido. Binary partition tree as an efficient representation for image processing, segmentation, and information retrieval. IEEE transactions on Image Processing, 9(4):561–576, 2000.
 [74] P. Schlegel, M. Costa, and G. S. Jefferis. Learning from connectomics on the fly. Current opinion in insect science, 24:96–105, 2017.
 [75] U. Schmidt, M. Weigert, C. Broaddus, and G. Myers. Cell detection with star-convex polygons. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 265–273. Springer, 2018.
 [76] T. Sørensen. A method of establishing groups of equal amplitude in plant sociology based on similarity of species and its application to analyses of the vegetation on Danish commons. Biol. Skr., 5:1–34, 1948.
 [77] S. C. Turaga, K. L. Briggman, M. Helmstaedter, W. Denk, and H. S. Seung. Maximin affinity learning of image segmentation. pages 1865–1873, 2009.
 [78] J. Uhrig, M. Cordts, U. Franke, and T. Brox. Pixel-level encoding and depth layering for instance-level semantic labeling. In German Conference on Pattern Recognition, pages 14–25. Springer, 2016.
 [79] M. G. Uzunbas, C. Chen, and D. Metaxas. An efficient conditional random field approach for automatic and interactive neuron segmentation. Medical image analysis, 27:31–44, 2016.
 [80] S. Wolf, A. Bailoni, C. Pape, N. Rahaman, A. Kreshuk, U. Köthe, and F. A. Hamprecht. The mutex watershed and its objective: Efficient, parameter-free image partitioning. arXiv preprint arXiv:1904.12654, 2019.
 [81] S. Wolf, C. Pape, A. Bailoni, N. Rahaman, A. Kreshuk, U. Köthe, and F. Hamprecht. The mutex watershed: Efficient, parameter-free image partitioning. In Proceedings of the European Conference on Computer Vision (ECCV), pages 546–562, 2018.
 [82] S. Xie and Z. Tu. Holistically-nested edge detection. In Proc. ICCV’15, pages 1395–1403, 2015.
 [83] Y. Yang, S. Hallman, D. Ramanan, and C. C. Fowlkes. Layered object models for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(9):1731–1743, 2012.
 [84] J. Yarkony, A. Ihler, and C. C. Fowlkes. Fast planar correlation clustering for image segmentation. In European Conference on Computer Vision, pages 568–581. Springer, 2012.
 [85] T. Zeng, B. Wu, and S. Ji. DeepEM3D: approaching human-level performance on 3d anisotropic EM image segmentation. Bioinformatics, 33(16):2555–2562, 2017.
7 Supplementary material
7.1 Implementation details and complexity of GASP
Update rules
During the agglomerative process, the interactions between adjacent clusters have to be properly updated and recomputed, as shown in Algorithm 1. These updates can be implemented efficiently by representing the agglomeration as a sequence of edge contractions in the graph. Given a graph and a clustering, we define the associated contracted graph, in which every cluster is represented by exactly one node. Edges in the contracted graph represent adjacency relationships between clusters, and their signed weights are given by the inter-cluster interactions. For the linkage criteria tested in this work, when two clusters are merged, the interactions between the new cluster and each of its neighbors depend only on the previous interactions involving the two merged clusters. Thus, we can recompute these interactions with an update rule that does not involve any loop over the edges of the original graph:
(1) 
In Fig. 6 we show an example of edge contraction and list the update rules associated with the linkage criteria introduced in Table 1.
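As an illustration, such update rules can be written as simple functions of the two previous interactions. The sketch below is ours, not the paper's notation: the function names and the optional edge-count bookkeeping (needed only by the Average rule) are assumptions.

```python
# Hypothetical sketch of linkage update rules: when clusters u and v merge,
# the new interaction with a common neighbor t depends only on the two
# previous interactions w_ut and w_vt (plus, for the Average rule, the
# number of original edges each interaction aggregates).

def sum_rule(w_ut, w_vt, n_ut=1, n_vt=1):
    return w_ut + w_vt, n_ut + n_vt

def average_rule(w_ut, w_vt, n_ut=1, n_vt=1):
    # weighted mean over the underlying edge counts
    n = n_ut + n_vt
    return (w_ut * n_ut + w_vt * n_vt) / n, n

def max_rule(w_ut, w_vt, n_ut=1, n_vt=1):
    return max(w_ut, w_vt), n_ut + n_vt

def min_rule(w_ut, w_vt, n_ut=1, n_vt=1):
    return min(w_ut, w_vt), n_ut + n_vt

def abs_max_rule(w_ut, w_vt, n_ut=1, n_vt=1):
    # keep the interaction with the largest absolute value (sign included)
    return (w_ut if abs(w_ut) >= abs(w_vt) else w_vt), n_ut + n_vt
```

Each rule consumes only the two old interaction values, which is exactly why no loop over the original graph's edges is needed.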
Implementation
As we show in Algorithm 2, our implementation of GASP is based on a union-find data structure and a heap that allows deletion of its elements. The algorithm starts with each node assigned to its own cluster and sorts all edges in a heap/priority queue (PQ) by their absolute weight in descending order, so that the most attractive and the most repulsive interactions are processed first. It then iteratively pops one edge from the PQ and does the following, depending on the sign of its interaction: if the interaction is attractive and the edge was not flagged as a cannot-link constraint, the two connected clusters are merged, the edge is contracted in the graph and the priorities of the new double edges are updated as explained in Fig. 6. If, on the other hand, the interaction is repulsive and the option addCannotLinkConstraints of Alg. 2 is True, the edge is flagged as a cannot-link constraint.
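A minimal Python sketch of this procedure is given below. It is our own simplification, not the paper's implementation: for clarity it scans all current interactions at each contraction instead of visiting only the merged cluster's neighbors, and the function and argument names are hypothetical.

```python
import heapq

def gasp(num_nodes, edges, linkage=max, use_cannot_link=False):
    """Sketch of GASP: greedy agglomeration on a signed graph with a
    union-find structure and a lazily-invalidated priority queue.
    `edges` holds (u, v, signed_weight) triples; `linkage` combines two
    parallel interactions when an edge contraction creates a double edge."""
    parent = list(range(num_nodes))

    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    inter = {}  # current interactions between cluster representatives
    for u, v, w in edges:
        key = (min(u, v), max(u, v))
        inter[key] = linkage(inter[key], w) if key in inter else w
    constrained = set()
    pq = [(-abs(w), k) for k, w in inter.items()]  # strongest first
    heapq.heapify(pq)

    while pq:
        neg_abs_w, key = heapq.heappop(pq)
        if key not in inter or abs(inter[key]) != -neg_abs_w:
            continue  # stale entry: a contraction changed this interaction
        w = inter[key]
        u, v = key
        if w > 0 and key not in constrained:
            del inter[key]
            parent[v] = u  # contract the edge: cluster v is absorbed by u
            # re-route all interactions of v to u (a real implementation
            # would only visit the neighbors of the merged cluster)
            for k in [k for k in inter if v in k]:
                t = k[0] if k[1] == v else k[1]
                w_vt = inter.pop(k)
                new_key = (min(u, t), max(u, t))
                if new_key in inter:  # double edge: apply the linkage rule
                    inter[new_key] = linkage(inter[new_key], w_vt)
                else:
                    inter[new_key] = w_vt
                if k in constrained or new_key in constrained:
                    constrained.add(new_key)  # constraints survive merges
                heapq.heappush(pq, (-abs(inter[new_key]), new_key))
        elif w < 0 and use_cannot_link:
            constrained.add(key)  # never merge across this edge
    return [find(x) for x in range(num_nodes)]
```

Stale heap entries are skipped lazily (by comparing the popped priority with the current interaction) rather than deleted in place, which keeps the sketch short while preserving the processing order of Algorithm 2.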
Complexity
In the main loop, the algorithm iterates over all edges, but the only iterations with non-constant complexity are the ones involving a merge of two clusters, of which there are at most $N-1$. By using a union-find data structure (with path compression and union by rank), the time complexity of merge and find operations is $\mathcal{O}(\alpha(N))$, where $\alpha$ is the slowly growing inverse Ackermann function. The algorithm then iterates over the neighbors of the merged cluster (at most $N$) and updates/deletes values in the priority queue ($\mathcal{O}(\log |E|)$ per operation). Therefore, similarly to a heap-based implementation of hierarchical agglomerative clustering, our implementation of GASP has a complexity of $\mathcal{O}(N^2 \log N)$. In the worst case, when the graph is dense and $|E| \in \mathcal{O}(N^2)$, the algorithm requires $\mathcal{O}(N^2)$ memory. Nevertheless, in our practical applications the graph is much sparser, so $|E| \in \mathcal{O}(N)$. With single linkage, corresponding to the choice of the Maximum update rule in our framework, the algorithm can clearly be implemented with the more efficient Kruskal's minimum spanning tree algorithm, with complexity $\mathcal{O}(|E| \log |E|)$. Moreover, in the next section we present an efficient implementation of GASP with Absolute Maximum linkage that has a lower empirical complexity.
7.2 Properties of GASP with Absolute Maximum linkage
Remark on graph notation
The definition of a graph proposed by wolf2018mutex makes a distinction between a set of positive edges, associated with positive scalar attributes representing merge affinities, and a set of negative edges, associated with positive attributes representing split tendencies. In our definition, on the other hand, each edge has both an attractive and a repulsive attribute, so we can make the two formulations equivalent by defining:
(2)  
(3) 
Proposition 7.1.
The efficient Mutex Watershed Algorithm 3 (MWS) introduced by wolf2018mutex returns the same final clustering as the GASP Algorithm 2 with cannot-link constraints and an Absolute Maximum update rule:
(4) 
Proof.
Both algorithms sort edges in descending order of the absolute interactions and then iterate over all of them. The only difference is that MWS, after merging two clusters, does not update the interactions between the new cluster and its neighbors. However, since with an Abs. Max. linkage the interaction between two clusters is simply given by the edge with the highest absolute weight, the order in which edges are iterated over in GASP is never modified. Thus, both algorithms perform precisely the same steps and return the same clustering. ∎
Proposition 7.2.
GASP with an Absolute Maximum update rule returns the same final clustering with and without cannot-link constraints.
Proof.
In the GASP Algorithm 2, the clustering is updated only when two clusters are merged and the condition at line 9 is satisfied. We also observe that, in the unconstrained version of GASP, the predicate canBeMerged at line 9 can never be false, because cannot-link constraints are never introduced at line 14. Let us now assume, for the sake of contradiction, that the constrained version of GASP introduces a cannot-link constraint between two clusters sharing a positive interaction and therefore outputs a different clustering than the unconstrained version. This can happen only in the situation shown in Fig. 7, when two clusters are merged and share a common neighboring cluster with the following two properties: a) the neighbor is already constrained with respect to the first merged cluster, with which it shares a repulsive interaction; b) the neighbor shares with the second merged cluster an attractive interaction that is higher in absolute value. Then, according to Eq. 4, the new merged cluster and the neighbor would be constrained while sharing a positive interaction. But this case can never happen: since the attractive interaction is higher in absolute value, it is processed first, so the neighbor is merged with the second cluster before the cannot-link constraint is introduced. ∎
7.3 Predicting signed edge weights with a CNN
Our CNN model outputs affinities in the form of pseudo-probabilities, where low values represent boundary evidence. In order to use them as input for the algorithms in our framework, we map them to positive and negative values (note that, in general, attractive and repulsive interactions can be independently estimated with different classifiers). The most common approaches use additive ailon2008aggregating or logarithmic finkel2008enforcing ; andres2012globally mappings:
(5)  $w_e = p_e - \beta$ (additive)  or  $w_e = \log\frac{p_e}{1-p_e} + \log\frac{1-\beta}{\beta}$ (logarithmic)
where $\beta$ is a bias parameter that allows tuning between over- and under-segmentation. We evaluated both mappings empirically with each of the tested linkage criteria and found that the additive mapping is the best option in all cases apart from the Sum linkage. Note that varying the parameter $\beta$ does not in general define a hierarchy of nested clusterings, so it is not equivalent to varying a threshold parameter in HAC. This hierarchical property only holds for GASP without constraints and with Average, Max or Min linkage.
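The two mappings can be sketched as follows. The exact forms below (including how the bias enters the logarithmic variant) are our assumption of the standard formulations from the cited literature, not a verbatim copy of Eq. 5:

```python
import math

def additive_mapping(p, beta=0.5):
    # shifts pseudo-probabilities into signed weights:
    # p > beta becomes attractive, p < beta repulsive
    return p - beta

def logit_mapping(p, beta=0.5, eps=1e-6):
    # log-odds mapping common in multicut formulations; the bias term
    # shifts the decision boundary between attraction and repulsion
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p)) + math.log((1.0 - beta) / beta)
```

With the neutral bias 0.5 both mappings send p = 0.5 to a weight of zero; the logarithmic variant additionally stretches confident predictions towards large absolute weights.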
7.4 Neuron segmentation and compared methods
Training details
The data from the CREMI challenge is highly anisotropic and contains artifacts like missing sections, staining precipitations and support-film folds. To alleviate difficulties stemming from misalignment, we use a version of the data that was elastically realigned by the challenge organizers with the method of saalfeld2012elastic . We train a 3D U-Net ronneberger2015u ; cciccek20163d using the same architecture as funke2018large and predict long- and short-range affinities as described in lee2017superhuman . In addition to the standard data-augmentation techniques of random rotations, random flips and elastic deformations, we simulate data artifacts: we randomly zero out slices, decrease the contrast of slices, simulate tears, introduce alignment jitter and paste artifacts extracted from the training data. Both funke2018large and lee2017superhuman have shown that these kinds of augmentations can help to alleviate issues caused by EM-imaging artifacts. We use an L2 loss and the Adam optimizer to train the network. The model was trained on all three samples with available ground-truth labels.
THRESH and WSDT
The basic post-processing methods we consider cannot take long-range affinities into account, so we only use direct-neighbor affinities and generate a boundary map by averaging over the three directions. Based on this boundary map, we run connected components (THRESH) and a watershed algorithm seeded at the maxima of the smoothed distance transform (WSDT). For WSDT, the degree of smoothing was optimized such that each region receives as few seeds as possible, without however causing severe under-segmentation. Due to the anisotropy of the data, we generate 2D WSDT superpixels by considering each 2D image in the stack individually.
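The WSDT baseline on a single 2D slice can be sketched as below. This is a hypothetical illustration built from standard SciPy operations (the function name, the seed-detection window and the use of `watershed_ift` are our choices, not the paper's code):

```python
import numpy as np
from scipy import ndimage

def wsdt_2d(boundary_map, boundary_threshold=0.5, sigma=1.0):
    """Sketch of WSDT on one 2D slice: watershed seeded at the maxima of
    the smoothed distance transform of the thresholded boundary map."""
    # distance of each non-boundary pixel to the nearest predicted boundary
    foreground = boundary_map < boundary_threshold
    dist = ndimage.distance_transform_edt(foreground)
    # the degree of smoothing controls how many seeds each region receives
    smoothed = ndimage.gaussian_filter(dist, sigma=sigma)
    # seeds: local maxima of the smoothed distance transform
    maxima = (smoothed == ndimage.maximum_filter(smoothed, size=5)) & foreground
    seeds, n_seeds = ndimage.label(maxima)
    # flood from the seeds over the boundary map (uint8 input required)
    elevation = np.clip(boundary_map * 255, 0, 255).astype(np.uint8)
    return ndimage.watershed_ift(elevation, seeds), n_seeds
```

Increasing `sigma` merges nearby maxima into fewer seeds, which is the trade-off described above between over-seeding and severe under-segmentation.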
Multi-step pipelines
Given the 2D WSDT superpixels, we build a 3D region-adjacency graph in which each node represents a superpixel. The weights of the edges connecting neighboring superpixels are computed by averaging over both the short- and long-range affinities connecting the two regions. We then convert the edge probabilities to signed weights using the logarithmic mapping defined in Eq. 5 and solve the multicut problem on this graph. For our experiments, we use the approximate Kernighan-Lin solver keuper2015efficient ; kernighan1970efficient (WSDT+MC). In some cases, the long-range affinities predicted by the CNN connect two superpixels that are not direct neighbors. We therefore introduce additional lifted edges in the graph and solve an instance of the lifted multicut problem (WSDT+LMC). This time, similarly to the methods mentioned in beier2016efficient , we use a combination of approximate solvers consisting of GAEC and Kernighan-Lin.
7.5 GASP on the full CREMI dataset
Pre-merge processing
For the predictions on the full dataset from the CREMI challenge, we used the padded volumes provided by the challenge. The crops on which we performed a prediction are large, and building a graph with one node per voxel can easily incur a large use of memory, so we decided to perform a pre-processing step by initially merging some nodes together. Simply downsampling the predictions of the CNN would have led to a loss of resolution and performance in the most difficult parts of the dataset. Thus, we decided to pre-merge those connected components of the graph that would in any case be clustered during the first iterations of GASP. To do this, we used a simple approach: we generated a boundary probability map by taking, for each voxel, an average over the affinities in all directions (both short- and long-range ones) and ran THRESH with a conservative threshold parameter to find the connected components. With this approach, pixels are pre-clustered only when they are far away (in all directions) from predicted boundaries. To make sure that different neurons are never merged together by mistake in this pre-processing step, we intersected the segments given by THRESH with the segments given by WSDT. We tested GASP both on the full grid-graph and on this pre-processed graph and did not notice any major differences in the final clustering or the achieved scores, although the version with a pre-processed graph was significantly faster. To reduce the runtime and memory requirements even further, we used only 10% of the long-range connections in the pixel grid-graph, since adding all of them did not improve the scores.
Removing small segments
After running GASP, we use a simple post-processing step to delete small segments on the boundaries, most of which are single-voxel clusters. On the neuron-segmentation predictions, we deleted all regions with fewer than 200 voxels and used a seeded watershed algorithm to expand the bigger segments.
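The intersection used in the pre-merge step described above can be sketched as follows (a hypothetical illustration; the function and argument names are ours):

```python
import numpy as np
from scipy import ndimage

def premerge_nodes(boundary_prob, wsdt_labels, threshold=0.1):
    """Sketch of the pre-merge step: connected components of the
    conservatively-thresholded boundary map (THRESH), intersected with
    the WSDT segments so that two WSDT segments are never pre-merged."""
    comps, _ = ndimage.label(boundary_prob < threshold)
    comps = comps.astype(np.int64)
    # voxels close to a predicted boundary stay singleton graph nodes
    unclustered = comps == 0
    comps[unclustered] = comps.max() + 1 + np.arange(unclustered.sum())
    # two voxels share a pre-merged node only if they agree on BOTH the
    # THRESH component and the WSDT segment
    pairs = comps * (np.int64(wsdt_labels.max()) + 1) + wsdt_labels
    _, merged = np.unique(pairs.ravel(), return_inverse=True)
    return merged.reshape(boundary_prob.shape)
```

The pairing trick (`component_id * (max_label + 1) + wsdt_label`) gives each (THRESH, WSDT) combination a unique integer, so the intersection reduces to a single `np.unique` call.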
Enforcing local merge
In 2D images of urban scenes, one object instance can consist, due to partial occlusion, of multiple components that are not directly connected in the image plane. This is not the case in neuron segmentation, where each neuron should form a single 3D connected component in the volume. To enforce this, we modified the implementation of GASP so that two clusters are merged only when they contain two adjacent supervoxels in the 3D volume; if this condition is not satisfied, the merge is postponed until there is a direct connection. This avoids the introduction of “air bridges” between segments caused by attractive long-range connections in the initial voxel grid-graph. This approach achieved better performance than the one proposed in wolf2018mutex , where all long-range connections in the grid-graph are associated with a negative repulsive edge weight.
GASP linkage | CREMI-Score (lower better) | Rand-Score (higher better) | VI-merge (lower better) | VI-split (lower better) | Runtime (lower better)
Average | 0.226 | 0.936 | 0.315 | 0.494 | 3.49 10
Sum + CLC levinkov2017comparative | 0.282 | 0.906 | 0.358 | 0.510 | 4.64 10
Abs Max wolf2018mutex | 0.322 | 0.897 | 0.286 | 0.735 | 1.24 10
Max + CLC | 0.324 | 0.893 | 0.292 | 0.698 | 6.31 10
Sum keuper2015efficient | 0.334 | 0.872 | 0.461 | 0.444 | 4.74 10
Average + CLC | 0.563 | 0.772 | 0.259 | 1.142 | 2.95 10
Min | 2.522 | 0.030 | 0.197 | 6.365 | 2.97 10
Min + CLC | 2.522 | 0.030 | 0.197 | 6.365 | 4.77 10
Max | 2.626 | 0.028 | 7.069 | 0.026 | 6.04 10
7.6 GASP sensitivity to noise: adding artifacts to CNN predictions
In addition to the comparison on the full training dataset, we performed further experiments on a crop of the more challenging CREMI training sample B, where we perturbed the predictions of the CNN with noise and introduced additional artifacts like missing or spurious boundary evidence.
In the field of image processing there are several ways of adding noise to an image, among which the most common are Gaussian noise and Poisson shot noise. In these cases, the noise at one pixel does not correlate with the neighboring noise values. Predictions of a CNN, on the other hand, are known to be spatially correlated. Thus, we used Perlin noise perlin1985image , one of the most common gradient noises used in procedural pattern generation (in our experiments, we used an open-source implementation of simplex noise perlin2001noise , an improved version of Perlin noise). This type of noise generates spatial random patterns that are locally smooth but have large and diverse variations on bigger scales. We then combined it with the CNN predictions in the following two ways:
(6) 
where the positive factor controls the amount of added noise. The first combination represents an under-clustering-biased prediction, in which the probability for two pixels to be in the same cluster can only be increased (see Fig. 8b), whereas the second represents an over-clustering-biased prediction with probabilities that can only be decreased (Fig. 8c). In the implementation we used, the noise can be generated in an arbitrary number of dimensions and a smoothing factor can be specified for each direction independently. In our experiments, each pixel is represented by a node in the grid-graph and is linked to other nodes by short- and long-range edges. Thus, the output of our CNN model has one channel per edge offset: for each pixel/voxel, it outputs the weights of the different edge connections. We therefore generated a 4-dimensional noise volume that matches the dimensions of the CNN output. The data is highly anisotropic, i.e. it has a lower resolution in one of the dimensions, so we chose different smoothing parameters for the different directions.
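A sketch of this perturbation scheme is shown below. As a stand-in for a simplex-noise library, we generate a spatially-correlated field by Gaussian-smoothing white noise; the biased combinations are our illustrative assumption (the paper's Eq. 6 is not reproduced verbatim):

```python
import numpy as np
from scipy import ndimage

def smooth_noise(shape, sigma, rng):
    """Stand-in for simplex/Perlin noise: a spatially-correlated random
    field obtained by Gaussian-smoothing white noise; `sigma` can be a
    per-axis tuple to handle anisotropic data. Rescaled to [0, 1]."""
    field = ndimage.gaussian_filter(rng.standard_normal(shape), sigma=sigma)
    lo, hi = field.min(), field.max()
    return (field - lo) / (hi - lo)

def perturb(affinities, noise, amount, bias="under"):
    """Biased noise injection: the 'under' variant can only increase merge
    probabilities, the 'over' variant can only decrease them."""
    if bias == "under":
        return np.clip(affinities + amount * noise, 0.0, 1.0)
    return np.clip(affinities - amount * noise, 0.0, 1.0)
```

A 4D `shape` (one axis for the edge-offset channels, three spatial axes) with a per-axis `sigma` reproduces the anisotropic setup described above.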
The experiments summarized in Fig. 4 were performed in the following way: for each amount of added noise, 30 random noise samples were drawn, from which median and percentile statistics were computed for each linkage criterion. For each sample, we randomly selected some of the long-range predictions of the CNN and added them to the pixel grid-graph.
7.7 Finetuning the GMIS pipeline on CityScapes
For our experiments, we used the publicly available model from GMIS liu2018affinity . The model consists of two neural networks with similar structures, one predicting pixel-level semantic scores and the other predicting pixel affinities between instances. We also used all the affinity post-processing methods proposed in liu2018affinity , e.g. excluding the background, resizing regions of interest and the proposed “affinity-refinement” method, which combines semantic and instance outputs. The instance branch of the model was trained with a binary cross-entropy loss, but we noticed that the short-range affinities were biased towards high probabilities, so that strong short-range boundary evidence was never predicted by the model. In liu2018affinity , this problem is handled with a modified version of HAC performed in stages (MultiStepHAC): initially, only short-range affinities are used to run HAC and a low threshold in the hierarchy is chosen to define a first clustering; then a new HAC problem including long-range affinities is initialized with the first clustering. In the method proposed by liu2018affinity , these steps are repeated three times.
Since MultiStepHAC is a rather complex post-processing method that requires tuning several hyperparameters, we opted for a different approach to the problem of unbalanced affinities. We added two 1x1 convolutional layers to the instance-branch model and trained them with the same loss used by wolf2018mutex , which is based on the Sørensen-Dice coefficient dice1945measures ; sorensen1948method . Compared to Hamming-distance-based losses like binary cross-entropy or mean squared error, this loss has the advantage of being robust against prediction and/or target sparsity, which is desirable in this application, since boundaries between instances can be sparse. During training, all affinities involving at least one background pixel were ignored in the loss. In this way, the two added layers specialized in improving the predictions of boundary evidence between adjacent instances (especially those belonging to the same class). We then used an average of these new fine-tuned affinities and the ones predicted by the original model. During the fine-tuning process, only the parameters of the two added convolutional layers were updated.
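A Dice-style loss of this kind can be sketched as below. This is a NumPy stand-in under our own assumptions (the actual training code would use an autodiff framework, and the exact denominator form may differ from the cited loss):

```python
import numpy as np

def soerensen_dice_loss(pred, target, mask=None, eps=1e-6):
    """Sketch of a Sørensen-Dice loss. Because both numerator and
    denominator are sums over the whole volume, the loss stays well-behaved
    when predictions and/or targets are sparse. `mask` can zero out
    affinities touching background pixels, which are ignored in the loss."""
    if mask is not None:
        pred, target = pred * mask, target * mask
    intersection = 2.0 * np.sum(pred * target)
    denom = np.sum(pred ** 2) + np.sum(target ** 2)
    return 1.0 - intersection / (denom + eps)
```

A perfect prediction drives the loss towards 0, while a completely disjoint prediction yields 1, independently of how sparse the boundary targets are.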
Before applying GASP, we performed a parameter search for the bias defined in Eq. 5. Table 6 lists the best-case performances for each of the methods: note that, depending on the GASP linkage criterion, it was necessary to bias the predicted edge weights more or less strongly.
The semantic categories are assigned to each instance in the same way as proposed by liu2018affinity , i.e. with a majority vote based on the semantic output of the model.