Instance Segmentation of Biological Images Using Harmonic Embeddings

by   Victor Kulikov, et al.

We present a new instance segmentation approach tailored to biological images, where instances may correspond to individual cells, organisms or plant parts. Unlike instance segmentation for user photographs or road scenes, in biological data object instances may be particularly densely packed, the appearance variation may be particularly low, the processing power may be restricted, while, on the other hand, the variability of sizes of individual instances may be limited. These peculiarities are successfully addressed and exploited by the proposed approach. Our approach describes each object instance using an expectation of a limited number of sine waves with frequencies and phases adjusted to particular object sizes and densities. At train time, a fully-convolutional network is learned to predict the object embeddings at each pixel using a simple pixelwise regression loss, while at test time the instances are recovered using clustering in the embeddings space. In the experiments, we show that our approach outperforms previous embedding-based instance segmentation approaches on a number of biological datasets, achieving state-of-the-art on a popular CVPPP benchmark. Notably, this excellent performance is combined with computational efficiency that is needed for deployment to domain specialists. The source code is publicly available at Github:






1 Introduction

Instance segmentation (object separation) in biological images often represents a key step in their analysis. Many biological image modalities (e.g. microscopy images of cell cultures) are characterized by excessive numbers of instances. Other modalities (e.g. worm assays) are characterized by tight and complex overlaps and occlusions. On the other hand, in most situations, the scale variations of objects of interest in biomedical data are less drastic than in natural photographs due to the lack of strong perspective effects. In this work, we propose a new instance segmentation approach designed for biological images that addresses the challenges (number of instances, overlaps) while exploiting the simplifying properties (limited scale variation).

Our approach continues the line of works [7, 21] that perform instance segmentation by learning deep embeddings, and using clustering in the embedding space to recover the instances at test time. Learning good embeddings for object instances with a fully-convolutional network is however challenging, especially for biological data, where individual instances may have almost indistinguishable appearance.

To utilize the specific nature of biomedical images, we depart from the end-to-end philosophy of previous embedding-based instance segmentation works and split the learning process into two stages. At the first stage, we seek a small set of harmonic functions that can be used to separate objects in the training dataset. The search is implemented as an optimization process that tunes the frequencies and phases of the harmonics to the specific range of scales and object densities in the data. The selected set of harmonics then guides the second stage of the learning process as well as the inference process at test time.

At the second learning stage, we assign each ground truth object instance its harmonic embedding based on the expectation of the learned set of functions. We then learn a deep fully-convolutional network to predict resulting embeddings at each pixel. We show that learning with a simple pixelwise regression loss is feasible, as long as the information about harmonic functions is provided to convolutional layers in the network (which we achieve using a special new kind of a convolutional layer). The learned networks generalize well to new images, and tend to predict pixel embeddings that can be easily clustered into object instances.

In the experiments, we compare our approach to direct embedding-based instance segmentation [7] as well as to other state-of-the-art methods. Four biomedical datasets are considered, corresponding to plant phenotyping, bacterial and human cell culture microscopy, and C. elegans assays. We observe a considerable improvement in performance brought by our approach.

Figure 1: The harmonic instance segmentation framework. At train time, we embed each pixel of the ground truth image as the mean of predefined guide functions f over the instance it belongs to, resulting in target embeddings. We then train the neural network to reproduce the ground truth embedding given the input image. To simplify learning, the guide functions f are fed into intermediate representations of the network using SinConv layers. The learning process uses a simple pixelwise L1 loss between the ground truth embedding and the neural network prediction as the learning objective. At test time, instances are retrieved from the predicted embedding using mean-shift clustering.

2 Related Work

Proposal-based instance segmentation methods [5, 11, 3, 6, 17, 12] combine object detection with object mask estimation and currently represent the state-of-the-art on non-biological instance segmentation benchmarks. The necessity to perform object detection followed by non-maximum suppression makes learning and tuning of methods from this group complex, especially in the presence of tight object overlaps, where non-maximum suppression becomes fragile.

Another group of approaches to instance segmentation is based on recurrent neural networks (RNNs) that generate instances sequentially. Thus, Romera-Paredes and Torr [24] train a network for end-to-end instance segmentation and counting using LSTM networks [14]. Ren and Zemel [23] proposed a combination of a recurrent network with bounding box proposals. RNN-based frameworks show excellent performance on small datasets; they achieve state-of-the-art results on the CVPPP plant phenotyping dataset. The major problem of recurrent methods is the vanishing gradient effect, which becomes particularly acute when the number of instances is large.

Our method falls into the category of proposal-free approaches to instance segmentation based on instance embedding. In this case, neural networks are used to embed pixels of an image into a hidden multidimensional space, where embeddings of pixels belonging to the same instance should be close, while embeddings of pixels of different objects should be separated. A clustering algorithm may then be applied to separate instances. To achieve this, the approach [9] penalizes pairs of pixels using a logistic distance function in the embedding space. The embedding is learned using a log-loss function and requires weighting pairs of pixels in order to mitigate the size imbalance issue. This method also predicts a seedness score for each pixel, which correlates with centeredness, and uses this score to pick objects from the embedding. Kong et al. [15] use differentiable Gaussian blurring mean-shift for the recurrent grouping of embeddings. Deep Coloring [16] proposes a reduction of instance segmentation to semantic segmentation, where class labels are reused for non-adjacent objects; the instances are then retrieved using connected component analysis.

Most related to ours, De Brabandere et al. [7] use a non-pairwise discriminative loss function composed of two parts: one pushing the embedding centers of different objects further apart, the other pulling the embeddings of pixels of the same object closer to its mean. Instances are retrieved using the mean-shift algorithm. The approach [21] uses metric learning together with an explicit assignment of the center of mass as the target embedding. Our approach follows the general paradigm of [7, 21], but suggests a special kind of embedding detailed below. The use of the new embeddings results in an explicit assignment of an embedding to each pixel in the training image, thus simplifying the learning process.

3 Harmonic Instance Embedding

We now discuss our approach in detail. Existing instance embedding methods [7, 16, 9] do not prespecify target embeddings for pixels in the training set. Instead, they rely on the learning process itself to define these embeddings. In contrast, our goal is to define “good” embeddings for pixels a priori. “Goodness” here means amenability to clustering as well as learnability by a convolutional architecture.

Let $f(p; \theta)$ be a family of real-valued functions in the image domain, where $p = (x, y)$ corresponds to the coordinates of the argument, and $\theta$ is a set of learnable parameters defining the shape of the function (e.g. the frequency vector and the phase of a sine function). As our approach is built in many ways around this family of functions, we call the function family $f$ the guide functions.

Let $S$ be an arbitrary set of pixels (e.g. an object instance in the ground truth annotation of a training image). We denote with $\bar{f}(S; \theta)$ the expectation of $f$ over $S$:

$$\bar{f}(S; \theta) = \frac{1}{|S|} \sum_{p \in S} f(p; \theta) \qquad (1)$$

If $\Theta = (\theta_1, \dots, \theta_N)$ denotes the joint vector of parameters of all $N$ functions, then the guided embedding of an object $S$ determined by $\Theta$ is defined as the following $N$-dimensional vector:

$$e(S; \Theta) = \left[ \bar{f}(S; \theta_1), \dots, \bar{f}(S; \theta_N) \right] \qquad (2)$$

To sum up, the guided embedding maps each object to the expectations of the guide functions over this object.

3.1 Picking good guide functions

Given a new dataset representing a new type of instance segmentation problem, our goal is to find a good set of guide functions (1), so that different objects have well-separated guided embeddings.

To do that, we first restrict $f$ to a certain functional family parameterized by the parameters $\theta$. As discussed above, in many biomedical datasets, there is a certain (imperfect) regularity in the location of objects. E.g. monolayer cell cultures organize themselves into a texture composed of elements of approximately the same size adjacent to each other. Such a loosely-regular, semi-periodic structure calls for the use of harmonic functions as guides:

$$f(x, y;\, u_i, v_i, \phi_i) = \sin\left( 2\pi \left( \frac{u_i x}{W} + \frac{v_i y}{H} \right) + \phi_i \right) \qquad (3)$$

where $W$ and $H$ are the image width and height respectively, $u_i$ and $v_i$ are frequency parameters, and $\phi_i$ is a phase parameter.
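As a concrete illustration, the construction above (harmonic guide functions averaged over an object's pixels to form its embedding) can be sketched in a few lines of NumPy. This is our own illustrative code, not the authors' implementation; the toy frequency range is an arbitrary placeholder:

```python
import numpy as np

def harmonic_guides(pixels, W, H, u, v, phi):
    """Evaluate N harmonic guide functions at pixel coordinates.
    pixels: (P, 2) array of (x, y); u, v, phi: (N,) parameter vectors.
    Returns a (P, N) array of guide-function values."""
    x, y = pixels[:, 0:1], pixels[:, 1:2]               # (P, 1) each
    return np.sin(2 * np.pi * (u * x / W + v * y / H) + phi)

def guided_embedding(pixels, W, H, u, v, phi):
    """Guided embedding of an object: the mean of each guide
    function over the object's pixels."""
    return harmonic_guides(pixels, W, H, u, v, phi).mean(axis=0)

rng = np.random.default_rng(0)
N, W, H = 12, 256, 256
u, v = rng.uniform(-4, 4, N), rng.uniform(-4, 4, N)     # placeholder range
phi = rng.uniform(0, 2 * np.pi, N)

# two identically shaped objects at different locations
xs, ys = np.meshgrid(np.arange(10, 20), np.arange(10, 20))
obj_a = np.stack([xs.ravel(), ys.ravel()], axis=1)
obj_b = obj_a + np.array([40, 0])                       # shifted copy
e_a = guided_embedding(obj_a, W, H, u, v, phi)
e_b = guided_embedding(obj_b, W, H, u, v, phi)
print(e_a.shape, float(np.abs(e_a - e_b).sum()))        # (12,) and a positive L1 gap
```

Note that two objects of identical shape but different location receive different embeddings, which is exactly what makes the representation usable for separating instances.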

Assume now that a set of training images is given. We can then estimate the quality of guided embeddings by looking at pairs $S_i$ and $S_j$ of objects belonging to the same image (e.g. two different cells from the same image) and finding out how frequently they have very close embeddings. Ideally, we want to avoid such collisions in the embedding space (at the very least, we want to avoid them on the training set). The following loss is therefore considered:

$$L(\Theta) = \sum_{k} \sum_{(S_i, S_j) \in P_k} \max\left( 0,\; m - \left\| e(S_i; \Theta) - e(S_j; \Theta) \right\|_1 \right) \qquad (4)$$

where $\|\cdot\|_1$ is the $L_1$ distance, $m$ is the margin meta-parameter, and $P_k$ denotes the set of all pairs of objects from the training image $k$. Each individual term in (4) is a hinge loss term that is non-zero if the guided embeddings of a certain object pair are too close (closer than $m$).

To find good guide functions, we minimize the loss (4) on the training set. We perform stochastic gradient descent by drawing minibatches of random pairs of objects from random images and updating $\Theta$ to minimize (4) for the pairs in the minibatch. In our implementation, we initialize the frequency parameters $u_i$ and $v_i$ to uniformly distributed random numbers from a fixed interval, while the phase parameters $\phi_i$ are also initialized uniformly at random.

The outcome of the learning is a set of guide functions such that pairs of objects from training images have their guided embeddings separated in the embedding space. For typical settings ($N = 12$ and $m = 0.5$ in our experiments), most pairs in the training set end up isolated by more than the margin.
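This first optimization stage can be sketched as follows. The code is our own illustrative PyTorch version, not the authors' implementation; the initialization ranges, learning rate, and iteration count are placeholders, and a real run would draw minibatches of object pairs from many images rather than the single toy pair used here:

```python
import math
import torch

def guided_embedding(pixels, W, H, u, v, phi):
    """Mean of the harmonic guide functions over an object's pixels."""
    x, y = pixels[:, :1], pixels[:, 1:]
    return torch.sin(2 * math.pi * (u * x / W + v * y / H) + phi).mean(0)

N, W, H, margin = 12, 256, 256, 0.5
u = torch.nn.Parameter(torch.rand(N) * 8 - 4)    # placeholder init range
v = torch.nn.Parameter(torch.rand(N) * 8 - 4)
phi = torch.nn.Parameter(torch.rand(N) * 2 * math.pi)
opt = torch.optim.Adam([u, v, phi], lr=1e-2)

# toy "training pair": two adjacent 10x10 squares that must be separated
xs = torch.arange(10).repeat(10).float()
ys = torch.arange(10).repeat_interleave(10).float()
obj_a = torch.stack([xs, ys], dim=1)
obj_b = obj_a + torch.tensor([10.0, 0.0])

for _ in range(200):
    opt.zero_grad()
    e_a = guided_embedding(obj_a, W, H, u, v, phi)
    e_b = guided_embedding(obj_b, W, H, u, v, phi)
    # hinge loss of eq. (4): non-zero when embeddings are closer than the margin
    loss = torch.clamp(margin - (e_a - e_b).abs().sum(), min=0.0)
    loss.backward()
    opt.step()
print(float(loss))
```

After this stage, the parameters $u$, $v$, $\phi$ are frozen and define the target embeddings for the second stage.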

3.2 Learning good embedding network

Assume now that the parameters of the guide functions have been optimized on the training set, so that $\Theta$ is now fixed. To derive the loss for the second stage of the training process, we further denote by $S(p)$ the mapping from a pixel $p$ to the object containing this pixel.

We then train a deep fully-convolutional embedding network $\Phi(\cdot\,; \omega)$ with parameters $\omega$ to map input images to $N$-channel images, where each pixel is assigned an $N$-dimensional embedding. During learning, we minimize the following simple loss function:

$$L(\omega) = \sum_{p \in \mathcal{F}(I)} \left\| \Phi(I; \omega)[p] - e\left( S(p); \Theta \right) \right\|_1 \qquad (5)$$

Here, $\mathcal{F}(I)$ denotes the set of foreground pixels of the image $I$, and $\Phi(I; \omega)[p]$ denotes the output of the network at the spatial position $p$ (if the foreground/background segmentation is not available, the summation is taken over the full image). By minimizing (5), we encourage the network to map each pixel to the guided embedding of the object it belongs to.
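To make the training target of loss (5) concrete, the following sketch (our own illustrative code) builds the dense ground-truth embedding image from an instance label map and evaluates the masked L1 loss:

```python
import numpy as np

def target_embedding_image(labels, guides):
    """labels: (H, W) integer instance map, 0 = background.
    guides: (H, W, N) per-pixel guide-function values.
    Returns an (H, W, N) target where each foreground pixel carries
    the mean guide value over its instance."""
    target = np.zeros_like(guides)
    for inst in np.unique(labels):
        if inst == 0:
            continue                              # skip background
        mask = labels == inst
        target[mask] = guides[mask].mean(axis=0)  # embedding of instance S(p)
    return target

def pixelwise_l1(pred, target, labels):
    """Loss (5): L1 distance summed over foreground pixels only."""
    fg = labels > 0
    return np.abs(pred - target)[fg].sum()

# tiny example: a 4x4 image with two instances
labels = np.array([[1, 1, 0, 2],
                   [1, 1, 0, 2],
                   [0, 0, 0, 2],
                   [0, 0, 0, 0]])
guides = np.random.default_rng(1).normal(size=(4, 4, 3))
t = target_embedding_image(labels, guides)
print(pixelwise_l1(t, t, labels))  # 0.0 when the prediction matches the target
```

Because the target is fully specified before training, the network is learned with plain regression rather than with a metric-learning objective.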

We have found that standard fully-convolutional architectures (e.g. U-Net [25]) perform very well and achieve low train and test losses (5), provided one important modification to the convolutional layers is made. When modifying a convolutional layer, we augment its input with an extra set of $N$ maps holding the guide function values. Specifically, the extra maps contain the values $f(s \cdot p;\, \theta_i)$ at each spatial position $p$, where $s$ is the downsampling factor of the layer (compared to the input/output resolution). The downsampling factor is needed to make sure that the augmenting maps in different layers are spatially aligned with the output.

Note that our augmentation idea generalizes the recently suggested CoordConv layer [19], which augmented the input of convolutional layers with the $x$ and $y$ coordinate maps. By analogy, and since the guide functions in our implementation are harmonic, we call the new operation the SinConv layer (Figure 2).
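A SinConv layer might be sketched as below. The module name comes from the paper, but the interface, kernel size, and other details are our assumptions; only the idea of concatenating precomputed harmonic maps, evaluated at positions scaled by the layer's downsampling factor, comes from the text:

```python
import math
import torch
import torch.nn as nn

class SinConv(nn.Module):
    """Convolution whose input is augmented with N fixed harmonic guide
    maps sin(2*pi*(u*x/W + v*y/H) + phi), evaluated at positions scaled
    by the layer's downsampling factor so all layers stay aligned."""
    def __init__(self, in_ch, out_ch, u, v, phi, image_hw, downsample=1):
        super().__init__()
        self.u, self.v, self.phi = u, v, phi             # (N,) fixed tensors
        self.H, self.W = image_hw                        # full-resolution size
        self.s = downsample                              # layer's factor
        self.conv = nn.Conv2d(in_ch + len(u), out_ch, kernel_size=3, padding=1)

    def guide_maps(self, h, w):
        ys = torch.arange(h).float().view(h, 1) * self.s  # full-res y coords
        xs = torch.arange(w).float().view(1, w) * self.s  # full-res x coords
        arg = 2 * math.pi * (self.u[:, None, None] * xs / self.W
                             + self.v[:, None, None] * ys / self.H) \
              + self.phi[:, None, None]
        return torch.sin(arg)                             # (N, h, w)

    def forward(self, x):
        b, _, h, w = x.shape
        g = self.guide_maps(h, w).unsqueeze(0).expand(b, -1, -1, -1)
        return self.conv(torch.cat([x, g.to(x.device)], dim=1))

N = 12
layer = SinConv(8, 16, torch.rand(N), torch.rand(N), torch.rand(N),
                image_hw=(64, 64), downsample=2)          # a 2x-downsampled layer
out = layer(torch.randn(2, 8, 32, 32))
print(out.shape)  # torch.Size([2, 16, 32, 32])
```

Replacing the harmonic maps with plain x/y coordinate maps would recover CoordConv [19] as a special case.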

Figure 2: The SinConv layer maps a representation block of shape $C \times H' \times W'$ to a new representation block of shape $(C + N) \times H' \times W'$ by concatenating the $N$ guide function maps, and then performs convolution. The use of SinConv blocks greatly simplifies the task of learning to regress harmonic embeddings.

Our embedding architectures follow the design principles of state-of-the-art semantic segmentation frameworks [25, 31, 13], which are composed of encoder and decoder pathways. We use SinConv layers in the upsampling part (the “decoder”) only, so that a pretrained “backbone” network can be used.

3.3 Instance segmentation of test images

At test time, the application of the learned embedding network is straightforward. The network is applied to an input image, and our post-processing is then similar to the one suggested in [7]: we use the mean-shift clustering algorithm [4] to obtain instance masks from the embedding space (Figure 3, bottom row). The mean-shift bandwidth is set to the margin $m$ used during guide function selection, since both parameters have the meaning of the desirable separation between the embeddings of different instances.
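The post-processing step can be sketched with scikit-learn's MeanShift (our own illustrative code; the authors' implementation may differ, and the bandwidth value follows the margin used in our experiments):

```python
import numpy as np
from sklearn.cluster import MeanShift

def cluster_embeddings(embedding, fg_mask, bandwidth=0.5):
    """embedding: (H, W, N) predicted per-pixel embeddings.
    fg_mask: (H, W) boolean foreground mask.
    Returns an (H, W) instance label map (0 = background)."""
    labels = np.zeros(fg_mask.shape, dtype=np.int64)
    pts = embedding[fg_mask]                        # (P, N) foreground points
    if len(pts):
        ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
        labels[fg_mask] = ms.fit_predict(pts) + 1   # 1-based instance ids
    return labels

# toy check: two well-separated constant embeddings -> two instances
emb = np.zeros((4, 6, 2))
emb[:, :3] = [0.0, 0.0]
emb[:, 3:] = [2.0, 2.0]
fg = np.ones((4, 6), dtype=bool)
lab = cluster_embeddings(emb, fg)
print(len(np.unique(lab)))  # 2
```

Because the predicted embeddings closely match the well-separated targets, the clusters are compact and the bandwidth choice is not delicate.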

Figure 3: Example of ground truth (top row) and predicted (bottom row) embedding. Embeddings predicted by the network are very close to the ground truth, which greatly simplifies the clustering-based post-processing.

4 Experiments

We provide results of our method on three challenging biomedical datasets of bright-field microscopy images (C. elegans, E. coli, and HeLa) and on the plant phenotyping dataset (CVPPP 2017, sequence A1). In each case, learning was done on a single NVidia Tesla V100 GPU. Training was performed in all cases using the ADAM optimizer with learning rate 1e-5. All code was implemented using the PyTorch framework [1].


The architecture and data augmentation were the same for all datasets. In our experiments we used the U-Net [25] neural network and replaced the first convolution of each upscaling block with the SinConv layer. The network was trained from scratch. Due to the small number of training images in these datasets, we added several data augmentation procedures, namely patch cropping, scaling, and left-right flips. The number of embedding dimensions was set to $N = 12$ (with this dimensionality and $m = 0.5$ we obtained zero hinge loss (4) during guide function selection), and the mean-shift bandwidth was accordingly set to 0.5. Note that the availability of parameters that work well for diverse datasets is very important for practitioners. The number of training epochs was set differently for different datasets due to their varying complexity.

We used the Symmetric Best Dice coefficient (SBD) and average precision (AP) as metrics. The SBD metric matches each label in one set (prediction or ground truth) to the label in the other set with the maximum overlap, averages these best overlaps, and reports the minimum over the two matching directions. The AP metric integrates precision over different recall values.
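For reference, the SBD computation can be sketched as follows. This is our own implementation of the standard Symmetric Best Dice definition (cf. the CVPPP collation study [27]), not code from the paper:

```python
import numpy as np

def dice(a, b):
    """Dice overlap of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def best_dice(src, dst):
    """Average, over the instances of `src`, of the best Dice overlap
    against any instance of `dst` (integer label maps, 0 = background)."""
    scores = []
    for i in np.unique(src[src > 0]):
        scores.append(max(dice(src == i, dst == j)
                          for j in np.unique(dst[dst > 0])))
    return float(np.mean(scores))

def sbd(gt, pred):
    """Symmetric Best Dice: the minimum of the two directed scores."""
    return min(best_dice(gt, pred), best_dice(pred, gt))

gt = np.array([[1, 1, 2, 2],
               [1, 1, 2, 2]])
pred = np.array([[1, 1, 2, 2],
                 [1, 1, 0, 2]])   # one pixel of instance 2 missed
print(round(sbd(gt, pred), 3))    # 0.929
```

Taking the minimum over both directions penalizes both over- and under-segmentation, which a one-directional best-match score would not.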

We used De Brabandere et al. [7] as the main baseline and reimplemented their approach using the same network architecture as ours (variants both with and without SinConv layers were tried). On the CVPPP dataset, where the result of the authors' implementation is known, our re-implementation scores considerably better, suggesting that it forms a strong baseline.

4.1 CVPPP dataset

The Computer Vision Problems in Plant Phenotyping (CVPPP) dataset [26] (Figure 4) is one of the most popular instance segmentation benchmarks. The dataset consists of five sequences of different plants. We used the most common sequence A1, which has the most significant number of baselines. The A1 sequence has 128 top-down view images as a training set, and an additional hidden test set with 33 images from the same sequence. The instance segmentation task is challenging because of the high variety of leaf shapes and the complex occlusions between leaves. The performance of competing algorithms is measured by the SBD metric and the absolute difference in counting (|DiC|, c.f. [27]).

To fit the embedding space, we trained the neural network for 500 epochs. Table 1 shows that our method is currently state-of-the-art compared to all published methods.

Figure 4: Sample results on the plant phenotyping dataset (CVPPP). The top row shows the source images, the second row shows our results, and the third row shows the ground truth (color display required).
Method                                              |DiC|   SBD (%)
IPK [22]                                             2.6    74.4
Nottingham [27]                                      3.8    68.3
MSU [27]                                             2.3    66.7
Wageningen [29]                                      2.2    71.1
PRIAn [10]                                           1.3    -
Recurrent IS [24]                                    1.1    56.8
Recurrent IS + CRF [24]                              1.1    66.6
Recurrent with attention [23]                        0.8    84.9
Discriminative loss [7]                              1.0    84.2
Deep coloring [16]                                   2.0    80.4
Discriminative loss [7] (our impl., no SinConv)      4.     88.0
Discriminative loss [7] (our impl., with SinConv)    4.     89.0
Ours without SinConv                                 5.     78.3
Ours                                                 3.     89.9
Table 1: Quantitative results on the CVPPP dataset (methods with published descriptions as well as our method and baselines included). Our method performs best in terms of the Symmetric Best Dice coefficient (SBD); the absolute difference in counting (|DiC|) is also reported.

4.2 E.coli dataset

The E. coli dataset (Figure 5) is interesting because the number of organisms is large and they are crowded. The dataset contains 37 brightfield images. The ground truth is derived using the watershed algorithm [2] from weak annotations, in which every organism is annotated by a line segment.

At test time, images were processed in non-overlapping crops. The SBD score was calculated for each crop independently and then averaged. The performance of our method is better than that of other methods previously evaluated on this dataset (Table 2). Unfortunately, we were not able to obtain reasonable results from the method [7], probably due to the drastic change in the number of organisms between different crops.

Figure 5: E. coli bacteria recorded under differential interference contrast microscopy. From left to right: raw image, result of our segmentation, ground truth labels.
Method                |DiC|   SBD (%)
U-Net baseline [25]     -     59.3
Deep coloring [16]     2.2    61.9
Ours                  0.88    81.2
Table 2: Results on the E. coli dataset. We follow the protocol from [26] in order to compare with [25] and [16].

4.3 HeLa dataset

The HeLa cancer cells dataset (courtesy of Dr. Gert van Cappellen, Erasmus Medical Center, Rotterdam) (Figure 6) is quite different from the other three datasets. Cells take up a large part of each image and, being cancerous, are more irregular and form intricate patterns. In contrast to the small and crowded E. coli pictures, the number of cells is moderate, but they have larger areas and more diverse sizes. The dataset contains 18 partially annotated single-channel training images. Following best practices, we split the dataset into train and test parts (9 images each). The goal of this experiment is thus to show that our method can generalize well given very few training examples.

We trained the network for 8000 epochs. No information about the background was used for this dataset. On this dataset, the SBD achieved by our method (both without and with the foreground mask) slightly outperforms the semantic segmentation (IOU) baseline reported in [25]. Our implementation of the baseline method [7] did not produce reasonable results in this configuration.

Figure 6: HeLa cells on glass recorded with differential interference contrast microscopy. From left to right: raw image, result of our segmentation, ground truth labels.

4.4 C.elegans dataset

Figure 7: Binary images of the C. elegans roundworm. Top row: sample image crops; second row: our results; third row: ground truth labels (color display needed).

Finally, we look at the C. elegans dataset (Figure 7), which is available from the Broad Bioimage Benchmark Collection [20]. This sequence contains 97 two-channel images of 696 x 520 pixels, each showing the roundworm C. elegans. Each image contains approximately 30 organisms, some of them in complex overlapping patterns. In order to compare with the results from [21], we follow their protocol: the dataset was split into two parts, a 50-image training set and 47 test images. Here, we use the binary segmentation masks (following [21, 28, 30]). The network was trained for 1000 epochs.

Semi-convolutional operators [21] 0.569 0.885 0.661 0.511 0.671
Mask RCNN [12] 0.559 0.865 0.641 0.502 0.650
Discriminative loss [7] (our implementation without SinConv) 0.343 0.624 0.380 0.441 0.563
Discriminative loss [7] (our implementation with SinConv) 0.478 0.771 0.560 0.551 0.677
Ours 0.724 0.900 0.723 0.775 0.875
Table 3: Results on the C. elegans dataset. The results were obtained using the COCO standard metric [18]. Again, the proposed method outperforms previous approaches.

We evaluate the instance segmentation using the average precision (AP) metric, computed with the standard COCO evaluation protocol [18]. From Table 3 it can be seen that our method outperforms previous works, including the well-known Mask R-CNN method [12] (as reported by the authors of [21]).

4.5 Method limitations

Despite improving the state-of-the-art results on biological datasets, the proposed method has several limitations that need to be resolved before it can be applied to more complex datasets with severe variety in object scales, such as COCO [18], PASCAL VOC [8], and Cityscapes [5] (where our initial attempts to apply the method led to mediocre, i.e. mid-table, results). To the best of our understanding, the sub-par performance is caused by the inability to handle very diverse scales gracefully. We are currently investigating multi-scale schemes as well as other families of guide functions, which may potentially improve the results.

5 Conclusions

We have presented a new instance segmentation approach that exploits the peculiarities of biological images. The approach is built around a new type of embedding based on sine waves, with parameters adjusted to achieve separation of ground truth instances in the training dataset prior to the main training stage. We have shown that such precomputation of good embeddings greatly simplifies the learning stage, provided the same guide patterns are fed into some of the convolutional layers of the embedding network. The ease of training is evidenced by the superior performance of our method compared to [7].

In the experiments we have shown the ability of our method to handle rather diverse biological image data, while using the same relatively small architecture and the same set of meta-parameters. Such versatility is valuable for practical deployment of the method to domain specialists.

The source code is publicly available at Github:


This work was supported by the Skoltech NGP Program (MIT-Skoltech 1911/R). Assistance of Skoltech HPC team is deeply appreciated. High-performance computations presented in the paper were carried out on Skoltech HPC cluster Zhores.


  • [1] PyTorch: tensors and dynamic neural networks in Python with strong GPU acceleration. Accessed: 2018-11-15.
  • [2] A. Bieniek and A. Moga. An efficient watershed algorithm based on connected components. Pattern recognition, 33(6):907–916, 2000.
  • [3] Y.-T. Chen, X. Liu, and M.-H. Yang. Multi-instance object segmentation with occlusion handling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3470–3478, 2015.
  • [4] D. Comaniciu and P. Meer. Mean shift: A robust approach toward feature space analysis. IEEE Transactions on pattern analysis and machine intelligence, 24(5):603–619, 2002.
  • [5] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3213–3223, 2016.
  • [6] J. Dai, K. He, Y. Li, S. Ren, and J. Sun. Instance-sensitive fully convolutional networks. In European Conference on Computer Vision, pages 534–549. Springer, 2016.
  • [7] B. De Brabandere, D. Neven, and L. Van Gool. Semantic instance segmentation with a discriminative loss function. arXiv preprint arXiv:1708.02551, 2017.
  • [8] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results.
  • [9] A. Fathi, Z. Wojna, V. Rathod, P. Wang, H. O. Song, S. Guadarrama, and K. P. Murphy. Semantic instance segmentation via deep metric learning. arXiv preprint arXiv:1703.10277, 2017.
  • [10] M. V. Giuffrida, M. Minervini, and S. A. Tsaftaris. Learning to count leaves in rosette plants. 2016.
  • [11] B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik. Simultaneous detection and segmentation. In European Conference on Computer Vision, pages 297–312. Springer, 2014.
  • [12] K. He, G. Gkioxari, P. Dollar, and R. Girshick. Mask r-cnn. In The IEEE International Conference on Computer Vision (ICCV), Oct 2017.
  • [13] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • [14] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
  • [15] S. Kong and C. Fowlkes. Recurrent pixel embedding for instance grouping. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9018–9028, 2018.
  • [16] V. Kulikov, V. Yurchenko, and V. Lempitsky. Instance segmentation by deep coloring. arXiv preprint arXiv:1807.10007, 2018.
  • [17] Y. Li, H. Qi, J. Dai, X. Ji, and Y. Wei. Fully convolutional instance-aware semantic segmentation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
  • [18] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft coco: Common objects in context. In European conference on computer vision, pages 740–755. Springer, 2014.
  • [19] R. Liu, J. Lehman, P. Molino, F. P. Such, E. Frank, A. Sergeev, and J. Yosinski. An intriguing failing of convolutional neural networks and the coordconv solution. arXiv preprint arXiv:1807.03247, 2018.
  • [20] V. Ljosa, K. L. Sokolnicki, and A. E. Carpenter. Annotated high-throughput microscopy image sets for validation. Nat Methods, 9(7):637, 2012.
  • [21] D. Novotny, S. Albanie, D. Larlus, and A. Vedaldi. Semi-convolutional operators for instance segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), pages 86–102, 2018.
  • [22] J.-M. Pape and C. Klukas. 3-d histogram-based segmentation and leaf detection for rosette plants. In ECCV Workshops (4), pages 61–74, 2014.
  • [23] M. Ren and R. S. Zemel. End-to-end instance segmentation with recurrent attention. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
  • [24] B. Romera-Paredes and P. H. S. Torr. Recurrent instance segmentation. In European Conference on Computer Vision, pages 312–329. Springer, 2016.
  • [25] O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
  • [26] H. Scharr, M. Minervini, A. Fischbach, and S. A. Tsaftaris. Annotated image datasets of rosette plants. In European Conference on Computer Vision. Zürich, Suisse, pages 6–12, 2014.
  • [27] H. Scharr, M. Minervini, A. P. French, C. Klukas, D. M. Kramer, X. Liu, I. Luengo, J.-M. Pape, G. Polder, D. Vukadinovic, et al. Leaf segmentation in plant phenotyping: a collation study. Machine vision and applications, 27(4):585–606, 2016.
  • [28] C. Wählby, T. Riklin-Raviv, V. Ljosa, A. L. Conery, P. Golland, F. M. Ausubel, and A. E. Carpenter. Resolving clustered worms via probabilistic shape models. In 2010 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pages 552–555. IEEE, 2010.
  • [29] X. Yin, X. Liu, J. Chen, and D. M. Kramer. Multi-leaf tracking from fluorescence plant videos. In Image Processing (ICIP), 2014 IEEE International Conference on, pages 408–412. IEEE, 2014.
  • [30] V. Yurchenko and V. Lempitsky. Parsing images of overlapping organisms with deep singling-out networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6280–6288, 2017.
  • [31] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia. Pyramid scene parsing network. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.