The long-term goal of our work has been the automatic segmentation of cells found in different modalities of microscope images, ultimately to help quantify biological studies (see e.g. [1, 2, 3, 4]). The task remains a challenge, particularly when cells are densely packed in clusters exhibiting a range of signals and when training with a small number of weak annotations (see Fig. 1). Separating cluttered cells is especially difficult when shared edges have low contrast and look similar to cell interiors. Weak annotations, when incomplete and inaccurate, can harm the learning process, as the optimizer may be unable to decide whether annotated and non-annotated regions with the same patterns should be segmented or not. Our proposed solutions aim to resolve these problems with advances in loss formulation, class imbalance handling, multiclass classification, and data augmentation.
We propose a new deep learning multiclass segmentation method which classifies pixels into four distinct classes (background, cell, touching, and gap) by minimizing a loss function that penalizes both cross entropy and Youden’s $J$ statistic. Pixels and voxels classified as touching and gap become either cell or background in a post-processing step, producing a final segmentation containing a single mask for each individual cell in the image.
We build upon our recent work [1, 2] to further improve multiclass cell segmentation. The introduction of a fourth class, named gap, and of a new loss leads to better segmentations in which small regions separating nearby cells are now correctly classified as background. Slim cell protrusions are also correctly classified thanks to the balancing offered by our proposed loss.
Previous work. Recently proposed loss functions for segmentation [1, 5, 6, 7] incorporate a differentiable surrogate of a known performance measure. Unfortunately, these are not sufficient to cope with the high data imbalance typical of biomedical image segmentation. Kervadec et al. review regional losses and propose a contour-based loss as an alternative to combat imbalance. The work of Brosch et al. bears similarities to ours, as they model their loss as a linear combination of sensitivity and specificity measures. However, they use mean squared errors instead and recommend heavily weighting specificity (95%) to the detriment of sensitivity (5%), which we believe goes against the importance of balancing both measures equally. Sudre et al. proposed using the generalized Dice overlap introduced by Crum et al. as a loss function to counter imbalance in segmentation. Balance is achieved by explicitly weighting classes, as in earlier work, but now inversely proportional to the squared number of pixels in each class. In our experience, this works to isolate cell clusters but is not enough to isolate the cells within a cluster.
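The class weighting in Sudre et al.'s loss can be made concrete with a short NumPy sketch of the generalized Dice loss, using weights $w_k = 1/n_k^2$. The `(K, ...)` array layout, the `eps` smoothing term, and the function name are our own assumptions. Classes absent from an image receive very large weights here, which a practical implementation would need to guard against.

```python
import numpy as np

def generalized_dice_loss(p, y, eps=1e-7):
    """Generalized Dice loss with class weights w_k = 1 / n_k**2.

    p: predicted probabilities, shape (K, ...), summing to 1 over axis 0.
    y: one-hot ground truth with the same shape as p.
    """
    axes = tuple(range(1, p.ndim))            # sum over all spatial axes
    n = y.sum(axis=axes)                       # n_k: number of pixels per class
    w = 1.0 / (n ** 2 + eps)                   # rare classes receive large weights
    intersection = np.sum(w * (p * y).sum(axis=axes))
    union = np.sum(w * (p + y).sum(axis=axes))
    return 1.0 - 2.0 * intersection / (union + eps)
```

A perfect prediction gives a loss close to 0. A prediction with no overlap gives exactly 1.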
Pixel weights have been adopted as a strategy to balance data [12, 9], including shape-aware weights. While advantageous, they are not sufficient to fully separate packed cells or resolve fine details. Equibatches is yet another balancing strategy for segmentation: it forces training examples from all classes to be present in every training iteration. Multiclass deep learning training for cell segmentation has been adopted for 2D images and for 3D confocal stacks.
Notation. The goal of panoptic segmentation is to assign to each pixel or voxel $x$ of a single channel image $I:\Omega \to \mathbb{R}$ a semantic label, and an instance label when that semantic label belongs to a countable category. For learning a segmentation we are given a training set $\{(I_k, L_k)\}_k$ where for every image $I_k$ we know its ground truth segmentation $L_k$. In general, we have $L:\Omega \to \{0, 1, \dots, N\}$, a mapping where $L(x) = 0$ for $x$ in the background and $L(x)$ is a unique label for each object in the image. Our task is cast as a semantic segmentation problem by modifying our previous three-class approach to transform the instance annotation $L$ into a semantic ground truth $S$, generalizing it to higher dimensions by using a neighborhood $\mathcal{N}(x)$. Let $y$ be the one hot representation of the $K$ classes in the semantic mapping $S$, and $n_k = \sum_{x \in \Omega} y_k(x)$ the number of elements of class $k$. We call $B_\Phi(L)$ the bottom hat transform of $L$ using structuring element $\Phi$, a hyper-sphere whose size is data dependent. The output of our trained network is a probability map $p$ such that $\sum_{k} p_k(x) = 1$ for every $x \in \Omega$. A post-processing step similar to one proposed in earlier work is then applied to build a panoptic segmentation from $p$.
Gap class. We have previously shown that using three semantic classes, namely image background, cell interior, and touching region, increases the network's discriminative power when segmenting cluttered cells [2, 1]. However, misclassified background regions persisted in some cases (see Fig. 1). We speculate this is due to background information being lost when nearby cells merge in the U-Net contracting path, information which is not fully recovered in the up-sampling path. By introducing a new training class representing the gap between nearby cells, the network can now classify the regions separating nearby cells as background. We name this new class, not surprisingly, gap (white pixels in Fig. 1). These regions are obtained using the bottom hat transform. Given an instance annotation $L$, a semantic ground truth $S$ of our four classes is defined as
$$
S(x) =
\begin{cases}
\text{touching}, & L(x) > 0 \text{ and } \exists\, z \in \mathcal{N}(x) : L(z) \notin \{0, L(x)\},\\
\text{cell}, & L(x) > 0 \text{ and } x \text{ is not touching},\\
\text{gap}, & L(x) = 0 \text{ and } B_\Phi(L)(x) > 0,\\
\text{background}, & \text{otherwise}.
\end{cases}
$$
That is, if $x$ is in the background and lies in the bottom hat transform, then $x$ is a gap pixel/voxel, $S(x) = \text{gap}$.
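The four-class definition can be sketched with SciPy. The sketch assumes a $3^d$ neighborhood for $\mathcal{N}(x)$ and a flat ball for $\Phi$. The class codes, `gap_radius`, and the function names are hypothetical; they are not the values used in our experiments. `scipy.ndimage.black_tophat` computes the closing minus the input, i.e. the bottom hat transform. Constant zero padding keeps image borders from producing spurious gap pixels.

```python
import numpy as np
from scipy import ndimage as ndi

BACKGROUND, CELL, TOUCHING, GAP = 0, 1, 2, 3   # hypothetical class codes

def ball(radius, ndim):
    """Binary hyper-sphere (disk in 2D, ball in 3D) structuring element."""
    grid = np.indices((2 * radius + 1,) * ndim) - radius
    return (grid ** 2).sum(axis=0) <= radius ** 2

def four_class_ground_truth(labels, gap_radius=2):
    """Turn an instance map L (0 = background) into a four-class map S."""
    # Signed ints: the bottom hat can be negative on cells near the border.
    labels = np.asarray(labels, dtype=np.int64)
    fg = labels > 0
    size = (3,) * labels.ndim                      # assumed 3^d neighborhood N(x)
    # Touching: a cell pixel with a different, non-zero label in N(x).
    nb_max = ndi.maximum_filter(labels, size=size)
    nb_min = ndi.minimum_filter(np.where(fg, labels, labels.max() + 1), size=size)
    touching = fg & ((nb_max != labels) | (nb_min != labels))
    # Gap: a background pixel where the bottom hat (closing minus L) is positive.
    bottom_hat = ndi.black_tophat(labels, footprint=ball(gap_radius, labels.ndim),
                                  mode="constant", cval=0)
    gap = ~fg & (bottom_hat > 0)
    sem = np.full(labels.shape, BACKGROUND, dtype=np.uint8)
    sem[fg] = CELL
    sem[touching] = TOUCHING
    sem[gap] = GAP
    return sem
```

With two cells one pixel apart, the separating column becomes gap. With two adjacent cells, the shared boundary becomes touching.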
The $J$ statistic was formulated by statistician William J. Youden to improve the rating of diagnostic tests for diseases. A high index for a test implies that the test can predict with high probability whether an individual is diseased or not. An ideal test would eliminate false negatives (sick, at-risk individuals falsely reported as healthy) and false positives (healthy individuals falsely reported as sick), thus always reporting diseased (true positive) and healthy (true negative) individuals with certainty. Youden modeled $J$ as the average success of a test in reporting the proportions of diseased and healthy individuals. The effectiveness of this index in binary classification is due to the equal importance it gives to correctly classifying the subjects belonging and not belonging to a class, giving equal weight to the true positive (sensitivity) and true negative (specificity) rates. $J$ is thus a suitable measure for predicting segmentations with our imbalanced classes. With $n_k$ the number of pixels/voxels of class $k$, we typically have $n_{\text{touching}}, n_{\text{gap}} \ll n_{\text{background}}, n_{\text{cell}}$, i.e. the touching and gap classes comprise few pixels/voxels compared to the background and cell classes. We can write $J = \mathrm{TPR} + \mathrm{TNR} - 1$. We thus have $J \in [-1, 1]$, and we aim to penalize negative correlations and obtain a high $J$ after training.
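As a quick check of $J = \mathrm{TPR} + \mathrm{TNR} - 1$, the hypothetical helper below computes $J$ from confusion counts. Consider a classifier that labels all 1000 pixels of an image as background when 10 of them are actually touching. Its accuracy is 0.99, but its $J$ is 0, which is the behavior we want under imbalance.

```python
def youden_j(tp, fn, tn, fp):
    """Youden's J = sensitivity + specificity - 1 from confusion counts."""
    tpr = tp / (tp + fn)   # sensitivity, true positive rate
    tnr = tn / (tn + fp)   # specificity, true negative rate
    return tpr + tnr - 1.0

# All 1000 pixels predicted as background; 10 of them are really touching.
print(youden_j(tp=0, fn=10, tn=990, fp=0))  # 0.0, although accuracy is 0.99
```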
Borrowing ideas from prior work on imbalanced classification, we compare $J$ to other popular measures used in loss surrogates [5, 6, 7]. Note that the most common surrogate for Accuracy is the Cross Entropy loss. Classifier C1 is a random prediction where each class has the same imbalance ratio as in the ground truth. C3 is a random prediction with a uniform distribution for all classes. As can be seen in Fig. 2, the performance of $J$ under different imbalance ratios is similar to that of the Matthews Correlation Coefficient (MCC), which is well known to perform well on highly imbalanced data. This is not the case for the Jaccard index, F1 (Dice) score, Tversky index, and Accuracy, as they all report different values for different imbalance ratios. $J$ should thus be favored when training with imbalanced classes.
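The trend in Fig. 2 can be checked in closed form for a two-class analogue of C1 (our simplification). Consider a classifier that predicts the positive class with probability equal to its prevalence $\pi$, independently of the ground truth. Its expected TP, FN, TN, and FP fractions are $\pi^2$, $\pi(1-\pi)$, $(1-\pi)^2$, and $(1-\pi)\pi$. So $J$ and MCC are exactly zero for every $\pi$, while Accuracy equals $\pi^2 + (1-\pi)^2$ and F1 equals $\pi$. The sketch below evaluates these expressions; it is not the code used to produce Fig. 2.

```python
import math

def metrics_from_counts(tp, fn, tn, fp):
    """J, MCC, Accuracy, F1 and Jaccard from (possibly fractional) confusion counts."""
    tpr, tnr = tp / (tp + fn), tn / (tn + fp)
    acc = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    jac = tp / (tp + fp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"J": tpr + tnr - 1, "MCC": mcc, "Acc": acc, "F1": f1, "Jaccard": jac}

def random_c1(pi):
    """Expected confusion fractions (tp, fn, tn, fp) of a classifier that says
    'positive' with probability pi, independently of the truth (prevalence pi)."""
    return pi * pi, pi * (1 - pi), (1 - pi) ** 2, (1 - pi) * pi

for pi in (0.5, 0.1, 0.01):
    m = metrics_from_counts(*random_c1(pi))
    print(pi, {k: round(v, 4) for k, v in m.items()})
```

As $\pi$ shrinks, Accuracy climbs towards 1 and F1 and Jaccard fall towards 0, although the classifier is equally uninformative at every ratio.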
To compare $J$ with the Matthews Correlation Coefficient, we used the settings for classifier C3 from the literature. We then measured the linear correlation between MCC and $J$ for several imbalance ratios using Pearson's correlation coefficient. Fig. 3 shows an almost perfect linear correlation for all ratios. This supports our claim that Youden's index is a robust measure for imbalanced binary classification problems.
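A rough sketch of this correlation experiment follows. As our own assumption, C3-like classifiers are emulated by drawing TPR and TNR uniformly at random for a fixed positive prevalence; this may differ from the exact C3 setting. Let $P$ and $N$ be the numbers of positives and negatives. Then $\mathrm{TP}\cdot\mathrm{TN} - \mathrm{FP}\cdot\mathrm{FN} = PN(\mathrm{TPR} + \mathrm{TNR} - 1)$. So MCC always has the same sign as $J$ and differs from it only by a positive normalization factor.

```python
import numpy as np

def j_and_mcc(tpr, tnr, pi):
    """J and MCC from rates and positive prevalence pi (counts as fractions)."""
    tp, fn = tpr * pi, (1 - tpr) * pi
    tn, fp = tnr * (1 - pi), (1 - tnr) * (1 - pi)
    den = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return tpr + tnr - 1, (tp * tn - fp * fn) / den

rng = np.random.default_rng(0)
tpr, tnr = rng.uniform(size=(2, 10_000))   # assumed: rates drawn uniformly at random
for pi in (0.5, 0.1, 0.01):
    j, mcc = j_and_mcc(tpr, tnr, pi)
    r = np.corrcoef(j, mcc)[0, 1]          # Pearson correlation coefficient
    print(f"prevalence {pi}: Pearson r = {r:.3f}")
```

For balanced data ($\pi = 0.5$), MCC reduces to $J/\sqrt{1 - (\mathrm{TPR} - \mathrm{TNR})^2}$, so the two measures differ only when TPR and TNR are far apart.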
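Assuming a binary segmentation problem, we then define a binary surrogate $\hat{J}_\omega$ for $J$ (Eq. 1). It combines $\widehat{\mathrm{TPR}}$ and $\widehat{\mathrm{TNR}}$, the soft definitions of TPR and TNR respectively, through a weighting coefficient $\omega$. From Eq. 1, we define a multiclass surrogate $\hat{J}$ for $J$ as the sum of pairwise binary surrogates over class pairs $(i, j)$, $i \neq j$ (Eq. 2), where $\omega_{ij}$ is a pairwise class weight. $\widehat{\mathrm{TPR}}_{ij}$ and $\widehat{\mathrm{TNR}}_{ij}$ are, respectively, soft definitions of TPR and TNR in which class $i$ is considered the positive class and class $j$ the negative one. These definitions are similar to the ones used for the Soft Dice and Tversky loss functions,
$$
\widehat{\mathrm{TPR}}_{ij} = \frac{1}{n_i}\sum_{x} p_i(x)\, y_i(x),
\qquad
\widehat{\mathrm{TNR}}_{ij} = \frac{1}{n_j}\sum_{x} \bigl(1 - p_i(x)\bigr)\, y_j(x),
$$
where $y$ is the one hot ground truth, $p$ the predicted probability map, and $n_k = \sum_x y_k(x)$. In the unweighted case, $\widehat{\mathrm{TPR}}_{ij} + \widehat{\mathrm{TNR}}_{ij} - 1 = \frac{1}{n_i}\sum_x p_i(x)\,y_i(x) - \frac{1}{n_j}\sum_x p_i(x)\,y_j(x)$. This is the difference between the mean probability of class $i$ on its own pixels and its mean on the pixels of class $j$. Each term is normalized by its class size, so small classes count as much as large ones. Inserting these values into Eq. 2, we obtain our $J$ regularizer (Eq. 3). We add Eq. 3 as a regularizer to the cross entropy loss $\mathcal{L}_{CE}$ to obtain our training loss. Of all solutions with equal values of cross entropy, we favor the one with the highest separation between classes. Note that, contrary to [12, 2], explicit class weights per pixel are not used.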
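The sketch below computes the pairwise soft statistics and a CE + $J$ training loss in NumPy. The soft TPR and TNR follow the definitions above. How they are combined into the regularizer is our assumption, not necessarily the exact form of Eq. 3: we take $1 - \sum_{i \neq j} \omega_{ij} J_{ij}$, with uniform, normalized $\omega_{ij}$ and unweighted $J_{ij} = \widehat{\mathrm{TPR}}_{ij} + \widehat{\mathrm{TNR}}_{ij} - 1$. The function names are hypothetical. Being NumPy, the code only evaluates the loss; training would use the same expressions in an autodiff framework.

```python
import numpy as np

def soft_j_pairs(p, y, eps=1e-7):
    """Matrix of pairwise soft Youden statistics J[i, j] = TPR_ij + TNR_ij - 1.

    p: class probabilities, shape (K, ...); y: one-hot labels, same shape.
    """
    K = p.shape[0]
    p2, y2 = p.reshape(K, -1), y.reshape(K, -1)
    n = y2.sum(axis=1) + eps                         # n_k, pixels per class
    tpr = (p2 * y2).sum(axis=1) / n                  # TPR_i: mean p_i over class i
    tnr = ((1.0 - p2) @ y2.T) / n[None, :]           # TNR_ij: mean (1 - p_i) over class j
    return tpr[:, None] + tnr - 1.0

def j_regularized_loss(p, y, weights=None, eps=1e-7):
    """Mean cross entropy plus an (assumed) J regularizer 1 - sum_ij w_ij J_ij."""
    K = p.shape[0]
    ce = -np.mean(np.sum(y * np.log(np.clip(p, eps, 1.0)), axis=0))
    J = soft_j_pairs(p, y, eps)
    if weights is None:
        weights = 1.0 - np.eye(K)                    # uniform omega_ij over pairs i != j
    weights = weights / weights.sum()                # regularizer then lies in [0, 2]
    return ce + (1.0 - np.sum(weights * J))
```

A perfect one-hot prediction gives a loss close to 0. A uniform prediction over $K$ classes gives $\log K + 1$, since every $J_{ij}$ is then 0.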
Simulation. We simulate the optimization towards the ground truth to show how the regularization helps cross entropy (CE) reach the optimum. The target segmentation consists of two touching square cells separated by a one-pixel-wide notch covering half of a cell side (see Fig. 4). Initially, when the solution is far from the target, CE drives the optimization (large gradients) until it shrink-wraps both cells. At that point its gradient no longer contributes to advancing the segmentation. Around that point, the $J$ term takes over, and its gradient drives the optimization until the optimum is reached. To mimic real updates, we slowly increase pixel probabilities towards their optimal values until we reach the ground truth. The plots in Fig. 4 show how cross entropy and Youden's statistic work in tandem to achieve the desired result. Neither term would solve the segmentation on its own, as the vanishing of its gradient would stall the optimization.
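One way to see why the $J$ term keeps driving the optimization once CE has flattened is to compare per-pixel gradient magnitudes on a target like the one in Fig. 4. As a simplification of ours, assume the regularizer $1 - \frac{1}{K(K-1)}\sum_{i \neq j} J_{ij}$ with unweighted pairwise $J_{ij}$. Its derivative with respect to $p_c(x)$ at a pixel of class $c$ then has magnitude $1/(K n_c)$. Mean CE instead gives $1/(M p_c(x))$ for an image of $M$ pixels. The image and cell sizes below are hypothetical, and we differentiate with respect to probabilities rather than logits.

```python
import numpy as np

BG, CELL, TOUCH, GAP = 0, 1, 2, 3
H, W, S = 24, 40, 16                      # hypothetical image and cell sizes
sem = np.full((H, W), BG)
sem[4:4 + S, 4:4 + S] = CELL               # left square, columns 4..19
sem[4:4 + S, 20:20 + S] = CELL             # right square, columns 20..35
sem[4:4 + S // 2, 19] = GAP                # one-pixel notch over half a side
sem[4 + S // 2:4 + S, 19:21] = TOUCH       # remaining shared edge is touching

K, M = 4, sem.size
n = np.bincount(sem.ravel(), minlength=K)  # pixels per class
p_true = 0.9                               # assumed probability already on the true class
ce_grad = 1.0 / (M * p_true)               # |d mean CE / d p_c(x)|, same for every pixel
j_grad = 1.0 / (K * n)                     # |d regularizer / d p_c(x)| for each class c
for c, name in enumerate(["background", "cell", "touching", "gap"]):
    print(f"{name:10s} n={n[c]:4d}  CE {ce_grad:.2e}  J {j_grad[c]:.2e}")
```

For the touching and gap pixels, the $J$ gradient is more than ten times the CE gradient. For the large background and cell classes it is smaller. This matches the division of labor described above.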
Loss visualization. We use the approach proposed by Li et al. to visualize how our loss compares with other losses around a known optimal point in the optimization space. The comparison losses are weighted cross entropy with class balance (BWM) and weighted cross entropy with the triplex weight map (W). As shown in Fig. 5, our loss has a cone-like shape whose gradients favor a fast descent to the optimum. The BWM and W losses, in contrast, have near-zero gradients almost everywhere, which may prevent the optimization from reaching the optimum, since gradient descent methods are extremely slow to converge in such cases. Although this analysis relies on a visualization that employs dimensionality reduction, evidence from our other experiments suggests this behavior spans the entire optimization space.
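Li et al. plot $\mathcal{L}(\theta^* + \alpha\delta + \beta\eta)$ on a grid of $(\alpha, \beta)$ values. Here $\delta$ and $\eta$ are random Gaussian directions, rescaled filter by filter to match the norms of the corresponding filters in $\theta^*$. The sketch below assumes weights stored as a list of NumPy arrays with output filters on the first axis. It also assumes a hypothetical `loss_fn` that evaluates the training loss for a given list of weights. Zeroing the direction for 1-D parameters such as biases is our choice here.

```python
import numpy as np

def filter_normalized_direction(params, rng):
    """Random direction whose filters have the norms of the matching filters in params."""
    direction = []
    for w in params:
        if w.ndim <= 1:
            direction.append(np.zeros_like(w))     # do not perturb biases / 1-D params
            continue
        d = rng.standard_normal(w.shape)
        axes = tuple(range(1, w.ndim))             # everything except the filter axis
        w_norm = np.sqrt((w ** 2).sum(axis=axes, keepdims=True))
        d_norm = np.sqrt((d ** 2).sum(axis=axes, keepdims=True))
        direction.append(d / (d_norm + 1e-10) * w_norm)
    return direction

def loss_surface(loss_fn, params, d1, d2, alphas, betas):
    """Evaluate loss_fn(theta* + a*d1 + b*d2) on a 2-D grid around theta*."""
    surface = np.empty((len(alphas), len(betas)))
    for i, a in enumerate(alphas):
        for j, b in enumerate(betas):
            shifted = [w + a * u + b * v for w, u, v in zip(params, d1, d2)]
            surface[i, j] = loss_fn(shifted)
    return surface
```

Filter normalization removes the scale ambiguity of individual filters, so surfaces of differently trained networks can be compared on the same axes.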
Gap assignment. We obtain a semantic segmentation $\hat{S}$ from the output probability map $p$ using the Maximum A Posteriori (MAP) decision rule, $\hat{S}(x) = \arg\max_k p_k(x)$. A gap pixel $x$, i.e. one with $\hat{S}(x) = \text{gap}$, can be directly classified as a true background pixel. Alternatively, in case of dubious probabilities, it is assigned its second most likely class. The latter is equivalent to applying MAP to the first three classes of the output map, $\hat{S}(x) = \arg\max_{k \in \{\text{background},\, \text{cell},\, \text{touching}\}} p_k(x)$. An instance segmentation is then obtained by a sequence of labeling operations on each region of the semantic segmentation map $\hat{S}$.
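The decision rule above can be sketched as follows, assuming the channel order background, cell, touching, gap in $p$. The labeling step is hypothetical and may differ from our actual sequence of labeling operations. It takes connected components of the cell class as instances, using the default connectivity of `scipy.ndimage.label` (4-connectivity in 2D). Each touching pixel then receives the label of its nearest cell pixel. The sketch assumes at least one cell pixel is present and implements only the second-most-likely option for gap pixels.

```python
import numpy as np
from scipy import ndimage as ndi

BACKGROUND, CELL, TOUCHING, GAP = 0, 1, 2, 3     # assumed channel order of p

def map_without_gap(p):
    """MAP over background, cell and touching only: where gap is the most
    likely class this picks the second most likely one, elsewhere it agrees
    with the full MAP decision."""
    return np.argmax(p[:GAP], axis=0)

def instances_from_semantic(sem):
    """Hypothetical labeling: connected cell regions become instances, then
    each touching pixel takes the label of the nearest cell pixel."""
    cells, _ = ndi.label(sem == CELL)
    # Coordinates of the nearest cell pixel (a zero of `cells == 0`) for every pixel.
    _, idx = ndi.distance_transform_edt(cells == 0, return_indices=True)
    nearest = cells[tuple(idx)]
    return np.where(sem == TOUCHING, nearest, cells)
```

In a 1-D strip "cell, cell, touching, touching, cell, cell, background", each touching pixel joins the cell it is closest to.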
We compare our loss with three others: weighted cross entropy with class balance (BWM), weighted cross entropy with the triplex weight map (W), and cross entropy with Dice regularization (DSC). All are trained with the same U-Net architecture, with initial weights following a normal distribution and identical initialization obtained by fixing all random seeds. For 3D volumes we used 3D convolutions but kept the same architecture topology as in 2D. A Watershed post-processing (WT) is also applied to those results showing weak touching separation. We also analyzed the influence of the gap class on training by comparing networks trained with three and four classes. The datasets were a DIC HeLa dataset, a 3D meristem confocal stack (see Fig. 7), and T-Cells. Zero-shot segmentation of HeLa cells was obtained using a model trained on the T-Cells data. We used the Adam optimizer. Data augmentation included random rotation, mirroring, gamma correction, touching contrast modulation, and warping. Precision (P05) and F1 score (RQ) were used to measure cell detection rates. Segmentation Quality (SQ) and Panoptic Quality (PQ) were used to measure, respectively, contour adequacy and instance segmentation quality.
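For reference, these scores can be computed from two instance label images following Kirillov et al. A predicted and a ground-truth instance match when their IoU exceeds 0.5; such matches are unique. $\mathrm{SQ}$ is the mean IoU of the matches, and $\mathrm{RQ} = \mathrm{TP}/(\mathrm{TP} + \frac{1}{2}\mathrm{FP} + \frac{1}{2}\mathrm{FN})$, an F1 score. $\mathrm{PQ} = \mathrm{SQ}\cdot\mathrm{RQ}$. We assume P05 is the precision $\mathrm{TP}/(\mathrm{TP}+\mathrm{FP})$ at this threshold; the function name is hypothetical.

```python
import numpy as np

def panoptic_scores(gt, pred):
    """PQ, SQ, RQ (F1) and precision at IoU > 0.5 for instance label images.

    gt, pred: integer arrays with 0 = background and one id per instance.
    Matches at IoU > 0.5 are unique, so no assignment step is needed.
    """
    gt_ids = [i for i in np.unique(gt) if i != 0]
    pred_ids = [j for j in np.unique(pred) if j != 0]
    ious = []
    for i in gt_ids:
        g = gt == i
        for j in np.unique(pred[g]):           # only predictions overlapping g
            if j == 0:
                continue
            q = pred == j
            iou = np.logical_and(g, q).sum() / np.logical_or(g, q).sum()
            if iou > 0.5:
                ious.append(iou)
    tp = len(ious)
    fp, fn = len(pred_ids) - tp, len(gt_ids) - tp
    denom = tp + 0.5 * fp + 0.5 * fn
    sq = float(np.mean(ious)) if tp else 0.0
    rq = tp / denom if denom else 0.0
    p05 = tp / len(pred_ids) if pred_ids else 0.0
    return {"PQ": sq * rq, "SQ": sq, "RQ": rq, "P05": p05}
```

Because SQ averages IoU only over matched instances, it is sensitive to how accurately annotated contours follow the cells. This is one reason weak annotations can distort it.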
Instance segmentation performance: Table 1 compares the performance of networks trained with different loss functions. Watershed (WT) post-processing effectively increased the performance of BWM, DSC and W compared with Maximum A Posteriori (MAP). However, the WT method depends on carefully choosing two parameters. Networks trained with the proposed loss improve instance detection rates using only the parameter-free MAP post-processing. This is due to improved probabilities in the gap and touching regions, which lead to better cell separation. Because our ground truth is weakly annotated (see the annotation in Fig. 1), we found that SQ values are not always reliable.
We use our proposed loss to assess the influence of the gap class. Table 2 shows the results obtained on each dataset. The best Panoptic Quality (PQ) in all cases was obtained with four classes. An improvement in Segmentation Quality is observed for the first two datasets as a direct consequence of using a fourth class (see the first row of Fig. 6). However, as stated before, weak annotations in the T-Cells and meristem datasets tainted the SQ values. In reality, visual inspection shows that the four-class model offers better contour adequacy. The second row of Fig. 6 shows examples of three- and four-class segmentations and probability maps for T-Cells and the meristem volume. Results shown in Fig. 6B, E and F were obtained with a network trained on T-Cells images (zero-shot instance segmentation).
We proposed a new loss, a Youden’s $J$ statistic regularization of the bare cross entropy loss. We built upon our previous work and introduced a new pixel/voxel class, gap, which improves classification and contour adequacy. The approach improved 2D and 3D instance segmentation of highly cluttered cells, even after training with weak annotations. Landscape analysis and performance evaluation with different loss functions suggest that our new loss is superior for segmenting cluttered cells. In future work we plan to make the proposed pairwise loss linear in the number of classes and to compare our methods extensively using benchmarks.
- Fidel A Guerrero-Peña, Pedro D Marrero Fernandez, Tsang Ing Ren, and Alexandre Cunha, “A weakly supervised method for instance segmentation of biological cells,” in Medical Image Learning with Less Labels and Imperfect Data, MICCAI Workshop, pp. 216–224. Springer, 2019.
- Fidel A Guerrero-Pena, Pedro D Marrero Fernandez, Tsang Ing Ren, Mary Yui, Ellen Rothenberg, and Alexandre Cunha, “Multiclass Weighted Loss for Instance Segmentation of Cluttered Cells,” in 2018 IEEE ICIP. IEEE, 2018, pp. 2451–2455.
- Alexandre Cunha, Paul T Tarr, Adrienne HK Roeder, Alphan Altinok, Eric Mjolsness, and Elliot M Meyerowitz, “Computational analysis of live cell images of the Arabidopsis thaliana plant,” in Methods in Cell Biology, vol. 110, pp. 285–323. Elsevier, 2012.
- Alexandre Cunha, Adrienne HK Roeder, and Elliot M Meyerowitz, “Segmenting the sepal and shoot apical meristem of Arabidopsis thaliana,” in IEEE EMBS International Conference, 2010, pp. 5338–5342.
- Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi, “V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation,” in 2016 Fourth International Conference on 3D Vision (3DV). IEEE, 2016, pp. 565–571.
- Seyed Sadegh Mohseni Salehi, Deniz Erdogmus, and Ali Gholipour, “Tversky Loss Function for Image Segmentation Using 3D Fully Convolutional Deep Networks,” in International Workshop on Machine Learning in Medical Imaging. Springer, 2017, pp. 379–387.
- Maxim Berman, Amal Rannen Triki, and Matthew B Blaschko, “The Lovász-Softmax Loss: A Tractable Surrogate for the Optimization of the Intersection-Over-Union Measure in Neural Networks,” in Proceedings of IEEE CVPR, 2018, pp. 4413–4421.
- Hoel Kervadec, Jihene Bouchtiba, Christian Desrosiers, Eric Granger, Jose Dolz, and Ismail Ben Ayed, “Boundary Loss for Highly Unbalanced Segmentation,” in Proceedings of The 2nd International Conference on Medical Imaging with Deep Learning, 2019, vol. 102, pp. 285–296.
- Tom Brosch, Youngjin Yoo, Lisa YW Tang, David KB Li, Anthony Traboulsee, and Roger Tam, “Deep convolutional encoder networks for multiple sclerosis lesion segmentation,” in MICCAI 2015. Springer, 2015, pp. 3–11.
- Carole H Sudre, Wenqi Li, Tom Vercauteren, Sebastien Ourselin, and M Jorge Cardoso, “Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations,” in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 240–248. Springer, 2017.
- William R Crum, Oscar Camara, and Derek LG Hill, “Generalized overlap measures for evaluation and validation in medical image analysis,” IEEE Transactions on Medical Imaging, vol. 25, no. 11, pp. 1451–1461, 2006.
- Olaf Ronneberger, Philipp Fischer, and Thomas Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” in 2015 MICCAI. Springer, 2015, pp. 234–241.
- Dennis Eschweiler, Thiago V Spina, Rohan C Choudhury, Elliot Meyerowitz, Alexandre Cunha, and Johannes Stegmaier, “CNN-based preprocessing to optimize watershed-based cell segmentation in 3D confocal microscopy images,” in 2019 IEEE ISBI. IEEE, 2019, pp. 223–227.
- Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Dollár, “Panoptic Segmentation,” in Proceedings of IEEE CVPR, 2019, pp. 9404–9413.
- William J Youden, “Index for Rating Diagnostic Tests,” Cancer, vol. 3, no. 1, pp. 32–35, 1950.
- “Improved Confidence Intervals for the Youden Index,” PloS One, vol. 10, no. 7, pp. e0127272, 2015.
- Sabri Boughorbel, Fethi Jarray, and Mohammed El-Anbari, “Optimal Classifier for Imbalanced Data Using Matthews Correlation Coefficient Metric,” PloS One, vol. 12, no. 6, pp. e0177678, 2017.
- Ian Goodfellow, Yoshua Bengio, and Aaron Courville, “Deep Learning,” MIT Press, 2016.
- Brian W Matthews, “Comparison of the Predicted and Observed Secondary Structure of T4 Phage Lysozyme,” Biochimica et Biophysica Acta (BBA)-Protein Structure, vol. 405, no. 2, pp. 442–451, 1975.
- Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, and Tom Goldstein, “Visualizing the Loss Landscape of Neural Nets,” in Advances in Neural Information Processing Systems, 2018, pp. 6389–6399.
- Fabian Isensee, Jens Petersen, Andre Klein, David Zimmerer, Paul F Jaeger, Simon Kohl, Jakob Wasserthal, Gregor Koehler, Tobias Norajitra, Sebastian Wirkert, et al., “nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation,” in Bildverarbeitung für die Medizin, pp. 22–22. Springer, 2019.
- Xavier Glorot and Yoshua Bengio, “Understanding the Difficulty of Training Deep Feedforward Neural Networks,” in Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2010, pp. 249–256.
- “ISBI Cell Tracking Challenge: http://celltrackingchallenge.net/2d-datasets/,” 2019, Accessed on 10.07.2019.
- Vebjorn Ljosa, Katherine L Sokolnicki, and Anne E Carpenter, “Annotated High-throughput Microscopy Image Sets for Validation,” Nature Methods, vol. 9, no. 7, pp. 637–637, 2012.
- Diederik P Kingma and Jimmy Ba, “Adam: A Method for Stochastic Optimization,” arXiv:1412.6980, 2014.
- Benjamin Schmid, Johannes Schindelin, Albert Cardona, Mark Longair, and Martin Heisenberg, “A high-level 3D visualization API for Java and ImageJ,” BMC Bioinformatics, vol. 11, no. 1, pp. 274, 2010.