1 Introduction
Connectomics researchers study the structure of nervous systems to understand their function [1]. Electron microscopy (EM) is the only modality capable of imaging substantial tissue volumes at sufficient resolution and has been used for the reconstruction of neural circuitry [2, 3, 4]. The high resolution leads to image data sets of enormous scale, for which manual analysis is extremely laborious and can take decades to complete [5]. Reliable automatic connectome reconstruction from EM images, and, as its first step, automatic segmentation of neuronal structures, is therefore crucial. However, due to the anisotropy, deformation, complex cellular structures, and semantic ambiguity of the image data, automatic segmentation remains challenging after years of active research.
Similar to the boundary detection/region segmentation pipeline for natural image segmentation [6, 7, 8, 9], most recent EM image segmentation methods use a membrane detection/cell segmentation pipeline. First, a membrane detector generates pixel-wise confidence maps of membrane predictions using local image cues [10, 11, 12]. Next, region-based methods are applied to transform the membrane confidence maps into cell segments. It has been shown that region-based methods are necessary for improving segmentation accuracy from membrane detections for EM images [13]. A common approach to region-based segmentation is to transform a membrane confidence map into over-segmenting superpixels and use them as "building blocks" for the final segmentation. To combine superpixels correctly, greedy region agglomeration based on some measure of boundary saliency has been shown to work [14]. Alternatively, structures such as loopy graphs [15, 16] or trees [17, 18, 19] are imposed to represent the region merging hierarchy, transforming the superpixel combination search into a graph labeling problem, for which local [17, 16] or structured [18, 19] learning-based methods have been developed.
Most current region-based segmentation methods use a scoring function to determine how likely two adjacent regions should be combined. Such scoring functions are usually learned in a supervised manner that demands a considerable amount of high-quality ground truth data. Obtaining such ground truth data, however, involves manual labeling of image pixels and is very labor-intensive, especially given the large scale and complex structures of EM images. To alleviate this demand, Parag et al. recently proposed an active learning framework [20, 21] that starts with small sets of labeled samples and constantly measures the disagreement between a supervised classifier and a semi-supervised label propagation algorithm on unlabeled samples. Only the samples with the strongest disagreement are pushed to users for interactive labeling. The authors demonstrate that by using only a fraction of all labeled samples, the method can perform similarly to the underlying fully supervised method with the full training set. One disadvantage of this framework is that it does not directly exploit the unsupervised information while searching for the optimal classification function. Also, the supervised algorithm must be retrained at each iteration, which can be time-consuming, especially when more iterations with fewer samples per iteration are used to maximize the utilization of supervised information and minimize human effort. Moreover, repeated human interactions may lead to extra cost overhead in practice.

In this paper, we propose a semi-supervised learning framework for region-based neuron segmentation that seeks to reduce the demand for labeled data by exploiting the underlying correlation between unsupervised data samples. Based on the merge tree structure [17, 18, 19]
, we redefine the labeling constraint and formulate it into a differentiable loss function that can effectively guide the unsupervised search in the function hypothesis space. We then develop a Bayesian model that incorporates both unsupervised and supervised information for probabilistic learning. The parameters that are essential to balancing the learning can be estimated from the data automatically. Our method works with a very small amount of supervised data and requires no further human interaction. We show that by using only a small fraction of the labeled data, our method performs stably close to the state-of-the-art fully supervised algorithm with the entire supervised data set (Section 4). Also, our method can conveniently replace the supervised algorithm in the active learning framework [20, 21] and further improve the overall segmentation performance.

2 Hierarchical Merge Tree
Starting with an initial superpixel segmentation S_o of an image, a merge tree T = (V, E) is a graphical representation of the superpixel merging order. Each node i ∈ V corresponds to an image region R_i. Each leaf node aligns with an initial superpixel in S_o. A non-leaf node corresponds to an image region combined from multiple superpixels, and the root node represents the whole image as a single region. An edge between a node i and one of its children c indicates R_c ⊂ R_i. Assuming only two regions are merged at a time, T is a full binary tree. A clique c = ({i}, {j, k}) represents the merge R_i = R_j ∪ R_k; in this paper, we say clique c is at node i. We call the cliques c_j and c_k at j and k the child cliques of c, and c the parent clique of c_j and c_k. If i is a leaf node, c is called a leaf clique; c is a non-leaf/root/non-root clique if i is a non-leaf/root/non-root node. An example merge tree, shown in Fig. 1(c), represents the merging of the superpixels in Fig. 1(a). The red box in Fig. 1(c) shows a non-leaf clique that is the child clique of another. A common approach to building a merge tree is to iteratively and greedily merge regions based on some boundary saliency measure [17, 18, 19].
Given the merge tree, the problem of finding a final segmentation is equivalent to finding a complete label assignment in which every node is marked as a final segment (y_i = 1) or not (y_i = 0). Let Pa(·) be a query function that returns the index of the parent node, so the d-th ancestor of node i is Pa^d(i), with d up to the depth of i in the tree and Pa^0(i) = i. For every leaf-to-root path, we enforce the region consistency constraint, which requires that exactly one node on the path from any leaf to the root be labeled 1. In the example in Fig. 1(c), the red nodes are labeled 1 and correspond to the final segmentation in Fig. 1(b); the remaining black nodes are labeled 0. Supervised algorithms have been proposed to learn scoring functions in a local [17, 9] or a structured [18, 19] fashion, followed by greedy [17] or global [18, 9, 19] inference techniques for finding the optimal label assignment under the constraint. We refer to the local learning and greedy inference framework in [17] as the hierarchical merge tree (HMT) method and follow its settings in the rest of this paper, as it has been shown to achieve state-of-the-art results in public challenges [13, 22].
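As an illustration of the structures above, the following Python sketch builds a toy seven-node merge tree and checks the region consistency constraint; the node numbering and helper names are our own, not the paper's.

```python
# Toy merge tree over 4 superpixels (leaves 0-3): nodes 4 and 5 merge leaf
# pairs, and the root 6 merges nodes 4 and 5. parent[i] is -1 for the root.
parent = {0: 4, 1: 4, 2: 5, 3: 5, 4: 6, 5: 6, 6: -1}
leaves = [0, 1, 2, 3]

def path_to_root(i):
    """Node indices on the path from node i up to the root."""
    path = [i]
    while parent[path[-1]] != -1:
        path.append(parent[path[-1]])
    return path

def satisfies_region_consistency(labels):
    """Region consistency: every leaf-to-root path contains exactly one
    node labeled 1, so the selected regions tile the image exactly once."""
    return all(sum(labels[n] for n in path_to_root(leaf)) == 1 for leaf in leaves)

# Selecting nodes 4 and 5 as final segments covers each leaf exactly once.
assert satisfies_region_consistency({0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 0})
# Selecting both node 4 and the root double-covers leaves 0 and 1.
assert not satisfies_region_consistency({0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 0, 6: 1})
```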
A binary label z_c ∈ {0, 1} denotes whether the region merging at clique c occurs ("merge", z_c = 1) or not ("split", z_c = 0). For a leaf clique, z_c = 1. At training time, z_c is generated for each non-leaf clique by comparing the "merge" and "split" cases against the ground truth segmentation under a certain error metric (e.g., adapted Rand error [13]); the case that yields the lower error is adopted. A binary classification function f, called the boundary classifier, is trained on the clique labels and a collection of feature vectors x_c. Shape and image appearance features are commonly used.
At testing time, each non-leaf clique c is assigned a likelihood score f(x_c) by the classifier. A potential for each node is defined as
(1) 
The greedy inference algorithm iteratively assigns label 1 to the unlabeled node with the highest potential and label 0 to its ancestor and descendant nodes, until every node in the merge tree receives a label. The nodes labeled 1 form a final segmentation.
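A minimal sketch of this greedy inference on a toy tree follows; the tree, the potentials, and the function names are illustrative assumptions, not the authors' implementation.

```python
# Same toy 7-node merge tree as in Section 2: leaves 0-3, internal nodes
# 4 and 5, root 6. children/parent encode the tree; -1 marks the root.
parent = {0: 4, 1: 4, 2: 5, 3: 5, 4: 6, 5: 6, 6: -1}
children = {4: [0, 1], 5: [2, 3], 6: [4, 5]}

def ancestors(i):
    out = []
    while parent[i] != -1:
        i = parent[i]
        out.append(i)
    return out

def descendants(i):
    out, stack = [], list(children.get(i, []))
    while stack:
        j = stack.pop()
        out.append(j)
        stack.extend(children.get(j, []))
    return out

def greedy_inference(potential):
    """Repeatedly pick the unlabeled node with the highest potential, label
    it 1, and label all of its ancestors and descendants 0."""
    y = {}
    while len(y) < len(potential):
        i = max((n for n in potential if n not in y), key=potential.__getitem__)
        y[i] = 1
        for j in ancestors(i) + descendants(i):
            y.setdefault(j, 0)
    return y

# Node 4 (the merge of leaves 0 and 1) is most confident, so it becomes a
# segment; leaves 2 and 3 remain split from each other.
y = greedy_inference({0: .2, 1: .2, 2: .6, 3: .7, 4: .9, 5: .3, 6: .1})
assert sorted(n for n, v in y.items() if v == 1) == [2, 3, 4]
```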
3 SSHMT: Semi-supervised Hierarchical Merge Tree
The performance of HMT largely depends on accurate boundary predictions, given fixed initial superpixels and tree structures. In this section, we propose a semi-supervised learning framework based on HMT, named SSHMT, to learn accurate boundary classifiers with limited supervised data.
3.1 Merge consistency constraint
Following the HMT notation (Section 2), we first define the merge consistency constraint for non-root cliques:
z_c ≥ z_Pa(c), for every non-root clique c. (2)
Clearly, a set of consistent node labels can be transformed into a consistent clique labeling by assigning label 1 to the cliques at the nodes labeled 1 and to all of their descendant cliques, and label 0 to the rest. Conversely, a consistent clique labeling can be transformed into node labels by assigning 1 to exactly those nodes whose cliques are labeled 1 while their parent cliques are labeled 0, and 0 to the rest.
Define a clique path of length l + 1 that starts at clique c as the ordered set π_c = {c, Pa(c), Pa^2(c), ..., Pa^l(c)}. We then have
Theorem 3.1
Any consistent label sequence for a clique path under the merge consistency constraint is monotonically non-increasing.
Proof
Assume there exists a label sequence subject to the merge consistency constraint that is not monotonically non-increasing. By definition, there must exist a position j in the sequence such that the j-th label is smaller than the (j+1)-th. Since the labels are binary, the j-th label must be 0 and the (j+1)-th label must be 1. The (j+1)-th clique on the path is the parent clique of the j-th, so its label being larger violates the merge consistency constraint (2), which contradicts the initial assumption that the sequence is subject to the merge consistency constraint. Therefore, the initial assumption must be false, and all label sequences that are subject to the merge consistency constraint must be monotonically non-increasing.∎
Intuitively, Theorem 3.1 states that while moving up a merge tree, once a split occurs, no merge shall occur again among the ancestor cliques on that path. Accordingly, a consistent label sequence for a clique path, such as the one in Fig. 1(c), can only consist of a (possibly empty) run of ones followed by a (possibly empty) run of zeros; any other sequence, in which a 1 follows a 0, is not consistent. In contrast to the region consistency constraint, the merge consistency constraint is a local constraint that holds for entire leaf-to-root clique paths as well as any of their subparts. This allows certain computations to be decomposed, as shown later in Section 4.
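The consistent sequences can be enumerated directly. A short Python check (our own illustration) confirms that for a path of four cliques exactly the five monotonically non-increasing binary sequences satisfy the constraint:

```python
from itertools import product

def is_consistent(seq):
    """Merge consistency along a clique path: once a 0 ("split") appears on
    the way to the root, no 1 ("merge") may follow (Theorem 3.1)."""
    return all(a >= b for a, b in zip(seq, seq[1:]))

consistent = [s for s in product((0, 1), repeat=4) if is_consistent(s)]
# Only runs of ones followed by runs of zeros survive: 5 of the 16 sequences.
assert len(consistent) == 5
assert (1, 1, 0, 0) in consistent and (1, 0, 1, 0) not in consistent
```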
Let t_j be a predicate denoting whether the j-th clique on the path is labeled "merge". We can express the non-increasing monotonicity of any consistent label sequence in disjunctive normal form (DNF) as
(3) 
which always holds by Theorem 3.1. We approximate the DNF with real-valued variables and operators by replacing each predicate with the real-valued label of the corresponding clique: a negation is replaced by one minus the label; conjunctions are replaced by multiplications; disjunctions are transformed into negations of conjunctions using De Morgan's laws and then replaced. The real-valued DNF approximation is
(4) 
which is valued 1 for any consistent label assignment. Observing that the clique labels are exactly what a binary boundary classifier predicts in HMT, we further relax the labels to the outputs of a classification function f. The choice of f can be arbitrary as long as it is (piecewise) differentiable (Section 3.2). In this paper, we use a logistic sigmoid function with a linear discriminant,
f(x) = 1 / (1 + exp(-w^T x)), (5)
which is parameterized by w.
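To make the relaxation concrete, the sketch below evaluates the real-valued DNF for a clique path of merge scores; the function names and the indexing of the split position are our own assumptions. Hard consistent labelings evaluate to exactly 1, hard inconsistent ones to 0, and soft classifier outputs fall in between.

```python
import math

def sigmoid_classifier(w, x):
    """Logistic sigmoid over a linear discriminant, as in (5)."""
    return 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))

def dnf_value(p):
    """Real-valued DNF relaxation for a path of merge scores p[0..l-1]:
    one conjunction per possible split position j (j leading "merges"),
    with the disjunction realized as 1 - prod(1 - conjunction)."""
    l = len(p)
    out = 1.0
    for j in range(l + 1):
        conj = 1.0
        for k in range(j):
            conj *= p[k]
        for k in range(j, l):
            conj *= 1.0 - p[k]
        out *= 1.0 - conj
    return 1.0 - out

assert sigmoid_classifier([0.0, 0.0], [1.0, 1.0]) == 0.5
assert dnf_value([1, 1, 0, 0]) == 1.0          # consistent hard labeling
assert dnf_value([1, 0, 1, 0]) == 0.0          # inconsistent hard labeling
assert 0.0 < dnf_value([0.9, 0.8, 0.3, 0.1]) < 1.0
```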
3.2 Bayesian semi-supervised learning
To learn the boundary classification function f, we use both supervised and unsupervised data. Supervised data are the clique samples whose labels are generated from ground truth segmentations. Unsupervised samples are those for which we have no labels; they can come from images for which we have no ground truth or that we wish to segment. We use X_s to denote the collection of supervised sample feature vectors, z_s their true labels, and X the collection of all supervised and unsupervised samples.
Let f(X_s) be the predictions for the supervised samples in X_s, and let F be the vector of DNF values (4) for all clique paths drawn from X. We are now ready to build a probabilistic model that includes a regularization prior, an unsupervised likelihood, and a supervised likelihood.
The prior is an i.i.d. Gaussian that regularizes w to prevent overfitting. The unsupervised likelihood is an i.i.d. Gaussian on the difference between each element of F and 1. It requires the predictions of f to conform to the merge consistency constraint for every path. Maximizing the unsupervised likelihood allows us to narrow the potential solutions down to a subset of the classifier hypothesis space without label information, by exploiting commonality in the sample feature representations. The supervised likelihood is an i.i.d. Gaussian on the prediction errors for the supervised samples that enforces accurate predictions. It helps avoid consistent but trivial solutions for f, such as those that always predict "merge" or always predict "split", and guides the search towards the correct solution. The standard deviation parameters of these Gaussians control the contributions of the three terms. They can be preset to reflect our prior knowledge about the model distributions, tuned using a hold-out set, or estimated from the data.

By applying Bayes' rule, we obtain the posterior distribution of w as
(6) 
where N_s and N_u are the numbers of elements in f(X_s) and F, respectively, and 1 is an N_u-dimensional vector of ones.
3.2.1 Inference
We infer the classifier parameters w and the standard deviations σ_u and σ_s using maximum a posteriori estimation. We effectively minimize the negative logarithm of the posterior,
(7) 
Observe that the DNF formula in (4) is differentiable. With any (piecewise) differentiable choice of f, we can therefore minimize (7) using (sub)gradient descent. The gradient of (7) with respect to the classifier parameters w is
(8) 
Since we choose f to be a logistic sigmoid function with a linear discriminant (5), the j-th row of the Jacobian of the predictions with respect to w is
(9) 
where x_j is the j-th element in X.
Writing the DNF value (4) of the i-th clique path as the i-th element of F, the i-th row of the Jacobian of F with respect to w is
(10) 
where the inner derivatives can be computed using (9).
We also alternately estimate σ_u and σ_s along with w. Setting the corresponding partial derivatives of (7) to zero, we update σ_u and σ_s using the closed-form solutions
(11)  
(12) 
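The alternating scheme can be sketched on a toy problem. The fragment below keeps only the supervised Gaussian term (the unsupervised term is omitted for brevity) and re-estimates its standard deviation in closed form between batches of gradient steps; the data set, learning rate, update schedule, and the lower bound on sigma are all our own assumptions, not the paper's settings.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny deterministic "clique" training set: 2-D features with a linearly
# separable merge label, y = 1 iff x0 + x1 > 0.
X = np.array([[1., 0.], [0., 1.], [2., 1.], [1., 2.],
              [-1., 0.], [0., -1.], [-2., -1.], [-1., -2.]])
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)
sigma_s = 1.0                      # supervised noise scale
for step in range(300):
    pred = sigmoid(X @ w)
    resid = pred - y
    # Gradient of the Gaussian supervised term (1 / (2 sigma_s^2)) ||resid||^2.
    grad = (X.T @ (resid * pred * (1.0 - pred))) / (sigma_s ** 2 * len(y))
    w -= 0.25 * grad
    if (step + 1) % 50 == 0:       # closed-form sigma update every 50 steps
        sigma_s = max(float(np.sqrt(np.mean(resid ** 2))), 0.25)

accuracy = float(np.mean((sigmoid(X @ w) > 0.5) == (y > 0.5)))
assert accuracy == 1.0             # the separable toy set is fit exactly
assert sigma_s < 0.5               # the estimated noise level shrinks
```

The lower bound on sigma_s keeps the effective step size finite as the residuals shrink; in the full model the analogous update for the unsupervised standard deviation would use the residuals of F against the vector of ones.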
4 Results
We validate the proposed algorithm on 2D and 3D segmentation of neurons in three EM image data sets. For each data set, we apply SSHMT to the same segmentation task using randomly selected subsets of the ground truth data of varying sizes as the supervised sets.
4.1 Data sets
4.1.1 Mouse neuropil data set
[23] consists of 2D SBFSEM images at nanometer-scale pixel resolution. A random selection of the images is considered the whole supervised set, and the remaining images are used for testing. We test our algorithm using progressively larger subsets of the ground truth images, up to half of the supervised set, as the supervised data. We use all the images as the unsupervised data for training. We target 2D segmentation for this data set.
4.1.2 Mouse cortex data set
[22] is the original training set for the ISBI SNEMI3D Challenge [22]. It is a serial section SEM image stack at nanometer-scale pixel resolution. We use the first substack as the supervised set and the second substack for testing. The supervised substack contains ground truth neuron segments above a minimum size, which we consider as all the available supervised data. We test the performance of our algorithm using subsets of these true segments of increasing size, up to the full set. Both the supervised and the testing substacks are used for the unsupervised term. Due to the unavailability of the ground truth data, we did not experiment with the original testing image stack from the challenge. We target 3D segmentation for this data set.
4.1.3 Drosophila melanogaster larval neuropil data set
[24] is a FIB-SEM image volume at nanometer-scale voxel resolution. We divide the whole volume evenly into eight subvolumes and perform eight-fold cross validation, each time using one subvolume as the supervised set and the whole volume as the testing data. Each subvolume contains ground truth neuron segments above a minimum size. Following the setting of the mouse cortex experiment, in each fold of the cross validation we use subsets of the true neuron segments from the respective supervised subvolume as the supervised data to generate boundary classification labels. We use the entire volume to generate unsupervised samples. We target 3D segmentation for this data set.
4.2 Experiments
We use fully trained Cascaded Hierarchical Models [12] to generate membrane detection confidence maps and keep them fixed for the HMT and SSHMT experiments on each data set. To generate initial superpixels, we apply the watershed algorithm [25] to the membrane confidence maps. For boundary classification, we use features including shape information (region size, perimeter, bounding box, boundary length, etc.) and image intensity statistics (mean, standard deviation, minimum, maximum, etc.) of region interior and boundary pixels, computed from both the original EM images and the membrane detection confidence maps.
We use the adapted Rand error metric [13] to generate boundary classification labels using whole ground truth images (Section 2) for the 2D mouse neuropil data set. For the 3D mouse cortex and Drosophila melanogaster larval neuropil data sets, we determine the labels using individual ground truth segments instead. We use this setting in order to match the actual process of analyzing EM images by neuroscientists. Details about label generation using individual ground truth segments are provided in Appendix 0.A.
We can see in (4) and (10) that computing F and its gradient involves multiplying many floating point numbers, which can cause underflow for leaf-to-root clique paths even in a merge tree of moderate height. To avoid this problem, we exploit the local property of the merge consistency constraint and compute F over every path subpart of a small fixed length, kept identical for all experiments. For inference, we initialize w by running gradient descent on (7) with only the supervised term and the regularizer before adding the unsupervised term for the whole optimization. We update σ_u and σ_s after every fixed number of gradient descent steps on w.
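The subpath trick above can be sketched as follows; `dnf_value` re-implements the relaxation of Section 3.1 for a short path, and the window length of 3 is an illustrative choice, not necessarily the paper's setting.

```python
def dnf_value(p):
    """Real-valued monotonicity relaxation for a short clique path of merge
    scores p[0..l-1] (one conjunction per split position)."""
    l = len(p)
    out = 1.0
    for j in range(l + 1):
        conj = 1.0
        for k in range(j):
            conj *= p[k]
        for k in range(j, l):
            conj *= 1.0 - p[k]
        out *= 1.0 - conj
    return 1.0 - out

def windowed_dnf_values(path_scores, window=3):
    """Evaluate the constraint on every length-`window` subpath instead of
    the full path, keeping each product short enough to avoid underflow."""
    n = len(path_scores)
    if n <= window:
        return [dnf_value(path_scores)]
    return [dnf_value(path_scores[i:i + window]) for i in range(n - window + 1)]

# A long consistent path still yields 1.0 on every window ...
assert all(v == 1.0 for v in windowed_dnf_values([1] * 10 + [0] * 10))
# ... while a violation shows up in the windows that cover it.
assert min(windowed_dnf_values([1, 1, 0, 1, 0, 0])) == 0.0
```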
We compare SSHMT with the fully supervised HMT [17] as the baseline method. To make the comparison fair, we use the same logistic sigmoid function as the boundary classifier for both HMT and SSHMT. The fully supervised training uses the same Bayesian framework, only without the unsupervised term in (7), and alternately estimates the supervised standard deviation to balance the regularization term and the supervised term. All the hyperparameters are kept identical for HMT and SSHMT and fixed for all experiments. We measure accuracy with the adapted Rand error [13], following the public EM image segmentation challenges [13, 22]. Due to the randomness in the selection of supervised data, we repeat each experiment multiple times, except in cases where fewer combinations are possible. We report the mean and standard deviation of the errors for each set of repeats on the three data sets in Table 2. For the 2D mouse neuropil data set, we also threshold the membrane detection confidence maps at the optimal level and report the resulting adapted Rand error. Since the membrane detection confidence maps are generated in 2D, we do not measure thresholding errors for the other, 3D data sets. In addition, we report the results of using the globally optimal tree inference [9] in the supplementary materials for comparison.



Examples of 2D segmentation testing results on the mouse neuropil data set using fully supervised HMT and SSHMT with a single ground truth image as supervised data are shown in Fig. 2. Examples of 3D individual neuron segmentation testing results on the Drosophila melanogaster larval neuropil data set using fully supervised HMT and SSHMT with a small number of true neuron segments as supervised data are shown in Fig. 3.
(a) Original  (b) HMT  (c) SSHMT  (d) Ground truth 
(a) HMT  (b) SSHMT  (c) Ground truth 
From Table 2, we can see that with abundant supervised data, SSHMT performs similarly to HMT in terms of segmentation accuracy, and both significantly improve on optimal thresholding (Table 2(a)). When the amount of supervised data becomes smaller, SSHMT significantly outperforms the fully supervised method, with accuracy close to the HMT results obtained with the full supervised sets. Moreover, the introduction of the unsupervised term stabilizes the learning of the classification function and results in much more consistent segmentation performance, even when only very limited labeled data are available. Increased errors and large variations are observed in the SSHMT results only when the supervised data become too scarce: the few supervised samples are then incapable of providing sufficient guidance to balance the unsupervised term, and the boundary classifiers are biased towards trivial predictions.
Fig. 2 shows that SSHMT is capable of fixing both over- and under-segmentation errors that occur in the HMT results. Fig. 3 likewise shows that SSHMT can fix over-segmentation errors and generate highly accurate neuron segmentations. Note that in our experiments we always select the supervised data subsets randomly. In realistic use, we expect more representative supervised samples to be chosen with expertise, and the performance of SSHMT to improve further.
We also conducted an experiment on the mouse neuropil data set in which we used only one ground truth image to train the membrane detector, HMT, and SSHMT, to test a fully semi-supervised EM segmentation pipeline, repeating the experiment for every ground truth image in the supervised set. Despite the increase in error, which is mainly due to the fully supervised nature of the membrane detection algorithm, SSHMT again improves the region accuracy over optimal thresholding and has a clear advantage over HMT.
We have open-sourced our code at https://github.com/tingliu/glia. It takes approximately 80 seconds for our SSHMT implementation to train and test on the whole mouse neuropil data set using Intel Xeon CPUs with a modest memory footprint.
5 Conclusion
In this paper, we proposed a semi-supervised method that can consistently learn boundary classifiers with a very limited amount of supervised data for region-based image segmentation. This dramatically reduces the high demand of fully supervised algorithms for ground truth data. We applied our method to neuron segmentation in EM images from three data sets and demonstrated that, using only a small amount of ground truth data, our method performs close to the state-of-the-art fully supervised method trained on full labeled data sets. In future work, we will explore integrating the proposed constraint-based unsupervised loss into structured learning settings to further exploit the structured information for learning the boundary classification function. We may also replace the current logistic sigmoid function with more complex classifiers and combine our method with active learning frameworks to further improve segmentation accuracy.
5.0.1 Acknowledgment
This work was supported by NSF IIS-1149299 and NIH 1R01NS075314-01. We thank the National Center for Microscopy and Imaging Research at the University of California, San Diego, for providing the mouse neuropil data set. We also thank Mehdi Sajjadi at the University of Utah for the constructive discussions.
Appendix 0.A Generating Boundary Classification Labels Using Individual Ground Truth Segments
Assume we have only individually annotated image segments instead of entire image volumes as ground truth. Given a merge tree, we generate best-effort ground truth classification labels for a subset of cliques as follows:

1. For every region represented by a tree node, compute the Jaccard indices of this region against all the annotated ground truth segments. Use the highest Jaccard index of each node as its eligibility score.

2. Mark every node in the tree as "eligible" if its eligibility score is above a certain threshold, or as "ineligible" otherwise.

3. Iteratively select a currently "eligible" node with the highest eligibility score, and mark it together with its ancestors and descendants as "ineligible", until every node is "ineligible". This procedure generates a set of selected nodes.

4. For every selected node, label the cliques at the node itself and at its descendants as 1 ("merge") and the cliques at its ancestors as 0 ("split").

The clique samples that receive merge/split labels are then considered the supervised data.
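The four steps above can be sketched in Python as follows; the region encoding (pixel-id sets), the function names, and the 0.75 threshold are our illustrative assumptions (the paper's threshold value is not reproduced here). In a merge tree a node's region is the union of its descendants' regions, so the subset test below identifies descendants.

```python
def jaccard(a, b):
    """Jaccard index between two pixel-id sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def best_effort_labels(regions, parent, gt_segments, threshold=0.75):
    """regions: node -> set of pixel ids; parent: node -> parent (-1 at the
    root). Returns clique labels keyed by node: 1 ("merge") at selected
    nodes and their descendants, 0 ("split") at their ancestors."""
    score = {i: max(jaccard(r, g) for g in gt_segments) for i, r in regions.items()}
    eligible = {i for i, s in score.items() if s >= threshold}
    labels = {}
    while eligible:
        i = max(eligible, key=score.__getitem__)
        # The selected node and its descendants become "merge" ...
        for j in regions:
            if regions[j] <= regions[i]:
                labels[j] = 1
                eligible.discard(j)
        # ... and its ancestors become "split".
        k = parent[i]
        while k != -1:
            labels[k] = 0
            eligible.discard(k)
            k = parent[k]
    return labels

# 7-node toy tree: leaves {0..3}, node 4 = 0+1, node 5 = 2+3, root 6 = all.
regions = {0: {0}, 1: {1}, 2: {2}, 3: {3},
           4: {0, 1}, 5: {2, 3}, 6: {0, 1, 2, 3}}
parent = {0: 4, 1: 4, 2: 5, 3: 5, 4: 6, 5: 6, 6: -1}
labels = best_effort_labels(regions, parent, gt_segments=[{0, 1}])
assert labels[4] == 1 and labels[6] == 0   # merge at node 4, split above
assert 5 not in labels                     # unlabeled cliques get no sample
```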
References
 [1] Sporns, O., Tononi, G., Kötter, R.: The human connectome: a structural description of the human brain. PLoS Computational Biology 1(4) (2005) e42
 [2] Famiglietti, E.V.: Synaptic organization of starburst amacrine cells in rabbit retina: analysis of serial thin sections by electron microscopy and graphic reconstruction. Journal of Comparative Neurology 309(1) (1991) 40–70
 [3] Briggman, K.L., Helmstaedter, M., Denk, W.: Wiring specificity in the direction-selectivity circuit of the retina. Nature 471(7337) (2011) 183–188
 [4] Helmstaedter, M.: Cellular-resolution connectomics: challenges of dense neural circuit reconstruction. Nature Methods 10(6) (2013) 501–507
 [5] Briggman, K.L., Denk, W.: Towards neural circuit reconstruction with volume electron microscopy techniques. Current Opinion in Neurobiology 16(5) (2006) 562–570
 [6] Arbelaez, P., Maire, M., Fowlkes, C., Malik, J.: Contour detection and hierarchical image segmentation. Pattern Analysis and Machine Intelligence, IEEE Transactions on 33(5) (2011) 898–916

 [7] Ren, Z., Shakhnarovich, G.: Image segmentation by cascaded region agglomeration. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2013) 2011–2018
 [8] Arbeláez, P., Pont-Tuset, J., Barron, J., Marques, F., Malik, J.: Multiscale combinatorial grouping. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2014) 328–335
 [9] Liu, T., Seyedhosseini, M., Tasdizen, T.: Image segmentation using hierarchical merge tree. Image Processing, IEEE Transactions on 25(10) (2016) 4596–4607
 [10] Sommer, C., Straehle, C., Koethe, U., Hamprecht, F.A.: ilastik: Interactive learning and segmentation toolkit. In: Biomedical Imaging: From Nano to Macro, 2011 IEEE International Symposium on, IEEE (2011) 230–233

 [11] Ciresan, D., Giusti, A., Gambardella, L.M., Schmidhuber, J.: Deep neural networks segment neuronal membranes in electron microscopy images. In: Advances in Neural Information Processing Systems 25. (2012) 2852–2860
 [12] Seyedhosseini, M., Sajjadi, M., Tasdizen, T.: Image segmentation with cascaded hierarchical models and logistic disjunctive normal networks. In: Proceedings of the IEEE International Conference on Computer Vision. (2013) 2168–2175
 [13] Arganda-Carreras, I., Turaga, S.C., Berger, D.R., Cireşan, D., Giusti, A., Gambardella, L.M., Schmidhuber, J., Laptev, D., Dwivedi, S., Buhmann, J.M., et al.: Crowdsourcing the creation of image segmentation algorithms for connectomics. Frontiers in Neuroanatomy 9 (2015)
 [14] Nunez-Iglesias, J., Kennedy, R., Parag, T., Shi, J., Chklovskii, D.B.: Machine learning of hierarchical clustering to segment 2D and 3D images. PLoS ONE 8(8) (2013) e71715
 [15] Kaynig, V., Vazquez-Reina, A., Knowles-Barley, S., Roberts, M., Jones, T.R., Kasthuri, N., Miller, E., Lichtman, J., Pfister, H.: Large-scale automatic reconstruction of neuronal processes from electron microscopy images. Medical Image Analysis 22(1) (2015) 77–88
 [16] Krasowski, N., Beier, T., Knott, G., Koethe, U., Hamprecht, F., Kreshuk, A.: Improving 3D EM data segmentation by joint optimization over boundary evidence and biological priors. In: Biomedical Imaging (ISBI), 2015 IEEE 12th International Symposium on, IEEE (2015) 536–539
 [17] Liu, T., Jones, C., Seyedhosseini, M., Tasdizen, T.: A modular hierarchical approach to 3D electron microscopy image segmentation. Journal of Neuroscience Methods 226 (2014) 88–102
 [18] Funke, J., Hamprecht, F.A., Zhang, C.: Learning to segment: training hierarchical segmentation under a topological loss. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015. Springer (2015) 268–275
 [19] Uzunbas, M.G., Chen, C., Metaxas, D.: An efficient conditional random field approach for automatic and interactive neuron segmentation. Medical Image Analysis 27 (2016) 31–44
 [20] Parag, T., Plaza, S., Scheffer, L.: Small sample learning of superpixel classifiers for EM segmentation. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2014. Springer (2014) 389–397
 [21] Parag, T., Ciresan, D.C., Giusti, A.: Efficient classifier training to minimize false merges in electron microscopy segmentation. In: Proceedings of the IEEE International Conference on Computer Vision. (2015) 657–665
 [22] Arganda-Carreras, I., Seung, H.S., Vishwanathan, A., Berger, D.: 3D segmentation of neurites in EM images challenge (ISBI 2013). http://brainiac2.mit.edu/SNEMI3D/ (2013), accessed on February 16, 2016
 [23] Deerinck, T.J., Bushong, E.A., Lev-Ram, V., Shu, X., Tsien, R.Y., Ellisman, M.H.: Enhancing serial block-face scanning electron microscopy to enable high resolution 3D nanohistology of cells and tissues. Microscopy and Microanalysis 16(S2) (2010) 1138–1139
 [24] Knott, G., Marchman, H., Wall, D., Lich, B.: Serial section scanning electron microscopy of adult brain tissue using focused ion beam milling. The Journal of Neuroscience 28(12) (2008) 2959–2964
 [25] Beucher, S., Meyer, F.: The morphological approach to segmentation: the watershed transformation. In: Mathematical Morphology in Image Processing. Volume 34. Marcel Dekker AG (1993) 433–481
 [26] Schindelin, J., Arganda-Carreras, I., Frise, E., Kaynig, V., Longair, M., Pietzsch, T., Preibisch, S., Rueden, C., Saalfeld, S., Schmid, B., et al.: Fiji: an open-source platform for biological-image analysis. Nature Methods 9(7) (2012) 676–682