Fluorescence microscopy is a type of optical microscopy that uses fluorescence to image 3D subcellular structures [1, 2]. Three dimensional segmentation is needed to quantify and characterize cells, nuclei, and other biological structures.
Various nuclei segmentation methods have been investigated during the last few decades. Active contours [3, 4], which minimize an energy functional to fit desired shapes, have been one of the most successful methods in microscopy image analysis. Since active contours use the image gradient to evolve a contour toward the boundary of an object, the method can be sensitive to noise and highly dependent on initial contour placement. In [5], an external energy term which convolves a controllable vector field kernel with an image edge map was introduced to address these problems. In [6], 2D region-based active contours using image intensity to identify a region of interest were described. This achieves better performance on noisy images and is relatively independent of the initial curve placement. Extending this to 3D, [7] described 3D segmentation of a rat kidney structure. This technique was further extended to address the problem of 3D intensity inhomogeneity [8]. However, these energy-functional-based methods cannot distinguish various structures. Alternatively, [9, 10] described a method known as Squassh that solves the energy minimization problem with a generalized linear model to couple image restoration and segmentation. In addition, [11] described multidimensional segmentation using random seeds combined with multi-resolution, multi-scale, and region-growing techniques.
Convolutional neural networks (CNNs) have been used to address problems in segmentation and object identification [12]. Various approaches based on CNNs have been used in the biomedical area [13]. U-Net [14] is a 2D CNN which uses an encoder-decoder architecture with skip connections to segment cells in light microscopy images. In [15], a multi-input multi-output CNN for cell segmentation in fluorescence microscopy images was described to segment cells of various sizes and intensities. Since these approaches [14, 15] are 2D segmentation methods, they may fail to produce reasonable segmentations in 3D. More specifically, stacking these 2D segmentation images into a 3D volume may result in misalignment in the depth direction. Also, in [16] a method that trained three networks from different directions in a volume and combined the three results to produce a form of 3D segmentation was described. A 3D U-Net [17] was introduced to identify 3D structures by extending the architecture of [14] to 3D. However, this approach requires manually annotated groundtruth to train the network. Generating groundtruth for 3D volumes is tedious and is generally done only on 2D slices; obtaining true 3D groundtruth volumes is impractical. One way to address this is to use synthetic groundtruth data [18, 19]. A method that segments nuclei by training a 3D CNN with synthetic microscopy volumes was described in [20]. Here, the synthetic microscopy volumes were generated by blurring and noise operations.
Generating realistic synthetic microscopy image volumes remains a challenging problem since various types of noise and biological structures with different shapes are present and need to be modeled. Recently, in [21] a generative adversarial network (GAN) was described to address image generation problems using two adversarial networks, a generative network and a discriminative network. In particular, the discriminative network learns a loss function to distinguish whether the output image is real or fake whereas the generative network tries to minimize this loss function. One extension of GANs is Pix2Pix [22], which uses conditional GANs to learn the relationship between the input image and output image and can generate realistic images. One issue with Pix2Pix [22] is that it still requires paired training data to train the networks. In [23], coupled GANs (CoGAN) for learning the joint distribution of multi-domain images without having the corresponding groundtruth images were introduced. Later, cycle-consistent adversarial networks (CycleGAN) [24] employed a cycle consistency term in the adversarial loss function for image generation without using paired training data. More recently, a segmentation method that concatenates a segmentation network to CycleGAN to learn the styles of CT segmentation and MRI segmentation was described in [25].
In this paper, we present a 3D segmentation method to identify and segment nuclei in fluorescence microscopy volumes without the need for manually segmented groundtruth volumes. Three dimensional synthetic training data are generated using a spatially constrained CycleGAN. A 3D CNN is then trained using the 3D synthetic data to segment nuclei structures. Our method is evaluated using hand segmented groundtruth volumes of real fluorescence microscopy data from a rat kidney. Our data were collected using two-photon microscopy with nuclei labeled with Hoechst 33342 staining.
2 Proposed Method
Figure 1 shows a block diagram of our method. We denote I as a 3D image volume of size X × Y × Z. Note that I_{z_p} is a focal plane image, of size X × Y, along the z-direction in a volume, where p ∈ {1, ..., Z}. Note also that I^{orig} and I^{seg} are the original fluorescence microscopy volume and the segmented volume, respectively. In addition, let I_{(x_i:x_j, y_i:y_j, z_i:z_j)} be a subvolume of I whose x-coordinate ranges from x_i to x_j, y-coordinate from y_i to y_j, and z-coordinate from z_i to z_j, where 1 ≤ x_i ≤ x_j ≤ X, 1 ≤ y_i ≤ y_j ≤ Y, and 1 ≤ z_i ≤ z_j ≤ Z. For example, I^{seg}_{(241:272, 241:272, 131:162)} is a subvolume of a segmented volume I^{seg}, where the subvolume is cropped between slice 241 and slice 272 in the x-direction, between slice 241 and slice 272 in the y-direction, and between slice 131 and slice 162 in the z-direction.
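As an illustration (not part of the original method), the subvolume cropping above can be expressed with NumPy slicing; the 300 × 300 × 200 volume size below is a hypothetical example:

```python
import numpy as np

# Hypothetical segmented volume of size 300 x 300 x 200 (x, y, z).
seg = np.zeros((300, 300, 200), dtype=np.uint8)

# Crop the subvolume spanning slices 241-272 in x and y and 131-162 in z.
# Python slices are half-open and 0-indexed, so 240:272 selects slices
# 241 through 272 in 1-indexed notation, giving a 32 x 32 x 32 subvolume.
sub = seg[240:272, 240:272, 130:162]
assert sub.shape == (32, 32, 32)
```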
As shown in Figure 1, our proposed method consists of two steps: 3D synthetic data generation and 3D CNN segmentation. We first generate synthetic binary volumes and use them with a subvolume of the original image volume, I^{orig}, to train a spatially constrained CycleGAN (SpCycleGAN) and obtain a generative model denoted as model G. Model G is then used on another set of synthetic binary volumes to generate the corresponding synthetic 3D microscopy volumes. For 3D CNN segmentation, we utilize these paired synthetic binary and synthetic microscopy volumes to train a 3D CNN and obtain a segmentation model. Finally, the 3D CNN model is used to segment nuclei in I^{orig} to produce I^{seg}.
2.1 3D Synthetic Data Generation
Three dimensional synthetic data generation consists of synthetic binary volume generation, SpCycleGAN training, and SpCycleGAN inference. In synthetic binary volume generation, nuclei are assumed to have an ellipsoidal shape, and multiple nuclei are randomly generated in different orientations and locations in a volume. The original CycleGAN and our SpCycleGAN were trained to generate a set of synthetic volumes.
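A minimal sketch of synthetic binary volume generation is shown below. It places axis-aligned ellipsoids only (the method above also randomizes orientation, which would require rotating the ellipsoid axes); the nuclei count and radii ranges are illustrative values, not the paper's settings:

```python
import numpy as np

def synthetic_binary_volume(shape=(128, 128, 128), num_nuclei=30,
                            radii_range=(4, 10), rng=None):
    """Generate a binary volume of randomly placed axis-aligned ellipsoids.

    Simplified sketch: orientations are not randomized here, and the
    parameter defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(rng)
    vol = np.zeros(shape, dtype=np.uint8)
    zz, yy, xx = np.meshgrid(*[np.arange(s) for s in shape], indexing="ij")
    for _ in range(num_nuclei):
        center = [rng.integers(0, s) for s in shape]   # random location
        radii = rng.uniform(*radii_range, size=3)      # random semi-axes
        inside = (((zz - center[0]) / radii[0]) ** 2 +
                  ((yy - center[1]) / radii[1]) ** 2 +
                  ((xx - center[2]) / radii[2]) ** 2) <= 1.0
        vol[inside] = 1                                # voxels inside ellipsoid
    return vol

vol = synthetic_binary_volume(shape=(64, 64, 64), num_nuclei=5, rng=0)
assert vol.shape == (64, 64, 64) and vol.sum() > 0
```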
2.1.1 CycleGAN

The CycleGAN is trained to generate a synthetic microscopy volume. CycleGAN uses a combination of discriminative networks and generative networks to solve a minimax problem by adding a cycle consistency loss to the original GAN loss function as [21, 24]:

L(G, F, D_X, D_Y) = L_GAN(G, D_Y, X, Y) + L_GAN(F, D_X, Y, X) + λ L_cyc(G, F),    (1)

where L_cyc(G, F) = E_x[ ||F(G(x)) − x||_1 ] + E_y[ ||G(F(y)) − y||_1 ].

Here, λ is a weight coefficient and ||·||_1 is the L1 norm. Note that model G maps a synthetic binary volume x ∈ X to the microscopy domain Y while model F maps a microscopy volume y ∈ Y to the binary domain X. Also, D_X distinguishes between x and F(y) while D_Y distinguishes between y and G(x). G(x) is an original-like microscopy volume generated by model G, and F(y) is generated by model F and looks similar to a synthetic binary volume. Here, X and Y are unpaired sets of images. In CycleGAN inference, a synthetic microscopy volume G(x) is generated using model G on a synthetic binary volume x. As previously indicated, x and G(x) are a paired set of images. Here, x serves as a groundtruth volume corresponding to G(x).
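The cycle-consistency term of the CycleGAN loss can be illustrated with a minimal NumPy sketch. The toy generators `G` and `F` below are exact inverses (an assumption purely for illustration, standing in for trained networks), so the cycle loss evaluates to zero:

```python
import numpy as np

def l1(a, b):
    """Mean absolute error: the L1 norm used by the cycle-consistency loss."""
    return np.mean(np.abs(a - b))

def cycle_consistency_loss(G, F, x, y):
    """L_cyc(G, F) = E[||F(G(x)) - x||_1] + E[||G(F(y)) - y||_1].

    G maps a synthetic binary volume x toward the microscopy domain;
    F maps a microscopy volume y back toward the binary domain.
    """
    return l1(F(G(x)), x) + l1(G(F(y)), y)

# Toy stand-ins for the trained generators (illustration only).
G = lambda v: v * 2.0   # pretend "binary -> microscopy"
F = lambda v: v / 2.0   # pretend "microscopy -> binary"

x = np.random.rand(8, 8, 8)
y = np.random.rand(8, 8, 8)
# Exactly inverse toy generators give zero cycle loss.
assert cycle_consistency_loss(G, F, x, y) == 0.0
```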
2.1.2 Spatially Constrained CycleGAN
Although the CycleGAN uses the cycle consistency loss to constrain the similarity of the distributions of X and Y, it does not provide enough spatial constraint on the locations of the nuclei. CycleGAN generates realistic synthetic microscopy images, but a spatial shift in the locations of the nuclei between G(x) and x was observed. To create a spatial constraint on the location of the nuclei, a network H is added to the CycleGAN; H takes G(x) as an input and generates a binary mask H(G(x)). Here, the architecture of H is the same as the architecture of G. Network H minimizes a loss, L_spatial, between H(G(x)) and x, which serves as a spatial regulation term in the total loss function. The network H is trained together with G. The loss function of the SpCycleGAN is defined as:

L(G, F, H, D_X, D_Y) = L_GAN(G, D_Y, X, Y) + L_GAN(F, D_X, Y, X) + λ_1 L_cyc(G, F) + λ_2 L_spatial(G, H, X, Y),    (2)

where λ_1 and λ_2 are the weight coefficients for L_cyc and L_spatial, respectively. Note that the first three terms are the same as those already defined in Equation (1). Here, L_spatial can be expressed as

L_spatial(G, H, X, Y) = E_x[ ||H(G(x)) − x||_2^2 ].
2.2 3D U-Net
Figure 2 shows the architecture of our modified 3D U-Net. The filter size of each 3D convolution and the corresponding voxel padding are chosen to maintain the same volume size during 3D convolution. Each 3D convolution is followed by a 3D batch normalization [26], and downsampling with a stride of 2 is used. In the upsampling path, feature information is retrieved using 3D transpose convolutions. Our modified 3D U-Net is one layer deeper than the conventional U-Net, as can be seen in Figure 2. Our training loss function can be expressed as a linear combination of the Dice loss (L_Dice) and the binary cross-entropy (BCE) loss (L_BCE) such that

L_seg = α L_Dice + β L_BCE,    (3)

where

L_Dice = 1 − (2 Σ_{i=1}^{N} t_i p_i) / (Σ_{i=1}^{N} t_i + Σ_{i=1}^{N} p_i)

and

L_BCE = −(1/N) Σ_{i=1}^{N} ( t_i log(p_i) + (1 − t_i) log(1 − p_i) ),

respectively [27]. Note that T is the set of the targeted groundtruth values and t_i is the targeted groundtruth value at the i-th voxel location. Similarly, P is a probability map of the binary volumetric segmentation and p_i is the probability at the i-th voxel location. Lastly, N is the number of voxels in a volume, and α and β serve as the weight coefficients between the two loss terms in Equation (3). The network takes a grayscale input volume and produces a voxelwise classified 3D volume with the same size as the input volume. To train our model, pairs of synthetic microscopy volumes and synthetic binary volumes are used.
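The combined Dice and BCE training loss can be sketched in NumPy as follows; the weight values α = β = 1 are illustrative defaults, not the paper's settings:

```python
import numpy as np

def dice_loss(p, t, eps=1e-7):
    """Soft Dice loss over all voxels: 1 - 2*sum(p*t)/(sum(p) + sum(t))."""
    return 1.0 - (2.0 * np.sum(p * t) + eps) / (np.sum(p) + np.sum(t) + eps)

def bce_loss(p, t, eps=1e-7):
    """Voxelwise binary cross-entropy, averaged over the volume."""
    p = np.clip(p, eps, 1.0 - eps)   # avoid log(0)
    return -np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))

def segmentation_loss(p, t, alpha=1.0, beta=1.0):
    """Linear combination of Dice and BCE losses; alpha and beta are
    illustrative weights, not the paper's actual values."""
    return alpha * dice_loss(p, t) + beta * bce_loss(p, t)

# Sanity check: a perfect prediction scores much lower than an inverted one.
t = np.zeros((8, 8, 8))
t[2:6, 2:6, 2:6] = 1.0
assert segmentation_loss(t, t) < segmentation_loss(1.0 - t, t)
```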
For the inference step, we first zero-pad I^{orig} on the boundaries. A 3D window is then used to segment nuclei. Since the zero-padded volume is bigger than the 3D window, the 3D window is slid in the x-, y-, and z-directions over the zero-padded volume. Nuclei partially observed on the boundaries of the 3D window may not be segmented correctly. Hence, only the central subvolume of the output of each 3D window is used to generate the corresponding subvolume of I^{seg}. This process is repeated until the 3D window has covered the entire volume.
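This windowed inference can be sketched as below. The window size (64³) and central core size (32³) are illustrative assumptions, the volume sides are assumed to be multiples of the core size, and `segment_fn` stands in for the trained 3D U-Net:

```python
import numpy as np

def sliding_window_segment(volume, segment_fn, win=64, core=32):
    """Segment a 3D volume with a sliding window, keeping only the central
    core of each window's output to avoid boundary artifacts.

    Assumes each side of `volume` is a multiple of `core`. The volume is
    zero-padded by (win - core) // 2 voxels on every boundary.
    """
    pad = (win - core) // 2
    padded = np.pad(volume, pad, mode="constant")
    out = np.zeros(volume.shape, dtype=np.uint8)
    zs, ys, xs = volume.shape
    for z in range(0, zs, core):
        for y in range(0, ys, core):
            for x in range(0, xs, core):
                window = padded[z:z + win, y:y + win, x:x + win]
                pred = segment_fn(window)   # win^3 binary prediction
                # Keep only the central core of the prediction.
                out[z:z + core, y:y + core, x:x + core] = \
                    pred[pad:pad + core, pad:pad + core, pad:pad + core]
    return out

# Usage with a trivial thresholding "network": the stitched output must
# equal thresholding the whole volume directly.
rng = np.random.default_rng(1)
vol = rng.random((32, 32, 32))
seg = sliding_window_segment(vol, lambda w: (w > 0.5).astype(np.uint8),
                             win=16, core=8)
assert np.array_equal(seg, (vol > 0.5).astype(np.uint8))
```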
3 Experimental Results
We tested our proposed method on two different rat kidney data sets, Data-I and Data-II, each consisting of grayscale image slices.
Our SpCycleGAN is implemented in PyTorch using the Adam optimizer [28] with the default parameters given by CycleGAN [24]. In addition, we fixed the weight coefficients λ_1 and λ_2 in the SpCycleGAN loss function shown in Equation (2). We trained the CycleGAN and SpCycleGAN to generate synthetic volumes for Data-I and Data-II, respectively. A synthetic binary volume for Data-I and a subvolume of the original microscopy volume of Data-I were used to train a generative model for Data-I. Similarly, a synthetic binary volume for Data-II and a subvolume of the original microscopy volume of Data-II were used to train a generative model for Data-II.
We generated sets of synthetic binary volumes for Data-I and Data-II, where the nuclei are generated according to the different nuclei sizes observed in Data-I and Data-II, respectively. By using the trained generative model for Data-I on its synthetic binary volumes, pairs of synthetic binary volumes and corresponding synthetic microscopy volumes were obtained. Similarly, pairs were obtained for Data-II using its generative model. Since our modified 3D U-Net architecture takes volumes of a fixed size, we divided the generated volumes into adjacent non-overlapping subvolumes. Thus, we have pairs of synthetic binary volumes and corresponding synthetic microscopy volumes for each data set to train our modified 3D U-Net. Note that these synthetic binary volumes are used as groundtruth volumes paired with the corresponding synthetic microscopy volumes. A segmentation model for each data set is then trained.
Our modified 3D U-Net is implemented in PyTorch using the Adam optimizer [28]. For evaluation purposes, we use different settings: the 3D synthetic data generation method (CycleGAN or SpCycleGAN), the number of pairs of synthetic training volumes, and different loss functions with different settings of the weight coefficients α and β. Moreover, we compared our modified 3D U-Net with a 3D encoder-decoder architecture [20]. Lastly, small objects below a voxel-count threshold were removed using 3D connected components.
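The small-object removal step can be sketched with `scipy.ndimage`; the 100-voxel threshold and the 26-connectivity structuring element are illustrative choices, not the paper's stated settings:

```python
import numpy as np
from scipy import ndimage

def remove_small_objects(binary_vol, min_voxels=100):
    """Remove 3D connected components smaller than `min_voxels`.

    Uses 26-connectivity (full 3x3x3 structuring element), which is
    an assumed choice for this sketch.
    """
    structure = np.ones((3, 3, 3), dtype=int)
    labels, _ = ndimage.label(binary_vol, structure=structure)
    counts = np.bincount(labels.ravel())   # voxel count per label
    keep = counts >= min_voxels
    keep[0] = False                        # label 0 is background
    return keep[labels].astype(np.uint8)

# Usage: a 1000-voxel component survives, an 8-voxel component is removed.
vol = np.zeros((32, 32, 32), dtype=np.uint8)
vol[2:12, 2:12, 2:12] = 1
vol[20:22, 20:22, 20:22] = 1
out = remove_small_objects(vol, min_voxels=100)
assert out.sum() == 1000
```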
Figure 3 shows the synthetic images generated by our proposed method. The left column shows original images, the middle column shows synthetic images artificially generated from the corresponding synthetic binary images, and the right column shows those synthetic binary images. As can be seen from Figure 3, the synthetic images reflect characteristics of the original microscopy images such as background noise, nuclei shape, orientation, and intensity.
Additionally, synthetic data generated by CycleGAN and by SpCycleGAN from the same synthetic binary image are compared in Figure 4. Here, the synthetic binary image is overlaid on the synthetic microscopy image and labeled in red. It is observed that our spatial constraint loss reduces the location shift of nuclei between a synthetic microscopy image and its synthetic binary image. The realistic synthetic microscopy volumes from SpCycleGAN can then be used to train our modified 3D U-Net.
|Method|Subvolume 1|||Subvolume 2|||Subvolume 3|||
||Accuracy|Type-I|Type-II|Accuracy|Type-I|Type-II|Accuracy|Type-I|Type-II|
|Method [9, 10]|90.14%|9.07%|0.79%|88.26%|11.67%|0.07%|87.29%|12.61%|0.10%|
|3D Encoder-Decoder [20] + CycleGAN + BCE||||||||||
|3D Encoder-Decoder [20] + SpCycleGAN + BCE||||||||||
|3D U-Net + SpCycleGAN|95.07%|2.94%|1.99%|93.01%|6.27%|0.72%|94.04%|5.84%|0.11%|
|3D U-Net + SpCycleGAN|94.76%|3.00%|2.24%|93.03%|6.03%|0.95%|94.30%|5.22%|0.40%|
|3D U-Net + SpCycleGAN + DICE and BCE|95.44%|2.79%|1.76%|93.63%|5.73%|0.64%|93.90%|5.92%|0.18%|
|3D U-Net + SpCycleGAN + DICE and BCE|95.37%|2.77%|1.86%|93.63%|5.69%|0.68%|94.37%|5.27%|0.36%|
|3D U-Net + SpCycleGAN + DICE and BCE + PP|95.56%|2.57%|1.86%|93.67%|5.65%|0.68%|94.54%|5.10%|0.36%|

Table 1: Accuracy, Type-I and Type-II errors for known methods and our method on subvolume 1, subvolume 2, and subvolume 3 of Data-I.
Our proposed method was compared to other 3D segmentation methods including 3D active surfaces [7], 3D active surfaces with inhomogeneity correction [8], 3D Squassh [9, 10], the 3D encoder-decoder architecture [20], and the 3D encoder-decoder architecture with CycleGAN. Three original 3D subvolumes of Data-I were selected to evaluate the performance of our proposed method; we denote them as subvolume 1, subvolume 2, and subvolume 3, respectively. The corresponding groundtruth of each subvolume was hand segmented. Voxx [29] was used to visualize the segmentation results in 3D and compare them to the manually annotated volumes. In Figure 5, 3D visualizations of a hand segmented subvolume and the corresponding segmentation results for the various methods are presented. As seen from the 3D visualization in Figure 5, our proposed method shows the best performance among the presented methods when compared visually to the hand segmented groundtruth volume. In general, our proposed method captures only nuclei structures whereas the other presented methods falsely detect non-nuclei structures as nuclei. Note that the segmentation results generated using CycleGAN in Figure 5 yield smaller segmentation masks and suffer from location shift. Our proposed method outperforms them since it uses the spatially constrained CycleGAN and takes into consideration both the Dice loss and the binary cross-entropy loss.
All segmentation results were evaluated quantitatively based on voxel accuracy, Type-I error, and Type-II error metrics, using the 3D hand segmented volumes. Here, accuracy = (n_TP + n_TN) / n_total, Type-I error = n_FP / n_total, and Type-II error = n_FN / n_total, where n_TP, n_TN, n_FP, n_FN, and n_total are defined to be the number of true-positives (voxels segmented as nuclei correctly), true-negatives (voxels segmented as background correctly), false-positives (voxels falsely segmented as nuclei), false-negatives (voxels falsely segmented as background), and the total number of voxels in a volume, respectively.
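These metrics can be computed directly from a pair of binary volumes; a short sketch:

```python
import numpy as np

def segmentation_errors(pred, gt):
    """Voxel accuracy, Type-I and Type-II error rates for binary volumes."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    n_total = pred.size
    n_tp = int(np.sum(pred & gt))    # nuclei segmented correctly
    n_tn = int(np.sum(~pred & ~gt))  # background segmented correctly
    n_fp = int(np.sum(pred & ~gt))   # falsely segmented as nuclei
    n_fn = int(np.sum(~pred & gt))   # falsely segmented as background
    accuracy = (n_tp + n_tn) / n_total
    type1 = n_fp / n_total
    type2 = n_fn / n_total
    return accuracy, type1, type2

# Tiny worked example: tp=2, tn=5, fp=1, fn=2 over 10 voxels.
gt = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
pred = np.array([1, 1, 0, 0, 1, 0, 0, 0, 0, 0])
acc, t1, t2 = segmentation_errors(pred, gt)
# -> accuracy 0.7, Type-I 0.1, Type-II 0.2
```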
The quantitative evaluations for the subvolumes are shown in Table 1. Our proposed method outperforms the other compared methods. The smaller Type-I error shows that our proposed method successfully rejects non-nuclei structures during segmentation. Also, our proposed method has reasonably low Type-II errors compared to the other segmentation methods. Moreover, in this table, we show that our proposed SpCycleGAN creates better paired synthetic volumes, which is reflected in the segmentation accuracy. Instead of a 3D encoder-decoder structure, we use the 3D U-Net, which leads to better results since the 3D U-Net has skip connections that can preserve spatial information. In addition, the combination of the two loss functions, the Dice loss and the BCE loss, turns out to be better for the segmentation task in our application. In particular, the Dice loss constrains the shape of the nuclei segmentation whereas the BCE loss regulates the voxelwise binary prediction. It is observed that training with more synthetic volumes generalizes our method to achieve better segmentation accuracy. Finally, the postprocessing (PP) that eliminates small components helps to improve segmentation performance.
To make this clear, segmentation results were color coded using 3D connected component labeling and overlaid on the original volumes in Figure 6. The method from [9, 10] cannot distinguish between nuclei and non-nuclei structures including noise. This is especially recognizable in the segmentation results of Data-I, in which multiple nuclei and non-nuclei structures are colored with the same color. As can also be observed from Figure 6, the segmentation masks produced using CycleGAN are smaller than the nuclei size and suffer from location shifts. Conversely, our proposed method segments nuclei with the right shape at the correct locations.
In this paper we presented a modified 3D U-Net nuclei segmentation method using paired synthetic volumes. The training was done using synthetic volumes generated by a spatially constrained CycleGAN. A combination of the Dice loss and the binary cross-entropy loss was optimized during training. We compared our proposed method to various segmentation methods using manually annotated 3D groundtruth from real data. The experimental results indicate that our method can successfully distinguish between non-nuclei and nuclei structures and capture nuclei regions well in various microscopy volumes. One drawback of our proposed segmentation method is that it cannot separate nuclei that are physically touching each other. In the future, we plan to develop a nuclei localization method to separate touching nuclei into individual nuclei.
This work was partially supported by a George M. O’Brien Award from the National Institutes of Health under grant NIH/NIDDK P30 DK079312 and the endowment of the Charles William Harrison Distinguished Professorship at Purdue University.
Data-I was provided by Malgorzata Kamocka of Indiana University and was collected at the Indiana Center for Biological Microscopy.
Address all correspondence to Edward J. Delp, email@example.com
-  C. Vonesch, F. Aguet, J. Vonesch, and M. Unser, “The colored revolution of bioimaging,” IEEE Signal Processing Magazine, vol. 23, no. 3, pp. 20–31, May 2006.
-  K. W. Dunn, R. M. Sandoval, K. J. Kelly, P. C. Dagher, G. A. Tanner, S. J. Atkinson, R. L. Bacallao, and B. A. Molitoris, “Functional studies of the kidney of living animals using multicolor two-photon microscopy,” American Journal of Physiology-Cell Physiology, vol. 283, no. 3, pp. C905–C916, September 2002.
-  M. Kass, A. Witkin, and D. Terzopoulos, “Snakes: Active contour models,” International Journal of Computer Vision, vol. 1, no. 4, pp. 321–331, January 1988.
-  R. Delgado-Gonzalo, V. Uhlmann, D. Schmitter, and M. Unser, “Snakes on a plane: A perfect snap for bioimage analysis,” IEEE Signal Processing Magazine, vol. 32, no. 1, pp. 41–48, January 2015.
-  B. Li and S. T. Acton, “Active contour external force using vector field convolution for image segmentation,” IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 2096–2106, August 2007.
-  T. F. Chan and L. A. Vese, “Active contours without edges,” IEEE Transactions on Image Processing, vol. 10, no. 2, pp. 266–277, February 2001.
-  K. Lorenz, P. Salama, K. Dunn, and E. Delp, “Three dimensional segmentation of fluorescence microscopy images using active surfaces,” Proceedings of the IEEE International Conference on Image Processing, pp. 1153–1157, September 2013, Melbourne, Australia.
-  S. Lee, P. Salama, K. W. Dunn, and E. J. Delp, “Segmentation of fluorescence microscopy images using three dimensional active contours with inhomogeneity correction,” Proceedings of the IEEE International Symposium on Biomedical Imaging, pp. 709–713, April 2017, Melbourne, Australia.
-  G. Paul, J. Cardinale, and I. F. Sbalzarini, “Coupling image restoration and segmentation: A generalized linear model/Bregman perspective,” International Journal of Computer Vision, vol. 104, no. 1, pp. 69–93, March 2013.
-  A. Rizk, G. Paul, P. Incardona, M. Bugarski, M. Mansouri, A. Niemann, U. Ziegler, P. Berger, and I. F. Sbalzarini, “Segmentation and quantification of subcellular structures in fluorescence microscopy images using Squassh,” Nature Protocols, vol. 9, no. 3, pp. 586–596, February 2014.
-  G. Srinivasa, M. C. Fickus, Y. Guo, A. D. Linstedt, and J. Kovacevic, “Active mask segmentation of fluorescence microscope images,” IEEE Transactions on Image Processing, vol. 18, no. 8, pp. 1817–1829, August 2009.
-  J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440, June 2015, Boston, MA.
-  G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. van der Laak, B. van Ginneken, and C. I. Sanchez, “A survey on deep learning in medical image analysis,” arXiv preprint arXiv:1702.05747, February 2017.
-  O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” Proceedings of the Medical Image Computing and Computer-Assisted Intervention, pp. 231–241, October 2015, Munich, Germany.
-  S. E. A. Raza, L. Cheung, D. Epstein, S. Pelengaris, M. Khan, and N. Rajpoot, “MIMO-Net: A multi-input multi-output convolutional neural network for cell segmentation in fluorescence microscopy images,” Proceedings of the IEEE International Symposium on Biomedical Imaging, pp. 337–340, April 2017, Melbourne, Australia.
-  A. Prasoon, K. Petersen, C. Igel, F. Lauze, E. Dam, and M. Nielsen, “Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network,” Proceedings of the Medical Image Computing and Computer-Assisted Intervention, pp. 246–253, September 2013, Nagoya, Japan.
-  O. Cicek, A. Abdulkadir, S. Lienkamp, T. Brox, and O. Ronneberger, “3D U-Net: Learning dense volumetric segmentation from sparse annotation,” Proceedings of the Medical Image Computing and Computer-Assisted Intervention, pp. 424–432, October 2016, Athens, Greece.
-  X. Zhang, Y. Fu, A. Zang, L. Sigal, and G. Agam, “Learning classifiers from synthetic data using a multichannel autoencoder,” arXiv preprint arXiv:1503.03163, pp. 1–11, March 2015.
-  I. B. Barbosa, M. Cristani, B. Caputo, A. Rognhaugen, and T. Theoharis, “Looking beyond appearances: Synthetic training data for deep CNNs in re-identification,” arXiv preprint arXiv:1701.03153, pp. 1–14, January 2017.
-  D. J. Ho, C. Fu, P. Salama, K. Dunn, and E. Delp, “Nuclei segmentation of fluorescence microscopy images using three dimensional convolutional neural networks,” Proceedings of the Computer Vision for Microscopy Image Analysis workshop at Computer Vision and Pattern Recognition, pp. 834–842, July 2017, Honolulu, HI.
-  I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Proceedings of the Advances in Neural Information Processing Systems, pp. 2672–2680, December 2014, Montreal, Canada.
-  P. Isola, J. Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5967–5976, July 2017, Honolulu, HI.
-  M. Y. Liu and O. Tuzel, “Coupled generative adversarial networks,” Proceedings of the Advances in Neural Information Processing Systems, pp. 469–477, December 2016, Barcelona, Spain.
-  J. Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” arXiv preprint arXiv:1703.10593, pp. 1–16, March 2017.
-  Y. Huo, Z. Xu, S. Bao, A. Assad, R. G. Abramson, and B. A. Landman, “Adversarial synthesis learning enables segmentation without target modality ground truth,” arXiv preprint arXiv:1712.07695, pp. 1–4, December 2017.
-  S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, March 2015.
-  F. Milletari, N. Navab, and S. A. Ahmadi, “V-Net: Fully convolutional neural networks for volumetric medical image segmentation,” Proceedings of the IEEE 2016 Fourth International Conference on 3D Vision, pp. 565–571, October 2016, Stanford, CA.
-  D. P. Kingma and J. L. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, pp. 1–15, December 2014.
-  J. L. Clendenon, C. L. Phillips, R. M. Sandoval, S. Fang, and K. W. Dunn, “Voxx: A PC-based, near real-time volume rendering system for biological microscopy,” American Journal of Physiology-Cell Physiology, vol. 282, no. 1, pp. C213–C218, January 2002.