
Fluorescence Microscopy Image Segmentation Using Convolutional Neural Network With Generative Adversarial Networks

by   Chichen Fu, et al.

Recent advances in fluorescence microscopy enable acquisition of 3D image volumes with better quality and deeper penetration into tissue. Segmentation is a required step to characterize and analyze biological structures in the images. 3D segmentation using deep learning has achieved promising results in microscopy images. One issue is that deep learning techniques require a large set of groundtruth data, which is impractical to annotate manually for microscopy volumes. This paper describes a 3D nuclei segmentation method using 3D convolutional neural networks. A set of synthetic volumes and the corresponding groundtruth volumes are generated automatically using a generative adversarial network. Segmentation results demonstrate that our proposed method is capable of segmenting nuclei successfully in 3D for various data sets.




1 Introduction

Fluorescence microscopy is a type of optical microscopy that uses fluorescence to image 3D subcellular structures [1, 2]. Three dimensional segmentation is needed to quantify and characterize cells, nuclei, or other biological structures.

Various nuclei segmentation methods have been investigated in the last few decades. Active contours [3, 4], which minimize an energy functional to fit desired shapes, have been among the most successful methods in microscopy image analysis. Since active contours use the image gradient to evolve a contour toward the boundary of an object, the method can be sensitive to noise and highly dependent on initial contour placement. In [5], an external energy term which convolves a controllable vector field kernel with an image edge map was introduced to address these problems. In [6], 2D region-based active contours using image intensity to identify a region of interest were described. This achieves better performance on noisy images and is relatively independent of the initial curve placement. Extending this to 3D, [7] described 3D segmentation of a rat kidney structure. This technique was further extended to address the problem of 3D intensity inhomogeneity [8]. However, these energy-functional-based methods cannot distinguish various structures. Alternatively, [9, 10] described a method known as Squassh that solves an energy minimization problem derived from a generalized linear model to couple image restoration and segmentation. In addition, [11] described multidimensional segmentation using random seeds combined with multi-resolution, multi-scale, and region-growing techniques.

Figure 1: Block diagram of the proposed approach for 3D nuclei segmentation

Convolutional neural networks (CNNs) have been used to address problems in segmentation and object identification [12], and various CNN-based approaches have been used in the biomedical area [13]. U-Net [14] is a 2D CNN which uses an encoder-decoder architecture with skip connections to segment cells in light microscopy images. In [15], a multi-input multi-output CNN for cell segmentation in fluorescence microscopy images that handles cells of various sizes and intensities was described. Since these approaches [14, 15] are 2D segmentation methods, they may fail to produce reasonable segmentations in 3D. More specifically, stacking 2D segmentation results into a 3D volume may result in misalignment in the depth direction [7]. Also, in [16], a method that trained three networks from different directions in a volume and combined the three results to produce a form of 3D segmentation was described. A 3D U-Net [17] was introduced to identify 3D structures by extending the architecture of [14] to 3D. However, this approach requires manually annotated groundtruth to train the network. Generating groundtruth for 3D volumes is tedious and is generally done only on 2D slices; obtaining true 3D groundtruth volumes is impractical. One way to address this is to use synthetic groundtruth data [18, 19]. A method that segments nuclei by training a 3D CNN with synthetic microscopy volumes was described in [20], where the synthetic microscopy volumes were generated by blurring and noise operations.

Generating realistic synthetic microscopy image volumes remains a challenging problem since various types of noise and biological structures with different shapes are present and need to be modeled. Recently, in [21], a generative adversarial network (GAN) was described to address image-to-image translation problems using two adversarial networks, a generative network and a discriminative network. In particular, the discriminative network learns a loss function to distinguish whether the output image is real or fake, whereas the generative network tries to minimize this loss function. One extension of GANs is Pix2Pix [22], which uses conditional GANs to learn the relationship between the input image and the output image and can generate realistic images. One issue with Pix2Pix [22] is that it still requires paired training data to train the networks. In [23], coupled GANs (CoGAN) for learning the joint distribution of multi-domain images without corresponding groundtruth images were introduced. Later, cycle-consistent adversarial networks (CycleGAN) [24] employed a cycle consistency term in the adversarial loss function for image generation without using paired training data. More recently, a segmentation method that concatenates a segmentation network to CycleGAN to learn the styles of CT segmentation and MRI segmentation was described in [25].

In this paper, we present a 3D segmentation method to identify and segment nuclei in fluorescence microscopy volumes without the need for manually segmented groundtruth volumes. Three dimensional synthetic training data is generated using a spatially constrained CycleGAN. A 3D CNN is then trained using the 3D synthetic data to segment nuclei structures. Our method is evaluated using hand segmented groundtruth volumes of real fluorescence microscopy data from a rat kidney. Our data are collected using two-photon microscopy with nuclei labeled with Hoechst 33342 staining.

2 Proposed Method

Figure 1 shows a block diagram of our method. We denote by I a 3D image volume of size X × Y × Z, where the focal plane image I_z, of size X × Y, is the z-th slice along the z-direction, 1 ≤ z ≤ Z. I_orig and I_seg denote the original fluorescence microscopy volume and the segmented volume, respectively. In addition, let I^(x_1:x_2, y_1:y_2, z_1:z_2) be a subvolume of I spanning voxels x_1 to x_2 in the x-direction, y_1 to y_2 in the y-direction, and z_1 to z_2 in the z-direction, where 1 ≤ x_1 ≤ x_2 ≤ X, 1 ≤ y_1 ≤ y_2 ≤ Y, and 1 ≤ z_1 ≤ z_2 ≤ Z. For example, I_seg^(241:272, 241:272, 131:162) is the subvolume of a segmented volume cropped between slices 241 and 272 in the x-direction, between slices 241 and 272 in the y-direction, and between slices 131 and 162 in the z-direction.
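As an illustration of this subvolume notation, the cropping operation can be sketched in NumPy; the 512-cube volume size and the function name are assumptions for the example only.

```python
import numpy as np

def crop_subvolume(volume, x_range, y_range, z_range):
    """Extract a subvolume using inclusive 1-based slice ranges,
    mirroring the paper's subvolume notation."""
    (x1, x2), (y1, y2), (z1, z2) = x_range, y_range, z_range
    # Convert 1-based inclusive indices to 0-based Python slices.
    return volume[x1 - 1:x2, y1 - 1:y2, z1 - 1:z2]

# Example: an assumed 512x512x512 segmented volume; crop the 32-voxel
# cube between slices 241-272 (x, y) and 131-162 (z), as in the text.
seg = np.zeros((512, 512, 512), dtype=np.uint8)
sub = crop_subvolume(seg, (241, 272), (241, 272), (131, 162))
print(sub.shape)  # (32, 32, 32)
```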

As shown in Figure 1, our proposed method consists of two steps: 3D synthetic data generation and 3D CNN segmentation. We first generate synthetic binary volumes and use them with a subvolume of the original image volume to train a spatially constrained CycleGAN (SpCycleGAN), obtaining a generative model. This model is applied to another set of synthetic binary volumes to generate the corresponding synthetic 3D microscopy volumes. For 3D CNN segmentation, these paired synthetic microscopy and binary volumes are used to train a 3D CNN. Finally, the trained 3D CNN model is used to segment nuclei in the original volume to produce the segmented volume.

Figure 2: Architecture of our modified 3D U-Net

2.1 3D Synthetic Data Generation

Three dimensional synthetic data generation consists of synthetic binary volume generation, SpCycleGAN training, and SpCycleGAN inference. In synthetic binary volume generation, nuclei are assumed to have an ellipsoidal shape, and multiple nuclei are randomly generated at different orientations and locations in a volume [20]. The original CycleGAN and our SpCycleGAN were trained to generate a set of synthetic volumes.
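A minimal sketch of the synthetic binary volume generation described above, assuming axis-aligned ellipsoids (without the random rotations used in [20]); the volume size, nucleus count, and radii are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthetic_binary_volume(shape=(64, 64, 64), n_nuclei=10,
                            radii_range=(4, 8)):
    """Place randomly located, randomly sized axis-aligned ellipsoids
    in a binary volume (a simplified sketch of [20]'s generation)."""
    vol = np.zeros(shape, dtype=np.uint8)
    zz, yy, xx = np.indices(shape)
    for _ in range(n_nuclei):
        center = [rng.integers(0, s) for s in shape]
        radii = rng.integers(radii_range[0], radii_range[1] + 1, size=3)
        # Voxels inside the ellipsoid equation are labeled as nucleus.
        mask = (((zz - center[0]) / radii[0]) ** 2 +
                ((yy - center[1]) / radii[1]) ** 2 +
                ((xx - center[2]) / radii[2]) ** 2) <= 1.0
        vol[mask] = 1
    return vol

vol = synthetic_binary_volume()
print(vol.shape, int(vol.max()))
```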

2.1.1 CycleGAN

The CycleGAN is trained to generate synthetic microscopy volumes. CycleGAN uses a combination of discriminative networks and generative networks to solve a minimax problem, adding a cycle consistency loss to the original GAN loss function [21, 24]. Denoting the synthetic binary domain by X and the microscopy domain by Y, the loss can be written as

L(G, F, D_X, D_Y) = L_GAN(G, D_Y, X, Y) + L_GAN(F, D_X, Y, X) + λ L_cyc(G, F),

where L_cyc(G, F) = E_x[ ||F(G(x)) − x||_1 ] + E_y[ ||G(F(y)) − y||_1 ].

Here, λ is a weight coefficient and ||·||_1 is the L1 norm. Model G maps X to Y while model F maps Y to X. Also, D_Y distinguishes between G(x) and y, while D_X distinguishes between F(y) and x. G(x) is an original-like microscopy volume generated by model G, and F(y) is generated by model F to look similar to a synthetic binary volume. Here, X and Y are unpaired sets of images. In CycleGAN inference, a synthetic microscopy volume G(x) is generated by applying model G to a synthetic binary volume x. As previously indicated, x and G(x) are a paired set of images, so x serves as the groundtruth volume corresponding to G(x).

2.1.2 Spatially Constrained CycleGAN

Although the CycleGAN uses the cycle consistency loss to constrain the similarity of the distributions of x and F(G(x)), it does not provide enough spatial constraint on the locations of the nuclei. CycleGAN generates realistic synthetic microscopy images, but a spatial shift in the locations of the nuclei between a synthetic binary volume x and its generated microscopy volume G(x) was observed. To create a spatial constraint on the nuclei locations, a network H is added to the CycleGAN; H takes G(x) as input and generates a binary mask H(G(x)). The architecture of H is the same as the architecture of G. Network H minimizes a loss L_spatial between H(G(x)) and x, which serves as a spatial regularization term in the total loss function. The network H is trained together with G. The loss function of the SpCycleGAN is defined as

L(G, F, H, D_X, D_Y) = L_GAN(G, D_Y, X, Y) + L_GAN(F, D_X, Y, X) + λ_1 L_cyc(G, F) + λ_2 L_spatial(G, H),

where λ_1 and λ_2 are the weight coefficients for L_cyc and L_spatial, respectively. Note that the first three terms are the same as those defined in the CycleGAN loss. Here, L_spatial can be expressed as an L2 loss,

L_spatial(G, H) = E_x[ ||H(G(x)) − x||_2^2 ].
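The non-adversarial terms of the SpCycleGAN objective can be sketched as follows, treating the generators G, F and the spatial network H as plain functions; the adversarial terms are omitted for brevity, and the weight values are illustrative assumptions.

```python
import numpy as np

def l1(a, b):
    """Mean absolute difference (L1 cycle-consistency penalty)."""
    return np.mean(np.abs(a - b))

def l2_sq(a, b):
    """Mean squared difference (L2 spatial penalty)."""
    return np.mean((a - b) ** 2)

def cycle_and_spatial_losses(x, y, G, F, H, lam1=10.0, lam2=10.0):
    """Cycle-consistency and spatial regularization terms of the
    SpCycleGAN objective. G: binary -> microscopy, F: microscopy ->
    binary, H: microscopy -> binary mask (same architecture as G)."""
    l_cyc = l1(F(G(x)), x) + l1(G(F(y)), y)
    l_spatial = l2_sq(H(G(x)), x)   # constrains nuclei locations
    return lam1 * l_cyc + lam2 * l_spatial

# Toy check with identity "networks": both terms vanish when G, F, H
# reproduce their inputs exactly.
x = np.random.rand(8, 8, 8)
y = np.random.rand(8, 8, 8)
ident = lambda v: v
print(cycle_and_spatial_losses(x, y, ident, ident, ident))  # 0.0
```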

2.2 3D U-Net

Figure 2 shows the architecture of our modified 3D U-Net. The filter size of each 3D convolution is 3 × 3 × 3. To maintain the same volume size during 3D convolution, a voxel padding of 1 × 1 × 1 is used in each convolution. 3D batch normalization [26] and a leaky rectified-linear unit (leaky ReLU) activation function are employed after each 3D convolution. In the downsampling path, 3D max pooling with a 2 × 2 × 2 kernel and a stride of 2 is used. In the upsampling path, feature information is retrieved using 3D transpose convolutions. Our modified 3D U-Net is one layer deeper than the conventional U-Net, as can be seen in Figure 2. Our training loss function is a linear combination of the Dice loss (L_Dice) and the binary cross-entropy loss (L_BCE) such that

L = μ_1 L_Dice + μ_2 L_BCE,

where

L_Dice = 1 − (2 Σ_i t_i p_i) / (Σ_i t_i + Σ_i p_i),  L_BCE = −(1/N) Σ_i [ t_i log p_i + (1 − t_i) log(1 − p_i) ],

respectively [27]. Note that T is the set of targeted groundtruth values and t_i is the targeted groundtruth value at the i-th voxel location. Similarly, P is a probability map of the binary volumetric segmentation and p_i is the probability at the i-th voxel location. Lastly, N is the number of voxels, and μ_1 and μ_2 serve as the weight coefficients between the two loss terms. The network takes a grayscale input volume and produces a voxelwise classified 3D volume of the same size as the input volume. To train our model, pairs of synthetic microscopy volumes and synthetic binary volumes are used.
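A minimal NumPy sketch of the combined Dice and binary cross-entropy loss described above; the weight values and the small epsilon for numerical stability are illustrative assumptions.

```python
import numpy as np

def dice_loss(p, t, eps=1e-7):
    """Soft Dice loss: 1 - 2*sum(p*t) / (sum(p) + sum(t))."""
    inter = np.sum(p * t)
    return 1.0 - (2.0 * inter + eps) / (np.sum(p) + np.sum(t) + eps)

def bce_loss(p, t, eps=1e-7):
    """Binary cross-entropy averaged over all N voxels."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))

def combined_loss(p, t, mu1=1.0, mu2=1.0):
    """Linear combination L = mu1 * L_Dice + mu2 * L_BCE."""
    return mu1 * dice_loss(p, t) + mu2 * bce_loss(p, t)

# A perfect prediction drives the combined loss to (nearly) zero.
t = np.zeros((4, 4, 4)); t[1:3, 1:3, 1:3] = 1.0   # groundtruth cube
perfect = t.copy()
print(round(combined_loss(perfect, t), 4))  # 0.0
```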

2.2.1 Inference

For the inference step, we first zero-pad I_orig on its boundaries. A 3D window is then used to segment nuclei. Since the zero-padded volume is bigger than the 3D window, the window is slid in the x-, y-, and z-directions in fixed voxel steps over the zero-padded volume [20]. Nuclei partially observed on the boundaries of the 3D window may not be segmented correctly. Hence, only the central subvolume of the output of the 3D window is used to generate the corresponding subvolume of the segmentation. This process is repeated until the 3D window has covered the entire volume.
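The padded sliding-window inference with central cropping can be sketched as follows. The 64-voxel window and 32-voxel central core are assumed values for illustration, and the volume size is assumed to be divisible by the core size.

```python
import numpy as np

def sliding_window_segment(volume, predict, win=64, core=32):
    """Zero-pad the volume, slide a win^3 window in steps of `core`,
    and keep only the central core^3 of each window's prediction."""
    pad = (win - core) // 2
    padded = np.pad(volume, pad, mode='constant')
    out = np.zeros_like(volume)
    for z in range(0, volume.shape[0], core):
        for y in range(0, volume.shape[1], core):
            for x in range(0, volume.shape[2], core):
                window = padded[z:z + win, y:y + win, x:x + win]
                pred = predict(window)
                # Discard the window borders; keep the central core.
                out[z:z + core, y:y + core, x:x + core] = \
                    pred[pad:pad + core, pad:pad + core, pad:pad + core]
    return out

# Toy check: a thresholding "network" recovers a bright cube.
vol = np.zeros((64, 64, 64)); vol[10:20, 10:20, 10:20] = 1.0
seg = sliding_window_segment(vol, lambda w: (w > 0.5).astype(np.uint8))
print(int(seg.sum()))  # 1000
```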

3 Experimental Results

We tested our proposed method on two different rat kidney data sets, Data-I and Data-II, each consisting of grayscale microscopy image volumes.

Our SpCycleGAN is implemented in PyTorch using the Adam optimizer [28] with the default parameters given by CycleGAN [24]. We trained the CycleGAN and SpCycleGAN to generate synthetic volumes for Data-I and Data-II, respectively. A synthetic binary volume for Data-I and a subvolume of the original microscopy volume of Data-I were used to train a generative model for Data-I. Similarly, a synthetic binary volume for Data-II and a subvolume of the original microscopy volume of Data-II were used to train a generative model for Data-II.

We generated sets of synthetic binary volumes for Data-I and Data-II, with nuclei sizes matched to each data set. By applying the trained generative model for Data-I to its synthetic binary volumes, pairs of synthetic binary volumes and corresponding synthetic microscopy volumes were obtained; the same was done for Data-II. Since our modified 3D U-Net architecture takes fixed-size input volumes, we divided these volumes into adjacent non-overlapping subvolumes. The synthetic binary volumes for each data set are used as groundtruth volumes paired with the corresponding synthetic microscopy volumes, and a segmentation model is then trained for each data set.

Our modified 3D U-Net is implemented in PyTorch using the Adam optimizer [28]. For evaluation purposes, we compare different 3D synthetic data generation methods (CycleGAN or SpCycleGAN), different numbers of pairs of synthetic training volumes, and different loss functions with different settings of the weight coefficients μ_1 and μ_2. Moreover, we compare our modified 3D U-Net with the 3D encoder-decoder architecture [20]. Lastly, small objects below a voxel-count threshold were removed using 3D connected components.
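The small-object removal step can be sketched with 3D connected components via scipy.ndimage; the voxel threshold used here is an assumed example.

```python
import numpy as np
from scipy import ndimage

def remove_small_objects(binary_vol, min_voxels=100):
    """Suppress 3D connected components smaller than min_voxels
    (face-connectivity by default in scipy.ndimage.label)."""
    labels, n = ndimage.label(binary_vol)
    # Per-component voxel counts for labels 1..n.
    sizes = ndimage.sum(binary_vol, labels, range(1, n + 1))
    keep = np.zeros(n + 1, dtype=bool)
    keep[1:] = sizes >= min_voxels
    return keep[labels].astype(np.uint8)

vol = np.zeros((32, 32, 32), dtype=np.uint8)
vol[2:12, 2:12, 2:12] = 1      # large component: 1000 voxels
vol[20, 20, 20] = 1            # isolated speck: 1 voxel
cleaned = remove_small_objects(vol, min_voxels=100)
print(int(cleaned.sum()))  # 1000
```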

Figure 3: Slices of the original volume, the synthetic microscopy volume, and the corresponding synthetic binary volume for Data-I and Data-II (a) original image of Data-I, (b) synthetic microscopy image of Data-I, (c) synthetic binary image of Data-I, (d) original image of Data-II, (e) synthetic microscopy image of Data-II, (f) synthetic binary image of Data-II
Figure 4: A comparison between two synthetic data generation methods overlaid on the corresponding synthetic binary image (a) CycleGAN, (b) SpCycleGAN

Figure 3 shows the synthetic images generated by our proposed method. The left column shows original images, whereas the middle column shows synthetic images artificially generated from the corresponding synthetic binary images provided in the right column. As can be seen from Figure 3, the synthetic images reflect characteristics of the original microscopy images such as background noise, nuclei shape, orientation, and intensity.

Additionally, the two synthetic data generation methods, CycleGAN and SpCycleGAN, are compared in Figure 4 using the same synthetic binary image. Here, the synthetic binary image is overlaid on the synthetic microscopy image and labeled in red. It is observed that our spatial constraint loss reduces the location shift of nuclei between a synthetic microscopy image and its synthetic binary image. These realistic synthetic microscopy volumes from the SpCycleGAN can be used to train our modified 3D U-Net.

Subvolume 1 Subvolume 2 Subvolume 3
Method Accuracy Type-I Type-II Accuracy Type-I Type-II Accuracy Type-I Type-II
Method [7] 84.09% 15.68% 0.23% 79.25% 20.71% 0.04% 76.44% 23.55% 0.01%
Method [8] 87.36% 12.44% 0.20% 86.78% 13.12% 0.10% 83.47% 16.53% 0.00%
Method [9, 10] 90.14% 9.07% 0.79% 88.26% 11.67% 0.07% 87.29% 12.61% 0.10%
Method [20] 92.20% 5.38% 2.42% 92.32% 6.81% 0.87% 94.26% 5.19% 0.55%
3D Encoder-Decoder 93.05% 3.09% 3.87% 91.30% 5.64% 3.06% 94.17% 3.96% 1.88%
+ CycleGAN + BCE
(, ,)
3D Encoder-Decoder 94.78% 3.42% 1.79% 92.45% 6.62% 0.92% 93.57% 6.10% 0.33%
+ SpCycleGAN + BCE
(, ,)
3D U-Net + SpCycleGAN 95.07% 2.94% 1.99% 93.01% 6.27% 0.72% 94.04% 5.84% 0.11%
(, ,)
3D U-Net + SpCycleGAN 94.76% 3.00% 2.24% 93.03% 6.03% 0.95% 94.30% 5.22% 0.40%
(, ,)
3D U-Net +SpCycleGAN 95.44% 2.79% 1.76% 93.63% 5.73% 0.64% 93.90% 5.92% 0.18%
+ DICE and BCE
(, ,)
3D U-Net +SpCycleGAN 95.37% 2.77% 1.86% 93.63% 5.69% 0.68% 94.37% 5.27% 0.36%
+ DICE and BCE
(, ,)
3D U-Net +SpCycleGAN 95.56% 2.57% 1.86% 93.67% 5.65% 0.68% 94.54% 5.10% 0.36%
+ DICE and BCE + PP
(, ,)
(Proposed method)
Table 1: Accuracy, Type-I and Type-II errors for known methods and our method on subvolumes 1, 2, and 3 of Data-I
Figure 5: 3D visualization of subvolume of Data-I using Voxx [29] (a) original volume, (b) 3D ground truth volume, (c) 3D active surfaces from [7], (d) 3D active surfaces with inhomogeneity correction from [8], (e) 3D Squassh from [9, 10], (f) 3D encoder-decoder architecture from [20], (g) 3D encoder-decoder architecture with CycleGAN, (h) 3D U-Net architecture with SpCycleGAN (Proposed method)

Our proposed method was compared to other 3D segmentation methods including 3D active surfaces [7], 3D active surfaces with inhomogeneity correction [8], 3D Squassh [9, 10], the 3D encoder-decoder architecture [20], and the 3D encoder-decoder architecture with CycleGAN. Three original 3D subvolumes of Data-I were selected to evaluate the performance of our proposed method; we denote them as subvolume 1, subvolume 2, and subvolume 3, respectively. The corresponding groundtruth of each subvolume was hand segmented. Voxx [29] was used to visualize the segmentation results in 3D and compare them to the manually annotated volumes. In Figure 5, 3D visualizations of a hand segmented subvolume and the corresponding segmentation results for the various methods are presented. As seen from the 3D visualizations in Figure 5, our proposed method visually shows the best performance among the presented methods when compared to the hand segmented groundtruth volume. In general, our proposed method captures only nuclei structures, whereas the other presented methods falsely detect non-nuclei structures as nuclei. Note that the segmentation results of the 3D encoder-decoder architecture with CycleGAN (Figure 5(g)) yield smaller segmentation masks and suffer from location shift. Our proposed method (Figure 5(h)) outperforms it since it uses the spatially constrained CycleGAN and takes into consideration both the Dice loss and the binary cross-entropy loss.

Figure 6: Original images and their color coded segmentation results of Data-I and Data-II (a) Data-I , (b) Data-II , (c) Data-I using [20], (d) Data-II using [20], (e) Data-I using 3D encoder-decoder architecture with CycleGAN, (f) Data-II using 3D encoder-decoder architecture with CycleGAN, (g) Data-I using 3D U-Net architecture with SpCycleGAN (Proposed method), (h) Data-II using 3D U-Net architecture with SpCycleGAN (Proposed method)

All segmentation results were evaluated quantitatively based on voxel accuracy, Type-I error, and Type-II error metrics, using 3D hand segmented volumes. Here, accuracy = (n_TP + n_TN)/N, Type-I error = n_FP/N, and Type-II error = n_FN/N, where n_TP, n_TN, n_FP, n_FN, and N are defined to be the number of true-positives (voxels segmented as nuclei correctly), true-negatives (voxels segmented as background correctly), false-positives (voxels falsely segmented as nuclei), false-negatives (voxels falsely segmented as background), and the total number of voxels in a volume, respectively.
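These metrics can be computed directly from the voxel counts, as in this short sketch:

```python
import numpy as np

def voxel_metrics(pred, gt):
    """Voxel accuracy, Type-I (false positive) and Type-II
    (false negative) error rates over all N voxels."""
    pred = pred.astype(bool); gt = gt.astype(bool)
    N = pred.size
    tp = np.sum(pred & gt); tn = np.sum(~pred & ~gt)
    fp = np.sum(pred & ~gt); fn = np.sum(~pred & gt)
    return (tp + tn) / N, fp / N, fn / N

# Toy example: 1000-voxel volume, half nuclei, with a few mistakes.
gt = np.zeros((10, 10, 10), dtype=bool); gt[:5] = True
pred = gt.copy()
pred[0, 0, :2] = False   # 2 missed nuclei voxels -> Type-II error
pred[9, 9, :3] = True    # 3 false nuclei voxels  -> Type-I error
acc, t1, t2 = voxel_metrics(pred, gt)
print(acc, t1, t2)  # 0.995 0.003 0.002
```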

The quantitative evaluations for the subvolumes are shown in Table 1. Our proposed method outperforms the other compared methods. The smaller Type-I error shows that our proposed method successfully rejects non-nuclei structures during segmentation. Also, our proposed method has reasonably low Type-II errors compared to the other segmentation methods. Moreover, the table shows that our proposed SpCycleGAN creates better paired synthetic volumes, which is reflected in the segmentation accuracy. Instead of a 3D encoder-decoder structure, we use a 3D U-Net, which leads to better results since the 3D U-Net has skip connections that preserve spatial information. In addition, the combination of the Dice loss and the BCE loss turns out to be better for the segmentation task in our application: the Dice loss constrains the shape of the nuclei segmentation, whereas the BCE loss regulates the voxelwise binary prediction. It is also observed that training with more synthetic volumes generalizes our method to achieve better segmentation accuracy. Finally, the postprocessing (PP) that eliminates small components helps to improve segmentation performance.

To visualize this, segmentation results were color coded using 3D connected component labeling and overlaid on the original volumes in Figure 6. The method from [20] cannot distinguish between nuclei and non-nuclei structures including noise. This is especially recognizable in the segmentation results of Data-I, in which multiple nuclei and non-nuclei structures are colored with the same color. As can be observed from Figure 6, the segmentation masks produced by the 3D encoder-decoder architecture with CycleGAN are smaller than the nuclei and suffer from location shifts. Conversely, our proposed method segments nuclei with the right shapes at the correct locations.

4 Conclusion

In this paper we presented a modified 3D U-Net nuclei segmentation method using paired synthetic volumes. The training was done using synthetic volumes generated by a spatially constrained CycleGAN. A combination of the Dice loss and the binary cross-entropy loss is optimized during training. We compared our proposed method to various segmentation methods using manually annotated 3D groundtruth from real data. The experimental results indicate that our method can successfully distinguish between non-nuclei and nuclei structures and capture nuclei regions well in various microscopy volumes. One drawback of our proposed segmentation method is that it cannot separate nuclei that are physically touching each other. In the future, we plan to develop a nuclei localization method to separate overlapping nuclei into individual nuclei.

5 Acknowledgments

This work was partially supported by a George M. O’Brien Award from the National Institutes of Health under grant NIH/NIDDK P30 DK079312 and the endowment of the Charles William Harrison Distinguished Professorship at Purdue University.

Data-I was provided by Malgorzata Kamocka of Indiana University and was collected at the Indiana Center for Biological Microscopy.

Address all correspondence to Edward J. Delp,