Automated cervical nucleus segmentation based on deep learning can effectively improve the quantitative analysis of cervical cancer. However, accurate nucleus segmentation remains challenging. The classic U-net has not achieved satisfactory results on this task because it mixes information from different scales, which interfere with each other and limit the segmentation accuracy of the model. To solve this problem, we propose a progressive growing U-net (PGU-net+) model, which uses two paradigms to extract image features at different scales more independently. First, we add residual modules between different scales of U-net, which force the model to learn the approximate shape of the annotation at the coarser scale and the residual between the annotation and that approximate shape at the finer scale. Second, we begin training with the coarsest part of the model and progressively add finer parts until the full model is included. When we train a finer part, we reduce the learning rate of the previously trained coarser parts, which further ensures that the model extracts information from different scales independently. We conduct several comparative experiments on the Herlev dataset. The experimental results show that PGU-net+ achieves higher accuracy than previous state-of-the-art methods on cervical nucleus segmentation.
Pap smear is an important test for the early screening of precancerous lesions and malignant tumors in gynecology. Accurate segmentation of cervical cancer cells, especially of their nuclei, is essential for the quantitative analysis of cervical cancer. Traditional cervical segmentation methods based on image representation are widely used, such as wavelets [2], template fitting, adaptive thresholding [5] and graph cuts. Such methods rely on low-level hand-crafted features that usually describe image texture rather than high-level semantics. Since cervical cells at different disease stages undergo global (semantic) changes, methods that cannot effectively extract the semantic information of the images do not reach the segmentation accuracy required in clinical practice.
Deep learning methods for pixel-wise object segmentation or detection can simultaneously take into account the characteristic information of different cell structures. A neural network adjusts the sizes of its receptive fields to adapt to targets of different sizes, and repeated feature extraction across successive layers can greatly improve segmentation accuracy. The classic convolutional neural network U-net extracts multi-scale information through skip connections. However, this multi-scale information may contain considerable redundancy and repetition, and using fixed-size receptive fields for targets of different scales limits multi-scale learning. Many studies have therefore focused on multi-scale feature extraction for targets of different sizes and shapes, for example by enlarging the receptive field, adding dilated convolutions, or merging feature information from different convolutional layers, thereby improving per-pixel classification accuracy and the generalization of detail features. Multi-scale convolutional networks combined with graph partitioning have been proposed to segment the cervical nucleus and cytoplasm. Song et al. use a multi-scale deep convolutional neural network to extract diverse feature information and segment overlapping cervical cells. Dilated convolution combines multi-scale context information while maintaining the receptive field of the original network, without losing spatial resolution. It performs well in image classification, object detection and semantic segmentation [10, 11]. However, the dilation rate of a dilated convolution is difficult to design, and hand-designed dilation rates cannot capture the characteristic information of targets with different sizes and shapes. Moreover, a single network struggles to learn the feature information of all scales at the same time.
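To make the receptive-field argument about dilated convolution concrete, the following Python sketch (a generic illustration, not code from the cited works) computes how dilation widens a convolution's span. With stride 1 and "same" padding the spatial resolution is unchanged, yet the receptive field grows with the dilation rate; the rates passed in are exactly the hand-chosen hyperparameters that are difficult to design.

```python
def effective_kernel_size(k: int, d: int) -> int:
    """Span covered by a k-tap kernel with dilation rate d (d = 1 is an ordinary convolution)."""
    return k + (k - 1) * (d - 1)


def receptive_field(layers: list[tuple[int, int]]) -> int:
    """Receptive field of a stack of stride-1 convolutions given as (kernel, dilation) pairs."""
    rf = 1
    for k, d in layers:
        # Each stride-1 layer widens the receptive field by (effective size - 1).
        rf += effective_kernel_size(k, d) - 1
    return rf


plain = receptive_field([(3, 1)] * 3)                # three ordinary 3x3 convolutions
dilated = receptive_field([(3, 1), (3, 2), (3, 4)])  # same weight count, dilations 1, 2, 4
print(plain, dilated)  # 7 15
```

Both stacks have the same number of weights, but the dilated stack sees more than twice as much context. The rates (1, 2, 4) here are an arbitrary choice, which is precisely the design burden the paragraph describes.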
To address these problems, we propose a novel model: the progressive growing of U-net with residual modules (PGU-net+). Based on the classic U-net, we propose two improvements, one to the network architecture and one to the training procedure. First, we add residual modules between different stages (i.e., scales) of the classic U-net. In the first stage, which has the lowest resolution, we downsample the image and the annotation and train the coarsest part of the model, which learns an approximate shape of the segmentation. We then pass this approximate shape through a residual connection to the next stage, which has a higher resolution and learns only the residual between the approximate shape and the annotation (images and annotations are resampled accordingly in all stages). Thus, at each stage, we force the model to learn the information related to the current scale. We name this architecture U-net+. Experiments show that U-net+ effectively improves segmentation accuracy.
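The coarse-to-fine decomposition of the targets can be illustrated with a short NumPy sketch. It assumes square masks whose side is divisible by 2^(n-1), average pooling to downsample the annotation, and nearest-neighbour upsampling; the paper does not specify its resampling scheme, and the function names are hypothetical. The coarsest stage's target is the downsampled annotation, each finer stage's target is only what the upsampled coarser approximation misses, and adding the contributions back up recovers the full-resolution annotation exactly.

```python
import numpy as np


def downsample(mask: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool a 2-D array by an integer factor (assumed resampling scheme)."""
    h, w = mask.shape
    return mask.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def upsample2(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour upsampling of a 2-D array by a factor of 2."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)


def residual_targets(annotation: np.ndarray, n_stages: int) -> list[np.ndarray]:
    """Split an annotation into a coarsest approximation plus one residual per finer stage."""
    # Pyramid ordered from coarsest (factor 2**(n_stages-1)) to full resolution (factor 1).
    pyramid = [downsample(annotation, 2 ** s) for s in reversed(range(n_stages))]
    targets = [pyramid[0]]  # the coarsest stage learns the approximate shape
    for coarse, fine in zip(pyramid[:-1], pyramid[1:]):
        targets.append(fine - upsample2(coarse))  # a finer stage learns only the residual
    return targets


def reconstruct(targets: list[np.ndarray]) -> np.ndarray:
    """Upsample the running approximation and add each stage's residual."""
    out = targets[0]
    for residual in targets[1:]:
        out = upsample2(out) + residual
    return out


rng = np.random.default_rng(0)
annotation = (rng.random((32, 32)) > 0.5).astype(float)
targets = residual_targets(annotation, n_stages=4)
print([t.shape for t in targets])                     # [(4, 4), (8, 8), (16, 16), (32, 32)]
print(np.allclose(reconstruct(targets), annotation))  # True
```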
Second, we adopt a network training paradigm called progressive growing. We start by training the coarsest part of the model with downsampled images and annotations, and then progressively add finer parts to the training until the full model is included. When training a finer part, we reduce the learning rate of the previously trained coarser parts, which further encourages the model to extract information from different scales independently. In addition, this paradigm requires significantly less computation than training the entire model at once. Fig. 1 shows the flow chart of the method, which comprises four stages.
The classic U-net comprises two major parts: a contracting path and an expansive path. In the contracting path, a series of convolution and downsampling operations extract features and generate progressively coarser feature maps. In the expansive path, the corresponding decoding stages progressively recover the resolution of the feature maps from coarse to fine.
To avoid information loss, we introduce a residual module (Fig. 2) between adjacent scales. The low-resolution feature map of the previous layer is upsampled and added pixel-wise to the high-resolution feature map of the next layer. The module is defined as follows:

$$y = \mathcal{F}(x, \{W_1, W_2\}) + \mathrm{Up}(x)$$

Here $x$ represents the input feature map, $W_1$ and $W_2$ denote the weights of the convolution kernels, $y$ represents the output feature map, the function $\mathcal{F}$ is the convolution of the expansive path together with the upsampling operation, and $\mathrm{Up}(\cdot)$ upsamples $x$ to the resolution of $y$. The sum $\mathcal{F}(x, \{W_1, W_2\}) + \mathrm{Up}(x)$ represents the residual module. This structure extracts richer multi-scale information without adding parameters and at negligible extra computational cost. At each stage, the current network focuses on the residual information between adjacent scales, which helps ensure good performance.
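A minimal NumPy sketch of the residual module follows, in which the low-resolution map is upsampled and added pixel-wise to the higher-resolution output of the expansive-path convolutions. It assumes nearest-neighbour upsampling and, purely for brevity, replaces the expansive-path 3×3 convolutions with two 1×1 convolutions and a ReLU; it also assumes the two summed maps have the same channel count. All names are hypothetical. The point it illustrates is that the skip path carries no weights: when the convolution branch outputs zero, the module passes the upsampled coarse map through unchanged, so the branch only has to learn the residual.

```python
import numpy as np


def upsample2(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, H, W) feature map by a factor of 2."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def conv1x1(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1x1 convolution: a (C_out, C_in) weight mixes the channels of a (C_in, H, W) map."""
    return np.einsum("oc,chw->ohw", w, x)


def residual_module(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """y = F(x) + Up(x), with F sketched as upsample -> 1x1 conv -> ReLU -> 1x1 conv."""
    up = upsample2(x)                                       # parameter-free skip path
    branch = conv1x1(np.maximum(conv1x1(up, w1), 0.0), w2)  # learnable residual branch
    return branch + up


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 8))  # coarse map: 2 channels, 8x8
w1 = rng.standard_normal((16, 2))
w2 = np.zeros((2, 16))              # branch initialised to output zero
y = residual_module(x, w1, w2)
print(y.shape, np.allclose(y, upsample2(x)))  # (2, 16, 16) True
```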
Traditional or deformable convolution kernels learn target information of all scales simultaneously, which can easily produce many repetitive or redundant features, and deepening or widening the network to compensate results in high computational and memory costs. Our proposed PGU-net+ model extracts multi-scale features through a progressive growing training approach. As shown in Fig. 1, we set up four training phases. In the first phase, we input a low-resolution image (32×32) to a small U-net to obtain a low-resolution output of the same size. We then gradually increase the resolution of the input image to 64×64, 128×128 and 256×256, and continuously add convolution layers to form deeper U-net structures. This type of training allows the network to learn the coarse, large-scale structure of the image first and to focus on more detailed features later, rather than learning information at all scales at the same time. Because the model receives input images of a different size at each stage, it learns multi-scale information about target regions of different sizes step by step. This method makes the model converge faster and gives it better generalization ability and stability without extra parameters or computation. Fig. 3 shows the U-net structure in the final stage, with the residual module added to each level of the expansive path.
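The four-phase schedule can be written down as a small Python sketch. The resolutions (32 up to 256, doubling per phase) follow the text; the assumption that phase i uses i downsampling/upsampling levels is inferred from the second stage using two and the full U-net using four, and the `Stage` type is hypothetical. Under that assumption, the bottleneck resolution stays fixed at 16×16 while each phase adds one finer level around it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    index: int       # 1-based training phase
    resolution: int  # side length of the square input image
    depth: int       # number of downsampling (and matching upsampling) steps

    @property
    def bottleneck(self) -> int:
        """Side length of the coarsest feature map in this phase."""
        return self.resolution // 2 ** self.depth


def progressive_schedule(base_resolution: int = 32, n_stages: int = 4) -> list[Stage]:
    """Each phase doubles the input resolution and adds one encoder/decoder level."""
    return [Stage(i, base_resolution * 2 ** (i - 1), i) for i in range(1, n_stages + 1)]


for stage in progressive_schedule():
    print(stage.index, stage.resolution, stage.depth, stage.bottleneck)
# 1 32 1 16
# 2 64 2 16
# 3 128 3 16
# 4 256 4 16
```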
We introduce residual modules in the expansive path of the U-net structure and adopt a progressive growing training method. At each stage, the model learns the residual information between adjacent scales. All existing layers remain trainable throughout training. When new layers are added to the network, we assign a smaller learning rate to the well-trained lower-resolution layers, whose parameters are transferred from the previous stage, to avoid sudden shocks to the existing network. Transferring the features learned from low-resolution images makes learning from high-resolution images easier and convergence faster. This further clarifies the division of the multi-scale learning task among the stages, and the extracted multi-scale information is more accurate and richer.
We validate the proposed PGU-net+ on the Herlev dataset. The dataset contains 917 images of cervical cells, each annotated with four regions: background, cytoplasm, nucleus and unknown area. We treat the unknown area as background. Because large and small nuclei differ considerably, we segment them as two separate classes during training. All images are normalized to zero mean and unit variance intensity and resized to 256×256.
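The label handling and intensity normalization described above can be sketched in NumPy. The integer label codes, the size threshold separating small from large nuclei, and the per-image split are all hypothetical, since the text specifies none of them; resizing is omitted.

```python
import numpy as np

# Hypothetical input codes for the four annotated regions.
BACKGROUND, CYTOPLASM, NUCLEUS, UNKNOWN = 0, 1, 2, 3
# Hypothetical output codes: the nucleus is split into two size classes.
SMALL_NUCLEUS, LARGE_NUCLEUS = 2, 3


def normalize_image(img: np.ndarray) -> np.ndarray:
    """Shift and scale intensities to zero mean and unit variance."""
    img = img.astype(np.float64)
    std = img.std()
    return (img - img.mean()) / (std if std > 0 else 1.0)


def relabel(mask: np.ndarray, area_threshold: int) -> np.ndarray:
    """Map the unknown area to background and assign the nucleus a small or large class."""
    out = mask.copy()
    out[mask == UNKNOWN] = BACKGROUND  # always read from the input, so reused codes cannot collide
    nucleus = mask == NUCLEUS
    out[nucleus] = LARGE_NUCLEUS if nucleus.sum() > area_threshold else SMALL_NUCLEUS
    return out


mask = np.array([[0, 1, 2], [3, 2, 1], [0, 0, 3]])
print(relabel(mask, area_threshold=1))
# [[0 1 3]
#  [0 3 1]
#  [0 0 0]]
```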
We train the model on a single NVIDIA TITAN GPU. In the first stage, the raw image downsampled to 32×32 is used as input to a small U-net. In the expansive path, the low-resolution feature map is upsampled by a factor of 2 and then added to the adjacent high-resolution output to form a residual module, so that the network focuses on learning the residual information of different scales. In the second stage, the image resampled to 64×64 is used as input to a U-net with two downsampling and two upsampling operations. In the expansive path, the low-resolution feature map is likewise upsampled by a factor of 2 and added to the adjacent one. The third and fourth stages proceed in the same way. After 40 epochs of training in each stage, the next stage begins. During parameter transfer, the learning rate of the previously trained low-resolution convolutional layers is set to 1e-6 and that of the newly added convolutional layers to 1e-4, which preserves the learned large-scale feature information and prevents model changes from disrupting the existing parameters. We use the RMSprop optimizer and ReLU activations.
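The per-stage learning-rate rule can be sketched in plain Python. The values 1e-6 and 1e-4 and the 40 epochs per stage come from the text; the layer names and the one-encoder-plus-one-decoder-block-per-stage layout are hypothetical. In a framework such as PyTorch, each rate would become its own optimizer parameter group, i.e. a list of `{"params": ..., "lr": ...}` dicts passed to `torch.optim.RMSprop`.

```python
EPOCHS_PER_STAGE = 40
OLD_LR, NEW_LR = 1e-6, 1e-4  # transferred (coarser) layers vs. newly added layers


def learning_rates(layer_stage: dict[str, int], current_stage: int) -> dict[str, float]:
    """Learning rate for every layer that exists in the current stage.

    layer_stage maps a (hypothetical) layer name to the stage in which it was added.
    Layers from earlier stages stay trainable, but only at OLD_LR.
    """
    return {
        name: NEW_LR if added == current_stage else OLD_LR
        for name, added in layer_stage.items()
        if added <= current_stage
    }


# Hypothetical layout: one encoder and one decoder block are added per stage.
LAYER_STAGE = {f"{part}{s}": s for s in range(1, 5) for part in ("enc", "dec")}

for stage in range(1, 5):
    lrs = learning_rates(LAYER_STAGE, stage)
    # A real run would build optimizer parameter groups from `lrs`
    # and train for EPOCHS_PER_STAGE epochs before moving on.
    print(stage, lrs)
```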
We conduct four sets of experiments. The first group (U-net) uses a traditional U-net with four downsampling and four upsampling operations to segment the nuclei. The second group (U-net+) adds a residual module to the expansive path of the traditional U-net, making it easier for the training process to capture features at different scales. The third group (PGU-net) applies the progressive growing training method to the traditional U-net, increasing the input resolution from 32 to 256 and gradually transferring the low-resolution layer parameters trained in the previous stage. The fourth group (PGU-net+) adds residual modules to the traditional U-net and also adopts progressive growing training. Comparing the four groups verifies the superiority of the proposed PGU-net+ and isolates the contribution of each component.
We summarize the segmentation results of the four experimental groups on the Herlev dataset in Table 1, which reports three metrics: ZSI, precision and recall. The models with residual modules (PGU-net+ and U-net+) outperform those without them (PGU-net and U-net), and the models trained with progressive growing (PGU-net+ and PGU-net) outperform those trained without it (U-net+ and U-net). The proposed combination of progressive growing and residual modules achieves the best segmentation results. Fig. 4 shows the results of two groups of cell segmentation experiments; PGU-net+ produces better segmentations for cells of different sizes and shapes. We also compare our model with other studies on this dataset. Table 2 shows that, among single models, ours achieves the best ZSI, precision and recall. PGU-net+ reaches a segmentation accuracy of 0.925 on the Herlev dataset, while its parameter count (13M) and computational cost are much smaller than those of the other models.
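For reference, the three reported metrics can be computed from binary nucleus masks as in the sketch below. ZSI (the Zijdenbos similarity index) is 2|A∩B|/(|A|+|B|), which coincides with the Dice coefficient. The sketch assumes pixel-wise computation on a single image; whether the reported figures are computed per image, per nucleus, or pooled over the dataset is an assumption left open here.

```python
import numpy as np


def zsi_precision_recall(pred: np.ndarray, truth: np.ndarray) -> tuple[float, float, float]:
    """Pixel-wise ZSI (Dice), precision and recall for binary nucleus masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()   # nucleus pixels predicted as nucleus
    fp = np.logical_and(pred, ~truth).sum()  # background predicted as nucleus
    fn = np.logical_and(~pred, truth).sum()  # nucleus pixels that were missed
    zsi = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return float(zsi), float(precision), float(recall)


pred = np.array([[1, 1, 0], [0, 1, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1]])
print([round(v, 3) for v in zsi_precision_recall(pred, truth)])  # [0.667, 0.667, 0.667]
```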
In this work, we propose adding residual modules to the expansive path of the classic U-net and adopting a progressive growing training mode. We test four models (PGU-net+, U-net+, PGU-net and U-net) on the Herlev dataset. The experimental results show that our model extracts multi-scale information effectively and makes the task of extracting it more explicit. Furthermore, the residual module can easily be inserted into other, more complex neural network structures, and the progressive growing training method could be adapted to target detection and segmentation problems at different scales in other fields.
Acknowledgments. This work is supported in part by the National Key Research and Development Program of China under Grant 2018YFC0910700 and the National Natural Science Foundation of China (NSFC) under Grants 81801778, 11831002, 11701018.