In the past few decades, the incidence of thyroid cancer has increased substantially in many countries. Early and precise diagnosis is a key factor in curing thyroid cancer. Thyroid fine needle aspiration (FNA) achieves highly accurate results in identifying papillary thyroid carcinoma. Clinicians then examine slides prepared from the aspirated tissue under a microscope and make a judgement. However, this judgement is time-consuming and subjective.
It is therefore important to develop fast and objective automatic thyroid cancer diagnosis based on computational tools. Automatic diagnosis of thyroid cancer usually relies on the Whole Slide Image (WSI), which is generated by an electronic scanner. These WSIs are often very large (), which means that applying such tools directly to the entire image is impossible because of the memory required. The follicular areas contain the most important information for experts making a diagnosis, so follicular segmentation is a vital step for automatic diagnostic algorithms. In this paper, we focus on highly efficient follicular segmentation in thyroid cytopathological WSIs.
Automatic follicular area segmentation for thyroid WSIs faces several challenges. First, a WSI is too large for a computer to process at one time. Second, Pap staining varies considerably from slide to slide; Figure 1 shows the staining of different slides. Third, the follicular cells are usually tightly wrapped by massive colloid areas, which makes follicular segmentation much harder.
In this paper, we design a highly efficient and accurate follicular segmentation method for thyroid FNA WSIs. We first introduce the hybrid method and the loss function in detail. We then experiment with patches and WSIs. Finally, we compare the model with classic classification and segmentation models, which are trained on the same dataset as ours and evaluated on both patches and WSIs.
2 Related Work
Traditional machine learning methods [8, 9] and deep learning methods [7, 12] greatly improve the accuracy of automatic lesion classification in medical areas. Gopinath et al. use a support vector machine (SVM) and achieve a diagnostic accuracy of 96.7%. B. Gopinath et al. fuse four classifiers and obtain a diagnostic accuracy of 96.66%. In contrast to the works mentioned above, Edward Kim et al. apply a deep CNN to thyroid cytopathology classification. Ghosh et al. achieve high accuracy by fine-tuning GoogLeNet to classify breast FNAC cell samples as malignant or benign.
Traditional semantic segmentation methods learn representations from hand-crafted features instead of semantic features. Recently, CNN-based methods have largely improved performance. FCN is the pioneering work on semantic segmentation; it converts the fully connected layers of a classification network into convolution layers. DeepLab [4, 5, 6] uses dilated convolutions to provide dense labeling and enlarge the receptive field. Semantic segmentation methods have already been applied to pathological image segmentation. Rueckert et al. propose a fully automated segmentation framework to identify placental candidate pixels. Cai et al. introduce an image segmentation method based on a recurrent neural network.
3.1 Dataset Preprocessing
The dataset used in this paper consists of thyroid cytopathological slides provided by a national top-level comprehensive hospital; it is clinical data collected from patients.
We use the color adjustment method in  to reduce the influence of staining variation. One patch is chosen as the staining standard, and the other patches are adjusted to match the staining of the selected patch.
Among the patches generated from a WSI, fewer than 10% contain follicular cells. To label patches and filter out the irrelevant ones, we merge a classifier into the segmentation model.
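The adjustment step described above can be illustrated with a minimal sketch. It assumes a statistics-matching scheme in the style of Reinhard et al. (listed in the references): each channel of a patch is shifted and scaled so that its mean and standard deviation match those of the standard patch. The function name `match_color_statistics` and the `eps` parameter are hypothetical, and the colour-space conversion used by Reinhard et al. is omitted, so the arrays are assumed to already be in a suitable colour space.

```python
import numpy as np

def match_color_statistics(patch, reference, eps=1e-6):
    """Match per-channel mean and standard deviation of `patch` to `reference`.

    Both inputs are H x W x C arrays. This is a simplified, assumed form of
    statistics-based colour transfer, not the exact procedure of the paper.
    """
    patch = patch.astype(np.float64)
    reference = reference.astype(np.float64)
    # Per-channel statistics computed over all pixels.
    p_mean, p_std = patch.mean(axis=(0, 1)), patch.std(axis=(0, 1))
    r_mean, r_std = reference.mean(axis=(0, 1)), reference.std(axis=(0, 1))
    # Standardize the patch, then rescale it to the reference statistics.
    # `eps` guards against division by zero on flat (constant) channels.
    return (patch - p_mean) / (p_std + eps) * r_std + r_mean
```

In practice the result would still need to be clipped to the valid intensity range before being converted back to an image.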
Patches are divided into three categories: the patches containing the follicular area, the patches of the colloidal area and the patches of the blank non-information area. The patches labeled Follicular are the target patches for the segmentation.
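The filtering role of the classifier can be sketched as follows. This is only an illustration under stated assumptions: the label names, the non-overlapping tiling, and the `classify` callable are hypothetical stand-ins, and the patch size is left as a parameter because it is not fixed here.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical label names for the three patch categories.
FOLLICULAR, COLLOID, BLANK = "follicular", "colloid", "blank"

Coord = Tuple[int, int]

def tile_coordinates(width: int, height: int, patch_size: int) -> Iterator[Coord]:
    """Yield the top-left (x, y) corner of each non-overlapping patch.

    Patches on the right and bottom edges may extend past the slide and
    would need padding or clipping when the pixels are actually read.
    """
    for y in range(0, height, patch_size):
        for x in range(0, width, patch_size):
            yield (x, y)

def follicular_patches(coords: Iterable[Coord],
                       classify: Callable[[Coord], str]) -> List[Coord]:
    """Keep only the patches that the classifier labels as follicular."""
    return [c for c in coords if classify(c) == FOLLICULAR]
```

Only the coordinates returned by `follicular_patches` would be passed on to the segmentation branch, so colloid and blank patches are never segmented.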
We share layers between the classifier and the segmentation model to avoid introducing many additional parameters. The shared structure is Block 1 of ResNet 101. The structures of ResNet 101 and its Blocks are shown in Figure 2(a)(b).
We design the other layers of the classifier as shown in Figure 2(c). The input of the classifier is the output of Block 1 in ResNet 101. A convolution layer and two fully connected layers are added. The final fully connected layer has 3 output nodes, matching the number of categories in the dataset. The loss function of the classification model is the average cross entropy.
The dilated convolutions used in the atrous spatial pyramid pooling (ASPP) extract multi-scale information. However, they ignore many relevant detail features that are significant for the thyroid cytopathological WSI dataset. We propose an enhanced ASPP (E-ASPP), which adds precise low scale features to ASPP to make up for this deficiency.
Figure 3 shows E-ASPP in our method. In addition to the existing structure, we add the low scale features from Block 3 to the original ASPP. E-ASPP offsets the deficiencies of ASPP and improves the accuracy of follicular segmentation.
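For reference, the average cross entropy over a batch can be written as the following sketch. It assumes a softmax over the 3 output nodes and integer class labels; the function name and argument layout are hypothetical.

```python
import numpy as np

def average_cross_entropy(logits, labels):
    """Mean softmax cross entropy over a batch.

    logits: (N, K) array of raw scores from the final layer.
    labels: (N,) integer class ids in [0, K).
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for each sample and average.
    return float(-log_probs[np.arange(len(labels)), labels].mean())
```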
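A heavily simplified sketch of the idea follows, on single-channel feature maps. Several points are assumptions rather than details given here: the low scale features are fused by concatenation, they already have the same spatial size as the ASPP input, and the dilation rates are placeholders. The 1x1 convolution and image pooling branches of a full ASPP are omitted, and the function names are hypothetical.

```python
import numpy as np

def dilated_conv3x3(x, kernel, rate):
    """3x3 dilated convolution (cross-correlation) with zero 'same' padding."""
    h, w = x.shape
    padded = np.pad(x, rate)
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(3):
        for j in range(3):
            # Kernel taps are spaced `rate` pixels apart.
            out += kernel[i, j] * padded[i * rate:i * rate + h,
                                         j * rate:j * rate + w]
    return out

def enhanced_aspp(high, low, kernels, rates=(1, 2, 4)):
    """Stack multi-rate dilated branches together with low scale features.

    Assumption: `low` has the same spatial size as `high`, and fusion is a
    simple channel-wise concatenation.
    """
    if low.shape != high.shape:
        raise ValueError("low and high feature maps must share a spatial size")
    branches = [dilated_conv3x3(high, k, r) for k, r in zip(kernels, rates)]
    return np.stack(branches + [low.astype(np.float64)], axis=0)
```

The output has one channel per dilation rate plus one channel carrying the low scale features, which a following convolution could then mix.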
3.3.2 Criterion-Oriented Adaptive Loss Function.
To make the model converge much faster, we propose a criterion-oriented adaptive loss function.
Equation 1 shows the criterion-oriented adaptive loss function. In a batch, represents the value of the chosen criterion while the denominator is the average cross entropy of patches. It provides a weight, based on the criterion used to evaluate the model, that leads the model to converge much faster.
is expected probability distribution that comes from ground truth,is predicted probability distribution that comes from the prediction of the model.
In this paper, four traditional criteria are used to give practical meaning: pixel accuracy (pAcc), mean accuracy (mAcc), mean intersection over union (mIoU) and frequency weighted intersection over union (fwIoU). They are commonly used to evaluate the performance of semantic segmentation.
We compare the effects of criterion-oriented adaptive loss functions for different criteria with the effect of the cross-entropy loss function in Figure 4. Under the same number of iterations, the loss function proposed in this paper makes the chosen criterion reach better results faster.
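The four criteria can be computed from a confusion matrix, as in the sketch below. It uses the standard definitions from the semantic segmentation literature, which are assumed to match those used here; the function name is hypothetical, and every class is assumed to occur at least once so that no division by zero arises.

```python
import numpy as np

def segmentation_metrics(conf):
    """Compute pAcc, mAcc, mIoU and fwIoU from a confusion matrix.

    conf[i, j] is the number of pixels of true class i predicted as class j.
    """
    conf = np.asarray(conf, dtype=np.float64)
    correct = np.diag(conf)              # correctly labeled pixels per class
    true_total = conf.sum(axis=1)        # pixels belonging to each class
    pred_total = conf.sum(axis=0)        # pixels predicted as each class
    iou = correct / (true_total + pred_total - correct)
    total = conf.sum()
    return {
        "pAcc": correct.sum() / total,
        "mAcc": np.mean(correct / true_total),
        "mIoU": np.mean(iou),
        "fwIoU": (true_total * iou).sum() / total,
    }
```

For example, `segmentation_metrics([[3, 1], [1, 5]])` describes a two-class case with 10 pixels, 8 of which are labeled correctly.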
3.4 Training Method
We jointly train the hybrid model. Because the two problems produce two different loss functions, the final loss function is their weighted sum. The weight can be adjusted according to the situation. In our experiments, the weight is .
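As a minimal sketch of the joint objective, the weighted sum can be written as below. The exact form is an assumption: here a single hypothetical `weight` scales the classification term, and the values used in the example are placeholders rather than the settings of our experiments.

```python
def joint_loss(segmentation_loss: float,
               classification_loss: float,
               weight: float) -> float:
    """Weighted sum of the two task losses (assumed form).

    `weight` scales the classification term relative to the segmentation term.
    """
    return segmentation_loss + weight * classification_loss

# Placeholder values for illustration only.
example = joint_loss(segmentation_loss=0.5, classification_loss=0.25, weight=2.0)
```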
4 Experimental Evaluation
4.1 Training Environment & Dataset
We use a CentOS 7.0 server to conduct the experiments. Training uses 2 NVIDIA GTX 1080Ti 12GB GPUs (NVIDIA Corporation, Santa Clara, CA) and the NVIDIA Deep Learning GPU Training System (DIGITS 4.0), which includes the TensorFlow deep learning framework.
The dataset used in this paper contains 15 WSIs. The dataset is divided into two parts: the patch dataset and the WSI dataset. The patch dataset consists of 13 WSIs while the WSI dataset consists of 2 WSIs. We use the patch dataset to train the model and to test it preliminarily. The WSI dataset is used to test the effectiveness of the hybrid model in practice.
It is worth noting that all the models in this paper (our model and comparative models) are trained using the thyroid cytopathological image dataset instead of using pre-trained models for fine-tuning.
4.2 Performance of the Classifier
To evaluate the classifier objectively, we compare it with classic classification models: LeNet, AlexNet, and GoogLeNet. All the models are trained using the thyroid cytopathological image dataset. Table 1 compares their accuracies. Apart from GoogLeNet, our classifier performs best on this classification problem. GoogLeNet takes nearly 4 times as long as our model while improving accuracy by only 0.67%. Moreover, the structure of GoogLeNet is unique and cannot be shared with segmentation models. Our classifier strikes the best balance between accuracy and computation, which guarantees efficiency within the scope of fault tolerance.
4.3 Performance of the Segmentation Model
We experiment with the segmentation structure and compare it with classic segmentation models: FCN, U-Net, and DeepLab V3. We train all the models in this experiment on the patch dataset to exclude other factors.
To evaluate the models accurately, we calculate four criteria to compare their performance. The pAcc and the mAcc evaluate models at the pixel level, so we set as the definition of pAcc in this section. The mIoU and the fwIoU evaluate models at the IoU level, so we set as the definition of mIoU in this section. Table 2 shows the criteria values of the different models. Our method performs best on all criteria, which indicates that E-ASPP and the criterion-oriented adaptive loss function are effective.
4.4 WSI Segmentation of the Hybrid Model
We experiment with our method and the other models on the WSI dataset. To compare model efficiency more fairly, we add data preprocessing to FCN, U-Net, and DeepLab V3. The preprocessing uses the gradient clustering method to filter out non-information patches. Table 2 shows the accuracies and times of the models after adding data preprocessing.
All the accuracy criteria decrease on the WSI dataset, since the WSI dataset is more complex than the patch dataset. However, our method performs well in this complex situation, which is the one encountered in real medical diagnosis. The time our model takes to process a WSI is much less than that of the other models after preprocessing, while accuracy is maintained.
Focusing on the practical problems of thyroid cytopathological diagnosis, we propose a highly efficient hybrid method for the follicular segmentation problem. The hybrid method integrates a classifier into the segmentation model. We also propose E-ASPP and a criterion-oriented adaptive loss function, which achieve good accuracy in follicular segmentation. We experiment with the patch dataset and the WSI dataset. The hybrid method significantly improves on previous solutions for follicular segmentation in thyroid cytopathological WSIs and achieves good efficiency and accuracy.
This work is supported in part by the Beijing Natural Science Foundation (4182044) and the Basic Scientific Research Project of Beijing University of Posts and Telecommunications (2018RC11). This work is conducted on the platform of the Center for Data Science of Beijing University of Posts and Telecommunications.
- [1] Alansary, A., Kamnitsas, K., Davidson, A., Khlebnikov, R., Rajchl, M., Malamateniou, C., Rutherford, M., Hajnal, J.V., Glocker, B., Rueckert, D.: Fast fully automatic segmentation of the human placenta from motion corrupted MRI (2016)
- [2] Barbosa, G.F., Milas, M.: Peripheral thyrotropin receptor mRNA as a novel marker for differentiated thyroid cancer diagnosis and surveillance. Expert Review of Anticancer Therapy 8(9), 1415–1424 (2008)
- [3] Cai, J., Lu, L., Zhang, Z., Xing, F., Yang, L., Yin, Q.: Pancreas segmentation in MRI using graph-based decision fusion on convolutional neural networks. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 442–450 (2016)
- [4] Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Semantic image segmentation with deep convolutional nets and fully connected CRFs. Computer Science (4), 357–361 (2015)
- [5] Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis & Machine Intelligence 40(4), 834–848 (2018)
- [6] Chen, L.C., Papandreou, G., Schroff, F., Adam, H.: Rethinking atrous convolution for semantic image segmentation (2017)
- [7] Garud, H., Karri, S.P.K., Sheet, D., Chatterjee, J., Mahadevappa, M., Ray, A.K., Ghosh, A., Maity, A.K.: High-magnification multi-views based classification of breast fine needle aspiration cytology cell samples using fusion of decisions from deep convolutional networks. In: CVPR Workshops. pp. 828–833 (2017)
- [8] Gopinath, B., Shanthi, N.: Support vector machine based diagnostic system for thyroid cancer using statistical texture features. Asian Pacific Journal of Cancer Prevention 14(1), 97–102 (2013)
- [9] Gopinath, B., Shanthi, N.: Development of an automated medical diagnosis system for classifying thyroid tumor cells using multiple classifier fusion. Technology in Cancer Research & Treatment 14(5), 653–662 (2015)
- [10] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. pp. 770–778 (2015)
- [11] James, B.C., Mitchell, J.M., Jeon, H.D., Vasilottos, N., Grogan, R.H., Aschebrook-Kilfoy, B.: An update in international trends in incidence rates of thyroid cancer, 1973–2007. Cancer Causes & Control 29(4-5), 465–473 (2018)
- [12] Kim, E., Corte-Real, M., Baloch, Z.: A deep semantic mobile application for thyroid cytopathology. In: Medical Imaging 2016: PACS and Imaging Informatics: Next Generation and Innovations. vol. 9789, p. 97890A. International Society for Optics and Photonics (2016)
- [13] Komura, D., Ishikawa, S.: Machine learning methods for histopathological image analysis. Computational & Structural Biotechnology Journal (2018)
- [14] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: International Conference on Neural Information Processing Systems. pp. 1097–1105 (2012)
- [15] Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. Tech. rep. (2014)
- [16] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
- [17] Preetha, M.M.S.J., Suresh, L.P., Bosco, M.J.: Image segmentation using seeded region growing. In: International Conference on Computing, Electronics and Electrical Technologies. pp. 576–583 (2012)
- [18] Reinhard, E., Ashikhmin, M., Gooch, B., Shirley, P.: Color transfer between images. IEEE Computer Graphics & Applications 21(5), 34–41 (2002)
- [19] Samsi, S., Krishnamurthy, A.K., Gurcan, M.N.: An efficient computational framework for the analysis of whole slide images: Application to follicular lymphoma immunohistochemistry. Journal of Computational Science 3(5), 269–279 (2012)
- [20] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 1–9 (2015)