A Semi-Supervised Framework for Automatic Pixel-Wise Breast Cancer Grading of Histological Images

07/03/2019 ∙ by Yanyuet Man, et al.

Throughout the world, breast cancer is one of the leading causes of death among women. Recently, deep learning methods have been developed to automatically grade breast cancer in histological slides. However, the performance of existing deep learning models is limited by the lack of large annotated biomedical datasets. One promising way to relieve the annotation burden is to leverage unannotated datasets to enhance the trained model. In this paper, we are the first to apply active learning to breast cancer grading, and we propose a semi-supervised framework based on an expectation maximization (EM) model. The proposed EM approach relies on collaborative filtering between the annotated and unannotated datasets, which effectively extracts useful and credible samples from the unannotated images. Results of pixel-wise prediction on whole-slide images (WSIs) demonstrate that the proposed method not only outperforms state-of-the-art methods but also significantly reduces the annotation cost, by over 70%.




1 Introduction

Breast cancer is the most commonly diagnosed cancer among women; it is estimated to account for 30% of new cancer diagnoses and 15% of cancer deaths among women in the United States [11]. Early and precise diagnosis of breast cancer is crucial to improving patient survival [10]. Microscopic examination of stained tissue sections is among the most accurate methods of diagnosing and classifying cancer, and the effects of cancer can be observed in WSIs at the cellular and tissue levels.

Figure 1: An annotated whole-slide image. Regions outlined in red are invasive cancer, those outlined in blue are In-Situ cancer, and those outlined in green are Benign; the rest of the slide is Normal.

Fig.1 shows an example of these cancer effects, which are classified into four categories: invasive cancer, In-Situ cancer, Benign, and Normal. Recently, many computer-aided systems have used deep learning models to improve classification consistency and accuracy [2, 5, 13]. However, robust deep learning models require large annotated datasets, which are costly to produce, especially for medical images. Recent studies integrate active learning with deep learning to exploit unannotated data and improve the performance of deep learning models [3, 6, 7, 12]. Yang et al. applied active learning to a fully convolutional network (FCN) to select the most representative and uncertain areas for annotation [14]. One drawback is that the FCN cannot be applied to high-resolution images such as WSIs; moreover, it is difficult to acquire iterative annotations on high-resolution WSIs. Generative adversarial networks (GANs) are widely used to generate realistic images, which can overcome the limitations of small training datasets. Mahapatra et al. applied conditional generative adversarial networks (cGANs) to generate informative and realistic chest X-ray images that enlarge the training dataset [8]. However, GANs introduce artifacts into the original images. These artifacts, although subtle, can substantially alter the features of cells and tissues, which misleads the model and affects the convergence of its parameters. Existing methods therefore fail to provide an efficient solution for automatically grading breast cancer on limited annotated WSIs. In this paper, we present a new semi-supervised framework based on expectation maximization (EM). We leverage unannotated WSIs to adjust a deep learning model trained on a limited annotated dataset, reducing the reliance on expensive pixel-wise annotations. The main contributions of this paper are:

  • To the best of our knowledge, we are the first to apply active learning to breast cancer grading. We propose a semi-supervised approach based on EM that effectively reduces the amount of annotated data required for multi-class pixel-wise breast cancer grading on WSIs.

  • We propose a sample selection method based on collaborative filtering, which selects credible and representative unannotated patches to enlarge the training dataset.

  • The proposed semi-supervised framework achieves significantly enhanced performance on pixel-wise prediction of WSIs with only 30% of the annotated dataset.

Figure 2: Overall flow of the EM model.

2 Method

2.1 Semi-supervised Learning Framework Based on EM model

An overview of our semi-supervised learning framework is shown in Fig.2. In this framework, only part of the whole-slide images is annotated; this set is denoted $\mathcal{D}_L$. The labels of the remaining slides are unknown; this set is denoted $\mathcal{D}_U$. Let $y_i$ denote the label of patch $x_i \in \mathcal{D}_L$, and let the hidden variable $z_j$ denote the label of patch $x_j \in \mathcal{D}_U$. We initialize the CNN model on $\mathcal{D}_L$, which yields the model parameter $\theta^{(0)}$, and apply this initial CNN model to produce the probability map $P(z_j \mid x_j; \theta^{(0)})$ of $\mathcal{D}_U$. The EM algorithm alternates between the E-step, which estimates the hidden labels, and the M-step, which computes the model parameters that maximize the likelihood. The probability map is projected to a scaled value between 0 and 1, which is used to generate a consistent heatmap (see Fig.2(C)). A fixed vector of intensity thresholds is applied to the heatmap to generate the classmap, as shown in Fig.2(D).

Next, the most representative and credible patches, selected by collaborative filtering, are used to train the CNN model in the next iteration.
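The alternation described above can be sketched as a generic self-training loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`em_self_training`, `train_centroids`, `toy_e_step`) are hypothetical, a 1-D nearest-centroid "model" stands in for the CNN, a distance margin stands in for collaborative filtering, and refitting each M-step on the annotated data plus the selected patches is our assumption.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Labeled = List[Tuple[float, int]]


def em_self_training(
    labeled: Labeled,
    unlabeled: Sequence[float],
    train: Callable[[Labeled], Dict[int, float]],
    e_step: Callable[[Dict[int, float], Sequence[float]], Labeled],
    n_iters: int = 2,
) -> Dict[int, float]:
    """Alternate E- and M-steps (hypothetical helper, not the paper's code)."""
    model = train(labeled)  # Initialization: fit theta^(0) on annotated data only
    for _ in range(n_iters):
        effective = e_step(model, unlabeled)  # E-step: pseudo-label and keep credible patches
        model = train(labeled + effective)  # M-step: refit (assumed: annotated + selected)
    return model


def train_centroids(data: Labeled) -> Dict[int, float]:
    """Toy stand-in for CNN training: one centroid per class."""
    groups: Dict[int, List[float]] = {}
    for x, y in data:
        groups.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in groups.items()}


def predict(model: Dict[int, float], x: float) -> int:
    """Return the class whose centroid is nearest to x."""
    return min(model, key=lambda y: abs(x - model[y]))


def toy_e_step(model: Dict[int, float], unlabeled: Sequence[float], margin: float = 1.0) -> Labeled:
    """Toy stand-in for patch selection: keep points close to their predicted centroid."""
    selected: Labeled = []
    for x in unlabeled:
        z = predict(model, x)
        if abs(x - model[z]) < margin:
            selected.append((x, z))
    return selected


model = em_self_training([(0.0, 0), (10.0, 1)], [0.5, 9.5, 5.2], train_centroids, toy_e_step)
print(model)  # {0: 0.25, 1: 9.75}; the ambiguous point 5.2 is never selected
```

The ambiguous point never enters the training set, which mirrors the intent of selecting only credible unannotated patches rather than trusting every pseudo-label.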
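Initialization: Assume the patches are independently and identically distributed (i.i.d.). The initial parameter $\theta^{(0)}$ is obtained from the CNN model trained on the annotated dataset $\mathcal{D}_L$. Here, $\theta^{(0)}$ is computed as:

$$\theta^{(0)} = \arg\max_{\theta} \prod_{x_i \in \mathcal{D}_L} P(y_i \mid x_i; \theta) \qquad (1)$$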
E-step: Based on the current parameters $\theta^{(t)}$ at EM iteration $t$, we calculate the probability maps $P(z_j \mid x_j; \theta^{(t)})$ of the unannotated patches and rescale them to values in $[0, 1]$. We generate the class label $\hat{z}_j$ from the rescaled value and then obtain the classmap.

The estimated ground truth of the unannotated patches is then extracted as $\hat{z}_j$, and the effective dataset $\mathcal{S}^{(t)}$ is selected according to the method described in Section 2.2.
M-step: The CNN model is retrained on the effective dataset produced in the E-step, and the model parameter is updated to $\theta^{(t+1)}$ by maximizing the likelihood defined in Equation (2).


Assuming that

follows a uniform distribution, we formulate the objective function as:



2.2 Patch Selection

Hard Example Mining: Hard example mining is used in the initialization step to fully exploit the annotated dataset, especially the misclassified patches. An effective coefficient $e_i$ is defined in Equation (4). The higher the value of $e_i$, the harder and more valuable the corresponding patch $x_i$ is for model training.


Here, $y_i$ denotes the class label of patch $x_i$, and $P(\cdot \mid x_i; \theta)$ denotes the probability map.

For the initialization step, we first train our model on 50% of the annotated data. Then, we apply this model to the remaining annotated data and calculate the effective coefficients. Patches whose effective coefficients fall in the top quintile (the highest 20%) are selected to retrain the model.
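The top-quintile selection can be sketched as follows. The coefficient used here, $e_i = 1 - P(y_i \mid x_i; \theta)$ (one minus the probability assigned to the true class), is an assumed form that grows as a patch gets harder; it is not necessarily the paper's Equation (4), and the function names are illustrative.

```python
from typing import List, Sequence


def effective_coefficient(probs: Sequence[float], label: int) -> float:
    """Assumed form: 1 - probability of the true class (not necessarily Eq. (4))."""
    return 1.0 - probs[label]


def select_hard_examples(
    probs_list: Sequence[Sequence[float]], labels: Sequence[int], fraction: float = 0.2
) -> List[int]:
    """Return indices of the patches whose coefficients fall in the top `fraction`."""
    scores = [effective_coefficient(p, y) for p, y in zip(probs_list, labels)]
    k = max(1, round(fraction * len(scores)))  # top quintile by default
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Held-out half of the annotated data: softmax outputs over (Normal, Benign, In-Situ, Invasive).
probs = [
    [0.90, 0.05, 0.03, 0.02],
    [0.10, 0.60, 0.20, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.05, 0.10, 0.80],
]
labels = [0, 1, 3, 0, 3]
print(select_hard_examples(probs, labels))  # [2]: the patch with p(true class) = 0.25
```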
Collaborative Filtering

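Input: $\mathcal{D}_U$ (unannotated dataset), $\mathcal{D}_L$ (annotated dataset), $M^{(t)}$ (CNN model in iteration $t$), $\tau$ (similarity threshold for patch selection)
Output: $\mathcal{S}^{(t)}$ (set of selected unannotated patches in iteration $t$)
Notation: $F(x)$ (output feature of patch $x$), $M^{(t)}(x)$ (prediction of $M^{(t)}$ for patch $x$), $\arg\max(v)$ (label with the largest number of votes)
$\mathcal{S}^{(t)} \leftarrow \emptyset$
for each $x_j \in \mathcal{D}_U$ do
    $f_j \leftarrow F(x_j)$; $\hat{y}_j \leftarrow M^{(t)}(x_j)$
    Set $v$ to vector [0, 0, 0, 0]
    for each $(x_i, y_i) \in \mathcal{D}_L$ do
        $s_{ij} \leftarrow \mathrm{sim}(f_j, F(x_i))$
        if $s_{ij} > \tau$ then
            $v[y_i] \leftarrow v[y_i] + 1$
        end if
    end for
    $\hat{z}_j \leftarrow \arg\max(v)$
    if $\hat{z}_j = \hat{y}_j$ then
        $\mathcal{S}^{(t)} \leftarrow \mathcal{S}^{(t)} \cup \{x_j\}$
    end if
end for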
Algorithm 1 Patch selection method.

In the E-step, patches are selected using Algorithm 1. We first apply the CNN to extract the features of all patches, and then calculate the similarity $s_{ij}$ between each unannotated patch $x_j \in \mathcal{D}_U$ and each annotated patch $x_i \in \mathcal{D}_L$. For each unannotated patch $x_j$, we compute the set of similar annotated patches $\mathcal{N}_j = \{x_i \in \mathcal{D}_L : s_{ij} > \tau\}$, where $\tau$ is the similarity threshold. We then apply majority voting over the labels of the patches in $\mathcal{N}_j$ to determine the label of $x_j$. If the voted label is consistent with the label predicted by the model, we insert $x_j$ into $\mathcal{S}^{(t)}$.
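A minimal Python sketch of Algorithm 1 follows. It assumes cosine similarity between feature vectors (no similarity measure is named above), breaks voting ties toward the lower class index, and skips unannotated patches that have no annotated neighbor above the similarity threshold (`tau` in the code); these choices and the function names are assumptions for illustration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (an assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_patches(
    unl_feats: Sequence[Sequence[float]],
    unl_preds: Sequence[int],
    ann_feats: Sequence[Sequence[float]],
    ann_labels: Sequence[int],
    tau: float,
    num_classes: int = 4,
) -> List[int]:
    """Return indices of unannotated patches whose voted label matches the CNN prediction."""
    selected = []
    for j, (f_j, pred_j) in enumerate(zip(unl_feats, unl_preds)):
        votes = [0] * num_classes
        for f_i, y_i in zip(ann_feats, ann_labels):
            if cosine(f_j, f_i) > tau:  # annotated neighbor above the similarity threshold
                votes[y_i] += 1
        if sum(votes) == 0:
            continue  # no similar annotated patch: skip (assumption)
        voted = max(range(num_classes), key=lambda c: votes[c])  # majority vote
        if voted == pred_j:  # keep only patches where filtering agrees with the model
            selected.append(j)
    return selected


ann_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
ann_labels = [0, 0, 2]
unl_feats = [[1.0, 0.05], [0.05, 1.0], [0.7, 0.7]]
unl_preds = [0, 1, 0]
print(select_patches(unl_feats, unl_preds, ann_feats, ann_labels, tau=0.95))  # [0]
```

The second patch is rejected because its neighbors vote for class 2 while the model predicts class 1, and the third has no neighbor above the threshold.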

3 Experiment

3.1 Data Preprocessing

Foreground Patch Extraction on WSI Image: High-resolution WSIs must be divided into patches before use. However, a large part of each WSI is background, which produces uninformative patches that should be excluded. The widely used Otsu method [9] fails to extract certain parts of the tissue from the slide. Therefore, we adapt the graph-based image segmentation method in [1] for foreground extraction. For a given slide, we construct an undirected graph $G = (V, E)$, in which each node in $V$ corresponds to a pixel and each edge in $E$ connects two adjacent pixels, and we assign each edge a weight. We then compute the minimum spanning tree $T$ of $G$ using Kruskal’s algorithm [4] and delete the edges of $T$ whose weights are greater than a prespecified threshold (100 in the experiments). Deleting these edges produces a forest, i.e., a set of sub-trees $T_1, \dots, T_k$. We then compute the average RGB value of each sub-tree. Let $c_{\max}$ denote the maximum of these averages; all sub-trees whose average RGB value exceeds a threshold derived from $c_{\max}$ are set as background, which yields the foreground mask shown in Fig.3. According to the foreground mask, we crop the WSI into fixed-size patches with 50% overlap. Patches with less than 40% foreground pixels are considered background and are not used for classification.
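The foreground step can be sketched in pure Python on a toy RGB image. Several details are assumptions rather than the paper's settings: 4-connected pixels, edge weights equal to the Euclidean distance between RGB values, component brightness taken as the mean of each pixel's average channel value, and a hypothetical `bg_ratio` rule that marks a component as background when its brightness exceeds `bg_ratio` times that of the brightest component. Running Kruskal’s algorithm only over edges at or below the threshold produces the same forest as building the full minimum spanning tree and then deleting the heavier edges.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def find(parent: List[int], a: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[a] != a:
        parent[a] = parent[parent[a]]
        a = parent[a]
    return a


def foreground_mask(
    img: Sequence[Sequence[Pixel]], edge_threshold: float = 100.0, bg_ratio: float = 0.9
) -> List[List[bool]]:
    h, w = len(img), len(img[0])
    edges = []
    for r in range(h):
        for c in range(w):
            for rr, cc in ((r, c + 1), (r + 1, c)):  # 4-connectivity (assumption)
                if rr < h and cc < w:
                    weight = math.dist(img[r][c], img[rr][cc])  # assumed RGB distance
                    edges.append((weight, r * w + c, rr * w + cc))
    edges.sort()
    parent = list(range(h * w))
    for weight, a, b in edges:  # Kruskal's algorithm, in increasing weight order
        if weight > edge_threshold:
            break  # heavier MST edges would be deleted anyway
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[rb] = ra
    # Mean brightness of each sub-tree (connected component).
    sums: Dict[int, Tuple[float, int]] = {}
    for r in range(h):
        for c in range(w):
            root = find(parent, r * w + c)
            total, count = sums.get(root, (0.0, 0))
            sums[root] = (total + sum(img[r][c]) / 3, count + 1)
    means = {root: total / count for root, (total, count) in sums.items()}
    cutoff = bg_ratio * max(means.values())  # hypothetical background rule
    return [[means[find(parent, r * w + c)] <= cutoff for c in range(w)] for r in range(h)]


def keep_patches(mask: List[List[bool]], size: int, min_fg: float = 0.4) -> List[Tuple[int, int]]:
    """Top-left corners of size x size patches (50% overlap) with >= min_fg foreground."""
    stride = max(1, size // 2)
    kept = []
    for r in range(0, len(mask) - size + 1, stride):
        for c in range(0, len(mask[0]) - size + 1, stride):
            fg = sum(mask[r + i][c + j] for i in range(size) for j in range(size))
            if fg / (size * size) >= min_fg:
                kept.append((r, c))
    return kept


white, pink = (250, 250, 250), (200, 100, 150)
img = [[white, white, pink, pink], [white, white, pink, pink]]
mask = foreground_mask(img)
print(mask)  # [[False, False, True, True], [False, False, True, True]]
print(keep_patches(mask, size=2))  # [(0, 1), (0, 2)]
```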

Figure 3: The foreground extraction of a WSI: (a) The original WSI, (b) Otsu method [9], and (c) our method.

Patch Label Extraction: We assign labels to the patches according to the ground-truth contours of the WSI. In most cases, a patch is labeled with the type of cancer (Benign, In-Situ, or Invasive) that occupies the largest area in the patch. However, there are two special cases:

  • If the cancer area in a patch is less than one-third of the whole patch area, this patch is labeled as normal.

  • If there are two or more types of cancer in a patch and each of the corresponding tissue areas is greater than one-third of the whole patch area, the patch is considered noisy and is discarded. In the experiments, such patches are very few.
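The labeling rules above can be written as a small function over per-class area fractions inside a patch. In this sketch the input format (a dictionary of area fractions) and the name `patch_label` are hypothetical, and "cancer area" is taken to be the combined area of Benign, In-Situ, and Invasive tissue.

```python
from typing import Dict, Optional

CANCER_TYPES = ("Benign", "In-Situ", "Invasive")


def patch_label(area_fraction: Dict[str, float]) -> Optional[str]:
    """Label a patch from the fraction of its area covered by each annotated type.

    Returns None for noisy patches that should be discarded.
    """
    cancer = {k: area_fraction.get(k, 0.0) for k in CANCER_TYPES}
    if sum(cancer.values()) < 1 / 3:
        return "Normal"  # special case 1: too little cancer area
    if sum(v > 1 / 3 for v in cancer.values()) >= 2:
        return None  # special case 2: two or more large cancer types, so the patch is noisy
    return max(cancer, key=cancer.get)  # default: type with the largest area


print(patch_label({"In-Situ": 0.5}))                    # In-Situ
print(patch_label({"Invasive": 0.2}))                   # Normal
print(patch_label({"Benign": 0.40, "Invasive": 0.45}))  # None
```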

In Kwok’s work [5], the class value of a patch is the mean of the class values of all pixels in the patch. Our experiments show that this method tends to generate wrong labels, which disrupt the learning process. For example, when half of a patch contains In-Situ areas and the rest is normal, Kwok’s method labels the patch as Benign even though it contains no benign tissue at all. In contrast, our method avoids this drawback.
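The failure mode can be checked with a few lines of arithmetic. The sketch assumes that the class values are the ordinal codes Normal = 0, Benign = 1, In-Situ = 2, Invasive = 3 and that the mean is rounded to the nearest code; both are assumptions consistent with the example above. A patch that is half In-Situ and half Normal then averages to exactly 1, i.e., Benign.

```python
CLASSES = ["Normal", "Benign", "In-Situ", "Invasive"]  # assumed ordinal codes 0..3


def mean_label(pixel_codes):
    """Mean-of-pixel-classes labeling, rounded to the nearest code (assumption)."""
    return CLASSES[round(sum(pixel_codes) / len(pixel_codes))]


half_in_situ = [2] * 50 + [0] * 50  # 50 In-Situ pixels, 50 Normal pixels
print(mean_label(half_in_situ))  # Benign, although no pixel is Benign
```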

Method | Patch-wise Precision | Patch-wise Accuracy | Patch-wise F1-Measure | Pixel-wise Score Metric
Kwok et al. | 0.5798 | 0.7784 | 0.6391 | 0.7715
ALM-10% | 0.7205 | 0.8466 | 0.8078 | 0.7186
ALM-20% | 0.7350 | 0.8684 | 0.8082 | 0.7447
ALM-30% | 0.7477 | 0.9035 | 0.8303 | 0.7759
FSL | 0.8239 | 0.8963 | 0.7698 | 0.7592
Our-10% | 0.7218 | 0.8856 | 0.8054 | 0.7675
Our-20% | 0.7499 | 0.8852 | 0.8048 | 0.7539
Our-30% | 0.7987 | 0.9197 | 0.8623 | 0.7858
iteration2-Our-30% | 0.8293 | 0.9210 | 0.8751 | 0.8027
Table 1: Performance comparison by patch-wise precision, accuracy, and F1-measure and by the pixel-wise score metric.
Figure 4: The pixel-wise classification results of slides A02 and A08 (green contour: Benign, red contour: Invasive, blue contour: In-Situ, others: Normal). (a) and (e) show the results of Kwok’s method; (b) and (f) show the results of the model trained on the fully annotated dataset; (c) and (g) show the results of our EM-based method trained on 30% of the annotated dataset combined with unannotated data; (d) and (h) show the labels given by a pathologist.

3.2 Patch Classification

Our neural network is a fine-tuned VGG19 network with batch normalization. Because the extracted patches are large, we resize them before feeding them into the network. To adapt the feature maps to the fully connected layers, we add an average pooling layer that converts each feature map into a vector. The patch-wise experimental results are summarized in Table 1. We first apply an active learning method (ALM), which continually fine-tunes the classification model with informative and effective data instead of retraining the model on all data. ALM-10%, ALM-20%, and ALM-30% refer to models trained on 10%, 20%, and 30% of the annotated dataset, respectively. FSL is a model trained on the entire annotated dataset. Our-10%, Our-20%, and Our-30% are the proposed EM-based models trained on the corresponding portions of the annotated dataset; for example, Our-30% denotes our method using 30% of the whole annotated dataset.

In the experiments, FSL obtains 0.76, 0.89, and 0.82 for F1 score, accuracy, and precision, respectively, while Kwok’s method obtains lower values of 0.63, 0.77, and 0.57. With only 30% of the annotated dataset, ALM obtains 0.83, 0.90, and 0.74, outperforming FSL in F1 score and accuracy, though not in precision. This can be explained by the exclusion of uninformative data. Our proposed method obtains 0.86, 0.91, and 0.79 for F1 score, accuracy, and precision, respectively. Apart from iteration2-Our-30%, Our-30% achieves the best F1 score and accuracy in Table 1 using only 30% of the annotated data combined with unannotated data, and iteration2-Our-30% improves all four metrics further. Moreover, as with ALM, our method significantly reduces the runtime for fine-tuning the model.

3.3 Pixel-wise Classification on WSI

In the experiment, we first extract the probability map from the softmax layer of our model and then normalize it to a scaled value. The scaled value represents the intensity of cancer: the scaled value of Normal tissue is close to 0, and that of Invasive cancer is close to 1. Given the scaled value of each patch, we generate a heatmap of the WSI, where the intensity of every pixel is the scaled value of the patch to which the pixel belongs. Next, we map the heatmap to a classmap using the fixed vector of intensity thresholds: each pixel is classified as Normal, Benign, In-Situ, or Invasive according to the threshold interval into which its intensity falls. The pixel-wise classification results are summarized in Table 1 in terms of the score metric defined by ICIAR. Our method with 30% of the annotated dataset achieves the best performance, with a score of 0.785 (0.802 for iteration2-Our-30%), whereas the best scores of the Kwok, FSL, and ALM methods are 0.771, 0.759, and 0.775, respectively.
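A sketch of the heatmap-to-classmap mapping follows. Both the normalization and the threshold values are assumptions: the scaled value is taken as the expected class index divided by 3, so a confident Normal prediction maps to 0 and a confident Invasive prediction to 1, and `THRESHOLDS` is a hypothetical, evenly spaced stand-in for the fixed vector.

```python
from typing import List, Sequence

CLASSES = ("Normal", "Benign", "In-Situ", "Invasive")
THRESHOLDS = (0.25, 0.5, 0.75)  # hypothetical stand-in for the fixed vector


def scaled_value(probs: Sequence[float]) -> float:
    """Assumed normalization: expected class index / (number of classes - 1)."""
    return sum(k * p for k, p in enumerate(probs)) / (len(probs) - 1)


def classify(value: float, thresholds: Sequence[float] = THRESHOLDS) -> str:
    """Map a heatmap intensity in [0, 1] to a class by threshold interval."""
    for label, t in zip(CLASSES, thresholds):
        if value < t:
            return label
    return CLASSES[-1]


def classmap(heatmap: List[List[float]]) -> List[List[str]]:
    """Apply the threshold mapping to every pixel of a heatmap."""
    return [[classify(v) for v in row] for row in heatmap]


heat = [[scaled_value([1, 0, 0, 0]), scaled_value([0, 0, 1, 0])],
        [scaled_value([0.25] * 4), scaled_value([0, 0, 0, 1])]]
print(classmap(heat))  # [['Normal', 'In-Situ'], ['In-Situ', 'Invasive']]
```

Note that a completely uncertain patch (uniform probabilities) lands mid-scale at 0.5, so the choice of normalization directly shapes which class ambiguous regions receive.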

The Kwok, FSL, and ALM methods perform relatively well at detecting large cancer areas, but they usually fail on small ones. Fig.4 shows an example of invasive tissue in slide A08, which consists of many small cancer areas. Kwok’s method tends to classify small invasive tissues as In-Situ, and FSL fails to recognize many small invasive tissues in slides A02 and A08, as shown in Fig.4. In contrast, the proposed EM-based method is able to detect small cancer areas, which are crucial for correct diagnosis.

4 Conclusion

In this paper, we have proposed an effective semi-supervised approach based on the EM model, which significantly reduces the reliance on annotated data. Experimental results show that the proposed method, using only 30% of the annotated dataset, outperforms the model trained on the full annotated dataset (FSL) in F1 score, accuracy, and pixel-wise score. Moreover, the proposed method effectively detects small cancer areas, which are key markers for cancer diagnosis. In future work, more parameters and metrics will be introduced into the system, such as the maximum area of cancer, the number of different types of cancer, and the degree of patient. More prior knowledge will also be introduced to generate adaptive parameters in the proposed EM framework.


References

  • [1] Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient graph-based image segmentation. International Journal of Computer Vision 59(2), 167–181 (2004)
  • [2] Gecer, B., Aksoy, S., Mercan, E., Shapiro, L.G., Weaver, D.L., Elmore, J.G.: Detection and classification of cancer in whole slide breast histopathology images using deep convolutional networks. Pattern Recognition 84, 345–356 (2018)
  • [3] Greenspan, H., Van Ginneken, B., Summers, R.M.: Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique. IEEE Transactions on Medical Imaging 35(5), 1153–1159 (2016)
  • [4] Kruskal, J.B.: On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical Society 7(1), 48–50 (1956)
  • [5] Kwok, S.: Multiclass classification of breast cancer in whole-slide images. In: International Conference on Image Analysis and Recognition. pp. 931–940. Springer (2018)
  • [6] Li, J.: Active learning for hyperspectral image classification with a stacked autoencoders based neural network. In: 2015 7th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS). pp. 1–4. IEEE (2015)
  • [7] Lu, L., Zheng, Y., Carneiro, G., Yang, L.: Deep learning and convolutional neural networks for medical image computing. Advances in Computer Vision and Pattern Recognition. Springer, New York, NY, USA (2017)
  • [8] Mahapatra, D., Bozorgtabar, B., Thiran, J.P., Reyes, M.: Efficient active learning for image classification and segmentation using a sample selection and conditional generative adversarial network. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 580–588. Springer (2018)
  • [9] Otsu, N.: A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics 9(1), 62–66 (1979)
  • [10] O’Dell, W., Takita, C., Casey-Sawicki, K., Daily, K., Heldermon, C.D., Okunieff, P.: Projected clinical benefit of surveillance imaging for early detection and treatment of breast cancer metastases. The Breast Journal 25(1), 75–79 (2019)
  • [11] Siegel, R.L., Miller, K.D., Jemal, A.: Cancer statistics, 2019. CA: A Cancer Journal for Clinicians 69(1), 7–34 (2019)
  • [12] Stark, F., Hazırbas, C., Triebel, R., Cremers, D.: Captcha recognition with active deep learning. In: GCPR Workshop on New Challenges in Neural Computation. vol. 10 (2015)
  • [13] Tellez, D., Balkenhol, M., Otte-Höller, I., van de Loo, R., Vogels, R., Bult, P., Wauters, C., Vreuls, W., Mol, S., Karssemeijer, N., et al.: Whole-slide mitosis detection in H&E breast histology using PHH3 as a reference to train distilled stain-invariant convolutional networks. IEEE Transactions on Medical Imaging 37(9), 2126–2136 (2018)
  • [14] Yang, L., Zhang, Y., Chen, J., Zhang, S., Chen, D.Z.: Suggestive annotation: A deep active learning framework for biomedical image segmentation. In: Medical Image Computing and Computer Assisted Intervention. pp. 399–407 (2017)