AMDet: A Tool for Mitotic Cell Detection in Histopathology Slides

by Walt Williams et al.

Breast cancer is the most prevalent cancer in the world. The World Health Organization reports that the disease still affects a significant portion of the developing world, citing increased mortality rates in the majority of low- to middle-income countries. The most popular protocol pathologists use for diagnosing breast cancer is the Nottingham grading system, which grades the proliferation of tumors based on 3 major criteria, the most important of them being mitotic cell count. Pathologists evaluate mitotic cell count by subjectively and qualitatively analyzing cells present in stained slides of tissue and deciding on each cell's mitotic state, i.e. is it mitotic or not? This process is extremely inefficient and tiring for pathologists, so an efficient, accurate, and fully automated tool to aid with the diagnosis is extremely desirable. Fortunately, creating such a tool is made significantly easier with the AutoML tool available from Microsoft Azure; however, to the best of our knowledge the AutoML tool has never been formally evaluated for use in mitotic cell detection in histopathology images. This paper serves as an evaluation of the AutoML tool for this purpose and provides a first look at how the tool handles this challenging problem. All code is available at




1 Introduction

Breast cancer is the most prevalent cancer in the world and represents roughly 25% of all cancers diagnosed in women [1]. The CDC reports that the annual number of new breast cancer diagnoses in the United States has risen every year since 2011, and the trend is projected to continue [2]. Despite this, the WHO has reported that current breast cancer treatments are highly effective when the disease is spotted early, with up to a 90% 5-year survival rate. One of the most common methods pathologists use to diagnose breast cancer is analyzing histopathology slides of breast tissue stained with Hematoxylin and Eosin (H&E), 2 chemicals used to highlight structures within a cell [3]. When examining these H&E-stained images, the Nottingham grading system is often used to grade the aggressiveness of the cancer. In this system there are three primary markers used for grading: nuclear atypia, tubule formation, and mitotic cell count. The mitotic cell count is the most important marker among them because it is directly related to the prognosis of tumors [3].

The analysis of histopathology slides by pathologists is often a slow, tiring, and extremely subjective process susceptible to high variability among different pathologists. To aid in this task, several competitions have been conducted with the intention of developing tools capable of aiding pathologists with the analysis of histopathology slides. Deep learning has emerged as a successful method for creating these tools, seeing as the top entries in one of the prominent competitions, the ICPR 2014 competition, were created with the use of deep learning [4]. In this paper we propose AutoML Mitosis Detector, or AMDet, a deep learning tool used to aid in the diagnosis of breast cancer from H&E-stained images by drawing bounding boxes around non-mitotic and mitotic cells in the images. AMDet works by taking in a single histopathology slide, dividing the image into patches, running a forward pass for each of the patches to aggregate predictions, then combining the patches back together to form the original slide and saving it to a specified directory. Figure 1 shows some examples of AMDet’s predictions.
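The slide-level inference loop described above (split into patches, predict per patch, stitch the patches back together) can be sketched as follows. The zero-padding strategy and function names here are illustrative assumptions, and the per-patch model call is omitted:

```python
import numpy as np

def split_into_patches(image, patch):
    """Split an H x W x 3 slide into non-overlapping patch x patch tiles,
    zero-padding the bottom/right edges so both sides become exact
    multiples of the patch size (a simplifying assumption)."""
    h, w = image.shape[:2]
    ph = -(-h // patch) * patch  # round h up to a multiple of patch
    pw = -(-w // patch) * patch
    padded = np.zeros((ph, pw, 3), dtype=image.dtype)
    padded[:h, :w] = image
    tiles = [((y, x), padded[y:y + patch, x:x + patch])
             for y in range(0, ph, patch)
             for x in range(0, pw, patch)]
    return tiles, (h, w)

def reassemble(tiles, orig_shape, patch):
    """Stitch tiles back into a slide of the original size, dropping
    the padded border."""
    h, w = orig_shape
    ph = -(-h // patch) * patch
    pw = -(-w // patch) * patch
    canvas = np.zeros((ph, pw, 3), dtype=tiles[0][1].dtype)
    for (y, x), tile in tiles:
        canvas[y:y + patch, x:x + patch] = tile
    return canvas[:h, :w]
```

In the full pipeline, each tile would be passed through the detector (with its box coordinates offset by the tile's `(y, x)` origin) before reassembly.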

Figure 1: Example detections using AMDet on images taken from the ICPR2014 Dataset. Blue boxes indicate mitotic cells and green boxes are non-mitotic cells.

The goal of this paper is to serve as an evaluation of the AutoML tool for the purpose of detecting small objects (cells) present in medical images. The rest of the paper is organized as follows: Section 2 is a literature review discussing recently proposed pipelines for the detection of mitotic figures in histopathology slides. Section 3 details the specific methodology used in the creation and evaluation of our pipeline. Section 4 discusses the results of the experiments. Section 5 is a discussion of limitations of this work and directions for future research. Section 6 is the conclusion.

2 Related Work

Machine learning has seen a surge in popularity for the analysis of histopathology slides in recent years [12, 13, 14, 15]. Several models have been proposed in the literature for performing object detection, instance segmentation, and semantic segmentation in these slides with the aim of assisting pathologists with their diagnoses. One of the most popular models used for semantic segmentation in medical images is the U-Net model. U-Net is a fully convolutional network (FCN) that features a down-sampling path that plays the role of a feature extractor. The low-dimensional feature map is then sent through an up-sampling path that combines the low-dimensional feature maps with their corresponding high-dimensional representations in the down-sampling branch. The result is a segmentation map that classifies each pixel in the image as belonging to a particular class [13]. It is a landmark study in the field of biomedical image segmentation and is widely used. Another popular model proposed for nuclei segmentation is NuSet, short for Nuclear Segmentation Tool [12]. NuSet is a deep learning model used to segment crowded and overlapping cells in histopathology images. It does so with three major components: a region proposal network to generate potential regions of interest inside an image, a U-Net to provide segmentation maps, and a watershed algorithm that approximates the borders of touching cells in order to segment them. Graham et al. propose HoVer-Net, a model used for instance segmentation and classification of cells located in histopathology images [14]. Yancy proposes Multi-Stream Faster-RCNN, a model for detecting bounding boxes around mitotic cells. This model combines a segmentation map from a U-Net with an RGB image into 2 separate Faster-RCNN streams in order to add more context to the detections [15].

3 Methods

3.1 Network Architecture

AMDet is built from a Faster-RCNN model using a pre-trained 50-layer residual network (ResNet-50) as its feature extractor. Faster-RCNN is a deep learning model introduced by Ren et al. for the purpose of object detection in natural images. It involves 3 major components: a Region Proposal Network to locate sections of the image which may contain an object (region proposals), followed by region-of-interest (ROI) pooling layers, and then classification and regression layers to predict the bounding box and classify the object inside [11].

Figure 2: A set of images before and after being transformed via Reinhard normalization. The topmost image is the target which we try to match in the rest of the images. The next 4 rows contain the original image followed by the image after normalization.

Faster-RCNN builds on previous work that proposed Region-based CNNs (R-CNNs) for the simultaneous localization and classification of objects inside images, also known as object detection [9, 10]. One issue with the original R-CNN model was the time it took to generate predictions. This was largely a result of 2 main factors: a multi-stage pipeline that led to repeated computations for all region proposals, and the use of selective search, an algorithm for proposing potential objects in an image in a heuristic manner. The issue of the multi-stage pipeline was largely alleviated by the second version of R-CNN, called Fast-RCNN [10]. Fast-RCNN replaced the 3-stage detection pipeline of the original R-CNN with a unified framework trained end-to-end with a multi-task loss. After this change, the last bottleneck in terms of computational time was the region proposal algorithm used by Fast-RCNN, selective search. To speed up the process, Faster-RCNN introduces a Region Proposal Network (RPN) that extracts candidate regions from an image while sharing weights with the rest of the network, thereby replacing the old and slow selective search algorithm. The RPN is a modified CNN designed to extract features from the original image and generate region proposals to be sent to the ROI pooling layers. It does this with the use of anchors, which are boxes of varying aspect ratio and size that are placed on the original image at a location relative to their point on the output feature map. These anchors are then graded by their likelihood of containing an object before being sent to the ROI pooling layers.
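As an illustration of the anchor mechanism, the set of boxes tiled at each feature-map location can be generated as below. The base size, scales, and aspect ratios are the values popularized by the original Faster-RCNN paper; the exact values used inside the AutoML implementation are an open assumption here:

```python
import numpy as np

def generate_anchors(base_size=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Generate anchor boxes (x1, y1, x2, y2) centered at the origin.

    Each anchor has area (base_size * scale)^2 and height/width ratio
    equal to `ratio`; in the RPN these are shifted to every feature-map
    location to cover the whole image.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            area = float(base_size * scale) ** 2
            w = np.sqrt(area / ratio)   # width shrinks as ratio grows
            h = w * ratio               # height = ratio * width
            anchors.append([-w / 2.0, -h / 2.0, w / 2.0, h / 2.0])
    return np.array(anchors)
```

Three scales times three aspect ratios gives the familiar 9 anchors per feature-map location.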

Once the candidate regions have been proposed, the next step is to prepare the ROIs for the pass into the fully connected dense layers for classification and regression. Dense layers require fixed-size feature vectors, so the ROIs, which come in varying shapes, have to be resized to a fixed size before being passed through. The ROI pooling layers resize the ROIs by applying max pooling to reshape them to the correct size. Once correctly resized, they are sent to the final classification and regression layers to output the class and the bounding box coordinates for the object in the original image.

3.2 ICPR 2014 Mitosis Atypia

AMDet was trained using H&E-stained slides collected from breast cancer tissue from the Mitos-Atypia-14 Grand Challenge. The goal of the competition was to create a model to classify instances of mitosis inside the slides based on morphological features. The original whole-slide images were scanned with one of two slide scanners, the Aperio ScanScope XT and the Hamamatsu NanoZoomer 2.0-HT. Pathologists then filtered out much of the unneeded information by keeping only frames from the whole-slide image that were located inside tumors. Afterwards the frames were subdivided into 4 slides at X40 magnification for grading of nuclear atypia. These X40 slides were labeled by 2 pathologists looking for individual instances of mitotic cells. When a cell was found, the center coordinate of that cell was used as the label along with a confidence score from 0.0 (non-mitosis) to 1.0 (true mitosis). When there was disagreement between the two pathologists on the state of a cell, a third pathologist examined the cell and the majority decision was adopted. In the end, the images used in the dataset were 1539×1376 pixels from the Aperio ScanScope XT and 1663×1485 pixels from the Hamamatsu NanoZoomer scanner [4].

3.3 Data Pre-Processing

Some images in the original training set contained no instances of mitotic or non-mitotic cells in their labels, so they were removed. In order to train a Faster-RCNN model, the centroid labels given in the ground-truth annotations were converted into 70×70-pixel bounding boxes. To speed up training, the only data augmentation used in our pipeline was horizontal flipping. The data was then divided into a training set containing 1385 images and a validation set containing 347 images. During our initial training run there was an issue where the model was not producing bounding box proposals on the images. Our hypothesis was that the instances of cells in the images were too small for consistent detection by the model. To test this theory we created 3 separate datasets made out of patches of the original images with varying patch sizes, 256×256, 512×512, and 1024×1024, in order to get a larger representation of the cells in the images.
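The centroid-to-box conversion can be sketched as follows; clipping the box to the image bounds and the handling of the even box size are assumptions for illustration:

```python
def centroid_to_box(cx, cy, box_size=70, img_w=1539, img_h=1376):
    """Convert a ground-truth centre coordinate into a box_size x box_size
    bounding box (x1, y1, x2, y2), clipped to the image bounds."""
    half = box_size // 2
    x1 = max(0, cx - half)
    y1 = max(0, cy - half)
    x2 = min(img_w, cx + half)
    y2 = min(img_h, cy + half)
    return x1, y1, x2, y2
```

For example, a centroid at (100, 100) becomes the box (65, 65, 135, 135), while centroids near the edge produce boxes truncated at the border.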

One major issue when it comes to quantitative analysis of histopathology slides is the variation in color as a result of the H&E staining. In our pipeline, color normalization via the Reinhard method was implemented to combat the issue of color variation between stains. Reinhard’s method matches the color distribution of one image to a target image with the use of a linear transformation in the color space defined in [7]; in our case the transformation is applied to the red, green, and blue color channels of the image. Reinhard’s method was chosen because the transformation is simple and computationally cheap to implement via the following equations:

    c′ = (c − μ_c) · (σ_c^t / σ_c) + μ_c^t,   c ∈ {r, g, b}

where μ_r, μ_g, and μ_b are the mean pixel activations of the RGB color channels respectively, σ_r, σ_g, and σ_b are the standard deviations of each channel of the image, and μ_c^t and σ_c^t are the corresponding statistics of the target image [5, 6]. Figure 2 shows an illustration of images that are transformed via the Reinhard method.
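A minimal per-channel sketch of this statistics-matching step is shown below. Note that Reinhard et al. originally apply the transformation in the decorrelated color space of [7]; operating directly on the RGB channels, as described above, is a simplification:

```python
import numpy as np

def reinhard_normalize(image, target):
    """Match the per-channel mean and standard deviation of `image`
    to those of `target` (both H x W x 3 uint8 arrays)."""
    image = image.astype(np.float64)
    target = target.astype(np.float64)
    out = np.empty_like(image)
    for c in range(3):
        mu_s, sd_s = image[..., c].mean(), image[..., c].std()
        mu_t, sd_t = target[..., c].mean(), target[..., c].std()
        # subtract source mean, rescale by the ratio of stds, add target mean
        out[..., c] = (image[..., c] - mu_s) / (sd_s + 1e-8) * sd_t + mu_t
    return np.clip(out, 0, 255).astype(np.uint8)
```

After normalization, each channel of the output has approximately the same mean and standard deviation as the corresponding channel of the target image.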

3.4 Training and Inference

The architecture used for training was a Faster-RCNN model using ResNet-50 as the feature extractor. The initial learning rate used for our experiments was 0.005, with a cosine-decay learning rate schedule implemented to combat overfitting. The optimizer used was stochastic gradient descent with a momentum of 0.9. All models were trained for 25 epochs, with early stopping with a patience of 5 epochs to avoid overfitting.
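The cosine-decay schedule can be written in closed form. This sketch assumes a decay from the initial rate to zero over the full run, which may differ from the exact schedule inside the AutoML package:

```python
import math

def cosine_lr(step, total_steps, base_lr=0.005):
    """Cosine-decay learning rate: starts at base_lr at step 0 and
    decays smoothly to 0 at total_steps."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))
```

At the midpoint of training the learning rate is exactly half of the initial value (0.0025 for a base rate of 0.005).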

4 Results

4.1 Mean Average Precision

The metric used for evaluation of our model was mean average precision (mAP). mAP is a commonly used metric in the field of object detection and has been used as the primary metric for evaluating models on major benchmark datasets like Microsoft COCO and Pascal VOC [16]. The calculation of mAP involves 3 major steps. The first step is to find the area under the precision-recall curve for every class in a dataset at a specific Intersection over Union (IoU) threshold. The next step is to average these areas over every class to get the Average Precision (AP) for one IoU threshold. The third and final step is to average the APs over a range of IoU thresholds (referred to as mAP@[0.5, 0.95] for MS COCO and mAP@0.5 for Pascal VOC [11]) to finally arrive at the mean average precision [8].
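The per-class area-under-the-curve step can be sketched as follows, using all-point interpolation over the precision-recall curve (the interpolation variant is an assumption; COCO and VOC differ in this detail):

```python
import numpy as np

def average_precision(recalls, precisions):
    """Area under the precision-recall curve (all-point interpolation).

    `recalls` must be sorted in increasing order, with `precisions`
    giving the precision at each recall level.
    """
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))
    # Make the precision envelope monotonically decreasing, right to left.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas wherever recall increases.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```

A perfect detector (precision 1.0 at every recall level up to 1.0) scores an AP of 1.0; averaging AP over classes, then over IoU thresholds, yields mAP.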

Figure 3: Chart detailing model performance with differing patch sizes as well as using Reinhard image normalization and the standard ImageNet normalization included in the AutoML package.

4.2 Validation Results

After each model was trained it was evaluated on a validation dataset originally containing 357 images that were sliced into patches of the same size as the training patches. The mAP metric was computed using the standard pycocotools library to evaluate bounding box proposals. All reported mAPs use an IoU threshold of 0.5, similar to the Pascal VOC challenge. This choice was made because the tool is designed to aid pathologists with localizing instances of mitotic cells; the predictions themselves do not need to be exact as long as the bounding box locates the approximate area of the cell for the pathologist to examine. The first model was trained with a patch size of 256×256 and no color normalization and achieved an mAP of 0.629. We then normalized all the images based on the pixel distribution of the first image in our training set and re-trained the model using these normalized images. This second model achieved an mAP of 0.528. Our next model was trained with patches that were 512×512 in size and the same process of training with unnormalized and normalized images was repeated. The mAPs were 0.329 and 0.411 for the unnormalized and normalized image models respectively. The final set of tests used patches of size 1024×1024 and achieved mAPs of 0.271 and 0.219 for the normalized and unnormalized images respectively. Figure 3 summarizes the results of our experiments with the different sets of hyperparameters used.
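The IoU used to match predictions to ground truth is simply the intersection area over the union area of two axis-aligned boxes, e.g.:

```python
def iou(a, b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```

Under the 0.5 threshold used here, a prediction counts as a true positive as long as it overlaps its ground-truth box by at least half of their union, which is sufficient for pointing a pathologist at the right cell.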

5 Discussion

The AutoML tool performed well for the task of mitotic cell detection in histopathology images. It is apparent that performance decreased as the size of the individual patches increased. One reason for this might be the increasingly small representations of cells inside the images. As noted earlier, the AutoML tool would not produce any bounding box predictions for the original full-size images due to the exceedingly small representations of the cells. This helps explain why the model performed worse as the patch sizes increased: the representation of cells inside the patches was becoming smaller and smaller, and thus harder to detect correctly.

5.1 Limitations and Future Work

Despite the satisfactory performance of the AMDet tool, there are some limitations in this work and directions for future research. For one, only 3 dimensions were chosen for the patches during training. A future study could conduct a more thorough investigation to see how the AutoML tool fares with several larger or smaller image dimensions. Another weakness of this study is the normalization method used. The Reinhard method has been shown to help alleviate the issue of color variation in stains [6]; however, the method is over 2 decades old and there have been more recent advances in color normalization methods. Future work could examine the effect of using different color normalization methods in the AutoML package for mitotic cell detection. Furthermore, the AutoML tool contains standard ImageNet color normalization in its pipeline. This pre-built normalization was not removed during our experiments and seemed to have a negative effect on model performance, as can be seen in Figure 3. Further studies could look at how the standard image normalization featured in the AutoML tool compares to different proposed color normalization methods.

6 Conclusion

In this work AMDet, an automated tool for mitotic cell detection, was created with the use of the AutoML pipeline. To the best of our knowledge this is the first time the tool has been formally tested for detecting small objects within histopathology images. All the code is available at