Deep-Learning for Classification of Colorectal Polyps on Whole-Slide Images

03/05/2017 ∙ by Bruno Korbar, et al. ∙ Dartmouth College

Histopathological characterization of colorectal polyps is an important principle for determining the risk of colorectal cancer and future rates of surveillance for patients. This characterization is time-intensive, requires years of specialized training, and suffers from significant inter-observer and intra-observer variability. In this work, we built an automatic image-understanding method that can accurately classify different types of colorectal polyps in whole-slide histology images to help pathologists with histopathological characterization and diagnosis of colorectal polyps. The proposed image-understanding method is based on deep-learning techniques, which rely on numerous levels of abstraction for data representation and have shown state-of-the-art results for various image analysis tasks. Our image-understanding method covers all five polyp types (hyperplastic polyp, sessile serrated polyp, traditional serrated adenoma, tubular adenoma, and tubulovillous/villous adenoma) that are included in the US multi-society task force guidelines for colorectal cancer risk assessment and surveillance, and encompasses the most common occurrences of colorectal polyps. Our evaluation on 239 independent test samples shows that our proposed method can identify the types of colorectal polyps in whole-slide images with high efficacy (accuracy: 93.0%, precision: 89.7%, recall: 88.3%, and F1 score: 88.8%). The method presented in this paper can reduce the cognitive burden on pathologists and improve their accuracy and efficiency in histopathological characterization of colorectal polyps, and in subsequent risk assessment and follow-up recommendations.




1 Introduction

At least half of Western adults will have a colorectal polyp in their lifetime and one-tenth of these polyps will progress to cancer (Wong et al., 2009). If colorectal polyps are detected early, they can be removed before they transform to cancer. While there are multiple screening methods to detect colorectal polyps, colonoscopy has become the most common screening test in the United States (Lieberman et al., 2012). In 2012, the US multi-society task force on colorectal cancer issued updated guidelines on colorectal cancer surveillance after colonoscopy screening, a key principle of which is risk assessment and follow-up recommendation based on histopathological characterization of the polyps detected in the baseline colonoscopy. Therefore, detection and histopathological characterization of colorectal polyps are an important part of colorectal cancer screening, through which high-risk colorectal polyps are distinguished from low-risk polyps. The risk of subsequent polyps and colorectal cancer and the timing of follow-up colonoscopies depend on this characterization (Lieberman et al., 2012); however, accurate characterization of certain polyp types can be challenging, and there is a large degree of variability in how pathologists characterize and diagnose these polyps. As an example, sessile serrated polyps can potentially develop more aggressively into colorectal cancer compared to other colorectal polyps, because of the serrated pathway in tumorigenesis (Leggett and Whitehall, 2010). The serrated pathway is associated with mutations in the BRAF or KRAS oncogenes, and CpG island methylation, which can lead to the silencing of mismatch repair genes (e.g., MLH1) and a more rapid progression to malignancy (Vu et al., 2011). Therefore, differentiating sessile serrated polyps from other types of polyps is critical for appropriate surveillance (Biscotti et al., 2005).
Histopathological characterization is the only reliable existing method for diagnosing sessile serrated polyps, because other screening methods designed to detect pre-malignant lesions (such as fecal blood, fecal DNA, or virtual colonoscopy) are not well suited for differentiating sessile serrated polyps from other polyps (Kahi, 2015). However, differentiation between sessile serrated polyps and innocuous hyperplastic polyps is a challenging task for pathologists (Vu et al., 2011; Aptoula et al., 2013; Irshad et al., 2014; Veta et al., 2015). This is because sessile serrated polyps, like hyperplastic polyps, often lack the dysplastic nuclear changes that characterize conventional adenomatous polyps, and their histopathological diagnosis is entirely based on morphological features, such as serration, dilatation, and branching. Accurate diagnosis of sessile serrated polyps and their differentiation from hyperplastic polyps is needed to ensure that patients receive appropriate/frequent follow-up surveillance, and to prevent the patients from being over-screened. However, in a recent colorectal cancer study, more than 7,000 patients underwent colonoscopy in 32 centers—ultimately, a sessile serrated polyp was not diagnosed in multiple centers despite the statistical unlikeliness of this outcome (Snover, 2011). This indicates there are still considerable gaps in the performance and education of pathologists regarding histologic features of colorectal polyps and their diagnostic accuracy (Abdeljawad et al., 2015).

In the past years, computational methods have been developed to assist pathologists in the analysis of microscopic images (Gurcan et al., 2009; Madabhushi and Lee, 2016; Naik et al., 2007). These image analysis methods primarily focus on basic structural segmentation (e.g., nuclear segmentation) (Nakhleh, 2006; Raab et al., 2005; Malkin, 1998) and feature extraction (e.g., orientation, shape, and texture) (Gil et al., 2002; Boucheron, 2008; Sertel et al., 2009; Doyle et al., 2007). In some methods, these extracted or hand-constructed features are used as an input to a standard machine-learning classification framework, such as a support vector machine (Rajpoot and Rajpoot, 2004; Kallenbach-Thieltges et al., 2013) or a random forest (Sims et al., 2003), for automated tissue classification and disease grading.

In the field of artificial intelligence, deep-learning computational models, which are composed of multiple processing layers, can learn numerous levels of abstraction for data representation (LeCun et al., 2015). These data abstractions have dramatically improved the state of the art in computer vision and visual object recognition applications, and, in some cases, even exceed human performance (He et al., 2015b). Currently, deep-learning models are successfully utilized in autonomous mobile robots and self-driving cars (Farabet et al., 2012; Hadsell et al., 2009). The construction of deep-learning models only recently became practical due to large amounts of training data becoming available through the World Wide Web, public data repositories, and new high-performance computational capabilities that are mostly due to the new generation of graphics processing units (GPUs) needed to optimize these models (LeCun et al., 2015).

Recent work has shown the deep-learning approach to be superior to previous image processing techniques for classification and segmentation tasks on histology whole-slide images (Xie et al., 2015; Sirinukunwattana et al., 2016; Janowczyk and Madabhushi, 2016). As examples, deep-learning models have been developed to detect metastatic breast cancer (Cruz-Roa et al., 2013), to find mitotically active cells (Ertosun and Rubin, 2015), to identify basal-cell carcinoma (Malon et al., 2013), and to grade brain gliomas (Wang et al., 2014) using H&E-stained images. In particular, Sirinukunwattana et al. (Sirinukunwattana et al., 2015) presented a deep-learning approach for nucleus detection and classification in H&E-stained images of colorectal cancer. This model was based on a standard 8-layer convolutional network (Le Cun et al., 1990) to identify the centers of nuclei and classify them into four categories: epithelial, inflammatory, fibroblastic, and miscellaneous. Janowczyk et al. released a survey of the applications of deep learning in pathology, exploring domains such as lymphocyte detection, mitosis detection, invasive ductal carcinoma detection, and lymphoma classification (Janowczyk and Madabhushi, 2016). All models in the survey used the convolutional neural network proposed by Krizhevsky et al. (Krizhevsky et al., 2012).

With the recent expansion in the use of whole-slide digital scanners, high-throughput tissue banks, and archiving of digitized histological studies, the field of digital pathology is ripe for development and application of computational models to assist pathologists in the histopathological analysis of microscopic images, disease diagnosis, and management of patients. Considering these recent advancements in computerized image understanding, and the critical need for computational tools to help pathologists with histopathological characterization and diagnosis of colorectal polyps for more efficient and accurate colorectal cancer screening, we propose a novel deep-learning-based approach for this task.

2 Materials and Methods

The whole-slide images required to develop and evaluate our method were collected from patients who underwent colorectal cancer screening at our academic quaternary care center. Our domain expert pathologist collaborators annotated different types of colorectal polyps in these images. We used these annotations as reference standards for training and testing our deep-learning methods for colorectal polyp classification on whole-slide images, as well as for establishing a deep-learning benchmark for this task. Defining a benchmark for deep-learning methods can provide a guideline for future clinical implementations, and can promote a thorough understanding of the architecture as a critical factor in a deep-learning model’s performance.

2.1 Dataset

The data required for training and evaluating the proposed approach in this project is collected from Dartmouth-Hitchcock Medical Center (DHMC) patients who underwent colorectal cancer screening since 01/2010. The Department of Pathology and Laboratory Medicine at DHMC has instituted routine whole-slide scanning for slide archiving, employing three high-throughput Leica Aperio whole-slide scanners. These slides are digitized at magnification. Our histology imaging dataset includes H&E-stained, whole-slide images for five types of colorectal polyps: hyperplastic polyp, sessile serrated polyp, traditional serrated adenoma, tubular adenoma, and tubulovillous/villous adenoma. These five classes cover the most common occurrences of colorectal polyps, and encompass all polyp types that are included in the US multi-society task force guidelines for colorectal cancer risk assessment and surveillance (Lieberman et al., 2012). In addition, our dataset will include normal samples, which do not contain colorectal polyps, for our model training and evaluation. Figure 1 shows sample H&E-stained images from all colorectal polyp types that were collected in this project.

Figure 1: H&E-stained image samples with different histopathological characterizations for colorectal polyps: (A) hyperplastic, (B) sessile serrated, (C) traditional serrated, (D) tubular, (E) tubulovillous/villous, and (F) normal.

For this project, 1,723 whole-slide images have been collected through this collaboration with the Department of Pathology and Laboratory Medicine at DHMC. The number of collected images from each colorectal polyp type is presented in Table 2. We used 85% of the collected images in this dataset as the training set, and evaluated the model's performance on the remaining 15% as the validation set. An additional 239 whole-slide images were collected after the training for final evaluation. The use of these data for this project is approved by the Dartmouth Institutional Review Board.

2.2 Image Annotation

High-resolution histology images for colorectal polyp samples are large: most of each slide encompasses normal tissue, and only a small part of a whole-slide image is actually related to the colorectal polyp. In this study, two collaborators, resident pathologists from the Department of Pathology and Laboratory Medicine at DHMC, independently reviewed the whole-slide images in our training and test sets to identify the type of colorectal polyps in the images, as reference standards. In addition, to train a classification model on colorectal polyp features in these slides, and as a preprocessing step, one of the pathologists outlined the regions in which the colorectal polyp was present and generated smaller crops focused on colorectal polyps. Extracting smaller crops for training deep-learning classifiers has shown superior performance in previous histopathology analysis applications (Bengio, 2009). A second, highly experienced pathologist also reviewed the whole-slide images and their associated extracted crops. Disagreements in classifying and cropping the images were resolved through further discussions between the annotator pathologists and through consultation with a third, senior gastrointestinal pathologist collaborator. To ensure the accuracy of these manual annotations and the resulting image crops, when an agreement could not be reached on a polyp type or cropping for an image, that image was discarded and replaced by a new image.

2.3 Training Architecture and Framework

Deep learning is strongly rooted in previously existing artificial neural networks (LeCun et al., 2015), although the construction of deep-learning models only recently became practical due to the availability of large amounts of training data and new high-performance GPU computational capabilities designed to optimize these models (LeCun et al., 2015). Krizhevsky et al. developed a deep-learning model (Krizhevsky et al., 2012) based on convolutional neural networks (ConvNets) (Le Cun et al., 1990) that significantly improved image classification results and reduced the error rate by about 10% compared to the best non-deep-learning methods in computer vision at the time. Since then, various deep-learning methods have been developed and have improved the models’ performance even further.

While it has been shown that an increase in depth yields superior results (Simonyan and Zisserman, 2014), the state-of-the-art deep-learning models were unable to take advantage of this increase beyond 50 layers (Szegedy et al., 2015; Simonyan and Zisserman, 2014). This was because of a fundamental problem with propagating gradients when optimizing networks with a large number of layers, which is commonly known as the vanishing gradient problem (Simonyan et al., 2013; He et al., 2015a). In previous architectures, beyond a moderate number of layers, performance degrades as the number of layers increases. In 2015, Microsoft introduced the “residual architecture” (ResNet), which addressed the vanishing gradient problem. Upon its introduction, ResNet outperformed previous architectures by significant margins in all main tracks of the ImageNet computer vision competition, including object detection, classification, and localization (He et al., 2015a), and allowed for up to 152 layers before experiencing performance degradation. To empirically support our choice of architecture, we conducted an ablation study on top-performing deep-learning architectures (Russakovsky et al., 2015), such as AlexNet (Krizhevsky et al., 2012), VGG (Simonyan and Zisserman, 2014), GoogleNet (Szegedy et al., 2015), and different variations of ResNet (He et al., 2015a). Results of this comparison can be found in Table 1 in the Results section.

For our approach, we have adopted a modified version of a residual architecture, as this approach yielded state-of-the-art performance in image recognition benchmarks, ImageNet (Russakovsky et al., 2015) and COCO (Lin et al., 2014), as well as in image segmentation benchmarks, COCO-segmentation (Lin et al., 2014). We implement ResNet as a standard neural network, consisting of and convolution filters, and introduce additional mappings, or shortcuts, that bypass several convolutional layers. Inputs from these additional mappings are then added to the output of the previous layer to form a residual input to the next layer, as shown in Figure 2. Introducing these shortcuts almost completely eliminates the vanishing gradient problem, which in turn allows for greater depth of the neural networks while keeping the computational complexity at a manageable level due to relatively small convolutional filters. In addition to the identity mappings, we experimented with “projection shortcuts” (done by convolution) when the dimensions of the shortcuts did not match the dimensions of the preceding layer, in order to achieve the best performance in our study (He et al., 2015a).

Figure 2: The mechanism of a sample residual block in the ResNet architecture (He et al., 2015a)
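The shortcut mechanism can be illustrated with a small, self-contained sketch. This is not the network used in this work: as a simplifying assumption, the convolutional layers are replaced by plain matrix-vector products, and the names `residual_block`, `w1`, `w2`, and `w_proj` are hypothetical. It only shows how an identity shortcut, or a projection shortcut when dimensions differ, is added to the output of the bypassed layers.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in v]


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def residual_block(x, w1, w2, w_proj=None):
    """Compute relu(F(x) + shortcut(x)) with F(x) = w2 * relu(w1 * x).

    Assumption: dense layers stand in for the convolutions of a real ResNet.
    With w_proj=None the shortcut is the identity; otherwise x is projected
    by w_proj so that its size matches the output of the residual branch.
    """
    fx = matvec(w2, relu(matvec(w1, x)))
    shortcut = x if w_proj is None else matvec(w_proj, x)
    if len(shortcut) != len(fx):
        raise ValueError("shortcut and residual branch must have equal size")
    return relu([a + b for a, b in zip(fx, shortcut)])


if __name__ == "__main__":
    x = [1.0, 2.0]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    # With all-zero weights the residual branch vanishes and the input
    # passes through the identity shortcut unchanged.
    print(residual_block(x, zeros, zeros))
```

Because the input can bypass the weighted layers, a block whose weights contribute nothing still passes its input forward, which is the property that makes very deep stacks of such blocks trainable.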

2.4 Training

To verify our architecture choice in this work, we further separated 15% of the formerly mentioned training data as the hold-out validation set to run an ablation study on various deep-learning architectures. After finding the optimal architecture on this validation set, training was repeated on the entire augmented training set. Finally, we evaluated the trained model on our test set.

Our deep-learning classification model is trained for detecting colorectal polyps in small patches of H&E-stained, whole-slide images. Each crop is processed as follows.

We first rescale the data to conform to the median of the dimensions along the x and y axes computed on a random subset of images. This random subset was confined to 15% of our training set for computational efficiency. If the image size along any dimension is below the median, we use zero-padding to make it conform to the aspect ratio. We normalize each image using the mean and standard deviation computed on the training data in order to neutralize color differences caused by inconsistent staining of the slides. For color jittering data augmentation, we compute PCA for all points of a subset of training data, sample the offset along principal components, and add it to all pixels of each image. Finally, we rotate each image by 90 degrees to enforce rotational invariance, and flip a randomly selected 50% of the images along the horizontal axis.
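A minimal sketch of three of these steps (normalization, rotation, and flipping) is given below; rescaling, zero-padding, and the PCA-based color jittering are omitted. It assumes images are stored as nested lists of RGB tuples, that rotations are applied in multiples of 90 degrees, and that flipping "along the horizontal axis" means reversing the order of the rows. All function names are hypothetical.

```python
import random


def channel_stats(images):
    """Per-channel mean and standard deviation over all pixels of all images."""
    n = 0
    sums = [0.0, 0.0, 0.0]
    squares = [0.0, 0.0, 0.0]
    for img in images:
        for row in img:
            for px in row:
                n += 1
                for c in range(3):
                    sums[c] += px[c]
                    squares[c] += px[c] ** 2
    mean = [s / n for s in sums]
    std = [max(q / n - m * m, 0.0) ** 0.5 for q, m in zip(squares, mean)]
    return mean, std


def normalize(img, mean, std):
    """Subtract the channel mean and divide by the channel standard deviation."""
    return [
        [tuple((v - m) / (s or 1.0) for v, m, s in zip(px, mean, std)) for px in row]
        for row in img
    ]


def rotate90(img):
    """Rotate an image (list of rows) by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def flip_rows(img):
    """Reflect an image across its horizontal axis (reverse the row order)."""
    return img[::-1]


def augment(img, rng):
    """Apply a random multiple of 90-degree rotations, then flip with p = 0.5."""
    for _ in range(rng.randrange(4)):
        img = rotate90(img)
    if rng.random() < 0.5:
        img = flip_rows(img)
    return img


if __name__ == "__main__":
    images = [[[(0.0, 0.0, 0.0), (2.0, 2.0, 2.0)]]]  # one 1x2 RGB image
    mean, std = channel_stats(images)
    print(normalize(images[0], mean, std))
    print(augment([[1, 2], [3, 4]], random.Random(0)))
```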

We trained the optimal model for 200 epochs on the augmented training set, with an initial learning rate of 0.1, which was multiplied by 0.1 every 50 epochs, and a momentum of 0.9. Overall training time for the different architectures was 36 hours on a single NVIDIA K40c GPU. Figure 3 shows the value of the loss function on the training and validation sets for training a ResNet model with 152 layers. As can be seen in this figure, the model converges early in the training process, near the 50th epoch.

Figure 3: Training loss per iteration for 152 layer ResNet model on training and validation sets.
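The learning-rate schedule described above can be written down directly. The sketch below assumes the rate is multiplied by 0.1 at epochs 50, 100, and 150, and pairs it with the classical momentum update as an illustration; the exact momentum variant used in training is not specified here, and the function names are hypothetical.

```python
def learning_rate(epoch, base_lr=0.1, gamma=0.1, step=50):
    """Step decay: multiply the base rate by gamma once every `step` epochs."""
    return base_lr * gamma ** (epoch // step)


def sgd_momentum_step(weight, velocity, grad, lr, momentum=0.9):
    """One classical momentum update for a single scalar parameter."""
    velocity = momentum * velocity - lr * grad
    return weight + velocity, velocity


if __name__ == "__main__":
    for epoch in (0, 49, 50, 100, 150, 199):
        print(epoch, learning_rate(epoch))
```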

2.5 Inferencing Classes for Whole-Slide Images

As mentioned in the Training section, our deep-learning classification model is trained for detecting colorectal polyps in small patches of H&E-stained, whole-slide images. To identify the colorectal polyps and their types in whole-slide images, we break the whole-slide images into smaller, overlapping patches and apply the model to these patches. Figure 4 shows the overview of our approach for whole-slide image classification. In this work, we use overlapping patches with one-third (i.e., 33%) overlap to cover the full image. To keep these patches consistent with the image crops used for training, the size of the patches is fixed at the median size of a random 15% subset of the image crops from our training set. Our system infers the type of colorectal polyp in a whole-slide image from the most common colorectal polyp class among the patches of that image. In addition, to reduce noise and increase the confidence of our results, we only assign a class to a whole-slide image if at least 5 patches are identified as that class, with an average confidence of at least 70%. If there is no support for any of the colorectal polyp types among the patches, the whole-slide image is classified as “normal”.

Figure 4: Overview of our approach for classification of colorectal polyps in whole-slide, H&E-stained images.
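The patch tiling and the slide-level decision rule can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in this work: the stride is taken as two-thirds of the patch size, a final patch is added so the image edge is covered, patches predicted as "normal" are ignored when counting support, and ties are broken by mean confidence. The names `patch_origins` and `classify_slide` are hypothetical.

```python
from collections import defaultdict


def patch_origins(length, patch, overlap=1 / 3):
    """Start offsets of patches along one axis with the given fractional overlap."""
    if length <= patch:
        return [0]
    stride = max(1, round(patch * (1 - overlap)))
    origins = list(range(0, length - patch + 1, stride))
    if origins[-1] != length - patch:
        origins.append(length - patch)  # extra patch so the edge is covered
    return origins


def classify_slide(patch_predictions, min_patches=5, min_mean_conf=0.70,
                   normal="normal"):
    """Slide-level label from per-patch (label, confidence) predictions.

    A polyp class is supported only if at least `min_patches` patches were
    assigned to it with a mean confidence of at least `min_mean_conf`.
    The supported class with the most patches wins; otherwise "normal".
    """
    by_class = defaultdict(list)
    for label, conf in patch_predictions:
        if label != normal:
            by_class[label].append(conf)
    supported = []
    for label, confs in by_class.items():
        mean_conf = sum(confs) / len(confs)
        if len(confs) >= min_patches and mean_conf >= min_mean_conf:
            supported.append((len(confs), mean_conf, label))
    if not supported:
        return normal
    return max(supported)[2]


if __name__ == "__main__":
    print(patch_origins(750, 300))
    preds = [("SSP", 0.9)] * 6 + [("HP", 0.95)] * 3 + [("normal", 0.99)] * 20
    print(classify_slide(preds))
```

In the usage example, hyperplastic polyp receives only three patches and therefore has no support, so the slide is labeled by the six sessile serrated polyp patches.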

2.6 Evaluation

At training time, we evaluated our models using a validation set of images cropped as described in Section 2.1. Based on these results, we could evaluate the per-crop accuracy in order to understand and address potential pitfalls and inter-class confusion. For the evaluation of the final model, we applied our proposed inference mechanism on whole-slide images in the test set. In this evaluation, we measure the standard machine-learning evaluation metrics of accuracy, sensitivity (recall), specificity, positive predictive value (precision), negative predictive value, and F1 score for our method (Powers, 2011). In addition, we calculate 95% confidence intervals for all of the performance metrics in this evaluation through the Clopper-Pearson method (Clopper and Pearson, 1934).
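For reference, a Clopper-Pearson interval for a proportion can be computed with the standard library alone. The sketch below finds the exact binomial bounds by bisection on the binomial cumulative distribution function; it is an illustrative implementation, not the code used to produce the intervals reported here, and the counts in the usage example are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); adequate for n up to about 1000."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target, iters=100):
    """Find p in (0, 1) with f(p) = target, for f decreasing in p (bisection)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(successes, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion."""
    if successes == 0:
        lower = 0.0
    else:
        # Lower bound: P(X >= successes | p) = alpha / 2
        lower = _solve_decreasing(
            lambda p: binom_cdf(successes - 1, n, p), 1 - alpha / 2)
    if successes == n:
        upper = 1.0
    else:
        # Upper bound: P(X <= successes | p) = alpha / 2
        upper = _solve_decreasing(lambda p: binom_cdf(successes, n, p), alpha / 2)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    low, high = clopper_pearson(180, 200)
    print(f"180/200 -> 95% CI ({low:.3f}, {high:.3f})")
```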

3 Results


Table 1: Results of ablation test on raw image crops over 50 epochs for selecting the best deep-neural network architecture. ResNet-C is a 152-layer ResNet with identity mappings; ResNet-D is a 152-layer ResNet with projection mappings.
Architecture | Number of layers | Accuracy | 95% confidence interval | Evaluation time (seconds)
AlexNet (Krizhevsky et al., 2012) | 8 | 71.8% | (65.4% - 77.6%) | 2.5
VGG (Simonyan et al., 2013) | 19 | 76.4% | (70.2% - 81.8%) | 3.0
GoogleNet (Szegedy et al., 2015) | 22 | 88.7% | (83.8% - 92.5%) | 2.4
ResNet-A (He et al., 2015a) | 50 | 81.2% | (75.4% - 86.1%) | 2.2
ResNet-B (He et al., 2015a) | 101 | 82.7% | (77.1% - 87.4%) | 2.6
ResNet-C (He et al., 2015a) | 152 | 87.1% | (82.0% - 91.2%) | 3.1
ResNet-D (He et al., 2015a) | 152 | 89.0% | (84.1% - 92.8%) | 3.1


Table 2: Results of our best model (ResNet-D) for classification of colorectal polyps in cropped histology images based on validation data.
Colorectal polyp type | # Cases in the test set | Accuracy | 95% confidence interval
Hyperplastic polyp | 34 | 86.9% | (81.5% - 91.3%)
Sessile serrated polyp | 33 | 87.4% | (82.0% - 91.7%)
Traditional serrated adenoma | 38 | 91.5% | (86.7% - 94.9%)
Tubular adenoma | 35 | 94.5% | (90.4% - 97.2%)
Tubulovillous/villous adenoma | 29 | 91.5% | (86.7% - 94.9%)
Normal | 30 | 96.0% | (92.3% - 98.2%)
Total | 199 | 91.3% | (86.5% - 94.8%)


Table 3: Results of our final model for classification of colorectal polyps in 239 whole-slide images in our test set (HP: hyperplastic polyp, SSP: sessile serrated polyp, TSA: traditional serrated adenoma, TA: tubular adenoma, and TVA/V: tubulovillous/villous adenoma).
Metric | HP (N=37) | SSP (N=39) | TSA (N=38) | TA (N=39) | TVA/V (N=38) | Normal (N=48) | Total (N=239)
Accuracy | 89.8% (85.3%-93.3%) | 89.5% (85.0%-93.1%) | 94.7% (91.1%-97.2%) | 93.1% (89.2%-96.0%) | 95.8% (92.5%-97.9%) | 95.0% (91.5%-97.4%) | 93.0% (89.0%-95.9%)
Precision | 90.9% (86.6%-94.2%) | 86.11% (81.1%-90.2%) | 100.0% (98.5%-100.0%) | 83.3% (78.0%-87.8%) | 97.2% (94.3%-98.9%) | 80.7% (75.1%-85.5%) | 89.7% (85.2%-93.2%)
Recall | 81.1% (75.5%-85.8%) | 81.6% (76.1%-86.3%) | 89.5% (84.9%-93.0%) | 89.7% (85.2%-93.3%) | 92.1% (88.0%-95.2%) | 95.8% (92.5%-98.0%) | 88.3% (83.6%-92.1%)
F1 Score | 85.7% (80.6%-89.9%) | 83.8% (78.5%-88.2%) | 94.4% (90.8%-97.0%) | 86.4% (81.4%-90.5%) | 94.6% (90.9%-97.1%) | 87.6% (82.8%-91.5%) | 88.8% (84.1%-92.5%)


Table 4: Confusion matrix of our final model for classification of colorectal polyps in 239 whole-slide images on our test set (HP: hyperplastic polyp, SSP: sessile serrated polyp, TSA: traditional serrated adenoma, TA: tubular adenoma, and TVA/V: tubulovillous/villous adenoma). Rows are predictions; columns are reference labels.
Prediction \ Reference | HP | SSP | TSA | TA | TVA/V | Normal
HP | 30 | 3 | 0 | 0 | 0 | 0
SSP | 5 | 31 | 0 | 0 | 0 | 0
TSA | 0 | 0 | 34 | 0 | 0 | 0
TA | 0 | 0 | 2 | 35 | 3 | 2
TVA/V | 0 | 0 | 0 | 1 | 35 | 0
Normal | 2 | 4 | 2 | 3 | 0 | 46
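As a worked check, per-class precision, recall, and F1 score can be derived from the confusion matrix in Table 4. The sketch below copies the matrix entries as printed, with rows as predictions and columns as reference labels; the function names are hypothetical. Under these assumptions, the per-class values and their unweighted means agree with the precision, recall, and F1 rows of Table 3.

```python
CLASSES = ["HP", "SSP", "TSA", "TA", "TVA/V", "Normal"]

# Rows are predictions, columns are reference labels (entries from Table 4).
CONFUSION = [
    [30, 3, 0, 0, 0, 0],
    [5, 31, 0, 0, 0, 0],
    [0, 0, 34, 0, 0, 0],
    [0, 0, 2, 35, 3, 2],
    [0, 0, 0, 1, 35, 0],
    [2, 4, 2, 3, 0, 46],
]


def per_class_metrics(matrix):
    """Return a list of (precision, recall, f1) tuples, one per class."""
    results = []
    for k in range(len(matrix)):
        tp = matrix[k][k]
        predicted = sum(matrix[k])                 # row total
        reference = sum(row[k] for row in matrix)  # column total
        precision = tp / predicted if predicted else 0.0
        recall = tp / reference if reference else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


def macro_average(results):
    """Unweighted mean of each metric across classes."""
    n = len(results)
    return tuple(sum(r[i] for r in results) / n for i in range(3))


if __name__ == "__main__":
    metrics = per_class_metrics(CONFUSION)
    for name, (p, r, f) in zip(CLASSES, metrics):
        print(f"{name:7s} precision={p:.1%} recall={r:.1%} F1={f:.1%}")
    p, r, f = macro_average(metrics)
    print(f"macro   precision={p:.1%} recall={r:.1%} F1={f:.1%}")
```

Note that the matrix entries as printed sum to 238 rather than 239, so the SSP column total (38) differs by one from the N reported for SSP in Table 3.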

4 Discussion

In this work, we presented an automated system to facilitate the histopathological characterization of colorectal polyps on H&E-stained, whole-slide images with high sensitivity and specificity. Our evaluation shows that our system can accurately differentiate high-risk polyps from both low-risk colorectal polyps and normal cases by identifying the corresponding colorectal polyp types, such as hyperplastic, sessile serrated, traditional serrated, tubular, and tubulovillous/villous, on H&E-stained, whole-slide images. These polyp types are the focus of and major criteria in the US multi-society task force guidelines for colorectal cancer surveillance and cover most colorectal polyp occurrences (Lieberman et al., 2012). This project is inspired in part by the use of image analysis software in Papanicolaou (Pap) smear screening (Biscotti et al., 2005) for cervical cancer. In past years, the automation of Pap smear screening has dramatically improved the diagnostic accuracy and screening productivity, and helped to reduce the incidence of cervical cancer and mortality among American women (Biscotti et al., 2005). Our proposed system can potentially achieve a similar impact on colorectal cancer screening, as colorectal cancer is the second leading cause of cancer death among both men and women in the United States (Society, 2016), and colorectal polyps are the most common findings during colorectal cancer screening (Lieberman et al., 2012).

Our proposed automatic image understanding system can potentially reduce the time needed for screening analysis, diagnosis, and prognosis; reduce the manual burden on clinicians and pathologists; and significantly reduce the potential errors arising from the histopathological characterization of colorectal polyps for the subsequent risk assessment and follow-up recommendations. By combining the outcomes of our proposed system with pathologists’ interpretations, this technology will be able to significantly improve the accuracy of diagnoses and prognoses, and therefore foster precision medicine. Along those lines, this project will provide a platform for improved quality assurance of colorectal cancer screening and understanding of common error patterns to improve clinical training. In the clinical setting, the implementation of our approach will enhance the accuracy of colorectal cancer screening, reduce the cognitive burden on pathologists, positively impact patient health outcomes, and reduce colorectal cancer mortality by fostering early preventive measures. Improvement in the efficiency of colorectal cancer screening will result in a reduction in screening costs, an increase in the coverage of screening programs, and an overall improvement in public health.

This project leverages ResNet architecture (He et al., 2015a), a new deep-learning paradigm, to address the “vanishing gradient” problem in model training. This architecture enables the development of ultra-deep models with superior accuracy for characterization of histology images in comparison to existing approaches. Our ablation test results confirm (Table 1) the superiority of ResNet deep-neural network architecture with 152 layers for our classification task in comparison to other common architectures such as AlexNet (Krizhevsky et al., 2012), VGG (Simonyan et al., 2013), and GoogleNet (Szegedy et al., 2015). Although this best performing ResNet model has significantly more layers than other architectures in this comparison, its evaluation time (3.1 seconds) is close to the other models in a practical range. This small evaluation time difference is due to relatively simple computational layers in ResNet architecture. In addition, as can be seen in Table 2, data augmentation has a positive impact on the accuracy of our classification results.

We evaluated our ResNet-based, whole-slide inference model for colorectal polyp classification on 239 independent whole-slide, H&E-stained images. These results are presented in Tables 3 and 4. As these tables show, our whole-slide inferencing approach demonstrates strong performance across the different classes, with an overall accuracy of 93.0%, an overall precision of 89.7%, an overall recall of 88.3%, and an overall F1 score of 88.8%. As can be seen in the confusion matrix (Table 4), in this evaluation we observed a tendency to classify low-confidence examples as normal. This may be due to the diversity of whole-slide images that are considered to be normal in our training set. Furthermore, differentiation between hyperplastic polyps and sessile serrated polyps is another major source of mistakes for our model, which is aligned with gastrointestinal pathologists’ experience in this task (Vu et al., 2011; Irshad et al., 2014; Veta et al., 2015).

Although our proposed histopathology characterization system is based on strong deep-learning methodology, and achieved a strong performance in our evaluation on the test set collected at our organization, we still plan to take additional steps to improve our evaluation and results. One possible improvement could be a further increase in our architecture’s number of layers, which requires collecting a larger training set. To this end, through a collaboration with the New Hampshire Colonoscopy Registry (NHCR), we are planning to apply and evaluate the proposed method on an additional dataset from patients across New Hampshire for the external validation of our approach.

One shortcoming of our system for histopathological characterization, and deep-learning models in general, is the “black box” approach to the outcomes. These image analysis methods are mostly focused on the efficacy of the final results and rarely provide sufficient evidence and details on factors that contribute to their outcomes. As future work, we aim to leverage visualization methods for deep learning models to tackle this problem. These visualization methods will provide insight about influential regions and features of a whole-slide image that contribute to the histopathological characterization results. This visualization will help pathologists verify the characterization results of our method and understand the underlying reasoning for a specific classification.

Our proposed method to characterize colorectal polyps in whole-slide images can be extended to other histopathology analyses and prognosis assessment problems outside of colorectal cancer screening. The proposed method for whole-slide, H&E-stained histopathology analysis builds an illustrative “showcase” for colorectal cancer screening. As future work, we plan to build training sets for other challenging histopathology characterization problems and extend the developed deep-learning image analysis framework to histopathology image analysis and assessment in other types of cancer, such as melanoma, glioma/glioblastoma, and breast carcinoma.

5 Conclusion

In this paper, we presented an image understanding system to assist pathologists in characterization of colorectal polyps on H&E-stained, whole-slide images. This system was based on state-of-the-art, deep-neural network architecture to identify the types of colorectal polyps in whole-slide, H&E-stained images. We evaluated our developed system on 239 H&E-stained, whole-slide images for detection of five colorectal polyp classes outlined by the US multi-society task force guidelines for colorectal cancer risk assessment and surveillance. Our results (Accuracy: 93.0%, Precision: 89.7%, Recall: 88.3%, F1 Score: 88.8%) show the efficacy of our approach for this task. The technology developed and tested in this work has a great potential to be highly impactful by serving as a low-burden, efficient, and accurate diagnosis and assessment tool for colorectal polyps. Therefore, the outcomes of this project can potentially increase the coverage and accuracy of colorectal cancer screening programs, and overall reduce colorectal cancer mortality.

6 Acknowledgments

We would like to thank Haris Baig and Du Tran from the Visual Learning Group at Dartmouth College for helpful discussions.

References



  • Abdeljawad et al. (2015) Abdeljawad, K., Vemulapalli, K.C., Kahi, C.J., Cummings, O.W., Snover, D.C., Rex, D.K., 2015. Sessile serrated polyp prevalence determined by a colonoscopist with a high lesion detection rate and an experienced pathologist. Gastrointestinal endoscopy 81, 517–524.
  • Aptoula et al. (2013) Aptoula, E., Courty, N., Lefèvre, S., 2013. Mitosis detection in breast cancer histological images with mathematical morphology, in: Signal Processing and Communications Applications Conference (SIU), 2013 21st, IEEE. pp. 1–4.
  • Bengio (2009) Bengio, Y., 2009. Learning deep architectures for ai. Foundations and trends® in Machine Learning 2, 1–127.
  • Biscotti et al. (2005) Biscotti, C.V., Dawson, A.E., Dziura, B., Galup, L., Darragh, T., Rahemtulla, A., Wills-Frank, L., 2005. Assisted primary screening using the automated thinprep imaging system. American journal of clinical pathology 123, 281–287.
  • Boucheron (2008) Boucheron, L.E., 2008. Object-and spatial-level quantitative analysis of multispectral histopathology images for detection and characterization of cancer. University of California at Santa Barbara.
  • Clopper and Pearson (1934) Clopper, C.J., Pearson, E.S., 1934. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26, 404–413.
  • Cruz-Roa et al. (2013) Cruz-Roa, A.A., Ovalle, J.E.A., Madabhushi, A., Osorio, F.A.G., 2013. A deep learning architecture for image representation, visual interpretability and automated basal-cell carcinoma cancer detection, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer. pp. 403–410.
  • Doyle et al. (2007) Doyle, S., Hwang, M., Shah, K., Madabhushi, A., Feldman, M., Tomaszeweski, J., 2007. Automated grading of prostate cancer using architectural and textural image features, in: 2007 4th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, IEEE. pp. 1284–1287.
  • Ertosun and Rubin (2015) Ertosun, M.G., Rubin, D.L., 2015. Automated grading of gliomas using deep learning in digital pathology images: A modular approach with ensemble of convolutional neural networks, in: AMIA Annual Symposium Proceedings, American Medical Informatics Association. p. 1899.
  • Farabet et al. (2012) Farabet, C., Couprie, C., Najman, L., LeCun, Y., 2012. Scene parsing with multiscale feature learning, purity trees, and optimal covers. arXiv preprint arXiv:1202.2160.
  • Gil et al. (2002) Gil, J., Wu, H., Wang, B.Y., 2002. Image analysis and morphometry in the diagnosis of breast cancer. Microscopy research and technique 59, 109–118.
  • Gurcan et al. (2009) Gurcan, M.N., Boucheron, L.E., Can, A., Madabhushi, A., Rajpoot, N.M., Yener, B., 2009. Histopathological image analysis: A review. IEEE reviews in biomedical engineering 2, 147–171.
  • Hadsell et al. (2009) Hadsell, R., Sermanet, P., Ben, J., Erkan, A., Scoffier, M., Kavukcuoglu, K., Muller, U., LeCun, Y., 2009. Learning long-range vision for autonomous off-road driving. Journal of Field Robotics 26, 120–144.
  • He et al. (2015a) He, K., Zhang, X., Ren, S., Sun, J., 2015a. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385.
  • He et al. (2015b) He, K., Zhang, X., Ren, S., Sun, J., 2015b. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification, in: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034.
  • Irshad et al. (2014) Irshad, H., Veillard, A., Roux, L., Racoceanu, D., 2014. Methods for nuclei detection, segmentation, and classification in digital histopathology: a review—current status and future potential. IEEE reviews in biomedical engineering 7, 97–114.
  • Janowczyk and Madabhushi (2016) Janowczyk, A., Madabhushi, A., 2016. Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases. Journal of Pathology Informatics 7, 29. doi:10.4103/2153-3539.186902.
  • Kahi (2015) Kahi, C.J., 2015. How does the serrated polyp pathway alter crc screening and surveillance? Digestive diseases and sciences 60, 773–780.
  • Kallenbach-Thieltges et al. (2013) Kallenbach-Thieltges, A., Großerüschkamp, F., Mosig, A., Diem, M., Tannapfel, A., Gerwert, K., 2013. Immunohistochemistry, histopathology and infrared spectral histopathology of colon cancer tissue sections. J Biophotonics 6, 88–100.
  • Krizhevsky et al. (2012) Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. Imagenet classification with deep convolutional neural networks, in: Advances in neural information processing systems, pp. 1097–1105.
  • Le Cun et al. (1990) Le Cun, B.B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D., 1990. Handwritten digit recognition with a back-propagation network, in: Advances in neural information processing systems, Citeseer.
  • LeCun et al. (2015) LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521, 436–444.
  • Leggett and Whitehall (2010) Leggett, B., Whitehall, V., 2010. Role of the serrated pathway in colorectal cancer pathogenesis. Gastroenterology 138, 2088–2100.
  • Lieberman et al. (2012) Lieberman, D.A., Rex, D.K., Winawer, S.J., Giardiello, F.M., Johnson, D.A., Levin, T.R., 2012. Guidelines for colonoscopy surveillance after screening and polypectomy: a consensus update by the us multi-society task force on colorectal cancer. Gastroenterology 143, 844–857.
  • Lin et al. (2014) Lin, T., Maire, M., Belongie, S.J., Bourdev, L.D., Girshick, R.B., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L., 2014. Microsoft COCO: common objects in context. CoRR abs/1405.0312.
  • Madabhushi and Lee (2016) Madabhushi, A., Lee, G., 2016. Image analysis and machine learning in digital pathology: Challenges and opportunities. Medical Image Analysis 33, 170–175.
  • Malkin (1998) Malkin, H.M., 1998. Comparison of the use of the microscope in pathology in germany and the united states during the nineteenth century. Annals of diagnostic pathology 2, 79–88.
  • Malon et al. (2013) Malon, C.D., Cosatto, E., et al., 2013. Classification of mitotic figures with convolutional neural networks and seeded blob features. Journal of pathology informatics 4, 9.
  • Naik et al. (2007) Naik, S., Doyle, S., Feldman, M., Tomaszewski, J., Madabhushi, A., 2007. Gland segmentation and computerized gleason grading of prostate histology by integrating low-, high-level and domain specific information, in: MIAAB workshop, Citeseer. pp. 1–8.
  • Nakhleh (2006) Nakhleh, R.E., 2006. Error reduction in surgical pathology. Archives of pathology & laboratory medicine 130, 630–632.
  • Powers (2011) Powers, D.M.W., 2011. Evaluation: from precision, recall and f-measure to roc, informedness, markedness and correlation. International Journal of Machine Learning Technology 2, 37–63.
  • Raab et al. (2005) Raab, S.S., Grzybicki, D.M., Janosky, J.E., Zarbo, R.J., Meier, F.A., Jensen, C., Geyer, S.J., 2005. Clinical impact and frequency of anatomic pathology errors in cancer diagnoses. Cancer 104, 2205–2213.
  • Rajpoot and Rajpoot (2004) Rajpoot, K., Rajpoot, N., 2004. Svm optimization for hyperspectral colon tissue cell classification, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer. pp. 829–837.
  • Russakovsky et al. (2015) Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al., 2015. Imagenet large scale visual recognition challenge. International Journal of Computer Vision 115, 211–252.
  • Sertel et al. (2009) Sertel, O., Kong, J., Catalyurek, U.V., Lozanski, G., Saltz, J.H., Gurcan, M.N., 2009. Histopathological image analysis using model-based intermediate representations and color texture: Follicular lymphoma grading. Journal of Signal Processing Systems 55, 169–183.
  • Simonyan et al. (2013) Simonyan, K., Vedaldi, A., Zisserman, A., 2013. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034.
  • Simonyan and Zisserman (2014) Simonyan, K., Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556.
  • Sims et al. (2003) Sims, A., Bennett, M., Murray, A., 2003. Image analysis can be used to detect spatial changes in the histopathology of pancreatic tumours. Physics in medicine and biology 48, N183.
  • Sirinukunwattana et al. (2016) Sirinukunwattana, K., Raza, S.E.A., Tsang, Y.W., Snead, D.R., Cree, I.A., Rajpoot, N.M., 2016. Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images. IEEE transactions on medical imaging 35, 1196–1206.
  • Sirinukunwattana et al. (2015) Sirinukunwattana, K., Snead, D.R., Rajpoot, N.M., 2015. A stochastic polygons model for glandular structures in colon histology images. IEEE transactions on medical imaging 34, 2366–2378.
  • Snover (2011) Snover, D.C., 2011. Update on the serrated pathway to colorectal carcinoma. Human pathology 42, 1–10.
  • Society (2016) American Cancer Society, 2016. Cancer facts and figures 2016. Atlanta.
  • Szegedy et al. (2015) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A., 2015. Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9.
  • Veta et al. (2015) Veta, M., Van Diest, P.J., Willems, S.M., Wang, H., Madabhushi, A., Cruz-Roa, A., Gonzalez, F., Larsen, A.B., Vestergaard, J.S., Dahl, A.B., et al., 2015. Assessment of algorithms for mitosis detection in breast cancer histopathology images. Medical image analysis 20, 237–248.
  • Vu et al. (2011) Vu, H.T., Lopez, R., Bennett, A., Burke, C.A., 2011. Individuals with sessile serrated polyps express an aggressive colorectal phenotype. Diseases of the Colon & Rectum 54, 1216–1223.
  • Wang et al. (2014) Wang, H., Cruz-Roa, A., Basavanhally, A., Gilmore, H., Shih, N., Feldman, M., Tomaszewski, J., Gonzalez, F., Madabhushi, A., 2014. Cascaded ensemble of convolutional neural networks and handcrafted features for mitosis detection, in: SPIE Medical Imaging, International Society for Optics and Photonics. pp. 90410B–90410B.
  • Wong et al. (2009) Wong, N.A., Hunt, L.P., Novelli, M.R., Shepherd, N.A., Warren, B.F., 2009. Observer agreement in the diagnosis of serrated polyps of the large bowel. Histopathology 55, 63–66.
  • Xie et al. (2015) Xie, Y., Kong, X., Xing, F., Liu, F., Su, H., Yang, L., 2015. Deep voting: A robust approach toward nucleus localization in microscopy images, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer. pp. 374–382.