Pathology plays an essential role in diagnosing gastrointestinal disorders. However, errors can occur due to complex systems, time constraints and variable inputs, and diagnosis is further complicated when biopsy images share histological features. Developing assistive computational methods can help mitigate such errors; the goal of applying computational methods to disease identification is to develop fast, reproducible and reasonably accurate methods that can be easily standardized.
Convolutional Neural Networks (CNNs), a type of deep learning architecture, are particularly suited for distinguishing features in biopsy images. Past work in this area includes using CNNs to detect cancer metastases in high resolution biopsy images.
Applying CNNs to high resolution biopsy images from gastrointestinal patients may distinguish features in diseased tissues, specifically Celiac Disease and Environmental Enteropathy. These diseases have significantly overlapping features, which makes differentiating between the two particularly difficult. CNNs learn from different areas of an image, look for similar patterns in new images and classify them based on feature similarity. Our hypothesis is that a CNN will find differences in histologically similar tissues that are sometimes indistinguishable under a microscope. We present in this paper a viable deep learning framework to classify duodenal biopsy images into Celiac Disease, Environmental Enteropathy or Normal tissue.
Images were extracted from 465 high resolution whole slide images (WSIs) taken from 150 H&E duodenal biopsy slides (see Table I). The biopsies were from children who underwent endoscopy procedures at either Aga Khan University Hospital in Pakistan (10 children <2 years with growth faltering, EE diagnosed on endoscopy, n = 34 WSI), University Teaching Hospital in Zambia (16 children with severe acute malnutrition, EE diagnosed on endoscopy, n = 19 WSI), or the University of Virginia Children’s Hospital (63 children <18 years old with CD, n = 236 WSI; and 61 healthy children <5 years old, n = 173 WSI).
III-A Patch Creation
We had access to a small number of patient biopsies, which were, however, of very high resolution and large file size. The high resolution slides were therefore split into patches of 1000×1000 pixels and 2000×2000 pixels, with overlaps of 750 pixels and 1000 pixels respectively along both the horizontal and vertical axes, to ensure that the patches exhaustively represented patterns from the slide. We discarded patches containing less than 50% tissue area. The patches were then resized to 256×256 pixels for input to the CNN model. Each slide generated on average about 250 1000×1000 patches and about 40 2000×2000 patches. Since Celiac Disease had the most slides and generated more patches than the other classes, the Environmental Enteropathy and Normal patches were up-sampled by an appropriate factor to balance the data and avoid bias in the model.
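The patching procedure above can be sketched as follows. This is a minimal illustration, not our exact pipeline: the tissue-area test here is a simple background-brightness heuristic (H&E tissue is darker than the white slide background), and the function name `extract_patches` is our own for this sketch.

```python
import numpy as np

def extract_patches(slide, patch=1000, overlap=750, min_tissue=0.5):
    """Slide a window of size `patch` with the given overlap over the WSI
    and keep patches whose estimated tissue fraction exceeds `min_tissue`."""
    stride = patch - overlap
    h, w = slide.shape[:2]
    patches = []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            p = slide[y:y + patch, x:x + patch]
            # Crude tissue mask: pixels darker than the white background.
            tissue_fraction = (p.mean(axis=-1) < 220).mean()
            if tissue_fraction >= min_tissue:
                patches.append(p)
    return patches
```

The same function covers both scales (patch=1000, overlap=750 and patch=2000, overlap=1000); a resize to 256×256 would follow before feeding the CNN.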
III-B Stain Normalization
There were visible variations in the color of the images due to differences in scanners, staining chemicals used while preparing the slides, and staining methods used across pathology labs. When these images were analyzed using deep learning techniques, this led to erroneous results based not on the features of the image but on color differences. To address this issue we applied the Structure Preserving Color Normalization described by Vahadane et al., using an empirically chosen target image from the EE dataset to normalize all our images. The results are visualized in Fig. 1.
III-C Image Pre-processing and Augmentation
We performed extensive data augmentation to prevent overfitting the model. As histopathology images exhibit both rotational and axial symmetry, 4 copies of each image patch were created using a random combination of rotation (by 90, 180, 270 or 360 degrees), mirroring and zoom (between 1 and 1.1).
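One way to realize these augmentations is sketched below; the helper names and the central-crop implementation of zoom are this sketch's own choices, not necessarily those of our pipeline.

```python
import random
import numpy as np

def augment(patch, rng=random.Random(0)):
    """One randomly augmented copy: rotation by a multiple of 90 degrees,
    optional mirroring, and a slight zoom between 1.0 and 1.1."""
    out = np.rot90(patch, k=rng.randrange(4))
    if rng.random() < 0.5:
        out = np.fliplr(out)
    zoom = 1 + rng.random() * 0.1           # zoom factor in [1.0, 1.1)
    h, w = out.shape[:2]
    ch, cw = int(h / zoom), int(w / zoom)   # central crop simulates zooming in
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    return out[y0:y0 + ch, x0:x0 + cw]

def four_copies(patch):
    """The 4 augmented copies created per patch."""
    return [augment(patch) for _ in range(4)]
```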
III-D Classification Model
We utilized a ResNet50 architecture to classify our patches, as it has been shown to work well for computer assisted diagnosis of breast cancer [14].
Since different layers capture different information, we used discriminative fine-tuning as described by Howard and Ruder. The layers closer to the input are more likely to have learned general features, while the later layers identify more abstract features depending on the dataset the model has been trained on. Therefore, the learning rate used while training the initial layers was 1/9th of the rate of the final layers, while the middle layers used a learning rate 1/3rd the rate of the final layers.
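The 1/9, 1/3, 1 split can be expressed as per-group learning rates (in PyTorch this would become optimizer parameter groups); `discriminative_lrs` and `base_lr` are names for this sketch only.

```python
def discriminative_lrs(layer_groups, base_lr=1e-3):
    """Return (layers, lr) pairs: early layers train at base_lr/9,
    middle layers at base_lr/3, and final layers at base_lr."""
    early, middle, final = layer_groups
    return [
        (early, base_lr / 9),
        (middle, base_lr / 3),
        (final, base_lr),
    ]
```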
Furthermore, we used cyclic cosine annealing with restarts to prevent the model from getting stuck in a local minimum while training. By lowering the learning rate periodically, we ensured the model did not overshoot the global minimum. We intermittently reset the learning rate to a larger value so that the model could move out of a local minimum, if stuck, and reach the global optimum. Restarting also eliminated the need to experimentally estimate the learning rate.
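The schedule follows the SGDR formulation of Loshchilov and Hutter: within each cycle the rate decays along a cosine from a maximum to a minimum, then restarts. The bounds `lr_max` and `lr_min` below are illustrative values, not the ones used in training.

```python
import math

def sgdr_lr(epoch, cycle_len, lr_max=1e-2, lr_min=1e-5):
    """Cosine-annealed learning rate that restarts at lr_max every
    `cycle_len` epochs (SGDR-style warm restarts)."""
    t = epoch % cycle_len  # position within the current cycle
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / cycle_len))
```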
Test Time Augmentation (TTA) was performed while making final predictions to ensure the predictions are insensitive to image orientation. TTA randomly performs augmentations (zoom, tilt, brightness) on the images during prediction, thus allowing the model to identify common patterns at a micro level with little regard for image orientation. The model was trained over 10 epochs with a batch size of 32.
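TTA amounts to averaging the model's class probabilities over several augmented views of the same patch. The sketch below uses rotation and mirroring as stand-in augmentations (our TTA also used zoom, tilt and brightness) and treats `model` as any callable returning class probabilities.

```python
import numpy as np

def tta_predict(model, patch, n_aug=4, rng=np.random.default_rng(0)):
    """Average class probabilities over randomly augmented copies of a patch."""
    probs = []
    for _ in range(n_aug):
        aug = np.rot90(patch, k=int(rng.integers(4)))  # stand-in augmentation
        if rng.random() < 0.5:
            aug = np.fliplr(aug)
        probs.append(model(aug))
    return np.mean(probs, axis=0)
```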
III-E Multi-zoom Approach
To create a more robust framework that looks at biopsy slides more holistically and mimics the decision-making process of a pathologist, we developed a deep learning architecture that made classifications based on information from multiple magnification levels of the biopsy slide. Every biopsy slide was first segmented into 2000×2000 patches as described above, and each of these patches was further segmented into 1000×1000 patches with an overlap of 750 pixels along both axes. After all these patches were preprocessed and color normalized as previously outlined, two independent ResNet50 models were trained on the sets of 2000×2000 and 1000×1000 patches using the strategies described in the previous section. Additionally, every 1000×1000 patch was paired with its parent 2000×2000 patch. Each pair was then passed through the corresponding trained ResNet50 model, and the last fully connected layer with 2048 features from each model was extracted and concatenated to give a total of 4096 features representing the two magnification levels of the same area of the biopsy slide. This concatenated vector gave an abstract representation of the image that was then passed through a set of trainable linear layers to make the final classification. Contrary to our expectations, we observed that this multi-scale approach provided little performance benefit weighed against the computational complexity it introduced.
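The fusion step reduces to concatenating the two 2048-dimensional embeddings and applying the trainable linear head. The sketch below shows a single linear layer for brevity (ours used a set of linear layers); `fuse_features`, `weights` and `bias` are names for this illustration.

```python
import numpy as np

def fuse_features(feat_1000, feat_2000, weights, bias):
    """Concatenate the two 2048-d magnification embeddings and apply a
    linear classifier over the resulting 4096-d vector."""
    fused = np.concatenate([feat_1000, feat_2000])  # (4096,)
    logits = weights @ fused + bias                 # (3,) class scores
    return int(logits.argmax())
```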
IV Results and Analysis
We utilized patches from 367 labelled slides for training our model, which reported an overall accuracy of 88.89% on 1000×1000 patches and 86.82% on 2000×2000 patches. The predictions on the patches were then aggregated to identify the classification for their parent whole slide images. The model exhibited an overall accuracy of 92.86% on the 98 test slides. Table II shows the metrics used to assess the performance of the model on the test set. Fig. 3 shows that the model achieved exceptional certainty in identifying Environmental Enteropathy while giving an overall macro-average AUC of 0.99.
|             | Precision | Recall | F1-score | Support |
| avg / total | 0.93      | 0.92   | 0.92     | 98      |
WSI-level metrics.
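The patch-to-slide aggregation can be illustrated as below; the paper does not spell out the exact aggregation rule, so a simple majority vote over patch predictions is shown as one possibility.

```python
from collections import Counter

def slide_label(patch_preds):
    """Aggregate per-patch class predictions into one whole-slide label
    by majority vote (one simple aggregation strategy)."""
    return Counter(patch_preds).most_common(1)[0][0]
```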
IV-B Interpreting the Model
It is important for us to interpret the CNN activation areas in our model in order to benchmark our results against their manual counterparts. In the domain of medical imaging, it is of utmost importance to use methods that explain a model's classification results. Furthermore, visualizing activation areas allows domain experts to corroborate the model's results with incumbent classification methods. The Gradient-weighted Class Activation Mapping (Grad-CAM) method produces a localization map highlighting the specific areas of the image that the model deems most important when making its classification decision. We implemented the method for a residual network architecture by extracting the activation values from an intermediate convolution layer and using them to generate a heatmap identifying the important areas of the image. Viewing the images through Grad-CAM's lens allowed us, and domain experts, to confirm whether the model was classifying the images based on real pathological features that make medical sense or on image artifacts. Figure 3 displays a few example Grad-CAM outputs that were generated.
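A minimal NumPy sketch of the Grad-CAM computation, assuming the (C, H, W) activations and the gradients of the class score with respect to them have already been extracted from the network:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM localization map: channel weights are the spatially
    averaged gradients; the map is the ReLU of the weighted sum of
    activation channels. Both inputs are (C, H, W) arrays."""
    weights = gradients.mean(axis=(1, 2))             # (C,) per-channel weights
    cam = np.tensordot(weights, activations, axes=1)  # (H, W) weighted sum
    cam = np.maximum(cam, 0)                          # ReLU
    if cam.max() > 0:
        cam /= cam.max()                              # normalize for display
    return cam
```

The resulting map is upsampled to the patch size and overlaid as a heatmap for review by pathologists.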
IV-C Visualizing Layers
The CNN model performed classification by identifying patterns in the image associated with a certain class. Analyzing the patterns the model looks for is useful to make sure that the image features linked with a disease are being considered by the model. Therefore, to understand the internal workings of the designed architecture and to check whether the model was identifying all relevant image patterns, we used visualization techniques developed by Zeiler et al. to highlight the function of the intermediate feature layers. We found that different filters in the model successfully identified patterns such as cell boundaries, nuclei and background in the image, as shown in Fig. 4. This, coupled with the generated Grad-CAMs, helped us conclude that the model was classifying by identifying patterns similar to those pathologists search for in the slides.
This research was supported by the University of Virginia Engineering in Medicine SEED Grant (SS&DEB), the University of Virginia Translational Health Research Institute of Virginia (THRIV) Mentored Career Development Award (SS), and the Bill and Melinda Gates Foundation (AA: OPP1138727; SRM: OPP1144149; PK: OPP1066118).
-  K. Geboes and G. Y. Lauwers, ”Gastrointestinal Pathology,” Archives of Pathology & Laboratory Medicine, vol. 134, no. 6, pp. 812-814, June 2010.
-  R. E. Nakhleh, ”Error Reduction in Surgical Pathology,” Archives of Pathology & Laboratory Medicine, vol. 130, no. 5, pp. 630-632, May 2006.
-  S. S. Raab, D. M. Grzybicki, J. E. Janosky, R. J. Zarbo, F. A. Meier, C. Jensen and S. J. Geyer, ”Clinical impact and frequency of anatomic pathology errors in cancer diagnoses.,” Cancer, vol. 104, no. 10, p. 2205–2213, 2005.
-  M. N. Gurcan, L. E. Boucheron, A. Can, A. Madabhushi, N. M. Rajpoot and B. Yener, ”Histopathological image analysis: a review,” IEEE Reviews in Biomedical Engineering, pp. 147-171, 2009.
-  G. Sharma and A. Carter, ”Artificial Intelligence and the Pathologist, Future Frenemies?,” Archives of Pathology & Laboratory Medicine, vol. 141, pp. 622-623, May 2017.
-  S. Syed et al., ”195 - Convolutional Neural Networks Image Analysis of Duodenal Biopsies Robustly Distinguishes Environmental Enteropathy from Healthy Controls and Identifies Secretory Cell Lineages as High Activation Locations”, Gastroenterology, vol. 154, no. 6, p. S-52, 2018.
-  A. Cruz-Roa, A. Basavanhally, F. González, H. Gilmore, M. Feldman, S. Ganesan, N. Shih, J. Tomaszewski and A. Madabhushi, ”Automatic detection of invasive ductal carcinoma in whole slide images with Convolutional Neural Networks,” in SPIE Medical Imaging 2014: Digital Pathology, San Diego, California, United States, 2014.
-  G. Wimmer et al., ”Convolutional neural network architectures for the automated diagnosis of celiac disease,” in International Workshop on Computer-Assisted and Robotic Endoscopy, Springer, Cham, 2016.
-  Wei, J. W., Wei, J. W., Jackson, C. R., Ren, B., Suriawinata, A. A., Hassanpour, S., ”Automated detection of celiac disease on duodenal biopsy slides: a deep learning approach” arXiv:1901.11447 [cs.CV] Jan 2019.
-  C. L. Jansson-Knodell, I. A. Hujoel, A. Rubio-Tapia and J. A. Murray, ”Not All That Flattens Villi Is Celiac Disease: A Review of Enteropathies,” Mayo Clinic Proceedings, pp. 509-517, April 2018.
-  A. Krizhevsky, I. Sutskever and G. E. Hinton, ”ImageNet Classification with Deep Convolutional Neural Networks,” in Advances in Neural Information Processing Systems 25 (NIPS 2012), 2012.
-  A. Vahadane, T. Peng, A. Sethi, S. Albarqouni, L. Wang, M. Baust, K. Steiger, A. M. Schlitter, I. Esposito and N. Navab, ”Structure-Preserving Color Normalization and Sparse Stain Separation for Histological Images,” IEEE Transactions on Medical Imaging, vol. 35, no. 8, pp. 1962-1971, August 2016.
-  B. E. Bejnordi, M. Veta, P. Johannes van Diest, B. van Ginneken, N. Karssemeijer, G. Litjens and J. A.W. M. van der Laak, ”Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer,” Journal of the American Medical Association, vol. 318, no. 22, pp. 2199-2210, 2017.
-  S. J. Pan and Q. Yang, ”A Survey on Transfer Learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345-1359, 2010.
-  J. Howard and S. Ruder, ”Universal Language Model Fine-tuning for Text Classification,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Long Papers), Melbourne, 2018.
-  I. Loshchilov and F. Hutter, ”SGDR: Stochastic Gradient Descent with Warm Restarts,” arXiv:1608.03983 [cs.LG], May 2017.
-  R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh and D. Batra, ”Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization,” in IEEE International Conference on Computer Vision (ICCV), Venice, 2017.
-  M. D. Zeiler and R. Fergus, ”Visualizing and understanding convolutional networks,” in European Conference on Computer Vision, Springer, Cham, 2014.