Hierarchical Deep Convolutional Neural Networks for Multi-category Diagnosis of Gastrointestinal Disorders on Histopathological Images

05/08/2020 ∙ by Rasoul Sali, et al.

Deep convolutional neural networks (CNNs) have been successful in a wide range of computer vision tasks, including image classification. A specific area of application is digital pathology, where pattern recognition supports tissue-based diagnosis of gastrointestinal (GI) diseases. In this domain, CNNs can be used to translate histopathological images into precise diagnostics. This is challenging because these complex biopsies are heterogeneous and require multiple levels of assessment, mainly due to structural similarities between different parts of the GI tract and features shared among different gut diseases. Addressing this problem with a flat model, which assumes that all classes (parts of the gut and their diseases) are equally difficult to distinguish, leads to an inadequate assessment of each class. Since a hierarchical model restricts classification errors to each sub-class, it yields a more informative model than a flat model. In this paper, we propose to apply hierarchical classification to biopsy images from different parts of the GI tract and the respective diseases within each. We embedded a class hierarchy into the plain VGGNet to take advantage of the hierarchical structure of its layers. The proposed model was evaluated using an independent set of image patches from 373 whole slide images. The results indicate that the hierarchical model can achieve better results than the flat model for multi-category diagnosis of GI disorders using histopathological images.




I Introduction

Gastrointestinal (GI) diseases are ailments linked to the digestive system, including the esophagus, stomach and intestines. By affecting the GI tract and impairing digestion and overall health, GI diseases account for substantial morbidity, mortality and financial burden. The National Institutes of Health reports that between 60 and 70 million Americans are affected by GI diseases each year [13].

A common approach to GI disease diagnosis relies on digital pathology for pattern recognition. However, a major challenge in interpreting clinical biopsy images to diagnose disease is the often striking overlap in histopathological appearance between distinct but related conditions. There is a critical clinical need for new methods that allow clinicians to translate heterogeneous biomedical images into precise diagnostics [1].

Convolutional Neural Networks (CNNs) have shown superior performance for the automated extraction of quantitative morphologic phenotypes from GI biopsy images and the diagnosis of associated diseases [16, 8]. However, as the number of disease classes associated with different parts of the gut grows, the classes become harder to separate visually. This is mainly due to structural similarities between different parts of the gut and features shared among gut diseases. Furthermore, in multi-class classification problems, some classes are harder to distinguish than others and require dedicated classifiers [27]. Regular flat models cannot address this issue because they assume that all classes are equally difficult to distinguish [29]. To overcome this, the hierarchical relationships that exist between classes can be identified and used to build a hierarchical classification model. Such a model can be more informative than one that treats all classes as a flat structure, since classification errors are restricted to subcategories.

In a CNN architecture, lower layers capture low-level features while higher layers tend to extract more abstract features [28]. This property can be combined with the hierarchical structure of the classes to force the network to learn different levels of the class hierarchy in different layers. In this way, coarse categories, which are easier to classify, are represented in lower (shallow) layers, while higher (deeper) layers simultaneously output fine subcategories [27, 29]. In this paper, we propose a hierarchical deep convolutional neural network that takes advantage of the hierarchical structure of GI diseases for classification.

This paper is organized as follows: Section II introduces the diseases studied in this paper. Section III reviews related research. The methodology is explained in Section IV. The data used in this study, the data preparation steps and the training setup are described in Section V, and the empirical results are presented in Section VI. Finally, Section VII concludes the paper and outlines future directions.

II Gastrointestinal Disorders

GI disorders refer to any abnormal condition or disease that occurs within the GI tract. While a wide variety of disorders are associated with different parts of the GI tract, this paper focuses on certain disorders involving only the duodenum, esophagus and ileum. In this section, we introduce each of the considered disorders.

II-A Duodenum

II-A1 Celiac Disease (CD)

It is an inability to normally process dietary gluten (present in foods such as wheat, rye, and barley) and is present in % of the US population. Gluten exposure triggers an inflammatory cascade which leads to compromised intestinal barrier function. Gluten consumption by people with CD can cause diarrhea, abdominal pain, bloating, and weight loss. If unrecognized, it can lead to anemia, decreased bone density, and, in longstanding cases, intestinal cancer [12].

II-A2 Environmental Enteropathy (EE)

It is an acquired small intestinal condition resulting from the continuous burden of immune stimulation by fecal-oral exposure to enteropathogens, which leads to a persistent acute phase response and chronic inflammation [2, 19]. EE is characterized histologically by villus shortening, crypt hyperplasia and a resultant decrease in the surface area of mature absorptive intestinal epithelial cells, which leads to markedly reduced nutrient absorption and, thus, undernutrition and stunting [21].

II-B Esophagus

II-B1 Eosinophilic Esophagitis (EoE)

It is a chronic, allergic inflammatory disease of the esophagus. It occurs when eosinophils, a type of white blood cell normally present in the digestive tract, build up in the lining of the esophagus. EoE is characterized by symptoms of esophageal dysfunction and eosinophilic infiltration of the esophageal mucosa in the absence of secondary causes of eosinophilia [3].

II-C Ileum

II-C1 Crohn’s Disease

It is an inflammatory bowel disease that causes patchy areas of chronic inflammation, ulcers and mucosal damage anywhere in the GI tract, most commonly in the terminal ileum and colon. The interaction of genetic susceptibility, environmental factors and intestinal microflora is believed to be the major cause of Crohn’s disease. This interaction results in an abnormal mucosal immune response that compromises epithelial barrier function [23].

III Related Work

Hierarchical CNNs have demonstrated improved performance in image classification compared to flat CNN models across multiple domains [26, 17, 11, 15]. These models exploit the hierarchical structure of object categories [24] to decompose the classification task into multiple steps. The Hierarchical Deep CNN (HD-CNN) proposed by Yan et al. [27] embeds CNNs into a category hierarchy by separating easy classes with a coarse category classifier and difficult classes with fine category classifiers. This model can be implemented without increasing the complexity of the training process; however, it requires multi-step training of each CNN. Zhu and Bain proposed a branched variant of the hierarchical deep CNN (B-CNN) [29]. Since shallow layers of a CNN capture low-level features while deeper layers capture high-level features of an image, B-CNN outputs multiple predictions, ordered from coarse to fine, along concatenated convolutional layers that correspond to the hierarchical structure of the target classes. Its branch training strategy adjusts the weights of the output-layer losses, forcing the network to learn successively from coarse to fine concepts along the layer blocks. Hierarchical architectures have been applied to both image [17] and video classification [4] tasks, with superior performance compared to conventional flat CNN models. While deep learning has been applied extensively to medical image classification tasks [11], hierarchical models remain relatively unexplored in the literature.

Ranjan et al. [15] reported a CNN-based hierarchical model for classifying histopathological images that outperformed a flat CNN. The authors detected breast cancer and its states from histopathological images by arranging multiple CNNs in a two-level hierarchy. At the first level, a pretrained AlexNet is trained to discriminate the normal class from the images of the remaining classes. The second level is a tree of CNN-based binary classifiers that uses majority voting to discriminate the other three classes: in situ, invasive and benign.

Another study applied hierarchical deep convolutional neural networks to cytopathology to classify cellular images as healthy or cancer-affected [9]. To the best of our knowledge, no previously published study has applied hierarchical deep convolutional neural networks to gastrointestinal disease classification using histopathological images.

IV Methodology

IV-A Base Model

There are many CNN architectures in the literature, each with its own advantages and drawbacks. In this paper, we used VGGNet [18] (proposed by the Visual Geometry Group at the University of Oxford) as the base model; it has shown excellent performance in image classification problems, including medical image analysis [14, 5, 7]. VGGNet obtained state-of-the-art results in the ILSVRC’14 competition with one of the lowest top-5 error rates, a significant improvement over ZFNet [28], the winner of ILSVRC’13. Two intertwined characteristics of VGGNet are its increased depth and its use of smaller filters. It uses 3×3 convolutional filters and 2×2 pooling from the beginning to the end of the network. Because smaller filters have fewer parameters, stacking more of them makes it possible to increase the depth while keeping the effective receptive field of a larger filter. For instance, the effective receptive field of three stacked 3×3 filters with stride 1 is the same as that of a single 7×7 filter. Two variants of VGGNet, VGG16 and VGG19, have been released. VGG16 has 16 trainable layers: 13 convolutional layers, organized in 5 blocks, followed by 3 fully-connected layers. The final layer is a softmax layer that outputs class probabilities. In this paper, VGG16 was trained from scratch on biopsy patches. The network shown at the top of Figure 1 is a plain VGG16.
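The receptive-field and parameter argument above can be checked with a few lines of arithmetic. The following plain-Python sketch is illustrative only (the helper names are hypothetical, not from the paper): it computes the effective receptive field of stacked convolutions, compares weight counts for C input and C output channels while ignoring biases, and counts the layers of the standard VGG16 configuration.

```python
# Illustrative sketch (hypothetical helpers): VGG-style stacking of small filters.

def stacked_receptive_field(kernel: int, n_layers: int, stride: int = 1) -> int:
    """Effective receptive field of n stacked conv layers with equal kernel and stride."""
    rf, jump = 1, 1
    for _ in range(n_layers):
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) input steps
        jump *= stride             # distance between neighbouring units in input pixels
    return rf


def conv_weights(kernel: int, channels: int, n_layers: int = 1) -> int:
    """Weights (biases ignored) of n conv layers, each mapping C channels to C channels."""
    return n_layers * kernel * kernel * channels * channels


# Standard VGG16 convolutional configuration: numbers are output channels,
# "M" marks the 2x2 max-pooling layer that closes each block.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
N_FC_LAYERS = 3  # two hidden fully-connected layers plus the softmax output layer


def count_vgg16_layers(cfg):
    """Return (conv layers, blocks, trainable layers) for a VGG-style config."""
    n_conv = sum(1 for v in cfg if v != "M")
    n_blocks = cfg.count("M")
    return n_conv, n_blocks, n_conv + N_FC_LAYERS


if __name__ == "__main__":
    print(stacked_receptive_field(3, 3))                     # 7, same as one 7x7 filter
    print(conv_weights(3, 256, n_layers=3), conv_weights(7, 256))  # 27*C^2 vs 49*C^2
    print(count_vgg16_layers(VGG16_CONV))                    # (13, 5, 16)
```

Three stacked 3×3 layers use 27C² weights versus 49C² for a single 7×7 layer, roughly 45% fewer, while also inserting two extra non-linearities between them.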

IV-B Hierarchical Convolutional Neural Network

To build a hierarchical convolutional neural network, we applied the Branch Convolutional Neural Network (B-CNN) architecture proposed by Zhu and Bain [29] to embed the levels of the class hierarchy in VGG16, yielding H-VGGNet. There were seven fine classes of GI disorders: Duodenum-Celiac, Duodenum-EE, Duodenum-Normal, Esophagus-EoE, Esophagus-Normal, Ileum-Crohn’s and Ileum-Normal, and each belonged to one coarse category: Duodenum, Esophagus or Ileum (see the hierarchy of classes in Figure 2). Since this hierarchy has two class levels, in addition to the output layer that outputs the fine classes, one branch was added to VGG16 to output the coarse categories. The network shown at the bottom of Figure 1 is H-VGGNet. Although new branches can consist of both convolutional and fully-connected layers, in this paper the added branch was composed only of fully-connected layers. For each input image, the model computes both coarse-level and fine-level predictions. The rectified linear unit (ReLU) [10] was employed as the activation function. To reduce over-fitting, dropout regularization [20] was applied after the ReLU of each fully-connected layer. Batch normalization [6] was also applied after the ReLU of every trainable layer.
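The two-level hierarchy can be encoded as a simple fine-to-coarse mapping. The sketch below is a hypothetical helper, not the authors' code: it builds the coarse and fine one-hot targets that a two-output network such as H-VGGNet would be trained against, and shows how fine-class probabilities from a flat model could be summed into coarse-category probabilities when the two models are compared at the coarse level (an assumption; the paper does not say how, or whether, this was done).

```python
# Hypothetical encoding of the two-level class hierarchy (fine class -> coarse category).
FINE_TO_COARSE = {
    "Duodenum-Celiac": "Duodenum",
    "Duodenum-EE": "Duodenum",
    "Duodenum-Normal": "Duodenum",
    "Esophagus-EoE": "Esophagus",
    "Esophagus-Normal": "Esophagus",
    "Ileum-Crohn's": "Ileum",
    "Ileum-Normal": "Ileum",
}
FINE_CLASSES = list(FINE_TO_COARSE)  # dict order is preserved (Python 3.7+)
COARSE_CLASSES = ["Duodenum", "Esophagus", "Ileum"]


def one_hot(index: int, size: int) -> list:
    return [1.0 if i == index else 0.0 for i in range(size)]


def hierarchical_targets(fine_label: str):
    """Return (coarse_target, fine_target) one-hot vectors for one patch label."""
    coarse_label = FINE_TO_COARSE[fine_label]
    coarse = one_hot(COARSE_CLASSES.index(coarse_label), len(COARSE_CLASSES))
    fine = one_hot(FINE_CLASSES.index(fine_label), len(FINE_CLASSES))
    return coarse, fine


def coarse_from_fine(fine_probs):
    """Sum fine-class probabilities within each coarse category (e.g. for a flat model)."""
    totals = {c: 0.0 for c in COARSE_CLASSES}
    for cls, p in zip(FINE_CLASSES, fine_probs):
        totals[FINE_TO_COARSE[cls]] += p
    return [totals[c] for c in COARSE_CLASSES]


if __name__ == "__main__":
    print(hierarchical_targets("Esophagus-EoE"))
```

Keeping the hierarchy in one table makes it hard for the coarse and fine targets to drift out of sync, since both are derived from the same fine label.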

The loss function of such a model is a weighted sum of the coarse and fine prediction losses. For the $i$-th instance:

$$ L_i = \sum_{k=1}^{K} A_k \, \ell_{ik}, \qquad \ell_{ik} = -\log \frac{e^{s_{ik}^{+}}}{\sum_{j} e^{s_{ikj}}} \tag{1} $$

where $K$ is the number of levels in the class hierarchy ($K = 2$ here), $A_k$ is the weight of the $k$-th level of the class hierarchy, and $\ell_{ik}$ is the cross-entropy loss for the $i$-th instance at the $k$-th level. $s_{ik}^{+}$ is the element of the class-score vector of the $i$-th instance at the $k$-th level that corresponds to the positive element in the target label, and $s_{ikj}$ is the $j$-th element of that class-score vector.
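A minimal sketch of the weighted hierarchical loss for a single instance, written in plain Python with a numerically stable log-sum-exp. The function names are hypothetical, and the level weights are passed in explicitly because they change during training.

```python
import math


def cross_entropy_from_scores(scores, target_index):
    """Softmax cross-entropy -log(exp(s+) / sum_j exp(s_j)) via a stable log-sum-exp."""
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[target_index]


def hierarchical_loss(level_scores, level_targets, level_weights):
    """Weighted sum over hierarchy levels k of A_k * l_ik for one instance i.

    level_scores[k]  : raw class scores at level k (e.g. coarse, then fine)
    level_targets[k] : index of the true class at level k
    level_weights[k] : loss weight A_k of level k
    """
    assert len(level_scores) == len(level_targets) == len(level_weights)
    return sum(a * cross_entropy_from_scores(s, t)
               for s, t, a in zip(level_scores, level_targets, level_weights))


if __name__ == "__main__":
    coarse_scores = [0.0, 0.0, 0.0]  # 3 coarse categories
    fine_scores = [0.0] * 7          # 7 fine classes
    print(hierarchical_loss([coarse_scores, fine_scores], [1, 3], [0.5, 0.5]))
```

With all scores equal, each level contributes the log of its number of classes, so the example evaluates to 0.5·log 3 + 0.5·log 7; raising the weight of one level shifts the gradient signal toward that level's output.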

Fig. 1: Top: VGGNet architecture, Bottom: H-VGGNet architecture
Fig. 2: Hierarchy of classes

V Experimental Setup

This section presents the experimental setting, including the data description, data pre-processing steps, training details and evaluation metrics.

V-A Data

Hematoxylin and Eosin (H&E) stained whole slide images (WSIs) from patients were obtained for this study. Some patients had biopsies for each diagnosis. All biopsies obtained from the University of Virginia (UVA), VA, USA were retrospectively retrieved archival samples, while the other biopsies (except the Crohn’s disease biopsies) were obtained as part of prospective cohorts studying growth faltering among children. Images were obtained based on the gut disease state: 1) Celiac Disease in the duodenum: UVA (), Cincinnati Children’s Hospital Medical Center (CCHMC), OH, USA and Washington University (WashU), MO, USA ( and , respectively); 2) Environmental Enteropathy in the duodenum: Aga Khan University, Karachi, Pakistan () and Zambia School of Medicine, Lusaka, Zambia (); 3) histologically normal duodenum: UVA (), CCHMC (), WashU (); 4) EoE in the esophagus: UVA (); 5) histologically normal esophagus: UVA (); 6) Crohn’s disease in the ileum: CCHMC as part of the RISK study sub-cohort (); and 7) histologically normal ileum: UVA ().
We split our data into training, development and test sets. Since it is important that a model trained on one sample of patients generalizes to unseen patients, we performed the split at the patient level, so that no patient's data appears in more than one of the training, development and test sets.
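A patient-level split can be implemented by shuffling patient identifiers rather than patches. The sketch below assumes a 70/15/15 ratio purely for illustration (the ratio used in the paper is not recoverable from this text), uses hypothetical patient IDs, and does not stratify by diagnosis, which a real split may need to do.

```python
import random


def split_by_patient(patch_patient_ids, ratios=(0.7, 0.15, 0.15), seed=0):
    """Assign every patient (and therefore all of their patches) to exactly one split.

    The ratio is a hypothetical placeholder; the test split receives the remainder.
    Returns one split name per patch, in the same order as patch_patient_ids.
    """
    patients = sorted(set(patch_patient_ids))
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_train = round(ratios[0] * len(patients))
    n_dev = round(ratios[1] * len(patients))
    split_of = {}
    for rank, pid in enumerate(patients):
        if rank < n_train:
            split_of[pid] = "train"
        elif rank < n_train + n_dev:
            split_of[pid] = "dev"
        else:
            split_of[pid] = "test"
    # Every patch inherits its patient's split, so no patient spans two splits.
    return [split_of[pid] for pid in patch_patient_ids]


if __name__ == "__main__":
    ids = ["patient-%d" % (i % 20) for i in range(200)]  # 200 patches from 20 patients
    splits = split_by_patient(ids)
    print({s: splits.count(s) for s in ("train", "dev", "test")})
```

Splitting patches at random instead would let near-duplicate patches from the same slide appear in both training and test sets, inflating test performance.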

Coarse category  Fine class  Train (number of WSIs, number of patches)  Development (number of WSIs, number of patches)  Test (number of WSIs, number of patches)
Duodenum Celiac
Esophagus EoE
Ileum Crohns
Normal 8
TABLE I: Distribution of training, development, and test set data among different classes

V-B Data Pre-processing

V-B1 Image Patching

A sliding window method was applied to each high-resolution WSI to generate patches of size pixels. Since some classes had more whole-slide images than others, we generated patches with different overlapping areas for each class. To reduce the computational cost, patches were resized to pixels. After generating patches from each image, we labelled each patch based on its associated WSI.
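The sliding-window step amounts to enumerating window origins over each slide, where a larger overlap (smaller stride) yields more patches from classes with fewer WSIs. In the sketch below, the patch size, slide dimensions and per-class overlap fractions are hypothetical placeholders, and reading pixels from the slide file is left out.

```python
def stride_for_overlap(patch, overlap_fraction):
    """Stride (in pixels) that gives the requested fractional overlap between neighbours."""
    if not 0.0 <= overlap_fraction < 1.0:
        raise ValueError("overlap_fraction must be in [0, 1)")
    return max(1, round(patch * (1.0 - overlap_fraction)))


def patch_origins(width, height, patch, stride):
    """Top-left (x, y) corners of patch-by-patch windows that fit inside the slide."""
    if patch > width or patch > height:
        return []
    return [(x, y)
            for y in range(0, height - patch + 1, stride)
            for x in range(0, width - patch + 1, stride)]


# Hypothetical per-class overlaps: more overlap for classes with fewer WSIs.
CLASS_OVERLAP = {"Duodenum-Normal": 0.0, "Esophagus-EoE": 0.5}

if __name__ == "__main__":
    for label, overlap in CLASS_OVERLAP.items():
        stride = stride_for_overlap(1000, overlap)
        origins = patch_origins(4000, 3000, 1000, stride)
        # Each patch inherits the label of the WSI it was cut from.
        print(label, stride, len(origins))
```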

V-B2 Patch Clustering

In our work, a two-step clustering process was applied to filter out useless patches, which had mostly been created from the background of the WSIs. All or most of the area of these patches was blank or did not contain any useful biopsy information. In the first step, a convolutional autoencoder was used to learn embedded features of each patch; in the second step, the k-means clustering algorithm was applied to group the embedded features into two clusters: useful and useless. Table I summarizes the distribution of WSIs and patches (after cleaning) in each class.
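The second clustering step is plain k-means with k = 2 on the autoencoder embeddings. In the sketch below, the embeddings are assumed to be given as lists of floats (the autoencoder itself is not shown), and deciding which of the two clusters is the "useless" background cluster is left to a separate check, such as inspecting sample patches, since the paper does not specify this.

```python
import random


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(features, k=2, n_iter=100, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(features, k)]  # random initial centroids
    labels = [0] * len(features)
    for _ in range(n_iter):
        # Assignment step: each embedding goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: _sq_dist(f, centroids[j])) for f in features]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [f for f, lab in zip(features, labels) if lab == j]
            new_centroids.append(_mean(members) if members else centroids[j])
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    # Toy 2-D "embeddings": a tissue-like group and a background-like group.
    feats = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    print(kmeans(feats)[0])
```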

V-B3 Stain Color Normalization

Histological images have substantial color variation that adds bias while training the model. This arises due to a wide variety of factors such as differences in raw materials and manufacturing techniques of stain vendors, staining protocols of labs, and color responses of digital scanners [25]. To avoid any bias, unwanted color variations should be addressed and resolved as an essential pre-processing step prior to any analyses.

Various solutions, such as color balancing [8], gray-scale conversion and stain normalization [16], have been proposed in the literature to address color variation. In this study, we used gray-scale versions of the images; however, before converting the RGB patches to gray-scale, we applied the stain normalization approach proposed by Vahadane et al. [25] to significantly reduce the effect of color-intensity variation. Figure 3 shows an example of the result of applying this process to representative biopsy patches.
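For orientation, the sketch below shows two building blocks of this pipeline in plain Python: the Beer-Lambert optical-density transform that stain-separation methods such as Vahadane et al. [25] operate on, and a luma-weighted RGB-to-gray conversion. It does not implement the sparse non-negative matrix factorization at the heart of the Vahadane method; the `stain_normalize` callable is a hypothetical stand-in, and the BT.601 weights are an assumption, since the paper does not state which gray-scale formula was used.

```python
import math


def optical_density(rgb, background=255.0):
    """Beer-Lambert optical density per channel, OD = -ln(I / I0); I is clipped to >= 1."""
    return [-math.log(max(float(c), 1.0) / background) for c in rgb]


def to_grayscale(rgb):
    """Luma-weighted gray value using ITU-R BT.601 weights (an assumed choice)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def normalize_then_gray(pixels, stain_normalize):
    """Apply a (hypothetical) stain-normalization callable first, then convert to gray."""
    return [to_grayscale(p) for p in stain_normalize(pixels)]


if __name__ == "__main__":
    print(optical_density((255, 255, 255)))  # background transmits all light: OD is zero
    print(optical_density((60, 30, 120)))    # stained tissue absorbs light: OD is positive
    print(normalize_then_gray([(60, 30, 120)], stain_normalize=lambda px: px))
```

The order matters: normalizing first removes stain-specific color differences, so the subsequent gray-scale intensities are more comparable across labs and scanners.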

Fig. 3: Color normalization artifacts. The first row shows the original images, the second row shows the images color-normalized using the method proposed by Vahadane et al. [25], and the third row shows their associated gray-scale images.

V-C Training Details

We conducted extensive experiments to compare the performance of the hierarchical model with that of the flat model. Both the base model and the hierarchical model were trained and tested ten times, and each time both models were trained for a fixed number of epochs. Optimization was performed using RMSprop [22] with no momentum. The learning rate started at an initial value and was changed after the 10th epoch and again after the 15th epoch. Different loss weights were applied to each level of the hierarchy to reflect the differences in importance between the levels. Since low-level feature extraction is more important in the initial epochs, more weight was initially assigned to the coarse level. As training progresses, the weight of the coarse level decreases and the weight of the fine level increases; the loss weights were changed in the 1st, 5th, 10th and 15th epochs. This schedule causes the algorithm to focus first on optimizing the coarse categories and then to shift its focus to the fine classes as learning progresses.
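The learning-rate and loss-weight schedules are both piecewise-constant functions of the epoch. Because the actual values were lost from this text, every number below is a hypothetical placeholder; only the change points (after the 10th and 15th epochs for the learning rate; the 1st, 5th, 10th and 15th epochs for the loss weights) come from the description above. A scalar RMSprop step without momentum is included for reference, with rho and epsilon set to commonly used defaults (an assumption).

```python
import math

# Hypothetical placeholder values; epochs are numbered from 1.
# Learning rate: initial value, changed after the 10th and after the 15th epoch.
LR_MILESTONES = [(1, 1e-3), (11, 1e-4), (16, 1e-5)]
# Loss weights (A_coarse, A_fine): changed in the 1st, 5th, 10th and 15th epochs,
# moving emphasis from the coarse level to the fine level.
LOSS_WEIGHT_MILESTONES = [(1, (0.9, 0.1)), (5, (0.6, 0.4)),
                          (10, (0.3, 0.7)), (15, (0.1, 0.9))]


def piecewise_constant(milestones, epoch):
    """Value of the last milestone whose starting epoch is <= epoch (milestones sorted)."""
    value = milestones[0][1]
    for start, v in milestones:
        if epoch >= start:
            value = v
    return value


def rmsprop_step(param, grad, sq_avg, lr, rho=0.9, eps=1e-7):
    """One scalar RMSprop update without momentum; returns (new_param, new_sq_avg)."""
    sq_avg = rho * sq_avg + (1.0 - rho) * grad * grad  # running mean of squared grads
    return param - lr * grad / (math.sqrt(sq_avg) + eps), sq_avg


if __name__ == "__main__":
    for epoch in (1, 5, 10, 11, 15, 16):
        print(epoch, piecewise_constant(LR_MILESTONES, epoch),
              piecewise_constant(LOSS_WEIGHT_MILESTONES, epoch))
```

Because both schedules share one lookup function, changing the milestones in a single list is enough to try a different coarse-to-fine transition.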

V-D Evaluation Metrics

To assess the performance of the models, we considered accuracy, area under the ROC curve (AUC), precision, recall and F1 score.
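The per-class figures can be derived from a confusion matrix in a one-vs-rest fashion. The paper does not state exactly how its per-class accuracy was computed, so the sketch below assumes one-vs-rest counts; AUC is computed for one class at a time from that class's predicted scores using the rank (Mann-Whitney) formulation.

```python
def per_class_metrics(cm):
    """One-vs-rest metrics from a square confusion matrix cm[true][pred] of counts."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                         # class c predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp    # other classes predicted as c
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results.append({"accuracy": (tp + tn) / total, "precision": precision,
                        "recall": recall, "f1": f1})
    return results


def binary_auc(scores, labels):
    """ROC AUC as P(random positive outscores random negative); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(per_class_metrics([[8, 2], [1, 9]]))
    print(binary_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
```

The pairwise AUC is quadratic in the number of samples, which is fine for illustration; a sort-based implementation would be preferable for large test sets.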

VI Results

Table II compares the two models in terms of accuracy, AUC, precision, recall and F1 score on the test set, with confidence intervals. As shown in the table, the hierarchical model outperformed the flat model for many classes in terms of the mean of these metrics. Table III presents the normalized confusion matrices of the two models. The confusion between different coarse categories was lower for the hierarchical model than for the flat model.
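The coarse-level comparison can be made quantitative by counting how much of the fine-level confusion matrix falls between different coarse categories. The sketch below also computes a mean with a 95% t-interval over repeated runs; both the 95% level and the t-interval method are assumptions, as the text only states that confidence intervals were reported over ten runs.

```python
import statistics


def row_normalize(cm):
    """Normalize each row (true class) of a confusion matrix so it sums to 1."""
    out = []
    for row in cm:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out


def cross_coarse_rate(cm, fine_to_coarse_idx):
    """Fraction of samples whose predicted fine class lies in a different coarse category."""
    total = sum(sum(row) for row in cm)
    crossed = sum(cm[t][p] for t in range(len(cm)) for p in range(len(cm))
                  if fine_to_coarse_idx[t] != fine_to_coarse_idx[p])
    return crossed / total


def mean_ci95(values, t_crit=2.262):
    """Mean and 95% t-interval half-width; t_crit = 2.262 is t(0.975, df=9), i.e. 10 runs."""
    m = statistics.mean(values)
    half = t_crit * statistics.stdev(values) / len(values) ** 0.5
    return m, half


if __name__ == "__main__":
    cm = [[5, 3, 2], [1, 8, 1], [0, 2, 8]]  # toy 3-class matrix; classes 0, 1 share a parent
    print(row_normalize(cm))
    print(cross_coarse_rate(cm, [0, 0, 1]))
```

For the seven classes in the order listed in Section IV-B, the coarse index list would be [0, 0, 0, 1, 1, 2, 2]; the t critical value must be changed if a different number of runs is used.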

Metric Model Class
Duodenum Esophagus Ileum
Celiac EE Normal EoE Normal Crohn’s Normal
Accuracy VGGNet
Precision VGGNet
Recall VGGNet
F1 score VGGNet
TABLE II: Comparison of model performance
True Label Model Predicted Label
Duodenum Esophagus Ileum
Celiac EE Normal EoE Normal Crohn’s Normal
Celiac VGGNet
Normal (Duodenum) VGGNet
Normal (Esophagus) VGGNet
Crohn’s VGGNet
Normal (Ileum) VGGNet
TABLE III: Normalized confusion matrix of flat and hierarchical model

VII Conclusion

In this paper, we propose a hierarchical deep convolutional neural network for multi-category classification of gastrointestinal disorders using histopathological biopsy images. Our proposed model was tested on cropped images derived from an independent set of WSIs. Our results showed that the hierarchical deep model achieved superior classification performance on a problem with an inherent hierarchical structure, compared to a flat deep model that assumes all classes are equally difficult to classify. Because the dataset was collected from multiple patients and split at the patient level into training, development and test sets, the results suggest that our model can generalize to patients who are not part of the training or development sets.

Since the lower layers of a CNN capture low-level features while the higher layers tend to extract more abstract features, we used this property to build a single model instead of employing separate models for different levels of the class hierarchy. Such a structure not only saves computational cost but also lets the training benefit from information shared across the coarse and fine levels. Quantifying this synergy is a possible avenue for future research.


Research reported in this manuscript was supported by National Institute of Diabetes and Digestive and Kidney Diseases of the National Institutes of Health under award number K23DK117061-01A1 (SS), Bill and Melinda Gates Foundation under award numbers OPP1066203, OPP1066118, OPP1144149 and OPP1066153 and University of Virginia Translational Health Research Institute of Virginia (THRIV) Scholar Career Development Award (SS). The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies.


  • [1] B. E. Bejnordi, M. Veta, P. J. Van Diest, B. Van Ginneken, N. Karssemeijer, G. Litjens, J. A. Van Der Laak, M. Hermsen, Q. F. Manson, M. Balkenhol, et al. (2017) Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318 (22), pp. 2199–2210.
  • [2] D. Campbell, M. Elia, and P. Lunn (2003) Growth faltering in rural Gambian infants is associated with impaired small intestinal barrier function, leading to endotoxemia and systemic inflammation. The Journal of Nutrition 133 (5), pp. 1332–1338.
  • [3] E. S. Dellon and I. Hirano (2018) Epidemiology and natural history of eosinophilic esophagitis. Gastroenterology 154 (2), pp. 319–332.
  • [4] J. Fan, A. K. Elmagarmid, X. Zhu, W. G. Aref, and L. Wu (2004) ClassView: hierarchical video shot classification, indexing, and accessing. IEEE Transactions on Multimedia 6 (1), pp. 70–86.
  • [5] J. J. Gómez-Valverde, A. Antón, G. Fatti, B. Liefers, A. Herranz, A. Santos, C. I. Sánchez, and M. J. Ledesma-Carbayo (2019) Automatic glaucoma classification using color fundus images based on convolutional neural networks and transfer learning. Biomedical Optics Express 10 (2), pp. 892–913.
  • [6] S. Ioffe and C. Szegedy (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167.
  • [7] S. Y. Ko, J. H. Lee, J. H. Yoon, H. Na, E. Hong, K. Han, I. Jung, E. Kim, H. J. Moon, V. Y. Park, et al. (2019) Deep convolutional neural network for the diagnosis of thyroid nodules on ultrasound. Head & Neck 41 (4), pp. 885–891.
  • [8] K. Kowsari, R. Sali, M. N. Khan, W. Adorno, S. A. Ali, S. R. Moore, B. C. Amadi, P. Kelly, S. Syed, and D. E. Brown (2019) Diagnosis of celiac disease and environmental enteropathy on biopsy images using color balancing on convolutional neural networks. In Proceedings of the Future Technologies Conference, pp. 750–765.
  • [9] S. D. Krauß, R. Roy, H. K. Yosef, T. Lechtonen, S. F. El-Mashtoly, K. Gerwert, and A. Mosig (2018) Hierarchical deep convolutional neural networks combine spectral and spatial information for highly accurate Raman-microscopy-based cytopathology. Journal of Biophotonics 11 (10), pp. e201800022.
  • [10] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105.
  • [11] F. Milletari, N. Navab, and S. Ahmadi (2016) V-Net: fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571.
  • [12] I. Parzanese, D. Qehajaj, F. Patrinicola, M. Aralica, M. Chiriva-Internati, S. Stifter, L. Elli, and F. Grizzi (2017) Celiac disease: from pathophysiology to treatment. World Journal of Gastrointestinal Pathophysiology 8 (2), pp. 27.
  • [13] A. F. Peery, E. S. Dellon, J. Lund, S. D. Crockett, C. E. McGowan, W. J. Bulsiewicz, L. M. Gangarosa, M. T. Thiny, K. Stizenberg, D. R. Morgan, et al. (2012) Burden of gastrointestinal disease in the United States: 2012 update. Gastroenterology 143 (5), pp. 1179–1187.
  • [14] A. Rakhlin, A. Shvets, V. Iglovikov, and A. A. Kalinin (2018) Deep convolutional neural networks for breast cancer histology image analysis. In International Conference Image Analysis and Recognition, pp. 737–744.
  • [15] N. Ranjan, P. V. Machingal, S. S. D. Jammalmadka, V. Thenaknidiyoor, and A. Dileep (2018) Hierarchical approach for breast cancer histopathology images classification.
  • [16] R. Sali, L. Ehsan, K. Kowsari, M. Khan, C. A. Moskaluk, S. Syed, and D. E. Brown (2019) CeliacNet: celiac disease severity diagnosis on duodenal histopathological images using deep residual networks. arXiv preprint arXiv:1910.03084.
  • [17] Y. Seo and K. Shin (2019) Hierarchical convolutional neural networks for fashion image classification. Expert Systems with Applications 116, pp. 328–339.
  • [18] K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
  • [19] N. W. Solomons (2003) Environmental contamination and chronic inflammation influence human growth potential. The Journal of Nutrition 133 (5), p. 1237.
  • [20] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov (2014) Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15 (1), pp. 1929–1958.
  • [21] S. Syed, A. Ali, and C. Duggan (2016) Environmental enteric dysfunction in children: a review. Journal of Pediatric Gastroenterology and Nutrition 63 (1), pp. 6.
  • [22] T. Tieleman and G. Hinton (2012) Lecture 6.5-rmsprop: divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning 4 (2), pp. 26–31.
  • [23] J. Torres, S. Mehandru, J. Colombel, and L. Peyrin-Biroulet (2017) Crohn’s disease. The Lancet 389 (10080), pp. 1741–1755.
  • [24] A. Tousch, S. Herbin, and J. Audibert (2012) Semantic hierarchies for image annotation: a survey. Pattern Recognition 45 (1), pp. 333–345.
  • [25] A. Vahadane, T. Peng, A. Sethi, S. Albarqouni, L. Wang, M. Baust, K. Steiger, A. M. Schlitter, I. Esposito, and N. Navab (2016) Structure-preserving color normalization and sparse stain separation for histological images. IEEE Transactions on Medical Imaging 35 (8), pp. 1962–1971.
  • [26] Z. Yan, R. Piramuthu, V. Jagadeesh, W. Di, and D. Decoste (2019) Hierarchical deep convolutional neural network for image classification. US Patent 10,387,773, August 20, 2019. Google Patents.
  • [27] Z. Yan, H. Zhang, R. Piramuthu, V. Jagadeesh, D. DeCoste, W. Di, and Y. Yu (2015) HD-CNN: hierarchical deep convolutional neural networks for large scale visual recognition. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2740–2748.
  • [28] M. D. Zeiler and R. Fergus (2014) Visualizing and understanding convolutional networks. In European Conference on Computer Vision, pp. 818–833.
  • [29] X. Zhu and M. Bain (2017) B-CNN: branch convolutional neural network for hierarchical classification. arXiv preprint arXiv:1709.09890.