A Hierarchical Conditional Random Field-based Attention Mechanism Approach for Gastric Histopathology Image Classification

by Yixin Li, et al.

In Gastric Histopathology Image Classification (GHIC) tasks, which are usually weakly supervised learning missions, there is inevitably redundant information in the images. Therefore, designing networks that can focus on effective distinguishing features has become a popular research topic. In this paper, to accomplish GHIC tasks superiorly and to assist pathologists in clinical diagnosis, an intelligent Hierarchical Conditional Random Field based Attention Mechanism (HCRF-AM) model is proposed. The HCRF-AM model consists of an Attention Mechanism (AM) module and an Image Classification (IC) module. In the AM module, an HCRF model is built to extract attention regions. In the IC module, a Convolutional Neural Network (CNN) model is trained with the selected attention regions, and then an algorithm called Classification Probability-based Ensemble Learning is applied to obtain the image-level results from the patch-level output of the CNN. In the experiment, a classification specificity of 96.67% is achieved on the test images. Our HCRF-AM model demonstrates high classification performance and shows its effectiveness and future potential in the GHIC field.




1 Introduction

Gastric cancer is one of the top five most frequently diagnosed malignant tumors worldwide, according to the World Health Organisation (WHO) report wild2014world . It remains a deadly cancer due to its high incidence and fatality rate, which leads to over 1,000,000 new cases and over 700,000 deaths per year, making it the third leading cause of cancer deaths bray2018global . Surgical removal of gastric cancer in the early stage without metastasis is the only possible cure. The median survival of gastric cancer rarely exceeds 12 months, and after tumor metastasis, the 5-year survival rate is under 10% orditura2014treatment . Therefore, early treatment can effectively reduce the possibility of death, and an accurate estimate of the patient's prognosis is demanded. Although endoscopic ultrasonography and Computerized Tomography (CT) are the primary methods for diagnosing gastric cancer, histopathology images are considered the gold standard for the diagnosis van2016gastric . However, histopathology images are usually large with redundant information, which means histopathology analysis is a time-consuming, specialized task highly associated with pathologists' skill and experience elsheikh2013american . Professional pathologists are often in short supply, and long hours of heavy work can lead to lower diagnostic quality. Thus, an intelligent diagnosis system plays a significant role in automatically detecting and categorizing histopathology images.

In recent years, Deep Learning (DL) techniques have shown significant improvements in a wide range of computer vision tasks, including the diagnosis of gastric cancer, lung cancer and breast cancer, assisting doctors in classifying and analyzing medical images. In particular, Gastric Histopathology Image Classification (GHIC) is a weakly supervised problem, which means that an image labeled as abnormal contains abnormal tissues with cancer cells and, at the same time, normal tissues without cancer cells in the surrounding area. However, existing networks usually fail to focus only on abnormal regions when making their diagnosis, which leads to noisy regions and redundant information, bringing negative influence on the final decision-making process and affecting the network performance wang2019thorax . Therefore, some advanced methods have been proposed to incorporate visual Attention Mechanisms (AMs) into Convolutional Neural Networks (CNNs), which allows a deep model to adaptively focus on related regions of an image li2019attention . Moreover, fully dense annotations of pathological findings, such as contours or bounding boxes, are not available in most cases due to their costly and time-consuming nature. Hence, we propose an intelligent Hierarchical Conditional Random Field based Attention Mechanism (HCRF-AM) model that includes additional region-level images to guide the attention of CNNs for GHIC tasks. The HCRF-AM model includes the AM module (where the Hierarchical Conditional Random Field (HCRF) model sun2020gastric ; sun2020hierarchical is applied to extract attention areas) and the Image Classification (IC) module. The workflow of the proposed HCRF-AM model is shown in Fig. 1.

Figure 1: Workflow of the proposed HCRF-AM model for GHIC.

There are three main contributions of our work: First, the AM module integrated into the network improves both the performance and the interpretability of gastric cancer diagnosis. Second, we develop the HCRF model to obtain full annotations for weakly supervised classification tasks automatically. Third, we use a publicly available gastric histopathology image dataset, and extensive experiments on this dataset demonstrate the effectiveness of our method.

This paper is organized as follows. In Sec. 2, we review the existing methods related to automatic gastric cancer diagnosis, AMs and the Conditional Random Field (CRF). We explain our proposed method in Sec. 3. Sec. 4 elaborates the experimental settings, implementation, results and comparison. Sec. 5 compares our method to previous GHIC studies. Sec. 6 concludes this paper and discusses the future work.

2 Related Works

2.1 Automatic Gastric Cancer Diagnosis

Though different imaging techniques like gastroscopy zhu2015lesion , X-ray ishihara2017detection , and CT li2018detection are utilized to detect and diagnose gastric cancer, histopathological analysis of gastric cancer slides by pathologists is the only way to diagnose gastric cancer with confidence. Researchers have devoted a considerable amount of effort to this problem, and there is a great deal of work on the automatic classification of gastric histopathological images.

Here, we group Computer Aided Diagnosis (CAD) methods of GHIC into two types: classical Machine Learning (ML) techniques and Deep Learning (DL) techniques. The classical ML methods extract some handcrafted features like color li2020multi and texture descriptors korkmaz2017recognition ; korkmaz2018classification and use classifiers like Support Vector Machine (SVM) sharma2015appearance ; liu2018classification , Random Forest (RF) sharma2017comparative and the Adaboost algorithm li2020multi to make decisions. However, the above classical ML methods only consider a handful of features on images, yielding relatively low classification accuracy.

In recent years, numerous DL models have been proposed in the literature to diagnose gastric cancer with images obtained under the optical microscope. For instance, a purely supervised feedforward CNN model for the classification of gastric carcinoma Whole Slide Images (WSIs) is introduced in sharma2017deep , and the performance of the developed DL approach is quantitatively compared with traditional image analysis methods requiring prior computation of handcrafted features. The comparative experimental results reveal that DL methods compare favorably to traditional methods. The work in liu2018gastric creates a deep residual neural network model for GHIC tasks, which has deeper and more complex structures with fewer parameters and higher accuracy. A whole slide gastric image classification method based on Recalibrated Multi-instance Deep Learning (RMDL) is proposed in wang2019rmdl . The RMDL provides an effective option to explore the interrelationship of different patches and to consider their various impacts on image-level label classification. A convolutional neural network of DeepLab-v3 with the ResNet-50 architecture is applied as the binary image segmentation method in song2020clinically , and the network is trained with 2123 pixel-level annotated Haematoxylin and Eosin (H&E) stained WSIs in their private dataset. A deep neural network that can learn multi-scale morphological patterns of histopathology images simultaneously is proposed in kosaraju2020deep . The work of iizuka2020deep contributes to reducing the number of parameters of the standard Inception-v3 network by using a depth multiplier. The output of the Inception-v3 feature extractor feeds into a Recurrent Neural Network (RNN) consisting of two Long Short-Term Memory (LSTM) layers and forms the final architecture. The models are trained to classify WSIs into adenocarcinoma, adenoma, and non-neoplastic.

Although existing methods based on DL models provide a significant performance boost in gastric histopathology image analysis, they still neglect that images in weakly supervised learning tasks contain large redundant regions that are insignificant in the DL process, which is the main challenge in computational pathology.

2.2 Applications of Attention Mechanism

The visual Attention Mechanism (AM) has the capacity to make a deep model adaptively focus on related regions of an image and hence is an essential way to enhance its effectiveness in many vision tasks, such as object detection ba2014multiple ; li2020object , image captioning xu2015show ; liu2020image and action recognition sharma2015action . A prediction model to analyze whole slide histopathology images is proposed in bentaieb2018predicting , which integrates a recurrent AM. The AM is capable of attending to the discriminatory regions of an image by adaptively selecting a limited sequence of locations. An attention-based CNN is introduced in li2019large , where attention maps are predicted in an attention prediction subnet to highlight the salient regions for glaucoma detection. A DenseNet-based Guided Soft Attention network is developed in yang2019guided , which aims at localizing regions of interest in breast cancer histopathology images and simultaneously using them to guide the classification network. A Thorax-Net for the classification of thorax diseases on chest radiographs is constructed in wang2019thorax . The attention branch of the proposed network exploits the correlation between class labels and the locations of pathological abnormalities by analyzing the feature maps learned by the classification branch. Finally, a diagnosis is derived by averaging and binarizing the outputs of the two branches. A CAD approach called HIENet is introduced in sun2019computer to classify histopathology images of endometrial diseases using a CNN and AM. The Position Attention block of the HIENet is a self-AM, which is utilized to capture the context relations between different local areas in the input images. GHIC is intrinsically a weakly supervised learning problem, and the location of essential areas plays a critical role in the task. Therefore, it is reasonable to combine AMs with the classification of tissue-scale gastric histopathology images.

2.3 Applications of Conditional Random Fields

Conditional Random Fields (CRFs), as an important and prevalent type of ML method, are designed for building probabilistic models that explicitly describe the correlation between the pixels or patches being predicted and the label sequence data. CRFs are attractive in the field of ML because they have achieved success in various research fields, such as the Named Entity Recognition problem in Natural Language Processing zhang2017semi , Information Mining wicaksono2013toward , Behavior Analysis zhuowen2013human , Image and Computer Vision kruthiventi2015crowd , and Biomedicine liliana2017review . In recent years, with the rapid development of DL, CRF models are usually utilized as an essential pipeline within a deep neural network in order to refine image segmentation results. Some research incorporates them into the network architecture, while other work includes them in the post-processing step. In qu2019weakly , a dense CRF is embedded into the loss function of a deep CNN model to improve the accuracy and further refine the model. In Zormpas2019Superpixel , a multi-resolution hierarchical framework (called SuperCRF), inspired by the way pathologists perceive regional tissue architecture, is introduced. The labels of the CRF single-cell nodes are connected to the regional classification results from superpixels, producing the final result. In li2020automated , a CNN-based method is presented for automatic Gleason grading and Gleason pattern region segmentation of images with prostate cancer pathologies, where CRF-based post-processing is applied to the prediction. In Dong2020GECNN , a DL convolution network based on Group Equivariant Convolution and Conditional Random Field (GECNN-CRF) is proposed. The output probability of the CNN model is used to build up the unary potential of the CRFs. The pairwise potential, which expresses the magnitude of the correlation between two blocks, is designed from the feature maps of the neighbouring patches.

In our previous work kosov2018environmental , an environmental microorganism classification engine that can automatically analyze microscopic images using a CRF and Deep Convolutional Neural Networks (DCNNs) is proposed. The experimental results show 94.2% overall segmentation accuracy. In another work li2019cervical , we suggest multilayer hidden conditional random fields (MHCRFs) to classify gastric cancer images, achieving an overall accuracy of 93%. In sun2020gastric , we optimize our architecture and propose the HCRF model, which is employed to segment gastric cancer images for the first time. The results show overall better performance compared to other existing segmentation methods on the same dataset. Furthermore, we combine the AM with the HCRF model and apply them to classification tasks, obtaining preliminary research results in li2021intelligent . For more information, please refer to our previous survey paper li2020comprehensive . The spatial dependencies between patches are usually neglected in previous GHIC tasks, and the inference is only based on the appearance of individual patches. Hence, we describe an AM based on the HCRF framework in this paper, which has not previously been applied to problems in this field.

3 Method

Various kinds of classifiers have been used in GHIC tasks, and CNN classifiers have proved to achieve better performance than some classical Machine Learning (ML) methods. However, the results obtained by training them directly are not satisfying. Considering that fact, we develop the HCRF-AM model to refine the classification results further. Our proposed method consists of three main building blocks: the Attention Mechanism (AM) module, the Image Classification (IC) module, and Classification Probability-based Ensemble Learning (CPEL). The structure of our HCRF-AM model is illustrated in Fig. 2. We explain each building block in the next subsections.

Figure 2: Overview of the HCRF-AM framework for analyzing H&E stained gastric histopathological images. (a) An example of the input dataset. (b) The AM module. (c) The IC module.

3.1 AM Module

The AM module is integrated to assist the CNN classifier in extracting key characteristics of the abnormal images while reducing redundant information. The HCRF, which is an improvement of the CRF lafferty2001conditional , has excellent attention-area detection performance, because it can characterize the spatial relationships within images li2020comprehensive . The fundamental definition of CRFs is introduced first. The details of the HCRF model, including the pixel-unary, pixel-binary, patch-unary and patch-binary potentials, and their combination, are elaborated afterwards.

3.1.1 Fundamental Definition of CRFs

The basic theory of CRFs is introduced in lafferty2001conditional : Firstly, $Y$ is the random variable over the observation sequence, and $X$ is the random variable over the corresponding label sequence. Secondly, $G = (V, E)$ represents an undirected graph such that $X = (X_v)_{v \in V}$, i.e., $X$ is indexed by the vertices of $G$. $V$ is the array of all sites, which corresponds with the vertices of the related undirected graph $G$, whose edges $E$ construct the interactions among adjacent sites. Thus, $(X, Y)$ is a CRF in case, when conditioned on the observation sequence $Y$, the random variables $X_v$ follow the Markov property with respect to the graph: $P(X_v \mid Y, X_w, w \neq v) = P(X_v \mid Y, X_w, w \sim v)$, in which $w \sim v$ implies that $w$ and $v$ are neighbours in $G$. These principles demonstrate that the CRF model is an undirected graph whose nodes are separated into two disjoint sets, $X$ and $Y$. In that case, the conditional distribution model is $P(X \mid Y)$.

Based on the definition of random fields in Clifford-1990-MRF , the joint distribution over the label sequence $X$ given $Y$ forms as Eq. (1):

$$P(x \mid y) \propto \prod_{S} \psi_S(x_S, y), \qquad (1)$$

where $y$ is the observation sequence, $x$ is the corresponding label sequence, and $x_S$ is the set of components of $x$ associated with the vertices of the sub-graph $S$. Furthermore, from Chen-2018-DSI ; Zheng-2015-CRFRNN ; Gupta-2006-CRF , it can be comprehended that a redefinition of Eq. (1) is Eq. (2):

$$P(x \mid y) = \frac{1}{Z(y)} \prod_{C} \psi_C(x_C, y), \qquad (2)$$

where $Z(y)$ is the normalization factor and $\psi_C$ is the potential function over the clique $C$. A clique $C$ is a subset of the vertices of the undirected graph $G = (V, E)$ such that every two distinct vertices in $C$ are adjacent.

3.1.2 The Architecture of the HCRF Model

Different from most CRF models, which are built up with only unary and binary potentials Zheng-2015-CRFRNN ; Chen-2018-DSI , two types of higher-order potentials are introduced in our work. One is a patch-unary potential to characterize the information of tissues; the other is a patch-binary potential to depict the surrounding spatial relation among different tissue areas. Our HCRF is expressed by Eq. (3):

$$P(X \mid Y) = \frac{1}{Z(Y)} \exp\Big( \omega_1 \sum_{i \in V} \psi_u(x_i, Y) + \omega_2 \sum_{(i,j) \in E} \psi_b(x_i, x_j, Y) + \omega_3 \sum_{P} \varphi_u(x_P, Y) + \omega_4 \sum_{P} \sum_{Q \in N(P)} \varphi_b(x_P, x_Q, Y) \Big), \qquad (3)$$

where the normalization factor $Z(Y)$ is given by Eq. (4):

$$Z(Y) = \sum_{X} \exp\Big( \omega_1 \sum_{i \in V} \psi_u(x_i, Y) + \omega_2 \sum_{(i,j) \in E} \psi_b(x_i, x_j, Y) + \omega_3 \sum_{P} \varphi_u(x_P, Y) + \omega_4 \sum_{P} \sum_{Q \in N(P)} \varphi_b(x_P, x_Q, Y) \Big). \qquad (4)$$

$V$ is the set of all nodes in the graph $G$, corresponding to the image pixels; $E$ is the set of all edges in the graph $G$; $P$ is one patch divided from an image; $N(P)$ represents the neighbouring patches of a single patch $P$. The usual clique potential function contains two parts (terms): the pixel-unary potential function $\psi_u(x_i, Y)$ is used to measure the probability that a pixel node $i$ is labeled as $x_i$, which takes values from a given set of classes $L$, for a given observation vector $Y$ kosov2018environmental ; the pixel-binary potential function $\psi_b(x_i, x_j, Y)$ is used to describe the adjacent nodes $i$ and $j$ in the graph. The spatial context relationship between them is related not only to the label of node $i$ but also to the label of its neighbour node $j$. Furthermore, $\varphi_u(x_P, Y)$ and $\varphi_b(x_P, x_Q, Y)$ are the newly introduced higher-order potentials. The patch-unary potential function $\varphi_u(x_P, Y)$ is used to measure the probability that a patch node $P$ is labeled as $x_P$ for a given observation vector $Y$; the patch-binary potential function $\varphi_b(x_P, x_Q, Y)$ is used to describe the adjacent patch nodes $P$ and $Q$. $\omega_1$, $\omega_2$, $\omega_3$ and $\omega_4$ are the weights of the four potentials $\psi_u$, $\psi_b$, $\varphi_u$ and $\varphi_b$, respectively. These weights are used to find the largest posterior label and to further improve the image segmentation performance.

The workflow of the proposed HCRF model can be summarized as follows: First, to obtain pixel-level segmentation information, the U-Net ronneberger2015u is trained to build up the pixel-level potentials. Then, in order to obtain abundant spatial segmentation information at the patch level, we fine-tune three pre-trained CNNs, namely the VGG-16 simonyan2014very , Inception-V3 szegedy2016rethinking and ResNet-50 he2016deep networks, to build up the patch-level potentials. Third, based on the pixel- and patch-level potentials, our HCRF model is structured. In the AM module, half of the abnormal images and their Ground Truth (GT) images are used to train the HCRF, and the attention extraction model is obtained.
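As a minimal sketch of how the four potential maps could be fused in the spirit of Eq. (3), assuming each potential has already been computed as an (H, W, n_classes) map and using unit weights; the function name, shapes, and per-pixel normalization are illustrative assumptions, not the authors' code:

```python
import numpy as np

def hcrf_score(pixel_unary, pixel_binary, patch_unary, patch_binary,
               weights=(1.0, 1.0, 1.0, 1.0)):
    """Fuse four (H, W, n_classes) potential maps into a per-pixel
    class distribution: weighted sum, exponentiate, then normalize
    over classes (a mean-field-style approximation, not the exact joint)."""
    w1, w2, w3, w4 = weights
    energy = (w1 * pixel_unary + w2 * pixel_binary
              + w3 * patch_unary + w4 * patch_binary)
    score = np.exp(energy)
    return score / score.sum(axis=-1, keepdims=True)
```

The per-pixel label map would then be obtained by taking the argmax over the class axis.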

3.1.3 Pixel-unary Potential

The pixel-unary potential $\psi_u(x_i, Y)$ in Eq. (3) is related to the probability of a label $x_i$ taking a value given the observation data, by Eq. (5):

$$\psi_u(x_i, Y) = P(x_i \mid f_i(Y)), \qquad (5)$$

where the image content is characterized by the site-wise feature vector $f_i(Y)$, which may be determined by all the observation data $Y$ Kumar-2006-DRF . The probability maps at the last convolution layer of the U-Net serve as the feature maps, yielding the pixel-level feature vector $f_i^{\mathrm{U}}(Y)$. So, the pixel-unary potential is updated to Eq. (6):

$$\psi_u(x_i, Y) = P(x_i \mid f_i^{\mathrm{U}}(Y)), \qquad (6)$$

where the data $Y$ determines $f_i^{\mathrm{U}}(Y)$.

3.1.4 Pixel-binary Potential

The pixel-binary potential $\psi_b(x_i, x_j, Y)$ in Eq. (3) describes the similarity of the pairwise adjacent sites $i$ and $j$ taking labels $(x_i, x_j)$ given the data and weights, and it is defined as Eq. (7):

$$\psi_b(x_i, x_j, Y) = P(x_i, x_j \mid f_i(Y), f_j(Y)). \qquad (7)$$

The layout of the pixel-binary potential is shown in Fig. 3. This "lattice" (or "reseau" or "array") layout describes how the probability of each classified pixel is calculated by averaging the unary probabilities of its neighbourhood pixels Li-2019-CHI . The other procedures are the same as the pixel-unary potential calculation in Sec. 3.1.3.

Figure 3: 48 neighbourhood ‘lattice’ layout of pixel-binary potential in the AM module. Average of unary probabilities of 48 neighbourhood pixels is used as probability of pixel (central pixel in orange)
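The 48-neighbourhood averaging of Fig. 3 can be sketched in plain NumPy: a 7 × 7 window around each pixel, excluding the centre, gives exactly 48 neighbours. The function name and the zero-padding at the borders are illustrative assumptions; a real implementation might use a convolution or `scipy.ndimage.uniform_filter` instead:

```python
import numpy as np

def neighbourhood_average(unary, size=7):
    """Average a per-class unary probability map over the
    (size*size - 1)-pixel neighbourhood of each pixel
    (size=7 gives the 48-neighbourhood lattice)."""
    h, w = unary.shape
    r = size // 2
    padded = np.pad(unary, r)  # zero-pad the borders
    out = np.zeros_like(unary, dtype=float)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue  # exclude the central pixel itself
            out += padded[r + dy:r + dy + h, r + dx:r + dx + w]
    return out / (size * size - 1)
```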

3.1.5 Patch-unary Potential

In order to extract abundant spatial information, the VGG-16, Inception-V3 and ResNet-50 networks are selected to extract patch-level features. In patch-level terms, $k \in \{\mathrm{V}, \mathrm{I}, \mathrm{R}\}$ indexes the VGG-16, Inception-V3 and ResNet-50 networks, respectively. The patch-unary potentials of Eq. (3) are related to the probability of a patch label $x_P$ given the data, by Eq. (8):

$$\varphi_u(x_P, Y) = P(x_P \mid g_P(Y)), \qquad (8)$$

where the characteristics of the image data are transformed into the site-wise feature vectors $g_P(Y)$, which may be determined by all the input data $Y$. For VGG-16, Inception-V3 and ResNet-50, we use the 1024-dimensional patch-level bottleneck features $g_P^{\mathrm{V}}(Y)$, $g_P^{\mathrm{I}}(Y)$ and $g_P^{\mathrm{R}}(Y)$, obtained from the networks pre-trained on ImageNet, and retrain their last three fully connected layers Kermany-2018-IMD using gastric histopathology images to calculate the classification probability of each class. Therefore, the patch-unary potential is updated to Eq. (9):

$$\varphi_u^{k}(x_P, Y) = P(x_P \mid g_P^{k}(Y)), \quad k \in \{\mathrm{V}, \mathrm{I}, \mathrm{R}\}, \qquad (9)$$

where the data $Y$ determines $g_P^{\mathrm{V}}(Y)$, $g_P^{\mathrm{I}}(Y)$ and $g_P^{\mathrm{R}}(Y)$.

3.1.6 Patch-binary potential

The patch-binary potential $\varphi_b(x_P, x_Q, Y)$ of Eq. (3) demonstrates how similar the pairwise adjacent patch sites $P$ and $Q$ are when taking labels $(x_P, x_Q)$ given the data and weights, and it is defined as Eq. (10):

$$\varphi_b(x_P, x_Q, Y) = P(x_P, x_Q \mid g_P(Y), g_Q(Y)), \qquad (10)$$

where $x_P$ and $x_Q$ denote the patch labels and $g_P(Y)$, $g_Q(Y)$ the patch features. A "lattice" (or "reseau" or "array") layout of the eight-neighbourhood in Fig. 4 is designed to calculate the probability of each classified patch by averaging the unary probabilities of its neighbourhood patches Li-2019-CHI . The other operations are identical to the patch-unary potential calculation in Sec. 3.1.5.

Figure 4: Eight neighbourhood ‘lattice’ layout of patch-binary potential in the AM module. Average of unary probabilities of eight neighbourhood patches is used as probability of target patch (central patch in orange)

The core process of HCRF can be found in Algorithm 1.

1:Input: the original image I; the real label image G;
2:Output: the segmentation result image S;
3:Put the original image I into the U-Net and get the pixel-level features;
4:for each pixel i in the original image I do
5:     Get ψ_u(x_i, Y) defined as Eq. (5);
6:     for each pixel j in the neighbour nodes of pixel i do
7:         Get ψ_b(x_i, x_j, Y) defined as Eq. (7);
8:     end for
9:end for
10:Each pixel is taken as the centre to get its corresponding patch;
11:Put the original image I into the three networks (VGG-16, Inception-V3, ResNet-50) and get the patch-level features;
12:for each patch P in the original image I do
13:     Get φ_u(x_P, Y) defined as Eq. (8);
14:     for each patch Q in the neighbour nodes of patch P do
15:         Get φ_b(x_P, x_Q, Y) defined as Eq. (10);
16:     end for
17:end for
18:for each pixel i in the original image I do
19:     Get the corresponding patch P of pixel i;
20:     Get the normalization factor Z(Y) defined as Eq. (4);
21:     Get P(X | Y) defined as Eq. (3);
22:     Get the pixel-level classification result;
23:end for
24:Get the image for the segmentation result;
25:return S;
Algorithm 1 HCRF

3.2 IC Module

Firstly, the abnormal images of the IC module in the training and validation sets are sent to the trained HCRF model. The output map of this step can be used to locate the diagnostically relevant regions and guide the attention of the network for the classification of microscopic images. The next step is to threshold and mesh the output probability map. If the attention area occupies more than 50% of the area of a patch, this patch is chosen as a final attention patch (this threshold is obtained by traversing candidate proportions with a grid optimization method). The proposed HCRF-AM method emphasizes and gives prominence to features with higher discriminatory power.
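The thresholding-and-meshing step can be sketched as follows, assuming a binary attention mask from the AM module and the 50% criterion of Algorithm 2; the function name, shapes, and return format are illustrative assumptions:

```python
import numpy as np

def select_attention_patches(attention_mask, patch=256, threshold=0.5):
    """Mesh a binary (H, W) attention mask into patch x patch cells and
    return the (row, col) grid indices of cells whose attention area
    exceeds `threshold` (0.5 mirrors the 50% criterion)."""
    h, w = attention_mask.shape
    kept = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            cell = attention_mask[r:r + patch, c:c + patch]
            if cell.mean() > threshold:  # fraction of attended pixels
                kept.append((r // patch, c // patch))
    return kept
```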

Chemicals that are valuable for the diagnosis of gastric cancer, such as miRNA-215 deng2020mirna , are also often expressed at higher levels in paracancerous tissue than in normal tissue wang2018single , indicating the significance of adjacent tissues for the diagnosis of gastric cancer. This suggests that it is not sufficient to conserve only the specific tumor areas for the networks. Hence, all the images in the IC module dataset, as well as the attention patches, are used as input. The patches that are most likely to contain tumor areas are given more weight. Meanwhile, the neighbouring patches of the attention patches are not abandoned.

Transfer Learning (TL) is a method that uses CNNs pretrained on a large annotated image database (such as ImageNet) to complete various tasks. TL focuses on acquiring knowledge from one problem and applying it to different but related problems. It essentially uses additional data so that CNNs can decode by using features learned from past training experience, after which the CNNs have better generalization ability and higher efficiency kamishima2009trbagg . In this paper, we have compared the VGG series, Inception series, ResNet series, and DenseNet series as our classifier. The final selection is based on comprehensive classification performance and the number of parameters. We finally apply the VGG-16 network for the TL classification process, where the parameters are pre-trained on the ImageNet dataset deng2009imagenet . The size of the input images is 256 × 256 pixels.
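The freeze-and-retrain pattern described above can be illustrated with a toy PyTorch model; `TinyVGG` is a small stand-in defined here purely for illustration (the real pipeline uses VGG-16 with ImageNet weights, which this sketch does not download):

```python
import torch
import torch.nn as nn

# Toy stand-in for the VGG-16 transfer-learning setup: freeze the
# convolutional feature extractor (the "pretrained" part) and retrain
# only the fully connected classifier head.
class TinyVGG(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 4 * 4, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = TinyVGG()
for p in model.features.parameters():
    p.requires_grad = False  # freeze the "pretrained" feature extractor
# Only the classifier head would be passed to the optimizer:
trainable = [p for p in model.parameters() if p.requires_grad]
```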

3.3 Classification Probability-based Ensemble Learning (CPEL)

Since the proposed DL network is a patch-based framework, an image-based classification method needs to be developed to determine the possibility that the whole slide contains tumors or not. However, the result may not be reliable if we calculate the possibility for the whole slide by simply averaging all the scores in the probability maps. In order to acquire accurate image-level results from the patch-level output of the CNNs, a CPEL algorithm is introduced kittler1998combining . The probability of each class can be calculated by Eq. (11):

$$P(c \mid I) = \frac{1}{T} \sum_{t=1}^{T} \ln P(c \mid p_t), \quad c \in \{0, 1\}. \qquad (11)$$

Here, $c$ denotes the image label ($c = 0$ represents normal images and $c = 1$ represents abnormal images). $I$ is the input image with a size of 2048 × 2048 pixels, and $p_t$ is an input patch with a size of 256 × 256 pixels. $T$ is the number of patches contained in an input image. $P(c \mid I)$ represents the probability of an image being labeled as normal or abnormal; similarly, $P(c \mid p_t)$ represents that of a patch. Additionally, in order to guarantee the image patch classification accuracy, the log operation is carried out on the probability ($\ln$ means the natural logarithm of a number). The final prediction is determined by the category with the larger probability.
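The CPEL step can be sketched as averaging log probabilities over patches (assuming Eq. (11) takes the mean-log form described in the text; the function name and the epsilon guard are assumptions). Averaging logs is equivalent to taking the geometric mean of the patch probabilities:

```python
import numpy as np

def cpel_predict(patch_probs):
    """Image-level prediction from patch-level class probabilities.

    `patch_probs` is a (T, 2) array: row t holds the patch probabilities
    for class 0 (normal) and class 1 (abnormal). Returns the class with
    the larger mean log probability."""
    eps = 1e-12  # guard against log(0)
    image_log_prob = np.log(patch_probs + eps).mean(axis=0)
    return int(np.argmax(image_log_prob))
```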

The whole process of our HCRF-AM framework is shown in Algorithm 2.

1:Input: the image set for the training and validation sets of the CNN with binary labels, D; the set of real label images for the abnormal images in D, G; the image set for the test set of the CNN, D_test;
2:Output: the probability of each image in D_test being labeled as normal or abnormal;
3:Divide D into an abnormal image set D_a and a normal image set D_n according to the binary labels;
4:for each image I in D do
5:     Divide I into patches and put them into the CNN;
6:     if I belongs to D_a then
7:         Get the real label image of I from G;
8:         Put I and its real label image into the AM module and get the segmentation result S;
9:         Divide S into patches and get the patch set P;
10:         for each patch p in P do
11:              if over 50% of the pixels in p are segmented as abnormal regions then
12:                  p is chosen as an attention region;
13:                  Put p into the CNN;
14:              end if
15:         end for
16:     end if
17:end for
18:Get the trained CNN model;
19:for each image I in D_test do
20:     Put I into the CNN model and get the patch-level classification results;
21:     Get the image-level classification result defined as Eq. (11);
22:end for
23:return the image-level classification results of D_test;
Algorithm 2 HCRF-AM framework

4 Experiment

4.1 Experimental Settings

4.1.1 Dataset

In this study, we use a publicly available Haematoxylin and Eosin (H&E) stained gastric histopathology image dataset to test the effectiveness of our HCRF-AM model zhang2018pathological , and some examples in the dataset are represented in Fig. 5.

Figure 5: Examples in the H&E stained gastric histopathological image dataset. The column a. presents the original images of normal tissues. The original images in column b. contain abnormal regions, and column c. shows the corresponding GT images of column b. In the GT images, the brighter regions are abnormal tissues with cancer cells, and the darker regions are normal tissues without cancer cells.

The images in our dataset are processed with H&E stains, which is essential for identifying the various tissue types in histopathological images. In typical tissue, nuclei are stained blue by haematoxylin, whereas the cytoplasm and extracellular matrix have varying degrees of pink staining due to eosin fischer2008hematoxylin . The images are magnified 20 times, and most of the abnormal regions are marked by practising histopathologists. The image format is '*.tiff' or '*.png' and the image size is 2048 × 2048 pixels. In the dataset, there are 140 normal images, 560 abnormal images and 560 Ground Truth (GT) images of the abnormal ones. In the normal images, cells are arranged regularly, the nucleo-cytoplasmic ratio is low, and a stable structure can be seen. By contrast, in the abnormal images, cancerous gastric tissue usually presents nuclear enlargement. Hyperchromasia without visible cell borders and prominent perinuclear vacuolization are also typical features miettinen2006gastrointestinal ; Miettinen2003Gastrointestinal . In the GT images, the cancer regions are labeled in the sections.

4.1.2 Experimental Design

The proposed HCRF-AM model consists of the AM module and the IC module, so we distribute the images in the dataset accordingly. The allocation is represented in Table 1.

Image type AM module IC module
Original normal images 0 140
Original abnormal images 280 280
Table 1: The images allocation for AM module and IC module.

In the AM module, 280 abnormal images and the corresponding GT images are used to train the HCRF model to acquire attention areas, and they are divided into training and validation sets with a ratio of 1:1 (the details are in Sec. 3.1). The AM module data setting is represented in Table 2.

Image type Train Validation Sum
Original abnormal images 140 140 280
Augmented abnormal images 53760 53760 107520
Table 2: The AM module data setting.

Before being sent into the model, we augment the training and validation sets sixfold. Furthermore, because cellular visual features in a histopathological image are always observed at patch scale by pathologists, we crop the original and the GT images into patches. Finally, we obtain 53760 training and 53760 validation images.
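The cropping and sixfold-augmentation step can be sketched as follows. The patch size is parameterized because the exact pixel dimensions are not preserved in this text, and the particular set of six transforms (the original plus rotations and flips) is an assumption rather than the paper's stated recipe:

```python
import numpy as np

def crop_into_patches(image, patch_size):
    """Mesh an H x W x C image into non-overlapping square patches."""
    h, w = image.shape[:2]
    patches = []
    for top in range(0, h - patch_size + 1, patch_size):
        for left in range(0, w - patch_size + 1, patch_size):
            patches.append(image[top:top + patch_size, left:left + patch_size])
    return patches

def augment_sixfold(patch):
    """Return six variants of a patch (identity, three rotations, two flips).
    The exact transforms used in the paper are not specified, so this
    particular choice is an illustrative assumption."""
    return [
        patch,
        np.rot90(patch, 1),
        np.rot90(patch, 2),
        np.rot90(patch, 3),
        np.fliplr(patch),
        np.flipud(patch),
    ]
```

Applied to each original and GT image pair, this yields (patches per image) x 6 samples, consistent with the counts in Table 2.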

In the IC module, the remaining 280 abnormal images and 140 normal images are used in the CNN classification part (detailed information is in Sec. 3.2). The IC module data setting is shown in Table 3.

Image type Train Validation Test Sum
Original normal images 35 35 70 140
Original abnormal images 35 35 210 280
Cropped normal images 2240 2240
Cropped abnormal images 2240 2240
Table 3: The IC module data setting.

Among them, 70 images from each class are randomly selected for the training and validation sets, and the test set contains 70 normal images and 210 abnormal images. Similarly, we mesh these images into patches. Thus, the initial dataset of the IC module comprises 2240 training and 2240 validation images from each category.

4.1.3 Evaluation Method

To evaluate our model, the accuracy, sensitivity, specificity, precision and F1-score metrics are used to measure the classification results. These five indicators are defined in Table 4.

Criterion Definition
Accuracy (TP+TN)/(TP+TN+FP+FN)
Sensitivity TP/(TP+FN)
Specificity TN/(TN+FP)
Precision TP/(TP+FP)
F1-score 2*Precision*Sensitivity/(Precision+Sensitivity)
Table 4: The five evaluation criteria and their corresponding definitions.

In this paper, the samples labeled as normal are positive samples, and the samples labeled as abnormal are negative samples. In the definition of these indicators, TP denotes the true positives, positive cases diagnosed as positive. TN denotes the true negatives, negative cases diagnosed as negative. FP denotes the false positives, negative cases diagnosed as positive, and FN denotes the false negatives, positive cases diagnosed as negative. The accuracy is the ratio of the number of samples correctly classified by the classifier to the total number of samples. The sensitivity reflects the proportion of positive samples that are correctly judged, and the specificity reflects the proportion of negative samples that are correctly judged. The precision reflects the proportion of samples judged positive by the classifier that are truly positive. The F1-score is an indicator that comprehensively considers the precision and sensitivity.
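The five criteria can be computed directly from the confusion-matrix counts, as in this minimal sketch:

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix metrics; here normal images are the
    positive class and abnormal images the negative class."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)           # recall of the positive class
    specificity = tn / (tn + fp)           # recall of the negative class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}
```

For instance, `classification_metrics(53, 203, 7, 17)` reproduces the HCRF-AM row of Table 8 to three decimals; these counts are back-computed from the reported rates under the 70 normal / 210 abnormal test split, an inference rather than figures stated in the paper.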

4.2 Baseline Classifier Selection

For the baseline, we compare the performance of different CNN-based classifiers and evaluate the effect of the Transfer Learning (TL) method on the initial dataset. We use the cropped images in Table 3 as the training and validation sets to build the networks, and the classification accuracy is obtained on the test set. The results are shown in Fig. 6.

Figure 6: Comparison between image classification performance of different CNN-Based Classifiers on test set.

From Fig. 6, it is observed that the VGG-16 TL method performs best, achieving an accuracy of 0.875, followed by the Resnet-50 and VGG-19 simonyan2014very networks. It can also be seen from Fig. 6 that training models from scratch (De-novo trained CNNs) performs significantly worse than every TL algorithm in terms of classification accuracy. Therefore, the VGG-16 TL method is finally selected as the baseline classifier.

4.3 Evaluation of AM Module

Based on our previous experiment sun2020gastric , we find that the HCRF model performs better than other state-of-the-art methods (DenseCRF chen2017deeplab , SegNet badrinarayanan2017segnet , and U-Net) and four classical methods (Level-Set osher1988fronts , Otsu thresholding otsu1979threshold , Watershed vincent1991watersheds , and MRF li1994markov ) when segmenting interesting regions and objects. A comparative analysis with existing work on our dataset is presented in Fig. 7. The state-of-the-art methods are all trained on the dataset in Table 2.

Figure 7: Comparison between HCRF and other attention area extraction methods on the test set ((a), (b) two typical examples of attention area extraction results using different methods).

The visual comparison shows that our HCRF method extracts attention areas better than the other existing methods: more cancer regions are correctly marked and less noise remains. The detailed evaluation indexes are shown in Table 5.

Criterion Our HCRF DenseCRF U-Net SegNet Level-Set Otsu thresholding Watershed k-means MRF
Dice 0.4629 0.4578 0.4557 0.2008 0.2845 0.2534 0.2613 0.2534 0.2396
IoU 0.3259 0.3212 0.3191 0.1300 0.1920 0.1505 0.1585 0.1506 0.1432
Precision 0.4158 0.4047 0.4004 0.3885 0.2949 0.2159 0.2930 0.2165 0.1839
Recall 0.6559 0.6889 0.6896 0.3171 0.5202 0.4277 0.3541 0.4284 0.4991
Specificity 0.8133 0.7812 0.7795 0.8412 0.7284 0.7082 0.7942 0.7078 0.5336
RVD 1.4135 1.6487 1.6736 2.0660 2.9296 2.8859 1.9434 2.8953 4.5878
Accuracy 0.7891 0.7702 0.7684 0.7531 0.6982 0.6598 0.7205 0.6593 0.5441
Table 5: A numerical comparison of the image segmentation performance between our HCRF model and other existing methods. The first row shows different methods. The first column shows the evaluation criteria. Dice is in the interval [0,1], and a perfect segmentation yields a Dice of 1. RVD is an asymmetric metric, and a lower RVD means a better segmentation result. IoU is a standard metric for segmentation purposes that computes a ratio between the intersection and the union of two sets, and a high IoU means a better segmentation result. The bold texts are the best performance for each criterion.

The classical methods have similar results: the extracted regions are scattered, and the abnormal areas cannot be separated. Except for recall and specificity, the proposed HCRF performs better than the state-of-the-art methods on all other indexes. Precision is more effective for evaluating the foreground segmentation result, while recall is more effective for evaluating the background segmentation result. Consequently, the HCRF model is suitable for extracting the attention regions, and it is chosen in the following experimental steps.
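The segmentation criteria reported in Table 5 can be computed from binary masks as in this sketch. RVD is taken here as the absolute relative volume difference with respect to the GT mask, one common asymmetric definition consistent with the table caption; the paper's exact formula is not reproduced in this text:

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """pred, gt: boolean foreground masks of the same shape.
    Dice and IoU lie in [0, 1] (higher is better); RVD is asymmetric
    with respect to gt and lower is better."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dice = 2 * inter / (pred.sum() + gt.sum())
    iou = inter / union
    rvd = abs(int(pred.sum()) - int(gt.sum())) / gt.sum()  # assumed form
    return dice, iou, rvd
```

A perfect segmentation yields Dice = IoU = 1 and RVD = 0, matching the caption's reading of Table 5.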

In addition, the excellent performance of our HCRF model is also verified by third-party experiments kurmi2020content . In their experiments, HCRF and other state-of-the-art methods (BFC zafari2015segmentation , SAM wang2016semi , FRFCM lei2018significantly , MDRAN vu2019methods , LVMAC peng2019local , PABVS yu2020pyramid , FCMRG sheela2020morphological ) are used for nuclei segmentation, and our HCRF model performs well, second only to the method proposed specifically for their task.

4.4 Evaluation of HCRF-AM Model

Based on the experimental results in Sec. 4.2, we choose VGG-16 as the classifier in the IC module. First, the training and validation sets in Table 3, together with their attention areas, are used to train the VGG-16 network with a Transfer Learning (TL) strategy. The validation set is applied to tune the CNN parameters and to avoid overfitting or underfitting during training. Second, the images in the test sets are cropped into patches and sent into the trained network to obtain patch prediction probabilities. Third, the CPEL method is applied to acquire the final label of each image. Finally, we evaluate the classification performance against the true labels. In the first step, we use a grid search (step size 0.1) to find the best value for the proportion-of-attention-area threshold: if the attention area takes up more than this proportion of a patch, the patch is chosen as a final attention patch. A detailed comparison between the baseline's and the HCRF-AM model's classification accuracy, with the corresponding confusion matrices, is shown in Fig. 8 and Fig. 9. For the VGG-16 network, the hyperparameter settings are shown in Table 6.


Hyper-parameter VGG-16
Initial input size
Initial learning rate 0.0001
Batch-size 64
Loss function Categorical Cross-Entropy
Optimizer Adam kingma2014adam
Table 6: The parameter settings for TL networks.
Figure 8: Image classification results on the validation and test sets. The confusion matrices present the classification results of the baseline method and our HCRF-AM method, respectively.
Figure 9: Comparison between image classification accuracy of proposed HCRF-AM model and baseline on test sets.

Fig. 8 illustrates that our proposed method achieves higher accuracy for each image type. From Fig. 9, we find that all evaluation indexes of our HCRF-AM model are higher than those of the baseline model. The results show that although the test set has 280 images, four times the number in the training and validation sets (seven times for the abnormal images), our proposed HCRF-AM model still provides good classification performance (especially the classification accuracy of abnormal images), showing the high stability and strong robustness of our method. Moreover, Sec. 4.3 verified that the HCRF model achieves better attention region extraction performance, using the GT images as the standard. A numerical comparison of the final classification results between our HCRF method and other existing methods used for attention extraction on the test set is given in Table 7. It indicates that the HCRF model performs best on all indexes in terms of the final classification performance.

Criterion HCRF Level-Set DenseCRF U-Net Watershed MRF Otsu SegNet
Accuracy 0.914 0.896 0.893 0.893 0.886 0.882 0.896 0.879
Sensitivity 0.757 0.714 0.686 0.729 0.700 0.643 0.729 0.657
Specificity 0.967 0.957 0.962 0.948 0.948 0.962 0.952 0.952
Precision 0.883 0.847 0.857 0.823 0.817 0.849 0.836 0.821
F1-score 0.815 0.775 0.762 0.773 0.754 0.732 0.779 0.730
Table 7: Numerical comparison of classification results between different attention extraction methods.
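The CPEL step, which aggregates the patch-level CNN probabilities into an image-level label, is not fully specified in this section; one plausible probability-based ensemble, shown here as an illustrative sketch rather than the paper's exact weighting, averages the patch probabilities and takes the arg-max class:

```python
import numpy as np

def ensemble_image_label(patch_probs):
    """patch_probs: (n_patches, n_classes) array of per-patch softmax
    outputs from the CNN. Average the probabilities over patches and
    return the arg-max class plus the mean probability vector.
    (Illustrative reading of CPEL, not the authors' exact algorithm.)"""
    mean_probs = np.asarray(patch_probs, dtype=float).mean(axis=0)
    return int(mean_probs.argmax()), mean_probs
```

Because every patch contributes its full probability vector rather than a hard vote, a few highly confident abnormal patches can still tip the image-level decision.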

4.5 Comparison to Existing Methods

4.5.1 Existing Methods

In order to show the potential of the proposed HCRF-AM method for the GHIC task, it is compared with four existing AM methods: Squeeze-and-Excitation Networks (SENet) hu2018squeeze , the Convolutional Block Attention Module (CBAM) woo2018cbam , Non-local neural networks (Non-local) wang2018non and the Global Context Network (GCNet) cao2019gcnet . VGG-16 has a great number of parameters and is hard to converge, especially when integrated with other blocks simonyan2014very ioffe2015batch . In our experiments, we also find it tricky to train VGG-16 from scratch. Meanwhile, these AMs have been extensively applied to Resnet, which is popular with researchers hammad2020resnet roy2020attention . Therefore, we combine the existing attention methods with Resnet in our contrast experiment in most cases. The experimental settings of these existing methods are briefly introduced as follows: (1) SE blocks are integrated into a simple CNN with a convolution kernel of pixels. (2) CBAM is incorporated into Resnet v2 with 11 layers. (3) Non-local is applied to all residual blocks in Resnet with 34 layers. (4) GC blocks are integrated into Resnet v1 with 14 layers. They are all trained on the dataset in Table 3 and the input data size is pixels.

4.5.2 Image Classification Result Comparison

According to the experimental design in Sec. 4.5.1, we obtain the experimental results in Table 8.

Ref. Method Accuracy Sensitivity Specificity
hu2018squeeze SENet+CNN 0.754 0.429 0.862
woo2018cbam CBAM+Resnet 0.393 0.843 0.243
wang2018non Non-local+Resnet 0.725 0.571 0.776
cao2019gcnet GCNet+Resnet 0.741 0.557 0.811
Our method HCRF-AM 0.914 0.757 0.967
Table 8: A comparison of the image classification results of our HCRF-AM model and other existing methods on the test set.

Table 8 indicates that: (1) Compared to the four state-of-the-art methods, the proposed HCRF-AM performs better on all indexes except sensitivity. The overall accuracy of most methods is around 70%, apparently lower than ours. (2) The sensitivity of HCRF-AM is second only to CBAM-Resnet, whose other two indicators are far lower than ours. In practical diagnosis, the specificity, which reflects the correct judgement of abnormal cases, is of particular importance. (3) The sensitivity and specificity of the SE blocks and GC blocks differ widely, by around 30%. This suggests that their prediction strategy is out of balance (see further discussion in Sec. 5.3).

4.6 Computational Time

In our experiment, we use a workstation with an Intel Core i7-8700k CPU at 3.20 GHz, 32 GB RAM and a GeForce RTX 2080 with 8 GB. The training time of our model covers two modules: the AM module takes about 50 h for training 280 images, and the IC module takes 1431 s for training 140 images. The mean testing time of the proposed HCRF-AM model is 0.5 s per image. This suggests that although our model has high computational complexity during the offline training stage, it is very efficient for online testing and suitable for the routine clinical workflow.

5 Discussion

5.1 Analysis of the Dataset

The experimental results in Sec. 4.3 suggest that notable mis-segmentation problems remain in our AM module. Hence, we invite our cooperative histopathologists to analyze this open-source dataset and determine the reasons for mis-segmentation.

According to Fig. 10 and our cooperative histopathologists' medical knowledge, when there are too many abnormal regions in a gastric histopathological image, the medical doctors draw the GT images roughly, so that not all abnormal regions are marked, as shown in Fig. 10 (b). This low-quality annotation leaves some abnormal regions labelled as normal, resulting in attention extraction errors.

Figure 10: Typical examples of images in our dataset for analysis. (a) presents the original images. (b) denotes the GT images. The regions inside the red curves in (b) are the abnormal regions redrawn in the GT images by our cooperative histopathologists. The red regions of (c) show the attention extraction results of the AM module.

From Fig. 10 (b) and (c), it can be seen that our HCRF model in the AM module may extract the attention regions correctly while the original GT images miss the corresponding regions due to their coarse labels. We consulted our cooperative histopathologists and used red curves to redraw the regions that the original dataset did not label; it is evident that the original GT images in the applied dataset miss quite a lot of abnormal regions. Clearly, the foreground of our segmentation result is closer to the redrawn GT images than to the original GT images, which leads to a low IoU and a high RVD.

5.2 Mis-classification Analysis

To analyze the causes of mis-classification, some examples are given in Fig. 11.

Figure 11: Examples of mis-classification. The row (a) presents the normal cases diagnosed as abnormal (FN). The row (b) presents the abnormal cases diagnosed as normal (FP).

For the FN samples in Fig. 11(a), larger bleeding spots can be found in some normal samples, leading to misdiagnosis. Some images have many bright areas in the field of view, which may be caused by lying at the edge of the whole slice; these bright areas cannot provide effective information. For the FP samples in Fig. 11(b), the cancer areas in some abnormal samples are small and scattered, so they receive insufficient attention during classification. Simultaneously, in some samples the staining of the two stains is not uniform or sufficient. In some images, the diseased areas appear atypical, which increases the difficulty of classification.

5.3 Analysis of the Existing Attention Mechanisms

Recently, Attention Mechanisms (AMs) have drawn great attention from scholars and have been extensively applied to practical problems in various fields. For example, the non-local network models long-range dependencies in a single layer via a self-AM wang2018non . However, as the area of the receptive field increases, the computation cost grows with it. Such AMs, which have large memory requirements, are not suitable for GHIC tasks because histopathology images are always pixels or even larger. Unlike natural images, histopathological images cannot safely be resized, which risks losing many essential details and textures. Another limitation is the lack of pretrained models when we utilize TL methods with these existing AMs. Training from scratch makes a neural network tricky to converge, let alone achieve good performance mishkin2015all . Some methods have been proposed to mitigate this problem, such as the SE block. The feature importance induced by SE blocks, used for feature recalibration, highlights informative features and suppresses less useful ones, which reduces computation complexity. Moreover, the design of the SE block is simple, and it can be used directly with existing state-of-the-art Deep Learning (DL) architectures hu2018squeeze .
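The SE-block recalibration described above can be illustrated with a minimal PyTorch sketch. The reduction ratio of 16 is the default from hu2018squeeze ; this is an illustrative reimplementation, not the code used in the compared experiments:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global-average-pool each channel (squeeze),
    pass the channel descriptor through a two-layer bottleneck MLP
    (excitation), then rescale the feature map channel-wise."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        # per-channel weights in (0, 1), broadcast over spatial dimensions
        w = self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)
        return x * w
```

Because the block only adds two small linear layers per stage, it can be dropped into an existing backbone (e.g. after each residual block) with little overhead.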

6 Conclusion and Future Work

In this paper, we develop a novel approach for GHIC using an HCRF-based AM. Through experiments, we choose high-performance methods and networks for the AM and IC modules of the HCRF-AM model. In the evaluation process, the proposed HCRF method outperforms the state-of-the-art attention area extraction methods, showing the robustness and potential of our method. Finally, our method achieves a classification accuracy of 91.4% and a specificity of 96.7% on the test images. We have compared our proposed method with some existing popular AM methods that use the same dataset to further verify its performance. Considering the advantages mentioned above, the HCRF-AM model holds the potential to be employed in a human-machine collaboration pattern for early diagnosis of gastric cancer, which may help increase the productivity of pathologists. In the discussion part, the possible causes of mis-classification in the experiment are analyzed, which provides a reference for improving the performance of the model.

Though our method provides satisfactory performance, there are a few limitations. First, our proposed HCRF model in the AM module only considers information at a single scale, which degrades model performance. Moreover, our model can be further improved by the technique shown in Zormpas2019Superpixel , where pathologists incorporate large-scale tissue architecture and context across spatial scales to improve single-cell classification. Second, we have investigated four kinds of DL models, using TL methods and integrating the AM into them. In the future, we can investigate other DL models and compare their results for higher classification accuracy. Finally, our AM is currently a weakly supervised system. Hence, the learning method of dosovitskiy2020image , which applies a pure transformer directly to sequences of image patches and performs well on natural image classification tasks, may offer a useful reference for our work.

This study was supported by the National Natural Science Foundation of China (grant No. 61806047). We thank Miss Xiran Wu, whose contribution is considered as important as that of the first author. We also thank Miss Zixian Li and Mr. Guoxian Li for their important discussions.


  • [1] B. Stewart and C. Wild, editors. World Cancer Report 2014. World Health Organization, Geneva, Switzerland, 2014.
  • [2] F. Bray, J. Ferlay, I. Soerjomataram, R. Siegel, L. Torre, and A. Jemal. Global Cancer Statistics 2018: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA: A Cancer Journal for Clinicians, 68(6):394–424, 2018.
  • [3] M. Orditura, G. Galizia, V. Sforza, V. Gambardella, A. Fabozzi, M. Laterza, F. Andreozzi, J. Ventriglia, B. Savastano, A. Mabilia, et al. Treatment of Gastric Cancer. World Journal of Gastroenterology: WJG, 20(7):1635, 2014.
  • [4] E. Van Cutsem, X. Sagaert, B. Topal, K. Haustermans, and H. Prenen. Gastric Cancer. The Lancet, 388(10060):2654–2664, 2016.
  • [5] T. Elsheikh, M. Austin, D. Chhieng, F. Miller, A. Moriarty, and A. Renshaw. American Society of Cytopathology Workload Recommendations for Automated Pap Test Screening: Developed by the Productivity and Quality Assurance in the Era of Automated Screening Task Force. Diagnostic Cytopathology, 41(2):174–178, 2013.
  • [6] H. Wang, H. Jia, L. Lu, and Y. Xia. Thorax-Net: An Attention Regularized Deep Neural Network for Classification of Thoracic Diseases on Chest Radiography. IEEE Journal of Biomedical and Health Informatics, 24(2):475–485, 2019.
  • [7] L. Li, M. Xu, X. Wang, L. Jiang, and H. Liu. Attention Based Glaucoma Detection: A large-scale Database and CNN Model. In Proc. of CVPR 2019, pages 10571–10580, 2019.
  • [8] C. Sun, C. Li, J. Zhang, M. Rahaman, S. Ai, H. Chen, F. Kulwa, Y. Li, X. Li, and T. Jiang. Gastric Histopathology Image Segmentation Using a Hierarchical Conditional Random Field. Biocybernetics and Biomedical Engineering, 40(4):1535–1555, 2020.
  • [9] C. Sun, C. Li, J. Zhang, F. Kulwa, and X. Li. Hierarchical Conditional Random Field Model for Multi-object Segmentation in Gastric Histopathology Images. Electronics Letters, 56(15):750–753, 2020.
  • [10] R. Zhu, R. Zhang, and D. Xue. Lesion Detection of Endoscopy Images Based on Convolutional Neural Network Features. In 2015 8th International Congress on Image and Signal Processing (CISP), pages 372–376, 2015.
  • [11] K. Ishihara, T. Ogawa, and M. Haseyama. Detection of Gastric Cancer Risk from X-ray Images via Patch-based Convolutional Neural Network. In 2017 IEEE International Conference on Image Processing (ICIP), pages 2055–2059, 2017.
  • [12] R. Li, J. Li, X. Wang, P. Liang, and J. Gao. Detection of Gastric Cancer and its Histological Type based on Iodine Concentration in Spectral CT. Cancer Imaging, 18(1):1–10, 2018.
  • [13] J. Li, W. Li, A. Sisk, H. Ye, W. Wallace, W. Speier, and C. Arnold. A Multi-resolution Model for Histopathology Image Classification and Localization with Multiple Instance Learning. arXiv Preprint arXiv:2011.02679, 2020.
  • [14] S. Korkmaz, A. Akçiçek, H. Bínol, and M. Korkmaz. Recognition of the Stomach Cancer Images with Probabilistic HOG Feature Vector Histograms by Using HOG Features. In 2017 IEEE 15th International Symposium on Intelligent Systems and Informatics (SISY), pages 000339–000342, 2017.
  • [15] S. Korkmaz and H. Binol. Classification of Molecular Structure Images by Using ANN, RF, LBP, HOG, and Size Reduction Methods for Early Stomach Cancer Detection. Journal of Molecular Structure, 1156:255–263, 2018.
  • [16] H. Sharma, N. Zerbe, I. Klempert, S. Lohmann, B. Lindequist, O. Hellwich, and P. Hufnagl. Appearance-based Necrosis Detection Using Textural Features and SVM with Discriminative Thresholding in Histopathological Whole Slide Images. In 2015 IEEE 15th International Conference on Bioinformatics and Bioengineering (BIBE), pages 1–6, 2015.
  • [17] B. Liu, M. Zhang, T. Guo, and Y. Cheng. Classification of Gastric Slices Based on Deep Learning and Sparse Representation. In 2018 Chinese Control And Decision Conference (CCDC), pages 1825–1829, 2018.
  • [18] H. Sharma, N. Zerbe, C. Böger, S. Wienert, O. Hellwich, and P. Hufnagl. A Comparative Study of Cell Nuclei Attributed Relational Graphs for Knowledge Description and Categorization in Histopathological Gastric Cancer Whole Slide Images. In 2017 IEEE 30th International Symposium on Computer-Based Medical Systems (CBMS), pages 61–66, 2017.
  • [19] H. Sharma, N. Zerbe, I. Klempert, O. Hellwich, and P. Hufnagl. Deep Convolutional Neural Networks for Automatic Classification of Gastric Carcinoma Using Whole Slide Images in Digital Histopathology. Computerized Medical Imaging and Graphics, 61:2–13, 2017.
  • [20] B. Liu, K. Yao, M. Huang, J. Zhang, Y. Li, and R. Li. Gastric Pathology Image Recognition Based on Deep Residual Networks. In 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC), volume 2, pages 408–412, 2018.
  • [21] S. Wang, Y. Zhu, L. Yu, H. Chen, H. Lin, X. Wan, X. Fan, and P. Heng. RMDL: Recalibrated Multi-instance Deep Learning for Whole Slide Gastric Image Classification. Medical Image Analysis, 58:101549, 2019.
  • [22] Z. Song, S. Zou, W. Zhou, Y. Huang, L. Shao, J. Yuan, X. Gou, W. Jin, Z. Wang, X. Chen, et al. Clinically Applicable Histopathological Diagnosis System for Gastric Cancer Detection Using Deep Learning. Nature Communications, 11(1):1–9, 2020.
  • [23] S. Kosaraju, J. Hao, H. Koh, and M. Kang. Deep-Hipo: Multi-scale Receptive Field Deep Learning for Histopathological Image Analysis. Methods, 179:3–13, 2020.
  • [24] O. Iizuka, F. Kanavati, K. Kato, M. Rambeau, K. Arihiro, and M. Tsuneki. Deep Learning Models for Histopathological Classification of Gastric and Colonic Epithelial Tumours. Scientific Reports, 10(1):1–11, 2020.
  • [25] J. Ba, V. Mnih, and K. Kavukcuoglu. Multiple Object Recognition with Visual Attention. arXiv preprint arXiv:1412.7755, 2014.
  • [26] W. Li, K. Liu, L. Zhang, and F. Cheng. Object Detection Based on an Adaptive Attention Mechanism. Scientific Reports, 10(1):1–13, 2020.
  • [27] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhudinov, R. Zemel, and Y. Bengio. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. In International conference on machine learning, pages 2048–2057, 2015.
  • [28] M. Liu, L. Li, H. Hu, W. Guan, and J. Tian. Image Caption Generation with Dual Attention Mechanism. Information Processing & Management, 57(2):102178, 2020.
  • [29] S. Sharma, R. Kiros, and R. Salakhutdinov. Action recognition using visual attention. arXiv preprint arXiv:1511.04119, 2015.
  • [30] A. BenTaieb and G. Hamarneh.

    Predicting Cancer with a Recurrent Visual Attention Model for Histopathology Images.

    In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 129–137, 2018.
  • [31] L. Li, M. Xu, H. Liu, Y. Li, X. Wang, L. Jiang, Z. Wang, X. Fan, and N. Wang. A Large-Scale Database and a CNN Model for Attention-Based Glaucoma Detection. IEEE Transactions on Medical Imaging, 39(2):413–424, 2019.
  • [32] H. Yang, J. Kim, H. Kim, and S. Adhikari. Guided Soft Attention Network for Classification of Breast Cancer Histopathology Images. IEEE Transactions on Medical Imaging, 39(5):1306–1315, 2019.
  • [33] H. Sun, X. Zeng, T. Xu, G. Peng, and Y. Ma. Computer-aided Diagnosis in Histopathological Images of the Endometrium Using a Convolutional Neural Network and Attention Mechanisms. IEEE Journal of Biomedical and Health Informatics, 24(6):1664–1676, 2019.
  • [34] X. Zhang, Y. Jiang, H. Peng, K. Tu, and D. Goldwasser.

    Semi-Supervised Structured Prediction with Neural CRF Autoencoder.

    In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1701–1711, 2017.
  • [35] A. Wicaksono and S. Myaeng. Toward Advice Mining: Conditional Random Fields for Extracting Advice-Revealing Text Units. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pages 2039–2048, 2013.
  • [36] L. Zhuowen and K. Wang. Human Behavior Recognition Based on Fractal Conditional Random Field. In 2013 25th Chinese Control and Decision Conference (CCDC), pages 1506–1510, 2013.
  • [37] S. Kruthiventi and R. Babu. Crowd Flow Segmentation in Compressed Domain Using CRF. In 2015 IEEE International Conference on Image Processing (ICIP), pages 3417–3421, 2015.
  • [38] D. Liliana and C. Basaruddin. A Review on Conditional Random Fields as a Sequential Classifier in Machine Learning. In 2017 International Conference on Electrical Engineering and Computer Science (ICECOS), pages 143–148, 2017.
  • [39] H. Qu, P. Wu, Q. Huang, J. Yi, G. Riedlinger, S. De, and D. Metaxas. Weakly Supervised Deep Nuclei Segmentation Using Points Annotation in Histopathology Images. In International Conference on Medical Imaging with Deep Learning, pages 390–400, 2019.
  • [40] Z. Konstantinos, F. Henrik, Raza S., R. Ioannis, J. Yann, and Y. Yinyin. Superpixel-based Conditional Random Fields (SuperCRF): Incorporating Global and Local Context for Enhanced Deep Learning in Melanoma Histopathology. Frontiers in Oncology, 9:1045, 2019.
  • [41] Y. Li, M. Huang, Y. Zhang, J. Chen, H. Xu, G. Wang, and W. Feng. Automated Gleason Grading and Gleason Pattern Region Segmentation based on Deep Learning for Pathological Images of Prostate Cancer. IEEE Access, 8:117714–117725, 2020.
  • [42] J. Dong, X. Guo, and G. Wang. GECNN-CRF for Prostate Cancer Detection with WSI. In Proceedings of 2020 Chinese Intelligent Systems Conference, pages 646–658, 2021.
  • [43] S. Kosov, K. Shirahama, C. Li, and M. Grzegorzek. Environmental Microorganism Classification Using Conditional Random Fields and Deep Convolutional Neural Networks. Pattern Recognition, 77:248–261, 2018.
  • [44] C. Li, H. Chen, L. Zhang, N. Xu, D. Xue, Z. Hu, H. Ma, and H. Sun. Cervical Histopathology Image Classification Using Multilayer Hidden Conditional Random Fields and Weakly Supervised Learning. IEEE Access, 7:90378–90397, 2019.
  • [45] Y. Li, X. Wu, C. Li, C. Sun, X. Li, M. Rahaman, and H. Zhang. Intelligent Gastric Histopathology Image Classification Using Hierarchical Conditional Random Field based Attention Mechanism. In Proceedings of the 2021 13th International Conference on Machine Learning and Computing, 2021.
  • [46] C. Li, Y. Li, C. Sun, H. Chen, and H. Zhang. A Comprehensive Review for MRF and CRF Approaches in Pathology Image Analysis. arXiv preprint arXiv:2009.13721, 2020.
  • [47] J. Lafferty, A. McCallum, and F. Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proceedings of the Eighteenth International Conference on Machine Learning (ICML), pages 282–289, 2001.
  • [48] P. Clifford. Markov Random Fields in Statistics; Disorder in Physical Systems: A Volume in Honour of John M. Hammersley. Oxford University Press, 19:32, 1990.
  • [49] L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. Yuille. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4):834–848, 2018.
  • [50] S. Zheng, S. Jayasumana, B. Romera-Paredes, et al. Conditional Random Fields as Recurrent Neural Networks. In Proc. of ICCV 2015, pages 1–17, 2015.
  • [51] R. Gupta. Conditional Random Fields. Unpublished Report, IIT Bombay, 2006.
  • [52] O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional Networks for Biomedical Image Segmentation. In Proc. of MICCAI 2015, pages 234–241, 2015.
  • [53] K. Simonyan and A. Zisserman. Very Deep Convolutional Networks for Large-scale Image Recognition. arXiv:1409.1556, 2014.
  • [54] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2818–2826, 2016.
  • [55] K. He, X. Zhang, S. Ren, and J. Sun. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
  • [56] S. Kumar and M. Hebert. Discriminative Random Fields. International Journal of Computer Vision, 68(2):179–201, 2006.
  • [57] C. Li, H. Chen, D. Xue, Z. Hu, L. Zhang, L. He, N. Xu, S. Qi, H. Ma, and H. Sun. Weakly Supervised Cervical Histopathological Image Classification Using Multilayer Hidden Conditional Random Fields. In Proc. of ITIB 2019, pages 209–221, 2019.
  • [58] D. Kermany, M. Goldbaum, W. Cai, C. Valentim, H. Liang, S. Baxter, A. McKeown, G. Yang, X. Wu, F. Yan, et al. Identifying Medical Diagnoses and Treatable Diseases by Image-based Deep Learning. Cell, 172(5):1122–1131, 2018.
  • [59] S. Deng, X. Zhang, Y. Qin, W. Chen, H. Fan, X. Feng, J. Wang, R. Yan, Y. Zhao, Y. Cheng, et al. miRNA-192 and -215 Activate Wnt/β-catenin Signaling Pathway in Gastric Cancer via APC. Journal of Cellular Physiology, 235(9):6218–6229, 2020.
  • [60] M. Wang, Y. Yu, F. Liu, L. Ren, Q. Zhang, and G. Zou. Single Polydiacetylene Microtube Waveguide Platform for Discriminating microRNA-215 Expression Levels in Clinical Gastric Cancerous, Paracancerous and Normal Tissues. Talanta, 188:27–34, 2018.
  • [61] T. Kamishima, M. Hamasaki, and S. Akaho. TrBagg: A Simple Transfer Learning Method and its Application to Personalization in Collaborative Tagging. In 2009 Ninth IEEE International Conference on Data Mining, pages 219–228, 2009.
  • [62] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-scale Hierarchical Image Database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
  • [63] J. Kittler, M. Hatef, R. Duin, and J. Matas. On Combining Classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3):226–239, 1998.
  • [64] Z. Zhang and C. Lin. Pathological Image Classification of Gastric Cancer Based on Depth Learning. ACM Trans. Intell. Syst. Technol., 45(11A):263–268, 2018.
  • [65] A. Fischer, K. Jacobson, J. Rose, and R. Zeller. Hematoxylin and Eosin Staining of Tissue and Cell Sections. Cold Spring Harbor Protocols, 2008(5):pdb.prot4986, 2008.
  • [66] M. Miettinen and J. Lasota. Gastrointestinal Stromal Tumors: Review on Morphology, Molecular Pathology, Prognosis, and Differential Diagnosis. Archives of Pathology & Laboratory Medicine, 130(10):1466–1478, 2006.
  • [67] M. Miettinen. Gastrointestinal Stromal Tumors (GISTs): Definition, Occurrence, Pathology, Differential Diagnosis and Molecular Genetics. Polish Journal of Pathology, 54, 2003.
  • [68] L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. Yuille. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4):834–848, 2018.
  • [69] V. Badrinarayanan, A. Kendall, and R. Cipolla. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12):2481–2495, 2017.
  • [70] S. Osher and J. Sethian. Fronts Propagating with Curvature-dependent Speed: Algorithms Based on Hamilton-Jacobi Formulations. Journal of Computational Physics, 79(1):12–49, 1988.
  • [71] N. Otsu. A Threshold Selection Method from Gray-level Histograms. IEEE Transactions on Systems, Man, and Cybernetics, 9(1):62–66, 1979.
  • [72] L. Vincent and P. Soille. Watersheds in Digital Spaces: An Efficient Algorithm Based on Immersion Simulations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(6):583–598, 1991.
  • [73] S. Li. Markov Random Field Models in Computer Vision. In Proc. of ECCV 1994, pages 361–370, 1994.
  • [74] Y. Kurmi and V. Chaurasia. Content-based Image Retrieval Algorithm for Nuclei Segmentation in Histopathology Images. Multimedia Tools and Applications, pages 1–21, 2020.
  • [75] S. Zafari, T. Eerola, J. Sampo, H. Kälviäinen, and H. Haario. Segmentation of Overlapping Elliptical Objects in Silhouette Images. IEEE Transactions on Image Processing, 24(12):5942–5952, 2015.
  • [76] Z. Wang. A Semi-automatic Method for Robust and Efficient Identification of Neighboring Muscle Cells. Pattern Recognition, 53:300–312, 2016.
  • [77] T. Lei, X. Jia, Y. Zhang, L. He, H. Meng, and A. Nandi. Significantly Fast and Robust Fuzzy c-means Clustering Algorithm Based on Morphological Reconstruction and Membership Filtering. IEEE Transactions on Fuzzy Systems, 26(5):3027–3041, 2018.
  • [78] Q. Vu, S. Graham, T. Kurc, M. To, M. Shaban, T. Qaiser, N. Koohbanani, S. Khurram, J. Kalpathy-Cramer, T. Zhao, et al. Methods for Segmentation and Classification of Digital Microscopy Tissue Images. Frontiers in Bioengineering and Biotechnology, 7:53, 2019.
  • [79] Y. Peng, S. Liu, Y. Qiang, X. Wu, and L. Hong. A Local Mean and Variance Active Contour Model for Biomedical Image Segmentation. Journal of Computational Science, 33:11–19, 2019.
  • [80] C. Yu, Y. Yan, S. Zhao, and Y. Zhang. Pyramid Feature Adaptation for Semi-supervised Cardiac Bi-ventricle Segmentation. Computerized Medical Imaging and Graphics, 81:101697, 2020.
  • [81] C. Sheela and G. Suganthi. Morphological Edge Detection and Brain Tumor Segmentation in Magnetic Resonance (MR) Images Based on Region Growing and Performance Evaluation of Modified Fuzzy C-Means (FCM) Algorithm. Multimedia Tools and Applications, pages 1–14, 2020.
  • [82] D. Kingma and J. Ba. Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980, 2014.
  • [83] J. Hu, L. Shen, and G. Sun. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7132–7141, 2018.
  • [84] S. Woo, J. Park, J. Lee, and I. Kweon. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), pages 3–19, 2018.
  • [85] X. Wang, R. Girshick, A. Gupta, and K. He. Non-local Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7794–7803, 2018.
  • [86] Y. Cao, J. Xu, S. Lin, F. Wei, and H. Hu. GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2019.
  • [87] S. Ioffe and C. Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In International Conference on Machine Learning, pages 448–456, 2015.
  • [88] M. Hammad, P. Pławiak, K. Wang, and U. Acharya. ResNet-Attention Model for Human Authentication Using ECG Signals. Expert Systems, page e12547, 2020.
  • [89] S. Roy, S. Manna, T. Song, and L. Bruzzone. Attention-Based Adaptive Spectral-Spatial Kernel ResNet for Hyperspectral Image Classification. IEEE Transactions on Geoscience and Remote Sensing, 2020.
  • [90] D. Mishkin and J. Matas. All You Need is a Good Init. arXiv preprint arXiv:1511.06422, 2015.
  • [91] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv preprint arXiv:2010.11929, 2020.