Gleason Grading of Histology Prostate Images through Semantic Segmentation via Residual U-Net

by Amartya Kalapahar, et al.

Worldwide, prostate cancer is one of the main cancers affecting men. The final diagnosis of prostate cancer is based on the visual detection of Gleason patterns in prostate biopsies by pathologists. Computer-aided-diagnosis systems make it possible to delineate and classify the cancerous patterns in the tissue via computer-vision algorithms in order to support the physicians' task. The methodological core of this work is a U-Net convolutional neural network for image segmentation, modified with residual blocks, that is able to segment cancerous tissue according to the full Gleason system. This model outperforms other well-known architectures and reaches a pixel-level Cohen's quadratic Kappa of 0.52, at the level of previous image-level works in the literature, while also providing a detailed localisation of the patterns.




1 Introduction

Prostate cancer is the second most common cancer in men [Ferlay2015Cancer2012] and new cases account for of all new cancer diagnoses in men each year [Siegel2016Cancer2016]. The Gleason grading system is widely accepted as part of the standard protocol for determining the severity of the cancer, and it is related to the growth pattern of tumour glands [gleason]. The system consists of three grades, from 3 to 5, each of them including clusters of glandular patterns (or Gleason patterns, referred to in this paper also as GP) with similar prognosis (see Fig. 5). In clinical practice, pathologists analyse the stained prostate biopsies by visual inspection in order to detect cancerous patterns. Evaluating every single sample manually is a very time-consuming and subjective task [Litjens2016DeepDiagnosis]. For this reason, the use of computer-aided-diagnosis tools based on computer-vision algorithms has grown in this field in recent years.

Figure 5: Examples of histology prostate regions. (a): Non-cancerous glands, (b): Gleason pattern 3, (c): Gleason pattern 4 and (d): Gleason pattern 5.

Previous works in the literature have followed different strategies to analyse prostate biopsy images. There are three main approaches to processing prostate histology images: image-level predictions [Arvaniti2018AutomatedLearning, Nir2019ComparisonImages], pixel-level segmentation [Li2019PathImages, Ing2018SemanticNetworks, Li2017AProstatectomies] and gland-level analysis [Garcia2019First-stageLearning]. Image-level processing provides a general cancerous pattern for a region but lacks a precise delineation of the structures in the tissue, while gland-level analysis is limited to cancerous patterns with glandular structures (i.e. Gleason pattern 3 or 4). In contrast, pixel-level semantic segmentation can work over all the different cancerous patterns and provides a precise delimitation of them in the tissue. Nevertheless, one limiting factor in prior works on semantic segmentation of Gleason patterns is the low prevalence of Gleason pattern 5. Although image-level studies have achieved a full gradation of Gleason patterns [Arvaniti2018AutomatedLearning], deep-learning models for semantic segmentation usually require larger amounts of data. This has led to the segmentation of Gleason patterns 4 and 5 in a unified class, using low grade (Gleason pattern 3) or high grade (Gleason pattern 4 or 5). The main state-of-the-art convolutional neural networks (CNNs) have been used for this task. In particular, a multi-resolution modification of the U-Net architecture is proposed in [Li2017AProstatectomies], while the architectures proposed in [Ing2018SemanticNetworks] are the Fully-Convolutional Networks and the SegNet. Finally, a recent work proposed Region-CNNs for both segmentation and detection of glandular structures [Li2019PathImages].

In this work, we explore the automatic detection and grading of prostate tumour growth patterns by means of semantic segmentation of the Gleason grades in histology images. To the best of the authors’ knowledge, this is the first time that automatic deep-learning segmentation models are used for the full gradation of cancerous patterns in prostate biopsies. One of the main contributions of this work is the validation of different well-known architectures for this task. In particular, we compare the performance of the Fully-Convolutional Networks, the SegNet and the U-Net in the validation cohort. Furthermore, we propose a modification of the U-Net architecture based on residual blocks, which outperforms the previously mentioned architectures. The ability of this model to distinguish between Gleason grades is comparable to that of previous image-level approaches in the literature, and it additionally offers an accurate delimitation of the different patterns.

2 Materials

The database used in this work is composed of prostate biopsies from patients. Whole Slide Images were obtained by staining and digitising the biopsies at magnification. The images were carefully analysed by a group of pathologists from Hospital Clínico of Valencia, and pixel-level annotations were carried out following the Gleason grading system. In order to process the large Whole Slide Images, they were re-sampled to resolution, and sliding-window patches of pixels and of overlap were obtained. For each image, a mask was extracted with the pixel-level semantic group among background (BG), non-cancerous tissue (NC), Gleason pattern 3 (GP3), Gleason pattern 4 (GP4) or Gleason pattern 5 (GP5). Thus, the database is composed of images with their respective semantic masks.
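The sliding-window extraction described above can be sketched as follows. This is only an illustration: the function name `extract_patches`, the patch size and the overlap are hypothetical values chosen for the example, not the settings used to build the database, and border regions that do not fill a whole patch are simply skipped.

```python
import numpy as np

def extract_patches(image, patch_size, overlap):
    """Yield (row, col, patch) for square sliding-window patches.

    `overlap` is the fraction of a patch shared with its neighbour
    (0 <= overlap < 1). Partial patches at the borders are skipped.
    """
    step = max(1, int(round(patch_size * (1.0 - overlap))))
    height, width = image.shape[:2]
    for row in range(0, height - patch_size + 1, step):
        for col in range(0, width - patch_size + 1, step):
            yield row, col, image[row:row + patch_size, col:col + patch_size]

# Hypothetical sizes, for illustration only.
slide = np.zeros((1024, 1024, 3), dtype=np.uint8)
patches = list(extract_patches(slide, patch_size=512, overlap=0.5))
```

The same row and column offsets can be applied to the annotation mask so that every image patch keeps its matching semantic mask.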

3 Methods

The Gleason pattern grading of prostate images is addressed in this work through pixel-level semantic segmentation using different convolutional-neural-network models. These are based on well-known architectures for image segmentation: Fully-Convolutional Networks, SegNet and U-Net. The input images are resized during the training process to pixels in order to avoid memory problems. The models proposed in this work share the same output configuration: a convolutional layer with as many filters as classes to be predicted. Concretely, the defined labels are: background (BG), non-cancerous tissue (NC), Gleason pattern 3 (GP3), Gleason pattern 4 (GP4) and Gleason pattern 5 (GP5). Then, a pixel-level soft-max activation is used to obtain the probability maps. During the inference stage, a predicted map is obtained for each image by assigning to each pixel the class with the highest probability.
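As a minimal sketch of this output stage, the snippet below applies a pixel-level soft-max to a map of class scores and then takes the most probable class per pixel. The array layout (height, width, classes) and the helper names are assumptions made for the example; the random scores merely stand in for the output of the last convolutional layer.

```python
import numpy as np

CLASSES = ["BG", "NC", "GP3", "GP4", "GP5"]

def softmax(logits, axis=-1):
    """Numerically stable soft-max along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def predict_map(logits):
    """Assign to each pixel the index of the class with the highest probability."""
    return softmax(logits, axis=-1).argmax(axis=-1)

# Random scores standing in for the network output: 4x4 pixels, 5 classes.
logits = np.random.default_rng(0).normal(size=(4, 4, len(CLASSES)))
labels = predict_map(logits)
```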

3.1 Fully-convolutional networks

Fully-Convolutional Networks (FCN) were proposed in [Long2015FullySegmentation] as an extension of classic classification architectures to semantic segmentation tasks. Convolutional neural networks (CNNs) for image classification are composed of a feature-extraction stage (base model), built from stacked convolutional filters with spatial dimension reduction by max-pooling operations, and a classification phase through fully-connected layers (top model). In the FCN architecture, the top model is based on convolutional filters, providing a pixel-level prediction on the last activation maps. The main drawback of this architecture is the low resolution of the last activation map of the base model. To mitigate this, pixel-level predictions at different pooling levels are combined. The lowest pooling level used in the prediction is called the stride. The base model used for the feature extraction and the stride level define a concrete Fully-Convolutional Network architecture (e.g. for a stride of ).
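The combination of predictions from different pooling levels can be illustrated with the sketch below, in which a coarse score map is upsampled and added to a finer one. Nearest-neighbour upsampling is used here only as a simple stand-in for the upsampling layers of an actual FCN, and the function names and map sizes are hypothetical.

```python
import numpy as np

def upsample2x(scores):
    """Nearest-neighbour 2x upsampling of a (H, W, C) score map."""
    return scores.repeat(2, axis=0).repeat(2, axis=1)

def fuse_scores(coarse, fine):
    """Add the upsampled coarse prediction to the prediction of a finer level."""
    return upsample2x(coarse) + fine

# Hypothetical score maps from two pooling levels, 5 classes each.
coarse = np.ones((2, 2, 5))
fine = np.zeros((4, 4, 5))
fused = fuse_scores(coarse, fine)
```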

3.2 SegNet

The SegNet architecture [Badrinarayanan2017SegNet:Segmentation] for semantic segmentation is based on the Fully-Convolutional Networks. After the base model, a decoder branch recovers the spatial information via stacked convolutional blocks and upsampling operations. The upsampling is based on the indices used in the base model during the max-pooling operations in order to perform a non-linear reconstruction of the original dimensions.
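The index-based upsampling can be sketched on a single-channel map as follows. This is a simplified illustration with 2x2 pooling and even map dimensions assumed; the function names are hypothetical and do not correspond to any particular library.

```python
import numpy as np

def max_pool_with_indices(x):
    """2x2 max pooling on a (H, W) map with even H and W.

    Returns the pooled values and, for each of them, the flat index in `x`
    of the position where the maximum was found.
    """
    h, w = x.shape
    blocks = x.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3)
    blocks = blocks.reshape(h // 2, w // 2, 4)
    local = blocks.argmax(axis=-1)            # position inside each 2x2 block
    pooled = blocks.max(axis=-1)
    rows = 2 * np.arange(h // 2)[:, None] + local // 2
    cols = 2 * np.arange(w // 2)[None, :] + local % 2
    return pooled, rows * w + cols

def max_unpool(pooled, indices, shape):
    """Place each pooled value back at its stored position; zeros elsewhere."""
    out = np.zeros(shape, dtype=pooled.dtype)
    out.flat[indices.ravel()] = pooled.ravel()
    return out

x = np.array([[1, 2, 0, 0],
              [3, 4, 0, 5],
              [9, 0, 1, 1],
              [0, 0, 1, 7]])
pooled, indices = max_pool_with_indices(x)
restored = max_unpool(pooled, indices, x.shape)
```

The restored map is sparse, which is why the decoder follows each upsampling step with convolutional blocks that densify it.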

3.3 U-Net architecture

The U-Net architecture is a segmentation model proposed for medical applications in [Ronneberger2015U-net:Segmentation]. The configuration is based on two branches: an encoder in charge of extracting the relevant features in the image, and a decoder controlling the reconstruction of the probability segmentation maps. The encoder branch consists of stacked convolutional blocks with dimensional reduction via a max-pooling operator. Each convolutional block doubles the number of filters, while the pooling operator halves the spatial dimensions of the image. In particular, convolutional blocks are used, increasing the number of filters from up to and reducing the spatial dimensions from to pixels. In the decoder branch, the convolutional blocks are followed by deconvolutions that increase the spatial dimensions by a factor of 2 and halve the number of filters. Furthermore, the encoder is connected to the decoder via the concatenation of the activation maps of corresponding levels after the deconvolutional filter. An overview of the U-Net used in this work is presented in Fig. 6. The convolutional block is composed of two convolutional filters of pixels and ReLU as an activation function.
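The bookkeeping of filters and spatial dimensions along both branches can be traced with a short sketch. The input size, the number of filters in the first block and the depth used below are hypothetical values chosen for illustration, not the configuration of this work, and `unet_shapes` is a helper name introduced for the example.

```python
def unet_shapes(input_size, base_filters, depth):
    """Trace spatial size and filter count through a U-Net-like model.

    Returns (encoder, decoder): encoder entries are (size, filters);
    decoder entries are (size, channels_after_concatenation, filters_out).
    """
    encoder = []
    size, filters = input_size, base_filters
    for level in range(depth):
        encoder.append((size, filters))
        if level < depth - 1:
            size //= 2       # max-pooling halves the spatial dimensions
            filters *= 2     # the next block doubles the number of filters
    decoder = []
    for _, skip_filters in reversed(encoder[:-1]):
        size *= 2            # the deconvolution doubles the spatial dimensions
        filters //= 2        # and halves the number of filters
        # The skip connection concatenates the encoder maps of the same level.
        decoder.append((size, filters + skip_filters, filters))
    return encoder, decoder

# Hypothetical configuration, for illustration only.
encoder, decoder = unet_shapes(input_size=256, base_filters=64, depth=5)
```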

Figure 6: U-Net architecture for prostate cancer gradation. BG: background, NC: non cancerous, GP3: Gleason pattern 3, GP4: Gleason pattern 4, GP5: Gleason pattern 5.

3.4 U-Net composed of residual blocks

In order to improve the performance of the standard U-Net, the convolutional blocks (see blue connections in Fig. 6) are modified with a residual configuration. Residual blocks [He2016DeepRecognition] are a configuration of convolutional filters with additive skip connections that has shown good properties for model optimisation and performance. In particular, the identity-mapping configuration proposed in [He2016IdentityNetworks] is used. It is composed of three convolutional filters with a size of . The output of the first filter is added, through a skip connection, to the result of passing that same output through batch normalisation, ReLU activation and the other two filters. For the proposed U-Net modification, a previous convolutional filter is used to normalise the number of filters (see Fig. 7).

Figure 7: Residual Block with identity mapping modified for the U-Net architecture. : number of filters in the input volume. : number of filters in the output volume. , : spatial dimensions of the activation volumes.
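One possible reading of this block is sketched below. Several details are assumptions made for the example: the kernels are 3x3, normalisation and ReLU are applied before each of the last two filters, and `batch_norm` is a simplified per-channel standardisation without learned parameters. It should be taken as an illustration of the skip-additive idea rather than as the exact layer ordering of Fig. 7.

```python
import numpy as np

def conv2d_same(x, kernels):
    """Zero-padded convolution. x: (H, W, C_in); kernels: (k, k, C_in, C_out), k odd."""
    k = kernels.shape[0]
    pad = k // 2
    h, w, _ = x.shape
    padded = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros((h, w, kernels.shape[3]))
    for di in range(k):
        for dj in range(k):
            out += padded[di:di + h, dj:dj + w, :] @ kernels[di, dj]
    return out

def batch_norm(x, eps=1e-5):
    """Simplified stand-in: standardise each channel over the spatial dimensions."""
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, k_in, k_a, k_b):
    """The first filter sets the number of filters; its output is the skip path."""
    shortcut = conv2d_same(x, k_in)
    y = conv2d_same(relu(batch_norm(shortcut)), k_a)
    y = conv2d_same(relu(batch_norm(y)), k_b)
    return shortcut + y

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 4))                 # hypothetical input volume
k_in = rng.normal(size=(3, 3, 4, 16)) * 0.1    # 4 -> 16 filters
k_a = rng.normal(size=(3, 3, 16, 16)) * 0.1
k_b = rng.normal(size=(3, 3, 16, 16)) * 0.1
out = residual_block(x, k_in, k_a, k_b)
```

When the last filter contributes nothing, the block reduces to its skip path, which is the property that makes residual configurations easier to optimise.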

3.5 Loss function

The loss function used during the training process is the Dice function, introduced in [vnet] for volumetric image segmentation. This function balances the intersection and the union of the predicted and reference masks, which makes it appropriate for imbalanced datasets. The Dice is computed for each class from the one-hot-encoded predicted labels and ground truth of a batch of images. Note that the class is one of the five defined labels: background, non cancerous, GP3, GP4 and GP5.
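A minimal sketch of a class-wise Dice is given below, using the squared-denominator form of the V-Net formulation. The array layout (batch, height, width, classes), the smoothing constant `eps`, the averaging over classes and the use of one minus the Dice as the quantity to minimise are assumptions made for the example, not details confirmed by the text.

```python
import numpy as np

def dice_per_class(pred, target, eps=1e-7):
    """Dice index per class for arrays of shape (N, H, W, C)."""
    axes = (0, 1, 2)
    intersection = (pred * target).sum(axis=axes)
    denominator = (pred ** 2).sum(axis=axes) + (target ** 2).sum(axis=axes)
    return (2.0 * intersection + eps) / (denominator + eps)

def dice_loss(pred, target):
    """Assumed aggregation: one minus the mean Dice over classes."""
    return 1.0 - dice_per_class(pred, target).mean()

def one_hot(labels, n_classes):
    return np.eye(n_classes)[labels]

# Toy batch with one 2x2 image and three classes.
labels = np.array([[[0, 1], [2, 2]]])
target = one_hot(labels, 3)
shifted = one_hot((labels + 1) % 3, 3)   # a prediction that is wrong at every pixel
```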

4 Experiments and Results

In order to validate and compare the segmentation models described previously, the database was partitioned following a hold-out strategy. The images were divided into three groups. Around the of the images were used for training, while two subsets with of the images were used for validation and testing. Note that the class balance was maintained among groups, and each patient was assigned to a single group in order to avoid overestimating the models’ performance. As a figure of merit, the Dice index () was obtained in the predicted segmentation maps.
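A patient-wise hold-out partition of this kind can be sketched as follows. The fractions, the seed and the function name `split_by_patient` are hypothetical, and this simple version only guarantees that no patient appears in two groups; it does not enforce the class balance mentioned above.

```python
import random
from collections import defaultdict

def split_by_patient(patient_ids, fractions=(0.6, 0.2, 0.2), seed=0):
    """Assign image indices to train/val/test so that each patient is in one group."""
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)
    n_train = round(fractions[0] * len(patients))
    n_val = round(fractions[1] * len(patients))
    groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],
    }
    assignment = {p: name for name, members in groups.items() for p in members}
    split = defaultdict(list)
    for index, patient in enumerate(patient_ids):
        split[assignment[patient]].append(index)
    return dict(split)

# Hypothetical example: 10 patients with 3 images each.
patient_ids = [f"patient_{i}" for i in range(10) for _ in range(3)]
split = split_by_patient(patient_ids)
```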

We trained four types of convolutional-neural-network models for semantic segmentation of Gleason patterns. In particular, the Fully-Convolutional Network () with pre-trained VGG16 weights as base model and a stride of , the SegNet, and the U-Net architecture, both in its standard configuration and in the one modified with residual blocks, were used. The hyperparameters were empirically optimised in the validation cohort. The model was trained using an SGD optimiser with a Nesterov momentum of , a learning rate of and a decay rate of . In the U-Net model the learning rate was fixed at , and Adam was used as the optimiser. Those models were trained for epochs with mini-batches of images. For the SegNet and models, the Adam optimiser was also used, but the learning rate was increased to . Those models were trained for epochs with a batch size of images. The results obtained in the validation subset are presented in Table 1.

Table 1: Results in the validation subset for the different models. BG: background, NC: non cancerous, GP: Gleason pattern.

In the validation cohort, the U-Net modified with residual blocks, , showed the best performance. The worst performing model was the , which was only able to recognise properly the tissue with Gleason pattern . Better results were obtained with the SegNet model than with the basic U-Net architecture, with an average Dice index for the classes related to prostate tissue (i.e. NC, GP3, GP4 and GP5) of and respectively. The use of residual blocks proved crucial for improving the U-Net model, reaching an average Dice for these classes of . The best performing model, , was trained on the whole training and validation set, and the resulting model was evaluated in the test cohort. The obtained figures of merit and some representative examples of the semantic segmentation are presented in Table 2 and Fig. 20, respectively.

Table 2: Results in the test set for the different models. BG: background, NC: non cancerous, GP: Gleason pattern.

Figure 20: Examples of the performance of the proposed model in the test set. Green: non cancerous, yellow: Gleason pattern 3, orange: Gleason pattern 4 and red: Gleason pattern 5. (a): Original Image, (b): Reference, (c): Predicted.

The results obtained in the test subset show a slight decrease in model performance. The average Dice in the tissue classes drops to . This could be caused by the known internal heterogeneity of the Gleason grades and by the difficulty of obtaining homogeneous subsets of the database during the partition stage. Moreover, the Dice index is a strict metric that does not take into account that most of the errors occur between adjacent classes (see Fig. 20, example four). In previous literature on image-level full Gleason gradation, the metric used to take this information into account is the quadratic Cohen’s Kappa (κ) [Cohen1968WeightedCredit]. In order to establish fair comparisons with previous literature, the background class was merged with the non-cancerous class. The normalised confusion matrix is presented in Fig. 22, showing that most of the errors occur between adjacent classes and in pixels misclassified as cancerous due to a wrong delimitation of the cancerous tissue (see Fig. 20, examples one to three). The pixel-level κ obtained was 0.52, at the level of previous works using image-level approaches: in [Arvaniti2018AutomatedLearning] in the test cohort or in [Nir2019ComparisonImages] for the validation subset.

Figure 22: Confusion matrix of the pixel-level Gleason grade prediction in the test cohort with the proposed model. NC: non cancerous, GP3: Gleason pattern 3, GP4: Gleason pattern 4, GP5: Gleason pattern 5.
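The quadratic Cohen's Kappa can be computed directly from a confusion matrix of raw counts, as in the sketch below. The quadratic weights penalise a confusion in proportion to the squared distance between the ordered classes, so errors between adjacent grades cost less than errors between distant ones. The function name and the toy matrices are illustrative; the class order is assumed to be NC, GP3, GP4, GP5, and the matrix must contain more than one class with samples.

```python
import numpy as np

def quadratic_kappa(confusion):
    """Quadratic-weighted Cohen's Kappa from a square confusion matrix of counts."""
    confusion = np.asarray(confusion, dtype=float)
    n = confusion.shape[0]
    idx = np.arange(n)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n - 1) ** 2
    # Counts expected by chance, given the row and column totals.
    expected = np.outer(confusion.sum(axis=1), confusion.sum(axis=0)) / confusion.sum()
    return 1.0 - (weights * confusion).sum() / (weights * expected).sum()

perfect = np.diag([10, 5, 3, 2])    # every pixel on the diagonal
chance = np.ones((4, 4))            # predictions unrelated to the reference
adjacent = np.array([[8, 2, 0, 0], [2, 8, 0, 0], [0, 0, 8, 2], [0, 0, 2, 8]])
distant = np.array([[8, 0, 0, 2], [0, 8, 2, 0], [0, 2, 8, 0], [2, 0, 0, 8]])
```

In the toy matrices, `adjacent` and `distant` have the same overall accuracy, but the former obtains the higher Kappa because its errors fall between neighbouring classes.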

5 Conclusions

In this research, we have proposed a U-Net architecture modified with residual blocks that is able to perform semantic segmentation of the cancerous patterns in prostate images according to the Gleason grading system. The use of residual configurations is crucial to outperform other well-known architectures such as SegNet. With the proposed model, a pixel-level Cohen’s quadratic Kappa of 0.52 is reached in the test cohort. This performance is at the level of previous works on image-level grading of Gleason patterns, but our model offers a more accurate delimitation of cancerous patterns in the tissue.

Further studies will focus on an extensive comparison of the three main approaches to prostate image analysis for the full Gleason gradation: image-level predictions, pixel-level segmentation and gland-level analysis.