Cribriform pattern detection in prostate histopathological images using deep learning models

10/09/2019, by Malay Singh, et al.

Architecture, size, and shape of glands are the most important patterns used by pathologists for assessment of cancer malignancy in prostate histopathological tissue slides. Varying structures of glands along with cumbersome manual observations may result in subjective and inconsistent assessment. Cribriform glands with irregular borders are an important feature of Gleason pattern 4. We propose using deep neural networks for cribriform pattern classification in prostate histopathological images. 163708 Hematoxylin and Eosin (H&E) stained images were extracted from histopathological tissue slides of 19 patients with prostate cancer and annotated for cribriform patterns. Our automated image classification system analyses the H&E images to classify them as either `Cribriform' or `Non-cribriform'. The system uses various deep learning approaches and hand-crafted image pixel intensity-based features. We present our results for cribriform pattern detection across the various parameters and configurations allowed by our system. The combination of fine-tuned deep learning models outperformed the state-of-the-art nuclei-feature-based methods. Our image classification system achieved a testing accuracy of 85.93 ± 7.54 (cross-validated) and 88.04 ± 5.63 (additional unseen test set) across three folds. In this paper, we present an annotated cribriform dataset along with an analysis of deep learning models and hand-crafted features for cribriform pattern detection in prostate histopathological images.


1 Introduction

The microscopic appearance of prostatic adenocarcinomas is described as having small acini arranged in one or several patterns. Its diagnosis relies on a combination of tissue architectural structures and cytological findings. These diagnostic criteria are considered in the Gleason grading system for prostate cancer (PCa). This grading system is based on the glandular patterns of the tumor and is an established prognostic indicator [humphrey2004gleason, gleason1977histologic]. Here, various tissue architectural patterns are identified and assigned a pattern ranging from 1 (least aggressive) to 5 (most aggressive). The cribriform pattern in malignant glands is one kind of tissue architecture in the prostate, and it is one of the important features considered in determining whether a tumor exhibits Gleason pattern 4. It is also critical to distinguish Gleason pattern 3 from Gleason pattern 4 tumors since the distinction changes the clinical decision: only Gleason 3 lesions allow active surveillance, instead of subjecting patients to surgery or radiotherapy.

The Gleason grading system has undergone several modifications over the years [gordetsky2016grading]. According to several studies, cases with cribriform glands previously diagnosed as having Gleason pattern 3 would uniformly be considered grade 4 by today’s contemporary standards [mcneal1996spread, ross2012adenocarcinomas]. Distinguishing whether a prostatic tumor exhibits cribriform pattern is relevant, since studies have reported that its presence in radical prostatectomy specimens is associated with biochemical recurrence, extraprostatic extension, positive surgical margins, distant metastases, and cancer-specific mortality [iczkowski2011digital, kir2014association, sarbay2014significance, trudel2014prognostic, kweldam2015cribriform].

Also, Kweldam et al. [kweldam2015cribriform], while studying the prognostic value of individual Gleason grade 4 patterns among Gleason score 7 PCa patients, concluded that cribriform pattern is a strong predictor of distant metastasis and disease-specific death. The median time to disease-specific death in men with cribriform pattern was 120 months, as compared to 150 months in men without cribriform pattern. Therefore, proper recognition of cribriform growth in daily pathology practice could be a useful tool in predicting adverse clinical outcome in PCa patients.

The Gleason grading system is inherently subjective and hence has led to high intra-observer and inter-observer variability. Various recent research contributions have suggested that the pathologist’s training and experience affect the degree of inter-observer agreement [humphrey2003prostate, Allsbrook, allsbrook2001interobserver]. Also, diagnosis of PCa by microscopic tissue examination is tedious and time consuming.

The aforementioned issues of low inter-observer agreement and the requirement of identifying various types of glandular patterns have motivated research into automated image-based grading systems for PCa. Various computer-aided diagnosis (CAD) systems have been developed using a multitude of machine learning, image processing, and feature extraction methods [madabhushi2016image, nir2019comparison]. These systems typically automate the tasks of object detection, image/object classification, and image segmentation to aid pathologists. For PCa, CAD systems have generally emphasized gland segmentation, nuclei segmentation, and image classification tasks. Cribriform pattern classification is a different task from those addressed by conventional PCa CAD systems and has yet to receive much attention. This paper attempts to fill this gap by presenting an automated image-based cribriform pattern classification system. The main contributions of this paper are

  1. our annotated cribriform dataset,

  2. hand-crafted nuclei features, and

  3. combination of nuclei features with deep learning (DL) models

for cribriform pattern detection in prostate histopathological images.

These hand-crafted nuclei features are designed to incorporate relevant nuclei texture and spatial information for cribriform pattern detection. The DL architectures used in our method have been chosen and/or modified according to their performance in similar histopathological tasks, as suggested in the literature [shin2016deep, coudray2018classification, sharma2017deep, bejnordi2018using, gecer2018detection, araujo2017classification, alom2019advanced]. Recently, various deep models like ResNet [he2016ResNet], VGG16 [simonyan2014very], VGG19 [simonyan2014very], Inception-v3 (GoogLeNet) [szegedy2015going, szegedy2016rethinking], and DenseNet [huang2017densely] have achieved top performance in the ImageNet [russakovsky2015imagenet] challenge. This paper builds upon the recent success of DL in medical imaging tasks [shin2016deep, coudray2018classification, sharma2017deep, bejnordi2018using, gecer2018detection, araujo2017classification] and the robust performance of these architectures for the task of cribriform pattern detection. The DL architectures have been fine-tuned via transfer learning before being combined with the hand-crafted nuclei features. This paper focuses on the clinical problem of cribriform pattern detection and provides a promising machine learning based method to aid pathologists.

2 Related work

Various CAD systems have been developed for prostate histopathological image classification while automating gland segmentation, nuclei segmentation, and image classification tasks [madabhushi2016image, xu2010high, nguyen2014prostate, kwak2017multiview, doyle2007automated, kwak2017nuclear, niazi2017visually, Diamond, khan2017predicting, gummeson2017automatic, kallen2016towards, litjens2017survey, yap2015automated, singh2017gland, singh2017study, ali2013cell, nir2019comparison]. Cribriform pattern classification is an altogether new task for the conventional PCa CAD systems. A general pipeline for prostate histopathological image classification is gland segmentation followed by feature extraction from these segmented glands for classification [xu2010high, nguyen2014prostate, kwak2017multiview].

A few approaches, like Diamond et al. [Diamond] and Lin et al. [lin2016curvelet], have bypassed this segmentation step. Diamond et al. [Diamond] proposed using morphological and textural features to identify regions belonging to stroma, PCa, and normal tissue. Lin et al. [lin2016curvelet] used curvelet-based textural features with a Support Vector Machine (SVM) [svm] to classify a given prostate histopathological image as Gleason pattern 3+3, 3+4, 4+3, or 4+4.

Nguyen et al. [nguyen2014prostate] used shape and textural features to identify nuclei regions. A nuclei-lumen graph made from nuclei and lumen boundary pixels was processed by normalized cuts [shi2000normalized] for final gland segmentation. They then used various graph-based features with an SVM [svm] for automated PCa grading. Kwak et al. [kwak2017multiview] proposed using multiple scales in the same system for PCa grading. Nuclei, gland, and lumen regions were segmented using features in the HSV and CIELab color spaces. For a given image, segmentation was performed first, and morphological features at multiple scales were then used for final automated PCa grading. In another similar approach, Ali et al. [ali2013cell] proposed using nuclei graphs to compute features for predicting biochemical recurrence in prostate histopathological tissue microarray images. Fukuma et al. [fukuma2016study] and Khan et al. [khan2017predicting] also proposed using nuclei-graph features for automated grading of brain and prostate histopathological images, respectively.

The methods discussed above focused on the development of hand-crafted features to be used with classical machine learning methods. They also addressed a different problem, prostate histopathological image classification, rather than cribriform pattern classification. Along similar lines, various DL architectures have been deployed for prostate histopathological image tasks [greenspan2016guest, madabhushi2016image, litjens2016deep, kwak2017nuclear, alom2019advanced]. Generally, DL architectures require a large dataset for training and evaluation owing to their huge parameter space. As manually annotated data in the medical imaging domain is scarce, various recent research efforts have focused on transfer learning [shin2016deep, chang2017unsupervised, gessert2019deep, swati2019brain, khan2019novel, hekler2019pathologist, brancati2019deep, ahmad2019classification, hosny2019classification, rai2019investigation, kather2019deep]. One approach to transfer learning is fine-tuning of pre-trained DL networks, in which some layers are frozen during training and a small learning rate is used. We list a few recent approaches with the corresponding pre-trained models used via fine-tuning and the medical imaging task as follows:

  • Shin et al. [shin2016deep]: Uses GoogLeNet [szegedy2015going] and AlexNet [krizhevsky2009learning] for “Thoracoabdominal Lymph Node Detection” and “Interstitial Lung Disease Classification”.

  • Gessert et al. [gessert2019deep]: Uses ResNet [he2016ResNet], VGG16 [simonyan2014very], and DenseNet [huang2017densely] for cancer tissue identification in confocal laser microscopy images for colorectal cancer.

  • Swati et al. [swati2019brain]: Uses VGG16 [simonyan2014very] for brain tumor classification in Magnetic Resonance (MR) images.

  • Khan et al. [khan2019novel]: Uses GoogLeNet [szegedy2015going], ResNet [he2016ResNet], and VGG16 [simonyan2014very] for breast cancer cytological image classification. They also combined these fine-tuned networks by average pooling.

  • Hekler et al. [hekler2019pathologist]: Uses ResNet [he2016ResNet] for H&E stained melanoma histopathological image classification.

  • Brancati et al. [brancati2019deep]: Uses ResNet [he2016ResNet] for invasive ductal carcinoma detection and lymphoma classification.

  • Ahmad et al. [ahmad2019classification]: Uses ResNet [he2016ResNet], GoogLeNet [szegedy2015going], and AlexNet [krizhevsky2009learning] for breast cancer cytological image classification.

  • Hosny et al. [hosny2019classification]: Uses AlexNet [krizhevsky2009learning] for skin lesion image classification.

  • Kather et al. [kather2019deep]: Uses ResNet [he2016ResNet] to predict microsatellite instability in gastrointestinal cancer.

Apart from the latest transfer learning based CAD approaches, various DL architectures have also been used for breast cancer and lung cancer histopathological images. Coudray et al. [coudray2018classification] trained an Inception-v3 (GoogLeNet) [szegedy2015going, szegedy2016rethinking] on whole slide images (WSI) obtained from The Cancer Genome Atlas to automatically classify histopathology images into adenocarcinoma (LUAD), squamous cell carcinoma (LUSC), or normal lung tissue. Sharma et al. [sharma2017deep] studied H&E stained histopathological images of gastric carcinoma and applied deep learning to classify cancer based on immunohistochemical response and to detect necrosis based on the existence of tumor necrosis in the tissue. Bejnordi et al. [bejnordi2018using] applied deep learning on H&E stained breast cancer images to discriminate between stroma surrounding invasive cancer and stroma from benign biopsies. Gecer et al. [gecer2018detection] proposed an algorithm based on deep convolutional networks that classifies WSI of breast biopsies into five diagnostic categories. Araujo et al. [araujo2017classification] designed a multi-scale deep convolutional neural network to classify normal tissue, benign lesion, in situ carcinoma, and invasive carcinoma, and also into two classes, carcinoma and non-carcinoma.

Various recent approaches in the machine learning literature have suggested using deeper networks for better classification/detection performance [he2016ResNet, huang2017densely]. Following this, deep models like ResNet [he2016ResNet] and DenseNet [huang2017densely] have achieved top performance in the ImageNet [russakovsky2015imagenet] challenge, outperforming the previous top performer GoogLeNet [szegedy2015going]. On the other hand, medical images, with their heterogeneous patterns, warrant more sophisticated DL models than natural images. This paper builds upon the recent success of DL in medical imaging tasks and the top performance of ResNet [he2016ResNet] and DenseNet [huang2017densely] for the task of cribriform pattern classification. These two networks have been compared with an SVM classifier using nuclei-based features [fukuma2016study, kwak2017nuclear, khan2017predicting], VGG16 [simonyan2014very], VGG19 [simonyan2014very], and Inception-v3 (GoogLeNet) [szegedy2015going, szegedy2016rethinking]. VGG16, VGG19, and Inception-v3 are some of the early DL architectures that achieved high performance on large-scale natural image datasets. In this paper, the performance of ‘ResNet-50’, the ResNet [he2016ResNet] network with 50 layers, along with ‘DenseNet-121’ and ‘DenseNet-169’, the DenseNet [huang2017densely] networks with 121 and 169 layers respectively, is studied for the task of cribriform pattern detection. All these DL architectures have been fine-tuned via transfer learning. The fine-tuned DL architectures are then combined with the hand-crafted nuclei features using a Multi-layer Perceptron (MLP) for our final results. This paper focuses on the clinical problem of cribriform pattern detection and provides a promising machine learning based method to aid pathologists.

3 Dataset

3.1 Dataset preparation

H&E stained whole slide images were downloaded from the ‘Legacy Archives’ of the NCI Genomic Data Commons (GDC) [gdc]. The GDC Legacy Archives currently hosts much of “The Cancer Genome Atlas (TCGA)” [tc] data. The TCGA has various WSIs categorised according to cancer type. Each WSI has a unique patient ID (Slide Name) in TCGA. This patient information is important when designing the experiments: the patient sets providing training, testing, and validation images must be mutually exclusive for reliable experiments and results.

Cribriform pattern may be seen in both benign and malignant glands. Neoplastic cribriform gland pattern may be seen in high grade prostate intraepithelial neoplasia (HG-PIN), acinar adenocarcinoma Gleason pattern 4, intraductal carcinoma of the prostate (IDC-P), and prostatic duct adenocarcinoma. Some example images for both ‘Cribriform’ and ‘Non-cribriform’ patterns are illustrated in Fig. 1. Cribriform patterns are characterized by solid proliferation with multiple punched-out lumina, without intervening stroma [kweldam2015cribriform] as evident in the first row of Fig. 1.

Figure 1: Example H&E images with ‘Cribriform’ and ‘Non-cribriform’ patterns in our dataset. These images were extracted at a magnification corresponding to a pixel resolution of 0.25 MPP. The cribriform pattern detection system was developed using H&E images with different color variations.

The usual approach to data preparation is for a pathologist to go through the WSI using Aperio ImageScope [imagescope] and extract images containing regions of interest (ROIs). Each ROI contains either a cribriform or a non-cribriform pattern and is labelled accordingly. We followed this protocol and initially extracted 161 images from the WSI of 10 patients using Aperio ImageScope [imagescope]. The image dimension was chosen by the pathologist such that the corresponding field of view contained enough information to identify whether the image contains a cribriform pattern. The subsequent experiments for cribriform detection using deep learning were inconclusive due to insufficient patient data. We then extracted images from 9 more patients using Aperio ImageScope [imagescope] and OpenSlide [goode2013openslide]. These images were then annotated by the pathologists in our team as ‘Cribriform’ or ‘Non-cribriform’. Table 1 tabulates the number of manually extracted and annotated images from each patient. In this way we extracted 728 labeled images from 19 patients. Apart from these labeled images, some images were rejected during the labelling process because they were ambiguous and/or the tissue structure was not well preserved.

After manually going through the images with the pathologist to label individual images, we augmented the data using translation and rotation operations. The following section describes the data augmentation process.

S.N. Slide Name  (Patient ID) Gleason grade Number of Cribriform Images Number of Non-cribriform Images Image Dimensions
1 TCGA-2A-A8VO 3+3 (HG-PIN) - 17 pixels
2 TCGA-2A-A8VT 3+3 (HG-PIN) 2 - pixels
3 TCGA-EJ-5510 4+3 (HG-PIN) 6 1 pixels
4 TCGA-EJ-5511 3+4 (HG-PIN) 1 16 pixels
5 TCGA-EJ-5519 4+4 (HG-PIN) 5 - pixels
6 TCGA-EJ-7797 3+4 (HG-PIN) - 21 pixels
7 TCGA-G9-6338 4+3 (No HG-PIN) - 36 pixels
8 TCGA-G9-6363 4+3 (HG-PIN) - 14 pixels
9 TCGA-HC-7211 3+4 (HG-PIN) 25 - pixels
10 TCGA-HC-7212 3+4 (HG-PIN) 17 - pixels
11 TCGA-EJ-7791 No report 1 51 pixels
12 TCGA-EJ-8469 4+5 (HG-PIN) 121 - pixels
13 TCGA-EJ-A46F 4+4 (HG-PIN) 86 - pixels
14 TCGA-FC-7708 No report 5 60 pixels
15 TCGA-HC-7078 No report 1 12 pixels
16 TCGA-HC-7820 3+4 (HG-PIN) - 9 pixels
17 TCGA-XJ-A9DI 5+4 (No HG-PIN) - 28 pixels
18 TCGA-XK-AAJP 4+3 (HG-PIN) - 80 pixels
19 TCGA-YL-A8HL 4+5 (No HG-PIN) 114 - pixels
Total pixels images from 10 patients 56 105
Total pixels images from 9 patients 328 240
Total (749 images from 19 patients) 384 365
Table 1: Description of the manually extracted and annotated images in the cribriform dataset. We have 12 unique cribriform and 7 unique non-cribriform patients.

3.2 Data augmentation

We augment the dataset using translation- and rotation-based sampling in the WSI. Since we know the location of each extracted image in the WSI, we can extract a larger surrounding region around it using OpenSlide [goode2013openslide]. In this extracted region we can sample new images by translating 50-100 pixels to the left, right, top, or bottom of the position of the original image. Apart from translation, we can also sample images by rotation, with and without translation. Fig. 2 illustrates this idea for data augmentation. The images extracted around a given unique location have the same label as the image at the original location.

Let us define the total number of rotations and translations used to extract new images; this helps estimate the size of the augmented dataset. We define translations by fixed offsets along the horizontal (X-axis) and vertical (Y-axis) directions as the possible translation operations, and two rotation operations for a given image. From an original image location we can therefore sample every combination of horizontal and vertical offsets, and each sampled location can additionally be rotated. The translation operations multiply the number of original images by the number of offset combinations, and the two rotation operations (together with the unrotated version) give a further factor of 3, so each original image yields a fixed number of augmented images. A code sketch of this sampling is given below.

Figure 2: Example of extraction of new images from a Whole Slide Image (WSI). A WSI is indicated as an arbitrary structure filled with green. The originally extracted image is indicated by a filled red box. The surrounding region, from which new images are sampled, is indicated by a blue-bordered box filled with white. Some of the images sampled after rotation and/or translation from the original image are indicated by black empty boxes. Translation is done by fixed pixel offsets in the horizontal and vertical directions from a given image, and further images are extracted after rotating the image at each sampled location; together these operations multiply the size of the original dataset.

The originally extracted images were augmented using the method described above. The augmented images were then checked manually for empty regions, which appear when rotation or translation moves the sampling window into empty WSI area; such images were removed. The remaining images were sorted according to patient and label. After augmentation there are 53557 ‘Cribriform’ and 110151 ‘Non-cribriform’ images, giving a total of 163708 images from the 19 TCGA patients. Table 2 tabulates the patient-wise number of images in the augmented dataset.
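The sampling procedure described above can be sketched roughly as follows, assuming patch reading with OpenSlide and the Python Imaging Library. The patch size, the exact translation offsets, and the rotation angles shown are placeholders for illustration, not the values used to build the dataset.

```python
# Minimal sketch of the translation/rotation sampling described above.
# The patch size, offsets, and rotation angles are assumptions for illustration.
import itertools
import openslide  # pip install openslide-python

PATCH = 1024              # assumed patch side in pixels (hypothetical)
OFFSETS = [-100, -50, 0, 50, 100]
ANGLES = [0, 90, 180]     # 0 keeps the original orientation

def augment_location(wsi_path, x0, y0, label):
    """Sample translated/rotated patches around one annotated location."""
    slide = openslide.OpenSlide(wsi_path)
    samples = []
    for dx, dy in itertools.product(OFFSETS, OFFSETS):
        region = slide.read_region((x0 + dx, y0 + dy), 0, (PATCH, PATCH))
        region = region.convert("RGB")
        for angle in ANGLES:
            img = region.rotate(angle) if angle else region
            samples.append((img, label))  # each patch inherits the original label
    slide.close()
    return samples
```

Patches that end up rotated or translated into empty WSI area are then discarded by manual inspection, as described above.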

S.N. Slide Name  (Patient ID) Gleason grade Number of Cribriform Images Number of Non-cribriform Images
1 TCGA-2A-A8VO 3+3 (HG-PIN) - 1292
2 TCGA-2A-A8VT 3+3 (HG-PIN) 152 -
3 TCGA-EJ-5510 4+3 (HG-PIN) 456 76
4 TCGA-EJ-5511 3+4 (HG-PIN) 76 1216
5 TCGA-EJ-5519 4+4 (HG-PIN) 380 -
6 TCGA-EJ-7791 No report 76 21201
7 TCGA-EJ-7797 3+4 (HG-PIN) - 1596
8 TCGA-EJ-8469 4+5 (HG-PIN) 24000 -
9 TCGA-EJ-A46F 4+4 (HG-PIN) 10594 -
10 TCGA-FC-7708 No report 379 29935
11 TCGA-G9-6338 4+3 (No HG-PIN) - 2736
12 TCGA-G9-6363 4+3 (HG-PIN) - 1064
13 TCGA-HC-7078 No report 20 5188
14 TCGA-HC-7211 3+4 (HG-PIN) 1900 -
15 TCGA-HC-7212 3+4 (HG-PIN) 1292 -
16 TCGA-HC-7820 3+4 (HG-PIN) - 3943
17 TCGA-XJ-A9DI 5+4 (No HG-PIN) - 11699
18 TCGA-XK-AAJP 4+3 (HG-PIN) - 30185
19 TCGA-YL-A8HL 4+5 (No HG-PIN) 14233 -
Total (163708 images from 19 patients) 53557 110151
Table 2: Description of all images in the augmented cribriform dataset. We have 12 unique cribriform and 7 unique non-cribriform patients. These images are of pixels.

As the total number of images in the augmented dataset is quite large, we used a subset of images for our experiments. We defined three sets of patients for a three-fold cross-validated study such that the patients providing training, validation, and testing images are mutually exclusive. This configuration mimics the real-world scenario for deployment of a cribriform pattern classification system. Table 3 tabulates these sets along with their use in the three folds. We sampled equal numbers of Cribriform (+ve) and Non-cribriform (-ve) images in each of these sets, giving a balanced dataset for our studies. We also defined an additional unseen test set for further evaluating our models; it contains images that were never used for training, validation, or testing in the three-fold cross-validated study. For a given fold, the patients in the additional unseen test set and in the corresponding test set of the cross-validated study are the same. The additional unseen test set also contains Cribriform (+ve) and Non-cribriform (-ve) images in each of the three folds, as in the cross-validated study.

Set 1 (Fold 01: Train; Fold 02: Validation; Fold 03: Test): TCGA-2A-A8VT, TCGA-HC-7212, TCGA-FC-7078, TCGA-YL-A8HL, TCGA-XJ-A9DI, TCGA-XK-AAJP.
Set 2 (Fold 01: Validation; Fold 02: Test; Fold 03: Train): TCGA-2A-A8VO, TCGA-EJ-7791, TCGA-EJ-7797, TCGA-HC-7211, TCGA-EJ-5519, TCGA-G9-6363, TCGA-EJ-A46F.
Set 3 (Fold 01: Test; Fold 02: Train; Fold 03: Validation): TCGA-HC-7708, TCGA-HC-7820, TCGA-EJ-5510, TCGA-G9-6338, TCGA-EJ-5511, TCGA-EJ-8469.
Table 3: Sets of patients in the three-fold cross-validated study. We sampled Cribriform (+ve) and Non-cribriform (-ve) images in each of these sets for use in our experiments.
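For illustration, the patient-to-set assignment of Table 3 and the balanced per-set sampling can be encoded as in the following sketch. The patient lists mirror Table 3 as printed above; the per-class sample count is a hypothetical parameter, since the exact numbers sampled per set are not reproduced here.

```python
# Sketch of the patient-exclusive three-fold assignment and balanced sampling.
import random

SETS = {
    "set1": ["TCGA-2A-A8VT", "TCGA-HC-7212", "TCGA-FC-7078",
             "TCGA-YL-A8HL", "TCGA-XJ-A9DI", "TCGA-XK-AAJP"],
    "set2": ["TCGA-2A-A8VO", "TCGA-EJ-7791", "TCGA-EJ-7797", "TCGA-HC-7211",
             "TCGA-EJ-5519", "TCGA-G9-6363", "TCGA-EJ-A46F"],
    "set3": ["TCGA-HC-7708", "TCGA-HC-7820", "TCGA-EJ-5510",
             "TCGA-G9-6338", "TCGA-EJ-5511", "TCGA-EJ-8469"],
}
# Role of each set in each fold (train / validation / test), as in Table 3.
FOLDS = {
    1: {"train": "set1", "val": "set2", "test": "set3"},
    2: {"train": "set3", "val": "set1", "test": "set2"},
    3: {"train": "set2", "val": "set3", "test": "set1"},
}

def balanced_sample(images_by_patient, patients, n_per_class, seed=0):
    """Draw equal numbers of Cribriform/Non-cribriform images for given patients.

    images_by_patient: dict mapping patient ID -> list of (image_path, label).
    """
    rng = random.Random(seed)
    pool = [(p, path, lab) for p in patients for path, lab in images_by_patient[p]]
    pos = [x for x in pool if x[2] == "Cribriform"]
    neg = [x for x in pool if x[2] == "Non-cribriform"]
    return (rng.sample(pos, min(n_per_class, len(pos))) +
            rng.sample(neg, min(n_per_class, len(neg))))
```

Because the sets are disjoint in patients, no patient contributes images to more than one of the train, validation, or test splits within a fold.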

4 Methods

We have studied a nuclei-feature-based classical machine learning model along with fine-tuned deep learning models for cribriform pattern detection. The classical machine learning model acts as a baseline method for our system. We discuss all the methods for cribriform pattern detection in the following sections.

4.1 Nuclei features with SVM

Various image-based automated PCa grading studies have suggested using local and global features derived from nuclei patterns [kwak2017nuclear, fukuma2016study, khan2017predicting, ali2013cell]. The most commonly used local features quantify the intensity distribution, radial intensity distribution, etc. inside the segmented nuclei objects. These studies have also suggested creating nuclei graphs to quantify the nuclei spatial distribution as a global feature. These nuclei-based features with an SVM are used as the baseline method in our cribriform pattern detection experiments.

Given a nuclei segmentation, a complete digraph can be defined whose vertices are the centroids of the segmented nuclei and whose edges are weighted by the Euclidean distance between the vertices (centroids) [fukuma2016study]. The nuclei spatial distribution is then quantified by computing the Delaunay Triangulation and the Minimum Spanning Tree (MST). The Delaunay Triangulation of the vertices was computed using the Triangle software [shewchuk1996triangle], and triangle area and perimeter based sub-features are extracted from it. The MST was computed using Kruskal’s algorithm [kruskal1956shortest], and its edge-weight distribution was quantified as a sub-feature. Both of these sub-features constitute the image-level nuclei feature.
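A rough sketch of these graph-based sub-features is given below. It uses SciPy's Delaunay triangulation and minimum-spanning-tree routines instead of the Triangle software and an explicit Kruskal implementation, so it illustrates the idea rather than the exact pipeline; only a subset of the summary statistics in Table 4 is computed.

```python
# Sketch of graph-based nuclei features: Delaunay and MST statistics over
# nuclei centroids (SciPy used for illustration).
import numpy as np
from scipy.spatial import Delaunay
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def spatial_nuclei_features(centroids):
    """centroids: (N, 2) array of nuclei centroid coordinates."""
    pts = np.asarray(centroids, dtype=float)

    # MST edge-weight statistics over the complete Euclidean distance graph.
    dist = squareform(pdist(pts))
    dense_mst = minimum_spanning_tree(dist).toarray()
    edges = dense_mst[dense_mst > 0]

    # Delaunay triangle area and perimeter statistics.
    tri = Delaunay(pts)
    a, b, c = (pts[tri.simplices[:, i]] for i in range(3))
    perim = (np.linalg.norm(a - b, axis=1) +
             np.linalg.norm(b - c, axis=1) +
             np.linalg.norm(c - a, axis=1))
    area = 0.5 * np.abs(np.cross(b - a, c - a))

    def stats(x):  # mean, std, min/max ratio (a subset of Table 4's statistics)
        return [x.mean(), x.std(), x.min() / x.max()]

    return stats(edges) + stats(area) + stats(perim)
```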

The CellProfiler [carpenter2006cellprofiler] pipeline suggested by Fukuma et al. [fukuma2016study] has been used for nuclei segmentation and feature extraction. Fig. 3 shows the modules used in the CellProfiler [carpenter2006cellprofiler] pipeline. Fig. 4(a) shows a sample input H&E image for the pipeline. Fig. 4(b) shows the segmented nuclei locations as red diamonds on a white background; these nuclei locations define the vertices of the graph. The segmented nuclei regions are also used to extract nuclei-level features such as intensity distribution and eccentricity. The MST features are extracted using these vertices, and Fig. 4(c) shows the Delaunay Triangulation over them. Table 4 discusses these features in detail and also lists which tool, algorithm, or CellProfiler [carpenter2006cellprofiler] module was used to extract each nuclei sub-feature.

Kwak et al. [kwak2017nuclear] showed that an RBF kernel SVM performs better than a polynomial kernel SVM for the above nuclei features. Following this, the RBF kernel hyperparameters (C and gamma) were tuned first and then fixed for the final experiments.
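As a hedged illustration of this baseline, an RBF-kernel SVM over the 57-dimensional nuclei feature vectors could be tuned and fitted as below. The C and gamma grids are hypothetical; the tuned values used for the final experiments are not reproduced here.

```python
# Sketch of the baseline classifier: RBF-kernel SVM over nuclei features.
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def fit_nuclei_svm(X_train, y_train):
    """X_train: (n_samples, 57) nuclei features; y_train: binary labels."""
    pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    grid = {"svc__C": [0.1, 1, 10, 100],          # hypothetical search grid
            "svc__gamma": [1e-3, 1e-2, 1e-1, 1]}
    search = GridSearchCV(pipe, grid, cv=3)       # tunes C and gamma
    search.fit(X_train, y_train)
    return search.best_estimator_
```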

Figure 3: Modules used in the CellProfiler [carpenter2006cellprofiler] pipeline for nuclei segmentation. This pipeline has been implemented as proposed by Fukuma et al. [fukuma2016study].
Figure 4: Intermediate stages during nuclei feature generation for an input H&E image using CellProfiler [carpenter2006cellprofiler] and Delaunay Triangulation. (a) Example input H&E image. (b) Segmented nuclei locations indicated by red diamonds (by CellProfiler [carpenter2006cellprofiler]); the graph is defined using these nuclei locations. (c) Delaunay Triangulation using the vertices of the graph. Table 4 discusses these features in detail.
Feature (Total Dimensions: 57); CellProfiler Module / Tool; Module and Feature Description; Relevance of feature with respect to PCa histopathology:

  1. Number and area of nuclei [fukuma2016study, kwak2017nuclear]: the nuclei count and the average, standard deviation, disorder, and minimum-to-maximum ratio of nuclei area (Dimensions: 5). Modules: MeasureImageArea and IdentifyPrimaryObjects, which measure the area and number of nuclei in the image. Relevance: the morphology, size, and intensity distribution of nuclei are important in PCa assessment.

  2. Radial distribution of pixel intensity of the nuclei [fukuma2016study, kwak2017nuclear]: example features are the mean intensity and the mean intensities along four rings (bins); the average, standard deviation, disorder, and minimum-to-maximum ratio of these two measurements are computed (Dimensions: 20). Modules: MeasureObjectIntensityDistribution and MeasureObjectIntensity, which, given an image with identified objects (nuclei), measure the intensity distribution from each object’s center to its boundary within a user-controlled number of bins, i.e. rings. Relevance: the morphology, size, and intensity distribution of nuclei are important in PCa assessment.

  3. Nucleus size and shape [fukuma2016study, kwak2017nuclear]: the nucleus is modelled as an ellipse, giving 1) minor axis length, 2) major axis length, 3) eccentricity, 4) orientation, and 5) solidity; the average, standard deviation, disorder, and minimum-to-maximum ratio of these five measurements are computed (Dimensions: 20). Module: MeasureObjectSizeShape, which, given an image with identified objects (e.g. nuclei or cells), extracts individual area and shape features. Relevance: the morphology, size, and intensity distribution of nuclei are important in PCa assessment.

  4. Minimum Spanning Tree (MST) [fukuma2016study, kwak2017nuclear]: edge weights are the distances between nuclei centroids; the average, standard deviation, disorder, and minimum-to-maximum ratio of the edge weights are the features (Dimensions: 4). Tool: Kruskal’s algorithm [kruskal1956shortest]; an MST is created using the nuclei centroids. Khan et al. [khan2017predicting], however, mention that the MST alone does not generate enough features to differentiate images with cribriform pattern (Gleason 4) from images with Gleason pattern 3. Relevance: these features quantify the spatial distribution of nuclei in the given field of view, which provides image-level information important in PCa assessment. Khan et al. [khan2017predicting] provide additional insight: the mean edge length of the MST characterises the degree to which the epithelial nuclei are invading the stroma surrounding the gland.

  5. Delaunay Triangulation [fukuma2016study]: the area and perimeter of each triangle are computed, and the average, standard deviation, disorder, and minimum-to-maximum ratio of area and perimeter are the features (Dimensions: 8). Tool: Triangle [shewchuk1996triangle]; a Delaunay Triangulation is created using the nuclei centroids. Relevance: these features quantify the spatial distribution of nuclei in the given field of view, which provides image-level information important in PCa assessment.

Table 4: Nuclei features for cribriform pattern detection
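Each row of Table 4 aggregates one underlying measurement with the same four summary statistics (average, standard deviation, disorder, and minimum-to-maximum ratio). A minimal sketch of this aggregation is shown below; the 'disorder' formula used is a common form from the nuclear-architecture feature literature and is an assumption, as the paper does not spell it out.

```python
# Per-measurement summary statistics used throughout Table 4. The "disorder"
# definition below (1 - 1/(1 + std/mean)) is an assumption, not a value
# confirmed by the paper.
import numpy as np

def summary_stats(values):
    """Return [mean, std, disorder, min/max ratio] for one nuclei measurement."""
    v = np.asarray(values, dtype=float)
    mean, std = v.mean(), v.std()
    disorder = 1.0 - 1.0 / (1.0 + std / mean) if mean > 0 else 0.0
    return [mean, std, disorder, v.min() / v.max()]
```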

4.2 Fine-tuning of pre-trained DL architectures

The extracted images were used for experiments with fine-tuning of different state-of-the-art DL architectures. These architectures have been pre-trained on the ImageNet dataset [russakovsky2015imagenet]. Fine-tuning was done in two stages as follows:

  1. The last layers of each pre-trained network were modified for the cribriform pattern classification (binary classification). All the layers except the last fully connected layers in the modified network were frozen (non-trainable) for the first stage. The modified network was trained for 100 epochs.

  2. In the second stage, the last block before the fully connected layers in the modified network was also set as trainable, and this last block together with the fully connected layers was trained for 100 epochs.

For both of the above stages, the learning rate was kept low to prevent overfitting due to the large number of trainable parameters and the small amount of training images. The two-stage fine-tuning strategy has been borrowed from the online Keras [chollet2015keras] tutorial “Building powerful image classification models using very little data” [keras_tune]. This tutorial used TensorFlow [abadi2015tensorflow] as a back-end for deep learning.

Another possible strategy is to skip the first stage and fine-tune directly at the second stage. However, this yields unreliable results because the randomly initialised (high entropy) fully connected layers induce large weight changes in the last block of the network. The first stage of the strategy used here essentially reduces the entropy of the last fully connected layers, leading to reliable results.
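A minimal Keras sketch of this two-stage strategy is given below, using VGG16 as the backbone; the learning rates, optimizer, and unfrozen block are illustrative assumptions rather than the exact settings used in our experiments.

```python
# Minimal sketch of two-stage fine-tuning (VGG16 shown; other backbones
# follow the same recipe). Settings are assumptions, not the paper's values.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_and_finetune(train_ds, val_ds, epochs=100):
    base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                       input_shape=(224, 224, 3), pooling="avg")
    head = layers.Dense(2, activation="softmax")(base.output)
    model = models.Model(base.input, head)

    # Stage 1: freeze the whole backbone, train only the new classification head.
    base.trainable = False
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    model.fit(train_ds, validation_data=val_ds, epochs=epochs)

    # Stage 2: unfreeze only the last convolutional block ("block5_*") and
    # continue training together with the head at a low learning rate.
    base.trainable = True
    for layer in base.layers:
        layer.trainable = layer.name.startswith("block5")
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    model.fit(train_ds, validation_data=val_ds, epochs=epochs)
    return model
```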

4.3 Fine-tuning of pre-trained and modified ResNet architectures

Additionally, we fine-tuned ResNet-50 [he2016ResNet, he2016identity] and ResNet-22, whereby we replaced the output layer of ResNet with two output nodes and kept all previous layers untouched. We separated the whole fine-tuning procedure into two stages. In the first stage, only the last layer was fine-tuned, for a fraction of the total number of epochs. In the second stage, which runs for another fraction of the total number of epochs, the last ResNet block was trained together with the output layer. In ResNet-50, each block is a bottleneck block consisting of 3 convolutional layers.

ResNet-22 is a modified version of ResNet-50 [he2016ResNet] whose structure is essentially the first 21 layers of ResNet-50 [he2016ResNet] plus a fully-connected layer at the output. The main advantage of ResNet-22 is that it has fewer parameters while still retaining the capabilities of the original ResNet [he2016ResNet] architecture. Each input image has three channels, namely R, G, and B. A comparison between the ResNet-50 and ResNet-22 network architectures is tabulated in Table 5; both models share the same architecture for the first 21 layers, shown in the first four rows of Table 5.

Output Size ResNet-50 [he2016ResNet] ResNet-22
262 262 1

1, 64, stride 2

63 63 3

3, Max-Pool , stride 2

63 63  
16 16  
16 16   No operation
8 8   No operation
1 1 Average Pool, 2-D, Full-Connected, Softmax
Table 5: Comparison between the network architectures of ResNet-50 [he2016ResNet] and ResNet-22. Each bracketed entry denotes one residual block; for example, the fourth row uses 4 residual blocks, each consisting of a 1 x 1 convolution followed by two further convolutions. Because ResNet-22 duplicates only the first 21 layers of ResNet-50 [he2016ResNet], the sixth and seventh rows show ‘No operation’.
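A rough sketch of building such a truncated network from a pre-trained ResNet-50 in Keras is shown below. The cut point and layer name are assumptions for illustration; the authors' exact 21-layer cut is not reproduced, and layer names differ between Keras versions.

```python
# Rough sketch of a truncated "ResNet-22"-style model built from ResNet-50:
# keep the early layers and attach a small classification head.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_truncated_resnet(input_shape=(224, 224, 3)):
    base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                          input_shape=input_shape)
    # Layer names depend on the Keras version; "conv3_block4_out" is the end
    # of the conv3 stage in recent tf.keras releases (assumed cut point).
    cut = base.get_layer("conv3_block4_out").output
    x = layers.GlobalAveragePooling2D()(cut)
    out = layers.Dense(2, activation="softmax")(x)
    return models.Model(base.input, out)
```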

4.4 Feature combination using Multi-Layer Perceptron (MLP)

Kallen et al. [kallen2016towards] proposed using the OverFeat [sermanet2013overfeat] network for feature extraction from prostate H&E images; these features were then fed into an SVM for automated PCa grading. During our experiments with nuclei features and the various deep learning models, some scope for improvement in cribriform pattern detection was observed. Subsequently, these methods were combined by concatenating their features and training a Multi-Layer Perceptron (MLP). Following an approach similar to Kallen et al. [kallen2016towards], features for a given image were extracted from the fine-tuned pre-trained ‘ResNet’, ‘DenseNet’, and ‘Inception-v3’ models.

In the MLP, the 57 nuclei features are concatenated with features from all the ‘VGG’, ‘ResNet’, ‘DenseNet’, and ‘Inception-v3’ models trained on the images at all scales. The MLP has two hidden layers. With this combination we achieved a testing accuracy of 85.93 ± 7.54 across the three folds.
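A minimal sketch of this feature-combination MLP is given below. The hidden-layer sizes, epoch count, and the deep-feature dimensionality are placeholders, since the exact values are not reproduced here.

```python
# Sketch of the feature-combination MLP: hand-crafted nuclei features are
# concatenated with deep features from the fine-tuned backbones.
# Hidden sizes and epochs are placeholders, not the paper's exact values.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def build_combined_mlp(n_deep_features, n_nuclei_features=57, hidden=(256, 64)):
    inp = layers.Input(shape=(n_deep_features + n_nuclei_features,))
    x = layers.Dense(hidden[0], activation="relu")(inp)
    x = layers.Dense(hidden[1], activation="relu")(x)
    out = layers.Dense(2, activation="softmax")(x)
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage sketch: concatenate per-image feature vectors, then train the MLP.
# X = np.concatenate([deep_features, nuclei_features], axis=1)
# build_combined_mlp(deep_features.shape[1]).fit(X, y, epochs=100)
```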

5 Results

Several DL models and the nuclei-feature-based model were assessed for effectiveness using the augmented (balanced) cribriform image dataset. The H&E images in the dataset were downscaled to five different resolutions for fine-tuning and testing of all DL models, while the nuclei-feature-based SVM was trained and evaluated on images at a single downscaled resolution. The Keras [chollet2015keras] based framework resizes the input images to the internal image dimension of the given DL network; for example, ‘ResNet-50’ uses a 224 x 224 pixels input resolution, so a given input image is resized to 224 x 224 pixels before being fed into the network during training and testing. The same process is used for all the DL models with their different input image sizes (scales).
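For illustration, the resizing step could look like the following sketch; the PNG decoding and the 224 x 224 target are assumptions tied to the Keras ResNet-50 default input size (Inception-v3, for instance, expects 299 x 299).

```python
# Sketch of the resizing step: dataset patches are rescaled to the backbone's
# expected input resolution before training or inference.
import tensorflow as tf

def load_and_resize(path, target=(224, 224)):
    img = tf.io.read_file(path)
    img = tf.image.decode_png(img, channels=3)   # dataset images assumed PNG
    img = tf.image.resize(img, target)           # rescale to network input size
    return tf.keras.applications.resnet50.preprocess_input(img)
```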

A three-fold cross-validated study was performed such that the patients providing training, validation, and testing images are mutually exclusive, to mimic the real-world scenario for a cribriform pattern classification system. As discussed in section 3.2, we also defined an additional unseen test set for further evaluation of our models. We expect the trained models to perform similarly during testing in the cross-validated study and on the additional unseen test set. We tested the top three performing individual DL models on this additional unseen test set across the three folds.

Table 6 tabulates the testing accuracy for the nuclei-feature-based method and the fine-tuned DL architectures in the three-fold cross-validated study, and for the top three models on the additional unseen test sets. The results for the top three models in the three-fold cross-validation study and on the additional unseen test set were similar. The experiments were conducted at two separate locations: the nuclei-feature-based method and the fine-tuned DL architectures were evaluated at the first location, while the modified ResNet [he2016ResNet] was designed and implemented at the second location. The implementations were shared across the locations to validate reproducibility; for these checks, the DL experiments were run using 300 images at both locations, and the results were identical. The first location used an Ubuntu 14.04 64-bit desktop with 32 GB RAM, an Intel i7 3.5 GHz CPU, and a 6 GB Nvidia TITAN GPU. The second location used an Ubuntu 16.04 64-bit desktop with 64 GB RAM, an Intel i7 3.4 GHz CPU, and a 12 GB Nvidia Titan X GPU.

5.1 Performance of DL models

Given images rescaled to different resolutions from the same source image, the amount of usable information is directly proportional to the resolution of the rescaled image. Studying the performance across the range of rescaled resolutions, the test accuracy decreases as the image resolution decreases, which is as per our expectations.

VGG16 [simonyan2014very], VGG19 [simonyan2014very], and Inception-v3 [szegedy2015going, szegedy2016rethinking] were the top performers, while the newer and more complex architectures ResNet-50 [he2016ResNet], DenseNet-121 [huang2017densely], and DenseNet-169 [huang2017densely] did not perform as well. This indicates that DL architectures with a low number of trainable parameters (low model complexity) performed better than DL architectures with a much higher number of trainable parameters (high model complexity). This result can be attributed to the fact that highly complex DL architectures need a larger number of training data samples. The same behaviour was observed when ResNet-22 was designed by modifying ResNet-50 [he2016ResNet].

The additional unseen test set results for our top three performing models, VGG16 [simonyan2014very], VGG19 [simonyan2014very], and Inception-v3 [szegedy2015going, szegedy2016rethinking], were similar to the three-fold cross-validated study results, further confirming their robust performance. Also, for some of our trained/fine-tuned models, we observed that the standard error of the testing accuracy is somewhat high, indicating variable model performance across the three folds. This can be attributed to the low number of patients available for training; model performance should improve with more patient data.

Method; Input image dimensions (RGB), Scale; Testing accuracy (percentage); Testing accuracy on additional unseen test set (if applicable, percentage):
‘ResNet-22’: scale 1:1; 73.33 ± 16.66.
‘VGG16’ [simonyan2014very]: scales 1:1, 1:2, 1:4, 1:8, 1:16; at 1:1, 85.65 ± 6.68, additional unseen set 85.81 ± 6.74.
‘VGG19’ [simonyan2014very]: scales 1:1, 1:2, 1:4, 1:8, 1:16; at 1:1, 86.78 ± 6.97, additional unseen set 86.25 ± 7.18.
‘Inception-v3’ [szegedy2015going, szegedy2016rethinking]: scales 1:1, 1:2, 1:4, 1:8, 1:16; at 1:1, 88.18 ± 5.99, additional unseen set 88.04 ± 5.63.
‘DenseNet-121’ [huang2017densely]: scales 1:1, 1:2, 1:4, 1:8, 1:16.
‘DenseNet-169’ [huang2017densely]: scales 1:1, 1:2, 1:4, 1:8, 1:16.
‘ResNet-50’ [he2016ResNet]: scales 1:1, 1:2, 1:4, 1:8, 1:16.
RBF kernel SVM using the nuclei features described in Table 4: scale 1:1.
Combination of nuclei features with DL features using MLP (not including ResNet-22): all scales; 85.93 ± 7.54.
Table 6: Testing accuracy for the various methods. Reported values are average ± standard error across the three folds. VGG16 [simonyan2014very], VGG19 [simonyan2014very], Inception-v3 [szegedy2015going, szegedy2016rethinking], and the combination of all DL methods with nuclei features using MLP achieve the best results (indicated in bold).

6 Conclusion

Pre-trained ‘VGG16’, ‘VGG19’, ‘ResNet-50’, ‘DenseNet-121’, ‘DenseNet-169’, and ‘Inception-v3’ were fine-tuned and tested to assess the viability of transfer learning for cribriform pattern detection. The performance of these models, individually and in combination, was assessed. Various hand-crafted nuclei features were also designed and tested for cribriform pattern detection; some of these nuclei features have been successful in prostate cancer grading, which is an easier problem than cribriform pattern detection. Cribriform patterns are one of the patterns found in high-grade prostate cancer regions, and our Non-cribriform labelled images include various high-grade PCa regions that appear similar to the cribriform pattern with respect to nuclei texture and clustering. The fine-tuned DL models were able to correctly identify the cribriform pattern because they could use information beyond nuclei texture and location. The detection results at various scales using DL models were analysed and combined with the nuclei features using an MLP, with improved performance. The cribriform detection results are promising and can be treated as a baseline for future projects. The current dataset includes images from Gleason pattern 3, Gleason pattern 4, and HG-PIN regions with color variations. Future studies should include cribriform pattern images from all possible sources and with various color variations, encompassing more patients.

Acknowledgments

This work was supported in parts by the Biomedical Research Council of A*STAR (Agency for Science, Technology and Research), Singapore; Science and Engineering Research Council of A*STAR, Singapore; National University of Singapore, Singapore; Department of Pathology at Tan Tock Seng Hospital, Singapore; Mount Elizabeth Novena Hospital, Singapore; Farrer Park Hospital, Singapore; University of Queensland, Australia; Monash University Malaysia, Malaysia; and Singapore-China NRF-Grant (No. NRF2016NRF-NSFC001-111).

References