Just like with natural images, deep convolutional neural networks (CNNs) have shown impressive results for the classification of various diseases in medical images [rajpurkar2017chexnet], [10.3389/fmed.2019.00264], [campanella2019clinical]. CNNs have also been used on histopathology images for tasks such as screening pre-cancerous lesions and localizing tumors [spanhol2016breast], as well as predicting mutations [coudray2018classification], survival [zhu2017wsisa], and cancer recurrence [xu2019deep][ye2018hybrid][mkl].
Though CNN-based algorithms on histopathology images have produced promising results, they lack interpretability. Localization and visualization algorithms for CNNs such as guided backpropagation [springenberg2014striving], Grad-CAM [selvaraju2017grad], and other CAM-related techniques fail to produce informative visualizations for histopathology images. For instance, these techniques do not highlight the cell nuclei responsible for the diagnosis or the relevant features of the tumor microenvironment that could further our understanding of disease and treatment mechanisms. Also, CNNs are often unable to highlight the relevant portions of the macro-environment of the tumor because of the large (giga-pixel) size of whole-slide images.
Morphological features of nuclei and the spatial relationships between them determine the diagnosis of a histopathology slide. Representing histopathology images as graphs can help capture the interactions between nuclei and their spatial arrangement relative to each other. Nuclei are represented as nodes of a graph, and the distances between nuclei are described as edges between the nodes [gadiya2019histographs]. This graph representation can be fed to graph convolutional networks (GCNs) to learn the characteristics of tissue at the macro-environment level.
Taking the idea of using GCNs on graphs extracted from histology images further in this work, we propose to use an attention-based architecture and an occlusion-based visualization technique to highlight informative nuclei and inter-nuclear relationships. Our visualization results for classification of disease states in breast and prostate cancer datasets agree satisfactorily with the pathologists’ observations of the relevance of various inter-nuclear relationships. Our technique paves the way for visualization of previously unknown features relevant for more important problems such as prognosis and prediction of treatment response.
II Related Work
Before the emergence of deep learning, processing of histopathology images as graphs was explored in various ways. Weyn et al. [weyn1999computer] represent a histopathology image as a minimum spanning tree for the diagnosis of mesotheliomas. They use k-nearest neighbors for the classification of minimum spanning trees. Similarly, Cigdem et al. [demir2005augmented] form a graph from a histopathology image by treating each cluster of nuclei as a node and connecting nodes with binary edges. A multi-layer perceptron is used for the detection of inflammation in brain biopsies. Cell-graphs [yener2016cell] use nuclei as nodes and heuristic features as node and vertex features to perform classification on breast cancer and brain biopsy datasets.
Though the above-mentioned methods form graphs from histopathology images, they use classical machine learning approaches such as support vector machines (SVMs) and k-nearest neighbors (kNN). Recent developments in deep learning for graphs have enabled the use of GCNs on graphs derived from histopathology images. Kipf et al. [kipf2016semi] report impressive results for node classification on various graph datasets such as Citeseer, Cora, Pubmed and NELL. They used spectral graph convolution to operate on homogeneous graphs. Other lines of work in GCNs operate in the spatial domain, which enables these algorithms to analyze heterogeneous graphs as well. Such et al. [such2017robust] introduced a graph convolutional algorithm in the spatial domain. This method achieves excellent performance on various graph datasets. CGC-Net [zhou2019cgc] uses a variant of GraphSage [hamilton2017inductive] to identify the grade of a colorectal cancer slide represented as a graph. Recently, GCNs have been applied to graphs of nuclei in histopathology images with classification accuracy on par with CNNs [gadiya2019histographs].
A large portion of the medical community is skeptical about deploying deep learning in histopathology because of the lack of transparency in how it works. Some attempts have been made to make deep learning more explainable. For instance, attention-based multiple instance learning [ilse2018attention] frames classification of histopathology images as a weakly supervised problem and assigns weights to patches of a large image. This method produces an attention map for histopathology images to highlight the patches important for the classification of the overall slide, but it cannot be scaled to giga-pixel images because of its substantial computation requirements. Visualization in the form of clustering and heatmaps was presented in [coudray2018classification], but insightful interpretations beyond the highlighting of the tumor regions cannot be derived from these visualizations. Not only does interpretable visualization for histopathology images in general remain an open problem, but, to our knowledge, visualization for histopathology images through graph representation has also not been explored yet.
III Datasets and Methodology
In this section, we describe the datasets and methodology used.
In order to test the ability of the proposed method to highlight interpretable features automatically, we used two datasets for which we knew the features that pathologists would expect to see. The first dataset is from the ICIAR2018 Grand Challenge on Breast Cancer Histology images (BACH) [aresta2019bach] and comprises 400 histopathology images of breast cancer. Each image in this dataset is 2048 x 1536 pixels. The original BACH dataset contains four classes, viz. normal, benign, in-situ and invasive. We trained a GCN to perform binary classification between the invasive and in-situ classes because these two differ in the spatial arrangement of nuclei even though the nuclei themselves share similar morphologies. We used the PyTorch package for our simulations.
Gleason grade classification and visualization tasks were also performed on a prostate cancer dataset [arvaniti2018automated]. This dataset consists of a total of 1506 images for various prostate cancer tumor grades. Experiments were carried out for binary classification between Gleason grade 3+3 (primary+secondary) versus Gleason grade 4+4 or 4+5.
III-B Graph construction from hematoxylin and eosin (H&E) stained images
We used a UNet [ronneberger2015unet] based model for detecting the nuclei. Edge features are based on the inter-nucleus distance. We measure the distance between two nuclei as $d(i, j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$, where $(x_i, y_i)$ are the coordinates of nucleus $i$. We form an edge between two nodes $i$ and $j$ if their inter-nucleus distance is less than 100 pixels, and assign the following weight to the resulting edge in the adjacency matrix $A$:
[Equation not recoverable from the extracted text]
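The graph construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the distance is taken to be Euclidean, the function name `build_nuclei_graph` and its parameters are hypothetical, and the default `weight_fn` is a placeholder weighting chosen for the example, not the edge-weight formula used in this work.

```python
import math

def build_nuclei_graph(centroids, max_dist=100.0, weight_fn=None):
    """Return a dense, symmetric adjacency matrix for nuclei centroids.

    centroids: list of (x, y) pixel coordinates, one per detected nucleus.
    max_dist:  nuclei closer than this many pixels are connected (100 in the text).
    weight_fn: maps a distance to an edge weight. The default is an
               illustrative assumption, not the weighting from the paper.
    """
    if weight_fn is None:
        # Hypothetical weighting: closer nuclei get a weight nearer to 1.
        weight_fn = lambda d: 1.0 - d / max_dist
    n = len(centroids)
    adjacency = [[0.0] * n for _ in range(n)]
    for i in range(n):
        xi, yi = centroids[i]
        for j in range(i + 1, n):
            xj, yj = centroids[j]
            d = math.hypot(xi - xj, yi - yj)  # Euclidean distance in pixels
            if d < max_dist:
                w = weight_fn(d)
                adjacency[i][j] = w  # undirected graph, so keep A symmetric
                adjacency[j][i] = w
    return adjacency

# Three toy nuclei: the first two are 50 pixels apart, the third is far away.
centroids = [(0, 0), (30, 40), (300, 300)]
A = build_nuclei_graph(centroids)
```

The double loop is quadratic in the number of nuclei; for graphs with a few thousand nodes a spatial index such as a k-d tree would be a natural replacement.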
III-C Robust spatial filtering (RSF)
Our GCN was adapted from robust spatial filtering (RSF) [such2017robust]. For a graph $G = (V, E)$, $V$ is the set of vertices, $E$ is the set of edges, and $N$ is the number of nodes. Each vertex and edge can have multiple features. The numbers of features for a vertex and an edge are $F$ and $L$, respectively. This arrangement allows the sets $V$ and $E$ to be represented as tensors $V \in \mathbb{R}^{N \times F}$ and $A \in \mathbb{R}^{N \times N \times L}$, respectively. In RSF, the convolution operation on graphs is given by the following equation:
[Equation 2 not recoverable from the extracted text]
where the filter weights and the bias $b$ are learnable parameters, and each slice of $A$ along its last dimension represents one edge feature of the adjacency matrix. Multiple such filters are used to learn vertex features. In RSF, the graph adjacency matrix is not transformed into the spectral domain. Hence the computationally heavy operation of inversion of the Laplacian matrix is avoided.
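For the pooling operation, an embedding is derived from the vertex features and the adjacency tensor of the input graph. This operation is similar to the convolution operation given in Equation 2. Further, the pooled vertex features and adjacency tensor, with a new number of nodes, are obtained by,
[Equation not recoverable from the extracted text]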
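To make the shapes involved in spatial convolution and pooling concrete, the following NumPy sketch shows one common form of these operations. It is written under assumptions rather than taken from this work: the filter layout `H` of shape `(L, F, F_out)`, the softmax over input nodes, and the names `rsf_vertex_conv`, `graph_embed_pool`, `H_emb`, `F_out` and `N_out` are all illustrative choices.

```python
import numpy as np

def rsf_vertex_conv(V, A, H, b):
    """Spatial graph convolution sketch.

    V: (N, F) vertex features; A: (N, N, L) adjacency tensor with one slice
    per edge feature; H: (L, F, F_out) filter weights; b: (F_out,) bias.
    Computes the sum over l of A[:, :, l] @ V @ H[l], then adds b.
    """
    return np.einsum("ijl,jf,lfo->io", A, V, H) + b

def softmax(x, axis):
    z = x - x.max(axis=axis, keepdims=True)  # subtract the max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def graph_embed_pool(V, A, H_emb, b_emb):
    """Pool a graph with N nodes down to N_out nodes.

    H_emb: (L, F, N_out) and b_emb: (N_out,) parameterise a convolution whose
    output assigns every input node to the N_out output nodes.
    """
    # Assumption: normalise over input nodes, so each output node is a
    # convex combination of input nodes.
    S = softmax(rsf_vertex_conv(V, A, H_emb, b_emb), axis=0)  # (N, N_out)
    V_out = S.T @ V                                           # (N_out, F)
    A_out = np.einsum("in,ijl,jm->nml", S, A, S)              # (N_out, N_out, L)
    return V_out, A_out

rng = np.random.default_rng(0)
N, F, L, F_out, N_out = 6, 4, 1, 8, 3
V = rng.normal(size=(N, F))
A = rng.random(size=(N, N, L))
H = rng.normal(size=(L, F, F_out))
b = np.zeros(F_out)
V1 = rsf_vertex_conv(V, A, H, b)
V2, A2 = graph_embed_pool(V1, A, rng.normal(size=(L, F_out, N_out)), np.zeros(N_out))
```

Because everything stays in the vertex domain, the only heavy operations are dense matrix products; no eigendecomposition of a graph Laplacian is needed.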
III-D RSF with edge convolutions (RSF+Edge)
The convolutional layer in RSF convolves vertex features of neighboring vertices to learn enhanced vertex features. This operation does not exploit the edge features directly. Gadiya et al. [gadiya2018some] proposed a method to learn enhanced vertex as well as edge features. Edge convolution is performed as per the following equation:
[Equation not recoverable from the extracted text]
where the filter is a tensor of learnable parameters, its input is obtained by concatenating the edge and vertex features of a node, and the activation is a monotonic nonlinear function.
III-E Robust Spatial Filtering with Attention (RSF+Attention)
We conjectured that an attention mechanism could help rank the graph vertices in their relative order of importance. Attention mechanisms are used extensively in neural networks for natural language processing and, to a lesser extent, for computer vision tasks [xu2015show, ilse2018attention]. In our work, the attention layer was included before the first pooling operation at the input to highlight important nuclei directly, as shown in Figure 1.
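The text does not spell out the form of the attention layer, so the sketch below shows only one simple, generic way to score and rank graph vertices. It assumes a single learnable scoring vector `w` and a softmax over nodes; the function name `node_attention` and this particular formulation are assumptions made for illustration.

```python
import numpy as np

def node_attention(V, w):
    """Score nodes with a softmax over a linear projection of their features.

    V: (N, F) vertex features; w: (F,) learnable scoring vector (hypothetical).
    Returns the reweighted features, the per-node scores, and node indices
    sorted from most to least important.
    """
    logits = V @ w
    logits = logits - logits.max()        # subtract the max for stability
    scores = np.exp(logits) / np.exp(logits).sum()
    V_att = V * scores[:, None]           # scale each node's features by its score
    ranking = np.argsort(-scores)         # highest score first
    return V_att, scores, ranking

V_demo = np.array([[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]])
w_demo = np.array([1.0, 1.0])
V_att, scores, ranking = node_attention(V_demo, w_demo)
```

Since the scores are attached to individual nodes, they can be mapped back to nucleus positions in the image without any extra forward passes.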
For the proposed model (RSF+Attention), we used the attention scores to visualize the importance of individual nuclei. For the models that lacked an attention mechanism, given a trained model and a graph $G$, we rank all the nodes based on the drop in classification probability when they are occluded, in a manner similar to [zeiler2014visualizing]. To get a more discernible drop in probability, for every node all of its 1-hop neighbors along with their edges were also occluded. Occlusion of a node $i$ creates a new graph $G_i$. The classification probability is computed for the occluded graph. The relative drop in probability for the nodes gives a measure of the importance of each node. We also tested 2-hop and 3-hop occlusion, but the results were similar to those of 1-hop. Formally, the importance of node $i$ can be given as,
[Equation not recoverable from the extracted text]
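The occlusion procedure can be sketched as follows. This is a minimal illustration under assumptions: occlusion is implemented by deleting the node and its 1-hop neighbors from the graph, the relative drop is taken as (p_full - p_occluded) / p_full, and `predict_proba` is a hypothetical model interface that returns the probability of the predicted class for a given graph.

```python
def one_hop_occlusion_importance(adjacency, features, predict_proba):
    """Score each node by the relative drop in class probability when the
    node and its 1-hop neighbours are removed from the graph.

    adjacency:     N x N list of lists of edge weights (0 means no edge).
    features:      list of N per-node feature vectors.
    predict_proba: callable(adjacency, features) -> probability of the
                   predicted class (hypothetical interface, assumed > 0
                   on the full graph).
    """
    n = len(adjacency)
    p_full = predict_proba(adjacency, features)
    importance = []
    for i in range(n):
        # The node itself plus every node it shares an edge with.
        occluded = {i} | {j for j in range(n) if adjacency[i][j] != 0}
        keep = [k for k in range(n) if k not in occluded]
        sub_adj = [[adjacency[r][c] for c in keep] for r in keep]
        sub_feat = [features[k] for k in keep]
        p_occ = predict_proba(sub_adj, sub_feat)
        importance.append((p_full - p_occ) / p_full)
    return importance

# Toy stand-in for a trained model: probability grows with total feature mass.
def toy_model(adjacency, features):
    return sum(f[0] for f in features) / 10.0

toy_adjacency = [[0, 1, 0, 0],
                 [1, 0, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]]
toy_features = [[1.0], [1.0], [4.0], [2.0]]
importance = one_hop_occlusion_importance(toy_adjacency, toy_features, toy_model)
```

Each node requires one extra forward pass, so the cost grows linearly with the number of nuclei; this is the main practical difference from reading importance directly off attention scores.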
IV Experiments and Results
In this section, we show graphs formed from histology images, classification accuracy of using various GCN architectures, and visualization of highlighted nuclei.
[Figure 3 panel columns: Original image | Detected nucleus map | RSF+Edge | RSF+Attention]
IV-A Graphs from H&E stained histopathology images
Each image produces a graph with a different number of nodes. For the BACH and prostate cancer Gleason grade datasets, the average number of nodes in a graph was 1546 and 613, respectively. Figure 2 shows an example of transforming an H&E stained histopathology image into a graph.
IV-B Classification of breast and prostate cancers
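Table I: Architectures of the three models.

| RSF | RSF + Edge | RSF + Attention |
|---|---|---|
| Vertex Conv 1 | Vertex Conv 1 | Vertex Conv 1 |
| Vertex Conv 2 | Vertex Conv 2 | Vertex Conv 2 + Attn |
| Pooling 1 | Pooling 1 | Pooling 1 |
| | Edge Conv 1 | |
| Vertex Conv 3 | Vertex Conv 3 | Vertex Conv 3 |
| | Edge Conv 2 | |
| Pooling 2 | Pooling 2 | Pooling 2 |
| | Edge Conv 3 | |
| FC - 1 | FC - 1 | FC - 1 |
| FC - 2 | FC - 2 | FC - 2 |
| FC - 3 | FC - 3 | FC - 3 |

Table II: Classification accuracy of the models. The column labels and the RSF row are missing from the extracted text.

| Model | | |
|---|---|---|
| RSF + Edge | 92% | 97% |
| RSF + Attention | 90% | 97% |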
We trained the three models described in the previous section, viz. robust spatial filtering (RSF), robust spatial filtering with edge convolution (RSF+Edge), and robust spatial filtering with attention (RSF+Attention). All models were trained for approximately 50 epochs with a learning rate of 0.01 using the Adam optimizer. The architectures of the three models are given in Table I. Table II shows that the classification accuracies of the three models were comparable. All the models contained nearly 300,000 parameters.
We now present the visualizations produced by the occlusion and attention mechanisms. We performed occlusion experiments on predictions of the RSF and RSF+Edge models on the breast and prostate cancer datasets. The visualizations produced by these two models were nearly the same, so we have omitted the results from the former due to space constraints. The images in the first row correspond to the in-situ subtype of breast cancer from the BACH dataset. We can see that nuclei on the outer layer of the gland are highlighted by the occlusion experiments. In the second row, which corresponds to the invasive class in the BACH dataset, nearly all the nuclei are highlighted. Outer linings are crucial for in-situ classification, whereas invasive cancer is spread across the entire region. These characteristics of in-situ and invasive histologies are correctly captured by the occlusion and attention experiments. In the last two rows, visualization results for the prostate cancer Gleason grade dataset are shown. In these images, nuclei of the glands that lose their structure are highlighted, as we expected them to be. The images in the last column of Figure 3 are visualization results from the RSF+Attention model. These results were verified by expert pathologists and are visibly better at highlighting the above-mentioned features.
We occluded nuclei clusters and exploited an attention layer in a graph convolutional neural network to highlight nuclei in histopathology slides, and visualized the results on a breast cancer and a prostate cancer dataset. The proposed methods provide a notably more interpretable map depicting the contribution of each nucleus and its neighborhood to the final diagnosis. The presented results provide a way to explain the new patterns that deep learning models find in tissue images. The proposed techniques not only open a path for the verification of existing practices in pathology but also suggest a way to generate new knowledge on where to focus to find meaningful differences between tissue classes, for example, those that may have different disease or treatment outcomes.
The authors would like to thank Nvidia Corporation for the donation of the GPUs used for this research.