Deep neural network models for computational histopathology: A survey

by Chetan L. Srinidhi, et al.

Histopathological images contain rich phenotypic information that can be used to monitor underlying mechanisms contributing to disease progression and patient survival outcomes. Recently, deep learning has become the mainstream methodological choice for analyzing and interpreting cancer histology images. In this paper, we present a comprehensive review of state-of-the-art deep learning approaches that have been used in the context of histopathological image analysis. From a survey of over 130 papers, we review the field's progress based on the methodological aspects of different machine learning strategies such as supervised, weakly supervised, unsupervised and transfer learning, as well as various sub-variants of these methods. We also provide an overview of deep learning based survival models that are applicable to disease-specific prognosis tasks. Finally, we summarize several existing open datasets and highlight critical challenges and limitations of current deep learning approaches, along with possible avenues for future research.



1 Introduction

The examination and interpretation of tissue sections stained with haematoxylin and eosin (H&E) by anatomic pathologists is an essential component in the assessment of disease. In addition to providing diagnostic information, the phenotypic information contained in histology slides can be used for prognosis. Features such as nuclear atypia, degree of gland formation, presence of mitosis and inflammation can all be indicative of how aggressive a tumour is, and may also allow predictions to be made about the likelihood of recurrence after surgery. Over the last 50 years, several scoring systems have been proposed that allow pathologists to grade tumours based on their appearance, for example, the Gleason score for prostate cancer (Epstein et al., 2005) and the Nottingham score for breast cancer (Rakha et al., 2008). These systems provide important information to guide decisions about treatment and are valuable in assessing heterogeneous disease. There is, however, considerable inter-pathologist variability, and some systems that require quantitative analysis, for example the residual cancer burden index (Symmans et al., 2007), are too time-consuming to use in a routine clinical setting.

The first efforts to extract quantitative measures from microscopy images were in cytology. Prewitt and Mendelsohn (1966) laid out the steps required for the “effective and efficient discrimination and interpretation of images”, describing the basic paradigm of object detection, feature extraction and finally the training of a classification function that is still in use more than 50 years later. Early work in cytology and histopathology was usually limited to the analysis of the small fields of view that could be captured using conventional microscopy, and image acquisition was a time-consuming process (Mukhopadhyay et al., 2018). The introduction of whole slide scanners in the 1990s made it much easier to produce digitized images of whole tissue slides at microscopic resolution, and this led to renewed interest in the application of image analysis and machine learning techniques to histopathology.

In 2012, Krizhevsky et al. (2012) showed that convolutional neural networks (CNNs) could outperform previous machine learning approaches by classifying 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into 1000 different classes. At the same time, Cireşan et al. (2012) showed that CNNs could outperform competing methods in segmenting nerves in electron microscopy images, and later in detecting mitotic cells in histopathology images (Cireşan et al., 2013). Since then, deep learning (DL) methods based on CNNs have consistently outperformed approaches based on handcrafted features in a variety of digital pathology tasks. The ability of CNNs to learn features directly from raw data without the need for specialist input from pathologists, together with the availability of annotated histopathology datasets, have also fueled the explosion of interest in deep learning applied to histopathology.

The analysis of whole-slide digital pathology images (WSIs) poses some unique challenges. The images are very large and have to be broken down into hundreds or thousands of smaller tiles before they can be processed. Both the context at low magnification and the detail at high magnification may be important for a task, so information from multiple scales needs to be integrated. In the case of survival prediction, salient regions of the image are not known a priori and we may only have weak slide-level labels. The variability within each disease subtype can be high, and annotation usually requires a highly trained pathologist. For cell-based methods, many thousands of objects need to be detected and characterized. These challenges have made it necessary to adapt existing deep learning architectures and to design novel approaches specific to the digital pathology domain. In this work, we surveyed more than 130 papers in which deep learning has been applied to a wide variety of detection, diagnosis, prediction and prognosis tasks. We carried out this review by searching Google Scholar, PubMed and arXiv for papers containing keywords such as (“convolutional” or “deep learning”) and (“digital pathology” or “histopathology” or “computational pathology”). Additionally, we included conference proceedings from MICCAI, ISBI, MIDL, SPIE and EMBC based on the title/abstract of the papers. We also iterated over the selected papers to include any additional cross-referenced works that were missed by our initial search criteria. The body of research in this area is growing rapidly, and this survey covers the period up to and including December 2019. Descriptive statistics of the surveyed papers by category and year are shown in Fig. 1.


Figure 1: (a) An overview of the number of papers published from January 2013 to December 2019 in deep learning based computational histopathology surveyed in this paper. (b) A categorical breakdown of the number of papers published in each learning schema.

The remainder of this paper is organised as follows. Section 2 presents an overview of various learning schemes in the DL literature in the context of computational histopathology. Section 3 discusses in detail the different categories of DL schemes commonly used in this field. We categorize these learning mechanisms into supervised (Section 3.1), weakly supervised (Section 3.2), unsupervised (Section 3.3) and transfer learning (Section 3.4). Section 4 discusses survival models for disease prognosis tasks. In Section 5, we discuss various open challenges, including prospective applications and future trends in computational pathology, and finally, conclusions are presented in Section 6.

2 Overview of learning schemas

Figure 2: An overview of deep neural network models in computational histopathology. These models have been constructed using various deep learning architectures (shown in alphabetical order) and applied to various histopathological image analysis tasks (depicted in numerical order).

In this section, we provide a formal introduction to various learning schemes in the context of DL applied to computational pathology. These learning schemes are illustrated with the example of classifying a histology WSI as cancerous or normal. Based on these formulations, various DL models have been proposed in the literature, traditionally based on convolutional neural networks (CNNs), recurrent neural networks (RNNs), generative adversarial networks (GANs), auto-encoders (AEs) and various other variants. For a detailed and thorough background on DL fundamentals and existing architectures, we refer readers to LeCun et al. (2015); Goodfellow et al. (2016), and for specific applications of DL in medical image analysis, to Litjens et al. (2017); Shen et al. (2017); Yi et al. (2019).

In supervised learning, we have a set of $N$ training examples $\{(x_i, y_i)\}_{i=1}^{N}$, where each sample $x_i \in \mathbb{R}^{H \times W \times C}$ is an input image (a WSI of dimension $H \times W$ pixels with $C$ channels; for example, $C = 3$ for an RGB image) associated with a class label $y_i$ from $K$ possible classes. For example, in binary classification, $y_i$ takes the scalar form $y_i \in \{0, 1\}$, whereas $y_i \in \mathbb{R}$ for a regression task. The goal is to train a model $f_\theta : x \mapsto y$ that best predicts the label $y$ of an unseen test image $x$, based on a loss function $\mathcal{L}$. For instance, the $x_i$'s are patches in WSIs and the $y_i$'s are labels annotated by the pathologist as either cancerous or normal. During inference, the model predicts the label of a patch from a previously unseen test set. This scheme is detailed in Section 3.1, with an example illustrated in Fig. 3.
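Writing the training set as $\{(x_i, y_i)\}_{i=1}^{N}$ and the model as $f_\theta$, supervised training amounts to empirical risk minimisation. The binary cross-entropy shown for $\mathcal{L}$ is one common choice for the cancerous-versus-normal example and is assumed here for illustration, not a loss specified by the text:

```latex
\theta^{*} = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\big(f_{\theta}(x_i),\, y_i\big),
\qquad
\mathcal{L}(\hat{y}, y) = -\big[\, y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \,\big]
```

Here $\hat{y} = f_\theta(x_i) \in (0, 1)$ is the predicted probability that patch $x_i$ is cancerous and $y \in \{0, 1\}$ is the pathologist's label.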

In weakly supervised learning (WSL), the goal is to train a model using readily available coarse-grained (image-level) annotations $Y_i$ to automatically infer fine-grained (pixel- or patch-level) labels $y_{ij}$, where $j$ indexes the pixels or patches of image $i$. In histopathology, a pathologist labels a WSI as cancer as long as a small part of the image contains a cancerous region, without indicating its exact location. Such image-level annotations (often called “weak labels”) are much easier to obtain in practice than the expensive pixel-wise labels required by supervised methods. An illustrative example of the WSL scheme is shown in Fig. 4, and this scheme is covered in depth in Section 3.2.
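The slide-level labelling rule above matches the standard multiple-instance learning (MIL) assumption: a slide is positive if at least one of its patches is positive. A minimal sketch, assuming a patch classifier has already produced per-patch cancer probabilities (the function names are hypothetical), shows two common ways to pool these into a slide score that can be compared against the weak label:

```python
from typing import Iterable, Sequence


def slide_score_max(patch_probs: Sequence[float]) -> float:
    """Max pooling: the slide is scored by its most suspicious patch."""
    if not patch_probs:
        raise ValueError("a slide must contain at least one patch")
    return max(patch_probs)


def slide_score_noisy_or(patch_probs: Sequence[float]) -> float:
    """Noisy-OR pooling: P(slide positive) = 1 - prod_j (1 - p_j).

    Treats patches as independent, so many moderately suspicious patches
    can also push the slide score up.
    """
    if not patch_probs:
        raise ValueError("a slide must contain at least one patch")
    prob_all_negative = 1.0
    for p in patch_probs:
        prob_all_negative *= 1.0 - p
    return 1.0 - prob_all_negative


def slide_label(patch_labels: Iterable[int]) -> int:
    """MIL rule: a slide is positive (1) if any of its patches is positive."""
    return int(any(patch_labels))


probs = [0.1, 0.05, 0.9, 0.2]  # hypothetical patch-classifier outputs
print(slide_score_max(probs))                 # 0.9
print(round(slide_score_noisy_or(probs), 4))  # 0.9316
```

Max pooling lets a single confident patch decide the slide, which mirrors how the weak label is assigned; noisy-OR gives every patch a gradient during training at the cost of assuming independence between patches.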

In unsupervised learning, the aim is to identify patterns in a set of images $\{x_i\}_{i=1}^{N}$ without mapping an input sample to a predefined set of outputs (i.e., labels). This type of model includes fully unsupervised methods, where the raw data come in the form of images without any expert-annotated labels. A common technique in unsupervised learning is to transform the input data into a lower-dimensional subspace and then group these lower-dimensional representations (i.e., the latent vectors) into mutually exclusive or hierarchical groups using a clustering technique. An example of the unsupervised learning scheme is illustrated in Fig. 5, with existing methods reviewed in Section 3.3.
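The reduce-then-cluster recipe can be sketched in plain Python. As assumptions for illustration, the latent vectors are taken as given (they might come, for example, from an auto-encoder's bottleneck), and Lloyd's k-means with a deterministic initialisation stands in for whichever clustering technique a particular method uses; `kmeans` is a hypothetical helper, not a library call:

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def kmeans(latents: Sequence[Vector], k: int, iters: int = 20) -> Tuple[List[int], List[Vector]]:
    """Group latent vectors into k mutually exclusive clusters (Lloyd's algorithm)."""
    if not 1 <= k <= len(latents):
        raise ValueError("k must be between 1 and the number of samples")
    # Deterministic initialisation: the first k samples serve as initial centroids.
    centroids: List[Vector] = [tuple(v) for v in latents[:k]]
    assignment: List[int] = []
    for _ in range(iters):
        # Assignment step: each latent vector joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in latents
        ]
        # Update step: move each centroid to the mean of its members.
        updated: List[Vector] = []
        for c in range(k):
            members = [v for v, a in zip(latents, assignment) if a == c]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[c])  # an empty cluster keeps its old centroid
        if updated == centroids:
            break  # converged: assignments can no longer change
        centroids = updated
    return assignment, centroids


latents = [(0.0, 0.0), (10.0, 10.0), (0.1, 0.2), (9.8, 10.1), (0.2, 0.1), (10.2, 9.9)]
labels, _ = kmeans(latents, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Hierarchical variants would replace the flat assignment step with repeated merging or splitting of groups.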

In transfer learning (TL), the goal is to transfer knowledge from one domain (i.e., the source) to another domain (i.e., the target) by relaxing the assumption that the training and test sets must be independent and identically distributed. Formally, a domain $\mathcal{D} = \{\mathcal{X}, P(X)\}$ is defined by a feature space $\mathcal{X}$ and a marginal probability distribution $P(X)$, where $X = \{x_1, \dots, x_n\} \in \mathcal{X}$, and a task $\mathcal{T} = \{\mathcal{Y}, f(\cdot)\}$ consists of a label space $\mathcal{Y}$ and a prediction function $f(\cdot)$. The aim of transfer learning is to improve the predictive function $f_T(\cdot)$ in the target domain $\mathcal{D}_T$ by using the knowledge in the source domain $\mathcal{D}_S$ and source task $\mathcal{T}_S$, where $\mathcal{D}_S \neq \mathcal{D}_T$ and/or $\mathcal{T}_S \neq \mathcal{T}_T$. For example, in histology, this scenario can occur when a classifier is trained on a source task $\mathcal{T}_S$ and possibly fine-tuned on a target task $\mathcal{T}_T$ with limited or no annotations. This scheme is explained in detail in Section 3.4. Note that domain adaptation, a sub-field of transfer learning, is discussed thoroughly in Section 3.4.1.

Next, we discuss the deep neural network (DNN) models published in the histopathology domain under each of these learning schemes, along with existing challenges and gaps in current research and possible future directions.

3 Methodological approaches

The aim of this section is to provide a general reference guide to various deep learning models applied in computational histopathology from a methodological perspective. The DL models discussed in the following sections were originally developed for specific applications, but are applicable to a wide variety of histopathological tasks (Fig. 2). Based on the learning schemes, the following sections are divided into supervised, weakly supervised, unsupervised and transfer learning approaches. The details are presented next.

3.1 Supervised learning

Figure 3: An overview of supervised learning models.

Among supervised learning techniques, we identify three major canonical deep learning models based on the nature of the tasks solved in digital histopathology: classification, regression and segmentation based models, as illustrated in Fig. 3. The first category contains pixel-wise/sliding-window classification approaches, which are traditionally formulated as object detection (Girshick, 2015) or image classification tasks (He et al., 2016) in the computer vision literature. The second category focuses on predicting the position of objects (e.g., cells or nuclei (Sirinukunwattana et al., 2016)), or sometimes a cancer severity score (e.g., the H-score of breast cancer images (Liu et al., 2019)), by enforcing topological/spatial constraints in DNN models. Finally, the last category comprises fully convolutional network (FCN) based approaches (Long et al., 2015; Ronneberger et al., 2015), which are widely adopted to solve semantic or instance segmentation problems in computer vision and medical imaging. The papers on supervised learning are summarized in Table 1.

3.1.1 Classification models

This category of methods uses a sliding window approach (i.e., a patch centred on a pixel of interest) to identify objects (such as cells, glands and nuclei) or to make image-level predictions (such as disease diagnosis and prognosis). Within this category, we further identify two sub-categories: (i) local-level tasks and (ii) global-level tasks. The former comprises methods that represent a region (e.g., a cell or nucleus) by spatially pooled feature representations or scores, with the aim of identifying or localizing objects, while the latter consists of methods for image-level prediction tasks such as whole-slide level disease grading.

A. Local-level task: Image classification, such as the detection of cells or nuclei, is one of the most successful tasks in which deep learning techniques have made a tremendous contribution to digital pathology. CNN-based methods have been extensively used for pixel-wise prediction with a sliding window approach, training the networks on small image patches rather than the entire WSI. Due to the gigapixel resolution of WSIs, applying a CNN directly to a WSI is impractical, and hence the entire WSI is divided into small patches for analysis. In practice, these image patches are often annotated by the pathologist as either a region containing an object of interest (e.g., cells/nuclei) or background. A large corpus of deep learning methods applied to digital pathology is akin to computer vision models applied to visual object recognition tasks (Russakovsky et al., 2015).
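The tiling step can be sketched as below. This is a minimal, library-free illustration under stated assumptions: the slide is represented only by its width and height in pixels, tiles are square and non-overlapping unless a smaller stride is given, and a hypothetical `is_tissue` callback stands in for background filtering (real pipelines typically read tiles with a dedicated WSI reader and threshold a low-magnification tissue mask). `tile_grid` is likewise a hypothetical name:

```python
from typing import Callable, Iterator, Optional, Tuple

Tile = Tuple[int, int, int, int]  # (x, y, width, height) in full-resolution pixels


def tile_grid(
    width: int,
    height: int,
    tile: int = 256,
    stride: Optional[int] = None,
    is_tissue: Optional[Callable[[Tile], bool]] = None,
) -> Iterator[Tile]:
    """Yield square tiles covering a width x height slide, row by row.

    Tiles that would extend past the slide border are skipped; a stride
    smaller than the tile size produces overlapping tiles.
    """
    if stride is None:
        stride = tile  # non-overlapping tiles by default
    if tile <= 0 or stride <= 0:
        raise ValueError("tile and stride must be positive")
    for y in range(0, height - tile + 1, stride):
        for x in range(0, width - tile + 1, stride):
            box = (x, y, tile, tile)
            # `is_tissue` is a hypothetical background filter supplied by the caller.
            if is_tissue is None or is_tissue(box):
                yield box


n_tiles = sum(1 for _ in tile_grid(100_000, 80_000, tile=256))
print(n_tiles)  # 121680
```

For a hypothetical 100,000 × 80,000-pixel slide, 256 × 256 tiles already give 390 × 312 = 121,680 patches before background removal, which is why patch-level processing (and filtering) is the default.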

The earliest seminal work, by Cireşan et al. (2013), revolutionised the field of digital histopathology by applying CNN-based pixel prediction to detect mitosis in routinely stained H&E breast cancer histology images. Their method significantly outperformed all competing techniques in both the ICPR 2012 and the AMIDA 2013 mitosis detection challenges. Subsequent methods were based on CNNs or on a combination of CNNs and handcrafted features. Since training CNN models is often complex and requires a large training set, the earliest works (Wang et al., 2014; Kashif et al., 2016; Xing et al., 2016; Romo-Bucheli et al., 2016; Wang et al., 2016b) focused on integrating CNNs with biologically interpretable handcrafted features. These models showed excellent performance in addressing the touching nuclei segmentation problem compared to their CNN counterparts. Recently, Qaiser et al. (2019b) proposed a hybrid method based on persistent homology and CNN features for colon tumour segmentation, without compromising on inference speed. The authors also showed that the degree of spatial connectivity among touching nuclei and rotation-invariant properties are captured well in the homology feature space, which is, in general, quite difficult to achieve using CNN models (Sabour et al., 2017).

Training a deep CNN from scratch requires large amounts of annotated data, which are expensive and cumbersome to obtain in practice. A promising alternative is to take a network pre-trained on a vast set of natural images (such as ImageNet) and fine-tune it on a problem in a different domain with a limited number of annotations. Along these lines, Gao et al. (2017); Valkonen et al. (2019) proposed a fine-tuning based transfer learning approach, which consistently performed better than full training on a single dataset alone. In particular, Gao et al. (2017) made several interesting observations about improving CNN performance by optimising the hyperparameters of the network, augmenting the training data, and fine-tuning rather than fully training the model.

Albarqouni et al. (2016) went one step further by incorporating the crowd annotations (non-expert) directly into CNN learning process via an additional crowdsourcing layer to improve model performance.

The addition of multi-scale and contextual knowledge into CNNs plays an essential role in identifying overlapping cell structures in histopathology images. Conventional single-scale models often suffer from two main limitations: 1) the raw pixel intensities within a small window do not carry enough information about the degree of overlap between cells, and 2) using a large window increases the number of model parameters and the training time. To alleviate these issues, several authors (Song et al., 2015, 2017) proposed multi-scale CNN models to accurately solve the overlapping cell segmentation problem, with the addition of domain-specific shape priors during training. Despite several modifications to CNN architectures, traditional deep learning methods often lack generalisation ability due to stain variations across datasets. To address this issue, Tellez et al. (2018) proposed a CNN-based, data-specific stain augmentation strategy primarily tailored to H&E WSIs. Further, they proposed a scalable approach to reduce the exhaustive labeling effort for mitotic cells by first analysing mitotic activity in PHH3-restained sections and later registering it to H&E images.
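A minimal sketch of the multi-scale idea follows; it is not the architecture of Song et al. Under stated assumptions (a single-channel image stored as nested lists, zero padding at the borders, and naive strided subsampling rather than anti-aliased resizing; `crop` and `multiscale_patches` are hypothetical names), it crops concentric windows of increasing size around the pixel of interest and subsamples each to the same input size, so wider context is added without enlarging the network's input:

```python
from typing import List

Image = List[List[float]]  # single-channel image as rows of pixel values


def crop(img: Image, row: int, col: int, size: int) -> Image:
    """Crop a size x size window centred on (row, col), zero-padding outside the image."""
    half = size // 2
    height, width = len(img), len(img[0])
    return [
        [
            img[r][c] if 0 <= r < height and 0 <= c < width else 0.0
            for c in range(col - half, col - half + size)
        ]
        for r in range(row - half, row - half + size)
    ]


def multiscale_patches(img: Image, row: int, col: int, base: int = 8, levels: int = 3) -> List[Image]:
    """Return one base x base input per scale, covering windows of base * 2**level pixels."""
    patches = []
    for level in range(levels):
        factor = 2 ** level
        window = crop(img, row, col, base * factor)
        # Naive subsampling keeps every factor-th pixel, so all scales share one input size.
        patches.append([line[::factor] for line in window[::factor]])
    return patches
```

Each returned patch could then feed a separate branch of a multi-scale network; the outer scales trade spatial detail for context while keeping the input size, and hence the per-branch cost, fixed.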

In summary, among the bottom-up approaches, the CNN is the current gold-standard technique for a wide variety of low-level histopathology tasks such as cell or nuclei detection. Methods based on multi-scale CNNs and transfer learning are becoming increasingly popular due to their good generalization across a wide range of datasets and scanning protocols.

B. Global-level task: Most published deep learning methods in this category follow a patch-based classification approach for whole-slide level disease prediction. These techniques range from simple CNN architectures (Cruz-Roa et al., 2014; Ertosun and Rubin, 2015) to more sophisticated models (Qaiser and Rajpoot, 2019; Zhang et al., 2019) for accurate tissue-level cancer localization and WSI-level disease grading. For instance, Cruz-Roa et al. (2014, 2017) proposed a simple 3-layer CNN for identifying invasive ductal carcinoma in breast cancer images. Their approach outperformed all previous handcrafted methods by a margin of 5% in terms of average sensitivity and specificity. The main disadvantage of these methods is the relatively long computational time required to carry out dense patch-wise prediction over an entire WSI. To address this issue, Cruz-Roa et al. (2018) combined a CNN with adaptive sampling based on quasi-Monte Carlo sampling and a gradient-based adaptive strategy, to focus only on regions with high uncertainty. Subsequently, a few authors (Litjens et al., 2016; Vandenberghe et al., 2017) employed a simpler patch-based CNN model for the identification of breast and prostate cancer in WSIs, achieving an AUC of 0.99 for the breast cancer experiment. More recently, some authors (Bejnordi et al., 2018; Wei et al., 2019; Nagpal et al., 2019; Shaban et al., 2019b; Halicek et al., 2019) have trained networks from scratch (i.e., full training) on large sets of WSIs. These networks include the most popular deep learning models traditionally used for natural image classification, such as VGGNet (Simonyan and Zisserman, 2014), InceptionNet (Szegedy et al., 2015), ResNet (He et al., 2016) and MobileNet (Howard et al., 2017). There is no general rule linking the choice of architecture to the type of disease prediction task. However, the success of these CNN models depends mainly on the number of images available for training, the choice of network hyper-parameters and various other boosting techniques (Cireşan et al., 2013; Nagpal et al., 2019) (see Section 5.1 for more details).

A few authors have tried to encode both local and global contextual information into the CNN learning process for more accurate disease prediction in WSIs. Typically, contextual knowledge is incorporated into a CNN framework by modelling the spatial correlations between neighbouring patches, combining the strengths of CNNs and conditional random fields (CRFs) (Zheng et al., 2015; Chen et al., 2017b). These techniques have been extensively used in computer vision for sequence labeling (Artieres et al., 2010; Peng et al., 2009) and semantic image segmentation (Chen et al., 2017b). In digital pathology, for instance, Kong et al. (2017) introduced a spatially structured network (Spatio-Net) combining a CNN with a 2D Long Short-Term Memory (LSTM) to jointly learn image appearance and spatial dependency features for breast cancer metastasis detection. A similar approach was adopted by Agarwalla et al. (2017) to aggregate features from neighbouring patches using 2D-LSTMs on WSIs. In contrast, Li and Ping (2018) proposed an alternative technique that models spatial correlations through a fully connected CRF component. The advantage of such models is that the whole DNN can be trained end-to-end with the standard backpropagation algorithm, at a slight overhead in complexity.

Alternative methods have also been proposed to encode global contextual knowledge by adopting different patch-level aggregation strategies. For example, Bejnordi et al. (2017) employed a cascaded CNN model to aggregate patch-level pyramid representations, simultaneously encoding multi-scale and contextual information for breast cancer multi-class classification. Similarly, Awan et al. (2018) adopted a ResNet-based patch classification model to output a high-dimensional feature space. These features are then combined using a support vector machine (SVM) classifier to learn the context of a large patch for discriminating between different classes of breast cancer.

Although the above methods include contextual information through patch-based approaches, they still suffer from a loss of visual context due to the disjoint/random selection of small image patches. Furthermore, applying a CNN-based classification model directly to a WSI is computationally expensive, and its cost scales linearly with the number of input image patches (Qaiser and Rajpoot, 2019). Some recent studies (Qaiser and Rajpoot, 2019; BenTaieb and Hamarneh, 2018; Xu et al., 2019a) explored task-driven visual attention models (Mnih et al., 2014; Ranzato, 2014) for histopathology WSI analysis. Such models diagnose cancer by selectively focusing only on the most diagnostically useful areas (such as tissue components) while ignoring irrelevant regions (such as the background). These kinds of visual attention models have been extensively explored in computer vision applications, including object detection (Liu et al., 2016), image classification (Mnih et al., 2014), image captioning (Xu et al., 2015b) and action recognition (Sharma et al., 2015).

In routine clinical diagnosis, a pathologist typically first examines different locations within a WSI to identify diagnostically indicative areas, and then combines this information over time across different eye fixations to predict the presence or absence of cancer. This human visual attention mechanism can be modelled in deep learning as a sequential learning task using RNNs. For instance, Qaiser and Rajpoot (2019) modelled the prediction of immunohistochemical (IHC) scoring of HER2 (Qaiser et al., 2018) as a sequential learning problem, where the whole DNN is optimized via policy gradients under a deep reinforcement learning (DRL) framework. The authors also incorporated an additional task-specific mechanism that inhibits the model from revisiting previously attended locations. Similarly, BenTaieb and Hamarneh (2018) and Xu et al. (2019a) proposed recurrent attention mechanisms to selectively attend to and classify the most discriminative regions in WSIs for breast cancer prediction. Inspired by recent work on image captioning for natural scenes (Xu et al., 2015b; Krause et al., 2017), Zhang et al. (2019) proposed an attention-based multi-modal DL framework that automatically generates clinical diagnostic descriptions and tissue localization attention maps, mimicking the pathologist. An attractive feature of their system is its ability to create natural language descriptions of the histopathology findings whose structure closely resembles that of a standard clinical pathology report.

In essence, attention-based models have gained popularity in recent years and have several attractive properties over traditional sliding-window (patch-based) approaches: i) by enforcing a region selection mechanism (i.e., attention), the model learns to focus only on the most diagnostically relevant areas for disease prediction; ii) the number of model parameters is drastically reduced, leading to faster inference; and iii) the model complexity is independent of the size of the WSI.
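Property iii) follows from the control flow of recurrent visual attention models (Mnih et al., 2014): the network reads a fixed number of small glimpses, each chosen from the current recurrent state. The skeleton below sketches that loop with the learned components passed in as plain functions. All names are hypothetical and the callables stand in for neural networks (in the cited works the location policy is trained, e.g., with policy gradients), so this illustrates the structure rather than any specific published model:

```python
from typing import Callable, List, Tuple

Location = Tuple[int, int]
State = List[float]


def recurrent_attention(
    read_glimpse: Callable[[Location], List[float]],
    update_state: Callable[[State, List[float]], State],
    next_location: Callable[[State], Location],
    classify: Callable[[State], float],
    start: Location,
    n_glimpses: int,
    state_size: int = 4,
) -> Tuple[float, List[Location]]:
    """Run a fixed number of glimpses and classify from the final recurrent state.

    The slide is only ever accessed through `read_glimpse`, so the work done is
    proportional to `n_glimpses`, not to the number of tiles in the WSI.
    """
    state: State = [0.0] * state_size
    location = start
    visited: List[Location] = []
    for _ in range(n_glimpses):
        visited.append(location)
        glimpse = read_glimpse(location)      # features of one small region only
        state = update_state(state, glimpse)  # recurrent core (an RNN/LSTM in practice)
        location = next_location(state)       # learned location policy in practice
    return classify(state), visited
```

An inhibition-of-return mechanism such as the one described above would live inside `next_location`, for example by masking locations already in `visited`.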

3.1.2 Regression models

This category of methods focuses on the detection or localization of objects by directly regressing the likelihood of a pixel being the centre of an object (e.g., a cell or nucleus centre). Detecting cells or nuclei in histopathology images is challenging due to their highly irregular appearance and their tendency to occur as overlapping clumps, which makes it difficult to separate individual cells or nuclei (Naylor et al., 2018; Xie et al., 2018b; Graham et al., 2019b). Pixel-based classification approaches may perform suboptimally on this task, as they do not necessarily consider the topological relationship between pixels at the object centre and those in their neighbourhood (Sirinukunwattana et al., 2016). To tackle this issue, many authors cast object detection as a regression problem by enforcing topological constraints, such that pixels near object centres have higher probability values than those further away. This formulation has been shown to achieve better detection or localization of objects, even with significant variability in both object appearance and object location.
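To make such a topologically constrained target concrete, the sketch below builds a proximity map that equals 1 at annotated centres and decays to 0 at a radius `d`. The exponential decay and the parameters `d` and `alpha` are assumptions chosen for illustration (published methods differ in the exact form), and `proximity_map` is a hypothetical name:

```python
import math
from typing import List, Sequence, Tuple


def proximity_map(
    height: int,
    width: int,
    centres: Sequence[Tuple[float, float]],
    d: float = 5.0,
    alpha: float = 3.0,
) -> List[List[float]]:
    """Regression target peaking at 1.0 on annotated centres and reaching 0 at distance d."""
    if d <= 0 or alpha <= 0:
        raise ValueError("d and alpha must be positive")
    target = [[0.0] * width for _ in range(height)]
    if not centres:
        return target  # no objects: an all-background target
    scale = math.exp(alpha) - 1.0  # normalises the peak value to exactly 1
    for r in range(height):
        for c in range(width):
            # Distance to the nearest annotated centre.
            dist = min(math.hypot(r - cr, c - cc) for cr, cc in centres)
            if dist <= d:
                # Assumed decay: (exp(alpha * (1 - dist / d)) - 1) / (exp(alpha) - 1).
                target[r][c] = (math.exp(alpha * (1.0 - dist / d)) - 1.0) / scale
    return target
```

A network trained to regress this map, rather than a hard foreground/background mask, is pushed to place its highest outputs at object centres, and local maxima of its prediction can then be read off as detections.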

Deep regression models proposed in the literature are mainly based on either CNN or FCN architectures (Long et al., 2015). In the context of FCNs, the earlier methods of Chen et al. (2016a) and Xie et al. (2018a) proposed a simple FCN-based regression model for detecting cells in histopathology images. More recent methods attempt to improve detection by modifying the loss function (Xie et al., 2018b) or by incorporating additional features into popular deep learning architectures (Graham et al., 2019b). Xie et al. (2015a, 2018b) proposed a structured regression model based on fully residual convolutional networks for detecting cells in images of four different tissue types. Instead of the standard mean squared error (MSE) loss, the authors adopted a weighted MSE loss that assigns higher weights to misclassified pixels closer to cell centres. In a similar approach, Xing et al. (2019) adopted a residual learning based FCN architecture for simultaneous nucleus detection and classification in pancreatic neuroendocrine tumour Ki-67 images. In their model, an additional auxiliary task (i.e., ROI extraction) is introduced to assist and boost nucleus classification using weak annotations. To solve the challenging touching nuclei segmentation problem, Naylor et al. (2018) proposed a model that identifies superior markers for the watershed algorithm by regressing the intra-nuclear distance map. Graham et al. (2019b) went one step further, proposing a slightly different strategy based on a unified FCN model for simultaneous nuclear instance segmentation and classification. Their model encodes both the horizontal and vertical distances of nuclear pixels to their centres of mass for accurate nuclei separation in multi-tissue histology images.
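The weighting idea can be illustrated with a small function over flattened pixel values. The rule $w_i = 1 + \lambda t_i$, with $t_i$ the proximity target so that pixels near cell centres weigh more, is an assumption for illustration and not the exact weighting of Xie et al.; `weighted_mse`, `lam` and `base` are hypothetical names:

```python
from typing import Sequence


def weighted_mse(
    pred: Sequence[float],
    target: Sequence[float],
    lam: float = 5.0,
    base: float = 1.0,
) -> float:
    """Mean of w_i * (pred_i - target_i)^2 with the assumed weight w_i = base + lam * target_i."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and of equal length")
    total = 0.0
    for p, t in zip(pred, target):
        weight = base + lam * t  # pixels near centres (large t) cost more when wrong
        total += weight * (p - t) ** 2
    return total / len(pred)


print(weighted_mse([0.5, 0.5], [1.0, 0.0]))           # 0.875
print(weighted_mse([0.5, 0.5], [1.0, 0.0], lam=0.0))  # 0.25 (plain MSE)
```

With the defaults, under-predicting a centre pixel by 0.5 costs six times as much as the same error on background (`weighted_mse([0.5], [1.0])` is 1.5 versus 0.25 for `weighted_mse([0.5], [0.0])`), which counteracts the dominance of background pixels in the loss.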

Other authors modified the output layer of a CNN to include distance constraints or a voting mechanism in the network learning process. For instance, Sirinukunwattana et al. (2016) introduced a new layer that modifies the output of a CNN to predict a topologically constrained probability map, in which pixels closer to nuclei centres are assigned higher confidence scores, in colon histology images. This method was later extended by Swiderska-Chadaj et al. (2019) to detect lymphocytes in immunohistochemistry images. Xie et al. (2015a) proposed an alternative method based on a voting mechanism for nuclei localization. Their model can be viewed as an implicit Hough-voting codebook, which learns to map an image patch to a set of voting offsets (i.e., nuclei positions) and corresponding confidence scores that weight each vote. The weighted votes are then aggregated to estimate the final density map used to localize nuclei in neuroendocrine tumour images.
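The aggregation step can be sketched as follows, assuming a network has already predicted, for each sampled patch centre, a list of (row offset, column offset, confidence) votes; the input format and the names `accumulate_votes` and `local_maxima` are hypothetical. Confidence-weighted votes are summed into a density map whose local maxima above a threshold mark candidate nucleus positions:

```python
from typing import Dict, List, Tuple

Vote = Tuple[int, int, float]  # (row offset, column offset, confidence)
Grid = List[List[float]]


def accumulate_votes(height: int, width: int, votes: Dict[Tuple[int, int], List[Vote]]) -> Grid:
    """Sum confidence-weighted votes cast from patch centres into a density map."""
    density = [[0.0] * width for _ in range(height)]
    for (r, c), patch_votes in votes.items():
        for dr, dc, conf in patch_votes:
            tr, tc = r + dr, c + dc
            if 0 <= tr < height and 0 <= tc < width:  # ignore votes that leave the image
                density[tr][tc] += conf
    return density


def local_maxima(density: Grid, threshold: float) -> List[Tuple[int, int]]:
    """Pixels at or above `threshold` that are not exceeded by any 8-neighbour."""
    height, width = len(density), len(density[0])
    peaks = []
    for r in range(height):
        for c in range(width):
            value = density[r][c]
            if value < threshold:
                continue
            neighbours = [
                density[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value >= n for n in neighbours):
                peaks.append((r, c))
    return peaks
```

Because several patches can vote for the same nucleus, agreeing votes reinforce each other while isolated, low-confidence votes stay below the threshold.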

3.1.3 Segmentation models

Segmentation of histological primitives such as cells, glands, nuclei and other tissue components is an essential prerequisite for obtaining reliable morphological measurements to assess the malignancy of several carcinomas (Chen et al., 2017a; Sirinukunwattana et al., 2017; Bulten et al., 2019b). Accurate segmentation of structures in histology images often requires pixel-level delineation of the object contour or of the whole interior of the object of interest. CNNs trained to classify each patch centred on a pixel of interest as either foreground or background can be used for segmentation by employing a sliding-window approach. However, given the size of gigapixel WSIs, patch-based approaches involve a large number of redundant computations in overlapping regions, resulting in a drastic increase in computational cost as well as a loss of contextual information (Chen et al., 2017a; Lin et al., 2019). The alternative is to employ fully convolutional networks (FCNs) (Long et al., 2015; Ronneberger et al., 2015), which take as input an arbitrarily sized image (or patch) and output a similarly sized prediction map in a single forward pass. The whole FCN model can be trained end-to-end via backpropagation and directly outputs a dense per-pixel prediction score map. Hence, segmentation models in histopathology are mainly built on the representational power of FCNs and their variants, and are generally formulated as semantic segmentation tasks, with applications ranging from nucleus/gland/duct segmentation (Kumar et al., 2019; Sirinukunwattana et al., 2017; Seth et al., 2019) to the prediction of cancer (Liu et al., 2019; Bulten et al., 2019a) in WSIs.
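A back-of-the-envelope count makes the sliding-window redundancy concrete. Assume, purely for illustration, an $H \times W$ tile whose pixels are each classified by a CNN applied to an $s \times s$ patch centred on that pixel. Every input pixel then lies inside up to $s^2$ different patches, so the total input volume compares with a single dense pass as follows:

```latex
\underbrace{H \cdot W}_{\text{patches}} \cdot \underbrace{s^{2}}_{\text{pixels per patch}}
\quad \text{vs.} \quad H \cdot W \ \text{(one dense FCN pass)},
\qquad
H = W = 1000,\ s = 64:\quad 10^{6} \cdot 4096 \approx 4.1 \times 10^{9} \ \text{vs.}\ 10^{6}.
```

In this example the sliding-window approach re-reads each pixel roughly 4,096 times, whereas an FCN shares the convolutional computation across overlapping receptive fields; the estimate ignores border effects and the per-layer cost of the FCN itself.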

To determine the most suitable model for a given task, Swiderska-Chadaj et al. (2019) and de Bel et al. (2018) compared FCN with the UNet architecture (Ronneberger et al., 2015). They found that UNet achieved better generalization ability and robustness. The key feature of UNet is its upsampling path, which learns to propagate contextual information to high-resolution layers. Together with additional skip connections, this yields more biologically plausible segmentation maps than the standard FCN model. The traditional FCN model also lacks smoothness constraints, which can result in poor delineation of object contours and spurious regions when segmenting touching or overlapping objects (Zheng et al., 2015). To circumvent this problem, BenTaieb and Hamarneh (2016) formulated a new loss function that incorporates boundary smoothness and topological priors into FCN learning, for discriminating epithelial glands from other tissue structures in histology images.
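One simple way to add a smoothness constraint to pixel-wise FCN training is to add a total-variation penalty on the predicted probability map to binary cross-entropy. The sketch below does this. It is a generic stand-in chosen for simplicity, not the loss of BenTaieb and Hamarneh (2016), which also encodes topological priors. The weight `lam` is a hypothetical hyperparameter.

```python
import math
from typing import List

Grid = List[List[float]]


def binary_cross_entropy(prob: Grid, target: Grid, eps: float = 1e-7) -> float:
    """Mean pixel-wise BCE between predicted probabilities and a binary mask."""
    total, count = 0.0, 0
    for p_row, t_row in zip(prob, target):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
            count += 1
    return total / count


def total_variation(prob: Grid) -> float:
    """Sum of absolute differences between vertical/horizontal neighbours, per pixel."""
    h, w = len(prob), len(prob[0])
    tv = 0.0
    for y in range(h):
        for x in range(w):
            if y + 1 < h:
                tv += abs(prob[y + 1][x] - prob[y][x])
            if x + 1 < w:
                tv += abs(prob[y][x + 1] - prob[y][x])
    return tv / (h * w)


def smooth_segmentation_loss(prob: Grid, target: Grid, lam: float = 0.1) -> float:
    return binary_cross_entropy(prob, target) + lam * total_variation(prob)


target = [[1.0, 1.0], [1.0, 1.0]]
smooth = [[0.9, 0.9], [0.6, 0.6]]  # same values as `noisy`, arranged coherently
noisy = [[0.9, 0.6], [0.6, 0.9]]   # checkerboard-like fragmentation
print(smooth_segmentation_loss(smooth, target) < smooth_segmentation_loss(noisy, target))  # True
```

Both predictions have the same cross-entropy, so only the smoothness term separates them. It penalizes the fragmented map more.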

Histological objects such as glands and nuclei vary significantly in size and shape and often occur as overlapping, clumped instances, which makes them difficult to distinguish from surrounding structures. A few methods have addressed this issue by combining the representational power of FCNs with multi-scale feature learning strategies (Chen et al., 2017b; Lin et al., 2017) to effectively delineate objects of varying size in histology images. For instance, Chen et al. (2017a) proposed a multi-level contextual FCN with an auxiliary supervision mechanism (Xie and Tu, 2015) to segment both glands and nuclei in histology images. They also devised an elegant multi-task framework that integrates object appearance with contour information for precise identification of touching glands. This work was later extended by Van Eycke et al. (2018), who combined the efficient techniques of DCAN (Chen et al., 2017a), UNet, and the identity mapping of ResNet to build an FCN model for segmenting epithelial glands in double-stained images.
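The contour-aware idea can be illustrated with a small post-processing sketch. Pixels with high object probability but low contour probability are kept, and the connected components of the result become separate instances. The sketch follows the spirit of Chen et al. (2017a). The 0.5 thresholds and 4-connectivity are assumptions for illustration, and the authors' fusion rule may differ in detail.

```python
from collections import deque
from typing import List, Tuple

Grid = List[List[float]]


def fuse_maps(obj_prob: Grid, contour_prob: Grid,
              t_obj: float = 0.5, t_contour: float = 0.5) -> List[List[int]]:
    """Keep pixels that look like object interior and not like a contour."""
    return [[1 if o >= t_obj and c < t_contour else 0 for o, c in zip(o_row, c_row)]
            for o_row, c_row in zip(obj_prob, contour_prob)]


def label_instances(mask: List[List[int]]) -> Tuple[List[List[int]], int]:
    """4-connected component labelling via breadth-first search."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


# Two touching glands: the object map is uniform, the contour map marks the shared border.
obj = [[0.9] * 5 for _ in range(3)]
contour = [[0.1, 0.1, 0.9, 0.1, 0.1] for _ in range(3)]
semantic_only = [[1 if o >= 0.5 else 0 for o in row] for row in obj]
print(label_instances(semantic_only)[1], label_instances(fuse_maps(obj, contour))[1])  # 1 2
```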

Some authors have proposed variants of FCN that enhance segmentation, in particular at glandular boundaries, by compensating for the information lost in the max-pooling layers of FCNs. For example, Graham et al. (2019a) introduced minimum information loss dilated units in residual FCNs to help retain maximal spatial resolution, which is critical for segmenting glandular structures at boundary locations. Later, Ding et al. (2019) employed a similar technique to circumvent the loss of global information. They introduced a high-resolution auxiliary branch in a multi-scale FCN model to locate and shape glandular objects. Zhao et al. (2019) proposed a feature pyramid based model (Lin et al., 2017) that aggregates local-to-global features in an FCN, enhancing the model's discriminative capability in identifying breast cancer metastasis. They also devised a synergistic learning approach that collaboratively trains the primary detector and an extra decoder with semantic guidance, to improve the model’s ability to detect metastases.
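The effect of dilation can be seen in one dimension. Stacking 3-tap convolutions with dilations 1, 2 and 4 grows the receptive field to 15 samples while the output keeps the input's length, whereas pooling would shrink it. This is a toy sketch of the general mechanism, not the architecture of Graham et al. (2019a).

```python
from typing import List


def dilated_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """'Same'-length 1-D dilated convolution with zero padding (odd kernel size assumed)."""
    centre = len(kernel) // 2
    out = []
    for i in range(len(x)):
        s = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - centre) * dilation  # taps are spaced `dilation` samples apart
            if 0 <= idx < len(x):
                s += w * x[idx]
        out.append(s)
    return out


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Receptive field of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


signal = [0.0] * 31
signal[15] = 1.0  # unit impulse
response = signal
for d in (1, 2, 4):
    response = dilated_conv1d(response, [1.0, 1.0, 1.0], d)
support = [i for i, v in enumerate(response) if v != 0.0]
print(len(response), len(support), receptive_field(3, [1, 2, 4]))  # 31 15 15
```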

Conventional FCN based models are fundamentally designed to predict a class label for each pixel (e.g., foreground or background). However, they are unable to distinguish individual object instances, that is, to determine which object each foreground pixel belongs to. In computer vision, such problems are formulated as “instance-aware semantic segmentation” (Hariharan et al., 2014; Li et al., 2017), where segmentation and classification of object instances are performed jointly in an end-to-end manner. In histology, Xu et al. (2017) decomposed gland instance segmentation into two sub-tasks, gland segmentation and instance recognition, using a multi-channel deep network model (Dai et al., 2016). Gland segmentation is performed using an FCN, while gland instance boundaries are recognized using location (Girshick, 2015) and boundary cues (Xie and Tu, 2015). A similar formulation was adopted by Qu et al. (2019a) for joint segmentation and classification of nuclei, using an FCN trained with a perceptual loss (Johnson et al., 2016).

Most deep learning methods in digital pathology are applied to small image patches rather than the entire WSI, which restricts the model's predictions to a narrow field-of-view. Conventional patch-based approaches suffer from three main limitations:
i) individual patches extracted from a WSI have a narrow field-of-view, with limited contextual knowledge about surrounding structures;
ii) patch-based models are not consistent with the way a pathologist analyzes a slide under a microscope; and
iii) a large number of redundant computations are carried out in overlapping regions, resulting in increased computational complexity and slower inference.
To alleviate the first two issues, attempts have been made to mimic the way a pathologist analyzes a slide at various magnification levels before arriving at a final decision. Such mechanisms are integrated into FCN models by designing multi-magnification networks (Ho et al., 2019; Tokunaga et al., 2019), each trained on image patches with a different field-of-view. These networks obtain a more discriminative feature representation than a single-magnification model. For instance, Ho et al. (2019) proposed a multi-encoder, multi-decoder FCN model that uses multiple input patches at various magnification levels (e.g., 20x, 10x and 5x). The intermediate feature representations are shared among the FCN branches for accurate breast cancer image segmentation. A similar approach was adopted by Tokunaga et al. (2019) and Gecer et al. (2018), who trained multiple FCNs on images with different fields-of-view and aggregated their outputs into a final segmentation map. In contrast, Gu et al. (2018) designed a multi-encoder model that aggregates information across different magnification levels, but uses only one decoder to generate the final prediction map.
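The multi-magnification inputs can be thought of as concentric crops of the slide. The sketch below computes, for one centre point, the region of the highest-magnification level that each patch covers. It assumes a hypothetical 20x base level and 256×256-pixel patches, with every crop later resampled to the same pixel size. It is not the patch extraction code of any cited method.

```python
from typing import Dict, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in base-level pixels


def multi_magnification_boxes(cx: float, cy: float, patch_px: int = 256,
                              base_mag: float = 20.0,
                              mags: Sequence[float] = (20.0, 10.0, 5.0)) -> Dict[float, Box]:
    """Concentric regions that yield patch_px x patch_px patches at each magnification."""
    boxes = {}
    for mag in mags:
        side = patch_px * base_mag / mag  # lower magnification -> larger field of view
        half = side / 2
        boxes[mag] = (cx - half, cy - half, cx + half, cy + half)
    return boxes


for mag, box in multi_magnification_boxes(5000, 3000).items():
    print(mag, box)
# 20.0 (4872.0, 2872.0, 5128.0, 3128.0)
# 10.0 (4744.0, 2744.0, 5256.0, 3256.0)
# 5.0 (4488.0, 2488.0, 5512.0, 3512.0)
```

After resampling, the 5x patch covers 16 times the tissue area of the 20x patch at a quarter of the linear resolution. This wider context is what multi-magnification networks add.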

Nevertheless, the above patch-based models still incur significant computational overhead at higher magnification levels and hence do not scale well to WSIs. Therefore, some authors (Lin et al., 2019, 2018) have proposed an FCN variant with a dense scanning mechanism that shares computations in overlapping regions during image scanning. To further improve the prediction accuracy of the FCN model, Lin et al. (2019) also introduced a new pooling layer, named the ‘anchor layer’, which reconstructs the information lost in max-pooling layers. Such models have been shown to run inference a hundred times faster than traditional patch-based approaches, while also achieving higher prediction accuracy in WSI analysis. On the other hand, Guo et al. (2019) presented an alternative method for fast breast tumour segmentation. In their method, a CNN-based classifier first pre-selects possible tumour areas, and an FCN-based model then refines this initial segmentation. Their framework obtains dense predictions at 1/8 of the original WSI size in 11.5 minutes on the CAMELYON16 dataset, which is faster than using the FCN-based model alone.
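The two-stage idea of Guo et al. (2019) can be sketched as a generic control flow. A cheap classifier scores every tile, and the expensive segmentation model runs only on tiles whose score passes a threshold. The `toy_classifier` and `toy_segmenter` functions and the 0.5 threshold are hypothetical placeholders, not the authors' models.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Tile = Tuple[int, int]  # (row, column) index of a tile in the WSI grid
Mask = List[List[int]]


def coarse_to_fine(tiles: Iterable[Tile],
                   classify: Callable[[Tile], float],
                   segment: Callable[[Tile], Mask],
                   threshold: float = 0.5) -> Tuple[Dict[Tile, Mask], int]:
    """Run `segment` only on tiles that `classify` flags as possible tumour."""
    masks: Dict[Tile, Mask] = {}
    n_tiles = 0
    for tile in tiles:
        n_tiles += 1
        if classify(tile) >= threshold:  # stage 1: cheap pre-selection
            masks[tile] = segment(tile)  # stage 2: dense refinement
    # Tiles absent from `masks` are treated as background.
    return masks, n_tiles


def toy_classifier(tile: Tile) -> float:
    """Hypothetical stage-1 model: flags the left half of the grid as tumour."""
    return 0.9 if tile[1] < 2 else 0.1


def toy_segmenter(tile: Tile) -> Mask:
    """Hypothetical stage-2 model returning a fixed 2x2 mask."""
    return [[1, 1], [1, 0]]


grid = [(r, c) for r in range(4) for c in range(4)]
masks, total = coarse_to_fine(grid, toy_classifier, toy_segmenter)
print(f"segmented {len(masks)} of {total} tiles")  # segmented 8 of 16 tiles
```

The speed-up depends on how much of the slide the first stage can safely discard. On a typical WSI, much of the tissue may be non-tumour.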

Reference Cancer types Staining Application Method Dataset
Classification models
A. Local-level task
Cireşan et al. (2013) Breast H&E Mitosis detection Pixel based CNN classifier ICPR2012 (50 images)
Wang et al. (2014) Breast H&E Mitosis detection Cascaded ensemble of CNN + handcrafted features ICPR2012 (50 images)
Song et al. (2015) Cervix H&E Segmentation of cervical cytoplasm and nuclei Multi-scale CNN + graph-partitioning approach Private set containing 53 cervical cancer images
Kashif et al. (2016) Colon H&E Cell detection Spatially constrained CNN + hand-crafted features 15 images of colorectal cancer tissue images
Xing et al. (2016) Multi-Cancers H&E, IHC Nuclei segmentation CNN + selection-based sparse shape model Private set containing brain tumour (31), pancreatic NET (22), breast cancer (35) images
Romo-Bucheli et al. (2016) Breast H&E Tubule nuclei detection and classification CNN based classification of pre-detected candidate nuclei 174 images with ER(+) breast cancer cases
Wang et al. (2016b) Lung H&E Cell detection Two shared-weighted CNNs for joint cell detection and classification TCGA (300 images)
Albarqouni et al. (2016) Breast H&E Mitosis detection Multi-scale CNN via crowdsourcing layer AMIDA2013 (666 - HPF images)
Song et al. (2017) Cervix Pap, H&E Segmentation of cervical cells Multi-scale CNN model Overlapping cervical cytology image segmentation challenge (ISBI 2015) - 8 images, private set - 21 images
Gao et al. (2017) Multi-Cancers IFL Cell classification CNN (LeNet-5) based classification of HEp2-cells ICPR2012 (28 images), ICPR2014 (83 images)
Tellez et al. (2018) Breast H&E, PHH3 Mitosis detection Ensemble of CNNs using H&E registered to PHH3 tissue slides as reference standard TNBC (36 images), TUPAC (814 images)
Qaiser et al. (2019b) Colon H&E Tumour segmentation Combination of CNN and persistent homology feature based patch classifier Two private sets containing 75 and 50 colorectal adenocarcinoma WSIs
B. Global-level task
Cruz-Roa et al. (2014) Breast H&E Detection of invasive ductal carcinoma CNN based patch classifier Private set - 162 cases
Ertosun and Rubin (2015) Brain H&E Glioma grading Ensemble of CNN models TCGA (54 WSIs)
Litjens et al. (2016) Multi-Cancers H&E Detection of prostate and breast cancer CNN based pixel classifier Two private sets (225 + 173 WSIs)
Bejnordi et al. (2017) Breast H&E Breast cancer classification Stacked CNN incorporating contextual information Private set - 221 images
Agarwalla et al. (2017) Breast H&E Tumour segmentation CNN + 2D-LSTM for representation learning and context aggregation Camelyon16 (400 WSIs)
Kong et al. (2017) Breast H&E Detection of breast cancer metastases CNN with the 2D-LSTM to learn spatial dependencies between neighboring patches Camelyon16 (400 WSIs)
Vandenberghe et al. (2017) Breast IHC IHC scoring of HER2 status in breast cancer CNN based patch classifier 71 WSIs of invasive breast carcinoma (Private set)
Cruz-Roa et al. (2017) Breast H&E Detection of invasive breast cancer CNN based patch classifier TCGA + four other private sets (584 cases)
Sharma et al. (2017) Stomach H&E, IHC Gastric cancer classification and necrosis detection Patch-based CNN classifier Private set - 454 WSIs
BenTaieb and Hamarneh (2018) (✓) Breast H&E Detection of breast cancer metastases CNN based recurrent visual attention model Camelyon16 (400 WSIs)
Awan et al. (2018) Breast H&E Breast cancer classification CNN based patch classification model incorporating contextual information BACH 2018 challenge (400 WSIs)
Li and Ping (2018) (✓) Breast H&E Detection of breast cancer metastases CNN + CRF to model spatial correlations between neighboring patches Camelyon16 (400 WSIs)
Bejnordi et al. (2018) Breast H&E Detection of invasive breast cancer Multi-stage CNN that first identifies tumour-associated stromal alterations and further classify into normal/benign vs invasive breast cancer Private set - 2387 WSIs
Cruz-Roa et al. (2018) Breast H&E Detection of invasive breast cancer Patch based CNN model with adaptive sampling method to focus only on high uncertainty regions TCGA + 3 other public datasets (596 cases)
Qaiser et al. (2019b) Breast IHC Immunohistochemical scoring of HER2 Deep reinforcement learning model that treats IHC scoring as a sequential learning task using CNN + RNN HER2 scoring contest (172 images), private set - 82 gastroenteropancreatic NET images
Wei et al. (2019) (✓) Lung H&E Classification of histologic subtypes of lung adenocarcinoma ResNet-18 based patch classifier Private set - 143 WSIs
Nagpal et al. (2019) Prostate H&E Predicting Gleason score CNN based regional Gleason pattern classification + k-nearest-neighbor based Gleason grade prediction TCGA (397 cases) + two private sets (361 + 11 cases)
Shaban et al. (2019b) (✓) Mouth H&E Tumour-infiltrating lymphocyte abundance score prediction for disease-free survival CNN (MobileNet) based patch classifier, followed by statistical analysis 70 cases of oral squamous cell carcinoma WSIs (Private set)
Halicek et al. (2019) Head & Neck H&E Detection of squamous cell carcinoma and thyroid carcinoma CNN (Inception-v4) based patch classifier Private set - 381 images
Xu et al. (2019a) Breast H&E Detection of breast cancer Deep hybrid attention (CNN + LSTM) network BreakHis (7,909 images)
Zhang et al. (2019) (✓) Bladder H&E Bladder cancer diagnosis CNN + RNN to generate clinical diagnostic descriptions and network visual attention maps 913 images of urothelial carcinoma from TCGA and private set
Regression models
Xie et al. (2015a) Multi-Cancers Ki-67 Nuclei detection CNN based hough voting approach Neuroendocrine tumour set (private - 44 images)
Xie et al. (2015b) Multi-Cancers H&E, Ki-67 Cell detection CNN based structured regression model TCGA (Breast-32 images), HeLa cervical cancer (22 images), Neuroendocrine tumour images (60 images)
Chen et al. (2016a) Breast H&E Mitosis detection FCN based deep regression network ICPR2012 (50 images)
Sirinukunwattana et al. (2016) Colon H&E Nuclei detection and classification CNN with spatially constrained regression CRCHisto (100 images)
Naylor et al. (2018) (✓) Multi-Cancers H&E Nuclei segmentation CNN based regression model for touching nuclei segmentation TNBC (50 images), MoNuSeg (30 images)
Xie et al. (2018b) Multi-Cancers H&E, Ki-67 Cell detection Structured regression model based on fully residual CNN TCGA (Breast-70 image patches), Bone marrow (11 image patches), HeLa cervical cancer (22 images), Neuroendocrine tumour set (59 image patches)
Graham et al. (2019b) (✓) Multi-Cancers H&E Nuclei segmentation and classification CNN based instance segmentation and classification framework CoNSeP (41 images), MoNuSeg (30 images), TNBC (50 images), CRCHisto (100 images), CPM-15 (15 images), CPM-17 (32 images)
Xing et al. (2019) Pancreas Ki-67 Nuclei detection and classification FCN based structured regression model Pancreatic neuroendocrine tumour set (private - 38 images)
Segmentation models
BenTaieb and Hamarneh (2016) Colon H&E Segmentation of colon glands A loss function accounting for boundary smoothness and topological priors in FCN learning GLAS challenge (165 images)
Chen et al. (2017a) Multi-Cancers H&E Segmentation of glands and nuclei Multi-task learning framework with contour-aware FCN model for instance segmentation GLAS challenge (165 images), MICCAI 2015 nucleus segmentation challenge (33 images)
Xu et al. (2017) Colon H&E Segmentation of colon glands Multi-channel deep network model for gland segmentation and instance recognition GLAS challenge (165 images)
de Bel et al. (2018) Kidney PAS Segmentation of renal tissue structures Evaluated three different architectures: FCN, Multi-scale FCN and UNet 15 WSIs of renal allograft resections (private set)
Van Eycke et al. (2018) Colon H&E, IHC Segmentation of glandular epithelium in H&E and IHC staining images CNN model based on integration of DCAN, UNet and ResNet models GLAS challenge (165 images) and a private set containing colorectal tissue microarray images
Gecer et al. (2018) Breast H&E Detection and classification of breast cancer Ensemble of multi-scale FCNs followed by CNN based patch classifier 240 breast histopathology WSIs (private set)
Gu et al. (2018) Breast H&E Detection of breast cancer metastasis UNet based multi-resolution network with multi-encoder and single decoder model Camelyon16 (400 images)
Guo et al. (2019) Breast H&E Detection of breast cancer metastasis Classification (Inception-V3) based semantic segmentation model (DCNN) Camelyon16 (400 images)
Bulten et al. (2019b) Prostate H&E Grading of prostate cancer UNet based segmentation of Gleason growth patterns, followed by subsequent cancer grading 1243 WSIs of prostate biopsies (private set)
Lin et al. (2019) Breast H&E Detection of breast cancer metastasis FCN based model for fast inference of WSI analysis Camelyon16 (400 WSIs)
Liu et al. (2019) Breast DAB-H Immunohistochemical scoring for breast cancer Multi-stage FCN framework that directly predicts H-Scores of breast cancer TMA images 105 TMA images of breast adenocarcinomas (private set)
Bulten et al. (2019a) Prostate IHC, H&E Segmentation of epithelial tissue Pre-trained UNet on IHC is used as a reference standard to segment epithelial structures in H&E WSIs 102 prostatectomy WSIs
Swiderska-Chadaj et al. (2019) Multi-Cancers IHC Lymphocyte detection Investigated the effectiveness of four DL methods - FCN, UNet, YOLO and LSM LYON19 (test set containing 441 region-of-interests (ROIs))
Graham et al. (2019a) Colon H&E Segmentation of colon glands FCN with minimum information loss units and atrous spatial pyramid pooling GLAS challenge (165 images), CRAG dataset (213 images)
Ding et al. (2019) Colon H&E Segmentation of colon glands Multi-scale FCN model with a high-resolution branch to circumvent the loss in max-pooling layers GLAS challenge (165 images), CRAG dataset (213 images)
Zhao et al. (2019) Breast H&E Detection and classification of breast cancer metastasis Feature pyramid aggregation based FCN network with synergistic learning approach Camelyon16 (400 WSIs), Camelyon17 (1000 WSIs)
Qu et al. (2019a) (✓) Lung H&E Nuclei segmentation and classification FCN trained with perceptual loss 40 tissue images of lung adenocarcinoma (private set)
Ho et al. (2019) Breast H&E Breast cancer multi-class tissue segmentation Deep multi-magnification model with multi-encoder, multi-decoder and multi-concatenation network Private set containing TNBC (38 images) and breast margin dataset (10 images)
Tokunaga et al. (2019) Lung H&E Segmentation of multiple cancer subtype regions Multiple UNets trained with different FOV images + an adaptive weighting CNN for output aggregation 29 WSIs of lung adenocarcinoma (private set)
Lin et al. (2019) Breast H&E Detection of breast cancer metastasis FCN based model with anchor layers for fast and accurate prediction of cancer metastasis Camelyon16 (400 images)
Pinckaers and Litjens (2019) (✓) Colon H&E Segmentation of colon glands Incorporating neural ordinary differential equations in UNet to allow an adaptive receptive field GLAS challenge (165 images)
Seth et al. (2019) Breast H&E Segmentation of DCIS Compared UNets trained at multiple resolutions training: 183 WSIs, testing: 19 WSIs (private set)
Table 1: Overview of supervised learning models. The staining acronyms stand for: H&E (haematoxylin and eosin); DAB-H (Diaminobenzidine-Hematoxylin); IFL (Immunofluorescence); ER (Estrogen receptor); PR (Progesterone receptor); PC (Phase contrast); HPF (High power field); Pap (Papanicolaou stain); PHH3 (Phosphohistone-H3); IHC (Immunohistochemistry staining); PAS (Periodic acid–Schiff). Note: (✓) indicates that the code is publicly available and the link is provided in the respective paper.

3.2 Weakly supervised learning

Figure 4: An overview of weakly supervised learning models.
Reference Cancer types Staining Application Method Dataset
Multiple instance learning (MIL)
Hou et al. (2015) Brain H&E Glioma subtype classification Expectation-maximization based MIL with CNN + logistic regression TCGA (1,064 slides)
Jia et al. (2017) Colon H&E Segmentation of cancerous regions FCN based MIL + deep supervision and area constraints Two private sets containing colon cancer images (910+60 images)
Liang et al. (2018) Stomach H&E Gastric tumour segmentation Patch-based FCN + iterative learning approach China Big Data and AI challenge (1,900 images)
Ilse et al. (2018) (✓) Multi-Cancers H&E Cancer image classification MIL pooling based on gated-attention mechanism CRCHisto (100 images)
Wang et al. (2019a) Stomach H&E Gastric cancer detection Two-stage CNN framework for localization and classification Private set (608 images)
Wang et al. (2019b) Lung H&E Lung cancer image classification Patch based FCN + context-aware block selection and feature aggregation strategy Private (939 WSIs), TCGA (500 WSIs)
Campanella et al. (2019) (✓) Multi-Cancers H&E Multiple cancer diagnosis in WSIs CNN (ResNet) + RNNs Prostate (24,859 slides), skin (9,962 slides), breast cancer metastasis (9,894 slides)
Dov et al. (2019) Thyroid Thyroid malignancy prediction CNN + ordinal regression for prediction of thyroid malignancy score Private set (cytopathology 908 WSIs)
Xu et al. (2019b) (✓) Multi-Cancers H&E Segmentation of breast cancer metastasis and colon glands FCN trained on instance-level labels, which are obtained from image-level annotations Camelyon16 (400 WSIs), Colorectal adenoma private dataset (177 WSIs)
Huang and Chung (2019) Breast H&E Localization of cancerous evidence in histopathology images CNN + multi-branch attention modules and deep supervision mechanism PCam (327,680 patches extracted from Camelyon16) and Camelyon16 (400 WSIs)
Other approaches
Campanella et al. (2018) Prostate H&E Prostate cancer detection CNN trained under MIL formulation with top-1 ranked instance aggregation approach Prostate biopsies (12,160 slides)
Akbar and Martel (2018) (✓) Breast H&E Detection of breast cancer metastasis Clustering (VAE + K-means) based MIL framework Camelyon16 (400 WSIs)
Tellez et al. (2019b) (✓) Multi-Cancers H&E Compression of gigapixel histopathology WSIs Unsupervised feature encoding method (VAE, Bi-GAN, contrastive training) that maps high-resolution image patches to low-dimensional embedding vectors Camelyon16 (400 WSIs), TUPAC16 (492 WSIs), Rectum (74 WSIs)
Qu et al. (2019b) (✓) Multi-Cancers H&E Nuclei segmentation Modified UNet trained using coarse level-labels + dense CRF loss for model refinement MoNuSeg (30 images), lung cancer private set (40 images)
Bokhorst et al. (2019) Colon H&E Segmentation of tissue types in colorectal cancer UNet with modified loss functions to circumvent sparse manual annotations Colorectal cancer WSIs (private set - 70 images)
Li et al. (2019a) (✓) Breast H&E Mitosis detection FCN trained with concentric loss on weakly annotated centroid labels ICPR12 (50 images), ICPR14 (1,696 images), AMIDA13 (606 images), TUPAC16 (107 images)
Table 2: Overview of weakly supervised learning models. Note: (✓) indicates the code is publicly available and the link is provided in their respective paper.

The idea of weakly supervised learning (WSL) is to exploit coarse-grained (image-level) annotations to automatically infer fine-grained (pixel/patch-level) information. This paradigm is particularly well suited to histopathology, where coarse-grained information is often readily available in the form of image-level labels (e.g., cancer or non-cancer), but pixel-level annotations are more difficult to obtain. Weakly supervised learning dramatically reduces the annotation burden on pathologists (Xu et al., 2014). An overview of these models is provided in Table 2.

In this survey, we explore one particular form of WSL, namely multiple-instance learning (MIL), which aims to train a model using a set of weakly labeled data (Dietterich et al., 1997; Quellec et al., 2017). In MIL, a training set consists of bags, each labeled as positive or negative, and each bag contains many instances whose labels are unknown and are to be predicted. For instance, each histology image with a cancer/non-cancer label forms a ‘bag’, and each pixel/patch extracted from that image is referred to as an ‘instance’ (e.g., pixels containing cancerous cells). The main goal is to train a classifier to predict both bag-level and instance-level labels, while only bag-level labels are given in the training set. Following Cheplygina et al. (2019), we further group MIL approaches into three categories:
i) global detection, which identifies a target pattern in a histology image (i.e., at bag level), such as the presence or absence of cancer;
ii) local detection, which identifies a target pattern in an image patch or pixel (i.e., at instance level), such as highlighting cancerous tissues or cells; and
iii) global and local detection, which detects whether an image contains cancer and also identifies where it occurs within the image.
These categories are illustrated in Fig. 4. There is also significant interest in histopathology in exploiting various kinds of weak annotations to obtain clinically satisfactory performance with minimal annotation effort. These include image-level tags (Campanella et al., 2019), points (Qu et al., 2019b), bounding boxes (Yang et al., 2018), polygons (Wang et al., 2019b) and the percentage of cancerous region within each image (Jia et al., 2017). For an in-depth review of MIL approaches in medical image analysis, refer to Quellec et al. (2017); Cheplygina et al. (2019); Rony et al. (2019); Kandemir and Hamprecht (2015).
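Under the standard MIL assumption (Dietterich et al., 1997), a bag is positive if at least one of its instances is positive. A common way to train an instance classifier from bag labels alone is to score all instances with the current model. The highest-scoring instance of each bag is then kept as a training example carrying the bag's label. This follows the spirit of the top-1 ranked instance approach of Campanella et al. (2018) listed in Table 2. The sketch below shows that selection step only. The scoring function is a hypothetical stand-in for a CNN.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Instance = TypeVar("Instance")


def bag_label(instance_labels: Sequence[int]) -> int:
    """Standard MIL assumption: a bag is positive iff any instance is positive."""
    return int(any(instance_labels))


def select_top_instances(bags: List[Tuple[int, List[Instance]]],
                         score: Callable[[Instance], float]) -> List[Tuple[Instance, int]]:
    """Pair the highest-scoring instance of each bag with that bag's label."""
    return [(max(instances, key=score), label) for label, instances in bags]


# Toy bags of patch "features" (here just numbers); the identity stands in for a CNN score.
bags = [(1, [0.2, 0.9, 0.4]), (0, [0.3, 0.1])]
print(bag_label([0, 0, 1]), select_top_instances(bags, score=lambda p: p))
# 1 [(0.9, 1), (0.3, 0)]
```

In a full training loop, this selection is typically repeated after each epoch, because the instance ranking changes as the model improves.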

Due to the variable appearance of histopathology images, standard instance-level aggregation methods, such as voting or pooling, do not guarantee accurate image-level predictions, because instance-level labels can be misclassified (Campanella et al., 2019; Rony et al., 2019). Hence, several global detection based MIL methods rely on alternative instance-level aggregation strategies to obtain reliable bag-level predictions for a given histology task. For instance, Hou et al. (2015) integrated an expectation-maximization based MIL method with a CNN to output patch-level predictions. These are then aggregated by a trained logistic regression model to classify glioma subtypes in WSIs. Dov et al. (2019) proposed an alternative approach based on an ordinal regression framework. It aggregates instances containing follicular (thyroid) cells to simultaneously predict thyroid malignancy and the TBS score in whole-slide cytopathology images. Recently, a remarkable work by Campanella et al. (2019) adopted an RNN model to integrate semantically rich feature representations across patch-level instances into a final slide-level diagnosis. The authors obtained an AUC greater than 0.98 in detecting four types of cancer on an extensive multi-centre dataset of 44,732 WSIs, without expensive pixel-wise manual annotations.
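A toy example shows why the choice of aggregation matters. Below, a negative slide contains one spurious high-scoring patch, and a positive slide contains a small tumour covering three of twenty patches. Max pooling is fooled by the single false positive, and mean pooling by the small tumour. A simple top-k mean separates the two. It is used here as a generic alternative, not as the method of any cited paper. The patch probabilities and k = 3 are made-up values.

```python
from typing import Sequence


def max_pool(probs: Sequence[float]) -> float:
    return max(probs)


def mean_pool(probs: Sequence[float]) -> float:
    return sum(probs) / len(probs)


def top_k_mean(probs: Sequence[float], k: int = 3) -> float:
    """Average of the k highest instance probabilities."""
    top = sorted(probs, reverse=True)[:k]
    return sum(top) / len(top)


negative_slide = [0.95, 0.2, 0.1] + [0.05] * 17   # one spurious high-scoring patch
positive_slide = [0.9, 0.88, 0.85] + [0.05] * 17  # small tumour region
for name, pool in (("max", max_pool), ("mean", mean_pool), ("top-3", top_k_mean)):
    preds = [int(pool(s) >= 0.5) for s in (negative_slide, positive_slide)]
    print(name, preds)  # the ideal output is [0, 1]
# max [1, 1]
# mean [0, 0]
# top-3 [0, 1]
```

No fixed rule is robust in general. This is why the methods above learn the aggregation (logistic regression, ordinal regression, RNNs) instead of hand-picking it.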

Local detection based MIL approaches follow an image-centric paradigm, in which an FCN model performs image-to-image prediction by computing features for all instances (pixels) together. These approaches are generally applied to segmentation tasks to precisely delineate cancerous regions in histology images. In the local detection approach, bag labels are propagated to all instances to train a classifier in a supervised manner. However, even the best bag-level classifier may underperform on instance-level predictions due to the lack of instance-level supervision (Cheplygina et al., 2019). To tackle this issue, additional weak constraints have been incorporated into FCN models to improve segmentation accuracy. For example, Jia et al. (2017) included an area constraint in the MIL formulation using a rough estimate of the relative size of the cancerous region. This additional constraint, together with the image label, has been shown to facilitate model learning, at the price of extra annotation overhead. However, estimating such area constraints is tedious and can only be performed by an expert pathologist. Consequently, Xu et al. (2019b) proposed an alternative MIL framework that generates instance-level labels from image-level annotations. These predicted instance-level labels are then assigned to the corresponding image pixels to train an FCN end-to-end, achieving performance comparable to supervised counterparts. Finally, in some cases, a large number of bag labels together with a partial set of instance labels are used in an FCN-based iterative learning framework (Liang et al., 2018) to further optimize the final instance-level predictions.
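An area constraint of this kind can be expressed as a training signal: a penalty on the gap between the predicted positive fraction of an image and the pathologist's rough estimate. The sketch below uses a squared hinge with a tolerance band. This formulation and the `tolerance` parameter are assumptions for illustration, not the exact constraint of Jia et al. (2017).

```python
from typing import List


def area_penalty(prob_map: List[List[float]], target_ratio: float,
                 tolerance: float = 0.0) -> float:
    """Squared excess of |predicted area fraction - estimated fraction| over a tolerance."""
    values = [p for row in prob_map for p in row]
    predicted_ratio = sum(values) / len(values)  # soft estimate of the cancerous fraction
    gap = max(0.0, abs(predicted_ratio - target_ratio) - tolerance)
    return gap ** 2


pred = [[1.0, 1.0], [0.0, 0.0]]  # the model marks half the image as cancer
print(area_penalty(pred, 0.5), area_penalty(pred, 0.25))  # 0.0 0.0625
print(area_penalty(pred, 0.25, tolerance=0.3))            # 0.0
```

During training, such a term would be added to the MIL loss so that the predicted mask is pulled towards the annotated proportion. The tolerance reflects that the pathologist's estimate is only rough.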

Arguably, the most popular and clinically relevant MIL approach in histopathology is the global and local detection paradigm. In this approach, rather than only diagnosing cancer at the whole-slide level, the model simultaneously localizes the discriminative areas (instances) containing cancerous tissues or cells. In this context, methods use either the bag-level label alone (Wang et al., 2019a) or both bag-level labels and some coarse instance-level annotations (Wang et al., 2019b) to infer a global decision. Note that instance-level predictions are usually not validated, owing to the lack of costly annotations. They are generally visualized as either a heatmap (Wang et al., 2019a, b) or a saliency map (Huang and Chung, 2019) to highlight diagnostically significant locations in WSIs. The essence of this approach is to capture the dependencies between instances and their impact on the final image-level decision.

There is some disagreement among MIL methods regarding the accuracy of instance-level predictions when trained with only bag-level labels (Cheplygina et al., 2019; Kandemir and Hamprecht, 2015). The critical and often overlooked issue is that even the best bag-level classifier may not be the best instance-level classifier, and vice versa (Cheplygina et al., 2019). This has naturally led to solutions that integrate visual attention models with MIL techniques to enhance the interpretability of final model predictions (Ilse et al., 2018; Huang and Chung, 2019). For instance, Huang and Chung (2019) proposed a CNN model combining multi-branch attention modules and a deep supervision mechanism (Xie and Tu, 2015). It aims to localize the discriminative evidence for the class of interest from a set of weakly labeled training data. Such attention-based models can pinpoint the location of cancer evidence in WSIs while achieving competitive slide-level accuracy, thereby enhancing the interpretability of current DL models in histopathology applications.
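The gated attention pooling of Ilse et al. (2018) computes a weight for each instance embedding h_k as a softmax over w^T(tanh(V h_k) ⊙ sigm(U h_k)). The bag embedding is the weighted sum of the instances, and the weights double as an instance-level relevance map. Below is a forward-pass sketch in plain Python. The parameters V, U and w are hand-set toy values, whereas in practice they are learned jointly with the CNN feature extractor.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_attention_pool(instances: List[Vector], V: Matrix, U: Matrix,
                         w: Vector) -> Tuple[Vector, List[float]]:
    """Return the bag embedding and the attention weight of each instance."""
    scores = []
    for h in instances:
        tanh_branch = [math.tanh(a) for a in matvec(V, h)]
        gate = [sigmoid(b) for b in matvec(U, h)]
        scores.append(sum(wi * ti * gi for wi, ti, gi in zip(w, tanh_branch, gate)))
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return bag, weights


identity = [[1.0, 0.0], [0.0, 1.0]]
patches = [[2.0, 2.0], [0.0, 0.0], [-1.0, -1.0]]  # toy 2-D instance embeddings
bag, weights = gated_attention_pool(patches, V=identity, U=identity, w=[1.0, 1.0])
print([round(a, 3) for a in weights])
```

Unlike max or mean pooling, the weights are trainable. The model can therefore learn which instances matter, and inspecting the weights gives the localization evidence discussed above.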

Not all methods identified as weakly supervised in the literature fall under the MIL category. For instance, Qu et al. (2019b), Bokhorst et al. (2019) and Li et al. (2019a) use the term “weakly supervised” to indicate that the model was trained on a sparse set of annotations. These include points inside the region of interest (Li et al., 2019a; Qu et al., 2019b), bounding boxes (Yang et al., 2018) and partial pixel-level annotations of cancerous regions (Bokhorst et al., 2019). These approaches alleviate the need for expensive annotations by proposing new variants of loss functions (Li et al., 2019a), feature encoding strategies (Tellez et al., 2019b; Akbar and Martel, 2018), loss balancing mechanisms (Bokhorst et al., 2019), and methods to derive coarse labels from weak annotations (Qu et al., 2019b). As a result, models that would normally require full supervision can be trained from weak annotations.
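When only part of an image is annotated, a common device is to compute the loss only over labelled pixels. The sketch below marks unannotated pixels with a hypothetical `IGNORE` value and skips them. It illustrates the general idea of training from sparse annotations and is not the specific loss balancing scheme of Bokhorst et al. (2019).

```python
import math
from typing import List

IGNORE = -1  # hypothetical marker for pixels without an annotation


def masked_bce(prob: List[List[float]], labels: List[List[int]], eps: float = 1e-7) -> float:
    """Mean BCE over annotated pixels only; unannotated pixels contribute nothing."""
    total, count = 0.0, 0
    for p_row, l_row in zip(prob, labels):
        for p, t in zip(p_row, l_row):
            if t == IGNORE:
                continue
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
            count += 1
    return total / count if count else 0.0


labels = [[1, 0], [IGNORE, IGNORE]]  # only the top row was annotated
a = masked_bce([[0.9, 0.2], [0.5, 0.5]], labels)
b = masked_bce([[0.9, 0.2], [0.0, 1.0]], labels)  # predictions on unlabelled pixels changed
print(a == b)  # True
```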

3.3 Unsupervised learning

Figure 5: An overview of unsupervised learning models.

The goal of unsupervised learning is to learn something useful about the underlying structure of the data without using labels. The term “unsupervised” is sometimes used loosely in the digital pathology community for approaches that are not fully unsupervised. For instance, unpaired stain transfer and domain adaptation via feature distribution matching are considered unsupervised, even though the domains effectively act as labels for two separate datasets (Gadermayr et al., 2019a; de Bel et al., 2019; Ganin et al., 2016). In this survey, we examine fully unsupervised methods, where the raw data comes in the form of images without any identifiers (e.g., domain, cancerous vs. non-cancerous, tissue type, etc.). Such approaches are rare, since unsupervised learning is still in its infancy in the machine learning community as well. Interest in them is nevertheless clear. The scarcity of labeled data, due to regulatory concerns and labor costs (i.e., expert annotations), is a major bottleneck in achieving clinically satisfactory performance in medical imaging (Lee and Yoon, 2017).

In unsupervised learning, the learning task is ambiguous, since without restrictions the inputs can be mapped into infinitely many subsets. Most unsupervised approaches therefore model the probability distribution of the data subject to some constraints, which limit the solution space and yield the grouping or clustering desired for the target task. A common technique is to transform the data into a lower-dimensional subspace and then aggregate the feature representations into mutually exclusive or hierarchical clusters, as illustrated in Fig. 5. Autoencoders are typically utilized for the dimensionality reduction step. Recent advances in modeling stochasticity (Kingma and Welling, 2013) and in more robustly disentangling visual features (Higgins et al., 2017; Chen et al., 2018) have made autoencoders more attractive for feature modeling and dimensionality reduction. In early work, sparse autoencoders were utilized for unsupervised nuclei detection (Xu et al., 2015a). Later, detection performance was improved by modifying the receptive field of the convolutional filters to accommodate small nuclei (Hou et al., 2019b). For more complex tasks, such as tissue and cell classification, Generative Adversarial Networks (GANs) have also been employed. Specifically, InfoGANs (Chen et al., 2016b), which maximize the mutual information between the generated images and a predefined subset of latent (noise) codes, have been used to extract features for tasks such as cell-level classification, nuclei segmentation, and cell counting (Hu et al., 2018a).
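The two-step pipeline described above (dimensionality reduction, then clustering) can be sketched in a few lines of NumPy. As a stated assumption, PCA stands in for the autoencoder so that the example stays dependency-free, and plain k-means with farthest-point initialisation performs the clustering; neither choice is taken from the cited works, and the function names are illustrative.

```python
import numpy as np

def pca_reduce(x, k):
    """Project the rows of x onto their top-k principal components."""
    xc = x - x.mean(axis=0)
    # The right singular vectors of the centred data are the principal axes.
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:k].T

def kmeans(z, n_clusters, n_iter=50, seed=0):
    """Lloyd's k-means with farthest-point initialisation; returns a cluster index per row."""
    rng = np.random.default_rng(seed)
    centres = [z[rng.integers(len(z))]]
    while len(centres) < n_clusters:
        d2 = np.min([((z - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(z[d2.argmax()])      # next centre: point farthest from existing ones
    centres = np.array(centres, dtype=float)
    for _ in range(n_iter):
        d2 = ((z[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        assign = d2.argmin(axis=1)          # assignment step
        for j in range(n_clusters):         # update step (empty clusters keep their centre)
            if np.any(assign == j):
                centres[j] = z[assign == j].mean(axis=0)
    return assign

# Toy example: 40 flattened 8x8 "patches" drawn from two well-separated groups.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0.0, 0.1, size=(20, 64)),
               rng.normal(3.0, 0.1, size=(20, 64))])
z = pca_reduce(x, 2)      # low-dimensional representation
labels = kmeans(z, 2)     # cluster index per patch
```

With an autoencoder, `pca_reduce` would be replaced by the encoder's bottleneck output; the clustering step stays the same.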

Finally, we examine unsupervised transfer learning approaches, where, instead of directly applying learned features to a target task, the learned mapping functions are used to initialize models for the target task, possibly with very few labeled training images. Using a loss term similar to the reconstruction objective of autoencoders, Chang et al. (2017) train a convolutional network on unlabeled images of a specific modality (e.g., brain MRI or kidney histology images) to learn filter banks at different scales by sparsely encoding input images of sizes and pixels. The resulting filters are shift-invariant and scale-specific, and can uncover intricate patterns in various tasks, such as tumour classification of glioblastoma multiforme or kidney renal clear cell carcinoma. In machine learning, this form of unsupervised learning is called “self-supervised” learning. Since self-supervised techniques can generally handle larger images, they offer a promising alternative to clustering approaches in histopathology, where tasks usually require context and a larger field of view. Context-based self-supervised methods that predict spatial ordering (Noroozi and Favaro, 2016) or image rotations (Gidaris et al., 2018), and generative methods such as colorization (mapping grayscale images to their RGB counterparts), have been successfully used to initialize networks for faster convergence and for learning target tasks with fewer labels. However, in histopathology, the rules governing the spatial location of cell structures, or the color and staining of an image, differ from those of natural scene images. While this makes unsupervised learning more difficult for histopathology images, it also presents an opportunity for researchers to develop novel techniques that may be applicable to medical images.
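As an illustration of the rotation-prediction pretext task (Gidaris et al., 2018), the following sketch shows only the data side: each unlabeled patch is rotated by multiples of 90 degrees, and the rotation index becomes a free label for a classifier. The function name is hypothetical, and the network that would consume the batch is omitted.

```python
import numpy as np

def rotation_pretext_batch(patches):
    """Build a self-supervised batch for rotation prediction.

    Each patch of shape (H, W, C) is rotated by 0, 90, 180 and 270 degrees;
    the rotation index (0-3) is the label the network must predict.
    """
    images, labels = [], []
    for p in patches:
        for k in range(4):
            images.append(np.rot90(p, k=k, axes=(0, 1)))
            labels.append(k)
    return np.stack(images), np.array(labels)
```

Because histology patches have no canonical orientation, a network may find this particular pretext task ambiguous on tissue images, which is one concrete instance of the caveat above.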

Unsupervised learning methods are desirable as they allow models to be trained with little or no labeled data. Furthermore, as these methods are constructed to disentangle relationships between samples in the dataset for grouping (or clustering), a successful unsupervised learning method can also improve the interpretability of a model, by revealing how the model groups items into separate categories. While fully unsupervised methods for arbitrary tasks are still uncommon, techniques used for auxiliary tasks (e.g., pre-training) such as self-supervision (Tellez et al., 2019b) can reduce the annotation burden on experts and thereby significantly expedite research.

Reference | Cancer types | Staining | Application | Method | Dataset
Xu et al. (2015a) | Breast | H&E | Nuclei segmentation | Stacked sparse autoencoders | 537 H&E images from Case Western Reserve University
Hu et al. (2018a) (✓) | Bone marrow | H&E | Tissue and cell classification | InfoGAN | 3 separate datasets: public data with 11 patches of size , private datasets with WSIs of 24 patients + 84 images
Bulten and Litjens (2018) | Prostate | H&E, IHC | Classification of prostate tissue into tumour vs. non-tumour | Convolutional adversarial autoencoders | 94 registered WSIs from Radboud University Medical Center
Sari and Gunduz-Demir (2019) | Colon | H&E, IHC | Subtyping of intrahepatic cholangiocarcinoma (ICC) | Restricted Boltzmann Machines + Clustering | 3236 images, private dataset
Quiros et al. (2019) (✓) | Breast | H&E | High resolution image generation + feature extraction | BigGAN + Relativistic GAN | 248 + 328 patients from private dataset
Hou et al. (2019b) | Breast | H&E | Nuclei detection, segmentation and representation learning | Sparse autoencoder | 0.5 million images of nuclei from TCGA
Gadermayr et al. (2019a) | Kidney | Stain agnostic | Segmentation of object-of-interest in WSIs | CycleGAN + UNet segmentation | 23 PAS, 6 AFOG, 6 Col3 and 6 CD31 WSIs
de Bel et al. (2019) | Kidney | Stain agnostic | Tissue segmentation | CycleGAN + UNet segmentation | Private set containing 40 + 24 biopsy images
Gadermayr et al. (2019b) | Kidney | PAS, H&E | Segmentation of the glomeruli | CycleGAN | 23 WSIs, private dataset
Table 3: Overview of unsupervised learning models. Note: (✓) indicates the code is publicly available and the link is provided in their respective paper.

3.4 Transfer learning

Reference | Cancer types | Staining | Application | Method | Dataset
Wang et al. (2016a) | Breast | H&E | Detection of breast cancer metastasis | Pre-trained GoogleNet model | Camelyon16 (400 WSIs)
Liu et al. (2017) | Breast | H&E | Detection of breast cancer metastasis | Pre-trained Inception-V3 model | Camelyon16 (400 WSIs)
Han et al. (2017) | Breast | H&E | Breast cancer multi-classification | CNN integrated with feature space distance constraints for identifying feature space similarities | BreaKHis (7,909 images)
Lee and Paeng (2018) | Breast | H&E | Detection and pN-stage classification of breast cancer metastasis | Patch based CNN for metastasis detection + Random forest classifier for lymph node classification | Camelyon17 (1,000 WSIs)
Chennamsetty et al. (2018) | Breast | H&E | Breast cancer classification | Ensemble of three pre-trained CNNs + aggregation using majority voting | BACH 2018 challenge (400 WSIs)
Kwok (2018) | Breast | H&E | Breast cancer classification | Inception-Resnet-V2 based patch classifier | BACH 2018 challenge (400 WSIs)
Bychkov et al. (2018) | Colon | H&E | Outcome prediction of colorectal cancer | A 3-layer LSTM + VGG-16 pre-trained features to predict colorectal cancer outcome | Private set (420 cases)
Arvaniti et al. (2018) (✓) | Prostate | H&E | Predicting Gleason score | Pre-trained MobileNet architecture | Private set (886 cases)
Coudray et al. (2018) (✓) | Lung | H&E | Genomics prediction from pathology images | Patch based Inception-V3 model | TCGA (1,634 WSIs) and validated on independent private set containing frozen sections (98 slides), FFPE sections (140 slides) and lung biopsies (102 slides)
Kather et al. (2019) (✓) | Colon | H&E | Survival prediction of colorectal cancer | Pre-trained VGG-19 based patch classifier | TCGA (862 WSIs) and two other public datasets (25 + 86 WSIs)
Noorbakhsh et al. (2019) (✓) | Multi-Cancers | H&E | Pan-cancer classification | Pre-trained Inception-V3 model | TCGA (27,815 WSIs)
Tabibu et al. (2019) (✓) | Kidney | H&E | Classification of Renal Cell Carcinoma subtypes and survival prediction | Pre-trained ResNet based patch classifier | TCGA (2,093 WSIs)
Akbar et al. (2019) | Breast | H&E | Tumour cellularity (TC) scoring | Two separate InceptionNets: one for classification (healthy vs. cancerous tissue) and the other outputs regression scores for TC | BreastPathQ (96 WSIs)
Valkonen et al. (2019) (✓) | Breast | ER, PR, Ki-67 | Cell detection | Fine-tuning partially pre-trained CNN network | DigitalPanCK (152 invasive breast cancer images)
Table 4: Overview of transfer learning models. Note: (✓) indicates the code is publicly available and the link is provided in their respective paper.

Transfer learning is the most popular and widely adopted technique in digital pathology. In transfer learning, the goal is to extract knowledge from one domain (the source) and apply it to another (the target), relaxing the assumption that the training and test sets must be independent and identically distributed. In histopathology, this can be viewed as a “different task, different domain” problem, as categorised in Cheplygina et al. (2019): the source task is typically to pre-train a network on ImageNet (Russakovsky et al., 2015), and the resulting weights are then used to initialize a network that is fine-tuned for a target task in a different domain.

In a recent review, Litjens et al. (2017) sub-divide the application of transfer learning in medical imaging into two main streams: i) using a pre-trained CNN as a feature extractor; and ii) fine-tuning a network initialized with pre-trained weights on the target task. The latter is of particular interest to the histopathological imaging community for a wide variety of classification problems, mainly because well-annotated training datasets are scarce and are generally expensive and cumbersome to obtain in practice. Hence, it is common practice in medical imaging to fine-tune a pre-trained network rather than train one from scratch. Notable earlier works (Shin et al., 2016; Tajbakhsh et al., 2016) compared full training with fine-tuning on various medical image analysis tasks and demonstrated that fine-tuning is most beneficial for problems with limited manual annotations.
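The difference between the two streams can be sketched with a toy model: a single ReLU layer stands in for a pre-trained CNN backbone (its weights are random here, an assumption made purely for illustration), and a logistic-regression head is trained on top by gradient descent. With `fine_tune=False` the backbone is frozen and only the head learns (feature extraction); with `fine_tune=True` gradients also update the backbone (fine-tuning). All names are hypothetical.

```python
import numpy as np

def train(x, y, w_backbone, fine_tune, lr=0.1, epochs=200, seed=0):
    """Train a logistic head on a one-layer ReLU 'backbone'.

    fine_tune=False: backbone weights stay frozen (feature extraction).
    fine_tune=True: gradients also flow into the backbone (fine-tuning).
    """
    rng = np.random.default_rng(seed)
    w_b = w_backbone.copy()                      # initialised from "pre-trained" weights
    w_h = rng.normal(0.0, 0.01, w_b.shape[1])    # new task-specific head
    for _ in range(epochs):
        pre = x @ w_b
        h = np.maximum(pre, 0.0)                 # backbone features
        p = 1.0 / (1.0 + np.exp(-(h @ w_h)))     # predicted probability
        g_logit = (p - y) / len(y)               # gradient of mean BCE w.r.t. the logits
        g_h = h.T @ g_logit
        if fine_tune:                            # backpropagate through the ReLU layer
            g_pre = np.outer(g_logit, w_h) * (pre > 0)
            w_b -= lr * (x.T @ g_pre)
        w_h -= lr * g_h
    return w_b, w_h

# Toy data in which the label depends on the first input feature.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 5))
y = (x[:, 0] > 0).astype(float)
w0 = rng.normal(size=(5, 16))                    # stand-in for pre-trained weights
wb_frozen, wh_frozen = train(x, y, w0, fine_tune=False)
wb_tuned, wh_tuned = train(x, y, w0, fine_tune=True)
```

In a deep learning framework the same switch is usually made by excluding the backbone parameters from the optimiser or, in PyTorch, by setting `requires_grad = False` on them.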

In this regard, various pre-trained models have been adopted in the histopathology domain, such as VGGNet (Simonyan and Zisserman, 2014), InceptionNet (Szegedy et al., 2015, 2016), ResNet (He et al., 2016), MobileNet (Howard et al., 2017), DenseNet (Huang et al., 2017), and other variants of these models. These pre-trained models have been widely applied to various cancer grading and prognosis tasks (see Table 4 and Section 4 for more details). The best-performing methods on various Grand Challenges are critically analysed in Section 5.1.

3.4.1 Domain adaptation

Domain adaptation is a sub-field of transfer learning, where a task is learned from one or more source domains with labeled data (e.g., segmentation of tumour epithelium on Programmed death-ligand 1, or PD-L1, stained images), and the aim is to achieve similar performance on the same task on a target domain (e.g., segmentation on Cytokeratin stained images) with little or no labeled data (Wang and Deng, 2018). In this work, we focus on DNN based domain adaptation, which tries to match the transformed feature distributions of source and target domains in a bottleneck layer prior to the task output layer (Ganin et al., 2016).
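Ganin et al. (2016) match the source and target feature distributions adversarially, using a domain classifier. To make the notion of "matching feature distributions" concrete without a full adversarial setup, the following sketch instead computes the squared maximum mean discrepancy (MMD) with a Gaussian kernel, a non-adversarial discrepancy measure that could be added to the task loss as a penalty on bottleneck features. This substitution is ours and is not the method of the cited works; the kernel bandwidth `sigma` is an illustrative assumption.

```python
import numpy as np

def rbf_kernel(a, b, sigma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(source, target, sigma=1.0):
    """Biased estimate of the squared MMD between two feature sets (rows = samples)."""
    k_ss = rbf_kernel(source, source, sigma).mean()
    k_tt = rbf_kernel(target, target, sigma).mean()
    k_st = rbf_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st
```

The value is zero when the two feature sets coincide and grows as their distributions drift apart, so minimising it pushes the encoder towards domain-invariant features.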

Earlier work extends Ganin et al. (2016) by using deep CNNs as domain classifiers to accommodate complex medical images (Lafarge et al., 2017; Ren et al., 2018), rather than simple datasets such as MNIST digits or CIFAR-10 (Krizhevsky and others, 2009). Deep networks have also been used for feature regularization within and across domains. For instance, Ren et al. (2018) performed unsupervised training of siamese networks on prostate WSIs, positing that different patches from the same WSI should receive the same Gleason score, thereby extracting features common to different parts of the WSI. This auxiliary task also improved adversarial domain adaptation performance on another target dataset. Fake (artificially generated) images have also been used as an auxiliary step in domain adaptation. Brieu et al. (2019) utilized semi-automatic labeling of nuclei in one type of staining (immunofluorescence) to avoid the costlier annotation of another (H&E), generating fake H&E images from immunofluorescence images to increase the dataset size. Similarly, Gadermayr et al. (2019a) used artificial data generated with GANs for semantic segmentation of kidney histology images with multiple stains. Both works use adversarial models (i.e., generators and discriminators) for image-to-image translation, with a cycle consistency loss that enables unpaired training. The translation is performed to obtain an intermediate, stain-agnostic representation (Lahiani et al., 2019), which is then fed to a network trained on this representation to perform segmentation.
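The cycle consistency term that enables unpaired training can be sketched in isolation. In CycleGAN-style models the two generators are CNNs trained jointly with adversarial losses; here, as a simplifying assumption, toy linear colour maps stand in for the generators so that only the cycle term is shown. Function names are hypothetical.

```python
import numpy as np

def apply_colour_map(images, m):
    """Apply a 3x3 linear colour transform to every pixel of (N, H, W, 3) images."""
    return images @ m.T

def cycle_consistency_loss(x_a, x_b, g_ab, f_ba):
    """L1 cycle loss: mean |F(G(a)) - a| + mean |G(F(b)) - b|.

    g_ab maps domain A to B and f_ba maps B back to A; no paired
    (a, b) images are needed, only samples from each domain.
    """
    rec_a = f_ba(g_ab(x_a))
    rec_b = g_ab(f_ba(x_b))
    return np.abs(rec_a - x_a).mean() + np.abs(rec_b - x_b).mean()
```

If `f_ba` exactly inverts `g_ab` the loss is zero, whereas a pair of mappings that loses information (for example, one that discards tissue structure) is penalised, which is why this term helps preserve structure during stain translation.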

Reference | Cancer types | Staining | Application | Method | Dataset
Domain adaptation
Lafarge et al. (2017) | Breast | H&E | Mitosis detection | Gradient reversal with CNNs | TUPAC16 (73 WSIs)
Ren et al. (2018) | Prostate | H&E | Feature matching of image patches | Siamese networks | TCGA + private dataset
Brieu et al. (2019) | Multi-Cancers | Multi-stain | Semi-automatic nuclei labeling using stain transfer | CycleGAN | TCGA (75 bladder cancer + 29 lung cancer + 142 tissue samples of FOVs images) + 30 FOVs of breast cancer (private set)
Gadermayr et al. (2019a) | Kidney | Stain agnostic | Segmentation of object-of-interest in WSIs | CycleGAN + UNet segmentation | 23 PAS, 6 AFOG, 6 Col3 and 6 CD31 WSIs
Kapil et al. (2019) | Lung | PD-L1 + Cytokeratin | Segmentation | CycleGAN + SegNet segmentation model | 56 Cytokeratin + 69 PD-L1 WSIs (private set)
Stain variability
Janowczyk et al. (2017) (✓) | Multi-Cancers | H&E | Stain transfer for H&E staining | Sparse autoencoders | 5 breast biopsy slides + 7 gastro-intestinal biopsies
Cho et al. (2017) | Breast | H&E | Stain transfer | DCGAN conditioned on a target image | CAMELYON16
BenTaieb and Hamarneh (2017) | Multi-Cancers | H&E | Stain transfer | GAN + regularization based on auxiliary task performance | ICPR2014 + GLAS challenge + 135 WSIs (private set)
Zanjani et al. (2018) (✓) | Lymph nodes | H&E | Stain transfer for H&E staining | Multiple studies with Gaussian mixture models, variational autoencoders, and InfoGAN | 625 images from 125 WSIs of lymph nodes from 3 patients
de Bel et al. (2019) | Kidney | Stain agnostic | Segmentation | CycleGAN + UNet segmentation | 40 + 24 biopsy images (private)
Shaban et al. (2019a) (✓) | Breast | H&E | Stain transfer | CycleGAN | ICPR2014
Rivenson et al. (2019) | Multi-Cancers | H&E, Jones, Masson’s trichrome | Digital staining of multiple tissues | Custom GAN | N/A
Lahiani et al. (2019) | Liver | FAP-CK from Ki67-CD8 | Virtual stain transformation between different types of staining | CycleGAN + Instance normalization | 10 Ki67-CD8 + 10 FAP-CK stained colorectal carcinoma WSIs
Table 5: Overview of domain adaptation and stain normalization models. Note: (✓) indicates the code is publicly available and the link is provided in their respective paper.

3.4.2 Stain normalization

Stain normalization, augmentation and stain transfer are popular image preprocessing techniques that improve generalization by modifying the staining properties of a given image to visually match another image. We discuss these techniques under transfer learning because modifying the color properties of an image to match another image with different staining is closely related to domain adaptation. In contrast to domain adaptation, which modifies the features extracted from different image distributions so that they become indistinguishable from each other, stain normalization directly modifies the input images in order to obtain features that are robust to staining variability.

In histopathology, staining refers to coloring structures of interest with various stains or dyes to enhance their contrast with respect to the surrounding tissue. Depending on the application and extraction site, different types of staining can be employed (e.g., Ki-67 immunostaining of brain, breast and prostate samples for estimating the tumour cell proliferation rate (Valkonen et al., 2019), or H&E, which is widely employed in histopathology for breast, lung, muscle, and skin). Variation in H&E staining remains a significant source of variability across WSIs and poses a challenge in examination and diagnosis, even for expert pathologists. As CNNs are highly sensitive to the data they were trained on, staining variation will likely reduce test performance if the staining properties of test images do not match those of the training images (Ciompi et al., 2017). In this section, we examine approaches that alleviate this variability through normalization, augmentation or stain transfer.

One may combat staining variation by augmenting the training data, varying each pixel value per channel within a predefined range in transformed color spaces such as HSV (hue, saturation and value) or HED (Hematoxylin, Eosin, and Diaminobenzidine) (Liu et al., 2017; Li and Ping, 2018; Tellez et al., 2018). Earlier machine learning (ML) methods (Macenko et al., 2009; Vahadane et al., 2016) assume that staining attenuates light (optical density) uniformly; they decompose each optical density image into concentration (appearance) and color (stain) matrices, and use the latter to re-stain new images after splitting them into the same two components. The uniformity assumption is relaxed in more recent ML methods, which consider the type of chemical staining and the morphological properties of an image when generating stain matrices (Khan et al., 2014; Bejnordi et al., 2015). Neural networks, such as sparse autoencoders for template matching (Janowczyk et al., 2017), and GANs are also used for stain transfer and normalization (Zanjani et al., 2018; de Bel et al., 2019; BenTaieb and Hamarneh, 2017; Cho et al., 2017). The cycle consistency loss (Zhu et al., 2017a) has been utilized for improved stain transfer with structure preservation, as well as for training without pairing the source images (to be stained) with the target images (used as a reference for staining new images) (Shaban et al., 2019a; de Bel et al., 2019). Successive GAN variants have improved structure and feature preservation (Cho et al., 2017), and auxiliary tasks, such as maintaining high prediction accuracy on classification or segmentation, have led to consistent stain transfer that accounts for the type and/or shape of the tissue present (Odena et al., 2017; BenTaieb and Hamarneh, 2017).
Recently, the same techniques have also been used to virtually stain quantitative phase images of label-free tissue sections with different types of staining, eliminating the need for physical staining in various tissues, including human skin (H&E stain), kidney (Jones stain), and liver (Masson’s trichrome stain) (Rivenson et al., 2019).
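The HED-space augmentation described above can be sketched as follows, assuming an RGB image scaled to [0, 1]: the image is converted to optical density, deconvolved into per-stain concentrations with a fixed stain matrix, each concentration channel is randomly scaled and shifted, and the result is converted back to RGB. The stain matrix holds the haematoxylin, eosin and DAB optical-density vectors used by scikit-image's `rgb2hed`; the perturbation ranges, the default `sigma` and the function name are illustrative assumptions rather than the settings of the cited works.

```python
import numpy as np

# Optical-density vectors of haematoxylin, eosin and DAB (one stain per row).
RGB_FROM_HED = np.array([[0.65, 0.70, 0.29],
                         [0.07, 0.99, 0.11],
                         [0.27, 0.57, 0.78]])
HED_FROM_RGB = np.linalg.inv(RGB_FROM_HED)

def hed_augment(img, sigma=0.05, rng=None):
    """Randomly scale and shift each stain channel of an RGB image with values in [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    od = -np.log(np.clip(img, 1e-6, 1.0))          # optical density per pixel and channel
    conc = od.reshape(-1, 3) @ HED_FROM_RGB        # stain concentrations (colour deconvolution)
    alpha = rng.uniform(1.0 - sigma, 1.0 + sigma, size=3)
    beta = rng.uniform(-sigma, sigma, size=3)
    conc = conc * alpha + beta                     # independent perturbation per stain
    od_new = conc @ RGB_FROM_HED                   # recombine the stains in OD space
    return np.clip(np.exp(-od_new), 0.0, 1.0).reshape(img.shape)
```

With `sigma=0` the function returns the input unchanged, which is a convenient sanity check; larger values correspond to the "aggressive" augmentation regimes discussed in the literature.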

While modern approaches for stain transfer are visually superior to their traditional counterparts, producing more aesthetically pleasing results with fewer artefacts, their use cases are still not entirely clear. For instance, Shaban et al. (2019a) report considerable gains over baselines (no augmentation, or traditional methods such as Macenko et al. (2009), Reinhard et al. (2001) and Vahadane et al. (2016)) in the CAMELYON16 challenge; however, the winning entry, which outperforms Shaban et al. (2019a) by a margin of 21% in AUC on a binary WSI classification task, utilizes a traditional machine learning based normalization technique that aligns the chromatic and density distributions of the source and the target (Bejnordi et al., 2015). This does not imply that normalization should be avoided or abandoned as a research direction. A thorough study comparing numerous approaches found that it is always advisable to apply various forms of aggressive color augmentation in HSV or HED space, and that slight additional performance gains are still achievable with a network-based augmentation strategy (Tellez et al., 2019a). Similarly, Stacke et al. (2019) examined the effect of domain shift on histopathological data and automated medical imaging systems, and found that augmentation and normalization strategies drastically alter the performance of these systems. In addition to staining, factors such as scale space, resolution, image quality and scanner imperfections are also likely to affect model performance, although their effects are less explored in the community.
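For comparison with the learned approaches, the traditional baseline of Reinhard et al. (2001) can be sketched as matching the per-channel mean and standard deviation of a source image to those of a target image in the lαβ colour space. The two matrices follow the RGB-to-LMS conversion and the lαβ decorrelation given in that paper; this is a minimal sketch assuming RGB inputs in [0, 1], and the function names are illustrative.

```python
import numpy as np

# RGB -> LMS conversion and the log-LMS -> l-alpha-beta decorrelation (Reinhard et al., 2001).
RGB2LMS = np.array([[0.3811, 0.5783, 0.0402],
                    [0.1967, 0.7244, 0.0782],
                    [0.0241, 0.1288, 0.8444]])
LMS2LAB = (np.diag([1 / np.sqrt(3), 1 / np.sqrt(6), 1 / np.sqrt(2)])
           @ np.array([[1, 1, 1], [1, 1, -2], [1, -1, 0]]))

def to_lab(img):
    """RGB image (H, W, 3) -> array of l-alpha-beta pixels (H*W, 3)."""
    lms = img.reshape(-1, 3) @ RGB2LMS.T
    return np.log10(np.clip(lms, 1e-6, None)) @ LMS2LAB.T

def from_lab(lab, shape):
    """Inverse of to_lab, reshaped back to an image."""
    lms = 10.0 ** (lab @ np.linalg.inv(LMS2LAB).T)
    return (lms @ np.linalg.inv(RGB2LMS).T).reshape(shape)

def reinhard_normalize(source, target):
    """Match the per-channel mean and std of source to target in l-alpha-beta space."""
    src, tgt = to_lab(source), to_lab(target)
    mu_s, sd_s = src.mean(axis=0), src.std(axis=0) + 1e-8   # guard against flat channels
    mu_t, sd_t = tgt.mean(axis=0), tgt.std(axis=0)
    out = (src - mu_s) / sd_s * sd_t + mu_t
    return np.clip(from_lab(out, source.shape), 0.0, 1.0)
```

Because only global colour statistics are matched, such a baseline is cheap and structure-preserving by construction, but it cannot account for the tissue-dependent effects that the more recent methods above try to model.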

4 Survival models for disease prognosis

This section concentrates on methods for training survival models that either generate the probability of an event within a predefined period of time or predict the time to an event by regression from a WSI. In the context of cancer, the term prognosis refers to the likely outcome for a patient on standard treatment, and the term prediction refers to how the patient responds to a particular treatment. Since the difference between these two terms is not relevant when carrying out survival analysis, we use the term prediction to cover both prediction and prognosis. The outcome metrics used to train a prediction model depend on the disease. For example, in patients with very aggressive disease such as glioblastoma, the survival time in months may be used as an endpoint, whereas for breast cancer, with an average 10-year survival rate of around 80%, the time to recurrence after surgery is a more relevant metric and at least 5 years of follow-up is required. Following up patients prospectively is time-consuming and expensive; for this reason, several studies use existing clinically validated risk models or genomic assays as proxies for long-term outcomes, for example, PAM50 scores (Veta et al., 2019; Couture et al., 2018) in breast cancer and Gleason grades in prostate cancer (Nagpal et al., 2019). The survival data or risk scores may be dichotomized (e.g., survival at specific time points, or risk scores above or below a set cutoff); this allows the survival model to be treated as a classification problem, but information is lost and new models have to be trained if the cutoff value changes (e.g., two different models are needed to predict 5-year and 10-year survival). Time-to-event models are more complicated because nothing is known about what happens to a patient after they are lost to follow-up; this is known as right censoring.
A proportional hazards model is commonly used to model an individual’s survival and can be implemented using a neural network (Katzman et al., 2018); several groups have used this approach in digital pathology (Zhu et al., 2017b; Mobadersany et al., 2018; Tang et al., 2019).
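The two ways of using survival labels discussed above can be sketched as follows. `dichotomize` turns (time, event) pairs into binary labels at a chosen horizon and marks patients censored before the horizon as unknown (NaN), which makes the information loss explicit; `cox_neg_log_partial_likelihood` is the negative Cox partial log-likelihood that is commonly used as a loss for neural proportional hazards models such as that of Katzman et al. (2018). Both are minimal NumPy sketches with hypothetical names, and the Cox loss assumes there are no tied event times.

```python
import numpy as np

def dichotomize(time, event, horizon):
    """Label 1 if the event occurred by `horizon`, 0 if the patient is known to be event-free.

    Patients censored before the horizon get NaN: their outcome is unknown.
    """
    time, event = np.asarray(time, dtype=float), np.asarray(event, dtype=bool)
    label = np.where(time <= horizon, event.astype(float), 0.0)
    label[(time < horizon) & ~event] = np.nan
    return label

def cox_neg_log_partial_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk: model output (log hazard ratio) per patient; event: 1 = event observed,
    0 = right-censored. Assumes no tied event times.
    """
    risk, time, event = (np.asarray(a, dtype=float) for a in (risk, time, event))
    order = np.argsort(-time)                  # descending time: risk sets become prefixes
    risk, event = risk[order], event[order]
    log_risk_set = np.logaddexp.accumulate(risk)   # log sum of exp(risk) over {j: t_j >= t_i}
    return -np.sum((risk - log_risk_set) * event) / max(event.sum(), 1.0)
```

Censored patients never contribute a term of their own, but they still appear in the risk sets of patients who had events earlier, which is how the Cox loss uses partial follow-up information that dichotomization discards.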

The data used to train survival models are weakly labeled, with only one outcome label per patient. This poses a computational challenge: a WSI is so large that it has to be broken down into hundreds or even thousands of smaller patches for processing and, since tumours may be very heterogeneous, only a subset of these patches may be salient for the prediction task. Although Campanella et al. (2019) recently demonstrated that a relatively simple MIL approach could produce accurate diagnostic results when more than 10,000 slides were available for training, datasets for survival analysis usually have fewer than 1,000 slides, which makes the task much more difficult. Three main approaches are used to overcome the shortage of labeled data. The first is to use image features that expert pathologists have already identified as being associated with survival. Examples include assessing tumour proliferation in breast cancer (Veta et al., 2019), quantifying the stroma/tumour ratio in colorectal cancer (Geessink et al., 2019) and predicting the Gleason grade in prostate cancer (Nagpal et al., 2019). The role of deep learning in these cases is to provide an automatic and reproducible method for extracting these features. In the second approach, image features are extracted from image patches using a pre-trained CNN, feature selection or dimensionality reduction is then carried out, and finally a survival model is trained on the resulting feature vector. Examples include survival prediction in mesothelioma (Courtiol et al., 2019) and colorectal cancer (Kather et al., 2019; Bychkov et al., 2018), and risk of recurrence in breast cancer (Couture et al., 2018). In the third approach, unsupervised methods are used to learn a latent representation of the data, which is then used to train the survival model. For example, Zhu et al. (2017b) apply K-means to small patches to identify 50 clusters, or phenotypes, for glioma and non-small-cell lung cancer, and Muhammad et al. (2019) use an autoencoder with an additional clustering constraint to predict survival in intrahepatic cholangiocarcinoma.
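For the third approach, once patches have been assigned to clusters, each patient still needs a fixed-length descriptor. One simple option, assumed here for illustration and not necessarily what the cited works do, is the fraction of a patient's patches that fall into each cluster ("phenotype"); such a vector could then be fed to a survival model such as a Cox proportional hazards model. The function name is hypothetical.

```python
import numpy as np

def cluster_histogram_features(patch_clusters, patient_ids, n_clusters):
    """Per-patient fraction of patches assigned to each cluster.

    patch_clusters: cluster index per patch; patient_ids: patient of each patch.
    Returns the sorted unique patient ids and an (n_patients, n_clusters) array.
    """
    patch_clusters = np.asarray(patch_clusters)
    patient_ids = np.asarray(patient_ids)
    patients = np.unique(patient_ids)
    feats = np.zeros((len(patients), n_clusters))
    for i, pid in enumerate(patients):
        counts = np.bincount(patch_clusters[patient_ids == pid], minlength=n_clusters)
        feats[i] = counts / counts.sum()
    return patients, feats
```

Normalising by the patch count makes the descriptor independent of how much tissue each slide contains.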

There are many possible ways of aggregating the predictions for individual patches into a single prediction for a patient. The simplest approach is to take the mean prediction across all patches (Tang et al., 2019), but this will not work if the salient patches represent only a small fraction of the WSI; for this reason, other schemes, such as averaging the two highest-ranking patches (Mobadersany et al., 2018), may be more appropriate. Some methods generate a low-dimensional feature vector that captures the distribution of scores across the patches. For example, Nagpal et al. (2019) use the distribution of Gleason scores across all patches as an input feature vector to a k-nearest neighbour (kNN) classifier to generate a patient score, and Couture et al. (2018) aggregate patch probabilities into a quantile function, which is then used by an SVM to generate a patient-level class. Methods that assign patches to discrete classes or clusters can simply use the majority class to label the WSI (Muhammad et al., 2019), or adopt an RNN to generate a single prediction from a sequence of patches (Bychkov et al., 2018).
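The aggregation schemes described above can be sketched as small NumPy functions that map patch-level outputs to a patient-level prediction or descriptor. The function names, the `k=2` default and the number of quantiles are illustrative choices; the downstream kNN, SVM or RNN models used in the cited works are omitted.

```python
import numpy as np

def aggregate_mean(scores):
    """Mean patch score (dilutes a signal carried by only a few patches)."""
    return float(np.mean(scores))

def aggregate_top_k(scores, k=2):
    """Average of the k highest-scoring patches."""
    return float(np.mean(np.sort(scores)[-k:]))

def grade_distribution(patch_grades, n_grades):
    """Fraction of patches assigned to each grade: a low-dimensional WSI descriptor."""
    return np.bincount(patch_grades, minlength=n_grades) / len(patch_grades)

def quantile_features(probs, n_quantiles=5):
    """Sample the empirical quantile function of the patch probabilities."""
    return np.quantile(probs, np.linspace(0.0, 1.0, n_quantiles))

def majority_class(patch_classes):
    """Most frequent patch class, used as the WSI label."""
    return int(np.bincount(patch_classes).argmax())

# Example: two salient patches among five.
scores = np.array([0.1, 0.1, 0.1, 0.9, 0.8])
print(aggregate_mean(scores))    # about 0.40
print(aggregate_top_k(scores))   # about 0.85
```

The example illustrates why mean pooling can miss small but salient regions, while top-k pooling keeps their signal.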

End-to-end methods that learn features directly from the image data and associate probabilities with individual patches can be used to uncover new information about how morphology relates to outcome. For example, Courtiol et al. (2019) showed that regions associated with stroma, inflammation, cellular diversity, and vacuolization were important in predicting survival in mesothelioma patients, and Mobadersany et al. (2018) showed that microvascular proliferation and increased cellularity are associated with poorer outcome in glioma patients. Prediction heatmaps may also allow researchers to uncover patterns of tumour heterogeneity and could be used to guide tissue extraction for genomics and proteomics assays. Deep learning survival models are, therefore, of great interest to cancer researchers as well as to pathologists and oncologists.
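Prediction heatmaps of the kind mentioned above are typically assembled by placing each patch's probability at its tile position in the WSI grid. A minimal sketch, assuming patches were extracted on a regular, non-overlapping grid and indexed by (row, column); the function name is hypothetical.

```python
import numpy as np

def patch_heatmap(coords, probs, grid_shape):
    """Place per-patch probabilities on a grid aligned with the WSI tiling.

    coords: (row, col) tile index of each patch; tiles without a prediction
    (e.g. background that was never processed) are left as NaN.
    """
    heat = np.full(grid_shape, np.nan)
    for (r, c), p in zip(coords, probs):
        heat[r, c] = p
    return heat
```

The resulting array can be upsampled and overlaid on a thumbnail of the slide for visual inspection.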

Reference | Cancer types | Application | Method | Dataset
Zhu et al. (2017b) | Multi-Cancers | Loss function based on survival time | Raw pixel values of downsampled patches used as feature vectors; 10 clusters identified using K-means clustering. Deep survival models are trained for each cluster separately. Significant clusters are identified and corresponding scores are fed into final WSI classifier | TCIA-NLST, TCGA-LUSC, TCGA-GBM
Bychkov et al. (2018) | Colorectal | 5 year disease specific survival | Extracted features using pre-trained VGG-16. Used RNN to generate WSI prediction from tiles | Private set - TMAs from 420 patients
Couture et al. (2018) (✓) | Breast | Prediction of tumour grade, ER status, PAM50 intrinsic subtype, histologic subtype and risk of recurrence score | Pre-trained VGG-16 model. Aggregate features over regions to predict class for each patch, then frequency distribution of classes input to SVM to combine regions to predict TMA class | TMA cores (Private-1203 cases)
Mobadersany et al. (2018) (✓) | Brain | Time to event modelling | CNN integrated with a Cox proportional hazards model to predict patient outcomes using histology and genomic biomarkers. Calculate median risk for each ROI, then average 2 highest risk regions | TCGA-LGG, TCGA-GBM (1,061 WSIs)
Courtiol et al. (2019) | Mesothelioma | Loss function based on survival time | Pre-trained ResNet50 extracts features from 10000 tiles. 1-D convolutional layer generates score for each tile. 10 highest and lowest scores fed into MLP classifier for WSI prediction | MESOPATH/MESOBANK (private set-2,981 WSIs), TCGA validation set (56 WSIs)
Geessink et al. (2019) | Colorectal | Dichotomized tumour/stromal ratios | CNN based patch classifier trained to identify tissue components. Calculate tumour-stroma ratio for manually defined hot-spots | Private set-129 WSIs
Kather et al. (2019) (✓) | Colorectal | Dichotomized stromal score | VGG-19 based patch classifier trained to identify tissue component. Calculate HR for each tissue component using mean activation. Combine components with to give a “deep stromal score” | NCT-CRC-HE-100k; TCGA-READ, TCGA-COAD
Muhammad et al. (2019) | Liver (ICC) | HRs of clusters compared | Unsupervised method to cluster tiles using autoencoder. WSI assigned to cluster corresponding to majority of tiles | Private set - 246 ICC H&E WSIs
Nagpal et al. (2019) | Prostate | Gleason scoring | Trained InceptionV3 network to predict Gleason score on labeled patches. Then calculate % patches with each grade on the WSI and use result as a low dimensional feature vector input to kNN classifier | TCGA-PRAD and private dataset
Qaiser et al. (2019a) | Lymphoma | Generate 4 DPC categories | Multi-task CNN model for simultaneous cell detection and classification, followed by digital proximity signature (DPS) estimation | Private set-32 IHC WSIs
Tang et al. (2019) | Multi-Cancers | Dichotomized survival time ( year and year) | A capsule network is trained using a loss function that combines a reconstruction loss, margin loss and Cox loss. The mean of all patch-level survival predictions is calculated to achieve a final patient-level survival prediction. | TCGA-GBM and TCGA-LUSC
Veta et al. (2019) | Breast | Predict mitotic score & PAM50 proliferation score | Multiple methods from challenge teams | TUPAC 2016
Table 6: Overview of survival models for disease prognosis. Note: (✓) indicates the code is publicly available and the link is provided in their respective paper.

5 Discussion and future trends

Dataset / Year | Cancer types | Goal | Images / Cases (train+test) | Annotation
ICPR 2012 (Cireşan et al., 2013) | Breast | Mitosis detection | 50 (35+15) | Pixel-level annotation of mitotic cells
AMIDA 2013 (Veta et al., 2015) | Breast | Mitosis detection | 23 (12+11) | Centroid pixel of mitotic cells
ICPR 2014 (Cireşan et al., 2013) | Breast | Mitosis detection | 2112 (2016+96) | Centroid pixel of mitotic cells
GLAS 2015 (Sirinukunwattana et al., 2017) | Colon | Gland segmentation | 165 (85+80) | Glandular boundaries
TUPAC 2016 (Veta et al., 2019) | Breast | Tumour proliferation based on mitosis counting & molecular data + two auxiliary tasks | 821 (500+321) + (73/34) | Proliferation scores & ROI of mitotic cells
HER2 Scoring 2016 (Qaiser et al., 2018) | Breast | HER2 scoring in breast cancer WSIs | 86 (52+28) | HER2 score on whole-slide level
BreakHis 2016 (Spanhol et al., 2015) | Breast | Breast cancer detection | 82 (7909 patches) | WSL benign vs. malignant annotation
CRCHisto 2016 (Sirinukunwattana et al., 2016) | Colon | Nuclei detection & classification | 100 | 29,756 nuclei centres, of which 22,444 have associated class labels
CAMELYON16 (Ehteshami Bejnordi et al., 2017) | Breast | Breast cancer metastasis detection | 400 (270+130) | Contour of cancer locations
CAMELYON17 (Bandi et al., 2018) | Breast | Breast cancer metastasis detection & pN-stage prediction | 1000 (500+500) | Contour of cancer locations + patient level score
MoNuSeg 2018 (Kumar et al., 2019) | Multi-Cancers | Nuclei segmentation | 44 (30+14) | 22,000+7000 nuclear boundary annotations
PCam 2018 (Veeling et al., 2018) | Breast | Metastasis detection | 327,680 patches | Patch-level binary label
TNBC 2018 (Naylor et al., 2018) | Breast | Nuclei segmentation | 50 | 4022 pixel-level annotated nuclei
BACH 2018 (Aresta et al., 2019) | Breast | Breast cancer classification | 500 (400+100) | Image-wise & pixel-level annotations
BreastPathQ 2018 (Akbar et al., 2019) | Breast | Tumour cellularity | 96 (69+25) WSIs | 3,700 patch-level tumour cellularity scores
Post-NAT-BRCA (Martel et al., 2019) | Breast | Tumour cellularity | 96 WSIs | Nuclei, patch and patient level annotations
CoNSeP 2019 (Graham et al., 2019b) | Colon | Nuclei segmentation and classification | 41 | 24,319 pixel-level annotated nuclei
CRAG 2019 (Graham et al., 2019a) | Colon | Gland segmentation | 213 (173+40) | Gland instance-level ground truth
LYON 2019 (Swiderska-Chadaj et al., 2019) | Multi-Cancers | Lymphocyte detection | 83 WSIs | 171,166 lymphocytes annotated in 932 ROIs
TCGA (TCGA, ) | Multi-Cancers | Multiple | N/A | N/A
TCIA (TCIA, ) | Multi-Cancers | Multiple | N/A | N/A
NCT-CRC-HE-100k (Kather et al., 2019) | Colon | Tissue classification | 100,000 patches (86 WSIs) + 7,180 patches (25 WSIs) | Patch labels for nine-class tissue classification
Table 7: Summary of publicly available databases in computational histopathology.

5.1 Effect of deep learning architectures on task performance

In most applications, standard architectures (e.g., VGGNet (Simonyan and Zisserman, 2014), InceptionNet (Szegedy et al., 2015, 2016), ResNet (He et al., 2016), MobileNet (Howard et al., 2017), DenseNet (Huang et al., 2017)) can be employed directly. Custom networks should only be used if it is impossible to transform the inputs into a suitable format for the given architecture, or if the transformation would cause significant information loss that could affect task performance. For instance, if the scanner pixel scale does not match the powers of two expected by a nuclei segmentation network, custom neural networks that accept varying image sizes can be used (Saha et al., 2017).

Most standard architectures have been tested extensively, and their pitfalls, convergence behaviour and weaknesses are well documented in the literature. In contrast, the design choices of a custom network, such as the type of pooling used for spatial dimensionality reduction, the sizes of the convolutional filters, and the inclusion of residual connections or other blocks (e.g., Squeeze-and-Excitation modules (Hu et al., 2018b) or inception modules (Szegedy et al., 2016)), are left for researchers to explore and can be critical to network performance. In general, it is recommended to use larger convolutional filters if the input size is large, skip connections in segmentation tasks, and batch normalization for faster convergence and better performance (Van Eycke et al., 2018).
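To make the powers-of-two constraint concrete, the following minimal sketch computes how much padding a tile needs before it can pass through a UNet-style encoder unchanged. It assumes an encoder with `depth` stride-2 pooling stages; the helper `pad_to_multiple` is hypothetical and is not the approach of Saha et al. (2017), who instead designed networks that accept other input sizes. Padding costs some computation, whereas resizing would change the pixel scale, which is the kind of information loss discussed above.

```python
from typing import Tuple


def pad_to_multiple(height: int, width: int, depth: int) -> Tuple[int, int]:
    """Return (pad_h, pad_w) so that both sides are divisible by 2**depth.

    A UNet-style encoder with `depth` stride-2 pooling stages halves the
    spatial size `depth` times. If a side is not divisible by 2**depth, the
    upsampled decoder maps no longer line up with the encoder skip
    connections, so the input has to be padded (or cropped) first.
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    factor = 2 ** depth
    # Integer ceiling division avoids floating-point rounding issues.
    target_h = -(-height // factor) * factor
    target_w = -(-width // factor) * factor
    return target_h - height, target_w - width


# Hypothetical example: a 1000 x 1000 tile fed to a 4-level encoder.
print(pad_to_multiple(1000, 1000, depth=4))  # (8, 8), since 1000 -> 1008 = 63 * 16
```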

With standard architectures, pre-trained networks are widely employed to improve performance. While pre-training is known to significantly improve convergence speed, it does not always lead to better performance than a network trained from scratch, given enough time for convergence (Liu et al., 2017; He et al., 2019). Pre-trained networks are commonly trained on natural scene image databases (e.g., ImageNet), so the learned feature representations may not be well suited to histopathology images. Given a training dataset with few images, it is recommended to train only the last few decision layers (i.e., freeze the initial layers), to use a nonlinear decision layer (i.e., one composed of one or more hidden layers with nonlinear activations), and to apply regularization techniques such as weight decay and dropout to avoid overfitting (Tabibu et al., 2019; Valkonen et al., 2019).
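As a rough illustration of training only the final decision layer on top of frozen features with weight decay, the sketch below fits a logistic-regression head in plain Python. It is a simplification under stated assumptions: the frozen, pre-trained backbone is assumed to have already produced fixed feature vectors, and a linear head is used for brevity, whereas the works cited above recommend a nonlinear head with hidden layers and dropout. The function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def train_linear_head(features: Sequence[Sequence[float]], labels: Sequence[int],
                      lr: float = 0.1, weight_decay: float = 1e-3,
                      epochs: int = 200, seed: int = 0) -> List[float]:
    """Fit a logistic-regression head on frozen features with L2 weight decay.

    `features` stand in for embeddings from a frozen, pre-trained backbone:
    they are never modified, only the head's weights are learned.
    Returns the weight vector with the bias stored as the last entry.
    """
    rng = random.Random(seed)
    dim = len(features[0])
    w = [rng.uniform(-0.01, 0.01) for _ in range(dim)] + [0.0]
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
            z = max(-60.0, min(60.0, z))  # keep exp() in a safe range
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # derivative of binary cross-entropy w.r.t. z
            for i in range(dim):
                # Weight decay adds weight_decay * w[i] to each weight's gradient.
                w[i] -= lr * (err * x[i] + weight_decay * w[i])
            w[-1] -= lr * err  # the bias is conventionally left undecayed
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Binary decision of the trained head for one feature vector."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    return 1 if z > 0 else 0


# Hypothetical 2-D "frozen features" for four patches.
feats = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
labs = [0, 0, 1, 1]
head = train_linear_head(feats, labs)
print([predict(head, f) for f in feats])
```

Increasing `weight_decay` shrinks the learned weights, which is what limits overfitting when only a handful of labelled images are available.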

CNNs have been widely employed, with significant improvements over their traditional counterparts, in various natural scene image applications (Krizhevsky et al., 2012). Until very recently, traditional image processing techniques such as density-based models or handcrafted features were competitive with these networks in digital histopathology (Sharma et al., 2017); however, as neural networks have become more capable, state-of-the-art results in digital pathology have overwhelmingly come from CNN-based methods. While CNNs certainly seem to provide a noticeable increase in performance, it is not entirely clear how much of this increase can be attributed to recent advances in neural networks, as opposed to proper validation, data mining and processing practices, or researchers' growing familiarity with DNNs. For instance, in magnetic resonance image analysis, Isensee et al. (2019) showed that a slightly modified UNet trained with large patch sizes can achieve state-of-the-art brain tumour segmentation results, whereas most competing entries used larger and newer networks with various modifications to both the networks and the loss functions (e.g., very small patch sizes, where the loss is computed only from the centre of the patch).

As such comparative studies do not yet exist for digital histopathology, we examined various public histopathology challenges (see Table 7) to assess the impact of the architecture, and found that in many tasks the specific architecture was not a determining factor in the outcome. For instance, in the BACH challenge on breast histology images (Aresta et al., 2019), the winning entry took first place (with a 14% margin over the next best entry) despite using significantly less data, no network ensembling and a smaller contextual window (i.e., input patch size); instead, it employed a hard example mining scheme that allowed the network to learn from few examples, converge faster and avoid overfitting. The winning entry of the MoNuSeg (multi-organ nucleus segmentation) challenge (Kumar et al., 2019) employed a UNet without any post-processing step (e.g., watershed transformation or morphological operations to separate nuclei), whereas participants using cascaded UNets, ResNets and feature pyramid networks (FPN) or DenseNets consistently scored lower. The winning entry of TUPAC (the tumour proliferation assessment challenge) used hard negative mining with a modified ResNet with 6 or 9 residual blocks, and a support vector machine (SVM) as the decision (feature aggregation) layer for mitosis detection (Veta et al., 2019), beating architectures including GoogLeNet, UNet and VGGNet.

The winning entry of the CAMELYON16 challenge on the detection of lymph node metastases (Ehteshami Bejnordi et al., 2017) used an ensemble of two GoogLeNet networks and outperformed pathologists working under time constraints. In contrast, the same network architecture without any stain standardization or data augmentation scored 10% and 7% lower on the free-response receiver operating characteristic (FROC) and area under the curve (AUC) metrics, respectively. Three of the top five results used the GoogLeNet architecture with 7 million parameters (22 layers), and the second best entry employed a ResNet with 44 million parameters (101 layers).

Results from the subsequent CAMELYON17 challenge on the detection of cancer metastases in lymph nodes and lymph node status classification (Bandi et al., 2018) suggest that ensemble networks may help self-correct predictions through a suitable form of voting. In this challenge, the top entry scored around 2% higher than the second best entry in the quadratic weighted Cohen's kappa metric (Cohen, 1960). The top two entries both used ResNet-101 networks: the top entry used an ensemble of three, while the second best used a single network with an image resolution four times lower than that of the first entry and a patch size about four times larger.
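The voting and evaluation steps mentioned above can be sketched in a few lines of plain Python: a per-sample majority vote across an ensemble, and Cohen's kappa with quadratic disagreement weights for ordinal labels such as pN stages. This is a minimal sketch assuming labels are integer-encoded classes 0..K-1; the tie-breaking rule (towards the more severe class) and the function names are illustrative assumptions, not the procedure of any CAMELYON17 entry.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model label lists by majority vote.

    `predictions[m][i]` is model m's label for sample i. Ties are broken
    towards the higher (more severe) class, an illustrative assumption.
    """
    combined = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        combined.append(max(counts, key=lambda c: (counts[c], c)))
    return combined


def quadratic_weighted_kappa(y_true: Sequence[int], y_pred: Sequence[int],
                             num_classes: int) -> float:
    """Cohen's kappa with quadratic disagreement weights for labels 0..K-1."""
    n = len(y_true)
    observed = [[0.0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(num_classes))
                 for j in range(num_classes)]
    num = den = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            expected = hist_true[i] * hist_pred[j] / n  # chance agreement
            num += weight * observed[i][j]
            den += weight * expected
    if den == 0:
        raise ValueError("kappa is undefined when only one class occurs")
    return 1.0 - num / den


# Hypothetical pN-stage labels (0..4) from three networks for five patients.
votes = [[0, 1, 2, 4, 3], [0, 1, 1, 4, 3], [1, 1, 2, 3, 3]]
ensemble = majority_vote(votes)
print(ensemble)  # [0, 1, 2, 4, 3]
print(quadratic_weighted_kappa([0, 1, 2, 3, 3], ensemble, 5))  # about 0.941 (= 16/17)
```

Because the weights grow with the squared distance between classes, confusing pN0 with pN2 is penalised far more than confusing neighbouring stages.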

It is noteworthy that all of the “winning networks” for the challenges described above were invented in or before 2015, whereas the challenges took place between 2016 and 2019. Although challenge scores are not necessarily indicative of performance in the intended use case, as challenge participants tend to heavily fine-tune their models to achieve the highest possible score, these results indirectly indicate that simpler networks can still prevail, provided that appropriate training practices are applied to the specific problem at hand.

5.2 Challenges in histopathology image analysis

Standard DL architectures require their inputs (e.g., images) in a specific format with certain spatial dimensions. Furthermore, these architectures are generally designed for RGB images, whereas in digital histopathology, working with images in grayscale, HSV or HED (haematoxylin-eosin-DAB) color spaces may be desirable for a specific application. Converting images between color spaces, resizing images to fit into GPU memory, quantizing images from a higher to a lower bit depth, deciding the best resolution for the application at hand, and tiling are some of the choices researchers need to make, each leading to varying degrees of information loss. A reasonable data processing strategy aims to minimize information loss while utilizing architectures to their maximal capacity.
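One such color-space conversion is Ruifrok and Johnston's color deconvolution, which maps an RGB pixel to haematoxylin, eosin and DAB (HED) concentrations using the Beer-Lambert optical-density transform. The sketch below is a minimal per-pixel version under stated assumptions: it uses the standard published stain vectors normalised to unit length, whereas real slides often need stain vectors estimated per scanner or per slide; the helper names are illustrative. In practice a library routine such as scikit-image's `skimage.color.rgb2hed` would normally be used instead; its output scaling differs from this sketch.

```python
import math
from typing import List, Sequence

# Optical-density contributions to the R, G and B channels for haematoxylin,
# eosin and DAB, as published by Ruifrok and Johnston.
_RAW_STAINS = [
    [0.65, 0.70, 0.29],  # haematoxylin
    [0.07, 0.99, 0.11],  # eosin
    [0.27, 0.57, 0.78],  # DAB
]


def _unit(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def _inverse_3x3(m: Sequence[Sequence[float]]) -> List[List[float]]:
    """Invert a 3x3 matrix via its adjugate (transposed cofactor matrix)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("stain matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


STAIN_MATRIX = [_unit(v) for v in _RAW_STAINS]  # rows: H, E, DAB
UNMIX = _inverse_3x3(STAIN_MATRIX)


def rgb_to_hed(rgb: Sequence[float], background: float = 255.0) -> List[float]:
    """Unmix one RGB pixel into haematoxylin, eosin and DAB concentrations."""
    # Beer-Lambert law: optical density = -log10(transmitted / incident light);
    # intensities are floored at 1 to avoid log(0).
    od = [-math.log10(max(ch, 1.0) / background) for ch in rgb]
    # Solve od = conc @ STAIN_MATRIX for conc, i.e. conc = od @ UNMIX.
    return [sum(od[k] * UNMIX[k][j] for k in range(3)) for j in range(3)]


def hed_to_rgb(hed: Sequence[float], background: float = 255.0) -> List[float]:
    """Inverse mapping (stain concentrations back to RGB intensities)."""
    od = [sum(hed[s] * STAIN_MATRIX[s][k] for s in range(3)) for k in range(3)]
    return [background * 10 ** (-d) for d in od]


print(rgb_to_hed([255, 255, 255]))  # white background: no stain in any channel
print(rgb_to_hed(hed_to_rgb([0.8, 0.3, 0.0])))  # round trip recovers the input
```

Note that rounding the float intensities from `hed_to_rgb` to 8-bit integers would already introduce the quantization loss mentioned above.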

In most applications, it is inevitable that input images will need to be tiled or resized. Memory and computational constraints also make it necessary to find a balance between the required context and the magnification and, as CNNs learn more quickly from smaller images, one should not use images larger than the required context. The optimal trade-off between field of view (FOV) and resolution depends on the application: for example, classifying ductal carcinoma in situ (DCIS) requires a large context to capture morphology, whilst for nuclei detection it is common to use the highest available magnification, as the required context is as small as one nucleus. In some cases, both high resolution and a large FOV are required; for example, in cellularity assessment a high magnification is needed to differentiate between malignant and benign nuclei, and a larger FOV is needed to provide context (Akbar et al., 2019). A considerable amount of work has been done to combine low- and high-resolution inputs for better decisions across various forms and problems (Li et al., 2019b; Chang et al., 2017; Wang et al., 2019a; Li et al., 2018). However, it is still unclear whether these methods are more effective in segmentation tasks than selecting a single “best fit” resolution (Seth et al., 2019).
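The FOV/resolution trade-off can be made concrete with a small tiling helper. The sketch below computes overlapping tile positions covering an image and the physical field of view each tile spans. It assumes square tiles and a known microns-per-pixel value for the chosen magnification, which in practice would be read from the slide metadata; roughly 0.5 µm per pixel at 20× and 0.25 µm at 40× are typical but scanner-dependent values. All helper names are hypothetical.

```python
from typing import List, Tuple


def axis_starts(length: int, tile: int, stride: int) -> List[int]:
    """Start offsets along one axis so that tiles of size `tile` cover [0, length)."""
    if tile > length:
        raise ValueError("tile is larger than the image; pad the image first")
    if stride <= 0:
        raise ValueError("stride must be positive")
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] + tile < length:
        starts.append(length - tile)  # extra tile flush with the far border
    return starts


def tile_grid(width: int, height: int, tile: int, stride: int) -> List[Tuple[int, int]]:
    """(x, y) top-left corners of overlapping tiles covering a width x height image."""
    return [(x, y)
            for y in axis_starts(height, tile, stride)
            for x in axis_starts(width, tile, stride)]


def field_of_view_um(tile: int, microns_per_pixel: float) -> float:
    """Physical side length of a square tile in micrometres."""
    return tile * microns_per_pixel


print(tile_grid(1000, 600, tile=512, stride=256))
# [(0, 0), (256, 0), (488, 0), (0, 88), (256, 88), (488, 88)]
print(field_of_view_um(512, 0.5), field_of_view_um(512, 0.25))  # 256.0 128.0
```

Doubling the magnification halves the FOV of a fixed-size tile, so keeping the same context at higher magnification requires four times as many pixels per tile.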

Image pre- and post-processing can be used to boost the task performance of a DL model. In addition to standard preprocessing practices (e.g., resizing and normalizing an input image, noise removal, morphological operations to smooth the segmentation masks), preprocessing can also be used to eliminate the need for computationally costly post-processing steps such as iterative refinement of the boundaries of segmented regions using conditional random fields (CRFs). For instance, for gland instance segmentation in colon histology images, Xu et al. (2016) used an FCN to produce the gland segmentation mask and, in addition, a holistically-nested edge detection (HED) channel to produce an edge map. The edge channel explicitly learns boundary information, which is then used to separate the segmented glands into individual instances. This demonstrates that even though each gland mask already contains the boundary information (as its boundaries are part of the mask), an explicit learning task provided by a cost-free relabeling preprocessing step can improve task performance. Furthermore, relabeling can enable training for different tasks by simply transforming the original data to accommodate the new task.
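A minimal sketch of such a cost-free relabeling step is shown below: it derives a binary edge target from an existing instance-labelled mask by marking foreground pixels that touch a different label. This is a generic illustration with a hypothetical helper name, not the exact procedure of Xu et al. (2016). It assumes 4-connectivity and treats pixels outside the image as unknown rather than as boundary, so that tile borders are not labelled as gland edges.

```python
from typing import List, Sequence


def boundary_map(instances: Sequence[Sequence[int]]) -> List[List[int]]:
    """Derive a binary edge target from an instance-labelled mask.

    `instances[r][c]` is 0 for background and k > 0 for the k-th object.
    A foreground pixel is marked as boundary (1) when any in-image
    4-connected neighbour carries a different label (including background).
    """
    rows, cols = len(instances), len(instances[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            label = instances[r][c]
            if label == 0:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and instances[nr][nc] != label:
                    edges[r][c] = 1
                    break
    return edges


# Two touching glands (labels 1 and 2): the binary foreground mask alone is all
# ones, but the derived edge channel exposes the interface between them.
touching = [[1, 1, 1, 2, 2]] * 3
for row in boundary_map(touching):
    print(row)  # [0, 0, 1, 1, 0] on every row
```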

Finally, post-processing techniques can also be used to iteratively refine the model outputs and marginally improve the task objective. CRF-based methods are commonly employed to refine boundaries in segmentation tasks (e.g., nuclei) for better delineation of structures (Qu et al., 2019b). Post-processing can also be used for bootstrapping, where the trained model is used to select hard examples, on which it underperforms, from unseen data. The model is then retrained with a subset of the original training data together with the hard examples obtained from this step. This form of post-processing is especially useful for selecting a small subset of the majority class to counter class imbalance, or for balancing foreground and background samples (i.e., hard negative mining), and is applicable to many tasks including multiple instance learning and segmentation (Li et al., 2019c; Kwok, 2018).
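The hard negative mining step can be sketched as follows. The sketch assumes binary labels and a model that outputs a positive-class probability per sample; it keeps all positives and only the negatives the model is most confidently wrong about, so that the next training round is balanced. The function names are hypothetical, and in a real pipeline the selected samples would be merged with a subset of the original training data before retraining.

```python
from typing import List, Sequence


def hard_negatives(scores: Sequence[float], labels: Sequence[int], k: int) -> List[int]:
    """Indices of the k negative samples with the highest positive-class score.

    `scores` are predicted probabilities of the positive class and `labels`
    are ground-truth labels (1 = positive, 0 = negative).
    """
    negatives = [i for i, y in enumerate(labels) if y == 0]
    negatives.sort(key=lambda i: scores[i], reverse=True)
    return negatives[:k]


def balanced_indices(scores: Sequence[float], labels: Sequence[int]) -> List[int]:
    """All positives plus an equal number of the hardest negatives."""
    positives = [i for i, y in enumerate(labels) if y == 1]
    return positives + hard_negatives(scores, labels, k=len(positives))


# Hypothetical model scores on six patches (two tumour, four background).
scores = [0.9, 0.2, 0.8, 0.1, 0.6, 0.95]
labels = [1, 0, 0, 0, 0, 1]
print(balanced_indices(scores, labels))  # [0, 5, 2, 4]
```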

5.3 Quality of training and validation data

The success of DL depends on the availability of high-quality training sets to achieve the desired predictive performance (Madabhushi and Lee, 2016; Bera et al., 2019; Niazi et al., 2019).

It is evident from this survey that the vast majority of methods are based on fully supervised learning. Obtaining a well-curated dataset is, however, often expensive and requires significant manual expertise to produce clean and accurate annotations. There will always be variability between pathologists, so ideally inter-observer agreement should be quantified (Bulten et al., 2019b; Akbar et al., 2019; Seth et al., 2019) and, if possible, a consensus between pathologists reached (Veta et al., 2015). Some attempts have been made to generate additional annotated data using alternative techniques such as data augmentation (Tellez et al., 2019a), image synthesis (Hou et al., 2019a) and crowdsourcing (Albarqouni et al., 2016), but it is not yet clear whether they are appropriate for digital pathology. In some cases, it is possible to acquire additional information to provide definitive ground truth labels; for example, cytokeratin-stained slides were used to resolve diagnostic uncertainty in the CAMELYON16 challenge (Ehteshami Bejnordi et al., 2017). It is important for researchers to understand how labels are generated and to have some measure of label accuracy.

One way to increase model robustness and improve generalization is to include diversity in the training data, such as images from multiple scan centres (Campanella et al., 2019) and images containing heterogeneous tissue types (Hosseini et al., 2019) with variations in staining protocols (Bulten et al., 2019a). For instance, Campanella et al. (2019) trained their DL model on an extensive training set containing more than 15,000 patients with various cancer types, obtained from 45 countries. The authors achieved an AUC greater than 0.98 for three histology tasks, which demonstrates the importance of a large, diverse dataset for model performance. With an increase in the number of well-curated open-source datasets hosted by The Cancer Genome Atlas (TCGA), The Cancer Imaging Archive (TCIA) and various biomedical grand challenges (see Table 7), it is increasingly possible to test methods on standard benchmark datasets. There is, however, a need for more clinically relevant datasets that capture the complexity of real clinical tasks. The expansion of the breast cancer metastases dataset CAMELYON16 into CAMELYON17 illustrates how much larger datasets are needed to assess an algorithm in a more meaningful clinical context (Litjens et al., 2018). In CAMELYON16, 399 WSIs from 2 centres were made available, but slides containing only isolated tumour cells were excluded and only slide-level labels were provided; in CAMELYON17, an additional 1000 WSIs from 500 patients and five centres were added, and the total dataset grew to 2.95 terabytes. Even this large dataset does not capture the scale of the clinical task, in which patients may have multiple WSIs from many more lymph nodes (the CAMELYON set excludes patients with dissected nodes), and it also excludes patients who have undergone neoadjuvant therapy, which is known to adversely affect classification accuracy (Campanella et al., 2019).

As the number of clinical centres adopting a fully digital workflow increases, it is likely that all digital pathology models will be expected to be trained and tested on large, clinically relevant datasets. Making such large datasets of WSIs and associated clinical data publicly available poses significant challenges. One way of addressing this may be to move away from the current approach of moving data to the model and, instead, to create mechanisms for researchers (and companies) to move the training and testing of models to the data. A recent example of this was the DREAM mammography challenge (DREAM, 2016), where only a small subset of data was released to allow developers to test their software, and developers then had to submit Docker containers to a central server to access the primary dataset for training and testing.

5.4 Model interpretability

In recent years, DL-based methods have achieved near human-level performance in many different histology applications (Campanella et al., 2019; Noorbakhsh et al., 2019; Coudray et al., 2018). However, the main issue with DL models is that they are generally regarded as a “black box” and lack the interpretability needed to explain their reasoning in a human-like way when making predictions (Holzinger et al., 2017). Consequently, several explainable AI systems (Samek et al., 2017; Chen et al., 2019) have been developed in recent years, which attempt to gain deeper insight into the workings of DL models in order to understand why a particular decision has been made. In histology, the interpretability of DL models has been addressed using visual attention maps (Huang and Chung, 2019; BenTaieb and Hamarneh, 2018), saliency maps (Tellez et al., 2019b), heatmaps (Paschali et al., 2019) and image captioning (Zhang et al., 2019; Weng et al., 2019) techniques. These methods aim to provide pathologists with more clinically interpretable results by highlighting the locations of discriminative evidence in WSIs. For instance, Zhang et al. (2019) presented a biologically inspired multimodal DL model that visualizes its learned representations through network visual attention maps to produce rich, interpretable predictions. Furthermore, their model learns to generate diagnostic reports, based on natural language descriptions of histologic findings, in a way that is understandable to a pathologist. Such multimodal models trained on heterogeneous data (such as pathology images, clinical reports and genomic sequences) have great potential to offer reliable diagnoses, strong generalizability and objective second opinions, while simultaneously encouraging consensus in routine clinical histopathology practice.
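Attention and saliency maps are model-specific, but the underlying idea of localizing discriminative evidence can be illustrated with occlusion sensitivity, a simple perturbation-based technique swapped in here for illustration (it is not the method of any work cited above): each image region is masked in turn, and the drop in the model's output is recorded as a heatmap. The sketch assumes `score` is any callable returning the model's class probability for an image; the toy scorer in the usage example is hypothetical.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]


def occlusion_map(image: Sequence[Sequence[float]], score: Callable[[Image], float],
                  patch: int, baseline: float = 0.0) -> List[List[float]]:
    """Drop in `score` when each non-overlapping patch x patch block is occluded.

    Larger values mark regions the model relies on more for its prediction.
    """
    rows, cols = len(image), len(image[0])
    reference = score([list(row) for row in image])
    heat = []
    for r0 in range(0, rows, patch):
        heat_row = []
        for c0 in range(0, cols, patch):
            occluded = [list(row) for row in image]  # fresh copy per block
            for r in range(r0, min(r0 + patch, rows)):
                for c in range(c0, min(c0 + patch, cols)):
                    occluded[r][c] = baseline
            heat_row.append(reference - score(occluded))
        heat.append(heat_row)
    return heat


def toy_score(img: Image) -> float:
    """Hypothetical model that only looks at the top-right 2 x 2 region."""
    return sum(img[r][c] for r in range(0, 2) for c in range(2, 4))


print(occlusion_map([[1.0] * 4 for _ in range(4)], toy_score, patch=2))
# [[0.0, 4.0], [0.0, 0.0]]: only occluding the top-right block changes the output
```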

One of the most overlooked issues with current DL models is their vulnerability to adversarial attacks (Papernot et al., 2017). Several recent studies (Finlayson et al., 2019; Jungo and Reyes, 2019; Ma et al., 2019) have demonstrated that DL systems can be compromised by carefully designed adversarial examples, i.e., even small, imperceptible perturbations can deceive neural networks into predicting wrong outputs with high confidence. This behaviour has raised concerns about the real-time integration of DL systems into critical applications such as face recognition (Sharif et al., 2016), autonomous driving (Eykholt et al., 2017) and medical diagnosis (Ma et al., 2019). In the context of medical diagnosis, one promising solution is to estimate uncertainty maps for medical image segmentation (DeVries and Taylor, 2018; Jungo and Reyes, 2019), from which clinicians can reason with confidence about where and why the system is failing. Further, such uncertainty information can be used to alert a human expert to manually correct predicted results and avoid undesirable clinical outcomes. Naturally, these challenges present an opportunity for researchers to devise novel and robust DL models for histopathology images.
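To show how a small, targeted perturbation can flip a decision, the sketch below applies the fast gradient sign method (FGSM), a standard attack chosen here for illustration and not necessarily the one used in the cited studies, to a toy logistic-regression classifier. It also computes the predictive entropy used for one common kind of per-pixel uncertainty map. The weights and input are hypothetical, and `predictive_entropy` assumes its probabilities come from several stochastic forward passes (e.g., with dropout kept active at test time), which is an assumption rather than the procedure of any cited work.

```python
import math
from typing import List, Sequence


def logistic_prob(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Positive-class probability of a logistic-regression classifier."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def _sign(g: float) -> int:
    return (g > 0) - (g < 0)


def fgsm(w: Sequence[float], b: float, x: Sequence[float], y: int,
         eps: float) -> List[float]:
    """Fast gradient sign method against a logistic-regression classifier."""
    p = logistic_prob(w, b, x)
    # For binary cross-entropy, d(loss)/dx = (p - y) * w for a logistic model.
    grad = [(p - y) * wi for wi in w]
    # Step each input by eps in the loss-increasing direction, clip to [0, 1].
    return [min(1.0, max(0.0, xi + eps * _sign(gi))) for xi, gi in zip(x, grad)]


def predictive_entropy(samples: Sequence[float]) -> float:
    """Entropy (in nats) of the mean positive-class probability over passes."""
    p = sum(samples) / len(samples)
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


# Hypothetical weights and a hypothetical 3-feature input scaled to [0, 1].
w, b = [2.0, -1.0, 0.5], -0.2
x = [0.6, 0.3, 0.4]
x_adv = fgsm(w, b, x, y=1, eps=0.3)
print(round(logistic_prob(w, b, x), 3), round(logistic_prob(w, b, x_adv), 3))
# roughly 0.711 then 0.463: the predicted class flips from 1 to 0
print(predictive_entropy([0.9, 0.1]))  # maximal uncertainty: log(2)
```

With only three features the perturbation has to be large; on real images the same signed step is spread over millions of pixels, so each individual pixel can change far less.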

5.5 Clinical translation

There has been rapid growth in artificial intelligence (AI) research applied to medical imaging, and its potential impact has been demonstrated by applications that include detecting breast cancer metastasis in lymph nodes (Steiner et al., 2018), interpreting chest X-rays (Nam et al., 2018), detecting brain tumours in MRI (Kamnitsas et al., 2016), detecting skin cancers (Esteva et al., 2017) and diagnosing diseases in retinal images (Gulshan et al., 2016). Despite this impressive array of applications, the real and impactful deployment of AI in clinical practice still has a long way to go.

The main challenges and potential implications in translating AI technologies from research to clinical use are as follows. First, the major bottleneck is the regulatory and privacy concerns around obtaining ownership of patient data such as images and personal health records (Bera et al., 2019; Kelly et al., 2019). This makes it challenging to train, develop and test safe AI solutions for clinical use. Furthermore, comparing DL algorithms objectively is difficult because of variability in design methodologies, which are often targeted at small, specific populations. To make fair comparisons, AI models need to be tested on the same independent test set, representing the same target population, using the same performance metrics. Second, most AI algorithms generalize poorly outside their training domain, suffer from algorithmic bias, and can be easily fooled by adversarial attacks (Kelly et al., 2019) or by the inclusion of disease subtypes not considered during training. These issues can be partly addressed by developing “interpretable” AI systems (Liu et al., 2018; Rudin, 2019) that provide a reliable measure of model confidence and generalize to different multi-cohort datasets. Finally, developing human-centred AI models that can meaningfully represent clinical knowledge and provide a clear explanation for each prediction, facilitating better interaction between clinicians and machines, is of paramount importance. If these challenges are taken into consideration when designing AI solutions, such solutions are much more likely to transform routine patient care.

6 Conclusions

In this survey, we have presented a comprehensive overview of deep neural network models developed in the context of computational histopathology image analysis. The availability of large-scale whole-slide histology image databases and recent advances in technology have triggered the development of complex deep learning models in computational pathology. From a survey of over 130 papers, we have identified that the automatic analysis of histopathology images has been tackled from different deep learning perspectives (e.g., supervised, weakly supervised, unsupervised and transfer learning) for a wide variety of histology tasks (e.g., cell or nuclei segmentation, tissue classification, tumour detection, disease prediction and prognosis), and has been applied to multiple cancer types (e.g., breast, kidney, colon, lung). The categorization of methodological approaches presented in this survey serves as a reference guide to the techniques currently available in the literature for computational histopathology. We have also critically analysed the effect of deep learning architectures on task performance, and discussed the importance of training data and model interpretability for successful clinical translation. Finally, we have outlined some open issues and future trends for the progress of this field.

Conflict of interest

We declare no conflict of interest.


This research is funded by: Canadian Cancer Society (grant number 705772), National Cancer Institute of the National Institutes of Health (grant number U24CA199374-01), Canadian Institutes of Health Research.


  • A. Agarwalla, M. Shaban, and N. M. Rajpoot (2017) Representation-aggregation networks for segmentation of multi-gigapixel histology images. arXiv preprint arXiv:1707.08814. Cited by: §3.1.1, Table 1.
  • S. Akbar and A. L. Martel (2018) Cluster-based learning from weakly labeled bags in digital pathology. arXiv preprint arXiv:1812.00884. Cited by: §3.2, Table 2.
  • S. Akbar, M. Peikari, S. Salama, A. Y. Panah, S. Nofech-Momes, and A. L. Martel (2019) Automated and manual quantification of tumour cellularity in digital slides for tumour burden assessment. Scientific Reports 9 (1), pp. 14099. Cited by: Table 4, §5.2, §5.3, Table 7.
  • S. Albarqouni, C. Baur, F. Achilles, V. Belagiannis, S. Demirci, and N. Navab (2016) AggNet: deep learning from crowds for mitosis detection in breast cancer histology images. IEEE Transactions on Medical Imaging 35 (5), pp. 1313–1321. Cited by: §3.1.1, Table 1, §5.3.
  • G. Aresta, T. Araújo, S. Kwok, S. S. Chennamsetty, M. Safwan, V. Alex, B. Marami, M. Prastawa, M. Chan, M. Donovan, G. Fernandez, J. Zeineh, M. Kohl, C. Walz, F. Ludwig, S. Braunewell, M. Baust, Q. D. Vu, M. N. N. To, E. Kim, J. T. Kwak, S. Galal, V. Sanchez-Freire, N. Brancati, M. Frucci, D. Riccio, Y. Wang, L. Sun, K. Ma, J. Fang, I. Kone, L. Boulmane, A. Campilho, C. Eloy, A. Polónia, and P. Aguiar (2019) BACH: grand challenge on breast cancer histology images. Medical Image Analysis 56, pp. 122–139. Cited by: §5.1, Table 7.
  • T. Artieres et al. (2010) Neural conditional random fields. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 177–184. Cited by: §3.1.1.
  • E. Arvaniti, K. S. Fricker, M. Moret, N. Rupp, T. Hermanns, C. Fankhauser, N. Wey, P. J. Wild, J. H. Rueschoff, and M. Claassen (2018) Automated Gleason grading of prostate cancer tissue microarrays via deep learning. Scientific Reports 8. Cited by: Table 4.
  • R. Awan, N. A. Koohbanani, M. Shaban, A. Lisowska, and N. Rajpoot (2018) Context-aware learning using transferable features for classification of breast cancer histology images. In International Conference on Image Analysis and Recognition, pp. 788–795. Cited by: §3.1.1, Table 1.
  • P. Bandi, O. Geessink, Q. Manson, M. Van Dijk, M. Balkenhol, M. Hermsen, B. E. Bejnordi, B. Lee, K. Paeng, A. Zhong, et al. (2018) From detection of individual metastases to classification of lymph node status at the patient level: the CAMELYON17 challenge. IEEE Transactions on Medical Imaging 38 (2), pp. 550–560. Cited by: §5.1, Table 7.
  • B. E. Bejnordi, G. Litjens, N. Timofeeva, I. Otte-Höller, A. Homeyer, N. Karssemeijer, and J. A. van der Laak (2015) Stain specific standardization of whole-slide histopathological images. IEEE Transactions on Medical Imaging 35 (2), pp. 404–415. Cited by: §3.4.2, §3.4.2.
  • B. E. Bejnordi, M. Mullooly, R. M. Pfeiffer, S. Fan, P. M. Vacek, D. L. Weaver, S. Herschorn, L. A. Brinton, B. van Ginneken, N. Karssemeijer, et al. (2018) Using deep convolutional neural networks to identify and classify tumor-associated stroma in diagnostic breast biopsies. Modern Pathology 31 (10), pp. 1502. Cited by: §3.1.1, Table 1.
  • B. E. Bejnordi, G. Zuidhof, M. Balkenhol, M. Hermsen, P. Bult, B. van Ginneken, N. Karssemeijer, G. Litjens, and J. van der Laak (2017) Context-aware stacked convolutional neural networks for classification of breast carcinomas in whole-slide histopathology images. Journal of Medical Imaging 4 (4), pp. 044504. Cited by: §3.1.1, Table 1.
  • A. BenTaieb and G. Hamarneh (2016) Topology aware fully convolutional networks for histology gland segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 460–468. Cited by: §3.1.3, Table 1.
  • A. BenTaieb and G. Hamarneh (2017) Adversarial stain transfer for histopathology image analysis. IEEE Transactions on Medical Imaging 37 (3), pp. 792–802. Cited by: §3.4.2, Table 5.
  • A. BenTaieb and G. Hamarneh (2018) Predicting cancer with a recurrent visual attention model for histopathology images. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 129–137. Cited by: §3.1.1, §3.1.1, Table 1, §5.4.
  • K. Bera, K. A. Schalper, D. L. Rimm, V. Velcheti, and A. Madabhushi (2019) Artificial intelligence in digital pathology—new tools for diagnosis and precision oncology. Nature Reviews Clinical Oncology 16 (11), pp. 703–715. Cited by: §5.3, §5.5.
  • J. Bokhorst, H. Pinckaers, P. van Zwam, I. Nagtegaal, J. van der Laak, and F. Ciompi (2019) Learning from sparsely annotated data for semantic segmentation in histopathology images. In Proceedings of the 2nd International Conference on Medical Imaging with Deep Learning, Vol. 102, pp. 84–91. Cited by: §3.2, Table 2.
  • N. Brieu, A. Meier, A. Kapil, R. Schoenmeyer, C. G. Gavriel, P. D. Caie, and G. Schmidt (2019) Domain adaptation-based augmentation for weakly supervised nuclei detection. arXiv preprint arXiv:1907.04681. Cited by: §3.4.1, Table 5.
  • W. Bulten, P. Bándi, J. Hoven, R. van de Loo, J. Lotz, N. Weiss, J. van der Laak, B. van Ginneken, C. Hulsbergen-van de Kaa, and G. Litjens (2019a) Epithelium segmentation using deep learning in H&E-stained prostate specimens with immunohistochemistry as reference standard. Scientific Reports 9 (1), pp. 864. Cited by: §3.1.3, Table 1, §5.3.
  • W. Bulten and G. Litjens (2018) Unsupervised prostate cancer detection on H&E using convolutional adversarial autoencoders. arXiv preprint arXiv:1804.07098. Cited by: Table 3.
  • W. Bulten, H. Pinckaers, H. van Boven, R. Vink, T. de Bel, B. van Ginneken, J. van der Laak, C. H. de Kaa, and G. Litjens (2019b) Automated Gleason grading of prostate biopsies using deep learning. arXiv preprint arXiv:1907.07980. Cited by: §3.1.3, Table 1, §5.3.
  • D. Bychkov, N. Linder, R. Turkki, S. Nordling, P. E. Kovanen, C. Verrill, M. Walliander, M. Lundin, C. Haglund, and J. Lundin (2018) Deep learning based tissue analysis predicts outcome in colorectal cancer. Scientific Reports 8 (1), pp. 3395. Cited by: Table 4, Table 6, §4, §4.
  • G. Campanella, M. G. Hanna, L. Geneslaw, A. Miraflor, V. W. K. Silva, K. J. Busam, E. Brogi, V. E. Reuter, D. S. Klimstra, and T. J. Fuchs (2019) Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature Medicine 25 (8), pp. 1301–1309. Cited by: §3.2, §3.2, Table 2, §4, §5.3, §5.4.
  • G. Campanella, V. W. K. Silva, and T. J. Fuchs (2018) Terabyte-scale deep multiple instance learning for classification and localization in pathology. arXiv preprint arXiv:1805.06983. Cited by: Table 2.
  • H. Chang, J. Han, C. Zhong, A. M. Snijders, and J. Mao (2017) Unsupervised transfer learning via multi-scale convolutional sparse coding for biomedical applications. IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (5), pp. 1182–1194. Cited by: §3.3, §5.2.
  • C. Chen, O. Li, D. Tao, A. Barnett, C. Rudin, and J. K. Su (2019) This looks like that: deep learning for interpretable image recognition. In Advances in Neural Information Processing Systems, pp. 8928–8939. Cited by: §5.4.
  • H. Chen, X. Qi, L. Yu, Q. Dou, J. Qin, and P. Heng (2017a) DCAN: deep contour-aware networks for object instance segmentation from histology images. Medical Image Analysis 36, pp. 135–146. Cited by: §3.1.3, §3.1.3, Table 1.
  • H. Chen, X. Wang, and P. A. Heng (2016a) Automated mitosis detection with deep regression networks. In 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI), pp. 1204–1207. Cited by: §3.1.2, Table 1.
  • L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille (2017b) DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4), pp. 834–848. Cited by: §3.1.1, §3.1.3.
  • T. Q. Chen, X. Li, R. B. Grosse, and D. K. Duvenaud (2018) Isolating sources of disentanglement in variational autoencoders. In Advances in Neural Information Processing Systems, pp. 2610–2620. Cited by: §3.3.
  • X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel (2016b) Infogan: interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2172–2180. Cited by: §3.3.
  • S. S. Chennamsetty, M. Safwan, and V. Alex (2018) Classification of breast cancer histology image using ensemble of pre-trained neural networks. In International Conference on Image Analysis and Recognition, pp. 804–811. Cited by: Table 4.
  • V. Cheplygina, M. de Bruijne, and J. P.W. Pluim (2019) Not-so-supervised: a survey of semi-supervised, multi-instance, and transfer learning in medical image analysis. Medical Image Analysis 54, pp. 280–296. Cited by: §3.2, §3.2, §3.2, §3.4.
  • H. Cho, S. Lim, G. Choi, and H. Min (2017) Neural stain-style transfer learning using GAN for histopathological images. arXiv preprint arXiv:1710.08543. Cited by: §3.4.2, Table 5.
  • F. Ciompi, O. Geessink, B. E. Bejnordi, G. S. De Souza, A. Baidoshvili, G. Litjens, B. Van Ginneken, I. Nagtegaal, and J. Van Der Laak (2017) The importance of stain normalization in colorectal tissue classification with convolutional networks. In IEEE 14th International Symposium on Biomedical Imaging, pp. 160–163. Cited by: §3.4.2.
  • D. C. Cireşan, A. Giusti, L. M. Gambardella, and J. Schmidhuber (2013) Mitosis detection in breast cancer histology images with deep neural networks. In International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 411–418. Cited by: §1, §3.1.1, §3.1.1, Table 1, Table 7.
  • D. Cireşan, A. Giusti, L. M. Gambardella, and J. Schmidhuber (2012) Deep neural networks segment neuronal membranes in electron microscopy images. In Advances in Neural Information Processing Systems, pp. 2843–2851. Cited by: §1.
  • J. Cohen (1960) A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20 (1), pp. 37–46. Cited by: §5.1.
  • N. Coudray, P. S. Ocampo, T. Sakellaropoulos, N. Narula, M. Snuderl, D. Fenyö, A. L. Moreira, N. Razavian, and A. Tsirigos (2018) Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning. Nature Medicine 24 (10), pp. 1559. Cited by: Table 4, §5.4.
  • P. Courtiol, C. Maussion, M. Moarii, E. Pronier, S. Pilcer, M. Sefta, P. Manceron, S. Toldo, M. Zaslavskiy, N. Le Stang, et al. (2019) Deep learning-based classification of mesothelioma improves prediction of patient outcome. Nature Medicine 25 (10), pp. 1519–1525. Cited by: Table 6, §4, §4.
  • H. D. Couture, L. A. Williams, J. Geradts, S. J. Nyante, E. N. Butler, J. Marron, C. M. Perou, M. A. Troester, and M. Niethammer (2018) Image analysis with deep learning to predict breast cancer grade, ER status, histologic subtype, and intrinsic subtype. NPJ Breast Cancer 4 (1), pp. 30. Cited by: Table 6, §4, §4, §4.
  • A. Cruz-Roa, A. Basavanhally, F. González, H. Gilmore, M. Feldman, S. Ganesan, N. Shih, J. Tomaszewski, and A. Madabhushi (2014) Automatic detection of invasive ductal carcinoma in whole slide images with convolutional neural networks. In Medical Imaging 2014: Digital Pathology, Vol. 9041, pp. 904103. Cited by: §3.1.1, Table 1.
  • A. Cruz-Roa, H. Gilmore, A. Basavanhally, M. Feldman, S. Ganesan, N. N. Shih, J. Tomaszewski, F. A. González, and A. Madabhushi (2017) Accurate and reproducible invasive breast cancer detection in whole-slide images: a deep learning approach for quantifying tumor extent. Scientific Reports 7, pp. 46450. Cited by: §3.1.1, Table 1.
  • A. Cruz-Roa, H. Gilmore, A. Basavanhally, M. Feldman, S. Ganesan, N. Shih, J. Tomaszewski, A. Madabhushi, and F. González (2018) High-throughput adaptive sampling for whole-slide histopathology image analysis (HASHI) via convolutional neural networks: application to invasive breast cancer detection. PLoS ONE 13 (5), pp. e0196828. Cited by: §3.1.1, Table 1.
  • J. Dai, K. He, and J. Sun (2016) Instance-aware semantic segmentation via multi-task network cascades. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3150–3158. Cited by: §3.1.3.
  • T. de Bel, M. Hermsen, J. Kers, J. van der Laak, and G. Litjens (2019) Stain-transforming cycle-consistent generative adversarial networks for improved segmentation of renal histopathology. In International Conference on Medical Imaging with Deep Learning, pp. 151–163. Cited by: §3.3, §3.4.2, Table 3, Table 5.
  • T. de Bel, M. Hermsen, B. Smeets, L. Hilbrands, J. van der Laak, and G. Litjens (2018) Automatic segmentation of histopathological slides of renal tissue using deep learning. In Medical Imaging 2018: Digital Pathology, Vol. 10581, pp. 1058112. Cited by: §3.1.3, Table 1.
  • T. DeVries and G. W. Taylor (2018) Leveraging uncertainty estimates for predicting segmentation quality. arXiv preprint arXiv:1807.00502. Cited by: §5.4.
  • T. G. Dietterich, R. H. Lathrop, and T. Lozano-Pérez (1997) Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence 89 (1-2), pp. 31–71. Cited by: §3.2.
  • H. Ding, Z. Pan, Q. Cen, Y. Li, and S. Chen (2019) Multi-scale fully convolutional network for gland segmentation using three-class classification. Neurocomputing. Cited by: §3.1.3, Table 1.
  • D. Dov, S. Z. Kovalsky, J. Cohen, D. E. Range, R. Henao, and L. Carin (2019) A deep-learning algorithm for thyroid malignancy prediction from whole slide cytopathology images. arXiv preprint arXiv:1904.12739. Cited by: §3.2, Table 2.
  • DREAM (2016) The Digital Mammography DREAM Challenge. Cited by: §5.3.
  • B. Ehteshami Bejnordi, M. Veta, P. Johannes van Diest, B. van Ginneken, N. Karssemeijer, G. Litjens, J. A. W. M. van der Laak, and the CAMELYON16 Consortium (2017) Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer. JAMA 318 (22), pp. 2199–2210. Cited by: §5.1, §5.3, Table 7.
  • J. Epstein, W. J. Allsbrook, M. Amin, and L. Egevad (2005) The 2005 International Society of Urological Pathology (ISUP) consensus conference on Gleason grading of prostatic carcinoma. Am J Surg Pathol 29 (9), pp. 1228–1242. Cited by: §1.
  • M. G. Ertosun and D. L. Rubin (2015) Automated grading of gliomas using deep learning in digital pathology images: a modular approach with ensemble of convolutional neural networks. In AMIA Annual Symposium Proceedings, Vol. 2015, pp. 1899. Cited by: §3.1.1, Table 1.
  • A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, and S. Thrun (2017) Dermatologist-level classification of skin cancer with deep neural networks. Nature 542 (7639), pp. 115. Cited by: §5.5.
  • K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, C. Xiao, A. Prakash, T. Kohno, and D. Song (2017) Robust physical-world attacks on deep learning models. arXiv preprint arXiv:1707.08945. Cited by: §5.4.
  • S. G. Finlayson, J. D. Bowers, J. Ito, J. L. Zittrain, A. L. Beam, and I. S. Kohane (2019) Adversarial attacks on medical machine learning. Science 363 (6433), pp. 1287–1289. Cited by: §5.4.
  • M. Gadermayr, L. Gupta, V. Appel, P. Boor, B. M. Klinkhammer, and D. Merhof (2019a) Generative adversarial networks for facilitating stain-independent supervised & unsupervised segmentation: a study on kidney histology. IEEE Transactions on Medical Imaging. Cited by: §3.3, §3.4.1, Table 3, Table 5.
  • M. Gadermayr, L. Gupta, B. M. Klinkhammer, P. Boor, and D. Merhof (2019b) Unsupervisedly training GANs for segmenting digital pathology with automatically generated annotations. In Proceedings of The 2nd International Conference on Medical Imaging with Deep Learning, Vol. 102, pp. 175–184. Cited by: Table 3.
  • Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky (2016) Domain-adversarial training of neural networks. The Journal of Machine Learning Research 17 (1), pp. 2096–2030. Cited by: §3.3, §3.4.1, §3.4.1.
  • Z. Gao, L. Wang, L. Zhou, and J. Zhang (2017) HEp-2 cell image classification with deep convolutional neural networks. IEEE Journal of Biomedical and Health Informatics 21 (2), pp. 416–428. Cited by: §3.1.1, Table 1.
  • B. Gecer, S. Aksoy, E. Mercan, L. G. Shapiro, D. L. Weaver, and J. G. Elmore (2018) Detection and classification of cancer in whole slide breast histopathology images using deep convolutional networks. Pattern Recognition 84, pp. 345–356. Cited by: §3.1.3, Table 1.
  • O. G. Geessink, A. Baidoshvili, J. M. Klaase, B. E. Bejnordi, G. J. Litjens, G. W. van Pelt, W. E. Mesker, I. D. Nagtegaal, F. Ciompi, and J. A. van der Laak (2019) Computer aided quantification of intratumoral stroma yields an independent prognosticator in rectal cancer. Cellular Oncology 42 (3), pp. 331–341. Cited by: Table 6, §4.
  • S. Gidaris, P. Singh, and N. Komodakis (2018) Unsupervised representation learning by predicting image rotations. arXiv preprint arXiv:1803.07728. Cited by: §3.3.
  • R. Girshick (2015) Fast R-CNN. In The IEEE International Conference on Computer Vision (ICCV), Cited by: §3.1.3, §3.1.
  • I. Goodfellow, Y. Bengio, and A. Courville (2016) Deep learning. MIT Press. Cited by: §2.
  • S. Graham, H. Chen, J. Gamper, Q. Dou, P. Heng, D. Snead, Y. W. Tsang, and N. Rajpoot (2019a) MILD-Net: minimal information loss dilated network for gland instance segmentation in colon histology images. Medical Image Analysis 52, pp. 199–211. Cited by: §3.1.3, Table 1, Table 7.
  • S. Graham, Q. D. Vu, S. E. A. Raza, A. Azam, Y. W. Tsang, J. T. Kwak, and N. Rajpoot (2019b) HoVer-Net: simultaneous segmentation and classification of nuclei in multi-tissue histology images. Medical Image Analysis 58, pp. 101563. Cited by: §3.1.2, §3.1.2, Table 1, Table 7.
  • F. Gu, N. Burlutskiy, M. Andersson, and L. K. Wilén (2018) Multi-resolution networks for semantic segmentation in whole slide images. In Computational Pathology and Ophthalmic Medical Image Analysis, pp. 11–18. Cited by: §3.1.3, Table 1.
  • V. Gulshan, L. Peng, M. Coram, M. C. Stumpe, D. Wu, A. Narayanaswamy, S. Venugopalan, K. Widner, T. Madams, J. Cuadros, et al. (2016) Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316 (22), pp. 2402–2410. Cited by: §5.5.
  • Z. Guo, H. Liu, H. Ni, X. Wang, M. Su, W. Guo, K. Wang, T. Jiang, and Y. Qian (2019) A fast and refined cancer regions segmentation framework in whole-slide breast pathological images. Scientific Reports 9 (1), pp. 882. Cited by: §3.1.3, Table 1.
  • M. Halicek, M. Shahedi, J. V. Little, A. Y. Chen, L. L. Myers, B. D. Sumer, and B. Fei (2019) Head and neck cancer detection in digitized whole-slide histology using convolutional neural networks. Scientific Reports 9 (1), pp. 1–11. Cited by: §3.1.1, Table 1.
  • Z. Han, B. Wei, Y. Zheng, Y. Yin, K. Li, and S. Li (2017) Breast cancer multi-classification from histopathological images with structured deep learning model. Scientific Reports 7 (1), pp. 4172. Cited by: Table 4.
  • B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik (2014) Simultaneous detection and segmentation. In European Conference on Computer Vision, pp. 297–312. Cited by: §3.1.3.
  • K. He, R. Girshick, and P. Dollár (2019) Rethinking ImageNet pre-training. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 4918–4927. Cited by: §5.1.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §3.1.1, §3.1, §3.4, §5.1.
  • I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. M. Botvinick, S. Mohamed, and A. Lerchner (2017) Beta-VAE: learning basic visual concepts with a constrained variational framework. In ICLR, Cited by: §3.3.
  • D. J. Ho, D. V. Yarlagadda, T. M. D’Alfonso, M. G. Hanna, A. Grabenstetter, P. Ntiamoah, E. Brogi, L. K. Tan, and T. J. Fuchs (2019) Deep multi-magnification networks for multi-class breast cancer image segmentation. arXiv preprint arXiv:1910.13042. Cited by: §3.1.3, Table 1.
  • A. Holzinger, C. Biemann, C. S. Pattichis, and D. B. Kell (2017) What do we need to build explainable AI systems for the medical domain? arXiv preprint arXiv:1712.09923. Cited by: §5.4.
  • M. S. Hosseini, L. Chan, G. Tse, M. Tang, J. Deng, S. Norouzi, C. Rowsell, K. N. Plataniotis, and S. Damaskinos (2019) Atlas of digital pathology: a generalized hierarchical histological tissue type-annotated database for deep learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 11747–11756. Cited by: §5.3.
  • L. Hou, A. Agarwal, D. Samaras, T. M. Kurc, R. R. Gupta, and J. H. Saltz (2019a) Robust histopathology image analysis: to label or to synthesize? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8533–8542. Cited by: §5.3.
  • L. Hou, V. Nguyen, A. B. Kanevsky, D. Samaras, T. M. Kurc, T. Zhao, R. R. Gupta, Y. Gao, W. Chen, D. Foran, et al. (2019b) Sparse autoencoder for unsupervised nucleus detection and representation in histopathology images. Pattern Recognition 86, pp. 188–200. Cited by: §3.3, Table 3.
  • L. Hou, D. Samaras, T. M. Kurc, Y. Gao, J. E. Davis, and J. H. Saltz (2015) Efficient multiple instance convolutional neural networks for gigapixel resolution image classification. arXiv preprint arXiv:1504.07947, pp. 7. Cited by: §3.2, Table 2.
  • A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861. Cited by: §3.1.1, §3.4, §5.1.
  • B. Hu, Y. Tang, E. I. Chang, Y. Fan, M. Lai, and Y. Xu (2018a) Unsupervised learning for cell-level visual representation in histopathology images with generative adversarial networks. IEEE Journal of Biomedical and Health Informatics 23 (3), pp. 1316–1328. Cited by: §3.3, Table 3.
  • J. Hu, L. Shen, and G. Sun (2018b) Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141. Cited by: §5.1.
  • G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger (2017) Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708. Cited by: §3.4, §5.1.
  • Y. Huang and A. Chung (2019) CELNet: evidence localization for pathology images using weakly supervised learning. arXiv preprint arXiv:1909.07097. Cited by: §3.2, §3.2, Table 2, §5.4.
  • M. Ilse, J. Tomczak, and M. Welling (2018) Attention-based deep multiple instance learning. In Proceedings of the 35th International Conference on Machine Learning, Vol. 80, pp. 2127–2136. Cited by: §3.2, Table 2.
  • F. Isensee, P. Kickingereder, W. Wick, M. Bendszus, and K. H. Maier-Hein (2019) No new-net. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, pp. 234–244. Cited by: §5.1.
  • A. Janowczyk, A. Basavanhally, and A. Madabhushi (2017) Stain normalization using sparse autoencoders (StaNoSA): application to digital pathology. Computerized Medical Imaging and Graphics 57, pp. 50–61. Cited by: §3.4.2, Table 5.
  • Z. Jia, X. Huang, E. I. Chang, and Y. Xu (2017) Constrained deep weak supervision for histopathology image segmentation. IEEE Transactions on Medical Imaging 36 (11), pp. 2376–2388. Cited by: §3.2, §3.2, Table 2.
  • J. Johnson, A. Alahi, and L. Fei-Fei (2016) Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision, pp. 694–711. Cited by: §3.1.3.
  • A. Jungo and M. Reyes (2019) Assessing reliability and challenges of uncertainty estimations for medical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 48–56. Cited by: §5.4.
  • K. Kamnitsas, E. Ferrante, S. Parisot, C. Ledig, A. V. Nori, A. Criminisi, D. Rueckert, and B. Glocker (2016) DeepMedic for brain tumor segmentation. In International workshop on Brainlesion: Glioma, multiple sclerosis, stroke and traumatic brain injuries, pp. 138–149. Cited by: §5.5.
  • M. Kandemir and F. A. Hamprecht (2015) Computer-aided diagnosis from weak supervision: a benchmarking study. Computerized Medical Imaging and Graphics 42, pp. 44–50. Cited by: §3.2, §3.2.
  • A. Kapil, T. Wiestler, S. Lanzmich, A. Silva, K. Steele, M. Rebelatto, G. Schmidt, and N. Brieu (2019) DASGAN - joint domain adaptation and segmentation for the analysis of epithelial regions in histopathology PD-L1 images. arXiv preprint arXiv:1906.11118. Cited by: Table 5.
  • M. N. Kashif, S. E. A. Raza, K. Sirinukunwattana, M. Arif, and N. Rajpoot (2016) Handcrafted features with convolutional neural networks for detection of tumor cells in histology images. In 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI), pp. 1029–1032. Cited by: §3.1.1, Table 1.
  • J. N. Kather, J. Krisam, P. Charoentong, T. Luedde, E. Herpel, C. Weis, T. Gaiser, A. Marx, N. A. Valous, D. Ferber, et al. (2019) Predicting survival from colorectal cancer histology slides using deep learning: a retrospective multicenter study. PLoS Medicine 16 (1), pp. e1002730. Cited by: Table 4, Table 6, §4, Table 7.
  • J. L. Katzman, U. Shaham, A. Cloninger, J. Bates, T. Jiang, and Y. Kluger (2018) DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Medical Research Methodology 18 (24). Cited by: §4.
  • C. J. Kelly, A. Karthikesalingam, M. Suleyman, G. Corrado, and D. King (2019) Key challenges for delivering clinical impact with artificial intelligence. BMC Medicine 17 (1), pp. 195. Cited by: §5.5.
  • A. M. Khan, N. Rajpoot, D. Treanor, and D. Magee (2014) A nonlinear mapping approach to stain normalization in digital histopathology images using image-specific color deconvolution. IEEE Transactions on Biomedical Engineering 61 (6), pp. 1729–1738. Cited by: §3.4.2.
  • D. P. Kingma and M. Welling (2013) Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114. Cited by: §3.3.
  • B. Kong, X. Wang, Z. Li, Q. Song, and S. Zhang (2017) Cancer metastasis detection via spatially structured deep network. In International Conference on Information Processing in Medical Imaging, pp. 236–248. Cited by: §3.1.1, Table 1.
  • J. Krause, J. Johnson, R. Krishna, and L. Fei-Fei (2017) A hierarchical approach for generating descriptive image paragraphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 317–325. Cited by: §3.1.1.
  • A. Krizhevsky et al. (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto. Cited by: §3.4.1.
  • A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105. Cited by: §1, §5.1.
  • N. Kumar, A. Sethi, et al. (2019) A multi-organ nucleus segmentation challenge. IEEE Transactions on Medical Imaging, pp. 1–1. Cited by: §3.1.3, §5.1, Table 7.
  • S. Kwok (2018) Multiclass classification of breast cancer in whole-slide images. In International Conference on Image Analysis and Recognition, pp. 931–940. Cited by: Table 4, §5.2.
  • M. W. Lafarge, J. P. Pluim, K. A. Eppenhof, P. Moeskops, and M. Veta (2017) Domain-adversarial neural networks to address the appearance variability of histopathology images. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 83–91. Cited by: §3.4.1, Table 5.
  • A. Lahiani, J. Gildenblat, I. Klaman, S. Albarqouni, N. Navab, and E. Klaiman (2019) Virtualization of tissue staining in digital pathology using an unsupervised deep learning approach. In European Congress on Digital Pathology, pp. 47–55. Cited by: §3.4.1, Table 5.
  • Y. LeCun, Y. Bengio, and G. Hinton (2015) Deep learning. Nature 521 (7553), pp. 436–444. Cited by: §2.
  • B. Lee and K. Paeng (2018) A robust and effective approach towards accurate metastasis detection and pN-stage classification in breast cancer. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 841–850. Cited by: Table 4.
  • C. H. Lee and H. Yoon (2017) Medical big data: promise and challenges. Kidney Research and Clinical Practice 36 (1), pp. 3. Cited by: §3.3.
  • C. Li, X. Wang, W. Liu, L. J. Latecki, B. Wang, and J. Huang (2019a) Weakly supervised mitosis detection in breast histopathology images using concentric loss. Medical Image Analysis 53, pp. 165–178. Cited by: §3.2, Table 2.
  • J. Li, W. Li, A. Gertych, B. S. Knudsen, W. Speier, and C. W. Arnold (2019b) An attention-based multi-resolution model for prostate whole slide image classification and localization. arXiv preprint arXiv:1905.13208. Cited by: §5.2.
  • J. Li, W. Speier, K. C. Ho, K. V. Sarma, A. Gertych, B. S. Knudsen, and C. W. Arnold (2018) An EM-based semi-supervised deep learning approach for semantic segmentation of histopathological images from radical prostatectomies. Computerized Medical Imaging and Graphics 69, pp. 125–133. Cited by: §5.2.
  • M. Li, L. Wu, A. Wiliem, K. Zhao, T. Zhang, and B. C. Lovell (2019c) Deep instance-level hard negative mining model for histopathology images. arXiv preprint arXiv:1906.09681. Cited by: §5.2.
  • Y. Li and W. Ping (2018) Cancer metastasis detection with neural conditional random field. Cited by: §3.1.1, §3.4.2, Table 1.
  • Y. Li, H. Qi, J. Dai, X. Ji, and Y. Wei (2017) Fully convolutional instance-aware semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2359–2367. Cited by: §3.1.3.
  • Q. Liang, Y. Nan, G. Coppola, K. Zou, W. Sun, D. Zhang, Y. Wang, and G. Yu (2018) Weakly supervised biomedical image segmentation by reiterative learning. IEEE Journal of Biomedical and Health Informatics 23 (3), pp. 1205–1214. Cited by: §3.2, Table 2.
  • H. Lin, H. Chen, Q. Dou, L. Wang, J. Qin, and P. Heng (2018) ScanNet: a fast and dense scanning framework for metastastic breast cancer detection from whole-slide image. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 539–546. Cited by: §3.1.3.
  • H. Lin, H. Chen, S. Graham, Q. Dou, N. Rajpoot, and P. Heng (2019) Fast ScanNet: fast and dense analysis of multi-gigapixel whole-slide images for cancer metastasis detection. IEEE Transactions on Medical Imaging 38 (8), pp. 1948–1958. Cited by: §3.1.3, §3.1.3, Table 1.
  • T. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie (2017) Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125. Cited by: §3.1.3, §3.1.3.
  • G. Litjens, P. Bandi, B. E. Bejnordi, O. Geessink, M. Balkenhol, P. Bult, A. Halilovic, M. Hermsen, R. van de Loo, R. Vogels, Q. F. Manson, N. Stathonikos, A. Baidoshvili, P. van Diest, C. Wauters, M. van Dijk, and J. van der Laak (2018) 1399 H&E-stained sentinel lymph node sections of breast cancer patients: the CAMELYON dataset. GigaScience 7, pp. 1–8. Cited by: §5.3.
  • G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A.W.M. van der Laak, B. van Ginneken, and C. I. Sánchez (2017) A survey on deep learning in medical image analysis. Medical Image Analysis 42, pp. 60–88. Cited by: §2, §3.4.
  • G. Litjens, C. I. Sánchez, N. Timofeeva, M. Hermsen, I. Nagtegaal, I. Kovacs, C. Hulsbergen-Van De Kaa, P. Bult, B. Van Ginneken, and J. Van Der Laak (2016) Deep learning as a tool for increased accuracy and efficiency of histopathological diagnosis. Scientific Reports 6, pp. 26286. Cited by: §3.1.1, Table 1.
  • J. Liu, B. Xu, C. Zheng, Y. Gong, J. Garibaldi, D. Soria, A. Green, I. O. Ellis, W. Zou, and G. Qiu (2019) An end-to-end deep learning histochemical scoring system for breast cancer TMA. IEEE Transactions on Medical Imaging 38 (2), pp. 617–628. Cited by: §3.1.3, §3.1, Table 1.
  • X. Liu, T. Xia, J. Wang, Y. Yang, F. Zhou, and Y. Lin (2016) Fully convolutional attention networks for fine-grained recognition. arXiv preprint arXiv:1603.06765. Cited by: §3.1.1.
  • Y. Liu, K. Gadepalli, M. Norouzi, G. E. Dahl, T. Kohlberger, A. Boyko, S. Venugopalan, A. Timofeev, P. Q. Nelson, G. S. Corrado, et al. (2017) Detecting cancer metastases on gigapixel pathology images. arXiv preprint arXiv:1703.02442. Cited by: §3.4.2, Table 4, §5.1.
  • Y. Liu, T. Kohlberger, M. Norouzi, G. E. Dahl, J. L. Smith, A. Mohtashamian, N. Olson, L. H. Peng, J. D. Hipp, and M. C. Stumpe (2018) Artificial intelligence–based breast cancer nodal metastasis detection: insights into the black box for pathologists. Archives of Pathology & Laboratory Medicine. Cited by: §5.5.
  • J. Long, E. Shelhamer, and T. Darrell (2015) Fully convolutional networks for semantic segmentation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §3.1.2, §3.1.3, §3.1.
  • X. Ma, Y. Niu, L. Gu, Y. Wang, Y. Zhao, J. Bailey, and F. Lu (2019) Understanding adversarial attacks on deep learning based medical image analysis systems. arXiv preprint arXiv:1907.10456. Cited by: §5.4.
  • M. Macenko, M. Niethammer, J. S. Marron, D. Borland, J. T. Woosley, X. Guan, C. Schmitt, and N. E. Thomas (2009) A method for normalizing histology slides for quantitative analysis. In 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pp. 1107–1110. Cited by: §3.4.2, §3.4.2.
  • A. Madabhushi and G. Lee (2016) Image analysis and machine learning in digital pathology: challenges and opportunities. Medical Image Analysis 33, pp. 170–175. Cited by: §5.3.
  • A. L. Martel, S. Nofech-Mozes, S. Salama, S. Akbar, and M. Peikari (2019) Assessment of residual breast cancer cellularity after neoadjuvant chemotherapy using digital pathology [Data set]. Cited by: Table 7.
  • V. Mnih, N. Heess, A. Graves, et al. (2014) Recurrent models of visual attention. In Advances in Neural Information Processing Systems, pp. 2204–2212. Cited by: §3.1.1.
  • P. Mobadersany, S. Yousefi, M. Amgad, D. A. Gutman, J. S. Barnholtz-Sloan, J. E. V. Vega, D. J. Brat, and L. A. Cooper (2018) Predicting cancer outcomes from histology and genomics using convolutional networks. Proceedings of the National Academy of Sciences 115 (13), pp. E2970–E2979. Cited by: Table 6, §4, §4, §4.
  • H. Muhammad, C. S. Sigel, G. Campanella, T. Boerner, L. M. Pak, S. Büttner, J. N. IJzermans, B. G. Koerkamp, M. Doukas, W. R. Jarnagin, et al. (2019) Unsupervised subtyping of cholangiocarcinoma using a deep clustering convolutional autoencoder. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 604–612. Cited by: Table 6, §4, §4.
  • S. Mukhopadhyay, M. D. Feldman, E. Abels, R. Ashfaq, S. Beltaifa, N. G. Cacciabeve, H. P. Cathro, L. Cheng, K. Cooper, G. E. Dickey, et al. (2018) Whole slide imaging versus microscopy for primary diagnosis in surgical pathology: a multicenter blinded randomized noninferiority study of 1992 cases (pivotal study). The American Journal of Surgical Pathology 42 (1), pp. 39. Cited by: §1.
  • K. Nagpal, D. Foote, Y. Liu, P. C. Chen, E. Wulczyn, F. Tan, N. Olson, J. L. Smith, A. Mohtashamian, J. H. Wren, et al. (2019) Development and validation of a deep learning algorithm for improving Gleason scoring of prostate cancer. npj Digital Medicine 2 (1), pp. 48. Cited by: §3.1.1, Table 1, Table 6, §4, §4, §4.
  • J. G. Nam, S. Park, E. J. Hwang, J. H. Lee, K. Jin, K. Y. Lim, T. H. Vu, J. H. Sohn, S. Hwang, J. M. Goo, et al. (2018) Development and validation of deep learning–based automatic detection algorithm for malignant pulmonary nodules on chest radiographs. Radiology 290 (1), pp. 218–228. Cited by: §5.5.
  • P. Naylor, M. Laé, F. Reyal, and T. Walter (2018) Segmentation of nuclei in histopathology images by deep regression of the distance map. IEEE Transactions on Medical Imaging 38 (2), pp. 448–459. Cited by: §3.1.2, §3.1.2, Table 1, Table 7.
  • M. K. K. Niazi, A. V. Parwani, and M. N. Gurcan (2019) Digital pathology and artificial intelligence. The Lancet Oncology 20 (5), pp. e253–e261. Cited by: §5.3.
  • J. Noorbakhsh, S. Farahmand, M. Soltanieh-ha, S. Namburi, K. Zarringhalam, and J. Chuang (2019) Pan-cancer classifications of tumor histological images using deep learning. bioRxiv, pp. 715656. Cited by: Table 4, §5.4.
  • M. Noroozi and P. Favaro (2016) Unsupervised learning of visual representations by solving jigsaw puzzles. In European Conference on Computer Vision, pp. 69–84. Cited by: §3.3.
  • A. Odena, C. Olah, and J. Shlens (2017) Conditional image synthesis with auxiliary classifier GANs. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 2642–2651. Cited by: §3.4.2.
  • N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami (2017) Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia conference on computer and communications security, pp. 506–519. Cited by: §5.4.
  • M. Paschali, M. F. Naeem, W. Simson, K. Steiger, M. Mollenhauer, and N. Navab (2019) Deep learning under the microscope: improving the interpretability of medical imaging neural networks. arXiv preprint arXiv:1904.03127. Cited by: §5.4.
  • J. Peng, L. Bo, and J. Xu (2009) Conditional neural fields. In Advances in Neural Information Processing Systems, pp. 1419–1427. Cited by: §3.1.1.
  • H. Pinckaers and G. Litjens (2019) Neural ordinary differential equations for semantic segmentation of individual colon glands. arXiv preprint arXiv:1910.10470. Cited by: Table 1.
  • J. M. Prewitt and M. L. Mendelsohn (1966) The analysis of cell images. Annals of the New York Academy of Sciences 128 (3), pp. 1035–1053. Cited by: §1.
  • T. Qaiser and N. M. Rajpoot (2019) Learning where to see: a novel attention model for automated immunohistochemical scoring. IEEE Transactions on Medical Imaging, pp. 1–1. Cited by: §3.1.1, §3.1.1, §3.1.1.
  • T. Qaiser, A. Mukherjee, C. Reddy Pb, S. D. Munugoti, V. Tallam, T. Pitkäaho, T. Lehtimäki, T. Naughton, M. Berseth, A. Pedraza, et al. (2018) HER2 challenge contest: a detailed assessment of automated HER2 scoring algorithms in whole slide images of breast cancer tissues. Histopathology 72 (2), pp. 227–238. Cited by: §3.1.1, Table 7.
  • T. Qaiser, M. Pugh, S. Margielewska, R. Hollows, P. Murray, and N. Rajpoot (2019a) Digital tumor-collagen proximity signature predicts survival in diffuse large B-cell lymphoma. In European Congress on Digital Pathology, pp. 163–171. Cited by: Table 6.
  • T. Qaiser, Y. Tsang, D. Taniyama, N. Sakamoto, K. Nakane, D. Epstein, and N. Rajpoot (2019b) Fast and accurate tumor segmentation of histology images using persistent homology and deep convolutional features. Medical Image Analysis 55, pp. 1–14. Cited by: §3.1.1, Table 1.
  • H. Qu, G. Riedlinger, P. Wu, Q. Huang, J. Yi, S. De, and D. Metaxas (2019a) Joint segmentation and fine-grained classification of nuclei in histopathology images. In 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI), pp. 900–904. Cited by: §3.1.3, Table 1.
  • H. Qu, P. Wu, Q. Huang, J. Yi, G. M. Riedlinger, S. De, and D. N. Metaxas (2019b) Weakly supervised deep nuclei segmentation using points annotation in histopathology images. In International Conference on Medical Imaging with Deep Learning, pp. 390–400. Cited by: §3.2, §3.2, Table 2, §5.2.
  • G. Quellec, G. Cazuguel, B. Cochener, and M. Lamard (2017) Multiple-instance learning for medical image and video analysis. IEEE Reviews in Biomedical Engineering 10, pp. 213–234. Cited by: §3.2.
  • A. C. Quiros, R. Murray-Smith, and K. Yuan (2019) Pathology GAN: learning deep representations of cancer tissue. arXiv preprint arXiv:1907.02644. Cited by: Table 3.
  • E. Rakha, M. El-Sayed, A. Lee, C. Elston, M. Grainge, Z. Hodi, R. Blamey, and I. Ellis (2008) Prognostic significance of Nottingham histologic grade in invasive breast carcinoma. J Clin Oncol 26 (19), pp. 3153–3158. Cited by: §1.
  • M. Ranzato (2014) On learning where to look. arXiv preprint arXiv:1405.5488. Cited by: §3.1.1.
  • E. Reinhard, M. Ashikhmin, B. Gooch, and P. Shirley (2001) Color transfer between images. IEEE Computer Graphics and Applications 21 (5), pp. 34–41. Cited by: §3.4.2.
  • J. Ren, I. Hacihaliloglu, E. A. Singer, D. J. Foran, and X. Qi (2018) Adversarial domain adaptation for classification of prostate histopathology whole-slide images. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 201–209. Cited by: §3.4.1, Table 5.
  • Y. Rivenson, T. Liu, Z. Wei, Y. Zhang, K. de Haan, and A. Ozcan (2019) PhaseStain: the digital staining of label-free quantitative phase microscopy images using deep learning. Light: Science & Applications 8 (1), pp. 23. Cited by: §3.4.2, Table 5.
  • D. Romo-Bucheli, A. Janowczyk, H. Gilmore, E. Romero, and A. Madabhushi (2016) Automated tubule nuclei quantification and correlation with Oncotype DX risk categories in ER+ breast cancer whole slide images. Scientific Reports 6, pp. 32706. Cited by: §3.1.1, Table 1.
  • O. Ronneberger, P. Fischer, and T. Brox (2015) U-Net: convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Cited by: §3.1.3, §3.1.3, §3.1.
  • J. Rony, S. Belharbi, J. Dolz, I. Ben Ayed, L. McCaffrey, and E. Granger (2019) Deep weakly-supervised learning methods for classification and localization in histology images: a survey. arXiv preprint arXiv:1909.03354. Cited by: §3.2, §3.2.
  • C. Rudin (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1 (5), pp. 206–215. Cited by: §5.5.
  • O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. (2015) ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115 (3), pp. 211–252. Cited by: §3.1.1, §3.4.
  • S. Sabour, N. Frosst, and G. E. Hinton (2017) Dynamic routing between capsules. In Advances in Neural Information Processing Systems, pp. 3856–3866. Cited by: §3.1.1.
  • M. Saha, C. Chakraborty, I. Arun, R. Ahmed, and S. Chatterjee (2017) An advanced deep learning approach for Ki-67 stained hotspot detection and proliferation rate scoring for prognostic evaluation of breast cancer. Scientific Reports 7 (1), pp. 3213. Cited by: §5.1.
  • W. Samek, T. Wiegand, and K. Müller (2017) Explainable artificial intelligence: understanding, visualizing and interpreting deep learning models. arXiv preprint arXiv:1708.08296. Cited by: §5.4.
  • C. T. Sari and C. Gunduz-Demir (2019) Unsupervised feature extraction via deep learning for histopathological classification of colon tissue images. IEEE Transactions on Medical Imaging 38, pp. 1139–1149. Cited by: Table 3.
  • N. Seth, S. Akbar, S. Nofech-Mozes, S. Salama, and A. L. Martel (2019) Automated segmentation of DCIS in whole slide images. In European Congress on Digital Pathology ECDP 2019, pp. 67–74. Cited by: §3.1.3, Table 1, §5.2, §5.3.
  • M. T. Shaban, C. Baur, N. Navab, and S. Albarqouni (2019a) StainGAN: stain style transfer for digital histological images. In 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), pp. 953–956. Cited by: §3.4.2, §3.4.2, Table 5.
  • M. Shaban, S. A. Khurram, M. M. Fraz, N. Alsubaie, I. Masood, S. Mushtaq, M. Hassan, A. Loya, and N. M. Rajpoot (2019b) A novel digital score for abundance of tumour infiltrating lymphocytes predicts disease free survival in oral squamous cell carcinoma. Scientific Reports 9 (1), pp. 1–13. Cited by: §3.1.1, Table 1.
  • M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter (2016) Accessorize to a crime: real and stealthy attacks on state-of-the-art face recognition. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 1528–1540. Cited by: §5.4.
  • H. Sharma, N. Zerbe, I. Klempert, O. Hellwich, and P. Hufnagl (2017) Deep convolutional neural networks for automatic classification of gastric carcinoma using whole slide images in digital histopathology. Computerized Medical Imaging and Graphics 61, pp. 2–13. Cited by: Table 1, §5.1.
  • S. Sharma, R. Kiros, and R. Salakhutdinov (2015) Action recognition using visual attention. arXiv preprint arXiv:1511.04119. Cited by: §3.1.1.
  • D. Shen, G. Wu, and H. Suk (2017) Deep learning in medical image analysis. Annual Review of Biomedical Engineering 19, pp. 221–248. Cited by: §2.
  • H. Shin, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J. Yao, D. Mollura, and R. M. Summers (2016) Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Transactions on Medical Imaging 35 (5), pp. 1285–1298. Cited by: §3.4.
  • K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Cited by: §3.1.1, §3.4, §5.1.
  • K. Sirinukunwattana, S. e Ahmed Raza, Y. Tsang, D. R. Snead, I. A. Cree, and N. M. Rajpoot (2016) Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images. IEEE Transactions on Medical Imaging 35 (5), pp. 1196–1206. Cited by: §3.1.2, §3.1.2, §3.1, Table 1, Table 7.
  • K. Sirinukunwattana, J. P.W. Pluim, H. Chen, X. Qi, P. Heng, Y. B. Guo, L. Y. Wang, B. J. Matuszewski, E. Bruni, U. Sanchez, A. Böhm, O. Ronneberger, B. B. Cheikh, D. Racoceanu, P. Kainz, M. Pfeiffer, M. Urschler, D. R.J. Snead, and N. M. Rajpoot (2017) Gland segmentation in colon histology images: the GlaS challenge contest. Medical Image Analysis 35, pp. 489–502. Cited by: §3.1.3, Table 7.
  • Y. Song, E. Tan, X. Jiang, J. Cheng, D. Ni, S. Chen, B. Lei, and T. Wang (2017) Accurate cervical cell segmentation from overlapping clumps in pap smear images. IEEE Transactions on Medical Imaging 36 (1), pp. 288–300. Cited by: §3.1.1, Table 1.
  • Y. Song, L. Zhang, S. Chen, D. Ni, B. Lei, and T. Wang (2015) Accurate segmentation of cervical cytoplasm and nuclei based on multiscale convolutional network and graph partitioning. IEEE Transactions on Biomedical Engineering 62 (10), pp. 2421–2433. Cited by: §3.1.1, Table 1.
  • F. A. Spanhol, L. S. Oliveira, C. Petitjean, and L. Heutte (2015) A dataset for breast cancer histopathological image classification. IEEE Transactions on Biomedical Engineering 63 (7), pp. 1455–1462. Cited by: Table 7.
  • K. Stacke, G. Eilertsen, J. Unger, and C. Lundström (2019) A closer look at domain shift for deep learning in histopathology. CoRR abs/1909.11575. Cited by: §3.4.2.
  • D. F. Steiner, R. MacDonald, Y. Liu, P. Truszkowski, J. D. Hipp, C. Gammage, F. Thng, L. Peng, and M. C. Stumpe (2018) Impact of deep learning assistance on the histopathologic review of lymph nodes for metastatic breast cancer. The American Journal of Surgical Pathology 42 (12), pp. 1636. Cited by: §5.5.
  • Z. Swiderska-Chadaj, H. Pinckaers, M. van Rijthoven, M. Balkenhol, M. Melnikova, O. Geessink, Q. Manson, M. Sherman, A. Polonia, J. Parry, et al. (2019) Learning to detect lymphocytes in immunohistochemistry with deep learning. Medical Image Analysis 58, pp. 101547. Cited by: §3.1.2, §3.1.3, Table 1, Table 7.
  • W. F. Symmans, F. Peintinger, C. Hatzis, R. Rajan, H. Kuerer, V. Valero, L. Assad, A. Poniecka, B. Hennessy, M. Green, et al. (2007) Measurement of residual breast cancer burden to predict survival after neoadjuvant chemotherapy. Journal of Clinical Oncology 25 (28), pp. 4414–4422. Cited by: §1.
  • C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich (2015) Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9. Cited by: §3.1.1, §3.4, §5.1.
  • C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna (2016) Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826. Cited by: §3.4, §5.1.
  • S. Tabibu, P. Vinod, and C. Jawahar (2019) Pan-renal cell carcinoma classification and survival prediction from histopathology images using deep learning. Scientific Reports 9 (1), pp. 1–9. Cited by: Table 4, §5.1.
  • N. Tajbakhsh, J. Y. Shin, S. R. Gurudu, R. T. Hurst, C. B. Kendall, M. B. Gotway, and J. Liang (2016) Convolutional neural networks for medical image analysis: full training or fine tuning? IEEE Transactions on Medical Imaging 35 (5), pp. 1299–1312. Cited by: §3.4.
  • B. Tang, A. Li, B. Li, and M. Wang (2019) CapSurv: capsule network for survival analysis with whole slide pathological images. IEEE Access 7, pp. 26022–26030. Cited by: Table 6, §4, §4.
  • TCGA. The Cancer Genome Atlas. Cited by: §5.3, Table 7.
  • TCIA. The Cancer Imaging Archive. Cited by: §5.3, Table 7.
  • D. Tellez, M. Balkenhol, I. Otte-Höller, R. van de Loo, R. Vogels, P. Bult, C. Wauters, W. Vreuls, S. Mol, N. Karssemeijer, G. Litjens, J. van der Laak, and F. Ciompi (2018) Whole-slide mitosis detection in H&E breast histology using PHH3 as a reference to train distilled stain-invariant convolutional networks. IEEE Transactions on Medical Imaging 37 (9), pp. 2126–2136. Cited by: §3.1.1, §3.4.2, Table 1.
  • D. Tellez, G. Litjens, P. Bándi, W. Bulten, J. Bokhorst, F. Ciompi, and J. van der Laak (2019a) Quantifying the effects of data augmentation and stain color normalization in convolutional neural networks for computational pathology. Medical Image Analysis 58, pp. 101544. Cited by: §3.4.2, §5.3.
  • D. Tellez, G. Litjens, J. van der Laak, and F. Ciompi (2019b) Neural image compression for gigapixel histopathology image analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1. Cited by: §3.2, §3.3, Table 2, §5.4.
  • H. Tokunaga, Y. Teramoto, A. Yoshizawa, and R. Bise (2019) Adaptive weighting multi-field-of-view CNN for semantic segmentation in pathology. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 12597–12606. Cited by: §3.1.3, Table 1.
  • A. Vahadane, T. Peng, A. Sethi, S. Albarqouni, L. Wang, M. Baust, K. Steiger, A. M. Schlitter, I. Esposito, and N. Navab (2016) Structure-preserving color normalization and sparse stain separation for histological images. IEEE Transactions on Medical Imaging 35 (8), pp. 1962–1971. Cited by: §3.4.2, §3.4.2.
  • M. Valkonen, J. Isola, O. Ylinen, V. Muhonen, A. Saxlin, T. Tolonen, M. Nykter, and P. Ruusuvuori (2019) Cytokeratin-supervised deep learning for automatic recognition of epithelial cells in breast cancers stained for ER, PR, and Ki-67. IEEE Transactions on Medical Imaging, pp. 1–1. Cited by: §3.1.1, §3.4.2, Table 4, §5.1.
  • Y. Van Eycke, C. Balsat, L. Verset, O. Debeir, I. Salmon, and C. Decaestecker (2018) Segmentation of glandular epithelium in colorectal tumours to automatically compartmentalise IHC biomarker quantification: a deep learning approach. Medical Image Analysis 49, pp. 35–45. Cited by: §3.1.3, Table 1, §5.1.
  • M. E. Vandenberghe, M. L. Scott, P. W. Scorer, M. Söderberg, D. Balcerzak, and C. Barker (2017) Relevance of deep learning to facilitate the diagnosis of HER2 status in breast cancer. Scientific Reports 7, pp. 45938. Cited by: §3.1.1, Table 1.
  • B. S. Veeling, J. Linmans, J. Winkens, T. Cohen, and M. Welling (2018) Rotation equivariant CNNs for digital pathology. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 210–218. Cited by: Table 7.
  • M. Veta, Y. J. Heng, N. Stathonikos, B. E. Bejnordi, F. Beca, T. Wollmann, K. Rohr, M. A. Shah, D. Wang, M. Rousson, M. Hedlund, D. Tellez, F. Ciompi, E. Zerhouni, D. Lanyi, M. Viana, V. Kovalev, V. Liauchuk, H. A. Phoulady, T. Qaiser, S. Graham, N. Rajpoot, E. Sjöblom, J. Molin, K. Paeng, S. Hwang, S. Park, Z. Jia, E. I. Chang, Y. Xu, A. H. Beck, P. J. van Diest, and J. P.W. Pluim (2019) Predicting breast tumor proliferation from whole-slide images: the TUPAC16 challenge. Medical Image Analysis 54, pp. 111–121. Cited by: Table 6, §4, §4, §5.1, Table 7.
  • M. Veta, P. J. Van Diest, S. M. Willems, H. Wang, A. Madabhushi, A. Cruz-Roa, F. Gonzalez, A. B. Larsen, J. S. Vestergaard, A. B. Dahl, et al. (2015) Assessment of algorithms for mitosis detection in breast cancer histopathology images. Medical Image Analysis 20 (1), pp. 237–248. Cited by: §5.3, Table 7.
  • D. Wang, A. Khosla, R. Gargeya, H. Irshad, and A. H. Beck (2016a) Deep learning for identifying metastatic breast cancer. arXiv preprint arXiv:1606.05718. Cited by: Table 4.
  • H. Wang, A. C. Roa, A. N. Basavanhally, H. L. Gilmore, N. Shih, M. Feldman, J. Tomaszewski, F. Gonzalez, and A. Madabhushi (2014) Mitosis detection in breast cancer pathology images by combining handcrafted and convolutional neural network features. Journal of Medical Imaging 1 (3). Cited by: §3.1.1, Table 1.
  • M. Wang and W. Deng (2018) Deep visual domain adaptation: a survey. Neurocomputing 312, pp. 135–153. Cited by: §3.4.1.
  • S. Wang, J. Yao, Z. Xu, and J. Huang (2016b) Subtype cell detection with an accelerated deep convolution neural network. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 640–648. Cited by: §3.1.1, Table 1.
  • S. Wang, Y. Zhu, L. Yu, H. Chen, H. Lin, X. Wan, X. Fan, and P. Heng (2019a) RMDL: recalibrated multi-instance deep learning for whole slide gastric image classification. Medical Image Analysis 58, pp. 101549. Cited by: §3.2, Table 2, §5.2.
  • X. Wang, H. Chen, C. Gan, H. Lin, Q. Dou, E. Tsougenis, Q. Huang, M. Cai, and P. Heng (2019b) Weakly supervised deep learning for whole slide lung cancer image analysis. IEEE Transactions on Cybernetics, pp. 1–13. Cited by: §3.2, §3.2, Table 2.
  • J. W. Wei, L. J. Tafe, Y. A. Linnik, L. J. Vaickus, N. Tomita, and S. Hassanpour (2019) Pathologist-level classification of histologic patterns on resected lung adenocarcinoma slides with deep neural networks. Scientific Reports 9 (1), pp. 3358. Cited by: §3.1.1, Table 1.
  • W. Weng, Y. Cai, A. Lin, F. Tan, and P. C. Chen (2019) Multimodal multitask representation learning for pathology biobank metadata prediction. arXiv preprint arXiv:1909.07846. Cited by: §5.4.
  • S. Xie and Z. Tu (2015) Holistically-nested edge detection. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1395–1403. Cited by: §3.1.3, §3.1.3, §3.2.
  • W. Xie, J. A. Noble, and A. Zisserman (2018a) Microscopy cell counting and detection with fully convolutional regression networks. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization 6 (3), pp. 283–292. Cited by: §3.1.2.
  • Y. Xie, X. Kong, F. Xing, F. Liu, H. Su, and L. Yang (2015a) Deep voting: a robust approach toward nucleus localization in microscopy images. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 374–382. Cited by: §3.1.2, §3.1.2, Table 1.
  • Y. Xie, F. Xing, X. Kong, H. Su, and L. Yang (2015b) Beyond classification: structured regression for robust cell detection using convolutional neural network. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 358–365. Cited by: Table 1.
  • Y. Xie, F. Xing, X. Shi, X. Kong, H. Su, and L. Yang (2018b) Efficient and robust cell detection: a structured regression approach. Medical Image Analysis 44, pp. 245–254. Cited by: §3.1.2, §3.1.2, Table 1.
  • F. Xing, Y. Xie, and L. Yang (2016) An automatic learning-based framework for robust nucleus segmentation. IEEE Transactions on Medical Imaging 35 (2), pp. 550–566. Cited by: §3.1.1, Table 1.
  • F. Xing, T. C. Cornish, T. Bennett, D. Ghosh, and L. Yang (2019) Pixel-to-pixel learning with weak supervision for single-stage nucleus recognition in Ki-67 images. IEEE Transactions on Biomedical Engineering 66 (11), pp. 3088–3097. Cited by: §3.1.2, Table 1.
  • B. Xu, J. Liu, X. Hou, B. Liu, J. Garibaldi, I. O. Ellis, A. Green, L. Shen, and G. Qiu (2019a) Look, investigate, and classify: a deep hybrid attention method for breast cancer classification. In 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), pp. 914–918. Cited by: §3.1.1, §3.1.1, Table 1, Table 2.
  • G. Xu, Z. Song, Z. Sun, C. Ku, Z. Yang, C. Liu, S. Wang, J. Ma, and W. Xu (2019b) CAMEL: a weakly supervised learning framework for histopathology image segmentation. In Proceedings of the IEEE International Conference on Computer Vision, pp. 10682–10691. Cited by: §3.2.
  • J. Xu, L. Xiang, Q. Liu, H. Gilmore, J. Wu, J. Tang, and A. Madabhushi (2015a) Stacked sparse autoencoder (SSAE) for nuclei detection on breast cancer histopathology images. IEEE Transactions on Medical Imaging 35 (1), pp. 119–130. Cited by: §3.3, Table 3.
  • K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio (2015b) Show, attend and tell: neural image caption generation with visual attention. In International Conference on Machine Learning, pp. 2048–2057. Cited by: §3.1.1, §3.1.1.
  • Y. Xu, Y. Li, Y. Wang, M. Liu, Y. Fan, M. Lai, and E. I. Chang (2017) Gland instance segmentation using deep multichannel neural networks. IEEE Transactions on Biomedical Engineering 64 (12), pp. 2901–2912. Cited by: §3.1.3, Table 1.
  • Y. Xu, Y. Li, M. Liu, Y. Wang, M. Lai, and E. I. Chang (2016) Gland instance segmentation by deep multichannel side supervision. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 496–504. Cited by: §5.2.
  • Y. Xu, J. Zhu, E. I. Chang, M. Lai, and Z. Tu (2014) Weakly supervised histopathology cancer image segmentation and classification. Medical Image Analysis 18 (3), pp. 591–604. Cited by: §3.2.
  • L. Yang, Y. Zhang, Z. Zhao, H. Zheng, P. Liang, M. T. Ying, A. T. Ahuja, and D. Z. Chen (2018) BoxNet: deep learning based biomedical image segmentation using boxes only annotation. arXiv preprint arXiv:1806.00593. Cited by: §3.2, §3.2.
  • X. Yi, E. Walia, and P. Babyn (2019) Generative adversarial network in medical imaging: a review. Medical Image Analysis, pp. 101552. Cited by: §2.
  • F. G. Zanjani, S. Zinger, B. E. Bejnordi, J. A. van der Laak, et al. (2018) Histopathology stain-color normalization using deep generative models. In 1st Conference on Medical Imaging with Deep Learning (MIDL), Amsterdam, The Netherlands. Cited by: §3.4.2, Table 5.
  • Z. Zhang, P. Chen, M. McGough, F. Xing, C. Wang, M. Bui, Y. Xie, M. Sapkota, L. Cui, J. Dhillon, et al. (2019) Pathologist-level interpretable whole-slide cancer diagnosis with deep learning. Nature Machine Intelligence 1 (5), pp. 236. Cited by: §3.1.1, §3.1.1, Table 1, §5.4.
  • Z. Zhao, H. Lin, H. Chen, and P. Heng (2019) PFA-ScanNet: pyramidal feature aggregation with synergistic learning for breast cancer metastasis analysis. In Medical Image Computing and Computer-Assisted Intervention, pp. 586–594. Cited by: §3.1.3, Table 1.
  • S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. H. Torr (2015) Conditional random fields as recurrent neural networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1529–1537. Cited by: §3.1.1, §3.1.3.
  • J. Zhu, T. Park, P. Isola, and A. A. Efros (2017a) Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232. Cited by: §3.4.2.
  • X. Zhu, J. Yao, F. Zhu, and J. Huang (2017b) WSISA: making survival prediction from whole slide histopathological images. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6855–6863. Cited by: Table 6, §4, §4.