Case-based similar image retrieval for weakly annotated large histopathological images of malignant lymphoma using deep metric learning

07/08/2021
by   Noriaki Hashimoto, et al.

In the present study, we propose a novel case-based similar image retrieval (SIR) method for hematoxylin and eosin (H&E)-stained histopathological images of malignant lymphoma. When a whole slide image (WSI) is used as an input query, it is desirable to be able to retrieve similar cases by focusing on image patches in pathologically important regions such as tumor cells. To address this problem, we employ attention-based multiple instance learning, which enables us to focus on tumor-specific regions when the similarity between cases is computed. Moreover, we employ contrastive distance metric learning to incorporate immunohistochemical (IHC) staining patterns as useful supervised information for defining appropriate similarity between heterogeneous malignant lymphoma cases. In the experiment with 249 malignant lymphoma patients, we confirmed that the proposed method exhibited higher evaluation measures than the baseline case-based SIR methods. Furthermore, the subjective evaluation by pathologists revealed that our similarity measure using IHC staining patterns is appropriate for representing the similarity of H&E-stained tissue images for malignant lymphoma.


1 Introduction

In this study, we propose a case-based similar image retrieval (SIR) method for histopathological whole slide images (WSIs) in digital pathology of malignant lymphoma. Malignant lymphoma is a group of blood malignancies with more than 70 subtypes [39]. Because each subtype has a different treatment strategy and prognosis, it is crucially important to identify the correct subtype through pathological diagnosis. Digital pathology based on WSIs is increasingly becoming popular, where a WSI contains an extremely large (e.g., pixels) digital image of an entire specimen invasively extracted from a patient [11, 28]. Given the WSI of a new malignant lymphoma patient, the goal of this SIR task is to retrieve “similar” WSIs from the database of past malignant lymphoma cases, where we need to find an appropriate “similarity” metric that is useful for pathological diagnosis of malignant lymphoma.

In practice, hematopathologists perform diagnosis of malignant lymphoma in the following two stages. In the first stage, hematoxylin and eosin (H&E)-stained tissue slides are analyzed to narrow down the list of potential subtypes and determine a combination of immunohistochemical (IHC) stains. In the second stage, the subtype is identified by analyzing the expression patterns of tissue slides stained with several IHC antibodies selected in the first stage. For inexperienced pathologists, narrowing down the subtypes and determining IHC staining patterns (combinations of IHC stains) in the first stage is a challenging and time-consuming task. Given an H&E stained tissue specimen as an input query, the proposed case-based SIR method can retrieve similar cases from a database of past cases along with their IHC staining patterns and subtypes which were diagnosed by experienced hematopathologists. Such retrieved similar cases and their IHC patterns and subtypes will help to support the decision-making of pathologists in the first stage.

The proposed case-based SIR method is constructed by learning an appropriate similarity or distance metric that has the following two properties. The first property is that the similarity between a query case and a retrieved case is determined based on the tumor regions in the two WSIs. Because the WSI of a malignant lymphoma specimen contains both normal and tumor cells, the similarity should be determined based only on information in the tumor region. However, it is difficult to know which region of a WSI contains tumor cells, and the proposed method has to be trained using WSIs that have no annotations for the tumor region. (It is an extremely time-consuming task for pathologists to manually annotate the tumor regions in a large WSI, and it is almost impossible to conduct such annotations for hundreds of WSIs.) To overcome this difficulty, we effectively incorporate attention-based multiple-instance learning (MIL) [22, 18] into case-based SIR tasks. Because malignant lymphoma subtypes are characterized by information in the tumor region only, our basic idea is that the attention region extracted through attention-based MIL for subtype classification can be regarded as the tumor region.

The second property is that the similarity is defined in such a way that cases with similar IHC staining patterns can be selectively retrieved. Several conventional SIR methods for digital pathology [42, 16, 31, 20, 36, 46] employed distance metric learning (DML), which is based on the relevance of subtype labels where feature learning is performed such that the distance between two feature vectors from the same-labeled images is shorter. As malignant lymphoma cases are highly heterogeneous, different combinations of IHC stains are used even among cases with the same subtype labels. Owing to such heterogeneity of IHC staining patterns, the coincidence of subtype would not be a suitable distance metric. In this study, we address this problem by using the similarity of IHC staining patterns as the similarity between cases. We regard the similarity as the continuous relevance index between two cases, which enables us to learn a metric that properly incorporates the similarity of IHC staining patterns through contrastive DML [5].

Figure 1 shows an example of the output of our case-based SIR method. Because a WSI contains the entire specimen including normal regions, it is important to find cases in which there exist similar tumor regions, rather than finding cases in which the entire WSI is similar. Therefore, as shown in Fig. 1, the proposed case-based SIR method not only provides similar past cases but also presents the tumor regions that are used to determine the similarity between a query and a retrieved similar case. It helps pathologists to understand how the retrieved similar cases were selected, IHC staining patterns used by the experienced pathologists, and subtypes finally identified for the retrieved similar cases. In this study, to verify the effectiveness of the proposed method, we applied it to 249 malignant lymphoma cases, each of which consisted of a WSI of a specimen, selected IHC staining patterns, and the final subtype diagnosed by experienced hematopathologists. In addition to the quantitative evaluation based on the similarity of IHC staining patterns, subjective evaluations by 12 pathologists were conducted to confirm that the retrieved similar cases based on the obtained similarity metric are helpful for malignant lymphoma pathology.

Figure 1: Example of the output of the proposed case-based SIR method. When a user inputs a query case, the proposed method retrieves multiple similar past cases in the database. The proposed method provides not only similar WSIs but also attention weight heatmaps and representative image patches that contribute to the determination of retrieved similar cases. The five image patches for a query case are high-attention (HA) patches, that is, patches whose attention weights are relatively high within the entire tissue, whereas the corresponding image patches for each retrieved similar case are the patches most similar to each of the HA image patches. The colored frames of the image patches correspond to the small squares with the same colors in the thumbnail of the WSI, indicating the region from which each image patch was extracted. The main challenge addressed in this study is determining the attention weights and similarity between image patches for the pathological diagnosis of malignant lymphoma.

2 Preliminaries

In this section, we first present the related works and our contributions. Thereafter, we formulate the problem setup and evaluation measures of the case-based SIR method. We also briefly describe the basic ideas of attention-based MIL and contrastive DML and explain the use of these techniques for developing the case-based SIR method.

2.1 Related works and our contributions

Related works

In digital pathology, the availability of digital histopathological images has enabled numerous applications including image classification [29, 21, 3], detection [6, 8, 1], and segmentation [45, 12, 41, 40]. Similar image retrieval is an important task in digital pathology [47, 2], and recent advances in deep learning techniques have accelerated the development of SIR methods [23, 26, 34, 31, 20, 25].

SIR methods in digital pathology are mainly categorized into two approaches: content-based SIR and case-based SIR. In digital pathology, because a WSI is extremely large, analyses are usually performed on the basis of small image patches extracted from a WSI. In content-based SIR, an image patch is given as a query, and similar image patches are retrieved as the results. Examples of content-based SIR studies for digital pathology include [48] (lung cancer), [49] (breast cancer), and [31] (colorectal cancer). In contrast, case-based SIR methods must first select “informative” patches, and thereafter aggregate the similarities defined in multiple informative patches to form the overall WSI similarity. In the context of digital pathology, relatively few studies on case-based SIRs have been conducted. In [23], an indicator known as the blue ratio, which is higher in regions with more cells, was used as the criterion for selecting informative patches. In [25], clustering was first applied to patches, and a representative patch from each cluster was considered as an informative patch. Although these approaches use simple methods for selecting informative patches, we employ attention-based MIL such that informative patches can be selected from the tumor region. For pathological image classification based on WSIs that contain normal and tumor cells, MIL has been demonstrated to be effective [27, 22, 9, 7, 43, 3, 38, 18]. Attention-based MIL [18] is particularly useful because it can quantify the relative importance of each image patch in the WSI as attention weight. In pathological image classification, this implies that patches with high attention can be regarded as tumor regions because the information that determines subtype classification would exist in tumor regions. In this study, we incorporate attention-based MIL into case-based SIR tasks that enable the learning of distance measures based only on the information in the tumor region.

In SIR, DML has been effectively used to obtain an appropriate distance metric [42, 16, 31, 20, 36, 46]. DML methods can be roughly categorized into parametric distance measure-based and feature learning-based approaches. The former approach includes Mahalanobis DML [44, 35, 4] and multiple kernel learning [37, 33, 13, 14]. When the SIR method is implemented with a deep neural network (DNN) model, a feature learning-based approach is often adopted. For instance, given a class label for each image, feature learning is performed based on a loss function such that images with the same label are closer together, whereas images with different labels are farther apart [36, 20, 31]. By selecting images based on the distance in the learned feature space, the distance metric in the SIR method can properly consider the class labels. The proposed SIR method is designed to selectively retrieve similar cases with similar IHC staining patterns. As mentioned, we regard the similarity of IHC staining patterns as the continuous similarity between the two cases. Metric learning algorithms that use a continuous label or multi-label have been proposed [24, 17]. We also define the continuous relevance index between two cases using IHC staining patterns and perform metric learning that utilizes pathological images.

Our contributions

In this study, we propose a case-based SIR method that supports malignant lymphoma pathology diagnosis by introducing a DNN model that effectively incorporates MIL and DML. To the best of our knowledge, there is no method that combines these techniques with the case-based SIR method. Overall, the main advantages of the proposed method and our contributions in this study are summarized as follows:

  • By incorporating attention-based MIL into case-based SIR tasks, the proposed method can retrieve similar cases based on a similarity measure that depends only on patches in the tumor region.

  • By defining the similarity of two H&E stained images using IHC staining patterns in contrastive DML, the proposed method can retrieve similar cases that would have similar IHC staining patterns.

  • We applied the proposed method to 249 cases of malignant lymphoma and demonstrated its effectiveness through quantitative evaluation and subjective evaluation by 12 pathologists.

2.2 Problem setup

In this study, we denote the set of natural numbers up to $c$ as $[c] := \{1, \ldots, c\}$. Let $N$ be the number of past malignant lymphoma cases (patients), $K$ be the number of subtypes, and $M$ be the number of the kinds of IHC stains. The entire database of the past cases is represented as $\mathcal{D} := \{(X_n, y_n, s_n)\}_{n \in [N]}$, where $X_n$ is a WSI, $y_n \in \{0, 1\}^K$ is a $K$-dimensional one-hot vector for the subtype, and $s_n \in \{0, 1\}^M$ is an $M$-dimensional binary vector for IHC staining patterns. Here, the position of 1 in the one-hot vector $y_n$ indicates the subtype, whereas the values 1 and 0 in the binary vector $s_n$ indicate whether or not the corresponding IHC stain was used for the pathological diagnosis. We develop a case-based SIR method using a database of past cases $\mathcal{D}$, as shown in Fig. 2. As discussed in §1, the proposed method is designed to selectively retrieve similar cases that would have similar IHC staining patterns using features in the tumor region only.

Consider a situation in which the WSI $X_q$ of a new query case $q$ is input to the method, and we want to compute the distance between the query case $q$ and one of the past cases $n$. Let $\mathcal{P}_q$ and $\mathcal{P}_n$ be the sets of all image patches taken from $X_q$ and $X_n$, respectively, where the image patches are randomly sampled from the entire WSI (5000 patches per WSI for the training phase and 1000 for the test phase in the demonstration study in §4). Furthermore, let $\mathcal{P}_q^{\mathrm{HA}} \subseteq \mathcal{P}_q$ and $\mathcal{P}_n^{\mathrm{HA}} \subseteq \mathcal{P}_n$ be the sets of image patches taken from the (estimated) tumor region in $X_q$ and $X_n$, respectively, where “HA” denotes “High-Attention” and we refer to those patches as HA patches. We denote the image patches from cases $q$ and $n$ as $x_i$, $i \in \mathcal{P}_q$, and $x_j$, $j \in \mathcal{P}_n$, respectively. Furthermore, let us denote the feature vectors (which will be learned through DNN representation learning whose details will be explained in §3) corresponding to image patches $x_i$ and $x_j$ as $z_i$ and $z_j$, respectively. The desirable distance metric $D(X_q, X_n)$ for the proposed case-based SIR method between query case $q$ and past case $n$ is defined as

(1)

Given a query WSI $X_q$, the proposed case-based SIR method retrieves one (or a few) similar cases $n$ such that the distance $D(X_q, X_n)$ is less than those of the remaining cases. (The pairs of five image patches in Fig. 1 are the top-5 HA image patches of the query case and the corresponding similar image patches of the retrieved similar cases. We display these pairs of image patches to explain the similarity between the query and retrieved similar cases.)

There are two challenges in learning the desirable distance metric in Eq. (1). The first challenge is that the sets of patches in the tumor region, $\mathcal{P}_q^{\mathrm{HA}}$ and $\mathcal{P}_n^{\mathrm{HA}}$, are unknown. As mentioned in §1, to overcome this challenge, we employ attention-based MIL, the details of which will be described later. The second challenge is learning the features $z_i$ and $z_j$ as a DNN representation such that the distances in Eq. (1) tend to be small when the IHC staining patterns $s_q$ and $s_n$ are similar. As mentioned in §1, to overcome this challenge, we employ contrastive DML, the details of which will be described later. In §3, we propose a DNN model and its learning algorithm by effectively combining attention-based MIL and contrastive DML for IHC staining patterns.
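The retrieval step built on the case distance of Eq. (1) can be illustrated with a minimal sketch. The aggregation used here (the mean, over the query's HA patches, of the Euclidean distance to the nearest HA patch of the past case) is an assumption chosen to match the patch pairing described for Fig. 1; it is not necessarily the exact form of Eq. (1). The function names `case_distance` and `retrieve` and the toy feature vectors are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def case_distance(query_feats, past_feats):
    """Assumed aggregation: for each query HA patch, take the distance to its
    nearest HA patch in the past case, then average over the query patches."""
    if not query_feats or not past_feats:
        raise ValueError("both cases need at least one HA patch feature")
    nearest = [min(euclidean(zq, zp) for zp in past_feats) for zq in query_feats]
    return sum(nearest) / len(nearest)


def retrieve(query_feats, database, top=1):
    """Return the ids of the `top` past cases with the smallest case distance."""
    ranked = sorted(database, key=lambda cid: case_distance(query_feats, database[cid]))
    return ranked[:top]


# Toy 2-D features standing in for learned HA patch features (hypothetical values).
query = [(0.0, 0.0), (1.0, 0.0)]
database = {
    "case_a": [(0.0, 0.1), (1.0, 0.1)],
    "case_b": [(5.0, 5.0), (6.0, 5.0)],
}
print(retrieve(query, database, top=1))
```

The nearest patch found for each query HA patch is also what would be displayed as the "corresponding similar image patch" alongside the retrieved case.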

2.3 Evaluation measure

In the context of DML for classification problems, a common evaluation measure is simply the classification error (the subtype classification error in our problem setup). However, because H&E stained tissues are highly heterogeneous even among cases with the same subtype labels, retrieving cases with the same subtypes is not sufficient. As a quantitative performance measure of the proposed case-based SIR method, we thus employ the Jaccard index for IHC staining patterns. Given a query case $q$ and a retrieved case $n$, the Jaccard index of their IHC staining patterns $s_q$ and $s_n$ is defined as follows:

$$J(s_q, s_n) := \frac{|s_q \cap s_n|}{|s_q \cup s_n|}, \tag{2}$$

where each binary vector is identified with the set of IHC stains it marks as used.

We verified that the similarity of IHC staining patterns in the form of Eq. (2) is a more meaningful measure than the subtype classification error for case-based SIR in practical malignant lymphoma pathology by conducting subjective evaluation experiments with 12 pathologists. In the subjective evaluation experiments, given a query case, we present a pair of retrieved similar cases, one of which is selected based on a distance measure trained to comply with the IHC staining patterns, whereas the other is selected using a distance measure trained to minimize the subtype classification error, and the pathologists answer which of the two cases is more similar to the query case. The details of the subjective evaluations are presented in §4.
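As a small worked example of the Jaccard index for IHC staining patterns, the following sketch computes it for two binary staining vectors. The function name `jaccard_index`, the example vectors, and the convention for two all-zero vectors are assumptions made for illustration.

```python
def jaccard_index(s_a, s_b):
    """Jaccard index of two binary IHC staining vectors of equal length."""
    if len(s_a) != len(s_b):
        raise ValueError("staining vectors must have the same length")
    both = sum(1 for a, b in zip(s_a, s_b) if a and b)     # stains used in both cases
    either = sum(1 for a, b in zip(s_a, s_b) if a or b)    # stains used in at least one case
    # Assumed convention: two cases with no stains at all are treated as identical.
    return 1.0 if either == 0 else both / either


query_stains = [1, 1, 0, 1, 0]      # hypothetical staining pattern of the query
retrieved_stains = [1, 0, 0, 1, 1]  # hypothetical staining pattern of a retrieved case
print(jaccard_index(query_stains, retrieved_stains))  # 2 shared stains / 4 used overall = 0.5
```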

2.4 Attention-based MIL

Here, we describe the basic idea of attention-based MIL [22], which is used as a component of the proposed DNN model in the next section, and its use in the proposed case-based SIR method. In attention-based MIL, we define a bag as a set of image patches randomly sampled from a WSI. The basic idea of attention-based MIL is to assign an attention weight to each image patch, which indicates the relative importance of each image patch within the bag. Let $\mathcal{B}_n$ be the set of bags in the case $n$ and $\mathcal{I}_b$ be the set of image patches in the bag $b \in \mathcal{B}_n$ of case $n$. Furthermore, we denote $a_i$ as the attention weight of the image patch $x_i$, $i \in \mathcal{I}_b$. The attention weight takes a value in $[0, 1]$, and it is normalized such that the sum of the attention weights in each bag is one, that is, $\sum_{i \in \mathcal{I}_b} a_i = 1$. In attention-based MIL, a classifier (for malignant lymphoma subtype classification) is trained with the importance weighting of each image patch by the attention weight, whereas the attention weights are also adaptively updated during the training process. Because malignant lymphoma subtypes are characterized by information in the tumor region only, image patches that have high attention weights are considered to be taken from the tumor region. In this study, we assume that at least $k$ image patches in each bag are sampled from the tumor region; thus, the collection of the top-$k$ image patches in all the bags is considered as the set of high-attention image patches (we set $k$ to 10% of all the image patches in a bag in the demonstration study in §4).
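The selection of high-attention patches described above can be sketched as follows. This is a minimal illustration that assumes attention weights have already been computed for every patch in every bag; the function name `select_high_attention`, the `(patch_id, weight)` representation, and the toy weights are hypothetical.

```python
def select_high_attention(bags, fraction=0.1):
    """Collect, from every bag, the patches whose attention weights rank in the
    top `fraction` of that bag. Each bag is a list of (patch_id, attention_weight)."""
    selected = []
    for bag in bags:
        k = max(1, round(len(bag) * fraction))  # keep at least one patch per bag
        ranked = sorted(bag, key=lambda item: item[1], reverse=True)
        selected.extend(patch_id for patch_id, _ in ranked[:k])
    return selected


# Two toy bags of five patches each; the weights in each bag sum to one.
bags = [
    [("a", 0.05), ("b", 0.5), ("c", 0.1), ("d", 0.3), ("e", 0.05)],
    [("f", 0.2), ("g", 0.15), ("h", 0.4), ("i", 0.15), ("j", 0.1)],
]
print(select_high_attention(bags, fraction=0.4))  # top-2 of each bag
```

Selecting per bag, rather than over the whole case, follows the assumption stated in the text that every bag contains at least $k$ patches from the tumor region.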

2.5 Contrastive DML

Here, we describe the basic idea of contrastive DML [5], which is used as a component of the proposed DNN model in the next section, and its use in the proposed case-based SIR method. The goal of conventional feature learning-based DML is to learn a function that maps an image patch $x_i$ to a feature vector $z_i$ such that the Euclidean distance between the features $z_i$ and $z_j$ is small if the cases $n$ and $n'$ from which the patches $x_i$ and $x_j$ are taken belong to the same class, that is, $y_n = y_{n'}$. As discussed, because we intend to retrieve similar cases that would have similar IHC staining patterns rather than just belonging to the same subtype, we need to incorporate the similarity of IHC staining patterns into the distance metric. Let $d_{ij} := d(x_i, x_j)$, and consider the problem of learning the distance function $d$ to minimize the following loss function:

$$L(x_i, x_j) := r_{ij}\, d_{ij}^2 + (1 - r_{ij}) \max\{0,\, m - d_{ij}\}^2, \tag{3}$$

where $m > 0$ is a hyperparameter that defines a margin between dissimilar image patches (it is set to a fixed value in the demonstration study in §4; see Eq. (8) in §3.1 for the concrete formulation of the distance function $d$). In Eq. (3), $r_{ij} \in [0, 1]$ is known as the relevance index in the context of contrastive DML, and we employ the Jaccard index in Eq. (2) as the relevance index for our task. The first term in Eq. (3) works such that image features of two inputs with similar labels are closer together, whereas the second term works such that image features of two inputs with different labels are farther apart based on the margin $m$. By learning a feature extraction function that minimizes the loss in Eq. (3), we can obtain a distance function $d$ that incorporates the similarity of the IHC staining patterns.
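To make the roles of the two terms concrete, the following sketch evaluates a relevance-weighted contrastive loss for a single patch pair. It assumes one common form of the loss (a squared-distance pull term weighted by the relevance index and a squared-hinge push term weighted by its complement); the exact scaling used in the study may differ, and the function name `contrastive_loss` and the numerical values are hypothetical.

```python
def contrastive_loss(distance, relevance, margin):
    """Relevance-weighted contrastive loss for one patch pair (assumed form).

    distance:  non-negative distance between the two embedded patches
    relevance: continuous relevance index in [0, 1] (e.g. a Jaccard index)
    margin:    margin beyond which dissimilar pairs are no longer penalized
    """
    pull = relevance * distance ** 2                              # draws relevant pairs together
    push = (1.0 - relevance) * max(0.0, margin - distance) ** 2   # pushes irrelevant pairs apart
    return pull + push


print(contrastive_loss(0.5, relevance=1.0, margin=1.0))  # only the pull term is active
print(contrastive_loss(0.5, relevance=0.0, margin=1.0))  # only the push term is active
print(contrastive_loss(2.0, relevance=0.0, margin=1.0))  # already beyond the margin
```

With a continuous relevance index, a pair of cases sharing half of their stains is both pulled and pushed, so its equilibrium distance lies strictly between zero and the margin.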

Figure 2: Overview of the training and test phases of the proposed case-based SIR model. In the training phase, only HA image patches are sampled from the entire image patches of past cases, and the image features for HA patches are saved in the search database. In the test (similar case retrieval) phase, given a new query WSI $X_q$, HA image patches are sampled similarly and their features are calculated. By computing the case distance between the query case $q$ and all the past cases $n$, the case that has the minimum (resp., the 2nd, 3rd, ... minimum) distance from the query case is retrieved as the most similar (resp., the 2nd, 3rd, ... most similar) case.

3 Proposed case-based SIR method

We propose a DNN model and its learning algorithm that provide the desirable distance metric in Eq. (1) for our case-based SIR task by effectively combining attention-based MIL and contrastive DML. The problem of learning the desirable distance metric in Eq. (1) is decomposed into two sub-problems as follows: The first sub-problem is to learn a function for extracting a set of image patches from the estimated tumor region. The second sub-problem is to learn a function that maps an image patch $x_i$ into a feature vector $z_i$, which is used to measure the distance in Eq. (1). Each of these two functions is obtained as a part of the entire DNN model.

3.1 DNN model

Figure 3 illustrates the entire DNN model that consists of four components: $f_{\mathrm{feat}}$, $f_{\mathrm{att}}$, $f_{\mathrm{cls}}$, and $f_{\mathrm{met}}$, each of which is parametrized by a set of learnable parameters $\theta_{\mathrm{feat}}$, $\theta_{\mathrm{att}}$, $\theta_{\mathrm{cls}}$, and $\theta_{\mathrm{met}}$, respectively. Each component is described as follows:

  • Feature extractor $f_{\mathrm{feat}}$: The first component is known as the feature extractor, which is introduced such that the two aforementioned sub-problems have common shared features. The feature extractor is a mapping as follows:

    $$f_{\mathrm{feat}} : x_i \mapsto h_i, \tag{4}$$

    where $h_i$ denotes a feature vector of the image patch $x_i$, which is implicitly defined by learning the representation in the DNN model.

  • Attention network $f_{\mathrm{att}}$: The second component is used to compute the attention weights $a_i$ of the image patches in a bag $b$, and it is formally expressed as follows:

    $$f_{\mathrm{att}} : \{h_i\}_{i \in \mathcal{I}_b} \mapsto \{a_i\}_{i \in \mathcal{I}_b}, \tag{5}$$

    where $\mathcal{I}_b$ is the set of image patches in the bag $b$. Particularly, the attention weight is computed as follows:

    $$a_i = \frac{\exp\{w^\top \tanh(V h_i)\}}{\sum_{i' \in \mathcal{I}_b} \exp\{w^\top \tanh(V h_{i'})\}}, \tag{6}$$

    where $V$ denotes a matrix of parameters, and $w$ denotes a vector of parameters with appropriate dimensions, that is, $\theta_{\mathrm{att}} = \{V, w\}$.

  • Classifier network $f_{\mathrm{cls}}$: The third component is used to classify the malignant lymphoma subtype based on the MIL framework (see §2.4). In MIL, a bag $b$ (a set of image patches randomly sampled from a WSI) is classified into one of the $K$ subtypes. The input of $f_{\mathrm{cls}}$ is the feature vector weighted with the attention weights as follows:

    $$\bar{h}_b = \sum_{i \in \mathcal{I}_b} a_i h_i. \tag{7}$$

    Given an input $\bar{h}_b$, the subtype classifier outputs the $K$-dimensional class probability vector $\hat{y}_b = f_{\mathrm{cls}}(\bar{h}_b)$. Note that constructing a subtype classifier is not the main purpose of this study. By training the subtype classifier in the MIL framework, the attention network is trained such that image patches taken from the tumor region have large attention weights.

  • Metric network $f_{\mathrm{met}}$: The fourth component is used to transform the feature vector $h_i$ obtained by the feature extractor into another feature vector $z_i = f_{\mathrm{met}}(h_i)$, which is used for the desirable distance metric in Eq. (1) through contrastive DML (see §2.5). The metric network is trained such that the contrastive loss in Eq. (3) is minimized, where the distance function $d$ is implemented with $f_{\mathrm{feat}}$ and $f_{\mathrm{met}}$ as follows:

    $$d(x_i, x_j) = \left\| f_{\mathrm{met}}(f_{\mathrm{feat}}(x_i)) - f_{\mathrm{met}}(f_{\mathrm{feat}}(x_j)) \right\|_2. \tag{8}$$

    Note that when $f_{\mathrm{met}}$ is trained, only a part of the image patches, namely the HA patches, is used. As described in §2.4, the HA patches are the image patches whose attention weights are within the top $k$ in each bag $b$.
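The attention weighting and the weighted bag feature fed to the classifier can be sketched in plain Python. The sketch assumes the widely used tanh-based attention scoring followed by a softmax within the bag; the parameter names `V` and `w`, the function names, and the toy values are hypothetical, and real patch features would come from the feature extractor rather than being written by hand.

```python
import math


def attention_weights(features, V, w):
    """Softmax, within one bag, of the scores w^T tanh(V h) (assumed scoring form)."""
    scores = []
    for h in features:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wr * t for wr, t in zip(w, hidden)))
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_bag_feature(features, weights):
    """Attention-weighted sum of the patch features in one bag."""
    dim = len(features[0])
    return [sum(a * h[c] for a, h in zip(weights, features)) for c in range(dim)]


# Toy bag with three 2-D patch features and hand-picked parameters.
features = [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, 1.0]

weights = attention_weights(features, V, w)
bag_feature = weighted_bag_feature(features, weights)
print(weights, bag_feature)
```

Because the weights are produced by a softmax over the bag, they are non-negative and sum to one, which is the normalization required of attention weights in §2.4.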

3.2 Training DNN model

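The parameters $\theta_{\mathrm{feat}}$, $\theta_{\mathrm{att}}$, $\theta_{\mathrm{cls}}$, and $\theta_{\mathrm{met}}$ of the four components $f_{\mathrm{feat}}$, $f_{\mathrm{att}}$, $f_{\mathrm{cls}}$, and $f_{\mathrm{met}}$, respectively, are optimized by alternately solving the following two minimization problems: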

(9)
(10)

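where the loss function $L_{\mathrm{CE}}$ between the one-hot subtype label $y$ and the predicted class probability vector $\hat{y}$ is the standard cross-entropy loss defined as follows:

$$L_{\mathrm{CE}}(y, \hat{y}) := -\sum_{k=1}^{K} y_k \log \hat{y}_k, \tag{11}$$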

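whereas the loss function $L_{\mathrm{DML}}$ is the contrastive loss function (see Eq. (3)), defined for a pair of image patches $x_i$ and $x_j$ with relevance index $r_{ij}$ and margin $m$ as follows:

$$L_{\mathrm{DML}}(x_i, x_j) := r_{ij}\, d(x_i, x_j)^2 + (1 - r_{ij}) \max\{0,\, m - d(x_i, x_j)\}^2. \tag{12}$$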

In our implementation, the four components are implemented as follows: We employ ResNet50 [19] as the feature extractor, and it is initialized with the weights pre-trained on the ImageNet database [10]. The attention network is implemented as the softmax operator in Eq. (6). The classifier network is implemented using a simple multi-layer perceptron for multiclass classification. The metric network is implemented using a simple multi-layer perceptron for feature transformation.

3.3 Case-based SIR based on the trained DNN model

For the construction of the search database in the case-based SIR task, image patches are randomly sampled from each WSI in the training dataset, and the attention weights of these image patches are calculated by the trained feature extractor and the trained attention network. Thereafter, from these image patches, the image patches that have higher attention weights than the other image patches are selected as the HA patches, and their feature vectors are saved in the database as references. In the testing (retrieval) phase, when a new query case $q$ is input to the SIR model, a set of HA image patches is selected from the image patches sampled from the query WSI, and the feature vectors for these HA image patches are computed using the aforementioned procedure. For each past case $n$, the distance in Eq. (1) is calculated, and the most similar case (or multiple cases with the highest similarity) is retrieved. Along with the selection of the most similar case(s), a set of similar image patch pairs is also provided as additional information (see §2.2 and Fig. 1).

Multi-scale input

Because pathologists observe the H&E stained tissue slides under a microscope at different magnifications, it is preferred that the retrieval results are also based on similarity using multi-scale information. In the training phase using multi-scale inputs, different DNN models are independently trained with image patches of the corresponding magnifications. In the testing (retrieval) phase using two magnifications (e.g., 40x and 5x), the distance between image patches of high and low magnifications is calculated similarly to Eq. (8) as follows:

(13)

The embedded image features $z_i^{\mathrm{H}}$ and $z_i^{\mathrm{L}}$ are calculated from the image patches $x_i^{\mathrm{H}}$ and $x_i^{\mathrm{L}}$ for high and low magnifications, respectively. Note that the image patches $x_i^{\mathrm{H}}$ and $x_i^{\mathrm{L}}$ are extracted from regions with the same central field of view in the WSI. High-attention patches for multi-scale input are selected using the attention weights averaged over the multiple magnifications. Similar cases are obtained by comparing the case distances computed from the multi-scale patch distances between the HA image patches that are selected based on these multi-scale attention weights. If three or more magnifications are employed as the multi-scale input, one DNN model is trained for each input magnification.
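The multi-scale selection of high-attention patches can be sketched as follows, assuming that each magnification yields one attention weight per patch location and that the locations are aligned across magnifications (same central field of view). The function name `multiscale_high_attention`, the dictionary layout, and the toy weights are hypothetical; the multi-scale patch distance itself is not covered by this sketch.

```python
def multiscale_high_attention(weights_by_scale, k):
    """Rank patch locations by their attention weight averaged over magnifications
    and return the indices of the top-k locations together with the averages."""
    per_scale = list(weights_by_scale.values())
    n = len(per_scale[0])
    if any(len(w) != n for w in per_scale):
        raise ValueError("every magnification needs one weight per patch location")
    averaged = [sum(w[i] for w in per_scale) / len(per_scale) for i in range(n)]
    order = sorted(range(n), key=lambda i: averaged[i], reverse=True)
    return order[:k], averaged


# Toy attention weights for three patch locations at two magnifications.
weights_by_scale = {
    "40x": [0.1, 0.6, 0.3],
    "5x": [0.5, 0.3, 0.2],
}
top_locations, averaged = multiscale_high_attention(weights_by_scale, k=2)
print(top_locations)
```

In this toy example the first location has the highest weight at 5x, but the second location is ranked first once the two magnifications are averaged.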

Figure 3: Schematic illustration of the DNN model for the proposed method. The DNN model effectively combines attention-based MIL classification to identify tumor-specific image patches as HA patches and contrastive DML to learn the appropriate distance metric by incorporating the similarity of IHC staining patterns. The proposed DNN model consists of four components (the feature extractor, the attention network, the classifier network, and the metric network), each of which is parameterized by its own set of learnable parameters. The parameters for attention-based MIL and contrastive DML are alternately updated.

4 Experiments

To demonstrate the effectiveness of the proposed method, we applied it to 249 malignant lymphoma cases, each of which contains a WSI of a specimen, selected IHC stains, and the final subtype diagnosed by experienced hematopathologists. In addition to the quantitative evaluation based on the similarity of IHC staining patterns, subjective evaluations by 12 pathologists were conducted to confirm that the retrieved similar cases based on the obtained similarity metric are useful in the pathological diagnosis of malignant lymphoma.

4.1 Experimental setup

Malignant lymphoma dataset

The malignant lymphoma dataset contains clinical cases with three subtypes: 76 diffuse large B-cell lymphoma, 90 follicular lymphoma, and 83 reactive lymphoid hyperplasia. All cases were diagnosed at Kurume University in 2018, and the subtype labels for each case were identified by confirming the expression patterns of IHC stained tissue slides. The samples used in this study were approved by the Ethics Review Committee of Kurume University and RIKEN in accordance with the recommendations of the Declaration of Helsinki. For each case, we can refer to diagnostic information containing patient metadata, subtype, IHC staining pattern, and other findings. In total, types of IHC antibodies were used in this study. All glass slides were digitized using a WSI scanner Aperio GT 450 (Leica Biosystems, Germany) at 40x the original magnification (0.26 um/pixel). The 249 cases were split into five groups while maintaining the ratio of subtypes, and 5-fold cross-validation was performed. In each fold of cross-validation, 25% of the training cases were used as validation cases, which were used for selecting hyperparameters. Using a WSI software OpenSlide [15], the sets of image patches were extracted from the tissue regions that were determined using Otsu method [30] through saturation of HSV color space.
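A subtype-preserving split into five groups, as used for the cross-validation above, can be sketched in a few lines. The round-robin dealing scheme, the function name `stratified_folds`, and the short subtype labels (`"DLBCL"`, `"FL"`, `"RL"`) are assumptions for illustration; the study does not state how the split was implemented.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Deal the case indices of each subtype round-robin into n_folds groups so
    that every group keeps approximately the overall subtype ratio."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    folds = [[] for _ in range(n_folds)]
    for indices in by_label.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


# Subtype counts taken from the dataset description: 76, 90, and 83 cases.
labels = ["DLBCL"] * 76 + ["FL"] * 90 + ["RL"] * 83
folds = stratified_folds(labels)
print([len(fold) for fold in folds])
```

Each fold then serves once as the test set, and a quarter of the remaining cases can be held out as validation cases for hyperparameter selection.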

Implementation details

The feature extractor was initialized with ResNet50 pre-trained on the ImageNet database, and the dimension of the feature vector was set to 2048 after the global average pooling layer. For attention-based MIL, 5000 image patches were extracted from each WSI, and 50 bags, each containing 100 image patches, were used. (If a WSI was too small to yield 5000 non-overlapping image patches, only the smaller number of available image patches was used.) After one epoch of attention-based MIL training, that is, after all bags had been used for training once, contrastive DML was trained using the HA patches, that is, the patches whose attention weights ranked at the top of all image patches in each bag. In contrastive DML, 100 HA patches were randomly selected from the 500 HA patches of each case, and HA patch pairs were constructed such that each selected HA patch was used in only one pair; the number of pairs depends on the number of training cases. Contrastive DML was then trained for 10 epochs. This process was repeated 10 times in the overall training, so that attention-based MIL was trained for 10 epochs in total, whereas contrastive DML was trained for 100 epochs in total. The network parameters were optimized using stochastic gradient descent (SGD) with momentum [32], where the learning rate and momentum were set to 1.25 and 0.9, respectively, and weight decay was applied.
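The bag construction and HA patch selection described above can be sketched as follows. This is only an illustration under stated assumptions: the attention weights are taken as a given array (in the paper they come from the attention-based MIL network), the function names are hypothetical, and the default `top_fraction=0.1` is an assumption inferred from the 500 HA patches obtained out of 5000 patches per case, not a value stated at this point in the text.

```python
import numpy as np

def make_bags(n_patches=5000, bag_size=100, seed=0):
    """Randomly partition patch indices into bags; returns an (n_bags, bag_size) array."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_patches)
    n_bags = n_patches // bag_size
    return order[: n_bags * bag_size].reshape(n_bags, bag_size)

def select_ha_patches(bags, attention, top_fraction=0.1):
    """For each bag, keep the patch indices with the highest attention weights.

    top_fraction is an assumed parameter (see the lead-in); attention is a 1-D
    array holding one weight per patch index.
    """
    k = max(1, int(round(bags.shape[1] * top_fraction)))
    weights = attention[bags]                    # (n_bags, bag_size)
    top = np.argsort(-weights, axis=1)[:, :k]    # positions of the k largest weights in each bag
    return np.take_along_axis(bags, top, axis=1).ravel()
```

With the default settings, 50 bags of 100 patches yield 10 HA patches per bag, that is, 500 HA patches per case, which matches the counts given in the text.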

In the case-based SIR task after training, 1000 image patches were randomly extracted from each WSI, and the 100 HA image patches with attention weights within the top 10% in each case were used to compute the case distance. Similar cases were retrieved based on the case distance, and the method returned the retrieval results in descending order of similarity, that is, in ascending order of distance. In our experiment, we considered three settings for the magnification of the input image patches: 40x image input, 5x image input, and multi-scale input of 40x and 5x (see Fig. 1).

Baseline methods

We compared the following five methods:

  • pre-trained ResNet50 + all patches,

  • subtype-based metric + all patches,

  • staining-based metric + all patches,

  • subtype-based metric + HA patches, and

  • staining-based metric + HA patches (proposed method).

Here, “all patches” means that attention-based MIL was not employed and image patches were randomly selected, whereas “HA patches” means that attention-based MIL was used for selecting HA image patches. In the “all patches” setting, 1000 image patches were first randomly extracted from each WSI, and contrastive DML was trained for 100 epochs, in each of which 100 image patches were randomly drawn from the 1000 image patches. Furthermore, “subtype-based metric” indicates that the relevance index in the contrastive loss equation for contrastive DML was defined as 1 if the subtypes of the two cases are the same and 0 otherwise, whereas “staining-based metric” indicates that the Jaccard index of the IHC staining patterns, as defined in the Jaccard index equation, was used as the relevance index. The first method, “pre-trained ResNet50 + all patches,” is a simple baseline in which neither attention-based MIL nor contrastive DML was used, and the distance between two cases was simply measured by the distance between feature vectors obtained from a ResNet50 pre-trained on the ImageNet database without any fine-tuning. By comparing the proposed method with the four baseline methods, we demonstrate the effect of selecting HA image patches through attention-based MIL and the effect of considering the similarity of IHC staining patterns through contrastive DML.
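The two relevance indices described above can be sketched as follows. The sketch assumes that each IHC staining pattern is represented as the set of antibodies with positive staining; the paper's own Jaccard index equation may define the pattern differently. The function names and the antibody names in the usage note are illustrative only.

```python
def subtype_relevance(subtype_a, subtype_b):
    """Subtype-based relevance: 1 if the two cases share a subtype, 0 otherwise."""
    return 1.0 if subtype_a == subtype_b else 0.0

def staining_relevance(positive_a, positive_b):
    """Staining-based relevance: Jaccard index of two sets of positive antibodies.

    Representing a staining pattern as a set of positive antibodies is an
    assumption of this sketch.
    """
    a, b = set(positive_a), set(positive_b)
    union = a | b
    if not union:
        # Assumption: two cases with no positive stains are treated as identical.
        return 1.0
    return len(a & b) / len(union)
```

For example, `staining_relevance({"CD20", "CD10", "BCL2"}, {"CD20", "BCL2", "CD3"})` gives 2/4 = 0.5, a graded value, whereas the subtype-based index can only be 0 or 1. This graded relevance is what lets the staining-based metric distinguish cases within the same subtype.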

4.2 Results

One of the main contributions of this study is the utilization of IHC staining patterns to provide a useful similarity measure for heterogeneous malignant lymphoma cases. The performance of the proposed and baseline methods was evaluated not only through a quantitative evaluation but also through a subjective evaluation by 12 pathologists. First, in the quantitative evaluation, the similarity of the IHC staining patterns between a test query case and a retrieved similar case was compared among the methods. Then, in the subjective evaluation, we examined whether IHC staining similarity is a more appropriate measure than subtype similarity for the pathological diagnosis of malignant lymphoma.

Quantitative evaluation

We evaluated the case-based SIR performance based on the similarity of the IHC staining patterns between an input query case and a retrieved similar case, in the form of the IHC staining accuracy defined by the Jaccard index. Table 1 summarizes the results for three magnification settings: 40x, 5x, and multi-scale of 40x & 5x. The table lists the average IHC staining accuracies of the top-5 retrieved similar cases. The results demonstrate that the proposed method has the highest IHC staining accuracy among all the methods for all three magnification settings. The differences in IHC staining accuracy between the proposed method and the four baseline methods are statistically significant at a significance level of 0.05, except for the 40x image input.

Methods                                          Magnifications
                                                 40x             5x              40x & 5x
Pre-trained ResNet50 + all patches               0.602 ± 0.001   0.601 ± 0.001   0.609 ± 0.010
Subtype-based metric + all patches               0.618 ± 0.010   0.641 ± 0.011   0.645 ± 0.011
Staining-based metric + all patches              0.642 ± 0.011   0.650 ± 0.011   0.660 ± 0.012
Subtype-based metric + HA patches                0.633 ± 0.010   0.643 ± 0.011   0.653 ± 0.010
Staining-based metric + HA patches (proposed)    0.647 ± 0.011   0.669 ± 0.011   0.676 ± 0.011

Table 1: Comparison of IHC staining accuracy between query cases and retrieved similar cases through 5-fold cross-validation with three types of magnifications. Each result shows the mean and standard error of the average accuracies of the top-5 retrieved similar cases for each query case. The proposed method achieved the best IHC staining accuracy in all types of magnifications. The differences between the proposed method and all the baseline methods are statistically significant at the 0.05 level in 5x and multi-scale of 40x & 5x.

We also compared the similarity of malignant lymphoma subtypes between an input query case and a retrieved similar case in the form of subtype accuracy, which takes 1 if the two subtypes are the same and 0 otherwise. Table 2 summarizes the subtype accuracy results in the same format as Table 1. Although the criterion employed in the proposed method is not directly related to the subtype accuracy measure, the proposed method achieved the best performance among the five methods in two of the three magnification settings. In the 40x magnification setting, “subtype-based metric + HA patches” achieved the best performance. This is reasonable because the subtype-based metric is directly tailored to subtype accuracy. As for why the proposed method with the “staining-based metric” was better than or comparable to the method with the “subtype-based metric,” we conjecture that a good representation of IHC staining patterns is also a good representation of the subtype, because the difference in IHC staining patterns reflects the heterogeneity of malignant lymphoma subtypes.

Methods                                          Magnifications
                                                 40x             5x              40x & 5x
Pre-trained ResNet50 + all patches               0.633 ± 0.018   0.625 ± 0.017   0.654 ± 0.017
Subtype-based metric + all patches               0.673 ± 0.019   0.714 ± 0.020   0.736 ± 0.020
Staining-based metric + all patches              0.578 ± 0.021   0.600 ± 0.020   0.565 ± 0.022
Subtype-based metric + HA patches                0.720 ± 0.020   0.737 ± 0.021   0.774 ± 0.020
Staining-based metric + HA patches (proposed)    0.712 ± 0.021   0.770 ± 0.020   0.783 ± 0.019

Table 2: Comparison of subtype accuracy between query cases and retrieved similar cases through 5-fold cross-validation with three types of magnifications. Each result shows the mean and standard error of the average accuracies of the top-5 retrieved similar cases for each query case. The proposed method achieved the best subtype accuracy in two out of three magnifications and it was comparable in the remaining magnification. Note that the subtype accuracy is directly tailored to subtype-based metric. We conjecture that the proposed method with a staining-based metric had good performance in terms of not only IHC staining accuracy but also subtype accuracy because a good representation of IHC staining patterns is also a good representation of subtypes.
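The way the entries of Tables 1 and 2 are aggregated can be sketched as follows. For each query, the accuracy is averaged over the top-5 retrieved cases, and the mean and standard error are then taken over the queries. The function name and array layout are hypothetical, and computing the standard error from the sample standard deviation across queries is an assumption, since the text does not spell out the estimator.

```python
import numpy as np

def topk_mean_and_se(distance, accuracy, k=5):
    """Mean and standard error of the per-query top-k average accuracy.

    distance: (n_query, n_gallery) case distances (smaller means more similar).
    accuracy: (n_query, n_gallery) accuracy of each query/gallery pair, e.g. the
              Jaccard index of IHC staining patterns or the 0/1 subtype match.
    """
    order = np.argsort(distance, axis=1)[:, :k]               # k nearest gallery cases per query
    per_query = np.take_along_axis(accuracy, order, axis=1).mean(axis=1)
    mean = per_query.mean()
    se = per_query.std(ddof=1) / np.sqrt(len(per_query))      # assumed estimator of the S.E.
    return mean, se
```

Under this aggregation, the same distance matrix can be scored against either accuracy matrix, which is how one retrieval result yields both an IHC staining accuracy and a subtype accuracy.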

Subjective evaluation

The goal of the subjective evaluation was to confirm whether IHC staining similarity is a more appropriate measure than subtype similarity for the pathological diagnosis of malignant lymphoma. To this end, we compared only the proposed method, “staining-based metric + HA patches,” with one of the baseline methods, “subtype-based metric + HA patches.” The task of each participant (pathologist) was to evaluate which of the two retrieval results (obtained using the proposed and baseline methods, respectively) was more similar to an input query. An example of the subjective evaluation task is shown in Fig. 4. For an input query case, the image patches at 40x and 5x magnifications with the top-5 attention weights were shown. For a retrieved similar case, the image patches that had the minimum distance from each query image patch under each method were shown. For instance, “Similar patch #1” was the image patch most similar to “Patch #1” in the query case. Each of the 249 cases was used as an input query once; that is, for each input query, the task was to find similar cases from the training (+validation) set of the cross-validation round in which the input query was in the test set. The result for each query was evaluated on a 4-grade scale: each participant was asked to select one of the following options: “the result 1 is similar to a query,” “the result 1 is weakly similar to a query,” “the result 2 is weakly similar to a query,” or “the result 2 is similar to a query,” where the assignment of the proposed and baseline methods to result 1 and result 2 was determined at random. The order of the query cases was also shuffled randomly for each participant. In total, 12 pathologists participated in the subjective evaluation: three experienced hematopathologists, four standard pathologists, and five pathological trainees.

Figure 5 shows the results of each of the 12 participants in pie charts. For all 12 participants, the proportion of responses in which the proposed method was more similar (thick blue) or weakly similar (thin blue) to the query case than the baseline method was significantly higher than the opposite responses (thick and thin orange colors). This result indicates that all 12 pathologists determined that IHC staining similarity was more appropriate than subtype similarity as a similarity measure for the pathological diagnosis of malignant lymphoma.

To aggregate the evaluation results, each evaluation score was counted as 1 if the result of the proposed method was evaluated as “similar” or “weakly similar,” and 0 otherwise. We further examined “confident responses” by removing the “weakly similar” responses. Table 3 lists the average evaluation scores of each of the 12 participants. The results demonstrate that the proposed method retrieved cases that pathologists judged to be more similar to the query cases. The superiority of the proposed method is more evident when only the confident responses are considered. Note that the proposed and baseline methods may retrieve the same case but different image patches. Specifically, the two methods retrieved different cases for 204 of the 249 query cases, whereas they retrieved the same case (but with different image patches) for the remaining 45 cases. Table 4 lists the average scores for the former and latter cases. In all the presented results, the difference between the proposed and baseline methods is statistically significant based on a randomized test (footnote 7).

Footnote 7: To quantify the statistical significance of the results in the subjective evaluation, we performed a Monte Carlo statistical test with the null hypothesis that the proposed and baseline methods are the same. Specifically, we generated 1,000,000 randomized results based on the null hypothesis. A p-value for each participant is listed in Table 3. Consequently, most of the 1,000,000 results are less than the actual scores listed in Table 3, and all the scores listed in Table 3 are statistically significantly larger than 0.5.

Participants   Score           Confident Score   p-value
1              0.715           0.724 (232)
2              0.727           0.783 (120)
3              0.723           0.811 (53)
4              0.683           0.750 (64)
5              0.671           0.663 (240)
6              0.683           0.683 (249)
7              0.614           0.623 (138)
8              0.627           0.688 (96)
9              0.723           0.815 (145)
10             0.667           0.864 (22)
11             0.606           0.663 (92)
12             0.651           0.698 (129)
Mean ± S.E.    0.674 ± 0.012   0.730 ± 0.021

Table 3: Mean binary scores of all participants through subjective evaluation experiment. The “Score” indicates the results for all cases, whereas the “Confident Score” indicates the results answered with confidence, i.e., by excluding “weakly similar” responses. Each evaluation score was counted as 1 if the result of the proposed method was evaluated as “similar” or “weakly similar,” and 0 otherwise. The bracketed numbers indicate the number of “confident responses” by removing “weakly similar” responses. The p-value for each result was computed by the Monte Carlo statistical test with the null hypothesis that the proposed and baseline methods are the same, indicating that all the scores are highly statistically significant.

Retrieval results   Mean ± S.E.
Different cases     0.688 ± 0.013
Same cases          0.611 ± 0.019

Table 4: Mean binary scores for the 204 query cases for which the two methods retrieved different cases, and for the remaining 45 query cases for which the two methods retrieved the same case but different patches. In both groups, the proposed method was evaluated as more suitable.
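The Monte Carlo test used for the subjective evaluation can be sketched as follows. The text states only the null hypothesis (the proposed and baseline methods are the same) and the number of randomized results (1,000,000). Modeling each response under the null hypothesis as a fair coin flip, and defining the one-sided p-value as the fraction of simulated scores at least as large as the observed one, are assumptions of this sketch. The function name is hypothetical.

```python
import numpy as np

def monte_carlo_p_value(observed_score, n_responses, n_sim=1_000_000, seed=0):
    """One-sided Monte Carlo p-value for a mean binary score.

    Assumption: under the null hypothesis each of the n_responses answers favors
    the proposed method with probability 0.5, independently of the others.
    """
    rng = np.random.default_rng(seed)
    simulated = rng.binomial(n_responses, 0.5, size=n_sim) / n_responses
    return float(np.mean(simulated >= observed_score))
```

For a participant who answered all 249 queries, a score near 0.5 gives a p-value near 0.5 under this model, whereas scores such as those in Table 3 are rarely or never reached by the simulated results. This is consistent with the statement that most randomized results fall below the observed scores.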

These results on subjective evaluation demonstrate that IHC staining similarity is more appropriate than subtype similarity for the pathological diagnosis of malignant lymphoma. Even when the two methods select the same cases, the image patches selected as the basis for the decision using the proposed method are more suitable than those selected using the baseline method.

Visualization of attention regions

In case-based SIR, it is desirable to be able to explain why the retrieved cases were selected as similar. To realize such explainable retrieval, our proposed method provides attention weights that indicate the regions that were focused on as HA image patches when computing the case distance. The color plots of all WSIs in Fig. 1 show the attention weights. For visualization purposes, the attention weights of all image patches were computed and normalized for each case. The red regions in the heat map indicate the HA image patches that were used to calculate the similarity between two cases, whereas the blue regions indicate image patches with low attention weights. In Fig. 1, similar cases were retrieved with a multi-scale input of 40x & 5x, and the multi-scale attention weights are visualized as a heat map. We observe that the selected patches are visually similar; in particular, with the multi-scale input, they are similar at both low and high magnifications.
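The per-case normalization of attention weights for the heat map can be sketched as follows. The target range is not recoverable from the text here, so the min-max scaling and the default range given by the hypothetical parameters `low=0.0` and `high=1.0` are assumptions of this sketch rather than the authors' stated choice.

```python
import numpy as np

def normalize_attention(weights, low=0.0, high=1.0):
    """Min-max scale the attention weights of one case to [low, high].

    low and high are assumed defaults; the source does not state the range.
    """
    weights = np.asarray(weights, dtype=np.float64)
    span = weights.max() - weights.min()
    if span == 0:
        # All weights are equal: map every patch to the lower bound.
        return np.full_like(weights, low)
    return low + (weights - weights.min()) / span * (high - low)
```

Normalizing each case separately makes the colors comparable within a WSI (red for the highest-attention patches of that case, blue for the lowest), but not across cases.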

4.3 Examples

In the experiments described above, we confirmed that the proposed method performed better than the baseline methods. Here, we examine the results of the proposed method that were evaluated as more similar, and how they differ from those of the baseline method. Figure 6 shows a histogram of the cases by the number of the 12 participants who responded that the proposed method was better than the baseline method in the subjective evaluation in §4.2. The horizontal axis represents the number of participants who voted for the proposed method as the more similar result; e.g., “12” shows the number of cases in which all participants rated the proposed method as “similar” or “weakly similar.” In these aggregated results, the proposed method was evaluated as more suitable than the baseline method by the majority of the participants in 175 cases. In 36 cases, the proposed method was evaluated as more similar by all 12 participants, whereas there were only two cases in which the baseline method was evaluated as more similar by all 12 participants.
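The aggregation behind the histogram can be sketched as follows. It assumes that the responses are stored as a boolean array with one row per query case and one column per participant, where True means that the participant rated the proposed method as “similar” or “weakly similar.” The function name and data layout are hypothetical.

```python
import numpy as np

def vote_histogram(prefers_proposed):
    """Count cases by the number of participants who preferred the proposed method.

    prefers_proposed: (n_cases, n_participants) boolean array.
    Returns (counts, majority), where counts[v] is the number of cases with
    exactly v votes and majority is the number of cases with more than half
    of the votes.
    """
    votes_matrix = np.asarray(prefers_proposed, dtype=bool)
    n_participants = votes_matrix.shape[1]
    votes = votes_matrix.sum(axis=1)
    counts = np.bincount(votes, minlength=n_participants + 1)
    majority = int((votes > n_participants / 2).sum())
    return counts, majority
```

With 12 participants, `counts[12]` and `counts[0]` correspond to the unanimous cases reported in the text (36 for the proposed method and two for the baseline), and `majority` corresponds to the cases with at least seven votes.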

Figure 7 shows examples of retrieval results for which all participants evaluated the proposed method as more similar than the baseline method. In addition to the image patches shown in the subjective evaluation, the thumbnails of the retrieved similar cases are also shown to make it easy to confirm whether the two retrieved cases are the same. In the lower examples, the proposed and baseline methods retrieved the same similar case (but different image patches). Even when both methods retrieved the same similar case, the proposed method obtained more similar image patches and was rated better by all 12 pathologists.

Figure 4: Example of the subjective evaluation tasks. The participants were asked to evaluate the result that was more similar to a query case by a 4-grade score. Five image patches of 40x and 5x magnifications that had top-5 attention weights were shown for an input query case, whereas the image patches that had the minimum distance from each query image patch obtained using the two methods are shown. Either “Retrieved result 1” or “Retrieved result 2” corresponds to either the proposed method or the baseline method, which is determined at random.
Figure 5: Pie charts of the proportions of the four answers for each of the 12 participants. Each chart corresponds to a different participant, where “++” and “+” in the legends mean “similar to a query” and “weakly similar to a query,” respectively. The thick blue and thin blue areas together clearly exceed one half for all participants, which indicates that the results retrieved using the proposed method were more likely to be evaluated as “more similar.”
Figure 6: Histogram of the cases by the number of the 12 participants who responded that the proposed method was better than the baseline method. In total, in 36 cases, the proposed method was evaluated as more similar by all 12 participants, whereas there were only two cases in which the baseline method was evaluated as more similar by all 12 participants.
  
  
Figure 7: Examples of retrieval results for which all 12 participants evaluated the proposed method as more similar than the baseline method. In addition to the image patches that were shown in the subjective evaluation, the thumbnails of similar cases are also shown. As shown in the two bottom examples, even when both methods retrieved the same case, the image patches selected using the proposed SIR method were evaluated as more relevant to the HA patches in the query case.

5 Conclusion

We proposed a case-based SIR method for unannotated large histopathological images of malignant lymphoma. With attention-based MIL, the proposed method can automatically extract informative image patches from unannotated WSIs, which enables a user to input a WSI as a query without selecting an image patch. Moreover, we employed the similarity of IHC staining patterns as the similarity measure in contrastive DML, so that the embedded features of images with similar IHC staining patterns lie closer together. In the quantitative evaluation with 249 malignant lymphoma patients, we compared the proposed method with several baseline methods; the proposed method exhibited the highest IHC staining accuracy between query and retrieved cases in all three magnification settings and the highest subtype accuracy in two of the three. Furthermore, we conducted a subjective evaluation experiment to verify the proposed similarity measure based on IHC staining patterns and confirmed that, compared with the baseline method, our method retrieved cases that pathologists judged to be more similar when observing the H&E-stained tissue slides. The proposed case-based SIR method is useful in malignant lymphoma pathology because it provides not only WSIs but also image patches and visualized attention weights, which indicate both the similarity of the image patches between a query case and a retrieved similar case and the regions of the entire WSI that were focused on in the retrieval phase.

References

  • [1] B. E. Bejnordi, M. Veta, P. J. Van Diest, B. Van Ginneken, N. Karssemeijer, G. Litjens, J. A. Van Der Laak, M. Hermsen, Q. F. Manson, M. Balkenhol, et al. (2017) Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. Jama 318 (22), pp. 2199–2210. Cited by: §2.1.
  • [2] J. C. Caicedo, F. A. González, and E. Romero (2011) Content-based histopathology image retrieval using a kernel-based semantic annotation framework. Journal of biomedical informatics 44 (4), pp. 519–528. Cited by: §2.1.
  • [3] G. Campanella, M. G. Hanna, L. Geneslaw, A. Miraflor, V. W. K. Silva, K. J. Busam, E. Brogi, V. E. Reuter, D. S. Klimstra, and T. J. Fuchs (2019) Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature medicine 25 (8), pp. 1301–1309. Cited by: §2.1, §2.1.
  • [4] C. Chang (2012) A boosting approach for supervised mahalanobis distance metric learning. Pattern Recognition 45 (2), pp. 844–862. Cited by: §2.1.
  • [5] S. Chopra, R. Hadsell, and Y. LeCun (2005) Learning a similarity metric discriminatively, with application to face verification. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), Vol. 1, pp. 539–546. Cited by: §1, §2.5.
  • [6] D. C. Cireşan, A. Giusti, L. M. Gambardella, and J. Schmidhuber (2013) Mitosis detection in breast cancer histology images with deep neural networks. In International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 411–418. Cited by: §2.1.
  • [7] H. D. Couture, J. S. Marron, C. M. Perou, M. A. Troester, and M. Niethammer (2018) Multiple instance learning for heterogeneous images: training a cnn for histopathology. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 254–262. Cited by: §2.1.
  • [8] A. Cruz-Roa, A. Basavanhally, F. González, H. Gilmore, M. Feldman, S. Ganesan, N. Shih, J. Tomaszewski, and A. Madabhushi (2014) Automatic detection of invasive ductal carcinoma in whole slide images with convolutional neural networks. In Medical Imaging 2014: Digital Pathology, Vol. 9041, pp. 904103. Cited by: §2.1.
  • [9] K. Das, S. Conjeti, A. G. Roy, J. Chatterjee, and D. Sheet (2018) Multiple instance learning of deep convolutional neural networks for breast histopathology whole slide classification. In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), pp. 578–581. Cited by: §2.1.
  • [10] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei (2009) Imagenet: a large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255. Cited by: §3.2.
  • [11] H. El Achi, T. Belousova, L. Chen, A. Wahed, I. Wang, Z. Hu, Z. Kanaan, A. Rios, and A. N. Nguyen (2019) Automated diagnosis of lymphoma with digital pathology images using deep learning. Annals of Clinical & Laboratory Science 49 (2), pp. 153–160. Cited by: §1.
  • [12] Y. Gao, W. Liu, S. Arjun, L. Zhu, V. Ratner, T. Kurc, J. Saltz, and A. Tannenbaum (2016) Multi-scale learning based segmentation of glands in digital colonrectal pathology images. In Medical Imaging 2016: Digital Pathology, Vol. 9791, pp. 97910M. Cited by: §2.1.
  • [13] M. Gönen and E. Alpaydin (2008) Localized multiple kernel learning. In Proceedings of the 25th international conference on Machine learning, pp. 352–359. Cited by: §2.1.
  • [14] M. Gönen and E. Alpaydın (2011) Multiple kernel learning algorithms. The Journal of Machine Learning Research 12, pp. 2211–2268. Cited by: §2.1.
  • [15] A. Goode, B. Gilbert, J. Harkes, D. Jukic, and M. Satyanarayanan (2013) OpenSlide: a vendor-neutral software foundation for digital pathology. Journal of pathology informatics 4. Cited by: §4.1.
  • [16] A. Gordo, J. Almazán, J. Revaud, and D. Larlus (2016) Deep image retrieval: learning global representations for image search. In European conference on computer vision, pp. 241–257. Cited by: §1, §2.1.
  • [17] H. Gouk, B. Pfahringer, and M. Cree (2016) Learning distance metrics for multi-label classification. In Asian Conference on Machine Learning, pp. 318–333. Cited by: §2.1.
  • [18] N. Hashimoto, D. Fukushima, R. Koga, Y. Takagi, K. Ko, K. Kohno, M. Nakaguro, S. Nakamura, H. Hontani, and I. Takeuchi (2020) Multi-scale domain-adversarial multiple-instance cnn for cancer subtype classification with unannotated histopathological images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3852–3861. Cited by: §1, §2.1.
  • [19] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §3.2.
  • [20] N. Hegde, J. D. Hipp, Y. Liu, M. Emmert-Buck, E. Reif, D. Smilkov, M. Terry, C. J. Cai, M. B. Amin, C. H. Mermel, et al. (2019) Similar image search for histopathology: smily. NPJ digital medicine 2 (1), pp. 1–9. Cited by: §1, §2.1, §2.1.
  • [21] L. Hou, D. Samaras, T. M. Kurc, Y. Gao, J. E. Davis, and J. H. Saltz (2016) Patch-based convolutional neural network for whole slide tissue image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2424–2433. Cited by: §2.1.
  • [22] M. Ilse, J. Tomczak, and M. Welling (2018) Attention-based deep multiple instance learning. In International conference on machine learning, pp. 2127–2136. Cited by: §1, §2.1, §2.4.
  • [23] O. Jimenez-del-Toro, S. Otálora, M. Atzori, and H. Müller (2017) Deep multimodal case–based retrieval for large histopathology datasets. In International Workshop on Patch-based Techniques in Medical Imaging, pp. 149–157. Cited by: §2.1, §2.1.
  • [24] R. Jin, S. Wang, and Z. Zhou (2009) Learning a distance metric from multi-instance multi-label data. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 896–902. Cited by: §2.1.
  • [25] S. Kalra, H. Tizhoosh, C. Choi, S. Shah, P. Diamandis, C. J. Campbell, and L. Pantanowitz (2020) Yottixel–an image search engine for large archives of histopathology whole slide images. Medical Image Analysis, pp. 101757. Cited by: §2.1, §2.1.
  • [26] D. Komura, K. Fukuta, K. Tominaga, A. Kawabe, H. Koda, R. Suzuki, H. Konishi, T. Umezaki, T. Harada, and S. Ishikawa (2018) Luigi: large-scale histopathological image retrieval system using deep texture representations. biorxiv, pp. 345785. Cited by: §2.1.
  • [27] C. Mercan, S. Aksoy, E. Mercan, L. G. Shapiro, D. L. Weaver, and J. G. Elmore (2017) Multi-instance multi-label learning for multi-class classification of whole slide breast histopathology images. IEEE transactions on medical imaging 37 (1), pp. 316–325. Cited by: §2.1.
  • [28] H. Miyoshi, K. Sato, Y. Kabeya, S. Yonezawa, H. Nakano, Y. Takeuchi, I. Ozawa, S. Higo, E. Yanagida, K. Yamada, et al. (2020) Deep learning shows the capability of high-level computer-aided diagnosis in malignant lymphoma. Laboratory Investigation, pp. 1–11. Cited by: §1.
  • [29] H. S. Mousavi, V. Monga, G. Rao, and A. U. Rao (2015) Automated discrimination of lower and higher grade gliomas based on histopathological image analysis. Journal of pathology informatics 6. Cited by: §2.1.
  • [30] N. Otsu (1979) A threshold selection method from gray-level histograms. IEEE transactions on systems, man, and cybernetics 9 (1), pp. 62–66. Cited by: §4.1.
  • [31] T. Peng, M. Boxberg, W. Weichert, N. Navab, and C. Marr (2019) Multi-task learning of a deep k-nearest neighbour network for histopathological image classification and retrieval. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 676–684. Cited by: §1, §2.1, §2.1, §2.1.
  • [32] N. Qian (1999) On the momentum term in gradient descent learning algorithms. Neural networks 12 (1), pp. 145–151. Cited by: §4.1.
  • [33] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet (2007) More efficiency in multiple kernel learning. In Proceedings of the 24th international conference on Machine learning, pp. 775–782. Cited by: §2.1.
  • [34] R. Schaer, S. Otálora, O. Jimenez-del-Toro, M. Atzori, and H. Müller (2019) Deep learning-based retrieval system for gigapixel histopathology cases and the open access literature. Journal of pathology informatics 10. Cited by: §2.1.
  • [35] C. Shen, J. Kim, and L. Wang (2010) Scalable large-margin mahalanobis distance metric learning. IEEE transactions on neural networks 21 (9), pp. 1524–1530. Cited by: §2.1.
  • [36] X. Shi, M. Sapkota, F. Xing, F. Liu, L. Cui, and L. Yang (2018) Pairwise based deep ranking hashing for histopathology image classification and retrieval. Pattern Recognition 81, pp. 14–22. Cited by: §1, §2.1.
  • [37] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf (2006) Large scale multiple kernel learning. The Journal of Machine Learning Research 7, pp. 1531–1565. Cited by: §2.1.
  • [38] P. Sudharshan, C. Petitjean, F. Spanhol, L. E. Oliveira, L. Heutte, and P. Honeine (2019) Multiple instance learning for histopathological breast cancer image classification. Expert Systems with Applications 117, pp. 103–111. Cited by: §2.1.
  • [39] S. H. Swerdlow, E. Campo, N. L. Harris, E. S. Jaffe, S. A. Pileri, H. Stein, J. Thiele, J. W. Vardiman, et al. (2017) WHO classification of tumours of haematopoietic and lymphoid tissues. rev. 4th ed edition, World Health Organization classification of tumours, International Agency for Research on Cancer. Cited by: §1.
  • [40] K. Tanizaki, N. Hashimoto, Y. Inatsu, H. Hontani, and I. Takeuchi (2020) Computing valid p-values for image segmentation by selective inference. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9553–9562. Cited by: §2.1.
  • [41] H. Tokunaga, Y. Teramoto, A. Yoshizawa, and R. Bise (2019) Adaptive weighting multi-field-of-view cnn for semantic segmentation in pathology. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 12597–12606. Cited by: §2.1.
  • [42] J. Wang, Y. Song, T. Leung, C. Rosenberg, J. Wang, J. Philbin, B. Chen, and Y. Wu (2014) Learning fine-grained image similarity with deep ranking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1386–1393. Cited by: §1, §2.1.
  • [43] S. Wang, Y. Zhu, L. Yu, H. Chen, H. Lin, X. Wan, X. Fan, and P. Heng (2019) RMDL: recalibrated multi-instance deep learning for whole slide gastric image classification. Medical image analysis 58, pp. 101549. Cited by: §2.1.
  • [44] K. Q. Weinberger and L. K. Saul (2009) Distance metric learning for large margin nearest neighbor classification.. Journal of machine learning research 10 (2). Cited by: §2.1.
  • [45] Y. Xu, Z. Jia, Y. Ai, F. Zhang, M. Lai, I. Eric, and C. Chang (2015) Deep convolutional activation features for large scale brain tumor histopathology image classification and segmentation. In 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 947–951. Cited by: §2.1.
  • [46] P. Yang, Y. Zhai, L. Li, H. Lv, J. Wang, C. Zhu, and R. Jiang (2020) A deep metric learning approach for histopathological image retrieval. Methods 179, pp. 14–25. Cited by: §1, §2.1.
  • [47] L. Zheng, A. W. Wetzel, J. Gilbertson, and M. J. Becich (2003) Design and analysis of a content-based pathology image retrieval system. IEEE transactions on information technology in biomedicine 7 (4), pp. 249–255. Cited by: §2.1.
  • [48] Y. Zheng, B. Jiang, J. Shi, H. Zhang, and F. Xie (2019) Encoding histopathological wsis using gnn for scalable diagnostically relevant regions retrieval. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 550–558. Cited by: §2.1.
  • [49] Y. Zheng, Z. Jiang, H. Zhang, F. Xie, Y. Ma, H. Shi, and Y. Zhao (2018) Histopathological whole slide image analysis using context-based cbir. IEEE transactions on medical imaging 37 (7), pp. 1641–1652. Cited by: §2.1.