Contrastive Representation Learning for Rapid Intraoperative Diagnosis of Skull Base Tumors Imaged Using Stimulated Raman Histology

08/08/2021 ∙ by Cheng Jiang, et al. ∙ University of Michigan

Background: Accurate diagnosis of skull base tumors is essential for providing personalized surgical treatment strategies. Intraoperative diagnosis can be challenging due to tumor diversity and the lack of intraoperative pathology resources. Objective: To develop an independent and parallel intraoperative pathology workflow that can provide rapid and accurate skull base tumor diagnoses using label-free optical imaging and artificial intelligence (AI). Method: We used a fiber laser-based, label-free, non-consumptive, high-resolution microscopy method (< 60 s per 1 × 1 mm²), called stimulated Raman histology (SRH), to image a consecutive, multicenter cohort of skull base tumor patients. SRH images were then used to train a convolutional neural network (CNN) model using three representation learning strategies: cross-entropy, self-supervised contrastive learning, and supervised contrastive learning. Our trained CNN models were tested on a held-out, multicenter SRH dataset. Results: SRH was able to image the diagnostic features of both benign and malignant skull base tumors. Of the three representation learning strategies, supervised contrastive learning most effectively learned the distinctive and diagnostic SRH image features for each of the skull base tumor types. In our multicenter testing set, cross-entropy achieved an overall diagnostic accuracy of 91.5%, self-supervised contrastive learning 83.1%, and supervised contrastive learning 96.6%. Our trained model was able to segment tumor-normal margins and detect regions of microscopic tumor infiltration in whole-slide SRH images. Conclusion: SRH with AI models trained using contrastive representation learning can provide rapid and accurate intraoperative diagnosis of skull base tumors.




1 Introduction

Skull base tumors are a diverse group of lesions that span the full benign-malignant spectrum and require personalized surgical treatment strategies. Determining optimal surgical goals requires accurate tumor diagnosis at the time of surgery. While some skull base tumors have classic clinical presentations and radiographic features, diagnostic challenges in skull base neurosurgery arise from rare lesions that resemble common tumors (e.g., primary central nervous system lymphoma in the cerebellopontine angle [ierokomos1985primary]) and from common tumors occurring in uncommon locations (e.g., ectopic suprasellar pituitary adenomas [zhu2020suprasellar]). Misdiagnosis can result in unnecessary subtotal resection or in excessive morbidity due to overly aggressive surgical management [hollon2016surgical].

The current standard of care for intraoperative diagnosis is based on hematoxylin and eosin (H&E) staining of processed surgical specimens, which requires interpretation by a board-certified pathologist. Because this process is labor- and resource-intensive, many medical centers do not have intraoperative pathology consultation services. Moreover, the pathology workforce is contracting, with an overall reduction of 18% between 2007 and 2017, and the shortage is projected to worsen [metter2019trends] [robboy2013pathologist]. Fewer pathologists available for intraoperative consultation, combined with the limitations of the current diagnostic workflow, can result in suboptimal surgical management. Therefore, skull base neurosurgery, and neurosurgical oncology in general, warrant an alternative and innovative solution.

Stimulated Raman histology (SRH) is a rapid, label-free, high-resolution optical imaging method used for intraoperative evaluation of fresh, unprocessed tissue specimens [freudiger2008label] [orringer2017rapid]. We have previously shown in a multicenter, prospective clinical trial that SRH combined with artificial intelligence (AI) models can achieve human-level performance for the intraoperative diagnosis of the most common brain tumor subtypes and of recurrent primary brain tumors [hollon2020near] [hollon2021rapid]. Leveraging the latest advances in machine learning and computer vision, our models detect cytologic and histomorphologic features in brain tumors to provide near-real-time diagnoses (approximately 2 min) without the need for tissue processing or human interpretation. The combination of SRH and AI represents an alternative and parallel workflow to conventional H&E histology.

Here, we aim to develop an integrated computer vision system for rapid intraoperative diagnosis of skull base tumors using SRH and AI. To improve upon our previous methods, we applied a new AI training technique, contrastive representation learning, which boosted our model's ability to robustly detect diagnostic features in SRH images and outperformed our previous state-of-the-art results. We also show that our model can effectively segment tumor-normal margins and detect regions of microscopic tumor infiltration in grossly normal surgical specimens.

2 Methods

2.1 Study Design

The objectives of the study were to: (1) determine if SRH can capture the diagnostic features of skull base tumors and (2) develop an AI-based computer vision system that combines clinical SRH and deep neural networks to achieve human-level performance on the intraoperative classification of skull base tumors. Patient enrollment for intraoperative SRH imaging began June 1, 2015. Inclusion criteria for SRH imaging included: (1) male or female; (2) subjects undergoing central nervous system tumor resection, including skull base surgery, at Michigan Medicine (University of Michigan, UM) and New York University (NYU); (3) subject or durable power of attorney able to give informed consent; and (4) subjects in whom there was additional specimen beyond what was needed for routine clinical diagnosis. Institutional Review Board approval was obtained prior to initiating patient imaging (HUM00083059). We then trained and validated a well-established convolutional neural network (CNN) architecture (ResNet [he2016deep]) on an image classification task to provide rapid and automated evaluation of fresh surgical specimens imaged with SRH. CNN performance was then tested using a held-out, multicenter (UM and NYU), prospective SRH testing dataset.

2.2 Stimulated Raman Histology

All images were obtained using a clinical stimulated Raman scattering (SRS) microscope [hollon2020near][freudiger2014stimulated]. Fresh, unprocessed surgical specimens were excited with a dual-wavelength fiber laser with a pump beam fixed at 790 nm and a Stokes beam tunable from 1,015 to 1,050 nm. This configuration allows imaging at Raman shifts in the range of 2,800 to 3,130 cm-1. Images were acquired via beam scanning with a spatial sampling of 450 nm per pixel, 1,000 pixels per strip, and an imaging speed of 0.4 Mpixel/s per Raman shift. The NIO Laser Imaging System (Invenio Imaging, Inc., Santa Clara, CA), a clinical fiber laser–based SRS microscope, was used to acquire all images in the testing set. For SRH, samples were imaged sequentially at two Raman shifts: 2,845 cm-1 and 2,930 cm-1. Lipid-rich regions (for example, myelinated white matter) demonstrate high SRS signal at 2,845 cm-1 due to CH2 symmetric stretching in fatty acids. Cellular regions produce high 2,930 cm-1 intensity and large 2,930:2,845 signal ratios due to their high protein and nucleic acid content. A virtual H&E color scheme is applied to transform the raw SRS images into SRH images for clinical use and pathologic review. The NIO Imaging System (Invenio Imaging, Inc.) is delivered ready to use for image acquisition. SRH images can be reviewed locally using the integrated high-definition monitor, remotely via the health system's picture archiving and communication system, or via a cloud-based image viewer that allows images to be reviewed anywhere with a high-speed internet connection in less than 30 s.
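As a quick consistency check on these acquisition parameters, the sketch below computes the Raman shift probed by a pump/Stokes pair from the standard relation Δν = 1/λ_pump − 1/λ_Stokes (wavelengths converted to cm), plus an idealized scan time for a square field. The helper names are hypothetical, and the timing function assumes that the two Raman shifts are acquired one after the other at 0.4 Mpixel/s each, with no stage, tuning, or strip-stitching overhead.

```python
# Hypothetical helpers for sanity-checking the SRS acquisition parameters.
NM_PER_CM = 1e7  # 1 cm = 1e7 nm, so 1e7 / wavelength_nm gives wavenumbers in cm^-1


def raman_shift_cm1(pump_nm: float, stokes_nm: float) -> float:
    """Raman shift (cm^-1) probed by a pump/Stokes pair: 1/λ_pump - 1/λ_Stokes."""
    return NM_PER_CM / pump_nm - NM_PER_CM / stokes_nm


def stokes_for_shift_nm(pump_nm: float, shift_cm1: float) -> float:
    """Stokes wavelength (nm) needed to probe a given Raman shift with a fixed pump."""
    return NM_PER_CM / (NM_PER_CM / pump_nm - shift_cm1)


def scan_seconds(side_mm: float, pixel_nm: float = 450.0,
                 mpixel_per_s: float = 0.4, n_shifts: int = 2) -> float:
    """Idealized time to image a square field at each Raman shift in sequence.

    Assumes every shift is scanned at `mpixel_per_s` and ignores all overhead.
    """
    pixels_per_side = side_mm * 1e6 / pixel_nm  # 1 mm = 1e6 nm
    return n_shifts * pixels_per_side ** 2 / (mpixel_per_s * 1e6)


if __name__ == "__main__":
    pump = 790.0
    print(f"tunable range: {raman_shift_cm1(pump, 1015):.0f} to "
          f"{raman_shift_cm1(pump, 1050):.0f} cm^-1")
    for shift in (2845, 2930):
        print(f"{shift} cm^-1 needs Stokes at {stokes_for_shift_nm(pump, shift):.1f} nm")
    print(f"1 x 1 mm field, two shifts: about {scan_seconds(1.0):.0f} s")
```

With a 790 nm pump, the 1,015 to 1,050 nm Stokes range works out to roughly 2,806 to 3,134 cm-1, matching the stated range; the 2,845 and 2,930 cm-1 shifts need Stokes wavelengths near 1,019.0 and 1,027.9 nm; and a 1 × 1 mm field at two shifts takes about 25 s under these idealized assumptions, within the < 60 s quoted in the abstract.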

2.3 Image Dataset and Data Preprocessing

SRH imaging was completed using two imaging systems: a prototype clinical SRH microscope [orringer2017rapid] and the NIO Imaging System. All clinical specimens were imaged in the operating room using our SRH imagers. In addition to the clinical specimens, we used cadaveric specimens of normal tissue (brain, dura, and pituitary gland) to improve our classifier's ability to detect normal tissue and avoid false-positive errors. Specimens that were grossly inadequate due to hemorrhage, excessive coagulation, or necrosis were excluded from analysis. For image preprocessing, the 2,845 cm-1 image was subtracted from the 2,930 cm-1 image, and the resultant image was concatenated with the two raw channels to generate a three-channel SRH image (2,930 cm-1 minus 2,845 cm-1, red; 2,845 cm-1, green; and 2,930 cm-1, blue). A 300 × 300-pixel non-overlapping sliding window was used to generate image patches. Our laboratory has previously trained a neural network model that filters patches into three classes for automated patch-level annotation: normal brain, tumor tissue, and nondiagnostic tissue [hollon2020near][hollon2021rapid]. Normal dura was included in the nondiagnostic class because it lacks cytologic features (Figure 1).
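A minimal NumPy sketch of this preprocessing, assuming each Raman channel arrives as a co-registered 2-D array: it builds the three-channel image in the stated channel order and cuts it into non-overlapping 300 × 300 patches. The function names are hypothetical, and dropping partial tiles at the right and bottom edges is an assumption, since the text does not say how edges are handled.

```python
import numpy as np


def make_srh_stack(ch2845: np.ndarray, ch2930: np.ndarray) -> np.ndarray:
    """Stack two co-registered Raman channels into an (H, W, 3) image.

    Channel order: red = 2930 - 2845, green = 2845, blue = 2930.
    """
    if ch2845.shape != ch2930.shape:
        raise ValueError("channels must be co-registered and the same shape")
    # Cast first so the subtraction cannot wrap around for unsigned raw data.
    lipid = ch2845.astype(np.float32)
    protein = ch2930.astype(np.float32)
    return np.stack([protein - lipid, lipid, protein], axis=-1)


def tile_patches(image: np.ndarray, size: int = 300) -> np.ndarray:
    """Non-overlapping sliding window; partial tiles at the edges are dropped."""
    rows, cols = image.shape[0] // size, image.shape[1] // size
    patches = [
        image[r * size:(r + 1) * size, c * size:(c + 1) * size]
        for r in range(rows)
        for c in range(cols)
    ]
    if not patches:
        return np.empty((0, size, size) + image.shape[2:], dtype=image.dtype)
    return np.stack(patches)
```

Casting to float before subtracting matters if the raw channels are stored as unsigned integers: wherever the 2,845 cm-1 signal exceeds the 2,930 cm-1 signal, an unsigned difference would silently wrap to a large positive value instead of going negative.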

2.4 Model Training

To train the model effectively, only tumor classes with more than 15 patients were included: pituitary adenomas, meningiomas, schwannomas, primary central nervous system lymphoma, and metastases. Normal classes included normal brain (grey and white matter) and normal pituitary gland (anterior and posterior gland). Six hundred patients were included in the training set. We implemented the ResNet50 CNN architecture, which has 25.6 million trainable parameters, as our SRH feature extractor [he2016deep]. Three loss functions were used for model training: supervised categorical cross-entropy, self-supervised contrastive [chen2020simple], and supervised contrastive [khosla2020supervised]. The general contrastive loss function is:

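$$\mathcal{L}_i = -\log \frac{\exp\big(\mathrm{sim}(z_i, z_p)\big)}{\exp\big(\mathrm{sim}(z_i, z_p)\big) + \sum_{z_n \in N(i)} \exp\big(\mathrm{sim}(z_i, z_n)\big)}$$

where $z_i$ is the vector representation of image $x_i$ after a feedforward pass through the SRH feature extractor, $z_p$ is the representation of a positive example for image $x_i$, and $N(i)$ is the set of representations of negative examples for image $x_i$ (Figure 1B). Positive examples can be transformations of the same image (self-supervised) or different images sampled from the same class (supervised). The feature extraction model produces a 2048-dimensional feature vector for each input image, and each feature vector is further projected down to 128 dimensions before the similarity metric is computed. The cosine similarity metric was used in our contrastive loss:

$$\mathrm{sim}(z_i, z_j) = \frac{z_i^\top z_j}{\lVert z_i \rVert \, \lVert z_j \rVert}$$
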
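The loss can be sketched in NumPy as follows. This is an illustration, not the authors' training code: it follows the supervised contrastive formulation of Khosla et al. [khosla2020supervised], in which every other sample in the batch appears in the softmax denominator and the loss is averaged over all positives of an anchor. When each anchor has exactly one positive (two augmented views per image, with labels set to the source-image index), it reduces to the SimCLR-style loss [chen2020simple]. The temperature of 0.1 and the single linear 2048 → 128 projection are assumptions; both reference papers use a temperature, and their projection heads are small MLPs rather than one linear layer.

```python
import numpy as np


def project(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Hypothetical linear projection head: (N, 2048) @ (2048, 128) -> (N, 128)."""
    return features @ weights


def cosine_similarity_matrix(z: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of z."""
    z_unit = z / np.linalg.norm(z, axis=1, keepdims=True)
    return z_unit @ z_unit.T


def contrastive_loss(z: np.ndarray, labels, temperature: float = 0.1) -> float:
    """Supervised contrastive loss (Khosla et al. "L_out" form) over one batch.

    Samples sharing a label are positives for each other. Pass the tumor class
    for the supervised variant, or the source-image index of each augmented
    view for the self-supervised variant.
    """
    labels = np.asarray(labels)
    n = z.shape[0]
    not_self = ~np.eye(n, dtype=bool)
    # Temperature-scaled cosine similarities; the anchor is excluded from its own softmax.
    logits = np.where(not_self, cosine_similarity_matrix(z) / temperature, -np.inf)
    # Numerically stable log-softmax over all other samples in the batch.
    row_max = logits.max(axis=1, keepdims=True)
    log_prob = logits - row_max - np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]) & not_self
    n_pos = positives.sum(axis=1)
    has_pos = n_pos > 0  # anchors without any positive contribute nothing
    per_anchor = -np.where(positives, log_prob, 0.0).sum(axis=1)[has_pos] / n_pos[has_pos]
    return float(per_anchor.mean())
```

Passing class labels turns "different images sampled from the same class" into positives; passing image indices restricts positives to views of the same image, which is the only difference between the supervised and self-supervised variants in this sketch.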
Contrastive loss functions have some theoretical advantages over cross-entropy (e.g., robustness to label noise), and we hypothesize that contrastive representation learning is ideally suited for patch-based classification. The contrastive learning models were optimized using stochastic gradient descent, and each model was trained using a batch size of 176 for 4 days on 8 Nvidia (Santa Clara, CA) GeForce RTX 2080 Ti GPUs. After feature extractor training was completed, a linear classifier was trained on the learned features using cross-entropy loss (Figure 1C). The linear classification layers were trained using the Adam optimizer, each with a batch size of 64 for 24 hours on 2 Nvidia GeForce RTX 2080 Ti GPUs. We compared our approaches to a conventional model trained using cross-entropy with a batch size of 64 for 24 hours on 2 Nvidia GeForce RTX 2080 Ti GPUs.
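The linear evaluation stage can be sketched as multinomial logistic regression on frozen features, trained with cross-entropy and a hand-written Adam update. This is a hypothetical, full-batch NumPy version for illustration only; the learning rate, step count, and initialization are assumptions, not the settings used in the study (which trained with mini-batches of 64).

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def train_linear_probe(features, labels, n_classes, lr=1e-2, steps=500, seed=0):
    """Fit W, b on frozen features by minimizing mean cross-entropy with Adam."""
    rng = np.random.default_rng(seed)
    n, d = features.shape
    params = {"W": 0.01 * rng.normal(size=(d, n_classes)), "b": np.zeros(n_classes)}
    m = {k: np.zeros_like(p) for k, p in params.items()}  # first-moment estimates
    v = {k: np.zeros_like(p) for k, p in params.items()}  # second-moment estimates
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    onehot = np.eye(n_classes)[labels]
    for t in range(1, steps + 1):
        probs = softmax(features @ params["W"] + params["b"])
        grad_logits = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        grads = {"W": features.T @ grad_logits, "b": grad_logits.sum(axis=0)}
        for k in params:
            m[k] = beta1 * m[k] + (1 - beta1) * grads[k]
            v[k] = beta2 * v[k] + (1 - beta2) * grads[k] ** 2
            m_hat = m[k] / (1 - beta1 ** t)  # bias correction
            v_hat = v[k] / (1 - beta2 ** t)
            params[k] -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return params


def predict_proba(params, features):
    return softmax(features @ params["W"] + params["b"])
```

Because the feature extractor is frozen, only the weight matrix and bias change during this stage, which is why it is far cheaper than training the feature extractor itself.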

2.5 Model Testing

We randomly held out 20% of our data as a testing dataset and used the other 80% for training. The test set consists of 118 patients and 489 whole slides. As in our training data preparation, 300 × 300-pixel patches were generated from each whole-slide image, and each patch underwent a feedforward pass through our trained models to compute a probability distribution over the output classes. To compute the slide-level or patient-level accuracy, we summed the patch-level probability distributions for each whole slide or patient, respectively. The aggregated probabilities were then renormalized to compute the final slide- or patient-level class probabilities. This is a “soft” aggregation of the classification, and it is superior to “hard” aggregation of the patches, such as a simple majority voting procedure, because it takes into account the full probability distribution for each patch.
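The soft aggregation rule, and the way it can disagree with majority voting, fits in a few lines. The sketch below uses NumPy with hypothetical function names; `group_ids` stands in for slide or patient identifiers.

```python
import numpy as np


def soft_aggregate(patch_probs) -> np.ndarray:
    """Sum patch-level probability distributions, then renormalize to sum to 1."""
    total = np.asarray(patch_probs, dtype=float).sum(axis=0)
    return total / total.sum()


def majority_vote(patch_probs) -> int:
    """'Hard' aggregation: each patch casts one vote for its most probable class."""
    probs = np.asarray(patch_probs)
    votes = probs.argmax(axis=1)
    return int(np.bincount(votes, minlength=probs.shape[1]).argmax())


def aggregate_by_group(patch_probs, group_ids) -> dict:
    """Soft-aggregate patches separately for each slide or patient identifier."""
    probs = np.asarray(patch_probs, dtype=float)
    group_ids = np.asarray(group_ids)
    return {g: soft_aggregate(probs[group_ids == g]) for g in np.unique(group_ids)}


patches = np.array([[0.55, 0.45],
                    [0.55, 0.45],
                    [0.05, 0.95]])
print(soft_aggregate(patches), majority_vote(patches))
```

For these three toy patches, soft aggregation gives roughly [0.38, 0.62] and favors class 1, because the third patch is highly confident, whereas majority voting returns class 0 from two weakly confident votes.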


2.6 SRH Semantic Segmentation of Skull Base Tumors

We have previously developed a method for segmenting SRH images using patch-level predictions [hollon2020near] [hollon2021rapid]. This technique integrates a local neighborhood of overlapping patch predictions to generate a high-resolution probability heatmap. In previous work, we implemented a three-channel (RGB) probability heatmap that included spatial information for tumor, normal brain, and nondiagnostic predictions. Here, we used a novel technique whereby we generate a two-channel image with the predicted tumor class (e.g., pituitary adenoma or craniopharyngioma) as the first channel (i.e., red) and the most probable nontumor class (e.g., normal pituitary, normal brain, nondiagnostic) as the second channel (i.e., blue). This method has an advantage in the setting of skull base tumors because it allows the nontumor class to vary depending on the surgical specimen. For example, it will automatically produce a meningioma-normal dura margin heatmap based on the predicted meningioma diagnosis.
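The sketch below shows one way overlapping patch predictions might be merged into a per-pixel heatmap and then reduced to the two-channel tumor versus nontumor map. Several details are assumptions not specified in the text: each pixel receives the mean of the probability vectors of all patches covering it, uncovered pixels stay at zero, and the tumor and nontumor classes are chosen once per specimen from the mean patch probabilities. Function names and class indices are hypothetical.

```python
import numpy as np


def probability_heatmap(patch_probs, patch_origins, image_shape, patch_size=300):
    """Average overlapping patch-level predictions into an (H, W, C) map."""
    patch_probs = np.asarray(patch_probs, dtype=float)
    h, w = image_shape
    acc = np.zeros((h, w, patch_probs.shape[1]))
    count = np.zeros((h, w, 1))
    for probs, (r, c) in zip(patch_probs, patch_origins):
        acc[r:r + patch_size, c:c + patch_size] += probs
        count[r:r + patch_size, c:c + patch_size] += 1
    # Pixels covered by no patch are left at zero instead of dividing by zero.
    return np.divide(acc, count, out=np.zeros_like(acc), where=count > 0)


def tumor_vs_nontumor_channels(heatmap, patch_probs, tumor_classes, nontumor_classes):
    """Two channels: predicted tumor class and most probable nontumor class."""
    specimen_probs = np.asarray(patch_probs, dtype=float).mean(axis=0)
    tumor = max(tumor_classes, key=lambda k: specimen_probs[k])
    nontumor = max(nontumor_classes, key=lambda k: specimen_probs[k])
    channels = np.stack([heatmap[..., tumor], heatmap[..., nontumor]], axis=-1)
    return channels, tumor, nontumor


def to_rgb(channels):
    """Render tumor probability as red and nontumor probability as blue."""
    rgb = np.zeros(channels.shape[:2] + (3,))
    rgb[..., 0] = channels[..., 0]
    rgb[..., 2] = channels[..., 1]
    return rgb
```

Because the nontumor channel is picked per specimen, a predicted meningioma on dura and a predicted pituitary adenoma beside normal gland each get their own contrasting class without any change to the code.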

Figure 2: SRH of skull base tumors shows cytologic and histoarchitectural features. Diagnostic features of normal skull base parenchyma and skull base tumors are imaged effectively using SRH. (A) Normal grey matter shows pyramidal cell bodies of cortical neurons. Lipid-rich myelinated axons have high 2,845 cm-1 signal and appear white in our virtual H&E color scheme. (B) Normal anterior pituitary gland has acinar histoarchitecture with an intact reticulin network. (C) Skull base dura is mainly acellular, fibrous tissue with collagenous and elastic fibers. (D) Schwannoma (vestibular schwannoma shown) shows classic spindle cell cytology combined with Antoni A and B histoarchitectural patterns. (E) Pituitary adenomas show monotonous cytology with loss of acinar structure. (F) Meningiomas have large nuclei and whorl patterns throughout the specimen. (G) Adamantinomatous craniopharyngiomas are complex specimens and uniquely show wet keratin. (H) Clival chordomas have bubbly, physaliferous cells. (I) Chondrosarcomas show chondrocytes embedded in a dense cartilaginous matrix.

3 Results

3.1 SRH Reveals Diagnostic Features of Skull Base Tumors

We first assessed the ability of SRH to effectively capture the diagnostic features of normal skull base parenchyma and skull base tumors. Figure 1A shows the general workflow for obtaining SRH images. Figure 2 shows SRH images of normal brain, anterior pituitary gland, and skull base dura (Figure 2A-2C). Classic histologic features are seen, including neuronal cell bodies in grey matter, acinar histoarchitecture in pituitary gland, and dense collagen extracellular matrix in dura. Meningiomas, pituitary adenomas, and schwannomas are the most common skull base tumors encountered (Figure 2D-2F). SRH captures spindle cell cytology and Antoni histoarchitectural patterns in schwannomas, monotonous hypercellularity in pituitary adenomas, and whorls in meningiomas. Less common and malignant tumors are shown in Figure 2G-2I. Wet keratin is well visualized in adamantinomatous craniopharyngiomas. Bubbly, physaliferous cells are abundant in clival chordomas. Chondrocytes embedded in a dense cartilaginous matrix are seen in skull base chondrosarcomas.

                   Patch                  Slide                  Patient
Model              Acc    Top 2  MCA      Acc    Top 2  MCA      Acc    Top 2  MCA
CE                 0.830  0.930  0.822    0.871  0.951  0.899    0.915  0.958  0.931
SSL + Linear       0.599  0.781  0.567    0.769  0.894  0.772    0.831  0.924  0.824
SupCon + Linear    0.866  0.953  0.864    0.914  0.969  0.920    0.966  0.983  0.934
Table 1: Model performances on held-out, multicenter SRH testing set. Acc, accuracy; Top 2, correct class was predicted first or second most probable; MCA, mean class accuracy; CE, cross-entropy; SSL, self-supervised contrastive learning; and SupCon, supervised contrastive learning.
Figure 3: Automated intraoperative classification of skull base tumors. Confusion matrices for each of the three training strategies on our held-out, multicenter testing set. Supervised cross-entropy achieved an overall diagnostic accuracy of 91.5%. The majority of errors occurred in the metastatic tumor class, with a class accuracy of 60.0%. Self-supervised contrastive learning (learning without class labels) performed worse, as expected, but still reached an accuracy of greater than 83%. Our top-performing model was trained using supervised contrastive learning, with an overall accuracy of 96.6% and two errors in the metastasis class.

3.2 Automated Classification of Skull Base Tumors Using SRH

After determining that SRH can effectively capture the diagnostic features of skull base tumors, we trained our CNN using the three representation learning methods (Figure 1B). All models were trained for 4 days. We then tested the models on our held-out multicenter dataset; the results are shown in Table 1. We evaluated our models at the patch, slide, and patient levels using overall top-1 accuracy, top-2 accuracy, and mean class accuracy. The model trained using supervised contrastive representation learning had the best overall performance, with the top score in all three metrics at every level. Our supervised contrastive model achieved a patient-level diagnostic accuracy of 96.6% (114/118 patients) and a mean class accuracy of 93.4%. These results outperformed our cross-entropy model and improved upon our previous results [hollon2020near]. Figure 3 summarizes the model performance for each class. An important finding was that the metastatic tumor class was a major source of diagnostic errors for the cross-entropy model. We hypothesize that this reflects the difficulty cross-entropy training has in representing classes with highly diverse image features (e.g., melanoma versus adenocarcinoma versus squamous cell carcinoma).
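The three evaluation metrics can be computed from a matrix of predicted class probabilities (one row per patch, slide, or patient) and integer labels. A minimal NumPy sketch with hypothetical function names:

```python
import numpy as np


def top_k_accuracy(probs: np.ndarray, labels: np.ndarray, k: int = 1) -> float:
    """Fraction of rows whose true class is among the k most probable classes."""
    top_k = np.argsort(probs, axis=1)[:, ::-1][:, :k]  # class indices, most probable first
    return float(np.mean([y in row for y, row in zip(labels, top_k)]))


def mean_class_accuracy(probs: np.ndarray, labels: np.ndarray) -> float:
    """Average of per-class accuracies, so every class counts equally."""
    preds = probs.argmax(axis=1)
    return float(np.mean([(preds[labels == c] == c).mean() for c in np.unique(labels)]))
```

Mean class accuracy averages per-class recall, so a poorly classified class with few patients (such as metastases) pulls it down more than it pulls down overall accuracy.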

Figure 4: Contrastive representation learning t-distributed stochastic neighbor embedding (tSNE) of classes. This is a tSNE plot of SRH patch representations from convolutional neural networks trained using (A) self-supervised contrastive, (B) cross-entropy, and (C) supervised contrastive loss functions. Each point represents a single SRH patch randomly sampled from our testing set. Consistent with our diagnostic accuracy results, discrete class clusters are most discernible in our supervised contrastive representations, including the metastatic tumor class. Note that the tSNE algorithm does not depend on class labels, and the color coding is used to demonstrate that data clusters correspond to tumor classes.

3.3 Visualizing Learned SRH Representations

Given these test results, we aimed to qualitatively evaluate how effectively our models represented our SRH images. We used a data visualization technique called t-distributed stochastic neighbor embedding (tSNE), which projects high-dimensional data onto a two-dimensional plane while preserving local structure in the data. Data points with similar representations are located in close proximity, forming discrete clusters. Figure 4 shows the tSNE plots for the three models. Compared to cross-entropy or self-supervised contrastive learning, the supervised contrastive model shows the most well-formed clusters that match tumor diagnoses. The most salient improvement is how much more effectively the metastatic class is clustered; supervised contrastive representation learning explicitly encourages the model to learn image features that are common to each tumor class, regardless of how diverse the underlying pathology may be (e.g., melanoma versus adenocarcinoma).

Figure 5: Automated detection of microscopic tumor infiltration. Whole-slide SRH image of grossly normal dura sampled after resection of a tuberculum sellae meningioma. Microscopic tumor infiltration was detected by our trained model, as shown by the predicted meningioma heatmap over the entire whole-slide image. The majority of the specimen is normal dura, with the exception of several small regions of clear meningioma infiltration. Our predicted heatmaps can be converted into a colored transparency overlay to be used when reviewing SRH images intraoperatively. Heatmaps provide spatial information and serve as an additional level of decision support for evaluating intraoperative specimens.
Figure 6: SRH semantic segmentation identifies tumor-normal margins and diagnostic regions. (A) SRH image of a meningioma-normal dura margin. Corresponding prediction heatmaps show excellent delineation of tumor regions adjacent to normal tissue. (B) SRH image of pituitary adenoma–normal pituitary gland margin. Clear region of monotonous, hypercellular pituitary adenoma adjacent to normal acinar structure of the anterior pituitary gland. (C) SRH image of a papillary craniopharyngioma. Even though our model was not trained on craniopharyngiomas, it is able to detect regions of tumor and discriminate them from nondiagnostic, acellular regions. Detection of diagnostic regions in SRH images can aid in intraoperative interpretation of large and complex tumor specimens.

3.4 Semantic Segmentation of SRH Images Detects Regions of Tumor Infiltration

Patch-based classification enables computationally efficient semantic segmentation of whole-slide SRH images. SRH segmentation can improve image interpretation by surgeons and pathologists by providing spatial information along with the predicted diagnosis. Regions of microscopic tumor infiltration can be automatically detected and highlighted in SRH images (Figure 5). Additionally, tumor-normal margins and diagnostic tumor regions can be identified using the patch-level predictions [hollon2020near] [hollon2021rapid], as shown in Figure 6.

4 Discussion

Here, we show that the combination of SRH and AI trained using contrastive representation learning can provide an innovative pathway for intraoperative skull base tumor diagnosis. Using supervised contrastive representation learning, we achieved a 5.1-percentage-point improvement in patient-level diagnostic accuracy (96.6% versus 91.5%) compared with our previous AI training method using cross-entropy. Our model was able to effectively identify regions of microscopic tumor infiltration and tumor-normal margins in whole-slide SRH images.

Over the previous decade, the applications of AI in clinical medicine and neurosurgery have grown tremendously. Human-level diagnostic accuracy for image classification tasks has been achieved in multiple medical specialties, including ophthalmology [gulshan2016development], radiology [titano2018automated], dermatology [esteva2017dermatologist], and pathology [coudray2018classification] [lu2021ai]. AI for intraoperative diagnostic decision support has been combined with mass spectrometry [calligaris2015maldi] [santagata2014intraoperative], optical coherence tomography [juarez2019ai], infrared spectroscopy [hollon2018shedding] [uckermann2018optical], and Raman spectroscopy [jermyn2015intraoperative] [kast2015identification]. We believe that the combination of advanced biomedical optical imaging and the latest discoveries in AI has the potential to provide accurate and real-time decision support for surgeons and pathologists alike. We believe that no surgeon should be uncertain of their patient's tumor diagnosis at the time of surgery and that postoperative diagnostic surprises are unacceptable in modern neurosurgery.

One limitation of our study is that it includes only a subset of skull base tumors. We aimed to include the most common skull base tumors and the most common “look-alike” lesions found in the skull base. We aimed to determine if, given a sufficient amount of training data, we could develop an alternative diagnostic system using SRH and AI. As additional SRH training data become available for rare tumors, future studies will include additional skull base tumor diagnoses. Our proposed contrastive representation learning method can accommodate additional diagnostic classes without changing the training methodology described here.

Future directions include moving beyond histopathologic diagnosis towards phenotypic and molecular characterization of brain tumors. Tumor grade, proliferation indices, and molecular diagnostic mutations are much stronger determinants of long-term tumor behavior than coarse tissue diagnoses. Additionally, access to fresh tumor specimens provides a unique opportunity to develop optical imaging-based prognostic biomarkers that have the potential to predict response to treatment (e.g., immunotherapy) and long-term clinical outcomes better than standard diagnostic methods alone.