Deep Multi-Modal Classification of Intraductal Papillary Mucinous Neoplasms (IPMN) with Canonical Correlation Analysis

by Sarfaraz Hussein, et al.

Pancreatic cancer has the poorest prognosis among all cancer types. Intraductal Papillary Mucinous Neoplasms (IPMNs) are radiographically identifiable precursors to pancreatic cancer; hence, early detection and precise risk assessment of IPMN are vital. In this work, we propose a Convolutional Neural Network (CNN) based computer-aided diagnosis (CAD) system to perform IPMN diagnosis and risk assessment using multi-modal MRI. In our approach, we use minimum and maximum intensity projections to reduce the effect of annotation variations across slices and MRI types. We then use a CNN to obtain a deep feature representation for each MRI modality (T1 and T2). Finally, we employ canonical correlation analysis (CCA) to fuse these representations at the feature level, yielding discriminative canonical correlation features that are used for classification. Our results show improvements over single-modality and feature-concatenation approaches, most notably in specificity. The proposed approach does not require explicit sample balancing when positive and negative examples are imbalanced. To the best of our knowledge, our study is the first to automatically diagnose IPMN using deep learning and multi-modal MRI.




1 Introduction

Cancer is one of the main causes of death worldwide; in the United States, the cancer mortality rate is 171.2 per 100,000 people per year (based on 2008-2012 statistics) [1]. Among all cancers, pancreatic cancer has the poorest prognosis, with a 5-year survival rate of just 7% in the United States [1]. To address the problem of automatic diagnosis of pancreatic cancer, we propose a new CAD framework for Intraductal Papillary Mucinous Neoplasms (IPMN). IPMNs are mucin-producing neoplasms found in the main and branch pancreatic ducts, and they are radiographically identifiable precursors to pancreatic cancer [2, 3]. If left untreated, they can progress into invasive cancer; for instance, around one-third of resected IPMNs are found to be associated with invasive carcinoma [4]. In 2012, Tanaka et al. [5] published international consensus guidelines for the preoperative management of IPMN using radiographic and clinical criteria. These guidelines can inform the development of CAD approaches that separate IPMNs from normal pancreas. Such CAD approaches can also help identify important imaging biomarkers that may assist radiologists in diagnosis, staging, and treatment planning.

In the literature, only a limited number of studies address the automatic diagnosis of IPMN from radiology images. Hanania et al. [6] studied the contribution of numerous low-level imaging features, such as texture, intensity, and shape, to classify low-grade and high-grade IPMN. In the approach by Gazit et al. [7], texture and component-enhancing features were extracted from segmented cysts, followed by feature selection and classification. Both approaches [6, 7], however, are evaluated on CT images and require segmentation of the cysts or pancreas. In contrast, our approach does not require prior segmentation of cysts or pancreas and is evaluated on multi-modal MRI scans rather than CT. In this work, we hypothesize that T1-weighted and T2-weighted scans carry complementary information that can be exploited to improve IPMN diagnosis, and we evaluate the influence of this information.

Our Contributions:


  • To the best of our knowledge, this is the first study to use deep learning for the classification of IPMN.

  • We employ two MRI modalities (T1 and T2) and fuse their feature representations using Canonical Correlation Analysis (CCA) to better discriminate between normal subjects and subjects with IPMN. We also further stratify IPMN into low-grade and high-grade categories.

  • Extensive experimental evaluations are performed on a dataset comprising 139 subjects, the largest study of IPMN to date.

Figure 1: An overview of the proposed method. First, the minimum and maximum intensity projections are computed from the T1 and T2 scans, respectively. The intensity projections are then fed into a pre-trained Convolutional Neural Network (CNN) to obtain feature representations. Canonical Correlation Analysis (CCA) based feature fusion is performed to obtain a discriminative, transformed feature representation. Finally, an SVM classifier assigns the final label (normal or IPMN).

2 Materials

We evaluated our proposed approach for IPMN classification on a dataset comprising post-contrast volumetric T1 and T2 MRI scans from 139 subjects. The scans were labeled as normal or IPMN using pathology reports obtained after surgery. Of the 139 scans, 108 were from subjects diagnosed with IPMN, whereas the remaining 31 subjects were normal. The in-plane (xy-plane) spacing ranged from 0.66 mm to 1.48 mm for T1-weighted scans and from 0.47 mm to 1.41 mm for T2-weighted scans. For pre-processing, we first apply N4 bias field correction [8] to each scan to minimize intensity inhomogeneity. Next, we use a curvature anisotropic diffusion image filter to smooth the images while preserving edge information. For each scan, a single slice containing a significant portion of the pancreas is annotated as normal or IPMN.

3 Methods

3.1 CNN for Multi-modal Feature Representation:

Problem Formulation:
Our proposed approach takes inputs from two different MRI modalities, T1 and T2. Let $\mathcal{U} = \{u_1, \dots, u_{n_1}\}$ be the T1 scan and $\mathcal{V} = \{v_1, \dots, v_{n_2}\}$ the corresponding T2 scan, where $n_1$ and $n_2$ are the numbers of slices.

Let $u_k$ be the annotated slice containing the pancreas in the T1 scan and $v_l$ the corresponding slice in the T2 scan. Predicting the label from a single slice, however, makes the prediction overly sensitive to which slice was annotated and ignores important contextual information in the neighboring slices. To address these issues, we sample $m$ consecutive slices before and after $u_k$ and $v_l$. Since the input to the deep network is a 2D image, we use minimum and maximum intensity projections to combine information across these slices into a single image. We employ the minimum intensity projection for T1 scans, since IPMN and pancreatic cysts appear as hypointense regions in T1 scans. In contrast, we use the maximum intensity projection for T2 scans, because IPMN and pancreatic cysts correspond to hyperintense regions in these scans. The intensity projections of the T1 and T2 scans can be represented as:

$$P_{T1}(x, y) = \min_{u \in \mathcal{U}_k} u(x, y), \qquad P_{T2}(x, y) = \max_{v \in \mathcal{V}_l} v(x, y), \tag{1}$$

where $\mathcal{U}_k = \{u_{k-m}, \dots, u_{k+m}\}$ and $\mathcal{V}_l = \{v_{l-m}, \dots, v_{l+m}\}$ consist of the $2m+1$ slices around $u_k$ and $v_l$, respectively, and $P_{T1}$ and $P_{T2}$ denote the intensity projections from the T1 and T2 scans. An overview of the proposed approach is shown in Figure 1.
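The minimum and maximum intensity projections reduce to a per-pixel minimum or maximum along the slice axis. The NumPy sketch below assumes each scan is stored as an array of shape (slices, height, width) and clips the slice window at the first and last slices; the function name and the default half-width `m = 2` are hypothetical, since the number of slices used is not stated here.

```python
import numpy as np


def intensity_projections(t1_volume, t2_volume, k, l, m=2):
    """Minimum (T1) and maximum (T2) intensity projections over up to 2m+1 slices.

    t1_volume, t2_volume: arrays shaped (num_slices, height, width).
    k, l: indices of the annotated pancreas slices in the T1 and T2 scans.
    m: hypothetical half-width of the slice window (an assumption, not from the paper).
    """

    def window(volume, centre):
        # Clip the window so it stays inside the volume near the first/last slice.
        lo = max(centre - m, 0)
        hi = min(centre + m + 1, volume.shape[0])
        return volume[lo:hi]

    # IPMN/cysts are hypointense on T1: keep the darkest value along the slice axis.
    p_t1 = window(t1_volume, k).min(axis=0)
    # IPMN/cysts are hyperintense on T2: keep the brightest value along the slice axis.
    p_t2 = window(t2_volume, l).max(axis=0)
    return p_t1, p_t2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t1 = rng.random((20, 64, 64))  # stand-in volumes, not real MRI data
    t2 = rng.random((18, 64, 64))
    p_t1, p_t2 = intensity_projections(t1, t2, k=10, l=9, m=2)
    print(p_t1.shape, p_t2.shape)  # (64, 64) (64, 64)
```

Clipping at the volume boundary is one simple design choice; padding by repeating edge slices would be an equally plausible alternative.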

Network Architecture:

To obtain a deep feature representation for IPMN classification, we use the CNN-F ("fast") architecture [9] pre-trained on ImageNet. The architecture consists of 5 convolutional and 3 fully-connected layers. The input 2D image is resized to 224×224. The first convolutional layer contains 64 filters with stride 4, and each of the other four convolutional layers contains 256 filters with stride 1. The inputs to the network are the 2D intensity projections $P_{T1}$ and $P_{T2}$, and the features are extracted from the second fully-connected layer without applying non-linearities such as ReLU (Rectified Linear Units). The features are then normalized to obtain the final representation.
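Pretrained CNN-F weights are not bundled with the common deep learning libraries, so the sketch below covers only the steps around the network: turning a grayscale projection into a 224×224 three-channel input and normalizing the extracted feature vectors. Min-max scaling, nearest-neighbour resizing, channel replication, and the use of ℓ2 normalization are all assumptions; the text states only that the features are normalized. The function names are hypothetical.

```python
import numpy as np


def to_network_input(projection, size=224):
    """Turn a 2D intensity projection into a (3, size, size) network input.

    Assumptions (not specified in the text): min-max scaling to [0, 255],
    nearest-neighbour resizing, and copying the grey image into 3 channels.
    """
    p = projection.astype(np.float64)
    p = (p - p.min()) / (p.max() - p.min() + 1e-12) * 255.0
    # Nearest-neighbour resize: map each output pixel to a source row/column index.
    rows = np.arange(size) * p.shape[0] // size
    cols = np.arange(size) * p.shape[1] // size
    resized = p[np.ix_(rows, cols)]
    # ImageNet-pretrained networks expect RGB input, so replicate the grey channel.
    return np.repeat(resized[np.newaxis, :, :], 3, axis=0)


def l2_normalize(features, eps=1e-12):
    """Scale each row (one sample's feature vector) to unit Euclidean length."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, eps)


if __name__ == "__main__":
    x = to_network_input(np.random.default_rng(0).random((180, 200)))
    print(x.shape)  # (3, 224, 224)
```

ImageNet mean subtraction, which pretrained networks usually expect, is deliberately left out to keep the sketch short.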

3.2 Feature Fusion with Canonical Correlation Analysis:

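The next step is to combine information from the two imaging modalities to improve classification performance. Since the two modalities (T1 and T2) carry complementary information, fusing their features can help improve IPMN diagnosis. Assume that $X \in \mathbb{R}^{p \times n}$ and $Y \in \mathbb{R}^{q \times n}$ comprise the deep features extracted from the intensity projections of the $n$ training images from T1 and T2 scans, respectively. Each sample has a corresponding binary label, given by $\mathbf{c} = [c_1, \dots, c_n]$, where $c_i \in \{0, 1\}$. Let $S_{xx} \in \mathbb{R}^{p \times p}$ and $S_{yy} \in \mathbb{R}^{q \times q}$ denote the within-set covariance matrices of $X$ and $Y$, respectively. Additionally, the between-set covariance matrix is denoted $S_{xy} \in \mathbb{R}^{p \times q}$, such that $S_{yx} = S_{xy}^{T}$. The overall covariance matrix $S$ can therefore be written as:

$$S = \begin{pmatrix} \operatorname{cov}(X) & \operatorname{cov}(X, Y) \\ \operatorname{cov}(Y, X) & \operatorname{cov}(Y) \end{pmatrix} = \begin{pmatrix} S_{xx} & S_{xy} \\ S_{yx} & S_{yy} \end{pmatrix}. \tag{2}$$

CCA is then employed to find the linear combinations $X^{*} = W_x^{T} X$ and $Y^{*} = W_y^{T} Y$ that maximize the pairwise correlation between the two sets [10]. CCA is a method for exploring the relationship between two multivariate variables. The pairwise correlation between the two sets can be modeled as:

$$\operatorname{corr}(X^{*}, Y^{*}) = \frac{\operatorname{cov}(X^{*}, Y^{*})}{\sqrt{\operatorname{var}(X^{*})\,\operatorname{var}(Y^{*})}}, \tag{3}$$

where $\operatorname{cov}(X^{*}, Y^{*}) = W_x^{T} S_{xy} W_y$, $\operatorname{var}(X^{*}) = W_x^{T} S_{xx} W_x$, and $\operatorname{var}(Y^{*}) = W_y^{T} S_{yy} W_y$. The covariances are then used to find the transformation matrices $W_x$ and $W_y$ by solving the following eigenvalue problems:

$$S_{xx}^{-1} S_{xy} S_{yy}^{-1} S_{yx} \hat{W}_x = \hat{W}_x \Lambda^{2}, \qquad S_{yy}^{-1} S_{yx} S_{xx}^{-1} S_{xy} \hat{W}_y = \hat{W}_y \Lambda^{2}. \tag{4}$$

In the above equations, $\hat{W}_x$ and $\hat{W}_y$ contain the eigenvectors, which form the transformation matrices $W_x$ and $W_y$, and $\Lambda^{2}$ is the diagonal matrix of eigenvalues (the squared canonical correlations).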
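The CCA step can be sketched directly from the covariance definitions and the eigenvalue problems above. In this NumPy sketch the columns of X and Y are training samples. The ridge term `reg` is an assumption: with 4096-dimensional CNN-F features and roughly 125 training subjects per fold, S_xx and S_yy are singular and cannot be inverted as written. Pairing each W_y column with its W_x column through W_y ∝ S_yy^{-1} S_yx W_x is an implementation choice; it yields eigenvectors of the second problem with the same eigenvalues. The function name is hypothetical.

```python
import numpy as np


def fit_cca(X, Y, d, reg=1e-3):
    """CCA by solving the eigenvalue problem S_xx^-1 S_xy S_yy^-1 S_yx W_x = W_x Lambda^2.

    X: (p, n) T1 features, Y: (q, n) T2 features; columns are training samples.
    d: number of canonical pairs to keep (keep d at or below the number of
       non-zero canonical correlations).
    reg: assumed ridge term added to S_xx and S_yy so that they are invertible.
    Returns W_x (p, d), W_y (q, d) and the d canonical correlations.
    """
    n = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    S_xx = Xc @ Xc.T / (n - 1) + reg * np.eye(X.shape[0])
    S_yy = Yc @ Yc.T / (n - 1) + reg * np.eye(Y.shape[0])
    S_xy = Xc @ Yc.T / (n - 1)
    S_yx = S_xy.T

    # First eigenvalue problem; eigenvalues are squared canonical correlations.
    M = np.linalg.solve(S_xx, S_xy) @ np.linalg.solve(S_yy, S_yx)
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(-eigvals.real)[:d]          # largest correlations first
    W_x = eigvecs[:, order].real.copy()

    # W_y = S_yy^-1 S_yx W_x solves the second problem with the same eigenvalues
    # and keeps each column paired (and sign-consistent) with its W_x column.
    W_y = np.linalg.solve(S_yy, S_yx @ W_x)

    # Scale columns so that var(X*) = var(Y*) = 1.
    W_x = W_x / np.sqrt(np.einsum("ij,ik,kj->j", W_x, S_xx, W_x))
    W_y = W_y / np.sqrt(np.einsum("ij,ik,kj->j", W_y, S_yy, W_y))
    rho = np.einsum("ij,ik,kj->j", W_x, S_xy, W_y)  # cov(X*, Y*) for each pair
    return W_x, W_y, rho
```

With unit variances, the returned `rho` equals the correlation in the CCA objective for each canonical pair.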

Lastly, the final feature matrix is the sum of the transformed feature matrices from the two modalities:

$$Z = X^{*} + Y^{*} = W_x^{T} X + W_y^{T} Y. \tag{5}$$

The learned transformation is also applied to the features of the test images to obtain the final transformed test features.
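The summation-based fusion and the reuse of the learned transformation on test images can be sketched as follows. For numerical stability, this version obtains W_x and W_y from a singular value decomposition of the whitened cross-covariance S_xx^{-1/2} S_xy S_yy^{-1/2}. Its singular vectors, mapped back through S_xx^{-1/2} and S_yy^{-1/2}, solve the same eigenvalue problems as above, so this is an equivalent, swapped-in computation rather than necessarily the authors' code. The ridge term, centering with the training means, and all function names are assumptions.

```python
import numpy as np


def _inv_sqrt(S):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs / np.sqrt(vals)) @ vecs.T


def fit_cca_svd(X_train, Y_train, d, reg=1e-3):
    """Return W_x, W_y and the training means; columns of X_train/Y_train are samples."""
    n = X_train.shape[1]
    mx = X_train.mean(axis=1, keepdims=True)
    my = Y_train.mean(axis=1, keepdims=True)
    Xc, Yc = X_train - mx, Y_train - my
    S_xx = Xc @ Xc.T / (n - 1) + reg * np.eye(X_train.shape[0])  # assumed ridge term
    S_yy = Yc @ Yc.T / (n - 1) + reg * np.eye(Y_train.shape[0])
    S_xy = Xc @ Yc.T / (n - 1)
    K_x, K_y = _inv_sqrt(S_xx), _inv_sqrt(S_yy)
    # Singular vectors of the whitened cross-covariance give the canonical directions.
    U, _, Vt = np.linalg.svd(K_x @ S_xy @ K_y)
    W_x = K_x @ U[:, :d]
    W_y = K_y @ Vt.T[:, :d]
    return W_x, W_y, mx, my


def fuse(X, Y, W_x, W_y, mx, my):
    """Z = W_x^T X + W_y^T Y, centred with the *training* means (an assumption)."""
    return W_x.T @ (X - mx) + W_y.T @ (Y - my)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_tr, Y_tr = rng.random((50, 120)), rng.random((40, 120))  # stand-in features
    X_te, Y_te = rng.random((50, 14)), rng.random((40, 14))
    W_x, W_y, mx, my = fit_cca_svd(X_tr, Y_tr, d=10)
    Z_train = fuse(X_tr, Y_tr, W_x, W_y, mx, my)  # fitted on training data only
    Z_test = fuse(X_te, Y_te, W_x, W_y, mx, my)   # same transform, no refitting
    print(Z_train.shape, Z_test.shape)  # (10, 120) (10, 14)
```

The transposes `Z_train.T` and `Z_test.T` (one row per subject) would then be the inputs to the SVM.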

4 Experiments and Results

To account for the misalignment between T1-weighted and T2-weighted scans, we performed multi-resolution image registration using image pyramids [11]. The registration results were examined, and images with misregistration were removed from the final evaluation set. Our final evaluation set comprised 139 scans from each modality, and we performed 10-fold cross-validation over the dataset. The minimum (maximum) intensity projection images from T1 (T2) scans were fed into the deep CNN-F network, and the feature representation of each image was used to obtain the final CCA-based discriminative representation (Eq. 5). We then employed a Support Vector Machine (SVM) classifier to obtain the final classification labels, i.e., normal vs. IPMN.
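The text does not say whether the 10 folds were stratified. With only 31 normal subjects, a stratified split (an assumption here) keeps about three normal cases in every test fold. The sketch below also makes explicit that all fitting, including CCA, should see only the training folds; `fit_predict` is a hypothetical callable standing in for the CCA + SVM pipeline, and the function names are illustrative.

```python
import numpy as np


def stratified_folds(labels, k=10, seed=0):
    """Assign each subject a fold index in 0..k-1, spreading each class evenly."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        fold_of[idx] = np.arange(len(idx)) % k  # deal class members round-robin
    return fold_of


def cross_validate(fit_predict, F_t1, F_t2, labels, k=10, seed=0):
    """k-fold CV in which all fitting (CCA, classifier) sees training folds only.

    F_t1, F_t2: (n, p) and (n, q) feature matrices, one row per subject.
    fit_predict: hypothetical callable
        (train_t1, train_t2, train_labels, test_t1, test_t2) -> predicted labels.
    Returns one (true_labels, predicted_labels) pair per fold.
    """
    labels = np.asarray(labels)
    fold_of = stratified_folds(labels, k, seed)
    results = []
    for f in range(k):
        test = fold_of == f
        train = ~test
        pred = fit_predict(F_t1[train], F_t2[train], labels[train],
                           F_t1[test], F_t2[test])
        results.append((labels[test], np.asarray(pred)))
    return results


if __name__ == "__main__":
    y = np.array([1] * 108 + [0] * 31)  # 108 IPMN, 31 normal (Section 2)
    folds = stratified_folds(y)
    print([int(np.sum(y[folds == f] == 0)) for f in range(10)])
    # → [4, 3, 3, 3, 3, 3, 3, 3, 3, 3]
```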

Method                      Accuracy % (SEM)   Sensitivity % (SEM)   Specificity % (SEM)
T1-weighted                 84.23 (1.10)       89.16 (0.88)          55.00 (4.16)
T2-weighted                 61.04 (1.35)       59.59 (2.25)          57.67 (2.95)
Concat. of T1 & T2          82.09 (1.01)       88.49 (0.90)          49.33 (3.48)
Feature Fusion (Proposed)   82.80 (1.17)       83.55 (1.13)          81.67 (2.53)
Table 1: Accuracy, sensitivity, and specificity, with standard error of the mean (SEM), of the proposed multi-modal fusion approach compared with single-modality and feature-concatenation approaches.
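Sensitivity and specificity in Table 1 treat IPMN as the positive class. The sketch below computes the three metrics per fold and summarizes them as mean (SEM), assuming, as is common, that the SEM is taken over the per-fold scores; the function names are hypothetical.

```python
import numpy as np


def fold_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity (in %) for one fold; IPMN = positive."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    pos = y_true == positive
    accuracy = np.mean(y_pred == y_true)
    sensitivity = np.mean(y_pred[pos] == positive)   # IPMN cases detected
    specificity = np.mean(y_pred[~pos] != positive)  # normal cases kept normal
    return 100 * accuracy, 100 * sensitivity, 100 * specificity


def mean_and_sem(values):
    """Mean and standard error of the mean (sample std / sqrt(number of values))."""
    v = np.asarray(values, dtype=float)
    return v.mean(), v.std(ddof=1) / np.sqrt(len(v))


if __name__ == "__main__":
    acc, sens, spec = fold_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
    print(round(acc, 2), round(sens, 2), round(spec, 2))  # 60.0 66.67 50.0
    mean, sem = mean_and_sem([80.0, 90.0, 100.0])
    print(f"{mean:.2f} ({sem:.2f})")  # 90.00 (5.77)
```

A fold with no normal subjects would make specificity undefined (NaN), which is one more reason to stratify the folds.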

We compared our proposed multi-modal feature fusion approach with single-modality and feature-concatenation approaches. Since the numbers of positive and negative examples are imbalanced, we performed Adaptive Synthetic Sampling (ADASYN) [12] to generate synthetic feature vectors for the minority class (normal).
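ADASYN [12] gives each minority sample a share of the synthetic samples proportional to the fraction of majority-class points among its K nearest neighbours, then creates them by linear interpolation towards nearby minority samples. The NumPy sketch below follows that recipe. K = 5, beta = 1 (full balancing), drawing interpolation partners from minority-class neighbours, and the uniform fallback when no minority sample has majority neighbours are assumptions, not settings reported here. The original algorithm also oversamples only when the class ratio falls below a user-set threshold; that check is omitted.

```python
import numpy as np


def _sq_dists(A, B):
    """Pairwise squared Euclidean distances between rows of A and rows of B."""
    return (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T


def adasyn(X, y, minority=0, K=5, beta=1.0, seed=0):
    """Generate synthetic minority-class rows following the ADASYN recipe.

    X: (n, f) feature matrix, y: (n,) labels. beta = 1 aims for a balanced set.
    Returns (synthetic_rows, synthetic_labels); the original rows are not included.
    """
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    min_idx = np.flatnonzero(y == minority)
    X_min = X[min_idx]
    m_s, m_l = len(min_idx), len(y) - len(min_idx)
    G = (m_l - m_s) * beta  # total number of synthetic samples to create

    # r_i: fraction of majority samples among the K nearest neighbours of x_i.
    d_all = _sq_dists(X_min, X)
    d_all[np.arange(m_s), min_idx] = np.inf  # a sample is not its own neighbour
    nn_all = np.argsort(d_all, axis=1)[:, :K]
    r = np.mean(y[nn_all] != minority, axis=1)
    if r.sum() == 0:  # no "hard" samples: spread evenly (assumption)
        r = np.ones(m_s)
    g = np.rint(r / r.sum() * G).astype(int)  # harder samples get more copies

    # Interpolate between x_i and a random one of its nearest minority neighbours.
    d_min = _sq_dists(X_min, X_min)
    np.fill_diagonal(d_min, np.inf)
    nn_min = np.argsort(d_min, axis=1)[:, :min(K, m_s - 1)]
    synthetic = []
    for i in range(m_s):
        for _ in range(g[i]):
            z = X_min[rng.choice(nn_min[i])]
            synthetic.append(X_min[i] + rng.random() * (z - X_min[i]))
    synthetic = np.array(synthetic).reshape(-1, X.shape[1])
    return synthetic, np.full(len(synthetic), minority)
```

Presumably only the training folds would be oversampled in this way, so that no synthetic sample leaks into a test fold.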

Method                      Accuracy % (SEM)
T1-weighted                 58.30 (1.52)
T2-weighted                 45.93 (1.60)
Concat. of T1 and T2        56.81 (1.51)
Feature Fusion (Proposed)   64.67 (0.83)
Table 2: Classification accuracy, with standard error of the mean (SEM), for three-class classification (normal, low-grade IPMN, and high-grade IPMN) using the proposed approach compared with other approaches.

Table 1 shows the results of the proposed approach and the comparison methods. Although T1-based classification yields the highest accuracy and sensitivity, its specificity is very low (55.00%). For IPMN classification, low specificity can be a serious problem, as it can lead to unwarranted surgery and resection. The proposed approach provides the best balance between sensitivity and specificity: its specificity (81.67%) is more than 30 percentage points higher than that of the feature-concatenation approach (49.33%), at the cost of about 5 percentage points in sensitivity.

It is important to note that, since our proposed approach is based on the correlation and covariance in the data, it does not require explicit sample balancing with ADASYN. We also experimented with features from various layers of CNN-F, as well as from deeper residual networks such as ResNet-50, ResNet-101, and ResNet-152 [13]; the best classification results, however, were obtained with the second fully-connected layer of CNN-F. Figure 2 shows qualitative results of the proposed approach with intensity projections from T1 and T2 scans. Cases shown in green are correctly classified as IPMN, whereas those shown in red are IPMN cases incorrectly classified as normal.

Figure 2: Qualitative results of our proposed approach, showing minimum and maximum intensity projection images from T1 and T2 scans on the left and right, respectively. Each row represents a different case; images correctly classified as IPMN are shown in green, whereas IPMN cases misclassified as normal are shown in red.

Low and High grade IPMN classification:
We also performed 3-class classification using our proposed approach. Of the 108 IPMN subjects, 48 had low-grade IPMN, whereas the remaining 60 had high-grade IPMN or invasive carcinoma. Using the features obtained from the CCA-based fusion, we trained a 3-class SVM classifier with the classes normal, low-grade IPMN, and high-grade IPMN. These diagnostic labels were obtained from the pathology reports after surgery. Table 2 shows the performance of our proposed approach for normal, low-grade, and high-grade IPMN classification. The proposed CCA-based approach outperforms the single-modality and feature-concatenation approaches, improving accuracy by about 8 percentage points over feature concatenation (64.67% vs. 56.81%).

5 Discussion and Conclusion

Pancreatic cancer is projected to become the second leading cause of cancer-related deaths in the United States before 2030 [14]. IPMNs are radiographically identifiable precursors to pancreatic cancer. In this paper, we proposed a multi-modal feature fusion framework for IPMN classification. Motivated by the clinical need to identify subjects with IPMN, our approach can assist radiologists in diagnosing IPMN and in assessing the risk of invasive pancreatic carcinoma. Unlike previous studies, ours is the first approach to use deep CNN feature representations for IPMN diagnosis. Moreover, we empirically show the importance of feature-level fusion of two different MRI modalities, i.e., T1 and T2 scans.

Another advantage of our proposed approach is that, unlike other approaches, it does not require manual segmentation of the pancreas or cysts. We only need to identify a single slice in which pancreatic tissue is prominently visible. Additionally, by using intensity projections across a set of consecutive slices, the approach becomes more robust to the manual selection of that single slice. Because CCA is used to learn the transformation, it also circumvents the need for explicit data balancing when positive and negative examples are imbalanced.

As an extension of this study, our future work will involve joint detection and diagnosis of IPMN in MRI scans. As the number of subjects undergoing screening for IPMN increases, sufficient data may become available for end-to-end training or fine-tuning of a 3D convolutional neural network. Generative Adversarial Networks (GANs) [15] can assist in data augmentation by generating realistic examples to further improve network training.

Furthermore, segmentation of the pancreas and IPMN cysts can help localize the regions of interest. These regions can be used not only to extract discriminative imaging features, but also to obtain important measurements such as the diameters of the main pancreatic duct and of the cysts [5]. In the future, including additional imaging modalities such as CT scans, along with demographic and clinical characteristics (age, gender, family history, symptoms, and body fat), could further improve diagnostic decision making.


  • [1] American Cancer Society, “Cancer Facts & Figures,” American Cancer Society, 2016.
  • [2] Chanjuan Shi and Ralph H Hruban, “Intraductal papillary mucinous neoplasm,” Human Pathology, vol. 43, no. 1, pp. 1–16, 2012.
  • [3] Eran Sadot, Olca Basturk, David S Klimstra, Mithat Gönen, Lokshin Anna, Richard Kinh Gian Do, et al., “Tumor-associated neutrophils and malignant progression in intraductal papillary mucinous neoplasms: an opportunity for identification of high-risk disease,” Annals of Surgery, vol. 262, no. 6, p. 1102, 2015.
  • [4] Hanno Matthaei, Richard D Schulick, Ralph H Hruban, and Anirban Maitra, “Cystic precursors to invasive pancreatic cancer,” Nature Reviews Gastroenterology and Hepatology, vol. 8, no. 3, pp. 141–150, 2011.
  • [5] Masao Tanaka, Carlos Fernández-del Castillo, Volkan Adsay, Suresh Chari, Massimo Falconi, Jin-Young Jang, Wataru Kimura, Philippe Levy, Martha Bishop Pitman, C Max Schmidt, et al., “International consensus guidelines 2012 for the management of IPMN and MCN of the pancreas,” Pancreatology, vol. 12, no. 3, pp. 183–197, 2012.
  • [6] Alexander N Hanania, Leonidas E Bantis, Ziding Feng, Huamin Wang, Eric P Tamm, Matthew H Katz, Anirban Maitra, and Eugene J Koay, “Quantitative imaging to evaluate malignant potential of IPMNs,” Oncotarget, vol. 7, no. 52, pp. 85776, 2016.
  • [7] Lior Gazit, Jayasree Chakraborty, Marc Attiyeh, Liana Langdon-Embry, Peter J Allen, Richard KG Do, and Amber L Simpson, “Quantification of CT Images for the Classification of High-and Low-Risk Pancreatic Cysts,” in SPIE Medical Imaging. International Society for Optics and Photonics, 2017, p. 101340X.
  • [8] Nicholas J Tustison, Brian B Avants, Philip A Cook, Yuanjie Zheng, Alexander Egan, Paul A Yushkevich, and James C Gee, “N4ITK: Improved N3 bias correction,” IEEE Transactions on Medical Imaging, vol. 29, no. 6, pp. 1310–1320, 2010.
  • [9] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman, “Return of the devil in the details: Delving deep into convolutional nets,” in British Machine Vision Conference, 2014.
  • [10] Mohammad Haghighat, Mohamed Abdel-Mottaleb, and Wadee Alhalabi, “Fully automatic face normalization and single sample face recognition in unconstrained environments,” Expert Systems with Applications, vol. 47, pp. 23–34, 2016.
  • [11] Hans J. Johnson, M. McCormick, L. Ibáñez, and The Insight Software Consortium, The ITK Software Guide, Kitware, Inc., third edition, 2013.
  • [12] Haibo He, Yang Bai, Edwardo A Garcia, and Shutao Li, “ADASYN: Adaptive synthetic sampling approach for imbalanced learning,” in IEEE International Joint Conference on Neural Networks (IJCNN 2008, IEEE World Congress on Computational Intelligence), IEEE, 2008, pp. 1322–1328.
  • [13] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
  • [14] Lola Rahib, Benjamin D Smith, Rhonda Aizenberg, Allison B Rosenzweig, Julie M Fleshman, and Lynn M Matrisian, “Projecting cancer incidence and deaths to 2030: The unexpected burden of thyroid, liver, and pancreas cancers in the United States,” Cancer Research, vol. 74, no. 11, pp. 2913–2921, 2014.
  • [15] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.