Feasibility of Colon Cancer Detection in Confocal Laser Microscopy Images Using Convolution Neural Networks

12/04/2018 ∙ by Nils Gessert, et al.

Histological evaluation of tissue samples is a typical approach to identify colorectal cancer metastases in the peritoneum. Immediate assessment would require reliable, real-time in-vivo imaging. For example, intraoperative confocal laser microscopy has been shown to be suitable for distinguishing organs as well as malignant and benign tissue. So far, the analysis has been performed by human experts. We investigate the feasibility of automatic colon cancer classification from confocal laser microscopy images using deep learning models. We address the very small dataset size through transfer learning with state-of-the-art architectures. We achieve an accuracy of 89.1 […] detection in the peritoneum, which indicates viability as an intraoperative decision support system.




1 Introduction

Colorectal cancer is one of the most common types of cancer [1]. Due to metastatic spread, peritoneal carcinomatosis can occur in later stages, which often leads to substantially shorter survival times [2]. Reliable detection of metastases is therefore important. Typical imaging modalities such as magnetic resonance imaging and computed tomography currently lack the required resolution and are not readily available intraoperatively. Therefore, an intraoperative confocal laser microscopy (CLM) device offering submicrometer resolution has been proposed [3].

In the above-mentioned study, colon carcinoma cells were implanted into the colon and peritoneum of ten rats. After seven days of tumor growth, a laparotomy was performed for subsequent in-vivo CLM. For each subject, healthy colon tissue, malignant colon tissue, healthy peritoneum and malignant peritoneum were scanned. The study showed that experts could distinguish different organs as well as malignant and non-malignant regions.

To further improve intraoperative assessment by CLM, image processing methods can be used for automatic and fast tissue characterization. Recently, deep learning methods have shown remarkable success in a variety of medical segmentation and classification tasks [4], in some cases achieving human-level performance [5].

We investigate the feasibility of deep learning-based colon cancer detection from CLM images. We consider several classification problems involving the four classes "colon normal", "colon malignant", "peritoneum normal" and "peritoneum malignant". In particular, we investigate whether the organs can be differentiated and whether malignant and non-malignant tissue can be distinguished, both in the colon and in the peritoneum. As we are dealing with a very small dataset, we employ transfer learning, which has been shown to improve performance for a variety of medical learning problems [6, 7]. We use the state-of-the-art models Densenet121 [8] and SE-Resnext50 [9], which are pretrained on the ImageNet dataset.

2 Methods

2.1 Dataset

Figure 1: Examples of the four different classes. From left to right, healthy colon tissue, malignant colon tissue, healthy peritoneum tissue and malignant peritoneum tissue.

The dataset was kindly provided to us by the authors of a previous study on CLM [3]. It was acquired at the University Hospital Schleswig-Holstein in Lübeck using a custom intraoperative CLM device (Karl Storz GmbH & Co KG, Tuttlingen, Germany), which covers a field of view of […] with a resolution of […] pixels. The images were obtained from ten rats in which colon adenocarcinoma cells had been implanted into the colon and peritoneum seven days before scanning. For each subject, images of healthy colon tissue (HC), malignant colon tissue (MC), healthy peritoneum tissue (HP) and malignant peritoneum tissue (MP) were obtained. In total, there are 533 images of class HC, 309 of class MC, 343 of class HP and 392 of class MP, for a total dataset size of 1577 images. Note that one subject has no images of class HC and one subject has no images of class MP. Example cases for each class are shown in Figure 1. Class labels were assigned based on subsequent histological evaluation of tissue resected from the scanning area.
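Because the classes are not balanced, the reported accuracies are easier to interpret next to a trivial baseline. The sketch below uses only the per-class counts stated above, pools them over all subjects (per-fold class ratios will differ), and computes, for each binary task, the number of images, the fraction of the positive class (peritoneum for organ differentiation, malignant otherwise) and the accuracy of always predicting the majority class. The names `COUNTS`, `TASKS` and `task_summary` are illustrative.

```python
# Per-class image counts as reported for the CLM dataset.
COUNTS = {"HC": 533, "MC": 309, "HP": 343, "MP": 392}

# The three binary tasks; the second label is treated as the positive class
# (peritoneum for organ differentiation, malignant for cancer detection).
TASKS = [("HC", "HP"), ("HC", "MC"), ("HP", "MP")]


def task_summary(counts, neg, pos):
    """Return (number of images, positive-class fraction, majority-class accuracy)."""
    total = counts[neg] + counts[pos]
    pos_fraction = counts[pos] / total
    majority_acc = max(counts[neg], counts[pos]) / total
    return total, pos_fraction, majority_acc


if __name__ == "__main__":
    print("total images:", sum(COUNTS.values()))  # total images: 1577
    for neg, pos in TASKS:
        total, frac, maj = task_summary(COUNTS, neg, pos)
        print(f"{neg} vs {pos}: n={total}, positive={frac:.3f}, majority baseline={maj:.3f}")
```

For HC versus MC, for example, always predicting HC would already be correct for 533 of 842 pooled images, so accuracies for that task should be read against a baseline of roughly 63 %.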

We split the dataset using a leave-one-subject-out cross-validation scheme, i.e., we consider ten different dataset splits, in each of which the images of one subject are held out for evaluation. If a subject lacks one of the classes required for a task, that subject's validation split is omitted. We consider three classification problems in total. First, we address the binary classification task HC versus HP, which indicates whether the organs can be differentiated in principle. Next, we consider the learning problems HC versus MC and HP versus MP, which investigate the feasibility of detecting malignant tissue from CLM images.
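The splitting rule above can be sketched as a small generator. This is a minimal illustration under assumptions: each image is represented as a hypothetical `(subject_id, label)` pair, and a subject that lacks one of the two task classes is only skipped as a validation subject; its remaining images still appear in the training sets of the other folds, which the text does not state explicitly.

```python
from collections import defaultdict


def loso_folds(samples, task):
    """Yield (held_out_subject, train_idx, val_idx) for one binary task.

    samples: list of (subject_id, label) pairs, one per image, e.g. (3, "HC").
    task:    the two labels of the task, e.g. ("HC", "MC").
    """
    # Keep only the images that belong to one of the two task classes.
    idx = [i for i, (_, label) in enumerate(samples) if label in task]

    # Collect which task classes each subject actually has.
    classes_per_subject = defaultdict(set)
    for i in idx:
        subject, label = samples[i]
        classes_per_subject[subject].add(label)

    for subject in sorted(classes_per_subject):
        if classes_per_subject[subject] != set(task):
            continue  # a required class is missing: omit this validation split
        val_idx = [i for i in idx if samples[i][0] == subject]
        train_idx = [i for i in idx if samples[i][0] != subject]
        yield subject, train_idx, val_idx


if __name__ == "__main__":
    # Toy data: subject 2 has no HC images, so it is never held out for HC vs MC.
    toy = [(1, "HC"), (1, "MC"), (2, "MC"), (3, "HC"), (3, "MC"), (3, "HP")]
    for subject, train_idx, val_idx in loso_folds(toy, ("HC", "MC")):
        print(subject, train_idx, val_idx)
```

Splitting by subject rather than by image keeps all images of the evaluated rat out of training, so the reported metrics reflect generalization to an unseen animal.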

2.2 Models and Training

Figure 2: The key concepts of the architectures we employ. The shown modules replace sets of standard convolutional layers in the architectures. Left: a Densenet [8] block. Right: an SE block within the Resnext architecture [9].

We employ convolutional neural networks (CNNs) for the classification tasks at hand. The images are fed directly into a CNN, which learns to extract relevant features and performs classification at its output. We employ the two state-of-the-art architectures Densenet121 [8] and SE-Resnext50 [9]. Densenet121 follows the principle of densely connected layers, i.e., the features computed by a convolutional layer are reused by the subsequent layers within the same dense block. Because features are reused heavily, the architecture is very efficient in terms of the number of learnable parameters, which can be very beneficial given the small dataset size at hand. The SE-Resnext50 architecture is based on the Resnext principle [10], where feature extraction is performed by multiple parallel paths. In addition, squeeze-and-excitation (SE) modules are incorporated into the model, which perform a feature recalibration step. In standard convolutions, the aggregation of features across channels is learned implicitly through a summation. Instead, the SE modules explicitly model dependencies between the learned feature channels, which increases the model's representational power. The building blocks of the two concepts are shown in Figure 2.
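The SE recalibration step can be illustrated on a toy feature map. The following is a minimal, dependency-free sketch of the squeeze-and-excitation idea from [9], not the SE-Resnext50 implementation used here: a C × H × W map stored as nested lists is globally average-pooled per channel (squeeze), passed through a bottleneck fully-connected layer with ReLU and an expansion layer with a sigmoid (excitation), and each channel is then multiplied by its gate. Biases are omitted, the weights `w1` and `w2` are random placeholders, and the toy reduction ratio of 2 is an assumption (the SE paper uses 16 by default). In SE-ResNeXt, this gating is applied to the output of each residual branch before the identity addition.

```python
import math
import random


def se_recalibrate(features, w1, w2):
    """Squeeze-and-excitation on a C x H x W feature map (nested lists).

    w1: (C // r) x C weights of the reduction layer.
    w2: C x (C // r) weights of the expansion layer.
    Returns the recalibrated feature map and the per-channel gates.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in features]
    # Excitation: bottleneck layer + ReLU, then expansion layer + sigmoid.
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: reweight every channel by its gate in (0, 1).
    out = [[[g * v for v in row] for row in ch] for g, ch in zip(gates, features)]
    return out, gates


if __name__ == "__main__":
    random.seed(0)
    C, r, H, W = 4, 2, 3, 3  # toy sizes with reduction ratio r = 2
    x = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
    w1 = [[random.gauss(0, 1) for _ in range(C)] for _ in range(C // r)]
    w2 = [[random.gauss(0, 1) for _ in range(C // r)] for _ in range(C)]
    y, gates = se_recalibrate(x, w1, w2)
    print([round(g, 3) for g in gates])
```

Because the gates are computed from global statistics of all channels, each channel's weighting depends explicitly on the others, which is the inter-channel dependency modeling described above.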


To overcome the general lack of data, we use transfer learning, i.e., the models are pretrained on the ImageNet dataset. During training, we fine-tune all weights. For comparison, we also consider training from scratch. The input layer of the pretrained models expects three channels. We place the gray-scale CLM images into one channel and set the other two channels to zero. We remove the last layer and replace it with a fully-connected layer with two outputs for binary classification.
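The channel handling described above can be sketched without any deep learning library. This minimal example, using nested lists and a hypothetical helper name, places a gray-scale image into channel 0 of a 3-channel input and fills the other two channels with zeros. Replacing the classification head is framework-specific; with torchvision's `densenet121`, for instance, one would typically assign `model.classifier = torch.nn.Linear(model.classifier.in_features, 2)`, though the exact tooling used for the experiments is an assumption here.

```python
def gray_to_three_channels(gray):
    """Place a gray-scale image (H x W nested lists) into channel 0 of a
    3 x H x W input and fill the two remaining channels with zeros."""
    h, w = len(gray), len(gray[0])
    image_channel = [list(row) for row in gray]   # copy of the CLM image
    zeros_a = [[0.0] * w for _ in range(h)]       # independent zero planes,
    zeros_b = [[0.0] * w for _ in range(h)]       # so no rows are shared
    return [image_channel, zeros_a, zeros_b]


if __name__ == "__main__":
    x = gray_to_three_channels([[0.1, 0.2], [0.3, 0.4]])
    print(len(x), x[0], x[1])
```

An alternative would be to replicate the gray-scale image into all three channels; the zero-filling variant sketched here follows the description in the text.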

During training, we use online data augmentation with unscaled random crops of size […] taken from the original images of size […]. We also apply random flipping along both dimensions and random changes in brightness and contrast. For optimization, we employ Adam with a batch size of […] and a learning rate of […], and we train for […] epochs. For evaluation, we use multi-crop evaluation with […] crops, and the predictions of all crops are averaged into a final prediction for each image. The models are implemented in PyTorch.
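Multi-crop evaluation can be sketched as follows. The text does not specify how crop positions are chosen or how many crops are used, so this sketch assumes a regular grid of `n_per_axis` × `n_per_axis` evenly spaced crops; `predict` stands in for the trained CNN and is assumed to return a list of class probabilities for one crop. The functions `crop_origins` and `multi_crop_predict` are illustrative names.

```python
def crop_origins(size, crop, n_per_axis):
    """Evenly spaced crop offsets along one axis (assumed grid layout)."""
    if n_per_axis == 1:
        return [(size - crop) // 2]  # single centered crop
    step = (size - crop) / (n_per_axis - 1)
    return [round(i * step) for i in range(n_per_axis)]


def multi_crop_predict(image, crop, n_per_axis, predict):
    """Average the class probabilities that predict() returns for each crop."""
    h, w = len(image), len(image[0])
    probs = []
    for top in crop_origins(h, crop, n_per_axis):
        for left in crop_origins(w, crop, n_per_axis):
            patch = [row[left:left + crop] for row in image[top:top + crop]]
            probs.append(predict(patch))
    n = len(probs)
    return [sum(p[k] for p in probs) / n for k in range(len(probs[0]))]


if __name__ == "__main__":
    # Stand-in "model": probability of class 1 is the mean patch intensity.
    def dummy_predict(patch):
        m = sum(sum(row) for row in patch) / (len(patch) * len(patch[0]))
        return [1.0 - m, m]

    image = [[(i + j) / 10 for j in range(6)] for i in range(6)]
    print(multi_crop_predict(image, crop=4, n_per_axis=3, predict=dummy_predict))
```

Averaging over crops lets the network, which is trained on crops, still see most of each full-size image at test time and smooths out predictions that depend on a single crop position.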

3 Results

Accuracy Sensitivity Specificity F1-Score

HC vs. HP

Dense TL
Dense SRC

HC vs. MC

Dense TL
Dense SRC

HP vs. MP

Dense TL
Dense SRC
Table 1: The results of all our deep learning experiments. The mean values for leave-one-subject-out cross-validation are shown. Dense refers to the Densenet121 model, SE-RX refers to the SE-Resnext50 model. TL refers to transfer learning and SRC refers to training from scratch. For each training scenario, the best performing value is marked bold. All values are given in percent. The sensitivity is given with respect to the cancer class and for the case of organ differentiation it is given with respect to the peritoneum class.

All results are shown in Table 1. In terms of metrics, we report accuracy, sensitivity, specificity and the F1-score. For each of the three classification tasks, HC versus HP, HC versus MC and HP versus MP, we consider the architectures described in Section 2.2. For each case, we also compare training from scratch with fine-tuning after pretraining on ImageNet. In general, the classification accuracy is high both for the distinction of organs and for the differentiation between benign and malignant tissue of the peritoneum. However, the performance for cancer detection in the colon is substantially lower. Comparing the two architectures, the performance is very similar, with Densenet121 generally performing slightly better. Using transfer learning with pretrained architectures improves performance substantially in most cases.
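For reference, the four reported metrics can be computed from a binary confusion matrix as sketched below, with sensitivity and F1 taken with respect to the positive class named in the caption of Table 1. Per the caption, the table shows mean values over the leave-one-subject-out folds, so `mean_over_folds` averages per-fold results. Function names and the 0/1 label encoding are illustrative assumptions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity and F1-score for one binary task.

    Sensitivity and F1 refer to the `positive` class (the cancer class, or the
    peritoneum class for organ differentiation).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and sensitivity.
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else nan,
    }


def mean_over_folds(fold_metrics):
    """Average each metric over the leave-one-subject-out folds."""
    keys = fold_metrics[0].keys()
    return {k: sum(m[k] for m in fold_metrics) / len(fold_metrics) for k in keys}


if __name__ == "__main__":
    # All positives detected, but half of the negatives flagged as well:
    # high sensitivity with low specificity.
    m = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 0, 0])
    print(m)  # {'accuracy': 0.75, 'sensitivity': 1.0, 'specificity': 0.5, 'f1': 0.8}
```

The toy case shows how a classifier can detect every cancer case yet still lose accuracy through false positives, which is the pattern discussed for colon cancer detection.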

4 Discussion

In this study, we investigate the feasibility of detecting colon cancer from confocal laser microscopy (CLM) images using deep learning models. This extends a previous study which showed that experts can detect cancer from CLM images [3]. Here, we use two state-of-the-art deep learning architectures to detect cancer from CLM images automatically. As a baseline, we consider the task of differentiating healthy tissue of the colon from healthy tissue of the peritoneum. With an F1-score of […], the best model, Densenet121, shows high performance, which indicates that deep learning models can distinguish the different organs well in CLM images. Notably, without pretraining, performance drops substantially across all metrics. This highlights the effectiveness of transfer learning for a particularly small dataset [6]. For the detection of malignant tissue in the peritoneum, model performance is also very high, with Densenet121 performing best. It is notable that Densenet121 generally performs better than SE-Resnext50 in our study, while the latter clearly outperforms the former on the ImageNet dataset [9]. This is likely related to Densenet121 having substantially fewer parameters, which can reduce overfitting on a small dataset. Also, the performance difference between training from scratch and transfer learning is larger for Densenet121, which indicates that Densenet121 benefits more from the pretrained weights. For the detection of malignant tissue in the colon, performance is substantially lower than for the other tasks. The difference is most pronounced in the specificity: most cases of cancer are detected, but many false positives occur as well. This might be related to the heterogeneous appearance of the colon in different areas, which makes the learning task very challenging given the small dataset size. Moreover, colon tissue progresses from healthy tissue via adenoma to carcinoma. Thus, healthy and malignant tissue can have a similar appearance, which might complicate the learning problem.

Overall, we showed that automatic organ differentiation and cancer detection from CLM images is feasible using pretrained convolutional neural networks. For future work, more data could be acquired and the detection of malignant tissue in the colon area could be studied further.


  • [1] Torre LA, Bray F, Siegel RL, Ferlay J, Lortet-Tieulent J, Jemal A. Global cancer statistics, 2012. CA: A Cancer Journal for Clinicians. 2015;65(2):87–108.
  • [2] Franko J, Shi Q, Goldman CD, Pockaj BA, Nelson GD, Goldberg RM, et al. Treatment of colorectal peritoneal carcinomatosis with systemic chemotherapy: a pooled analysis of north central cancer treatment group phase III trials N9741 and N9841. Journal of Clinical Oncology. 2012;30(3):263.
  • [3] Ellebrecht DB, Kuempers C, Horn M, Keck T, Kleemann M. Confocal laser microscopy as novel approach for real-time and in-vivo tissue examination during minimal-invasive surgery in colon cancer. Surgical Endoscopy. 2018; p. 1–7.
  • [4] Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, et al. A survey on deep learning in medical image analysis. Medical Image Analysis. 2017;42:60–88.
  • [5] Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542(7639):115.
  • [6] Shin HC, Roth HR, Gao M, Lu L, Xu Z, Nogues I, et al. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Transactions on Medical Imaging. 2016;35(5):1285.
  • [7] Gessert N, Lutz M, Heyder M, Latus S, Leistner DM, Abdelwahed YS, et al. Automatic Plaque Detection in IVOCT Pullbacks Using Convolutional Neural Networks. IEEE Transactions on Medical Imaging. 2018; p. 1–9.
  • [8] Huang G, Liu Z, Weinberger KQ, van der Maaten L. Densely connected convolutional networks. In: Proceedings of the IEEE CVPR; 2017.
  • [9] Hu J, Shen L, Sun G. Squeeze-and-Excitation Networks. In: Proceedings of the IEEE CVPR; 2018.
  • [10] Xie S, Girshick R, Dollár P, Tu Z, He K. Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE CVPR; 2017. p. 5987–5995.