We built a new database of IVOCT images using in-vivo patient data acquired with a St. Jude Medical Ilumien OPTIS. A trained expert with daily IVOCT experience provided the ground-truth labels. Each B-Scan is assigned a binary label of “plaque” or “no plaque”. In total, the dataset contains 2868 labelled B-Scans from 41 patients. We split off a test set of 509 B-Scans from 6 patients. The dataset is slightly imbalanced with of the B-Scans being labelled as “no plaque”. In contrast to previous approaches (1, 2), we do not apply extensive pre-processing such as lumen segmentation and flattening, or guide-wire artifact and catheter removal. We thereby force our models to be robust to the artifacts that frequently appear in clinical practice. This allows our models to handle raw data without relying on an automatic segmentation procedure, which might fail if the artery wall is not consistently visible, as shown in Figure 1.
We employ convolutional neural networks (CNNs) to learn relevant features for plaque classification directly from the B-Scans. We use standard architectures developed for image classification and object detection in non-medical settings: ResNet50, ResNet101, InceptionV3 and Inception-ResNetV2 (3, 4). Moreover, we apply transfer learning, which can help with adaptation to new problem domains where data is limited (5). To this end, we pretrain the models on the ImageNet dataset, which contains 1.2 million natural images from 1000 classes. We remove each model's last layer and add a layer with two outputs for binary classification. In their original design, the Inception-based models employ dropout before the output. For comparability, we employ dropout with a probability of p=0.5 before the output of every model.
The images that are fed into the model can be represented in either polar or cartesian form, see Figure 1. In polar form, the acquired A-Scans are aligned next to each other in temporal acquisition order. Each A-Scan represents a single depth scan at a certain angle of the artery cross-section. The polar image can be transformed into cartesian space by mapping each A-Scan to its respective angle and interpolating in between. The cartesian representation provides a more intuitive cross-sectional view of the artery and is therefore used by practitioners. From an image processing point of view, both representations should capture the same amount of information. We investigate whether either representation is more advantageous for deep feature learning.
We resize the input images to 300×300 pixels. For data augmentation, we apply random cropping with a patch size of 270×270 pixels during training. Furthermore, we apply random rotations to the cartesian images and randomly flip the polar images along the temporal axis. For evaluation, we use a single center crop of the training patch size without flipping or rotations.
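
The polar-to-cartesian mapping and the cropping scheme described above can be sketched as follows. This is only an illustrative NumPy sketch, not the implementation used for the experiments. It assumes that the polar image stores depth samples along the rows and A-Scans along the columns, that the A-Scans cover a full revolution, and that bilinear interpolation is used; the function names, the flip probability of 0.5 and the random seed are hypothetical choices. Random rotations of the cartesian images are omitted for brevity.

```python
import numpy as np


def polar_to_cartesian(polar, out_size=300):
    """Map a polar B-Scan to a square cartesian image.

    Assumed layout: rows = depth samples, columns = A-Scans in temporal
    order covering a full 360 degree revolution. Uses bilinear interpolation.
    """
    n_depth, n_angles = polar.shape
    center = (out_size - 1) / 2.0
    ys, xs = np.mgrid[0:out_size, 0:out_size]
    dx, dy = xs - center, ys - center

    # Radius in units of depth samples, angle in units of A-Scan index.
    r = np.hypot(dx, dy) * (n_depth - 1) / center
    theta = (np.arctan2(dy, dx) % (2 * np.pi)) * n_angles / (2 * np.pi)

    r0 = np.floor(r).astype(int)
    t0 = np.floor(theta).astype(int)
    fr, ft = r - r0, theta - t0

    # Clamp the depth index, wrap the angular index around the circle.
    r0c = np.clip(r0, 0, n_depth - 1)
    r1c = np.clip(r0 + 1, 0, n_depth - 1)
    t0w = t0 % n_angles
    t1w = (t0 + 1) % n_angles

    out = ((1 - fr) * (1 - ft) * polar[r0c, t0w]
           + (1 - fr) * ft * polar[r0c, t1w]
           + fr * (1 - ft) * polar[r1c, t0w]
           + fr * ft * polar[r1c, t1w])

    # Pixels beyond the maximum imaging depth (image corners) are set to 0.
    return np.where(r <= n_depth - 1, out, 0.0)


def random_crop(img, size, rng):
    """Training-time crop at a uniformly random position."""
    top = rng.integers(0, img.shape[0] - size + 1)
    left = rng.integers(0, img.shape[1] - size + 1)
    return img[top:top + size, left:left + size]


def center_crop(img, size):
    """Evaluation-time crop: a single centered patch."""
    top = (img.shape[0] - size) // 2
    left = (img.shape[1] - size) // 2
    return img[top:top + size, left:left + size]


def augment_polar(img, rng, size=270):
    """Random flip along the temporal (A-Scan) axis, then a random crop.

    The flip probability of 0.5 is an assumption.
    """
    if rng.random() < 0.5:
        img = img[:, ::-1]
    return random_crop(img, size, rng)


# Usage on synthetic data (shapes only; no real IVOCT data involved).
rng = np.random.default_rng(0)
polar_image = rng.random((300, 300))
cartesian_image = polar_to_cartesian(polar_image, out_size=300)
train_patch = augment_polar(polar_image, rng, size=270)
eval_patch = center_crop(cartesian_image, 270)
```

Wrapping the angular index makes the last and first A-Scan neighbours, so no seam appears where the revolution closes.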
The resulting prediction performance of our models on the test set is shown in Figure 2. For each model, the sensitivity and 1-specificity for classification of an image as “plaque” are shown. The four models were each trained on polar and cartesian images, both with and without transfer learning. Models in the upper left corner perform better as they have a higher sensitivity and specificity. Overall, pretraining on ImageNet appears to improve performance substantially, as the best model without pretraining shows an overall accuracy of (Inception-ResNetV2 with polar images) with a sensitivity of and a specificity of while the best model with pretraining (ResNet101 with cartesian images) shows an overall accuracy of with a sensitivity of and a specificity of . It appears that meaningful feature transfer from the natural image domain to the IVOCT image domain was achieved. Using cartesian representations also results in better classification performance. All models with pretraining achieve both a higher sensitivity and a higher specificity when trained on cartesian images. For example, the best model with polar images shows an accuracy of compared to for cartesian images (both ResNet101). This indicates that a cartesian real-world image representation helps CNN-based learning when employing transfer learning. Without pretraining, the difference is not as clear and some models perform better with polar representations. The different models all perform similarly, with ResNet101 standing out slightly as it performs best in the two pretrained cases. All in all, the choice of image representation and the use of transfer learning have a larger impact on performance than the choice of model.
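
For reference, the reported quantities follow the standard definitions for binary classification with “plaque” as the positive class. The sketch below computes them from per-image labels and predictions; the function name and the toy labels are hypothetical and do not correspond to results from our test set.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity and accuracy for a binary classifier.

    `positive` marks the "plaque" class; every other label is "no plaque".
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1

    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    accuracy = (tp + tn) / total if total else float("nan")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # The quantity "1-specificity" is the false positive rate.
        "one_minus_specificity": 1.0 - specificity,
        "accuracy": accuracy,
    }


# Toy example with made-up labels (1 = "plaque", 0 = "no plaque").
toy_truth = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 0, 0, 1, 0, 1]
toy_metrics = binary_metrics(toy_truth, toy_pred)
```

Because the dataset is imbalanced, sensitivity and specificity are more informative together than accuracy alone.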
We perform plaque classification from IVOCT images using CNNs for deep feature learning. For this purpose, we build a database of in-vivo patient image data that is labelled by a trained expert. We allow images with various artifacts in our dataset and force our models to learn robustness. We employ various standard CNN models with additional pretraining on ImageNet for transfer learning. Our results show that pretraining substantially boosts performance. Moreover, using cartesian image representations appears to be beneficial for CNN learning. Overall, our best model achieves an accuracy of for plaque classification.
- (1) Ughi, G.J., Adriaenssens, T., Sinnaeve, P., Desmet, W., D’hooge, J. (2013) Automated tissue characterization of in vivo atherosclerotic plaques by intravascular optical coherence tomography images. Biomedical optics express 4(7), 1014–1030
- (2) Rico-Jimenez, J.J., Campos-Delgado, D.U., Villiger, M., Otsuka, K., Bouma, B.E., Jo, J.A. (2016) Automatic classification of atherosclerotic plaques imaged with intravascular OCT. Biomedical optics express 7(10), 4069–4085
- (3) Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., van der Laak, J.A., van Ginneken, B., Sánchez, C.I. (2017) A survey on deep learning in medical image analysis. Medical image analysis 42, 60–88
- (4) Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A. (2017) Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. In: AAAI, pp. 4278–4284
- (5) Ravishankar, H., Sudhakar, P., Venkataramani, R., Thiruvenkadam, S., Annangi, P., Babu, N., Vaidya, V. (2016) Understanding the mechanisms of deep transfer learning for medical images. In: Deep Learning and Data Labeling for Medical Applications, pp. 188–196. Springer