Convolutional Neural Networks (CNNs) play an important role in visual image recognition. In the past few years, CNNs have achieved promising results in image classification [15, 22, 23, 12, 13] and semantic segmentation [5, 16, 4, 20, 19]. Fully Convolutional Networks (FCNs) have become a popular way to make dense predictions at the pixel level. There are two major differences between FCNs and the type of CNNs primarily designed for classification [15, 22, 24]. First, an FCN has no fully-connected layers and can therefore accept inputs of arbitrary size. Second, an FCN consists of an encoder network that produces embedded feature maps, followed by a decoder network that expands and refines the feature maps output by the encoder. Skip connections are also common in such architectures, connecting corresponding blocks in the encoder and decoder [2, 7, 4].
Segmentation of acquired IVUS images is a challenging task since IVUS images usually come with artifacts. In particular, a successful separation of the interior (lumen) and exterior (media) vessel walls in IVUS images plays a critical role in diagnosing cardiovascular diseases. It also helps build a 3D reconstruction of the artery when information about the catheter movements is provided by another imaging modality such as X-ray. The segmentation of IVUS images has been well investigated from a conventional perspective, where numerous ideas and approaches from computer vision and image processing have been employed [17, 18, 25, 30, 26]. Some of the best segmentation results have been achieved in a very recent work in which the authors proposed a two-fold IVUS segmentation pipeline based on traditional computer vision methods [10, 9]. Although no learning method was used, it outperforms existing methods in both accuracy and efficiency. Although its reported performance is very close to the ground truth label (0.30 mm error for the segmented lumen and 0.22 mm error for the segmented media relative to the gold standard), we believe that deep learning techniques have the potential to perform better.
In this paper, we propose an FCN-based pipeline that automatically delineates the boundaries of the lumen and the media vessel walls. The pipeline contains two major components: a carefully designed FCN, called IVUS-Net, that predicts a pixel-wise mask, followed by a contour-extraction post-processing step. The FCN is trained from scratch without relying on any pre-trained weights. We evaluated the proposed IVUS-Net on the test set of a publicly available IVUS B-mode benchmark dataset, which contains 326 20 MHz IVUS images exhibiting various artifacts such as motion of the catheter after a heart contraction, guide-wire effects, bifurcations, and side branches. Two standard metrics, the Jaccard Measure (JM), also called Intersection over Union (IoU), and the Hausdorff Distance (HD), were used for evaluation.
The contributions of the proposed work can be summarized as follows:
We propose a pipeline based on an FCN followed by a post-processing contour extraction step to automatically delineate the lumen and media vessel walls.
We show that the proposed work outperforms the current state of the art on a publicly available IVUS benchmark dataset that contains IVUS images with a significant amount of artifacts. This suggests that the proposed work has the potential to generalize to other IVUS benchmarks as well.
To the best of our knowledge, there is no previous work based on deep architecture that can produce segmentation for both the lumen and media vessel walls in B-mode IVUS images.
2 Proposed Method
In this section, we first introduce the dataset we used to train the deep model. Then, we present the architecture, IVUS-Net, which produces a binary prediction mask for either the lumen or the media area, followed by a contour extraction step to delineate the vessel wall.
We used a publicly available IVUS dataset that contains two sets (train and test) of IVUS gated frames acquired using a full pullback at the end-diastolic cardiac phase from 10 patients. Each frame has been manually annotated by four clinical experts. The train and test sets consist of 109 and 326 IVUS frames, respectively. The test set contains a large number of IVUS artifacts, including bifurcation (44 frames), side vessel (93 frames), and shadow (96 frames) artifacts. The remaining 143 frames do not contain any artifacts except for plaque.
IVUS-Net is designed as a fully convolutional network (FCN), with inspiration from aggregated, multi-branch architectures such as ResNeXt and the Inception model. Both SegNet and U-Net can be considered base versions of the proposed architecture. It has two major components:
An encoder network that can downsample and process the input to produce a low-resolution deep feature map.
A decoder network that restores the resolution of the deep feature map output by the encoder network toward the original size.
The output feature map is sent to one more convolutional layer followed by a sigmoid activation to produce the final result.
The encoder network contains 4 encoding blocks, while the decoder network contains 3 decoding blocks. Each decoding block receives a feature map from its previous block and extra information from the encoder network through skip connections. The entire architecture is therefore symmetric, as shown in Fig. 1. There are minor differences among the blocks in the architecture; we briefly describe the design of each and the intuition behind it.
Except for the first encoding block, each encoding block contains a downsampling branch that downsamples the input feature map, followed by a two-branch convolution path, as shown in Fig. 2(a). We build the downsampling branches this way to avoid the information loss caused by using pooling alone. The downsampling branch reduces the spatial resolution of the input: it applies a 2-by-2 average pooling layer and a 2-by-2 convolutional layer with a stride of 2 in parallel, and finally concatenates the two outputs. This aggregation idea is similar to [24, 27].
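As a minimal NumPy sketch (not the paper's TensorFlow implementation; random weights stand in for the learned 2-by-2 convolution kernel), the downsampling branch can be expressed as:

```python
import numpy as np

def avg_pool_2x2(x):
    # 2-by-2 average pooling on an (H, W, C) feature map
    H, W, C = x.shape
    return x.reshape(H // 2, 2, W // 2, 2, C).mean(axis=(1, 3))

def strided_conv_2x2(x, w):
    # 2-by-2 convolution with stride 2; w has shape (2, 2, C_in, C_out)
    H, W, C = x.shape
    patches = x.reshape(H // 2, 2, W // 2, 2, C)
    return np.einsum('hiwjc,ijco->hwo', patches, w)

def downsampling_branch(x, w):
    # halve the spatial resolution two ways and concatenate along channels
    return np.concatenate([avg_pool_2x2(x), strided_conv_2x2(x, w)], axis=-1)

x = np.random.rand(8, 8, 3)
w = np.random.rand(2, 2, 3, 4)
out = downsampling_branch(x, w)   # shape (4, 4, 7): half resolution, channels stacked
```

Both paths see the same input, so the concatenated output carries the smoothed pooling statistics alongside the learned strided-convolution features.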
The downsampled, aggregated feature map produced by the downsampling branch is then passed to two subsequent branches, namely the main branch and the refining branch. First, following the designs in [2, 21], we include a branch with consecutive convolutional layers, each followed by activation and batch normalization; we call it the main branch. A recent trend is to use small kernel sizes for feature map refinement [4, 19], so we intentionally design a refining branch in which a convolutional layer with a 3-by-3 kernel followed by a convolutional layer with a 1-by-1 kernel produces a similar but refined feature map. The outputs of the main and refining branches are summed and passed to the next block and to the corresponding decoding block.
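A minimal NumPy sketch of the two-branch structure (activations and batch normalization omitted for brevity; random weights stand in for learned kernels) might look as follows:

```python
import numpy as np

def conv_same(x, w):
    # 'same'-padded convolution on (H, W, C); w has shape (k, k, C_in, C_out)
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    H, W, _ = x.shape
    out = np.zeros((H, W, w.shape[-1]))
    for i in range(k):
        for j in range(k):
            out += np.einsum('hwc,co->hwo', xp[i:i + H, j:j + W], w[i, j])
    return out

def encoding_block_branches(x, w_main1, w_main2, w_ref3, w_ref1):
    # main branch: consecutive 3-by-3 convolutions
    main = conv_same(conv_same(x, w_main1), w_main2)
    # refining branch: a 3-by-3 convolution followed by a 1-by-1 convolution
    refine = conv_same(conv_same(x, w_ref3), w_ref1)
    # the two branch outputs are summed element-wise
    return main + refine

x = np.random.rand(4, 4, 3)
y = encoding_block_branches(
    x,
    np.random.rand(3, 3, 3, 8), np.random.rand(3, 3, 8, 8),   # main branch kernels
    np.random.rand(3, 3, 3, 8), np.random.rand(1, 1, 8, 8))   # refining branch kernels
```

Both branches must produce the same number of output channels so that the element-wise sum is well defined.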
Decoding blocks have a slightly different configuration, as shown in Fig. 2(b). Every decoding block receives feature maps from both its previous block and its corresponding encoding block. Only the feature map received from the previous block is upsampled by a 2-by-2 deconvolution and then concatenated with the feature map from the corresponding encoding block. Note that this concatenated feature map is passed only to the main branch, while the refining branch handles the upsampled feature map alone.
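The input routing of a decoding block can be sketched as follows (nearest-neighbour upsampling stands in for the learned 2-by-2 deconvolution of the actual model):

```python
import numpy as np

def upsample_2x(x):
    # nearest-neighbour upsampling as a stand-in for the 2-by-2 deconvolution
    return x.repeat(2, axis=0).repeat(2, axis=1)

def decoding_block_inputs(prev, skip):
    up = upsample_2x(prev)
    main_in = np.concatenate([up, skip], axis=-1)  # main branch: upsampled + skip
    refine_in = up                                 # refining branch: upsampled only
    return main_in, refine_in

prev = np.random.rand(4, 4, 8)   # feature map from the previous block
skip = np.random.rand(8, 8, 4)   # feature map from the corresponding encoding block
main_in, refine_in = decoding_block_inputs(prev, skip)
```

Only the main branch sees the skip-connection features; the refining branch refines the upsampled map on its own before the two outputs are summed.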
The activation used in IVUS-Net is the Parametric Rectified Linear Unit (PReLU). Compared with the ordinary ReLU activation, PReLU allows part of the gradient to flow through when the neuron is not activated, whereas ReLU passes gradients only when the neuron is active. As suggested in [11, 28], PReLU outperforms ReLU on many benchmarks and also delivers more stable performance.
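The PReLU activation itself is a one-liner; in NumPy terms (with `a` the learned per-channel slope):

```python
import numpy as np

def prelu(x, a):
    # identity for positive inputs, learned slope `a` for negative ones
    return np.where(x > 0, x, a * x)

y = prelu(np.array([-2.0, 0.5]), 0.25)  # -> [-0.5, 0.5]
```

With `a = 0` this reduces to ReLU; making `a` trainable is what lets gradients flow through inactive neurons.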
Finally, the output feature map from the last decoding block is refined by a 5-by-5 convolutional layer, which we found experimentally to improve performance. Since IVUS-Net should produce binary masks, the last activation is a sigmoid function.
Since the shapes of the lumen and media regions of the vessel closely resemble conic sections, representing the predicted masks by fitting an ellipse to them can increase segmentation accuracy, as proposed in earlier work. We therefore follow the same previously described post-processing procedure on the predicted masks to extract the final contours.
The evaluation is based on a publicly available IVUS B-mode dataset, which has been widely used in the IVUS segmentation literature [8, 6, 17, 30, 26]. There are 109 images in the training set and 326 images in the test set; no official validation set is provided. Models are trained end-to-end on the given dataset alone, without involving any external resources such as extra training images or pre-trained model weights. Two metrics are used for the evaluation: the Jaccard Measure (JM) and the Hausdorff Distance (HD). The Jaccard Measure, sometimes called Intersection over Union, compares the automatic segmentation produced by the pipeline, A, with the manual segmentation delineated by experts, M:

JM(A, M) = |A ∩ M| / |A ∪ M|

The Hausdorff Distance between the automatic curve A and the manual curve M is the greatest distance from any point on one curve to the closest point on the other:

HD(A, M) = max{ max_{a∈A} min_{m∈M} d(a, m), max_{m∈M} min_{a∈A} d(a, m) }
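Both metrics are straightforward to compute; a NumPy sketch (using the Euclidean distance d and a brute-force pairwise distance matrix, adequate for contours of a few hundred points):

```python
import numpy as np

def jaccard_measure(auto_mask, manual_mask):
    # JM = |A ∩ M| / |A ∪ M| on binary masks
    inter = np.logical_and(auto_mask, manual_mask).sum()
    union = np.logical_or(auto_mask, manual_mask).sum()
    return inter / union

def hausdorff_distance(A, M):
    # A, M: (n, 2) and (m, 2) arrays of contour points
    d = np.linalg.norm(A[:, None, :] - M[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

a = np.array([[1, 1], [0, 0]], dtype=bool)
m = np.array([[1, 0], [0, 0]], dtype=bool)
jm = jaccard_measure(a, m)                                   # 1 pixel overlap / 2 pixels union = 0.5
hd = hausdorff_distance(np.array([[0.0, 0.0]]),
                        np.array([[3.0, 4.0]]))              # 5.0
```

JM rewards area overlap, while HD penalizes the single worst boundary deviation, so the two metrics capture complementary failure modes.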
3.1 Data Augmentation
The training set contains only 109 images, a relatively small set for training a deep model from scratch. We therefore employ data augmentation on all the available training images. The augmentation is twofold. First, every original IVUS image and its corresponding ground truth masks are flipped (1) left to right, (2) up to down, and (3) left to right then up to down, generating three new image-mask pairs. Second, we add heavy noise to the input images, either as additive Gaussian noise or by converting the input image to an entirely black one. No modification is made to the ground truth masks. The effectiveness of this data augmentation is discussed in Section 3.3.
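A sketch of the flip-and-noise augmentation (the noise standard deviation `sigma` is an illustrative choice, not the paper's value, and the all-black variant is omitted):

```python
import numpy as np

def augment(image, mask, sigma=0.05, rng=None):
    rng = np.random.default_rng(rng)
    # flips: left-right, up-down, and both; masks are flipped identically
    flips = [np.fliplr, np.flipud, lambda a: np.flipud(np.fliplr(a))]
    pairs = [(image, mask)] + [(f(image), f(mask)) for f in flips]
    # noise: additive Gaussian on the image only; ground truth masks untouched
    noisy = [(np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0), m)
             for img, m in pairs]
    return pairs + noisy

img = np.random.rand(16, 16)
msk = (np.random.rand(16, 16) > 0.5).astype(float)
augmented = augment(img, msk, rng=0)   # 8 image-mask pairs from one original
```

Each original frame thus yields four geometric variants, each optionally corrupted, while the supervision signal stays aligned with the image.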
3.2 Training the Model
All the models are trained and evaluated on a computer with a Core i7-8700K processor, 16GB of RAM, and a GTX 1080 8GB graphics card. Training a model from scratch generally takes less than 2 hours to complete. To make the training faster and use a relatively large batch size, we downsized every frame of the dataset by a factor of 0.5.
We implement IVUS-Net in TensorFlow. All weights in the model are initialized randomly, and the model is trained with the Adam optimizer. The learning rate is set to 0.0001 with no decay scheme. The augmented training set is used to train each model for 96 epochs, with a batch size of 6 and 144 iterations per epoch. Note that we need two groups of models, one predicting the lumen area and one predicting the media area, since the output activation is a sigmoid function.
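For reference, a single Adam update with the learning rate used here can be sketched as follows (the beta and epsilon values are Adam's usual defaults, assumed rather than stated in the text):

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-4, b1=0.9, b2=0.999, eps=1e-8):
    # one Adam update: biased first/second moment estimates, bias correction, step
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

w, m, v = np.zeros(2), np.zeros(2), np.zeros(2)
w, m, v = adam_step(w, np.ones(2), m, v, t=1)   # first step moves each weight by ~ -lr
```

With no decay scheme, `lr` stays fixed at 1e-4 for all 96 epochs.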
For training each model, we randomly select 10 original IVUS images as the validation set to monitor the average Jaccard Measure without extracting contours. The prediction given by a single model is a probability map with the same dimensions as the input image. We then ensemble the individual predictions to produce the final result.
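The exact ensembling rule is not spelled out in this section; a simple and common choice, used here purely as an illustration, is to average the per-model probability maps and threshold the mean:

```python
import numpy as np

def ensemble_predict(prob_maps, threshold=0.5):
    # prob_maps: (n_models, H, W) probability maps from the individual models
    # average across models, then binarize (illustrative rule, not the paper's)
    return np.mean(prob_maps, axis=0) > threshold

maps = np.stack([np.full((4, 4), p) for p in (0.2, 0.6, 0.9)])
mask = ensemble_predict(maps)   # mean is ~0.567 per pixel, so every pixel is True
```

Averaging before thresholding lets confident models outvote uncertain ones pixel by pixel.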
| Artifact | Method | Lumen JM | Lumen HD (mm) | Media JM | Media HD (mm) |
|---|---|---|---|---|---|
| All | Proposed | 0.90 (0.06) | 0.26 (0.25) | 0.86 (0.11) | 0.48 (0.44) |
| | Faraji et al. | 0.87 (0.06) | 0.30 (0.20) | 0.77 (0.17) | 0.67 (0.54) |
| | Downe et al. | 0.77 (0.09) | 0.47 (0.22) | 0.74 (0.17) | 0.76 (0.48) |
| | Exarchos et al. | 0.81 (0.09) | 0.42 (0.22) | 0.79 (0.11) | 0.60 (0.28) |
| No Artifact | Proposed | 0.91 (0.03) | 0.21 (0.09) | 0.92 (0.05) | 0.27 (0.23) |
| | Faraji et al. | 0.88 (0.05) | 0.29 (0.17) | 0.89 (0.07) | 0.31 (0.23) |
| Bifurcation | Proposed | 0.82 (0.11) | 0.50 (0.58) | 0.78 (0.11) | 0.82 (0.60) |
| | Faraji et al. | 0.79 (0.10) | 0.53 (0.34) | 0.57 (0.13) | 1.22 (0.45) |
| | Downe et al. | 0.70 (0.11) | 0.64 (0.27) | 0.71 (0.19) | 0.79 (0.53) |
| | Exarchos et al. | 0.80 (0.09) | 0.47 (0.23) | 0.78 (0.11) | 0.63 (0.25) |
| Side Vessels | Proposed | 0.90 (0.04) | 0.23 (0.12) | 0.83 (0.14) | 0.59 (0.49) |
| | Faraji et al. | 0.87 (0.05) | 0.24 (0.11) | 0.73 (0.60) | 0.74 (0.18) |
| | Downe et al. | 0.77 (0.08) | 0.46 (0.19) | 0.74 (0.16) | 0.76 (0.47) |
| | Exarchos et al. | 0.77 (0.09) | 0.53 (0.24) | 0.78 (0.12) | 0.63 (0.31) |
| Shadow | Proposed | 0.87 (0.06) | 0.27 (0.25) | 0.76 (0.12) | 0.80 (0.45) |
| | Faraji et al. | 0.86 (0.07) | 0.29 (0.20) | 0.58 (0.13) | 1.24 (0.39) |
| | Downe et al. | 0.76 (0.11) | 0.55 (0.26) | 0.74 (0.16) | 0.77 (0.48) |
| | Exarchos et al. | 0.80 (0.10) | 0.46 (0.19) | 0.82 (0.11) | 0.57 (0.28) |

Performance of the proposed IVUS-Net with contour extraction. Measures represent the mean and standard deviation evaluated on 326 frames of the dataset, categorized by the presence of a specific artifact in each frame. The evaluation measures are the Jaccard Measure (JM) and the Hausdorff Distance (HD).
3.3 The Effectiveness of Data Augmentation
We validate the effectiveness of data augmentation with a small experiment. In each case, 5 models with identical configurations are trained, and we use the ensemble strategy described in Section 3.2 to produce the final prediction. The result is shown in Table (a). Note that this result is based on the predictions produced directly by the ensemble, without contour extraction. Regardless of which vessel wall the model predicts, we can safely conclude that the augmentation improves segmentation performance.
3.4 On Evaluating the Refining Branch
Does the refining branch really help? We use the exact same configuration to train two groups of 5 models: one group uses the proposed model, the other uses the proposed model without the refining branch. The evaluation procedure and metrics are the same as for the data augmentation evaluation; the result is shown in Table (b). The refining branch does indeed yield improvements.
3.5 Segmentation Results
In this section, we present and discuss experimental results on the IVUS dataset. We train 10 models with the configuration described in Section 3.2 and ensemble the predictions, followed by contour extraction, to produce the final prediction mask.
The quantitative result is shown in Table 2. As we can see, IVUS-Net outperforms existing methods by a significant margin. According to the Jaccard Measure, we achieve 4% and 8% improvement for the lumen and the media, respectively. If we look at the Hausdorff distance, IVUS-Net obtains 8% and 20% improvement for the lumen and the media, respectively.
IVUS-Net performs particularly well on images with no artifact. Furthermore, it improves performance by a large margin for segmenting both the lumen and the media according to the Hausdorff distance. The reason why IVUS-Net does not exceed all other methods in every single artifact category can be addressed from two perspectives. First, the training set is too small to capture all the artifacts common in the real world, or even in the test set. The architecture is still considerably effective, as the training set contains only 1 image with the side vessel artifact while the test set contains 93 such frames. Second, shadow artifacts generally overlap with parts of the media area, which makes segmentation much more challenging since the media regions leak into the background. Some predictions are illustrated in Fig. 3.
In this paper, we proposed IVUS-Net for the segmentation of arterial walls in IVUS images, together with a contour extraction post-processing step that specifically fits the IVUS segmentation task. We showed that IVUS-Net outperforms existing conventional methods in delineating the lumen and media vessel walls. This is also the first deep architecture-based work that achieves segmentation results very close to the gold standard. We evaluated IVUS-Net on a publicly available dataset containing 326 IVUS frames, and the results showed the superiority of IVUS-Net's segmentations over the current state of the art. Also, IVUS-Net can be employed in real-world applications, since it needs only 0.15 seconds to segment an IVUS frame.
The authors would like to thank the PhD students in the Multimedia Research Centre at University of Alberta. Special thanks to Xinyao Sun for the discussions on the related work and the network architecture figure design.
-  M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, et al. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
-  V. Badrinarayanan, A. Kendall, and R. Cipolla. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE transactions on pattern analysis and machine intelligence, 39(12):2481–2495, 2017.
-  S. Balocco, C. Gatta, F. Ciompi, A. Wahle, P. Radeva, S. Carlier, G. Unal, E. Sanidas, J. Mauri, X. Carillo, et al. Standardized evaluation methodology and reference database for evaluating ivus image segmentation. Computerized medical imaging and graphics, 38(2):70–90, 2014.
-  L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. arXiv preprint arXiv:1606.00915, 2016.
-  D. Ciresan, A. Giusti, L. M. Gambardella, and J. Schmidhuber. Deep neural networks segment neuronal membranes in electron microscopy images. In Advances in neural information processing systems, pages 2843–2851, 2012.
-  R. Downe, A. Wahle, T. Kovarnik, H. Skalicka, J. Lopez, J. Horak, and M. Sonka. Segmentation of intravascular ultrasound images using graph search and a novel cost function. In Proc. 2nd MICCAI workshop on computer vision for intravascular and intracardiac imaging, pages 71–9. Citeseer, 2008.
-  M. Drozdzal, E. Vorontsov, G. Chartrand, S. Kadoury, and C. Pal. The importance of skip connections in biomedical image segmentation. In Deep Learning and Data Labeling for Medical Applications, pages 179–187. Springer, 2016.
-  M. Faraji, I. Cheng, I. Naudin, and A. Basu. Segmentation of arterial walls in intravascular ultrasound cross-sectional images using extremal region selection. Ultrasonics, 84:356–365, 2018.
-  M. Faraji, J. Shanbehzadeh, K. Nasrollahi, and T. B. Moeslund. Erel: extremal regions of extremum levels. In Image Processing (ICIP), 2015 IEEE International Conference on, pages 681–685. IEEE, 2015.
-  M. Faraji, J. Shanbehzadeh, K. Nasrollahi, and T. B. Moeslund. Extremal regions detection guided by maxima of gradient magnitude. IEEE Transactions on Image Processing, 24(12):5401–5415, 2015.
-  K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision, pages 1026–1034, 2015.
-  K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
-  G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
-  D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
-  A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
-  J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3431–3440, 2015.
-  E. G. Mendizabal-Ruiz, M. Rivera, and I. A. Kakadiaris. Segmentation of the luminal border in intravascular ultrasound b-mode images using a probabilistic approach. Medical image analysis, 17(6):649–670, 2013.
-  G. Mendizabal-Ruiz and I. A. Kakadiaris. A physics-based intravascular ultrasound image reconstruction method for lumen segmentation. Computers in biology and medicine, 75:19–29, 2016.
-  C. Peng, X. Zhang, G. Yu, G. Luo, and J. Sun. Large kernel matters–improve semantic segmentation by global convolutional network. arXiv preprint arXiv:1703.02719, 2017.
-  P. Rajpurkar, J. Irvin, K. Zhu, B. Yang, H. Mehta, T. Duan, D. Ding, A. Bagul, C. Langlotz, K. Shpanskaya, et al. Chexnet: Radiologist-level pneumonia detection on chest x-rays with deep learning. arXiv preprint arXiv:1711.05225, 2017.
-  O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pages 234–241. Springer, 2015.
-  K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. International Conference on Learning Representations (ICRL), pages 1–14, 2015.
-  R. K. Srivastava, K. Greff, and J. Schmidhuber. Highway networks. arXiv preprint arXiv:1505.00387, 2015.
-  C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Computer Vision and Pattern Recognition (CVPR), 2015.
-  A. Taki, Z. Najafi, A. Roodaki, S. K. Setarehdan, R. A. Zoroofi, A. Konig, and N. Navab. Automatic segmentation of calcified plaques and vessel borders in ivus images. International Journal of Computer Assisted Radiology and Surgery, 3(3-4):347–354, 2008.
-  G. Unal, S. Bucher, S. Carlier, G. Slabaugh, T. Fang, and K. Tanaka. Shape-driven segmentation of the arterial wall in intravascular ultrasound images. IEEE Transactions on Information Technology in Biomedicine, 12(3):335–347, 2008.
-  S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He. Aggregated residual transformations for deep neural networks. In Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on, pages 5987–5995. IEEE, 2017.
-  B. Xu, N. Wang, T. Chen, and M. Li. Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853, 2015.
-  C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals. Understanding deep learning requires rethinking generalization. In International Conference on Learning Representations (ICLR), 2017.
-  X. Zhu, P. Zhang, J. Shao, Y. Cheng, Y. Zhang, and J. Bai. A snake-based method for segmentation of intravascular ultrasound images and its in vivo validation. Ultrasonics, 51(2):181–189, 2011.