According to the World Health Organization [1], cardiovascular diseases (CVDs) are the leading cause of death globally. About 17.7 million people died from CVDs in 2015, representing 31% of all global deaths. Of these deaths, about 7.4 million were due to coronary heart disease and about 6.7 million were due to stroke. Extensive research and clinical applications have shown that both CT and MRI have vital roles in the non-invasive assessment of CVDs. CT is used more frequently than MRI because of its fast acquisition and lower cost. On the other hand, MRI offers excellent soft tissue contrast and involves no ionizing radiation. However, most commercially available image analysis methods have been tuned for either CT or MRI only. Furthermore, many studies focus on only one substructure of the heart (for instance, the left ventricle or the left atrium). Surprisingly, there is very little published research on segmenting all substructures of the heart, despite the fact that clinically established markers rely on shape, volumetric, and tissue characterization of all the cardiac substructures. Our study addresses this open problem from a machine learning perspective. We have investigated architectural designs of deep learning networks to solve multi-label and multi-modality image segmentation challenges under limited GPU resources and limited imaging data.
Related Works. The literature on cardiac image segmentation is vast. Among these works, atlas-based methods have been popular and favored for many years. For instance, the multi-atlas based whole-heart segmentation using MRI and CT by Zhuang and Shen [2] and the atlas propagation based method using prior information by Zhuang et al. [3] are two key examples. Despite their accuracy, those methods often lack efficiency because of heavy computations in the registration algorithms (e.g., computation times from 13 minutes to 11 hours are reported in the literature). Interested readers can find a survey of cardiac image segmentation methods in [4], with a full list of methods and their comparative evaluations.
More recently, deep learning based approaches have been replacing conventional methods in medical image segmentation in general, and in the cardiac field in particular. For instance, in [5], multi-planar deep learning was used to segment the left atrium (LA) and pulmonary veins from MR images. A recurrent fully convolutional neural network was proposed to segment the left ventricle (LV) from MRI in [6]. In a similar fashion, a deep learning algorithm combined with a deformable-model approach was used to segment the LV from MRI [7]. In [8], right ventricle (RV) segmentation was accomplished through a joint localization and segmentation algorithm within a deep learning framework. To date, the majority of deep learning methods have segmented only one or two structures of the heart and have been constrained to a single modality, unlike the method presented herein.
Our Contributions. We have constructed a network structure similar to the one devised in [5], which successfully segments the left atrium and proximal pulmonary veins from MRI. In this paper, we extend this segmentation engine in several ways. (1) A deeper CNN is used compared to [5]. (2) We use both CT and MRI to test and evaluate the proposed system, whereas Mortazi et al. used only MRI [5]. (3) We extend the binary segmentation problem into a multi-label segmentation problem. (4) We devise a rank based adaptive fusion method that assesses the effective information from different planes for all delineated objects and selects the best fusion strategy for highly accurate and efficient delineation results.
2 Multi-Object Multi-Planar CNN (MO-MP-CNN)
The proposed method is called multi-object multi-planar convolutional neural networks (MO-MP-CNN), and its modules are illustrated in Figure 1. MO-MP-CNN takes a 3D CT or MR scan as input and parses it into three perpendicular planes: axial (A), coronal (C), and sagittal (S). For each plane (and modality), a 2D CNN is trained to label pixels. The CNNs are trained from scratch to adapt to the CT and MRI context. After each of the 2D CNNs has been trained separately, an adaptive fusion strategy combines the probability maps of the CNNs. The details of the CNN and the adaptive fusion method are explained in the following.
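The parsing of a 3D scan into the three planes can be pictured with a small NumPy sketch. It is illustrative only: the function name `split_into_planes` and the assumed (z, y, x) axis order are not from the paper, and real scans need their orientation checked before slicing.

```python
import numpy as np

def split_into_planes(volume):
    """Return the 2D slices of a 3D volume for the three perpendicular planes.

    Assumption (not from the paper): the volume is indexed as (z, y, x), so
    fixing z gives an axial slice, fixing y a coronal slice, and fixing x a
    sagittal slice.
    """
    axial = [volume[k, :, :] for k in range(volume.shape[0])]
    coronal = [volume[:, j, :] for j in range(volume.shape[1])]
    sagittal = [volume[:, :, i] for i in range(volume.shape[2])]
    return {"A": axial, "C": coronal, "S": sagittal}

# Toy volume standing in for a CT or MR scan.
volume = np.arange(4 * 5 * 6, dtype=np.float32).reshape(4, 5, 6)
planes = split_into_planes(volume)
```

Each list of slices would then feed the 2D CNN trained for that plane, and stacking the per-slice outputs along the same axis restores a 3D probability volume for fusion.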
Table 1: Training images for CT and for MRI (columns: CNN, # of images, image size).
CNN network. The proposed encoder-decoder network architecture is illustrated in Fig. 2. Twelve convolution layers are used in the encoder and in the decoder, separately. In the encoder, two max-pooling layers are used, each of which halves the image dimensions, and in the decoder two upsampling layers (bilinear interpolation) restore the image to its original size. All filters were set to the same size. Each convolution layer is followed by batch normalization and a Rectified Linear Unit (ReLU) as the activation function. The number of filters in the last convolution layer equals the number of classes (i.e., 8: background plus 7 objects), and this layer is followed by a softmax function that produces the final probability map for each object. Similar to [5], the simplified Z-loss function [9] is used to train the network. To provide a sufficient number of training images for the networks, data augmentation by rotation and zoom-in operations is applied to the training images. The details of the augmentation and the number of images for each CNN are summarized in Table 1.
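As a small illustration of the final layer, the sketch below turns per-pixel class scores into a probability map with a softmax over 8 classes. The array shape and the random scores are placeholders, not outputs of the actual network.

```python
import numpy as np

NUM_CLASSES = 8  # background + 7 cardiac structures, as stated in the text

def softmax(logits, axis=-1):
    """Numerically stable softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
# Hypothetical output of the last convolution layer: height x width x classes.
logits = rng.normal(size=(4, 4, NUM_CLASSES))
prob_map = softmax(logits)
```

Every pixel of `prob_map` holds 8 non-negative values that sum to one, one per class; these are the per-plane maps that the fusion step combines.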
Multi-object adaptive fusion.
The adaptive fusion strategy has been extended so that it applies to multi-object segmentation instead of binary segmentation. Consider an input image and its output, where the output is the probability map produced by the CNN. As shown in Fig. 3, the final segmentation is obtained from the probability map by assigning each pixel the class (label) with the maximum probability. Then, a connected component analysis (CCA) is applied to this segmentation to select reliable and unreliable regions, where unreliable regions are considered to come from false positive findings. Although this approach gives only a “rough” estimate of the object, the information is well suited to assessing the quality of segmentations from different planes. Given the number of classes (structures) in the images and the number of components in each class, CCA partitions the segmentation into its per-class components. For each class, reliability parameters (weights) are then assigned to increase the influence of planes that have more reliable (trusted) segmentations. In our interpretation of the CCA, the difference between trusted and non-trusted regions guides the reliability of the segmentation process: the higher the difference, the more reliable the segmentation (see Fig. 3, weight distribution with respect to the difference). In the test phase, we simply use the predetermined weights from the training stage.
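The fusion idea can be sketched in a few lines of NumPy. This is a hedged illustration, not the authors' formula: the weight used here (the share of foreground pixels that fall in the largest connected component of their class, summed into one scalar per plane) is an assumption chosen to match the description of trusted versus non-trusted regions, and the names `component_sizes`, `plane_weight`, and `fuse` are hypothetical. The sketch also computes the weights from the maps it is given, whereas the paper fixes them during training.

```python
import numpy as np
from collections import deque

def component_sizes(mask):
    """Sizes of the face-connected components of a boolean array (any dimension)."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros(mask.shape, dtype=bool)
    sizes = []
    for start in zip(*np.nonzero(mask)):
        if seen[start]:
            continue
        seen[start] = True
        queue, size = deque([start]), 0
        while queue:  # breadth-first flood fill of one component
            idx = queue.popleft()
            size += 1
            for axis in range(mask.ndim):
                for step in (-1, 1):
                    nb = list(idx)
                    nb[axis] += step
                    nb = tuple(nb)
                    if 0 <= nb[axis] < mask.shape[axis] and mask[nb] and not seen[nb]:
                        seen[nb] = True
                        queue.append(nb)
        sizes.append(size)
    return sizes

def plane_weight(prob_map):
    """Assumed reliability score of one plane (class 0 is treated as background)."""
    labels = prob_map.argmax(axis=-1)
    trusted, total = 0, 0
    for cls in range(1, prob_map.shape[-1]):
        sizes = component_sizes(labels == cls)
        if sizes:
            trusted += max(sizes)  # largest component = trusted region
            total += sum(sizes)
    return trusted / total if total else 0.0

def fuse(prob_maps):
    """Weighted average of per-plane probability maps, then per-pixel argmax."""
    weights = np.array([plane_weight(p) for p in prob_maps], dtype=float)
    if weights.sum() == 0:
        weights = np.ones_like(weights)
    weights = weights / weights.sum()
    fused = sum(w * p for w, p in zip(weights, prob_maps))
    return fused.argmax(axis=-1), weights

def one_hot(labels, num_classes):
    return np.eye(num_classes)[labels]

# Plane A predicts one compact object; plane B predicts three isolated pixels.
plane_a = one_hot(np.array([[0, 1, 1, 1, 0, 0]]), 2)
plane_b = one_hot(np.array([[1, 0, 1, 0, 1, 0]]), 2)
fused_labels, weights = fuse([plane_a, plane_b])
```

In this toy case plane A receives the larger weight because its foreground forms a single component, so the fused labels follow plane A wherever the two planes disagree.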
3 Experimental Results
Dataset and preprocessing: For the experiments and evaluations of the proposed method, we used the dataset of the STACOM 2017 whole heart segmentation challenge, which contains 20 MR and 20 CT images for training (with ground truth) and 40 test images without ground truth for each modality. We performed 4-fold cross-validation on the dataset such that, in each fold, 15 subjects were used for training and 5 subjects for validation. The CT images were obtained from routine cardiac CT angiography and cover the whole heart, extending from the upper abdomen to the aortic arch. Axial in-plane resolution was mm and slice thickness was 1.6 mm. The MR images were acquired using 3D balanced steady state free precession (b-SSFP) sequences, with about 2 mm acquisition resolution in each direction. As a preprocessing step, anisotropic smoothing filtering was applied to both CT and MR images prior to segmentation. In addition, histogram matching was used for MR images to alleviate intensity non-standardness issues.
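The 15/5 split per fold can be reproduced with a short sketch. The paper does not say how subjects were assigned to folds, so the interleaved assignment below and the name `k_fold_splits` are assumptions made only for illustration.

```python
def k_fold_splits(subject_ids, k):
    """Yield (train_ids, val_ids) pairs; fold i validates on every k-th subject."""
    for fold in range(k):
        val = subject_ids[fold::k]
        train = [s for s in subject_ids if s not in val]
        yield train, val

subjects = list(range(20))  # hypothetical indices for the 20 training subjects
folds = list(k_fold_splits(subjects, 4))
```

With 20 subjects and 4 folds, every subject is used for validation exactly once and each fold trains on the remaining 15.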
Evaluation: Five metrics were assessed: sensitivity, specificity, precision, Dice index (DI), and surface to surface (S2S) distance. A summary of the findings for each structure and for the whole heart is reported in Table 1. The WHS is the average over all structures. Box plots of sensitivity, precision, and DI for both CT and MRI and for all structures are shown in Fig. 5. Qualitative results (including cases that are difficult to segment) for the CT and MR modalities are illustrated in Fig. 4. The algorithms were implemented in TensorFlow [10] on NVIDIA Titan Xp GPUs. The average time for segmenting the whole heart from a CT volume using three Titan Xp GPUs was about 50 seconds, and segmenting an MR volume took about 17 seconds. For comparison, the time on an Intel Xeon E5-2620 processor with 8 cores was about 30 minutes for CT images and about 8 minutes for MR images.
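For reference, the four overlap metrics named in the evaluation can be computed from a binary prediction and a binary ground-truth mask as sketched below, using their standard definitions. The S2S distance is omitted because it requires surface extraction; the function name and the toy masks are illustrative only.

```python
import numpy as np

def overlap_metrics(pred, truth):
    """Sensitivity, specificity, precision and Dice index for one structure."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)    # true positives
    fp = np.sum(pred & ~truth)   # false positives
    fn = np.sum(~pred & truth)   # false negatives
    tn = np.sum(~pred & ~truth)  # true negatives

    def ratio(num, den):
        return float(num) / float(den) if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
    }

truth = np.array([[0, 1, 1, 0], [0, 1, 1, 0]])
pred = np.array([[0, 1, 1, 1], [1, 0, 1, 0]])
scores = overlap_metrics(pred, truth)
```

For a multi-label result, the same function would be applied once per structure by comparing `labels == c` against the ground truth for class `c`.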
4 Discussion and Conclusion
The main goal of the current study is to develop a framework for accurately and efficiently segmenting all cardiac substructures from both CT and MR images. The main strength of the proposed method is that it trains multiple CNNs from scratch and applies an adaptive fusion strategy to maximize the information used in pixel labeling, despite the limited data and hardware support. Our findings indicate that MO-MP-CNN can be used as an efficient tool to delineate cardiac structures with high precision, accuracy, and efficiency.
Technically, one may question why we did not employ a fully 3D CNN approach instead of a multi-planar fusion of multiple 2D CNNs. As discussed in [5], the lack of a large number of 3D images restricts the depth of the CNN that can be trained, which is likely to result in a sub-optimal implementation. Hence, training on a large number of 2D slices is much more feasible than a 3D approach in the current setting. With plentiful GPU processing power and 3D imaging data, training would be optimized using a 3D CNN.
Another limitation of our work stems from the use of the softmax function in the last layer of the proposed network. To explore whether the information loss due to class normalization in this step is significant, further research should use information from the layer before the softmax in the fusion step and compare the results with the current system. Finally, further work is needed to establish a comparative evaluation of different deep neural network approaches such as ResNet, U-net, and others. While deeper networks are desirable for higher precision in segmentation tasks, the lack of 3D data is a significant limitation for training such systems. Data augmentation and transfer learning have been shown to address such challenges to a certain degree, but there is currently no research proving the optimality of such networks relative to the amount of data at hand.
- [1] “Cardiovascular diseases (CVDs),” http://www.who.int/mediacentre/factsheets/fs317/en/, 2007, [Online; accessed 30-June-2017].
- [2] Xiahai Zhuang and Juan Shen, “Multi-scale patch and multi-modality atlases for whole heart segmentation of MRI,” Medical Image Analysis, vol. 31, pp. 77–87, 2016.
- [3] X Zhuang, S Ourselin, R Razavi, DLG Hill, and DJ Hawkes, “Automatic whole heart segmentation based on atlas propagation with a priori anatomical information,” Medical Image Understanding and Analysis-MIUA, pp. 29–33, 2008.
- [4] Peng Peng, Karim Lekadir, Ali Gooya, Ling Shao, Steffen E Petersen, and Alejandro F Frangi, “A review of heart chamber segmentation for structural and functional analysis using cardiac magnetic resonance imaging,” Magma (New York, NY), vol. 29, pp. 155, 2016.
- [5] Aliasghar Mortazi, Rashed Karim, Kawal Rhode, Jeremy Burt, and Ulas Bagci, “CardiacNet: Segmentation of left atrium and proximal pulmonary veins from MRI using multi-view CNN,” arXiv preprint arXiv:1705.06333, 2017.
- [6] Rudra PK Poudel, Pablo Lamata, and Giovanni Montana, “Recurrent fully convolutional neural networks for multi-slice MRI cardiac segmentation,” arXiv preprint arXiv:1608.03974, 2016.
- [7] MR Avendi, Arash Kheradvar, and Hamid Jafarkhani, “A combined deep-learning and deformable-model approach to fully automatic segmentation of the left ventricle in cardiac MRI,” Medical Image Analysis, vol. 30, pp. 108–119, 2016.
- [8] Gongning Luo, Ran An, Kuanquan Wang, Suyu Dong, and Henggui Zhang, “A deep learning network for right ventricle segmentation in short-axis MRI,” in Computing in Cardiology Conference (CinC), 2016. IEEE, 2016, pp. 485–488.
- [9] Alexandre de Brébisson and Pascal Vincent, “The Z-loss: a shift and scale invariant classification loss belonging to the spherical family,” arXiv preprint arXiv:1604.08859, 2016.
- [10] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al., “TensorFlow: Large-scale machine learning on heterogeneous distributed systems,” arXiv preprint arXiv:1603.04467, 2016.