3D Cardiac Shape Prediction with Deep Neural Networks: Simultaneous Use of Images and Patient Metadata

07/02/2019 ∙ by Rahman Attar, et al. ∙ University of Leeds

Large prospective epidemiological studies acquire cardiovascular magnetic resonance (CMR) images for pre-symptomatic populations and follow these over time. To support this approach, fully automatic large-scale 3D analysis is essential. In this work, we propose a novel deep neural network using both CMR images and patient metadata to directly predict cardiac shape parameters. The proposed method uses the promising ability of statistical shape models to simplify shape complexity and variability together with the advantages of convolutional neural networks for the extraction of solid visual features. To the best of our knowledge, this is the first work that uses such an approach for 3D cardiac shape prediction. We validated our proposed CMR analytics method against a reference cohort containing 500 3D shapes of the cardiac ventricles. Our results show broadly significant agreement with the reference shapes in terms of the estimated volume of the cardiac ventricles, myocardial mass, 3D Dice, and mean and Hausdorff distance.




1 Introduction

Cardiovascular disease (CVD) is the most prevalent cause of death worldwide [1]. Early quantitative assessment of cardiac function and structure allows for proper preventive care and early cardiovascular treatment. To support such an approach, analysis and interpretation of large-scale population-based cardiovascular magnetic resonance (CMR) imaging studies are of high importance in the medical image analysis community. This helps to identify patterns and trends across population groups and, accordingly, to reveal insights into key risk factors before CVDs fully develop.

We believe that true 3D analysis is essential for the structural assessment of global and regional cardiac function. We propose a new approach that ensures the global coherence of the cardiac anatomy and naturally lends itself to further analysis in which full 3D anatomy is necessary; for example, in mechanical and flow simulations, or modelling the relationship between cardiac morphology and patient information such as: socio-demographic, lifestyle and environmental, family history, genetic, and omics data.

Though fully automatic 3D segmentation is required for further analysis, the complexity of anatomical structures and their local intensity variation across a population cohort make it challenging. Statistical 3D shape model-based approaches such as [1] have been successfully used for automatically segmenting cardiac structures and generating associated function indexes. This is mainly attributed to the inclusion of prior knowledge of the cardiac shape into the segmentation method. These segmentation approaches typically use very simple features such as gradients on intensity profiles to fit a 3D model. This is an iterative process in which the goal is to minimise the Mahalanobis distance between an intensity profile sampled at a candidate position and its corresponding intensity appearance model by deforming the shape within its range of normal variation to match the image data. On the other hand, in the last decade, fully convolutional networks (FCN) have shown great potential in image-based pattern recognition in a variety of tasks, including cardiac segmentation. However, their outputs are, by nature, 2D segmentation masks for every short axis (SAX) and long axis (LAX) CMR slice. Although these 2D masks are sometimes extended via a further step of non-rigid registration to a 3D atlas to produce a 3D cardiac shape [2], this is not efficient for learning topological shape information.

In this paper, we propose to exploit image features obtained using deep FCNs trained on both SAX and LAX views, along with the rich shape priors learned using statistical shape models, to jointly and simultaneously predict the parameters of 3D cardiac shapes, instead of a pixel-wise classification across each 2D slice. Another significant aspect of this work is the integration of patient metadata into the process of shape prediction using a Multilayer Perceptron (MLP). This information, which is currently ignored in cardiac segmentation or shape generation, has been shown in different clinical studies to have an impact on cardiac morphology and structure [3]. We hope this work inspires other researchers to exploit the priors offered by patient metadata in other applications for potentially more accurate and patient-specific models.

The contributions of this paper are three-fold. We propose: 1) an innovative end-to-end deep neural network that directly predicts 3D shape parameters derived from a Principal Component Analysis (PCA) space; 2) a novel approach using two CMR image views and patient metadata simultaneously to predict cardiac shape; 3) a creative loss function defined in the domain of 3D shape parameters which weights each PCA mode of variation independently, prioritising the more significant modes and leading to more accurate shape prediction.

2 Methods

2.1 Reference 3D Shapes of Cardiac Ventricles

We generated a reference cohort of 3D shapes through the non-rigid registration of a 3D biventricular model to a set of 3D points obtained from manual delineations using the Coherent Point Drift (CPD) method [4]. The 3D model comprises two structures: the Left Ventricle (LV) and the Right Ventricle (RV). The LV is a closed water-tight mesh comprising both endocardial and epicardial walls. The RV is an open mesh representing only the RV endocardium. The RV has two openings: the atrioventricular valve opening and the pulmonary valve opening. Figure 1 shows a sample of manual 2D contours and the corresponding 3D shape obtained from the CPD method.

Figure 1: An example 3D shape of cardiac ventricles constructed from a stack of 2D manual contours on SAX view slices.

2.2 Point Distribution Model (PDM)

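The PDM encodes the mean and variance of the 3D cardiac shapes. The PDM is constructed during training using PCA on a set of generalised Procrustes-aligned shapes. Assume a training set of $N$ shapes, each described by $n$ points in $\mathbb{R}^3$. Further, let $\mathbf{x}_i \in \mathbb{R}^{3n}$, with $i = 1, \dots, N$, be the vector of concatenated point coordinates representing the $i$-th shape. Finally, let $\mathbf{X} = [\mathbf{x}_1, \dots, \mathbf{x}_N]$ be the set of all training shapes in matrix form. The shape class mean $\bar{\mathbf{x}}$ and covariance $\mathbf{C}$ of $\mathbf{X}$ are calculated as follows:

$$\bar{\mathbf{x}} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{x}_i, \qquad \mathbf{C} = \frac{1}{N-1} \sum_{i=1}^{N} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^{T} \tag{1}$$

The shape covariance is represented in a low-dimensional PCA space. This provides $l$ eigenvectors $\boldsymbol{\Phi} = [\boldsymbol{\phi}_1, \dots, \boldsymbol{\phi}_l]$, and corresponding eigenvalues $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_1, \dots, \lambda_l)$, computed through the Singular Value Decomposition of the covariance matrix $\mathbf{C}$. Hence, assuming the shape class follows a multi-dimensional Gaussian probability distribution, any shape in the shape class can be approximated from the following linear generative model:

$$\mathbf{x} \approx \bar{\mathbf{x}} + \boldsymbol{\Phi} \mathbf{b} \tag{2}$$

where $\mathbf{b} \in \mathbb{R}^{l}$ are shape parameters restricted to a bounded range; we typically set $l$ to capture a chosen proportion of the shape variability. The shape parameters of $\mathbf{x}$ can then be estimated as follows:

$$\mathbf{b} = \boldsymbol{\Phi}^{T} (\mathbf{x} - \bar{\mathbf{x}}) \tag{3}$$

Here, the entries of $\mathbf{b}$ are the projection coefficients of mean-centred shapes along the columns of $\boldsymbol{\Phi}$.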
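The construction above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names (`build_pdm`, `project`, `reconstruct`) are hypothetical, the shapes are assumed to be already Procrustes-aligned and stored one per row, and the SVD is applied to the mean-centred data matrix, which yields the same eigenvectors as decomposing the covariance matrix.

```python
import numpy as np


def build_pdm(shapes, variance_kept=0.997):
    """Build a point distribution model from aligned shapes.

    shapes: (N, 3n) array, one flattened, already-aligned shape per row.
    Returns the mean shape, the retained eigenvectors (as columns) and
    the corresponding eigenvalues.
    """
    shapes = np.asarray(shapes, dtype=np.float64)
    mean = shapes.mean(axis=0)
    centred = shapes - mean
    # SVD of the centred data: rows of vt are the covariance eigenvectors,
    # and s**2 / (N - 1) are the covariance eigenvalues.
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    eigvals = s ** 2 / (shapes.shape[0] - 1)
    ratio = np.cumsum(eigvals) / eigvals.sum()
    # Smallest number of modes whose cumulative variance reaches the target.
    n_modes = min(int(np.searchsorted(ratio, variance_kept)) + 1, len(eigvals))
    return mean, vt[:n_modes].T, eigvals[:n_modes]


def project(shape, mean, phi):
    """Shape parameters of a shape: projection onto the retained modes."""
    return phi.T @ (shape - mean)


def reconstruct(params, mean, phi):
    """Linear generative model: mean shape plus weighted modes."""
    return mean + phi @ params
```

A typical use would be `mean, phi, lam = build_pdm(X)` followed by `b = project(x, mean, phi)` for each training shape, with `reconstruct(b, mean, phi)` recovering an approximation of `x`.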

2.3 Images and Metadata

Each CMR image volume was pre-processed as follows. Each 9-slice SAX stack was intensity normalised by saturating the top 0.2% of intensities and scaling between 0 and 1. It was then spatially normalised by aligning a fixed point and angle to a standard location and angle respectively; the fixed point is defined as the average point of intersection between the three LAX views and each SAX slice, and the angle as the angle of the 4-chamber LAX. A 64 × 64 px region of interest (ROI) was then sampled from each slice at a 2 mm isotropic resolution. The corresponding 4-chamber LAX views were similarly intensity normalised, with an 80 × 60 px ROI sampled at the same 2 mm resolution around the point of intersection between the three LAX views and the basal SAX slice, and zero-padded to 80 × 80 px. Table 1 shows the summary of the metadata available for every image volume, including both continuous and categorical variables. All variables were scaled to the range [0, 1], including categorical variables (viz. sex, alcohol consumption, and smoking status).

Type          Metadata                           Range
Continuous    Age (years)                        61 ± 7
              Weight (kg)                        76 ± 15
              Height (cm)                        170 ± 9
              Body mass index (kg/m2)            27 ± 4
              Body surface area (m2)             1.8 ± 0.2
              Heart rate (bpm)                   68 ± 11
              Diastolic blood pressure (mmHg)    79 ± 11
              Systolic blood pressure (mmHg)     139 ± 19
Categorical   Sex                                male / female
              Smoking status                     never / previous / current
              Alcohol consumed                   yes / no
Table 1: Summary of the subject metadata used in this study.
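The intensity normalisation, zero-padding, and metadata scaling described in this section could look roughly as follows. This is a sketch under stated assumptions: the helper names are hypothetical, the padding is assumed to be symmetric, and the integer coding of the categorical variables (for example never < previous < current) is an illustrative choice rather than something stated in the paper.

```python
import numpy as np


def normalise_intensity(volume, saturate_top=0.2):
    """Saturate the brightest `saturate_top` percent of voxels, then scale to [0, 1]."""
    volume = np.asarray(volume, dtype=np.float64)
    upper = np.percentile(volume, 100.0 - saturate_top)
    clipped = np.minimum(volume, upper)
    lower = clipped.min()
    if upper == lower:
        return np.zeros_like(clipped)
    return (clipped - lower) / (upper - lower)


def zero_pad_to(image, target_shape):
    """Zero-pad an image to `target_shape` (assumed symmetric padding)."""
    pads = []
    for size, target in zip(image.shape, target_shape):
        total = target - size
        if total < 0:
            raise ValueError("image is larger than the target shape")
        pads.append((total // 2, total - total // 2))
    return np.pad(image, pads, mode="constant", constant_values=0)


def min_max_scale(column):
    """Scale one metadata variable to the range [0, 1] across subjects."""
    column = np.asarray(column, dtype=np.float64)
    lower, upper = column.min(), column.max()
    if upper == lower:
        return np.zeros_like(column)
    return (column - lower) / (upper - lower)


# Assumed integer coding for a categorical variable before scaling.
SMOKING_CODES = {"never": 0, "previous": 1, "current": 2}
```

For instance, `min_max_scale([SMOKING_CODES[s] for s in statuses])` maps the three smoking categories onto evenly spaced values in [0, 1] under the assumed coding.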

2.4 Network Architecture and Loss Function

Fig. 2 shows a diagram of the proposed method. The network has three inputs: SAX view images, LAX view image, and metadata. The output is the predicted shape parameters $\hat{\mathbf{b}}$. To train the proposed architecture, we introduce the following loss function:

$$\mathcal{L}(\mathbf{w}) = \sum_{i=1}^{l} c_i \, |b_i - \hat{b}_i| \tag{4}$$

where $l$ is the number of shape parameters, $\mathbf{w}$ denotes the network parameters, and $|b_i - \hat{b}_i|$ denotes the absolute error between the reference value ($b_i$) and the value ($\hat{b}_i$) predicted by the network. $c_i$ denotes a weighting function depending on the importance of the $i$-th mode of variation on shape prediction, i.e. it assigns a higher weight to the first modes of variation in the shape's PCA space. The first modes of variation in the PCA space are critical as they are the main parameters affecting the shape structure. Predicting these accurately is, therefore, more important as they have the greatest control over the final predicted shape. Ultimately, given the mean shape, the eigenvectors and the predicted shape parameters, the final shape can be predicted using Eq. 2.
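A weighted absolute-error loss of this kind can be sketched as below. The exact weighting function is not reproduced in this text, so the eigenvalue-proportional weights in `mode_weights` are an assumption chosen only because they favour the first PCA modes; the function names are hypothetical, and averaging over the batch is likewise an illustrative choice.

```python
import numpy as np


def mode_weights(eigenvalues):
    """Assumed weighting: each mode's share of the total variance."""
    eigenvalues = np.asarray(eigenvalues, dtype=np.float64)
    return eigenvalues / eigenvalues.sum()


def weighted_l1_loss(b_ref, b_pred, weights):
    """Weighted sum of absolute errors over modes, averaged over the batch.

    b_ref, b_pred: (batch, l) arrays of reference and predicted parameters.
    weights: (l,) array, larger for the more important modes.
    """
    abs_err = np.abs(np.asarray(b_ref, dtype=np.float64) - np.asarray(b_pred, dtype=np.float64))
    return float(np.mean(np.sum(weights * abs_err, axis=-1)))
```

With these weights, a unit error in the first mode costs more than the same error in a later mode, which is the behaviour the loss is designed to encourage.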

Figure 2: The proposed method extracts a high-level representation of the image from SAX and LAX views using two FCNs, and concatenates the image features together along with the output of an MLP network applied to the metadata. Four fully connected layers with ReLU or Sigmoid activation functions and batch normalisation then produce the parameters in PCA space which describe the 3D shape of the cardiac ventricles.

Figure 3: The architecture of the two FCNs used in this study to obtain a vector of features representing the image information derived from the LAX and SAX views. A separate network is used for each view, with the LAX and SAX networks containing 9 and 15 layers respectively. The FCNs are composed of convolutional layers, with ReLUs and max-pooling. The annotations in the figure give the image width and height, the number of slices in each image volume, and the number of activation maps.

As illustrated in Fig. 3, the FCN used in this work has been adapted from the down-sampling path of a U-Net [5] architecture with an encoder depth of 2 for the LAX and 4 for the SAX images. The last layer of each FCN is followed by an extra convolutional layer, with a kernel size equal to the current feature map dimensions, to produce a vector of features: 1024 for SAX and 256 for LAX. The MLP has 11 inputs (size of metadata feature vector), 3 hidden layers (with 16, 32, and 64 neurons), and an output layer (with 128 neurons). ReLU is used in hidden and output layers.

The outputs of the three sub-networks are concatenated to construct one feature vector (with the size of 1024+256+128=1408 neurons) that contains the behavioural, phenotypic, and demographic information derived from the metadata in addition to visual information from the imaging data. This information is fed into four fully connected layers, with ReLU (first two layers) and Sigmoid (last two layers) activation functions and batch normalisation, so that, by minimising the loss from Eq. 4, they produce the leading parameters in PCA space which describe the 3D shape of the cardiac ventricles. To capture 99.7% of shape variability in the training dataset, we retain only the corresponding leading modes and regress those parameters from randomly initialised weights.
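The layer sizes stated above can be checked with a small shape-bookkeeping sketch. It uses plain NumPy with random weights rather than a trained TensorFlow model, the function names are hypothetical, and the He-style initialisation is an assumption; the SAX and LAX feature vectors are stand-ins for the FCN outputs.

```python
import numpy as np

# 11 metadata inputs, three hidden layers, and a 128-neuron output layer.
MLP_LAYER_SIZES = [11, 16, 32, 64, 128]


def init_mlp(rng, sizes=MLP_LAYER_SIZES):
    """Random weights and zero biases for each dense layer (assumed initialisation)."""
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def mlp_forward(metadata, params):
    """Dense layers with ReLU on hidden and output layers."""
    hidden = metadata
    for weights, bias in params:
        hidden = np.maximum(hidden @ weights + bias, 0.0)
    return hidden


def fuse_features(sax_features, lax_features, metadata_features):
    """Concatenate the three sub-network outputs into one feature vector."""
    return np.concatenate([sax_features, lax_features, metadata_features], axis=-1)


rng = np.random.default_rng(0)
batch = 10
metadata_features = mlp_forward(rng.random((batch, 11)), init_mlp(rng))
fused = fuse_features(rng.random((batch, 1024)), rng.random((batch, 256)), metadata_features)
print(fused.shape)  # (10, 1408)
```

The fused vector would then be passed to the four fully connected layers; their widths are not given in the text, so they are left out of the sketch.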

3 Experiments and Results

3.1 Data and Annotations

We performed experiments on 3500 CMR image volumes from the UK Biobank (UKB) using both end-diastolic and end-systolic time points. In terms of population sample size, experimental setup, and quality control, the most reliable reference annotations of cardiovascular structure and function found in the literature are those reported by [6], in which CMR scans were manually delineated and analysed by a team of eight expert observers. These delineations were used to generate the reference 3D shapes, as explained in Sec. 2.1. The dataset was randomly split into a training set (3000) and a test set (500). The performance is reported on the test set as mean ± standard deviation.

3.2 Implementation and Training

The method was implemented using Python and TensorFlow. The network was trained using Adam to optimise the loss function (Eq. 4) with a learning rate of 0.001, 50,000 iterations, and a batch size of 10 subjects, all of which were determined empirically. There was no data augmentation. Training took 10 hours on Nvidia Tesla V100 GPUs hosted by Amazon Web Services and accessed using the MULTI-X platform [7]. At test time, it took less than a second to predict the shape parameters.

3.3 Accuracy of Predicted Shapes

Fig. 4 shows some samples of ventricular shapes generated by our proposed method (in purple) overlaid with the corresponding reference shapes (in grey). It confirms that the network is capable of predicting accurate shape parameters to generate shapes very similar to the reference shapes obtained by manual delineations. To quantify the similarity, we evaluated the performance of the proposed method by computing the Dice index, the mean distance, and the Hausdorff distance between reference and predicted shapes. Since this method outputs the parameters of a shape in the PCA space, we first align the two shapes by removing their orientation and translation before computing the aforementioned metrics. The Dice index is between 0 and 1, with a higher value indicating a better match between the two shapes. The mean and Hausdorff distances measure the mean and maximum distance, respectively, between the two surfaces, with a lower value indicating better agreement. Moreover, we report the effect of including the metadata in Table 2. As expected, the use of the metadata alongside the image information improves the network, leading to a more accurate prediction in all cardiac substructures. In addition to comparing against reference measurements, we also compare against one baseline method proposed by Attar et al. [1], in which the authors carried out 3D analysis of the UKB CMR images using a shape model-based approach where the model is fitted during an iterative process using traditional intensity profiles.
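The alignment step and the three metrics can be illustrated with a short NumPy sketch. Several simplifications are assumed here and are not taken from the paper: surfaces are approximated by their vertex sets, the two shapes are assumed to share point correspondence for the rigid alignment (a Kabsch-style fit), the mean distance is taken as the symmetric average of the two directed distances, and the Dice index is computed on boolean voxel masks that would have to be produced by a separate voxelisation step. All function names are hypothetical.

```python
import numpy as np


def rigid_align(source, target):
    """Rotate and translate `source` (m, 3) onto `target` (m, 3), points in correspondence."""
    src_c = source - source.mean(axis=0)
    tgt_c = target - target.mean(axis=0)
    u, _, vt = np.linalg.svd(src_c.T @ tgt_c)
    # Guard against reflections so the result is a proper rotation.
    sign = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, 1.0, sign]) @ vt
    return src_c @ rotation + target.mean(axis=0)


def directed_distances(a, b):
    """For each point in `a`, the distance to its nearest point in `b`."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def mean_and_hausdorff(a, b):
    """Symmetric mean distance and Hausdorff distance between two point sets."""
    d_ab = directed_distances(a, b)
    d_ba = directed_distances(b, a)
    mean_dist = 0.5 * (d_ab.mean() + d_ba.mean())
    hausdorff = max(d_ab.max(), d_ba.max())
    return float(mean_dist), float(hausdorff)


def dice(mask_a, mask_b):
    """Dice index between two boolean voxel masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / denom)
```

The pairwise distance matrix in `directed_distances` grows with the product of the two vertex counts, so for dense meshes a spatial index such as a k-d tree would be the more practical choice.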

Figure 4: Three samples of the generated 3D shapes of LV and RV. The grey shape is the reference whereas the purple shape is the prediction.

                           LV endocardium                             LV epicardium                              RV endocardium
                           [1]          IMG          IMG+MTDT         [1]          IMG          IMG+MTDT         [1]          IMG           IMG+MTDT
Dice index                 0.91 ± 0.05  0.82 ± 0.09  0.90 ± 0.04      0.92 ± 0.05  0.83 ± 0.08  0.93 ± 0.05      0.88 ± 0.08  0.79 ± 0.09   0.90 ± 0.08
Mean distance (mm)         1.85 ± 0.75  3.45 ± 0.94  1.81 ± 0.70      1.80 ± 0.62  3.02 ± 0.84  1.82 ± 0.66      2.02 ± 0.72  3.00 ± 0.91   2.00 ± 0.70
Hausdorff distance (mm)    3.76 ± 1.52  8.78 ± 1.96  3.11 ± 1.49      3.32 ± 1.38  7.69 ± 1.77  3.55 ± 1.49      8.32 ± 3.12  12.11 ± 5.21  7.05 ± 3.03
Table 2: Comparison of shape prediction accuracy using only images (IMG) or images with metadata (IMG+MTDT), alongside the baseline of [1], in terms of Dice index, mean distance (mm) and Hausdorff distance (mm) for LV endo-/epicardium and RV endocardium.

As shown in Table 2, the Dice values show excellent agreement between reference and predicted shapes. The mean distance values are comparable to the in-plane pixel spacing range of 1.8 mm to 2.3 mm found in the UKB. Although the Hausdorff distance is larger, it is still within an acceptable range when compared with the distance range seen in [1] or [2]. Note that the performance of the proposed method on the RV is consistently better than that of the other approaches. Furthermore, we report the absolute and relative difference of the main cardiac function indexes (viz. LV and RV volume (ml) and myocardial mass (g)) derived from the predicted and the reference shapes in Table 3. The proposed method achieved significantly lower error in volume and mass estimation, with p < 0.001 in paired t-tests.

                  Absolute difference                             Relative difference (%)
                  [1]            IMG             IMG+MTDT         [1]            IMG            IMG+MTDT
LV Volume (ml)    7.51 ± 5.42    9.80 ± 6.33     6.01 ± 4.98      9.50 ± 8.80    10.31 ± 9.45   8.03 ± 5.05
LV Mass (g)       8.42 ± 5.22    10.11 ± 8.14    7.11 ± 5.14      9.10 ± 8.01    12.03 ± 9.22   8.12 ± 7.54
RV Volume (ml)    10.59 ± 7.16   12.62 ± 10.14   9.24 ± 5.20      11.36 ± 8.11   14.55 ± 9.89   10.03 ± 7.00
Table 3: Comparison of the absolute and relative difference between the reference and predicted shapes for the baseline of [1], images only (IMG), and images with metadata (IMG+MTDT).
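The absolute and relative differences, and the statistic behind a paired t-test, can be computed as in the sketch below. The function names are hypothetical, and it is an assumption that the paired test compares per-subject errors of two methods on the same subjects; the text does not spell out which quantities were paired. Only the t statistic is computed here; turning it into a p-value requires the Student t distribution with n − 1 degrees of freedom.

```python
import numpy as np


def index_differences(reference, predicted):
    """Per-subject absolute difference and relative difference (%) of a function index."""
    reference = np.asarray(reference, dtype=np.float64)
    predicted = np.asarray(predicted, dtype=np.float64)
    abs_diff = np.abs(predicted - reference)
    rel_diff = 100.0 * abs_diff / reference
    return abs_diff, rel_diff


def paired_t_statistic(errors_a, errors_b):
    """t statistic of the paired differences between two sets of per-subject errors."""
    diff = np.asarray(errors_a, dtype=np.float64) - np.asarray(errors_b, dtype=np.float64)
    n = diff.size
    return float(diff.mean() / (diff.std(ddof=1) / np.sqrt(n)))
```

Reporting the mean and standard deviation of `abs_diff` and `rel_diff` over the test subjects gives entries in the same form as the table above.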

Overall, the proposed method (IMG+MTDT) agrees more closely with the reference shapes than [1], while being on average 30 times faster at test time. This can be attributed to the combined use of image and patient metadata within a single network to directly predict shape parameters. The introduction of the metadata yielded a substantial positive impact on shape prediction, with a 15% average improvement across all metrics. We believe that including this information provides the network with a variable prior by allowing it to learn the likely distributions of shape parameters across different populations.

4 Conclusion

In this study, we presented a fully automatic framework capable of producing 3D cardiac shapes via the simultaneous use of images and patient metadata. We validated our workflow on a reference cohort of 500 subjects for which ground truth shapes exist, with promising results. In particular, we showed a significant positive impact from including the metadata. As future work, in addition to investigating the effect of other clinical variables on shape prediction, we would like to increase the robustness of our pipeline so that it can locate the shape in the image space and handle severe pathological morphology, variable image quality, and alternative modalities. We also plan to explore the use of patient metadata in other deep learning applications.

Acknowledgements  RA was funded by the School of Computing PhD Scholarship, University of Leeds. AFF acknowledges support from the Royal Academy of Engineering Chair in Emerging Technologies Scheme (CiET181919) and the MedIAN Network (EP/N026993/1) funded by the Engineering and Physical Sciences Research Council (EPSRC).


  • [1] R. Attar, M. Pereañez, A. Gooya, X. Albà, L. Zhang, S. K. Piechnik, S. Neubauer, S. E. Petersen, and A. F. Frangi, “High Throughput Computation of Reference Ranges of Biventricular Cardiac Function on the UK Biobank Population Cohort,” in International Workshop on Statistical Atlases and Computational Models of the Heart, pp. 114–121, Springer, 2018.
  • [2] J. Duan, G. Bello, J. Schlemper, W. Bai, T. J. Dawes, C. Biffi, A. de Marvao, G. Doumou, D. P. O’Regan, and D. Rueckert, “Automatic 3D Bi-ventricular Segmentation of Cardiac Images by a Shape-refined Multi-task Deep Learning Approach,” IEEE transactions on medical imaging, 2019.
  • [3] K. Gilbert, W. Bai, C. Mauger, P. Medrano-Gracia, A. Suinesiaputra, A. M. Lee, M. M. Sanghvi, N. Aung, S. K. Piechnik, et al., “Independent Left Ventricular Morphometric Atlases Show Consistent Relationships with Cardiovascular Risk Factors: A UK Biobank Study,” Scientific reports, vol. 9, no. 1, p. 1130, 2019.
  • [4] A. Myronenko and X. Song, “Point Set Registration: Coherent Point Drift,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 12, pp. 2262–2275, 2010.
  • [5] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” in International Conference on Medical image computing and computer-assisted intervention, pp. 234–241, Springer, 2015.
  • [6] S. E. Petersen, N. Aung, M. M. Sanghvi, F. Zemrak, K. Fung, J. M. Paiva, J. M. Francis, M. Y. Khanji, E. Lukaschuk, A. M. Lee, et al., “Reference Ranges for Cardiac Structure and Function Using Cardiovascular Magnetic Resonance (CMR) in Caucasians from the UK Biobank Population Cohort,” Journal of Cardiovascular Magnetic Resonance, vol. 19, no. 1, p. 18, 2017.
  • [7] M. H. de Vila, R. Attar, M. Pereanez, and A. F. Frangi, “MULTI-X, a state-of-the-art cloud-based ecosystem for biomedical research,” in 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 1726–1733, IEEE, 2018.