Automatic quantification of the LV function and mass: a deep learning approach for cardiovascular MRI

12/14/2018 ∙ by Ariel H. Curiale, et al.

Objective: This paper proposes a novel approach for automatic left ventricle (LV) quantification using convolutional neural networks (CNNs). Methods: The general framework consists of one CNN for detecting the LV and another for tissue classification. In addition, three new deep learning architectures are proposed for LV quantification. These new CNNs introduce the ideas of sparsity and depthwise separable convolution into the U-net architecture, as well as a level-to-level residual learning strategy. To this end, we extend the classical U-net architecture and use the generalized Jaccard distance as the optimization objective function. Results: The CNNs were trained and evaluated with 140 patients from two public cardiovascular magnetic resonance datasets (Sunnybrook and Cardiac Atlas Project) by using a 5-fold cross-validation strategy. Our results demonstrate a suitable accuracy for myocardial segmentation (∼0.9 Dice's coefficient), and a strong correlation with the most relevant physiological measures: 0.99 for end-diastolic and end-systolic volume, 0.97 for the left myocardial mass, 0.95 for the ejection fraction and 0.93 for the stroke volume and cardiac output. Conclusion: Our simulation and clinical evaluation results demonstrate the capability and merits of the proposed CNN to estimate different structural and functional features, such as LV mass and EF, which are commonly used for both diagnosis and treatment of different pathologies. Significance: This paper suggests a new approach for automatic LV quantification based on deep learning where errors are comparable to the inter- and intra-operator ranges for manual contouring. This approach may also have important applications in motion quantification.




1 Introduction

Some of the most relevant global structural features for quantifying cardiac function are the left ventricular mass (LVM), the left ventricular volume (LVV) and the ejection fraction (EF), which is directly derived from the left ventricle (LV) volume at end-diastole (ED) and end-systole (ES). LV function and mass are of paramount importance for both prognosis and treatment of different cardiac pathologies such as mitral regurgitation, ischemia and myocarditis (21; 18; 11; 8; 31). For instance, LVM is considered an independent predictor of cardiovascular events, while LVV is associated with adverse remodeling (1; 13; 32). Cardiovascular magnetic resonance (CMR) is one of the most accurate non-invasive diagnostic tools for imaging cardiac structure and function (35), and it is usually considered the gold standard for LV mass and volume quantification (12). Manual or semi-automatic delineation by experts is currently the standard clinical practice for chamber segmentation from CMR images. This step is essential for the quantification of global features such as ventricular volume, ejection fraction and myocardial mass. Despite the efforts of researchers and medical vendors, global quantification and volumetric analysis remain time-consuming tasks that rely heavily on user interaction. For example, a diastolic functional evaluation performed with dedicated software and manual segmentation of the basal section commonly takes an experienced radiologist 25 minutes (14). Thus, there is still a significant need for tools that allow automatic 3D quantification. The main goal of this work is to provide an accurate and suitable approach for LV function and mass quantification from CMR based on deep convolutional neural networks.

Recently, convolutional neural networks (CNNs) (19) have been successfully used for solving challenging tasks like classification, segmentation and object detection, achieving state-of-the-art performance (20). In fact, fully convolutional networks trained end-to-end have recently been used for medical images, for example, for cell, prostate and myocardial tissue segmentation (27; 24; 7). These models, which serve as an inspiration for our work, employ different types of network architectures and were trained to generate a segmentation mask that delineates the structures of interest in the image, which in our case are the myocardial tissue and the blood pool of the LV.

Deep neural networks, such as CNNs, are very useful tools for pattern recognition; however, one of their main disadvantages arises from the complexity of the training stage. For instance, the distribution of each layer's inputs changes during training as the parameters of the previous layers change. This phenomenon is called internal covariate shift (29). As the networks start to converge, a degradation problem occurs: with increasing network depth, accuracy gets saturated. Unexpectedly, such degradation is not caused by overfitting, and adding more layers to a suitably deep model leads to higher training error, as reported by (15). To overcome this problem, the technique of batch normalization has been proposed. This procedure evaluates the statistical properties of the neural activations present for a given batch of data in order to normalize the inputs to any layer towards some desired objective (such as zero mean and unit variance). At the same time, the architecture is modified in order to prevent the loss of information that could arise from the normalization. This technique allows the use of much higher learning rates and also acts as a regularizer, in some cases eliminating the need for dropout.


Another recent performance improvement of deep neural networks has been achieved by reformulating the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions (16). In other words, the parameters to be determined at a given stage generate only the difference (or residual) between the objective function to be learned and some fixed function such as the identity. It was empirically found that this approach gives rise to networks that are easier to optimize, which can also gain accuracy from considerably increased depth (16).

The most straightforward way of improving the accuracy of deep neural networks is to increase the number of levels (depth) and the number of units per level (width). In this way, it is possible to train higher quality models. However, a bigger network requires more computational resources because a large number of parameters have to be trained, which, among other things, favors overfitting. Both issues can be overcome by introducing the notion of sparsity as proposed in the Inception network (33). The fundamental hypothesis behind Inception is that cross-channel correlations and spatial correlations are sufficiently decoupled that it is preferable not to map them jointly, as pointed out in the Xception architecture (3). Indeed, Xception is based entirely on depthwise separable convolution layers, which greatly reduce the network complexity in terms of the number of trainable parameters while maintaining or improving the generalization power.

In this work, we explore the idea of information sparsity described in Inception and the notion of depthwise separable convolutions introduced by (3). Both ideas are introduced into a U-net architecture (27) for myocardial tissue classification, and three new deep learning networks are analyzed for this task. Then, the most accurate approach for myocardial tissue classification is used for quantification of the LV function and mass. Unlike previous works on myocardial tissue classification (7), we propose to use the generalized Jaccard distance (6) as the optimization objective to properly handle fuzzy sets. Results demonstrate that our approach outperforms previous methods based on the U-net architecture (7) and provides a suitable automatic approach for myocardial segmentation and cardiac function quantification.

The paper is structured as follows: in Section 2 the fully automatic approach based on the U-net network is introduced, together with the training strategy and the optimization objective function. In Section 3, the ideas of information sparsity, depthwise separable convolutions and level-to-level residual learning are evaluated for myocardial tissue classification. Finally, we present the conclusions in Section 4.

2 Materials and methods


The proposed approach for automatic quantification of the LV function and mass was trained and evaluated with 140 patients from two public CMR datasets: the Sunnybrook Cardiac Dataset (SCD) (26) and the Cardiac Atlas Project (CAP) (9).

The SCD dataset, also known as the 2009 Cardiac MR Left Ventricle Segmentation Challenge data, consists of 45 cine-MRI images from a mix of patients and pathologies: healthy, hypertrophy, heart failure with infarction and heart failure without infarction. A subset of this dataset was first used in the automated myocardium segmentation challenge from short-axis MRI, held by a MICCAI workshop in 2009. The complete dataset is now available in the Cardiac Atlas Project dataset under a public domain license. The 45 cardiac cine-MR sequences were acquired as cine steady state free precession (SSFP) MR short axis (SAX) images with a 1.5T General Electric Signa MRI. All the images were obtained during 10-15 second breath-holds with a temporal resolution of 20 cardiac phases over the heart cycle and a mean spatial resolution of 1.36 x 1.36 x 9.04 mm (255 x 255 x 11 pixels), and were scanned from the ED phase. In these SAX MR acquisitions, endocardial and epicardial contours were drawn by an experienced cardiologist in all slices at ED, and only endocardial contours were provided at ES. All the contours were confirmed by another cardiologist.

The Cardiac Atlas Project provides a set of CMR studies for 95 patients from a prospective, multi-center, randomized clinical trial in patients with coronary artery disease and mild-to-moderate left ventricular dysfunction. All cine SSFP images were acquired during a breath-hold of 8-15 seconds duration with a typical thickness 10 mm, gap 2 mm, TR 30-50 ms, TE 1.6 ms, flip angle , FOV 360 mm, and mean spatial resolution of 1.48 x 1.48 x 9.3 mm (245 x 257 x 12 pixels). Sufficient short-axis slices were acquired to cover the whole heart in SAX. Manual myocardial segmentations were also provided for these acquisitions.


The method described in this section was designed to measure the LV function and mass by identifying the myocardial tissue and the blood pool of the LV, as shown in Fig. 1. To this end, a deep learning approach is introduced for detecting a proper region of interest (ROI) around the LV and for myocardial tissue and blood pool classification. Then, the LV function and mass are derived as follows (10):

Figure 1: Workflow of the proposed method.
  • Left ventricle mass (LVM): is directly derived from the myocardial segmentation by making two main assumptions: (a) the interventricular septum is assumed to be part of the LV and (b) the myocardial volume is equal to the total volume contained within the epicardial borders of the ventricle, $V_{epi}$, minus the chamber volume, $V_{endo}$, at the end-diastolic frame ($t_{ED}$):

    $$\mathrm{LVM} = \rho \left( V_{epi}(t_{ED}) - V_{endo}(t_{ED}) \right)$$

    where $\rho$ corresponds to the myocardial tissue density. LVM is usually normalized to total body surface area or weight in order to facilitate interpatient comparisons.

  • Stroke volume (SV): is defined as the volume ejected between the end of diastole and the end of systole:

    $$\mathrm{SV} = V_{endo}(t_{ED}) - V_{endo}(t_{ES}) = \mathrm{EDV} - \mathrm{ESV}$$

  • Cardiac Output (CO): The flow of oxygenated blood delivered by the heart to the body is known as the cardiac output and is expressed in liters per minute:

    $$\mathrm{CO} = \mathrm{SV} \cdot \mathrm{HR}$$

    where $\mathrm{HR}$ is the heart rate. Since the magnitude of CO is proportional to body surface, if an interpatient comparison is required, it should be adjusted by the body surface area.

  • Ejection Fraction (EF): is a global index which is generally considered one of the most meaningful measures of the LV pump function. It is defined as the ratio between the SV and the end-diastolic volume:

    $$\mathrm{EF} = \frac{\mathrm{SV}}{\mathrm{EDV}} \times 100\,\%$$
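The four definitions in the list above can be illustrated with a short, hedged sketch. The function and field names are hypothetical, volumes are assumed to be in ml (1 ml = 1 cm³), and the density value of 1.05 g/cm³ is a commonly used figure that is an assumption here, since the text does not state the value used by the authors.

```python
from dataclasses import dataclass

# Assumed myocardial tissue density in g/cm^3 (not stated in the text).
MYOCARDIAL_DENSITY = 1.05


@dataclass
class LVIndices:
    lvm_g: float       # left ventricular mass [g]
    sv_ml: float       # stroke volume [ml]
    co_l_min: float    # cardiac output [L/min]
    ef_percent: float  # ejection fraction [%]


def lv_indices(epi_ed_ml, edv_ml, esv_ml, heart_rate_bpm,
               density=MYOCARDIAL_DENSITY):
    """Derive LVM, SV, CO and EF from segmented volumes given in ml."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    lvm = density * (epi_ed_ml - edv_ml)   # epicardial minus chamber volume at ED
    sv = edv_ml - esv_ml                   # volume ejected between ED and ES
    co = sv * heart_rate_bpm / 1000.0      # ml/min converted to L/min
    ef = 100.0 * sv / edv_ml               # SV as a percentage of EDV
    return LVIndices(lvm, sv, co, ef)


# Illustrative (made-up) volumes, not taken from either dataset.
example = lv_indices(epi_ed_ml=250.0, edv_ml=150.0, esv_ml=60.0,
                     heart_rate_bpm=70.0)
```

With these illustrative inputs the myocardial volume is 100 ml, so the sketch yields a mass of about 105 g, an SV of 90 ml, a CO of about 6.3 L/min and an EF of 60%.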

Figure 2: Network architecture proposed for myocardial segmentation in cardiac MRI. The number of channels/features is denoted on top of the box and the input layer dimension is provided at the lower left edge of the box. The arrows denote the different operations according to the legend. White blocks correspond to the differences with respect to previous approaches.

The LV-ROI detection and the blood/myocardial tissue classification are performed by using the same CNN approach based on the U-net. This CNN architecture is depicted in Fig. 2. The LV-ROI detection is carried out by using low spatial resolution images at end-diastole, while the blood pool and myocardial tissue classification is done by using the original information in the previously detected ROI around the LV (128 x 128 pixels, corresponding to 190 x 190 mm). In particular, the ROI size was defined to ensure that the whole myocardial tissue is covered.

In the proposed deep learning architecture, the network learns how to encode information about features presented in the training set (left branch on Fig. 2). Then, in the decode path the network learns about the image reconstruction process from the encoded features learned.

The specific feature of the U-net architecture lies in the concatenation between the output of the encoding path, for each level, and the input of the decoding path (denoted as big gray arrows in Fig. 2). These concatenations give the fully convolutional network the ability to localize high spatial resolution features, thus generating a more precise output based on this information. As mentioned in (27), this strategy allows the seamless segmentation of large images by an overlap-tile strategy.

The encoding path consists of two 3 x 3 convolutions, each followed by a batch normalization, and a residual learning step just before the 2 x 2 max pooling operation with stride 2, as described in (7). The batch normalization is performed right after each convolution and before the activation, following (17). Also, a rectified linear unit (ReLU) is applied right after the addition used for residual learning, as depicted in Fig. 2. At each downsampling step we double the number of feature channels, which is initially set to 64.
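As a small illustration of the doubling rule described above, the following sketch lists the feature-map size and channel count at each encoder level. The function name is hypothetical, and the number of levels (five) and the 128 x 128 input are assumptions taken from elsewhere in the text rather than a statement of the authors' exact configuration.

```python
def encoder_schedule(input_size=128, base_channels=64, levels=5):
    """Return (level, spatial size, channels) for each encoder level.

    `levels=5` is an assumption for illustration; the text only states
    the initial channel count (64) and the doubling rule.
    """
    schedule = []
    size, channels = input_size, base_channels
    for level in range(1, levels + 1):
        schedule.append((level, size, channels))
        size //= 2      # 2 x 2 max pooling with stride 2 halves each side
        channels *= 2   # feature channels double at each downsampling step
    return schedule


for level, size, channels in encoder_schedule():
    print(f"level {level}: {size} x {size} pixels, {channels} channels")
```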

Every step in the decoding path can be seen as the mirrored step of the encoding path, i.e. each step in the decoding path consists of an upsampling of the feature map followed by a 2 x 2 convolution ("up-convolution") that halves the number of feature channels, a concatenation with the corresponding feature map from the encoding path, and two 3 x 3 convolutions, each followed by a batch normalization and a residual learning step. Finally, a residual output learning is introduced from the upsampled previous level (left side of Fig. 2). At the final layer, a 1 x 1 convolution is carried out to map the 64 feature maps to the two classes used for the myocardial segmentation (myocardium and endocardium). The output of the last layer, after the soft-max non-linear function, represents the likelihood that a pixel belongs to the myocardium or the endocardium of the left ventricle. Only those voxels with a sufficiently high likelihood are considered part of the left ventricle tissue.


The CNN used for detecting a ROI around the LV was trained with low spatial resolution images and their corresponding manual segmentation (64 x 64 pixels) with a mean spatial resolution of 5.76 x 5.76 mm. In contrast, the blood pool and myocardial tissue classification was performed by training a CNN with short-axis images cropped according to the previous LV-ROI detection (128 x 128 pixels) with a mean spatial resolution of 1.44 x 1.44 mm. A 5-fold cross-validation strategy is used for training the CNNs due to the reduced number of patients. In this way, 5 sets of training/validation were used for each dataset. A total of 4048 2D images were used for training (36 patients from SCD and 76 patients from CAP) and 1003 2D images for testing (8 patients from SCD and 19 patients from CAP). In both CNNs the optimization was carried out by using the Adam (Adaptive Moment Estimation) stochastic gradient descent implementation of Keras (2) with a learning rate of . Also, the generalized Jaccard distance (6) is used as the loss function to properly handle fuzzy sets:

$$J_d(y, \hat{y}) = 1 - \frac{\sum_i \min(y_i, \hat{y}_i)}{\sum_i \max(y_i, \hat{y}_i)}$$

where $\hat{y}$ and $y$ are the myocardial segmentation prediction and the ground truth segmentation, respectively.
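A minimal sketch of such a loss is given below, assuming the min/max (fuzzy-set) form of the generalized Jaccard index, in which the intersection and union are computed as element-wise minimum and maximum. The function name and the small `eps` stabilizer are hypothetical additions, and the inputs are flat sequences of values in [0, 1] rather than Keras tensors.

```python
def generalized_jaccard_distance(y_true, y_pred, eps=1e-7):
    """1 - sum(min) / sum(max) over flattened soft masks with values in [0, 1].

    `eps` is an assumed stabilizer so that two empty masks give distance 0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(min(t, p) for t, p in zip(y_true, y_pred))
    union = sum(max(t, p) for t, p in zip(y_true, y_pred))
    return 1.0 - (intersection + eps) / (union + eps)


# Made-up soft prediction against a binary ground truth.
truth = [1.0, 1.0, 0.0, 0.0]
prediction = [0.8, 0.4, 0.2, 0.0]
distance = generalized_jaccard_distance(truth, prediction)  # 1 - 1.2 / 2.2
```

Unlike a hard overlap measure, this form is defined directly on the soft-max outputs, which is why it suits fuzzy sets: no thresholding is needed before the loss is evaluated.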

Annotated medical information like myocardial classification is not easy to obtain because one or more experts are required to manually trace a reliable ground truth. So, in this work it was necessary to augment the original training dataset in order to increase the number of examples from 4048 to 20000 2D images. Data augmentation is also essential to teach the network the desired invariance and robustness properties. Heterogeneity in the cardiac MRI dataset is needed to teach the network some shift and rotation invariance, as well as robustness to deformations. With this intention, the input of the network was randomly deformed by means of a spatial shift in a range of 10% of the image size, a rotation in the short-axis plane, a zoom in a range of 2x, or a Gaussian deformation field with randomly chosen parameters and B-spline interpolation (Fig. 3).


Figure 3: Examples of the data augmentation used for training the CNN for one of the five cross-validation sets for the ROI detection (a) and tissue classification (b).

3 Results

Three sets of experiments are conducted to evaluate the proposed methodology on both datasets (Sunnybrook and CAP). First, our approach was evaluated to measure the accuracy of the LV-ROI detection. Second, the accuracy of myocardial tissue classification was studied for different CNNs. Finally, the third set of experiments was designed to measure the accuracy of the proposed automatic approach for quantification of the LV function and mass as described in Section 2.

In the experiments, the papillary muscles (PM) were excluded because only the Sunnybrook dataset contains this information. So, the proposed method avoids detecting the PM as part of the myocardial tissue. The mean squared error and the Dice's coefficient were used to measure the accuracy of the proposed CNN to allow an easy comparison with other methods described in the literature.
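For reference, the two accuracy measures mentioned above can be sketched as follows for flattened masks. The function names are hypothetical, and returning a Dice's coefficient of 1.0 when both masks are empty is a convention assumed here.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice's coefficient 2|A ∩ B| / (|A| + |B|) for flattened binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of elements")
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_squared_error(y_true, y_pred):
    """Mean of squared element-wise differences between two flattened maps."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up masks: one overlapping pixel out of two foreground pixels each.
dice = dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])
mse = mean_squared_error([1.0, 1.0, 0.0, 0.0], [1.0, 0.0, 1.0, 0.0])
```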

ROI detection

The spatial difference and the mean squared error between the centers of mass derived from the manual segmentation and from the upsampled predicted segmentation (LV + myocardial tissue) are used to measure the CNN accuracy for detecting a ROI around the LV. Results show that the proposed CNN approach reaches a proper accuracy for detecting a ROI around the LV, with a mean error and a spatial difference below 2 and 4 pixels respectively (Fig. 4(b) and Fig. 4(a)). Also, a qualitative analysis shows that the proposed approach reaches a suitable precision for this task. An example of the LV-ROI detection approach is depicted in Fig. 5.
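The center-of-mass comparison described above can be sketched as follows. All names are hypothetical, masks are plain nested lists of 0/1 values, and the way the 128 x 128 ROI is centered on the detected center is an assumption made for illustration.

```python
import math


def center_of_mass(mask):
    """Return the (row, col) centroid of the non-zero pixels of a 2D mask."""
    count = row_sum = col_sum = 0
    for r, row in enumerate(mask):
        for c, value in enumerate(row):
            if value:
                count += 1
                row_sum += r
                col_sum += c
    if count == 0:
        raise ValueError("mask has no foreground pixels")
    return row_sum / count, col_sum / count


def spatial_difference(center_a, center_b):
    """Euclidean distance in pixels between two (row, col) centers."""
    return math.hypot(center_a[0] - center_b[0], center_a[1] - center_b[1])


def roi_bounds(center, size=128):
    """Bounds (row0, row1, col0, col1) of a size x size ROI around a center."""
    row0 = int(round(center[0])) - size // 2
    col0 = int(round(center[1])) - size // 2
    return row0, row0 + size, col0, col0 + size


manual = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
manual_center = center_of_mass(manual)   # centroid of the 2 x 2 block
```

In practice the bounds would also need clipping to the image extent, which this sketch omits.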

Figure 4: ROI detection accuracy. (a) 2D spatial error for detecting the center of the left ventricle. (b) Absolute error.
Figure 5: Example of the region around the left ventricle detected by the proposed approach. The myocardial tissue and blood pool classification is depicted in red as well as the center of the left ventricle in the original image.

Myocardial tissue classification

The U-net proposed in (7) was evaluated with respect to two new architectures on both datasets with a shrink factor of two (i.e. the original input size was reduced by a factor of 2). These new architectures were designed to introduce the main ideas behind Inception and Xception into the U-net architecture.

The sparsity of the information can be covered by convolutions over larger patches, as pointed out in the Inception architecture. This idea is introduced into the U-net architecture (uInception) as shown in Fig. 6, where the main differences with respect to the proposed approach are depicted as white blocks. To increase the kernel size while maintaining the same network complexity, it is necessary to perform a dimension reduction/expansion as done in the Inception architecture. In particular, this step is performed by a 1 x 1 convolution (N/2 and N features) without activation (see box (a) in Fig. 6). It is important to note that the size of some of the convolution layers changes according to the level. These convolution layers are described in Fig. 6 as Z x 1 and 1 x Z, where the value of Z depends on the level: it takes level-specific values for the encoding and decoding paths and a separate value for level 5.

Figure 6: The proposed uInception architecture for myocardial segmentation in cardiac MRI, which introduces the idea of sparsity described in the Inception architecture. The number of channels/features is denoted on top of the box and the input layer dimension is provided at the lower left edge of the box. The arrows denote different operations according to the legend. White blocks correspond to the differences with respect to the proposed approach. The Z x 1 and 1 x Z convolutions refer to convolutions with different sizes according to the level, i.e. level-specific values of Z for the encoding and decoding paths and a separate value for level 5.
Figure 7: The proposed uXception architecture for myocardial segmentation in cardiac MRI, which introduces the idea of depthwise separable convolution used in Xception and the sparsity described in Inception. The number of channels/features is denoted on top of the box and the input layer dimension is provided at the lower left edge of the box. The arrows denote different operations according to the legend. White blocks correspond to the differences with respect to the proposed approach. The Z x 1 and 1 x Z convolutions refer to convolutions of different sizes according to the level, i.e. level-specific values of Z for the encoding and decoding paths.

The other architecture studied combines the concepts of the depthwise separable convolution described in Xception (3) and the sparsity idea of the Inception architecture. The new architecture, named uXception, is depicted in Fig. 7. As for the uInception, the main differences with respect to the proposed approach are depicted as white blocks. In this case, the first level is exactly the same as in the uInception and the U-net with batch normalization (BN) and residual learning (RL). Then, each level of the encoding path introduces a residual input learning with a 1 x 1 convolution without activation. Also, a mirrored residual learning is introduced into the decoding path by means of a 1 x 1 up-convolution (see the red arrow after the ReLU activation for the decoding path in Fig. 7). Finally, the last level consists of four blocks of two 5 x 5 separable convolutions, each convolution followed by a batch normalization, summed with a residual learning step (Fig. 7 box (b)). Furthermore, the size of some of the convolution layers changes according to the level in the same way as described for the uInception architecture.
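To make the saving from depthwise separable convolutions concrete, the sketch below compares weight counts for a standard and a separable 2D convolution. The function names are hypothetical, a depth multiplier of 1 is assumed, and the 256-channel example is illustrative rather than a layer taken from Fig. 7.

```python
def conv2d_params(kernel, c_in, c_out, bias=False):
    """Weights of a standard k x k convolution mapping c_in to c_out channels."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


def separable_conv2d_params(kernel, c_in, c_out, bias=False):
    """Weights of a depthwise separable convolution (depth multiplier 1)."""
    depthwise = kernel * kernel * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out             # 1 x 1 convolution mixing the channels
    return depthwise + pointwise + (c_out if bias else 0)


# Illustrative 5 x 5 layer with 256 input and 256 output channels.
standard = conv2d_params(5, 256, 256)
separable = separable_conv2d_params(5, 256, 256)
ratio = standard / separable
```

For this illustrative layer the standard convolution needs 1,638,400 weights against 71,936 for the separable one. The reduction for a whole network is smaller, since only some layers are replaced.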

Empirical results show that the uInception and uXception outperform the U-net accuracy during the first 25 epochs, and that they achieve similar performance after 200 epochs (Fig. 8 and Table 1). We believe that this similar behavior is due to the reduced number of images used for training. Indeed, if the size of the dataset increases, the uInception and uXception are expected to outperform the U-net with BN and RL because they differ in network capacity. It is also important to note that the uXception approach reduces the network complexity by about 25.5% with respect to the number of parameters to be trained (Table 1, parameter count) while keeping the same accuracy for myocardial tissue classification. In the same way as Xception improves on the accuracy of Inception when the dataset size increases (3), we expect the uXception to outperform the uInception architecture too.

Figure 8: Mean accuracy over the 5 fold cross-validation for the architectures studied with an input shrink factor of 2 (i.e. the original input size was reduced by a factor of 2).
| Architecture | Parameter count | Dice's (std) | MSE (std) | MAE (std) |
|---|---|---|---|---|
| U-net - BN - RL | 32,455,682 | 0.870 (0.0053) | 0.0135 (0.0006) | 0.0137 (0.0006) |
| uInception | 35,858,242 | 0.869 (0.0051) | 0.0136 (0.0008) | 0.0138 (0.0008) |
| uXception | 24,181,570 | 0.868 (0.0047) | 0.0138 (0.0007) | 0.0140 (0.0007) |

Table 1: Network complexity and myocardial segmentation accuracy over the 5-fold cross-validation (mean and std) for the architectures studied with an input shrink factor of 2 (i.e. the original input size was reduced by a factor of 2). MSE: mean squared error. MAE: mean absolute error. BN: Batch normalization. RL: Residual learning.
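The relative complexity figures can be checked directly from the parameter counts in Table 1. The short sketch below uses hypothetical names and only the three counts reported in the table.

```python
# Parameter counts as reported in Table 1.
PARAMETER_COUNT = {
    "U-net - BN - RL": 32_455_682,
    "uInception": 35_858_242,
    "uXception": 24_181_570,
}


def relative_change(new, reference):
    """Percentage change of `new` with respect to `reference`."""
    return 100.0 * (new - reference) / reference


baseline = PARAMETER_COUNT["U-net - BN - RL"]
uxception_change = relative_change(PARAMETER_COUNT["uXception"], baseline)
uinception_change = relative_change(PARAMETER_COUNT["uInception"], baseline)

print(f"uXception:  {uxception_change:+.1f}% parameters vs. U-net - BN - RL")
print(f"uInception: {uinception_change:+.1f}% parameters vs. U-net - BN - RL")
```

The uXception value agrees with the reduction of about 25.5% quoted in the text, while the same arithmetic shows that the uInception has roughly 10.5% more parameters than the baseline.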

The level-to-level residual learning used in the Xception architecture was introduced into the uXception approach as the residual input and output learning (see the red arrows in the encoding and decoding paths in Fig. 7). Both residual reinforcements, input and output (RO), were evaluated first in the uXception architecture and then in the U-net. Results show that the residual input can be removed from the uXception approach without losing accuracy. In fact, a small improvement in the myocardial tissue classification was observed when it is removed (Fig. 9(a)). Then, the effect of the residual output learning was analyzed for the U-net architecture. The analysis of the residual input learning was omitted because it was shown to degrade the myocardial tissue classification for the uXception architecture. Figure 9(b) shows that the use of RO improves the tissue classification at an early stage of training for the U-net architecture. Due to the reduced number of patients used in this study, both architectures reach similar accuracy after 200 epochs. However, we believe that the U-net with RO will outperform the U-net architecture when the dataset size increases because the network capacity increases when the RO is introduced, in a similar way to what happened in (3) with the Xception and Inception architectures.

Figure 9: Mean accuracy over the 5 fold cross-validation for the uXception and U-net  architectures with an input shrink factor of 2 (i.e. the original input size was reduced by a factor of 2). RO: residual output.
Figure 10: Accuracy on training performance over the 5 fold cross-validation for the proposed U-net  and uXception architecture with residual output for myocardial segmentation in cardiac MRI.
Figure 11: Accuracy of the proposed architecture for myocardial segmentation in cardiac MRI for each dataset (left), and also for end-diastole and end-systole (right). Only the endocardial contours were used for the SB database at end-systole. CA: Cardiac Atlas dataset; SB: Sunnybrook dataset; ED: end-diastole; ES: end-systole.

Last, the proposed U-net architecture with RO and the uXception were evaluated on both datasets without using a shrinking factor. Figure 10 shows that the U-net with RO architecture is slightly more accurate than the uXception for the datasets used. Moreover, the U-net with RO approach reaches a suitable accuracy in few training iterations, i.e. in 100 epochs. In this case, the proposed CNN reaches a median Dice's coefficient of approximately 0.9 for both datasets (Fig. 11). However, it can be seen that the precision tends to degrade at end-systole (Fig. 11, right plot). This degradation happens, among other things, because the myocardial tissue at end-diastole is generally more compact and because there are more end-diastole images than end-systole images. In fact, the Sunnybrook dataset provides the myocardial contours in all slices at ED, and only the endocardial contours at ES.

A 3D example of the proposed fully automatic myocardial tissue segmentation on the CA dataset can be seen in Fig. 12. Figure 12(a) shows the myocardial segmentation in three orthogonal views and Fig. 12(b) shows the segmentation in two 3D views. Qualitative details of the proposed approach for a patient from the Sunnybrook dataset can be seen in several slices in Fig. 13. They show that the proposed method provides a suitable myocardial segmentation in cardiac MRI. Voxels where the manual myocardial segmentation and the proposed automatic segmentation overlap are depicted in brown, while voxels belonging only to the ground truth (GT) or only to the automated segmentation are depicted in dark yellow and cyan, respectively. As can be seen, only a small number of voxels correspond to a non-overlapping classification, especially for the LV.

Figure 12: Examples of the proposed method for myocardial tissue and endocardial (blood pool) classification on a patient of the CA dataset for end-diastole (top row) and end-systole (bottom row). (a) The proposed myocardial segmentation is presented in three orthogonal views. (b) 3D view of the proposed myocardial segmentation.
Figure 13: Qualitative results for a patient in the SB dataset. Three different short-axis slices are plotted from apex (top) to base (bottom) of the left ventricle for the myocardial tissue and the endocardium (blood pool) at end-diastole and only the endocardium at end-systole. Voxels where the manual and the proposed automatic classification overlap are depicted in brown, while voxels belonging only to the manual or only to the automated classification are depicted in dark yellow and cyan, respectively.

Physiological validation

Six physiological measures have been derived from the myocardial tissue classification: end-diastolic volume (EDV), end-systolic volume (ESV), SV, LVM at ED, CO and EF. As described in the introduction, these measures are among the most relevant global structural features. The ESV measures for three patients were identified as outliers and they will be discussed separately in the next section.

Table 2 summarizes the errors and the Pearson's correlation coefficient of the six physiological measures derived from the proposed method with respect to those derived from the experts, whereas Table 3 presents a comparison with the literature. In general, our measures are comparable to those reported by other authors.

| Parameter | Mean error | Std error | Mean abs error | Max. abs error | Pearson's r |
|---|---|---|---|---|---|
| EDV [cm³] | 0.14 | 9.51 | 7.31 | 26.55 | 0.99 |
| ESV [cm³] | 1.37 | 9.15 | 7.12 | 25.42 | 0.99 |
| SV [cm³] | -1.23 | 8.38 | 6.82 | 21.54 | 0.93 |
| EF [%] | -0.71 | 4.76 | 3.73 | 15.84 | 0.95 |
| LVM [g] | -3.68 | 13.66 | 10.67 | 64.05 | 0.97 |
| CO [L/min] | -0.03 | 0.47 | 0.32 | 1.29 | 0.93 |

Table 2: Errors and Pearson's correlation coefficient of the physiological measures studied for the proposed approach.
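The kind of summary reported in Table 2 can be sketched as follows for paired automatic and manual measurements. The names are hypothetical, the sign convention (automatic minus manual) and the use of the sample standard deviation are assumptions, and the example values are made up rather than taken from the study.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def error_summary(automatic, manual):
    """Error statistics of automatic measurements against manual ones."""
    if len(automatic) != len(manual) or len(automatic) < 2:
        raise ValueError("need at least two paired measurements")
    errors = [a - m for a, m in zip(automatic, manual)]  # assumed sign convention
    abs_errors = [abs(e) for e in errors]
    return {
        "mean_error": statistics.mean(errors),
        "std_error": statistics.stdev(errors),  # sample std (n - 1), an assumption
        "mean_abs_error": statistics.mean(abs_errors),
        "max_abs_error": max(abs_errors),
        "pearson_r": pearson_r(automatic, manual),
    }


# Made-up volumes in cm^3 for four hypothetical patients.
summary = error_summary([100.0, 110.0, 130.0, 150.0],
                        [102.0, 108.0, 131.0, 147.0])
```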
Mean ± std. errors

| Reference | EDV [cm³] | ESV [cm³] | EF [%] | LVM [g] | Methodology | Comments |
|---|---|---|---|---|---|---|
| (5) | | | | | Active contours | Nine patients |
| (25) | | | | | Fuzzy objects, Hough Transform and minimum cost | Six patients and eight healthy subjects |
| (34) | | | | | Fuzzy objects, smooth convex hull and radial minimum cost | Seven patients and three healthy subjects |
| (23) | | | | | Bayesian flooding and weighted least-squares | 18 patients; semiautomatic corrections |
| (4) | | | | | MRF based deformable model | 43 patients affected by an AMI |
| (22) | | | | | Maximum likelihood based on active contours | 35 patients and 15 healthy subjects |
| ours | | | | | Deep learning | 137 patients |

Table 3: Errors (mean ± std.) in the measurement of the physiological parameters studied. Comparison with other approaches.

The Bland–Altman plots (Fig. 14) show a minimal bias of 0.14 ml for EDV, a small bias of 1.37 ml for ESV, and a bias of -0.71% for EF (95% limits of agreement: -18.43 to 18.71 ml for EDV, -16.49 to 19.23 ml for ESV, and -10.02 to 8.59% for EF). Plots of the linear regression between the automatically measured parameters and those derived from the experts (Fig. 15) show a high correlation between the manual and automatic measures, especially for EDV and ESV. As described in the previous section, myocardial tissue and blood classification seems to be slightly more precise at end-diastole than at end-systole (see the standard deviation and the mean absolute error in Table 2). Nevertheless, the overall effect of ESV inaccuracies on the EF calculation is negligible, and the proposed method is among the most accurate for measuring the EF. It is also important to note that the EF and myocardial mass errors are comparable to the inter- and intra-operator ranges for manual contouring reported in (28). The presence of outliers in the measures is mainly due to the segmentation of the most basal and apical slices. The most basal slices are difficult to segment because the myocardial tissue is not always complete in the slice, which changes the topology of the cavity. Similarly, the most apical slices are difficult to segment because the blood pool is not always present in them, which also changes the topology.
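For reference, the bias and 95% limits of agreement of a Bland–Altman analysis can be computed as sketched below, assuming the conventional definition (mean difference ± 1.96 standard deviations), which the text does not spell out. The function name and the choice of `ddof=1` are illustrative. Under this definition the limits are centred on the bias: the EDV limits quoted above, -18.43 to 18.71 ml, have a midpoint of 0.14 ml, which matches the mean EDV error in Table 2.

```python
import numpy as np


def bland_altman_limits(automatic, manual):
    """Return (bias, lower limit, upper limit) of a Bland-Altman analysis.

    Differences are automatic - manual; the limits are bias +/- 1.96 SD
    (conventional definition, assumed here).
    """
    diff = np.asarray(automatic, dtype=float) - np.asarray(manual, dtype=float)
    bias = float(diff.mean())
    sd = float(diff.std(ddof=1))
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Made-up values, for illustration only.
    print(bland_altman_limits([10, 20, 30, 40], [12, 18, 33, 39]))
```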

Figure 14: Bland-Altman plots of the physiological measures studied.
Figure 15: Regression plots and the Pearson coefficient of the physiological measures studied.


The ESV measures for three patients were identified as outliers, with error values greater than 30 ml. These errors were far higher than those of the other patients, so they were excluded from the previous analysis and are discussed in what follows. An analysis of these patients reveals that the proposed approach for myocardial tissue classification fails (Fig. 16). However, it is important to note that for one of these three patients the method fails only in a basal slice, as can be seen in the middle row of Figure 16. This misclassification introduces an error of 49 and 46 ml for EDV and ESV, respectively.

On the other hand, the proposed approach fails in different slices for the other two patients, as depicted in the first and bottom rows of Figure 16. The misclassification for the patient in the first row results in a high error, but only at end-systole (30.5 ml). This patient presents a severe dilated cardiomyopathy, which made the proposed tissue classification fail. Similarly, the patient depicted in the bottom row of Figure 16 presents a hypertrophic cardiomyopathy, and this hypertrophic myocardial tissue leads the proposed approach to an error of 45.2 ml. Nevertheless, we believe that the proposed approach will be able to overcome these misclassifications by increasing the size of the dataset.

Figure 16: Patients excluded from the physiological analysis. The automatic and manual segmentation is presented by patient (rows) in three orthogonal views (columns) at end-diastole (first row) and end-systole (middle and bottom rows). The blood pool is depicted in blue and green for manual and automated segmentation respectively. Also, the myocardial tissue is depicted in brown and yellow for manual and automated segmentation. The manual myocardial contour is only depicted when it is available (i.e. at end-diastole).

4 Conclusions

In this paper, we have proposed an automatic approach for LV function and mass quantification using deep learning networks. Unlike previous approaches, our method uses the generalized Jaccard distance as the objective loss function and level-to-level residual learning strategies to provide a suitable approach for myocardial segmentation and cardiac functional quantification. Quantitative and qualitative results show that the proposed approach has high potential for estimating different structural and functional features for both prognosis and treatment of different pathologies. Thanks to data augmentation with free-form deformations, it needs only very few annotated images (280 cardiac MRI) and has a reasonable training time of only 9 hours on an NVIDIA Tesla C2070 (6 GB) to reach a suitable accuracy of ∼0.9 Dice's coefficient for both public datasets. It is also important to note that the method achieves a strong correlation with the most relevant functional measures (0.99 for EDV and ESV, 0.97 for myocardial mass, 0.95 for EF and 0.93 for SV and CO), and that the errors are comparable to the inter- and intra-operator ranges for manual contouring. Additionally, this work leads to extensions for automatic detection and tracking of the right and left ventricles. Myocardial motion is useful in the evaluation of regional cardiac functions such as the strain and strain rate.
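As a reminder of the two overlap measures named in these conclusions, the sketch below shows one common form of the generalized Jaccard distance (sum of element-wise minima over sum of element-wise maxima, in the spirit of the fuzzy overlap measures of Crum et al., 2006) together with Dice's coefficient for binary masks. It is a NumPy illustration under these assumptions and not the authors' training loss; the exact form they use is defined in the methods part of the paper and may differ, and the `eps` smoothing term is an added assumption.

```python
import numpy as np


def generalized_jaccard_distance(y_true, y_pred, eps=1e-7):
    """1 - sum(min) / sum(max) for soft or binary label maps.

    For binary masks this reduces to 1 - |A and B| / |A or B|.
    eps (assumed) avoids division by zero when both inputs are empty.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    intersection = np.minimum(y_true, y_pred).sum()
    union = np.maximum(y_true, y_pred).sum()
    return float(1.0 - (intersection + eps) / (union + eps))


def dice_coefficient(mask_a, mask_b):
    """Dice's coefficient 2|A and B| / (|A| + |B|) for binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # two empty masks are treated as a perfect match
    return float(2.0 * np.logical_and(a, b).sum() / denom)


if __name__ == "__main__":
    a = [1, 1, 0, 0]
    b = [1, 0, 1, 0]
    print(generalized_jaccard_distance(a, b), dice_coefficient(a, b))
```

For binary masks the two measures are monotonically related through Dice = 2J / (1 + J), where J is the Jaccard index, so minimizing the Jaccard distance also increases the Dice coefficient used for evaluation.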


This work was partially supported by Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) and by grants M028-2016 SECTyP, Universidad Nacional de Cuyo, Argentina; and PICTO 2016-0023, Agencia Nacional de Promoción Científica y Tecnológica, and Universidad Nacional de Cuyo, Argentina. German Mato acknowledges CONICET for the grant PIP 112 201301 00256.


  • Bluemke et al. (2008) Bluemke, D.A., Kronmal, R.A., Lima, J.A., Liu, K., Olson, J., Burke, G.L., Folsom, A.R., 2008. The relationship of left ventricular mass and geometry to incident cardiovascular events. Journal of the American College of Cardiology 52, 2148–2155.
  • Chollet (2015) Chollet, F., 2015. Keras.
  • Chollet (2017) Chollet, F., 2017. Xception: Deep learning with depthwise separable convolutions, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1800–1807.
  • Cordero-Grande et al. (2011) Cordero-Grande, L., Vegas-Sánchez-Ferrero, Casaseca-de-la Higuera, P., San-Román-Calvar, J.A., Revilla-Orodea, A., Martin-Fernandez, M., Alberola-Lopez, C., 2011. Unsupervised 4d myocardium segmentation with a markov random field based deformable model. Medical Image Analysis 15, 283–301.
  • Corsi et al. (2006) Corsi, C., Veronesi, F., Lamberti, C., Mor-Avi, V., 2006. Improved automated quantification of left ventricular size and function from cardiac magnetic resonance images, in: 2006 Computers in Cardiology, pp. 53–56.
  • Crum et al. (2006) Crum, W.R., Camara, O., Hill, D.L.G., 2006. Generalized overlap measures for evaluation and validation in medical image analysis. IEEE Transactions on Medical Imaging 25, 1451–1461.
  • Curiale et al. (2017) Curiale, A.H., Colavecchia, F.D., Kaluza, P., Isoardi, R.A., Mato, G., 2017. Automatic myocardial segmentation by using a deep learning network in cardiac mri, in: 2017 XLIII Latin American Computer Conference (CLEI), pp. 1–6.
  • Edvardsen et al. (2002) Edvardsen, T., Urheim, S., Skulstad, H., Steine, K., Ihlen, H., Smiseth, O.A., 2002. Quantification of left ventricular systolic function by tissue doppler echocardiography. Circulation 105, 2071–2077.
  • Fonseca et al. (2011) Fonseca, C.G., Backhaus, M., Bluemke, D.A., Britten, R.D., Chung, J.D., Cowan, B.R., Dinov, I.D., Finn, J.P., Hunter, P.J., Kadish, A.H., Lee, D.C., Lima, J.A.C., Medrano-Gracia, P., Shivkumar, K., Suinesiaputra, A., Tao, W., Young, A.A., 2011. The cardiac atlas project—an imaging database for computational modeling and statistical atlases of the heart. Bioinformatics 27, 2288. doi:10.1093/bioinformatics/btr360.
  • Frangi et al. (2001) Frangi, A.F., Niessen, W.J., Viergever, M.A., 2001. Three-dimensional modeling for functional analysis of cardiac images, a review. IEEE Transactions on Medical Imaging 20, 2–25.
  • Friedrich et al. (2009) Friedrich, M.G., Sechtem, U., Schulz-Menger, J., Holmvang, G., Alakija, P., Cooper, L.T., White, J.A., Abdel-Aty, H., Gutberlet, M., Prasad, S., Aletras, A., Laissy, J.P., Paterson, I., Filipchuk, N.G., Kumar, A., Pauschinger, M., Liu, P., 2009. Cardiovascular magnetic resonance in myocarditis: A jacc white paper. Journal of the American College of Cardiology 53, 1475–1487.
  • Gerche et al. (2013) Gerche, A.L., Claessen, G., Van de Bruaene, A., Pattyn, N., Van Cleemput, J., Gewillig, M., Bogaert, J., Dymarkowski, S., Claus, P., Heidbuchel, H., 2013. Cardiac mri clinical perspective. Circulation: Cardiovascular Imaging 6, 329–338. doi:10.1161/CIRCIMAGING.112.980037.
  • Gjesdal et al. (2011) Gjesdal, O., Bluemke, D.A., Lima, J.A., 2011. Cardiac remodeling at the population level - risk factors, screening, and outcomes. Nature Reviews Cardiology 8, 673–685.
  • Graça et al. (2014) Graça, B., Donato, P., Ferreira, M.J., Castelo-Branco, M., Caseiro-Alves, F., 2014. Left ventricular diastolic function in type 2 diabetes mellitus and the association with coronary artery calcium score: A cardiac mri study. American Journal of Roentgenology 202, 1207–1214. doi:10.2214/AJR.13.11325.
  • He and Sun (2015) He, K., Sun, J., 2015. Convolutional neural networks at constrained time cost, in: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5353–5360.
  • He et al. (2016) He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition, in: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778.
  • Ioffe and Szegedy (2015) Ioffe, S., Szegedy, C., 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. ArXiv e-prints .
  • Koelling et al. (2002) Koelling, T.M., Aaronson, K.D., Cody, R.J., Bach, D.S., Armstrong, W.F., 2002. Prognostic significance of mitral regurgitation and tricuspid regurgitation in patients with left ventricular systolic dysfunction. American Heart Journal 144, 524–529.
  • Lecun et al. (1998) Lecun, Y., Bottou, L., Bengio, Y., Haffner, P., 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86, 2278–2324. doi:10.1109/5.726791.
  • Litjens et al. (2017) Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., van der Laak, J.A., van Ginneken, B., Sánchez, C.I., 2017. A survey on deep learning in medical image analysis. Medical Image Analysis 42, 60–88.
  • Lowes et al. (1999) Lowes, B.D., Gill, E.A., Abraham, W.T., Larrain, J.R., Robertson, A.D., Bristow, M.R., Gilbert, E.M., 1999. Effects of carvedilol on left ventricular mass, chamber geometry, and mitral regurgitation in chronic heart failure. American Journal of Cardiology 83, 1201–1205.
  • Marino et al. (2016) Marino, M., Corsi, C., Maffessanti, F., Patel, A.R., Mor-Avi, V., 2016. Objective selection of short-axis slices for automated quantification of left ventricular size and function by cardiovascular magnetic resonance. Clinical Imaging 40, 617–623.
  • Mazonakis et al. (2010) Mazonakis, M., Grinias, E., Pagonidis, K., Tziritas, G., Damilakis, J., 2010. Development and evaluation of a semiautomatic segmentation method for the estimation of lv parameters on cine mr images. Physics in Medicine & Biology 55, 1127.
  • Milletari et al. (2016) Milletari, F., Navab, N., Ahmadi, S.A., 2016. V-net: Fully convolutional neural networks for volumetric medical image segmentation, in: 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571. doi:10.1109/3DV.2016.79.
  • Pednekar et al. (2006) Pednekar, A., Kurkure, U., Muthupillai, R., Flamm, S., Kakadiaris, I.A., 2006. Automated left ventricular segmentation in cardiac mri. IEEE Transactions on Biomedical Engineering 53, 1425–1428. doi:10.1109/TBME.2006.873684.
  • Radau et al. (2009) Radau, P., Lu, Y., Connelly, K., Paul, G., Dick, A., Wright, G., 2009. Evaluation framework for algorithms segmenting short axis cardiac mri. The MIDAS Journal -Cardiac MR Left Ventricle Segmentation Challenge 49.
  • Ronneberger et al. (2015) Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: Convolutional networks for biomedical image segmentation, in: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (Eds.), Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, pp. 234–241. doi:10.1007/978-3-319-24574-4_28.
  • Sardanelli et al. (2008) Sardanelli, F., Quarenghi, M., Di Leo, G., Leonardo, B., Schiavi, A., 2008. Segmentation of cardiac cine mr images of left and right ventricles: Interactive semiautomated methods and manual contouring by two readers with different education and experience. Journal of Magnetic Resonance Imaging 27, 785–792. doi:10.1002/jmri.21292.
  • Shimodaira (2000) Shimodaira, H., 2000. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference 90, 227–244.
  • Srivastava et al. (2014) Srivastava, N., Hinton, G.E., Krizhevsky, A., Sutskever, I., Salakhutdinov, R., 2014. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15, 1929–1958.
  • Suffoletto et al. (2006) Suffoletto, M.S., Dohi, K., Cannesson, M., Saba, S., Gorcsan, J., 2006. Novel speckle-tracking radial strain from routine black-and-white echocardiographic images to quantify dyssynchrony and predict response to cardiac resynchronization therapy. Circulation 113, 960–968.
  • Suinesiaputra et al. (2015) Suinesiaputra, A., Bluemke, D.A., Cowan, B.R., Friedrich, M.G., Kramer, C.M., Kwong, R., Plein, S., Schulz-Menger, J., Westenberg, J.J.M., Young, A.A., Nagel, E., 2015. Quantification of lv function and mass by cardiovascular magnetic resonance: multi-center variability and consensus contours. Journal of Cardiovascular Magnetic Resonance 17, 63. doi:10.1186/s12968-015-0170-9.
  • Szegedy et al. (2015) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A., 2015. Going deeper with convolutions, in: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9.
  • Van Geuns et al. (2006) Van Geuns, R.J.M., Baks, T., Gronenschild, E.H.B.M., Aben, J.P.M.M., Wielopolski, P.A., Cademartiri, F., de Feyter, P.J., 2006. Automatic quantitative left ventricular analysis of cine mr images by using three-dimensional information for contour detection. Radiology 240, 215–221. doi:10.1148/radiol.2401050471.
  • Weinsaft et al. (2007) Weinsaft, J.W., Klem, I., Judd, R.M., 2007. Mri for the assessment of myocardial viability. Cardiology Clinics 25, 35–56.