
Fully Convolutional Multi-scale Residual DenseNets for Cardiac Segmentation and Automated Cardiac Diagnosis using Ensemble of Classifiers

Deep fully convolutional neural network (FCN) based architectures have shown great potential in medical image segmentation. However, such architectures usually have millions of parameters, and an inadequate number of training samples leads to over-fitting and poor generalization. In this paper, we present a novel, highly parameter- and memory-efficient FCN-based architecture for medical image analysis. We propose a novel up-sampling path which incorporates long skip and short-cut connections to overcome the feature map explosion in FCN-like architectures. In order to process the input images at multiple scales and view points simultaneously, we propose to incorporate the parallel structures of the Inception module. We also propose a novel dual loss function whose weighting scheme combines the advantages of cross-entropy and dice loss. We have validated our proposed network architecture on two publicly available datasets, namely: (i) Automated Cardiac Disease Diagnosis Challenge (ACDC-2017), (ii) Left Ventricular Segmentation Challenge (LV-2011). Our approach in the ACDC-2017 challenge stands in second place for segmentation and first place in the automated cardiac disease diagnosis task with an accuracy of 100%. In the LV-2011 challenge our approach attained a Jaccard index of 0.74, which is so far the highest published result among fully automated algorithms. From the segmentation we extracted clinically relevant cardiac parameters and hand-crafted features which reflected the clinical diagnostic analysis to train an ensemble system for cardiac disease classification. Our approach combined both cardiac segmentation and disease diagnosis into a fully automated framework which is computationally efficient and hence has the potential to be incorporated in computer-aided diagnosis (CAD) tools for clinical application.



1 Introduction

Cardiac cine Magnetic Resonance (MR) Imaging is primarily used for assessment of cardiac function and diagnosis of cardiovascular diseases (CVDs). Estimation of clinical parameters such as ejection fraction, ventricular volumes, stroke volume and myocardial mass from cardiac MRI is considered the gold standard. Delineating important organs and structures from volumetric medical images, such as MR and computed tomography (CT) images, is usually the primary step for estimating clinical parameters, disease diagnosis, prediction of prognosis and surgical planning. In a clinical setup, a radiologist delineates the region of interest from the surrounding tissues and organs by manually drawing contours encompassing the structure of interest. However, this approach becomes infeasible in a hospital with a high patient load, as it is time-consuming and tedious and also introduces intra- and inter-rater variability (Petitjean and Dacher (2011); Miller et al. (2013); Tavakoli and Amini (2013); Suinesiaputra et al. (2014)). Hence, a fully automatic method for segmentation and clinical diagnosis is desirable.

Segmentation of the Left Ventricular (LV) Endocardium and Epicardium as well as the Right Ventricular (RV) Endocardium from 4D-cine (3D+Time) MR datasets has received significant research attention over the past few years, and several grand challenges (Radau et al. (2009); Suinesiaputra et al. (2014); Petitjean et al. (2015); Booz Allen Hamilton Inc and Kaggle; Bernard et al.) have been organized for advancing the state-of-the-art methods in (semi-)automated cardiac segmentation. These challenges usually provide expert ground-truth contours and a set of evaluation metrics to benchmark various approaches. In this paper, we present a novel 2D Fully Convolutional Neural Network (FCN) architecture for medical image segmentation which is highly parameter and memory efficient. We also develop a fully automated framework which incorporates cardiac structure segmentation and cardiac disease diagnosis. Our contributions are summarized as follows:

  • We employed conventional computer-vision techniques like Circular Hough Transform and Fourier based analysis as a pre-processing step to localize the region of interest (ROI). The extracted ROI is used by the proposed network during training and inference time. This combined approach helps in reduction of GPU memory usage, inference time and elimination of False Positives.

  • The proposed network connectivity pattern was based on Densely connected convolutional neural networks (DenseNets) (Huang et al. (2016)). DenseNet facilitates multi-path flow for gradients between layers during training by back-propagation and hence performs implicit deep supervision. DenseNets also encourage feature reuse and thus substantially reduce the number of parameters while maintaining good performance, which is ideal in scenarios with limited data. In addition, we incorporated multi-scale processing in the initial layers of the network by performing convolutions on the input with different kernel sizes in parallel paths and later fusing them, as in Inception architectures. We propose novel long skip and short-cut connections in the up-sampling path which are much more computationally and memory efficient than standard skip connections. We introduce a weighting scheme for both cross-entropy and dice loss, and propose a way to combine both loss functions to yield optimal performance in terms of pixel-wise accuracy and segmentation metrics.

  • For the cardiac disease classification, the predicted segmentation labels were used to estimate relevant clinical and hand-crafted cardiac features. We identified and extracted relevant features based on clinical inputs and Random Forest based feature importance analysis. We developed an ensemble classifier system which processed the features in two stages for prediction of the cardiac disease.

  • We extensively validated our proposed network on two cardiac segmentation tasks: (i) segmentation of Left Ventricle (LV), Right Ventricle (RV) and Myocardium (MYO) from 3D cine cardiac MR images for both End Diastolic (ED) and End Systolic (ES) phase instances and (ii) segmentation of MYO for the whole cardiac frames and slices in 4D cine MR images, by participating in two challenges organized at Statistical Atlases and Computational Modeling of the Heart (STACOM) workshops: (i) Automated Cardiac Disease Diagnosis Challenge (Bernard et al. ), and (ii) Left Ventricular Segmentation Challenge (Suinesiaputra et al. (2014)).

We achieved segmentation results competitive with state-of-the-art approaches in both challenges, demonstrating the effectiveness and generalization capability of the proposed network architecture. Also, for the cardiac disease classification model, our approach gave 100% accuracy on the ACDC testing dataset. A preliminary version of this work was presented at the STACOM workshop held in conjunction with Medical Image Computing and Computer Assisted Interventions (MICCAI-2017). In this paper, we have substantially improved the network architecture and the methodology used for cardiac disease classification. The main modifications include elaborating the proposed methods, analyzing the underlying network connectivity pattern and loss function, and adding experiments on the LV-2011 challenge dataset. Since our work is primarily based on CNNs, we focus on the recently published CNN-based algorithms for cardiac segmentation and also provide a comprehensive literature review of the other approaches.

1.1 Related Work

1.1.1 Literature survey on short-axis cardiac cine MR segmentation

Petitjean and Dacher (2011); Frangi et al. (2001); Tavakoli and Amini (2013); Peng et al. (2016) provide a comprehensive survey on cardiac segmentation using semi-automated and fully automated approaches. These approaches can be broadly classified as techniques based on:

  1. pixel or image-based classification such as intensity distribution modeling and image thresholding (Lynch et al. (2006); Pednekar et al. (2006); Nambakhsh et al. (2013); Jolly et al. (2011); Katouzian et al. (2006); Lu et al. (2009); Cousty et al. (2010))

  2. variational and level sets (Fradkin et al. (2008); Lynch et al. (2008); Paragios (2003); Ayed et al. (2008))

  3. dynamic programming (Pednekar et al. (2006); Üzümcü et al. (2006))

  4. graph cuts and image-driven approaches (Boykov and Jolly (2000); Lin et al. (2006b); Cocosco et al. (2008))

  5. deformable models such as active contours (Kaus et al. (2004); El Berbari et al. (2007); Billet et al. (2009); Cordero-Grande et al. (2011); Queirós et al. (2015))

  6. cardiac atlases based registration (Lorenzo-Valdés et al. (2004); Lötjönen et al. (2004); Bai et al. (2015))

  7. statistical shape and active appearance models (Mitchell et al. (2001); Van Assen et al. (2006); Ordas et al. (2007); Zhu et al. (2010); Grosgeorge et al. (2011); Zhang et al. (2010); Albá et al. (2018))

  8. learning-based such as neural networks and combination of various other approaches (Margeta et al. (2011); Eslami et al. (2013); Tsadok et al. (2013); Tran (2016); Avendi et al. (2016); Tan et al. (2017); Zotti et al. (2017); Patravali et al. (2017); Isensee et al. (2017); Wolterink et al. (2017); Baumgartner et al. (2017))

1.1.2 Fully Convolutional Neural Networks for Medical Image Segmentation

In the field of medical image analysis, a considerable amount of work has been done on automating the segmentation of various anatomical structures and the detection and delineation of lesions and tumors. Non-learning based algorithms such as statistical shape modeling, level sets, active contours, multi-atlas and graphical models have shown promising results on limited datasets, but they usually tend to perform poorly on data originating from a database outside the training data. Some of these techniques relied heavily on engineering hand-crafted features and hence required domain knowledge and expert inputs. Moreover, hand-crafted features have limited representational ability to deal with the large variations in appearance and shapes of anatomical organs. In order to overcome these limitations, learning-based methods have been explored to seek more powerful features.

Convolutional Neural Networks (CNNs), which were first introduced by LeCun et al. (1998), currently produce state-of-the-art results for a variety of computer vision and pattern recognition tasks. The most common ones are image classification (Krizhevsky et al. (2012); Simonyan and Zisserman (2014); He et al. (2016)) and semantic segmentation using fully convolutional networks (FCN) (Shelhamer et al. (2017)). In all these applications, CNNs have demonstrated greater representational and hierarchical learning ability. Recently in medical image analysis, FCN and its popular extensions like U-Net (Ronneberger et al. (2015)) have achieved remarkable success in segmentation of various structures in the heart (Dou et al. (2016)), brain lesions (Havaei et al. (2017); Kamnitsas et al. (2017); Pereira et al. (2016)) and liver lesions (Christ et al. (2016); Dou et al. (2016); Ben-Cohen et al. (2015); Deng and Du (2008)) from medical volumes. Also, with the availability of large amounts of labeled data and the increase in the computational capability of general-purpose graphics processing units (GPUs), CNN-based methods have the potential for application in daily clinical practice.

2 Material and Methods

2.1 Overview

Figure 1 illustrates our automated cardiac segmentation and disease diagnosis framework. The pipeline involves: (i) Fourier analysis and Circular Hough Transform for Region of Interest (ROI) cropping, (ii) the proposed network for cardiac structure segmentation and (iii) an ensemble of classifiers for disease diagnosis based on features extracted from the segmentation.

Figure 1: Proposed pipeline for Automated Cardiac Segmentation & Cardiac Disease diagnosis.

2.2 Experimental Datasets and Materials

In this section, we introduce the datasets employed in our experiments. In Section 3, we shall report our results of comparison study for analyzing the effectiveness of our proposed approach. In Section 4, we shall comprehensively present the results on these datasets.

2.2.1 ACDC-2017 Dataset

The Automated Cardiac Disease Diagnosis challenge dataset comprised 150 exams of different patients and was divided into 5 evenly distributed subgroups (4 pathological and 1 healthy subject groups), namely: (i) normal (NOR), (ii) patients with previous myocardial infarction (MINF), (iii) patients with dilated cardiomyopathy (DCM), (iv) patients with hypertrophic cardiomyopathy (HCM), (v) patients with abnormal right ventricle (ARV). Each group was clearly defined according to cardiac physiological parameters, such as the left or right diastolic volume or ejection fraction, the local contraction of the LV, the LV mass and the maximum thickness of the Myocardium. The cine MR images were acquired in breath hold with retrospective or prospective gating and with an SSFP sequence in short-axis orientation. A series of short-axis slices covers the LV from the base to the apex, with a slice thickness of 5-8 mm and an inter-slice gap of 5 or 10 mm. The spatial resolution ranges from 1.37 to 1.68 mm²/pixel, and 28 to 40 images cover the cardiac cycle completely or partially. For each patient, the weight, height and the diastolic and systolic phase instants were provided. The challenge organizers had evenly divided the patient database based on the pathological condition and made it available in two phases, 100 cases for training and 50 for testing. The manual annotations of LV, RV and MYO were done by clinical experts at the systolic and diastolic phase instants only. The clinical diagnosis and manual annotations were provided for the training set, and those of the testing set were held out by the challenge organizers for their independent evaluation.

2.2.2 LV-2011 Dataset

The LV segmentation challenge dataset was made publicly available as part of the STACOM 2011 challenge on automated LV Myocardium segmentation from short-axis cine MRI. The dataset comprised 200 patients with coronary artery disease and myocardial infarction and comes with expert-guided semi-automated segmentation contours for the Myocardium. The dataset provided by the organizers was divided into two sets of 100 cases each: training and validation. The training set was provided with expert-guided contours of the Myocardium.

2.3 Region of Interest (ROI) detection

The cardiac MR images of the patient comprise the heart and the surrounding structures of the chest cavity, such as the lungs and diaphragm. The ROI detection employs Fourier analysis (Lin et al. (2006a)) and the Circular Hough Transform (Duda and Hart (1972); Korshunova et al. ) to delineate the heart structures from the surrounding tissues. The ROI extraction involves finding the approximate center of the LV and extracting a patch of size centered around it. The ROI patch was used for training the CNN and also during inference. This approach helped in alleviating the class-imbalance problem associated with labels for heart structures seen in the full-sized cardiac MR images. Appendix B gives a detailed overview of the ROI detection steps.
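The cropping step, once an approximate LV center is available, can be sketched as follows. This is a minimal illustration under stated assumptions: the center is taken as given (the Fourier and Hough-transform localization is not reproduced here), and the function name `crop_roi`, the zero-padding at image borders and the example patch size are hypothetical choices, not details from the paper.

```python
import numpy as np

def crop_roi(image, center, size):
    """Crop a size x size patch centered on (row, col), zero-padding at borders."""
    half = size // 2
    # Pad by half the patch size so a patch near the border keeps its full size.
    padded = np.pad(image, half, mode="constant")
    # Shift the center coordinates to account for the padding.
    r, c = int(center[0]) + half, int(center[1]) + half
    return padded[r - half:r - half + size, c - half:c - half + size]

# Toy example: a 10 x 10 "slice" and a hypothetical 4 x 4 patch size.
image = np.arange(100, dtype=float).reshape(10, 10)
patch = crop_roi(image, center=(5, 5), size=4)
corner_patch = crop_roi(image, center=(0, 0), size=4)
```

Using the same patch size at training and inference time keeps the network input shape fixed, which is what allows the cropped ROI to replace the full-sized image throughout the pipeline.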

2.4 Normalization

The 4D cine MR cardiac datasets employed slice-wise normalization of voxel intensities using Eq. (1):

$$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}} \qquad (1)$$

where $X$ is the voxel intensity, and $X_{min}$ and $X_{max}$ are the minimum and maximum of the voxel intensities in a slice, respectively.
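A slice-wise min-max normalization of this kind can be sketched in NumPy as below. The array layout `(slices, height, width)` and the small constant `eps`, which guards against division by zero in constant-intensity slices, are assumptions made for the illustration.

```python
import numpy as np

def normalize_slices(volume, eps=1e-8):
    """Min-max normalize each 2D slice of a (slices, H, W) volume to [0, 1]."""
    volume = volume.astype(np.float64)
    # Per-slice minimum and maximum, kept broadcastable against the volume.
    vmin = volume.min(axis=(1, 2), keepdims=True)
    vmax = volume.max(axis=(1, 2), keepdims=True)
    # eps (an assumption, not in the paper) avoids 0/0 for constant slices.
    return (volume - vmin) / (vmax - vmin + eps)

volume = np.array([[[0, 5], [10, 20]],
                   [[100, 100], [100, 100]]])
normalized = normalize_slices(volume)
```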

2.5 Network Architecture

The proposed network’s connectivity pattern was constructed by taking inspiration from DenseNet (Huang et al. (2016)) for semantic segmentation (Jégou et al. (2017)). A brief overview of DenseNets, Residual Networks and Inception architectures is given in Appendix C.

2.5.1 Fully Convolutional Multi-scale Residual DenseNets for Semantic Segmentation

Figure 2: The proposed architecture for semantic segmentation using a Densely Connected Fully Convolutional Network (DFCN) is made up of several modular blocks, which are explained as follows: (a) An example of a Dense Block (DB) with 3 layers. In a DB the input is fed to the first layer to create k new feature maps. These new feature maps are concatenated with the input and fed to the second layer to create another set of k new feature maps. This operation is repeated 3 times, and the final output of the DB is a concatenation of the outputs of all 3 layers and thus contains 3*k feature maps, (b) A Layer in a DB is a composition of Batch Norm (BN), Exponential Linear Unit (ELU), Convolution and a Dropout layer with a drop-out rate of , (c) A Transition Down (TD) block reduces the spatial resolution of the feature maps as the depth of the network increases. The TD block is composed of BN, ELU, Convolution, Dropout () and Max-pooling layers, (d) A Transition Up (TU) block increases the spatial resolution of the feature maps by performing Transposed Convolution with a stride of


Figures 2 and 3 illustrate the building blocks and the schematic diagram of the proposed network architectures for segmentation, respectively. A typical semantic segmentation architecture comprises a down-sampling (contracting) path and an up-sampling (expanding) path. The down-sampling path of the network is similar to the DenseNet architecture and is described in Appendix C.1. The last layer of the down-sampling path is referred to as the bottleneck. The input spatial resolution is recovered in the up-sampling path by transposed convolutions, dense blocks and skip connections coming from the down-sampling path. The up-sampling operation is referred to as Transition-Up (TU). The up-sampled feature maps are added element-wise to the skip connections. The feature maps of the final up-sampling component are convolved with a convolution layer followed by a soft-max layer to generate the final label map of the segmentation.

Figure 3: The figure shows the modifications introduced to DenseNets to mitigate the feature map explosion when extending them to FCNs. (a) DFCN-A (Jégou et al. (2017)), (b) the proposed architecture, referred to as DFCN-C. The main modifications include: (i) replacing the standard copy and concatenation of skip connections from the down-sampling path to the up-sampling path with a projection layer and an element-wise addition of feature maps, respectively, which reduces the number of parameters and the GPU memory foot-print, (ii) introduction of short-cut (residual) connections in the up-sampling path, (iii) parallel pathways in the initial layer.

In the proposed network architecture, several modifications were introduced in the connectivity pattern to further improve parameter efficiency, rate of convergence and memory foot-print. We discuss and compare 3 different architectural variants of Densely Connected Fully Convolutional Networks (DFCN). We refer to the architecture introduced by Jégou et al. (2017) as DFCN-A and to the other two variants as DFCN-B and DFCN-C (Fig. 3). The following architectural changes were introduced:

  • The GPU memory foot-print increases with the number of feature maps of larger spatial resolution. In order to mitigate feature map explosion in the up-sampling path, the skip connections from the down-sampling path to the up-sampling path used an element-wise addition operation instead of a concatenation operation. In order to match the channel dimensions, a projection operation was done on the skip connection path using BN-ELU-1×1-convolution-Dropout. Compared to concatenation of feature maps, this operation helps in reducing the parameters and memory foot-print without affecting the quality of the segmentation output. The proposed projection operation performs dimension reduction and also allows complex and learnable interactions of cross-channel information (Lin et al. (2013)). Replacing the rectified linear unit (ReLU) activation function with exponential linear units (ELUs) resulted in faster convergence.

  • In DenseNets, without pooling layers the spatial resolution of feature maps increases with depth and hence leads to memory explosion. Jégou et al. (2017) overcame this limitation in the up-sampling path by not concatenating the input to a Dense Block with its output (the only exception was at the last dense block, see Fig. 3(a)). Hence, the transposed convolution was applied only to the feature maps obtained by the last layer of a dense block and not to all feature maps concatenated so far. However, we observed that introducing short-cut (residual) connections (He et al. (2016)) in the dense blocks of the up-sampling path, by element-wise addition of a dense block's input with its output, better aggregates a set of previous transformations than completely discarding the inputs of the dense blocks. In order to match the dimensions of a dense block's input and output, a projection operation was done using BN-ELU-1×1-convolution-Dropout. We found that this was effective in addressing the memory limitations exposed by DenseNet-based architectures for semantic segmentation. We also observed faster convergence and improved segmentation metrics due to the short-cut connections. DFCN-B refers to the architecture obtained by applying the above modifications to DFCN-A.

  • DFCN-C (Figure 3(b)) is the final architecture; it incorporates all the modifications of DFCN-B and, additionally, its initial layer includes parallel CNN branches similar to the Inception module (Szegedy et al. (2015)). Incorporating multiple kernels of varying receptive fields in each of the parallel paths helps in capturing view-point dependent object variability and learning relations between image structures at multiple scales (see Appendix C and 23 for more details).
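The channel bookkeeping behind these modifications can be illustrated with a small counting sketch. It only tracks the number of feature maps, not the actual tensors; the function names and the example values (48 input maps, growth rate 12, 3 layers) are hypothetical and chosen purely for illustration.

```python
def dense_block_channels(in_channels, growth_rate, n_layers, concat_input):
    """Return (channels seen by each layer, channels output by the block)."""
    # Layer i sees the block input plus the k maps from each earlier layer.
    seen = [in_channels + i * growth_rate for i in range(n_layers)]
    # Each layer contributes k new feature maps.
    new_maps = n_layers * growth_rate
    # Down-sampling path: the input is concatenated with the new maps.
    # Up-sampling path: only the new maps are passed on.
    out = new_maps + (in_channels if concat_input else 0)
    return seen, out

def skip_merge_channels(up_channels, skip_channels, mode):
    """Channels after merging an up-sampled tensor with a skip connection."""
    if mode == "concat":
        return up_channels + skip_channels
    if mode == "add":
        # The skip tensor is first projected to up_channels, then added.
        return up_channels
    raise ValueError(f"unknown merge mode: {mode}")

down_seen, down_out = dense_block_channels(48, 12, 3, concat_input=True)
_, up_out = dense_block_channels(48, 12, 3, concat_input=False)
merged_concat = skip_merge_channels(up_out, down_out, "concat")
merged_add = skip_merge_channels(up_out, down_out, "add")
```

With these example numbers, concatenating the skip connection more than triples the number of feature maps entering the next block compared with projecting and adding, which is the memory saving the bullets above describe.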

2.5.2 Loss function

In CNN-based techniques, segmentation of volumetric medical images is achieved by performing voxel-wise classification. The network’s soft-max layer output gives the posterior probabilities for each class. The anatomical organs or lesions of interest in volumetric medical images are sparsely represented in the whole volume, leading to class imbalance that makes the network difficult to train. In order to address this issue, different loss functions such as dice loss (Milletari et al. (2016)) and weighted cross-entropy loss have been introduced.

Figure 4: The figure shows the spatial weight map generated from the ground-truth image. The spatial weight map was used with the voxel-wise cross-entropy loss. The contour pixels for each class were identified using a Canny edge detector with and were followed by morphological dilation. The colors in the spatial weight map indicate the weight distribution based on relative class frequency.

For training the network, a dual loss function which incorporated both cross-entropy and dice loss was proposed. Additionally, two different weighting mechanisms for both the loss functions were introduced. The cross-entropy loss measures voxel-wise error probability between the predicted output class and the target class and accumulates the error across all the voxels. Spatial weight maps generated from the ground-truth images (Fig. 4) were used for weighting the loss computed at each voxel in the cross-entropy loss Eq. (2).

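Let $W = (w_1, w_2, \dots, w_n)$ be the set of learnable weights, where $w_j$ is the weight matrix corresponding to the $j^{th}$ layer of the deep network, and let $p(t_i \mid x_i; W)$ represent the probability prediction for a voxel $x_i$ after the soft-max function in the last output layer. The spatially weighted cross-entropy loss is formulated as:

$$L_{CE}(X; W) = -\sum_{x_i \in X} w^{map}(x_i) \, \log\big(p(t_i \mid x_i; W)\big) \qquad (2)$$

where $X$ represents the training samples, $t_i$ is the target class label corresponding to voxel $x_i \in X$, and $w^{map}(x_i)$ is the weight estimated at each voxel $x_i$.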
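A spatially weighted cross-entropy of this form can be sketched in NumPy as follows, assuming the soft-max probabilities, integer target labels and per-voxel weight map are already available as arrays. The function name, the array layout and the constant `eps` inside the logarithm are assumptions made for the illustration.

```python
import numpy as np

def weighted_cross_entropy(probs, target, weight_map, eps=1e-12):
    """Sum over voxels of -weight * log(probability of the true class).

    probs: (H, W, C) soft-max output, target: (H, W) integer labels,
    weight_map: (H, W) per-voxel weights.
    """
    rows, cols = np.indices(target.shape)
    # Probability the network assigned to the true class at every voxel.
    p_true = probs[rows, cols, target]
    # eps (an assumption) keeps the logarithm finite when p_true is 0.
    return float(-(weight_map * np.log(p_true + eps)).sum())

probs = np.full((2, 2, 2), 0.5)            # maximally uncertain prediction
target = np.zeros((2, 2), dtype=int)
uniform_loss = weighted_cross_entropy(probs, target, np.ones((2, 2)))
doubled_loss = weighted_cross_entropy(probs, target, 2.0 * np.ones((2, 2)))
```

Doubling every entry of the weight map doubles the loss, which shows how larger weights at rare classes or contours make errors at those voxels count more.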

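Let $L$ be the set of all ground-truth classes in the training set. For each ground-truth image, let $N$ be the set of all voxels, $T_l$ be the set of voxels corresponding to each class $l \in L$ and $C_l$ be the set of contour voxels corresponding to each class $l \in L$. The weight $w^{map}(x_i)$ at voxel $x_i$ is then given by:

$$w^{map}(x_i) = \sum_{l \in L} \frac{|N| \, \mathbb{1}_{T_l}(x_i)}{|T_l|} + \sum_{l \in L} \frac{|N| \, \mathbb{1}_{C_l}(x_i)}{|C_l|} \qquad (3)$$

where $|\cdot|$ denotes the cardinality of the set and $\mathbb{1}$ represents the indicator function defined on the subsets of $N$, i.e. $C_l \subset N$, $T_l \subset N$:

$$\mathbb{1}_{A}(x_i) = \begin{cases} 1 & \text{if } x_i \in A \\ 0 & \text{if } x_i \notin A \end{cases} \qquad (4)$$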
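One way to build such a weight map from a label image is sketched below. It assumes that each voxel receives the inverse relative frequency of its class plus, if it lies on a class contour, the inverse relative frequency of that contour. As a simplification, contours are found with a plain 4-neighbour boundary test rather than the Canny edge detector and dilation mentioned in Fig. 4, and foreground voxels on the image border are counted as contour; these choices and all names are assumptions.

```python
import numpy as np

def class_contour(mask):
    """Foreground voxels that have at least one background 4-neighbour."""
    padded = np.pad(mask, 1, mode="constant")
    # A voxel is interior when its up, down, left and right neighbours
    # are all foreground; outside the image counts as background.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return mask & ~interior

def spatial_weight_map(labels, n_classes):
    """Per-voxel weights from class and contour frequencies (2D labels)."""
    n_total = labels.size
    weights = np.zeros(labels.shape, dtype=np.float64)
    for cls in range(n_classes):
        region = labels == cls
        if not region.any():
            continue
        # Rare classes get large weights: |N| / |T_l|.
        weights[region] += n_total / region.sum()
        contour = class_contour(region)
        # Contour voxels get an additional boost: |N| / |C_l|.
        weights[contour] += n_total / contour.sum()
    return weights

labels = np.zeros((5, 5), dtype=int)
labels[1:4, 1:4] = 1                      # a 3 x 3 foreground square
weights = spatial_weight_map(labels, n_classes=2)
```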


To use the dice overlap coefficient as a loss function, an approximate value of the dice score was estimated by replacing the predicted label in Eq. (12) with its posterior probability. Since the dice coefficient needs to be maximized for a better segmentation output, the optimization was done to minimize its complement, i.e. one minus the approximate dice score.

For multi-class segmentation, the dice loss was computed as a weighted mean of the per-class approximate dice terms. The weights were estimated for every mini-batch instance from the training set. The dice loss for the multi-class segmentation problem is given in Eq. (7):


where each class has its own estimated weight and a small value is added to both the numerator and the denominator for numerical stability. Given the set of pixels in the mini-batch and its subsets of pixels corresponding to each class, the weight estimate for the current mini-batch is given by:


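The parameters of the network were optimized to minimize both loss functions in tandem. In addition, an L2 weight-decay penalty was added to the loss function as a regularizer. The total loss function is given in Eq. (9):

$$L = \lambda \, L_{CE} + \gamma \, L_{DICE} + \eta \, \lVert W \rVert_2^2 \qquad (9)$$

where $L_{CE}$ and $L_{DICE}$ are the cross-entropy and dice losses, $W$ denotes the network weights, and $\lambda$, $\gamma$ and $\eta$ are weights to individual losses. The loss decay factor was set to .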
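The dual loss described above can be sketched as follows: a soft dice term per class, a weighted mean over classes, and a weighted sum with the cross-entropy and an L2 penalty. The exact form of the per-class dice term, the class weights passed in, and the default values of `lam`, `gamma` and `eta` are all assumptions for illustration; they are not the paper's settings.

```python
import numpy as np

def soft_dice_per_class(probs, onehot, eps=1e-5):
    """Soft dice for each class; probs and onehot have shape (N, C)."""
    inter = (probs * onehot).sum(axis=0)
    denom = probs.sum(axis=0) + onehot.sum(axis=0)
    # eps is added to numerator and denominator for numerical stability.
    return (2.0 * inter + eps) / (denom + eps)

def weighted_dice_loss(probs, onehot, class_weights, eps=1e-5):
    """One minus the weighted mean of the per-class soft dice scores."""
    dice = soft_dice_per_class(probs, onehot, eps)
    w = np.asarray(class_weights, dtype=np.float64)
    return float(1.0 - (w * dice).sum() / w.sum())

def total_loss(ce, dice, l2_penalty, lam=1.0, gamma=1.0, eta=5e-4):
    """Weighted sum of cross-entropy, dice loss and L2 weight decay."""
    return lam * ce + gamma * dice + eta * l2_penalty

onehot = np.array([[1, 0], [1, 0], [0, 1]], dtype=float)
loss_perfect = weighted_dice_loss(onehot.copy(), onehot, [1.0, 2.0])
loss_wrong = weighted_dice_loss(1.0 - onehot, onehot, [1.0, 2.0])
combined = total_loss(ce=2.0, dice=0.5, l2_penalty=10.0, eta=0.1)
```

In an actual training setup the three terms would be computed on framework tensors so that gradients flow through them; the NumPy version only shows the arithmetic.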

2.6 Post-processing

The results of segmentation were subjected to 3-D connected component analysis followed by slice-wise 2-D connected component analysis and morphological operations such as binary hole-filling inside the ventricular cavity.
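A connected component step of this kind can be sketched in plain NumPy as below. The sketch assumes that the purpose is to keep only the largest face-connected component of a binary mask, which the text does not state explicitly; hole-filling is not shown. The function name and the breadth-first search are illustrative choices.

```python
from collections import deque
import numpy as np

def largest_component(mask):
    """Keep only the largest face-connected component of a binary N-D mask."""
    mask = np.asarray(mask, dtype=bool)
    visited = np.zeros(mask.shape, dtype=bool)
    # Face-connected neighbour offsets (4-connectivity in 2D, 6 in 3D).
    offsets = []
    for axis in range(mask.ndim):
        for step in (-1, 1):
            off = [0] * mask.ndim
            off[axis] = step
            offsets.append(tuple(off))
    best = []
    for start in zip(*np.nonzero(mask)):
        if visited[start]:
            continue
        # Breadth-first search over one connected component.
        visited[start] = True
        queue, component = deque([start]), [start]
        while queue:
            voxel = queue.popleft()
            for off in offsets:
                nb = tuple(v + o for v, o in zip(voxel, off))
                inside = all(0 <= n < s for n, s in zip(nb, mask.shape))
                if inside and mask[nb] and not visited[nb]:
                    visited[nb] = True
                    queue.append(nb)
                    component.append(nb)
        if len(component) > len(best):
            best = component
    out = np.zeros(mask.shape, dtype=bool)
    if best:
        out[tuple(np.array(best).T)] = True
    return out

mask = np.zeros((5, 5), dtype=bool)
mask[0, 0] = mask[0, 1] = mask[1, 0] = True   # main blob (3 voxels)
mask[4, 4] = True                              # isolated false positive
cleaned = largest_component(mask)
empty = largest_component(np.zeros((3, 3), dtype=bool))
```

The same function works on a 3D volume and on individual 2D slices, matching the 3-D followed by slice-wise 2-D order described above.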

2.7 Cardiac disease diagnosis

The goal of the automated cardiac disease diagnosis challenge was to classify the cine MRI scans of the heart into one of five groups, namely: (i) DCM, (ii) HCM, (iii) MINF, (iv) ARV and (v) NOR.

2.7.1 Feature extraction

From the ground-truth segmentations of the training data, several cardiac features pertaining to the Left Ventricle (LV), Right Ventricle (RV) and Myocardium (MYO) were extracted. The cardiac features were grouped into two categories, namely primary features and derived features. Primary features were calculated directly from the segmentations and the DICOM tags for pixel spacing and slice thickness:

  1. Volume of LV at End Diastole (ED) and End Systole (ES) Phases.

  2. Volume of RV at ED and ES Phases

  3. Mass and volume of MYO estimated at ED and ES Phases respectively.

  4. Myocardial wall thickness at each slice.

Derived features were a combination of primary features:

  1. Ejection Fraction (EF) of LV and RV.

  2. Ratio of Primary features: , at ED and ES phases.

  3. Variation profile of Myocardial Wall Thickness (MWT) in Short-Axis (SA) and Long-Axis (LA). Appendix D gives the methodology used to arrive at this set of features.
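Some of the volumetric features listed above can be computed from a binary segmentation mask as sketched below. The ejection fraction follows the standard definition (EDV − ESV) / EDV. The myocardial density of 1.05 g/mL is a commonly used value and an assumption here, as are the function names and the example numbers.

```python
import numpy as np

def volume_ml(mask, pixel_spacing_mm, slice_thickness_mm):
    """Volume of a binary (slices, H, W) mask in millilitres."""
    voxel_mm3 = pixel_spacing_mm[0] * pixel_spacing_mm[1] * slice_thickness_mm
    # 1 mL = 1000 mm^3.
    return float(mask.sum()) * voxel_mm3 / 1000.0

def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction in percent from end-diastolic and end-systolic volumes."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml

def myocardial_mass_g(myo_volume_ml, density_g_per_ml=1.05):
    """Myocardial mass from volume; the density value is an assumption."""
    return myo_volume_ml * density_g_per_ml

mask = np.ones((2, 10, 10), dtype=bool)        # 200 voxels
vol = volume_ml(mask, pixel_spacing_mm=(1.5, 1.5), slice_thickness_mm=10.0)
ef = ejection_fraction(edv_ml=120.0, esv_ml=50.0)
mass = myocardial_mass_g(100.0)
```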

2.7.2 Two-stage ensemble approach for cardiac disease classification

Figure 5: Automated Cardiac Disease Diagnosis using Ensemble of Classifiers and Expert Classifier Approach

The cardiac disease diagnosis was approached as a two-stage classification using an ensemble system. Figure 5 illustrates the methodology adopted for cardiac disease classification. Ensemble classification is a process in which multiple classifiers are created and strategically combined to solve a particular classification problem. Combining multiple classifiers does not always guarantee better performance than the best individual classifier in the ensemble, but the ensemble ensures that the overall risk due to poor model selection is minimized. The accuracy of the classifiers was estimated based on 5-fold cross-validation scores, and only the top-performing classifiers were selected for combining in the ensemble-based system. The first stage of the ensemble comprised four classifiers, namely: (i) Support Vector Machine (SVM) with Radial Basis Function kernel, (ii) Multi-layer Perceptron (MLP) with 2 hidden layers of 100 neurons each, (iii) Gaussian Naive Bayes (GNB) and (iv) Random Forest (RF) with 1000 trees. All the classifiers were independently trained to classify the patient’s cine MR scan into five groups by extracting all the features listed in Table 1. In the first stage of the ensemble, a voting classifier finalized the disease prediction based on majority vote.

Features LV RV MYO
Cardiac volumetric features
volume at ED
volume at ES
mass at ED
ejection fraction
volume ratio:
volume ratio:
volume ratio:
mass to volume ratio:
Myocardial wall thickness variation profile
Table 1: The table lists all the features extracted from the predicted segmentation labels. All the 20 features were used for training the classifiers in the first stage of the ensemble. The expert classifier was trained with only a subset of the listed features indicated by . :- Set of Myocardial Wall Thickness measures (in mm) per short-axis slice.

:- Set of statistic (like mean, standard deviation) for all short axis slices when seen along long-axis at a particular cardiac phase

In some cases, the first stage of the ensemble had difficulty in distinguishing between the MINF and DCM groups. In order to eliminate such misclassifications, a two-class “expert” classifier trained only on the myocardial wall thickness variation profile features at the ES phase was proposed. The expert classifier re-assessed only those cases for which the first stage’s predictions were MINF or DCM. The expert classifier used was an MLP with 2 hidden layers of 100 neurons each.
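The control flow of the two-stage scheme can be sketched as follows. The classifiers are represented by plain callables standing in for trained models; the function names, the tie-breaking rule of the vote and the feature names in the example are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Most common label; ties go to the label that appears first."""
    return Counter(predictions).most_common(1)[0][0]

def two_stage_diagnosis(features, stage1_classifiers, expert_classifier,
                        expert_feature_names):
    """Stage 1: majority vote. Stage 2: expert re-assesses MINF/DCM cases."""
    votes = [clf(features) for clf in stage1_classifiers]
    label = majority_vote(votes)
    if label in ("MINF", "DCM"):
        # The expert only sees its own subset of the features.
        subset = {name: features[name] for name in expert_feature_names}
        label = expert_classifier(subset)
    return label

# Stand-in classifiers (hypothetical): real ones would be trained models.
stage1 = [lambda f: "DCM", lambda f: "DCM", lambda f: "MINF", lambda f: "NOR"]
expert = lambda f: "MINF" if f["mwt_std_es"] > 2.0 else "DCM"
features = {"lv_ef": 30.0, "mwt_std_es": 3.0}
diagnosis = two_stage_diagnosis(features, stage1, expert, ["mwt_std_es"])
untouched = two_stage_diagnosis(features, [lambda f: "HCM"] * 4, expert,
                                ["mwt_std_es"])
```

Routing only MINF and DCM predictions to the expert leaves the other three groups unaffected by the second stage.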

3 Experimental analysis

In this section, we experimentally analyze the effectiveness of our proposed network architecture, loss function and data-augmentation scheme, and the effect of ROI cropping and post-processing. For all the ablation studies we used the ACDC training dataset. The evaluation metrics were Dice score and Hausdorff Distance (HD) in mm (Appendix A). The neural network architectures were designed using TensorFlow (Abadi et al. (2016)). We ran our experiments on a desktop computer with an NVIDIA Titan X GPU, an Intel Core i7-4930K 12-core CPU @ 3.40GHz and 64GB RAM.
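For reference, the two metrics can be sketched with their standard definitions: Dice as twice the overlap divided by the sum of the two mask sizes, and the symmetric Hausdorff distance between two point sets. The definitions used in the paper are given in its Appendix A and may differ in detail; the brute-force distance computation, the convention for two empty masks and the assumption that points are already in mm are illustrative choices.

```python
import numpy as np

def dice_score(pred, truth):
    """Dice overlap between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    # Convention (an assumption): two empty masks count as a perfect match.
    return 2.0 * (pred & truth).sum() / denom if denom else 1.0

def hausdorff_distance(points_a, points_b):
    """Symmetric Hausdorff distance between two point sets (same units)."""
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Largest nearest-neighbour distance in either direction.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

dice = dice_score([[1, 1], [0, 0]], [[1, 0], [0, 0]])
hd = hausdorff_distance([(0, 0), (0, 1)], [(0, 0), (0, 4)])
```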

3.1 ACDC-2017 dataset

The training dataset comprising 100 patient cases ( 2D images) was split into for training, validation and testing. Stratified sampling was done to ensure that each split comprised an equal number of cases from the different cardiac disease groups. Each patient scan had approximately 20 images with ground-truth annotations for the Left Ventricle (LV), Right Ventricle (RV) and Myocardium (MYO) at the End Diastole (ED) and End Systole (ES) phases.
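A stratified split by disease group can be sketched as below. The split fractions used in the example (70/15/15) are purely hypothetical placeholders, since the actual ratio is not recoverable from the text above; the function name and the fixed random seed are also assumptions.

```python
import random
from collections import defaultdict

def stratified_split(case_labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split case ids into train/val/test with equal group proportions."""
    by_group = defaultdict(list)
    for case_id, group in sorted(case_labels.items()):
        by_group[group].append(case_id)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for group in sorted(by_group):
        cases = by_group[group]
        rng.shuffle(cases)
        # Apply the same fractions inside every disease group.
        n_train = round(fractions[0] * len(cases))
        n_val = round(fractions[1] * len(cases))
        train += cases[:n_train]
        val += cases[n_train:n_train + n_val]
        test += cases[n_train + n_val:]
    return train, val, test

groups = ["NOR", "MINF", "DCM", "HCM", "ARV"]
case_labels = {i: groups[i % 5] for i in range(100)}   # 20 cases per group
train, val, test = stratified_split(case_labels)
```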

3.1.1 Segmentation network architecture and setting hyper-parameters of the network

Figure 6: The proposed network architecture (DFCN-C) comprises a contracting path (down-sampling path) and an expanding path (up-sampling path). The arrows indicate the network’s information flow pathways. The dotted horizontal arrows represent residual skip connections, where the feature maps from the down-sampling path are added element-wise to the corresponding feature maps in the up-sampling path. In order to match the channel dimensions, a linear projection is done using BN-ELU-1×1-convolution-Dropout in the residual connection path. In the down-sampling path, the input to a dense block is concatenated with its output, leading to a linear growth in the number of feature maps, whereas in the bottleneck and up-sampling path the features are added element-wise to enable learning a residual function.

Figure 6 describes the proposed network architecture used for segmentation. Based on experimental results, the following network hyper-parameters were fixed: (i) the number of max-pooling operations was limited to three (P = 3), (ii) the growth rate of the Dense-Blocks (DBs) was set to k = 12, (iii) the number of initial feature maps generated by the first convolution layers was kept to at most 3 times the growth rate (F = 36).

3.1.2 Training

The network was trained by minimizing the proposed loss function (Eq. 9) using the ADAM optimizer (Kingma and Ba (2014)) with a learning rate set to . The network weights were initialized using the He normal initializer (He et al. (2015)) and trained for 200 epochs with the data augmentation scheme described in section 3.1.6. The training batch comprised ROI-cropped 2D MR images of dimension . After every epoch the model was evaluated on the validation set, and the model with the highest Dice score for the MYO class on the validation set was selected for evaluation on the test set.

3.1.3 Evaluating the effect of growth rate

Table 2 shows the DFCN-C performance with varying growth-rate (k) parameter. For the same architecture, the mean Dice score rose from 0.80 at k = 2 to 0.92 at k = 16, with most of the gain reached by k = 10. The network also remained usable with an extremely small number of trainable parameters (11,452 at k = 2).

Growth Rate
DICE k=2 k=4 k=6 k=8 k=10 k=12 k=14 k=16
LV 0.86 (0.08) 0.90 (0.06) 0.88 (0.09) 0.93 (0.05) 0.93 (0.04) 0.93 (0.05) 0.93 (0.05) 0.93 (0.05)
RV 0.76 (0.13) 0.82 (0.14) 0.85 (0.17) 0.87 (0.13) 0.91 (0.04) 0.91 (0.05) 0.91 (0.05) 0.92 (0.05)
MYO 0.78 (0.06) 0.80 (0.08) 0.82 (0.09) 0.88 (0.04) 0.88 (0.03) 0.89 (0.03) 0.90 (0.02) 0.90 (0.03)
Mean 0.80 (0.09) 0.84 (0.09) 0.85 (0.12) 0.89 (0.07) 0.91 (0.04) 0.91 (0.04) 0.92 (0.04) 0.92 (0.04)
HD k=2 k=4 k=6 k=8 k=10 k=12 k=14 k=16
LV 16.89 (10.51) 13.04 (9.51) 17.64 (11.33) 7.26 (8.97) 4.96 (5.15) 5.46 (6.39) 3.95 (4.21) 4.16 (4.26)
RV 17.96 (10.86) 9.41 (4.74) 7.49 (4.20) 6.01 (2.65) 5.60 (2.58) 5.65 (2.38) 5.49 (2.82) 4.79 (2.05)
MYO 14.27 (8.47) 14.72 (9.57) 16.71 (8.37) 5.58 (4.08) 7.72 (7.80) 5.18 (4.44) 4.32 (3.20) 4.25 (4.12)
Mean 16.37 (9.95) 12.39 (7.94) 13.95 (7.97) 6.29 (5.24) 6.09 (5.18) 5.43 (4.40) 4.59 (3.41) 4.40 (3.48)
Parameters 11,452 43,036 94,756 166,612 258,604 370,732 502,996 655,396
Table 2: Evaluation of segmentation results for various growth rates. The values are provided as mean (standard deviation).

3.1.4 Evaluating the effect of different loss functions

Table 3 compares the segmentation performance of six different loss functions. Because of ROI cropping, the heavy class imbalance was already mitigated, and hence the standard cross-entropy loss showed optimal performance in terms of both Dice score and Hausdorff measures. In terms of the Hausdorff distance metric alone, the spatially weighted cross-entropy loss showed the best performance, suggesting that heavy weight penalization for contour voxels aided in learning the contours precisely. The performance of the standard dice loss was better than that of the mini-batch weighted dice loss, indicating that weighting was not necessary when using only dice loss. The simple combination of dice and cross-entropy losses showed a slight dip in performance. It was observed that the cross-entropy loss optimized for pixel-level accuracy, whereas the dice loss helped in improving the segmentation quality/metrics. Weighted dice loss alone caused over-segmentation at the boundaries, whereas the weighted cross-entropy loss alone led to very sharp contours with minor under-segmentation. So, in order to balance these trade-offs and combine the advantages of both losses, we empirically observed that setting and in the proposed loss (Eq. (9)) gave optimal performance in terms of faster convergence and better segmentation quality.
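
The idea of combining the two terms can be illustrated with a minimal sketch for a single binary class. It is not the loss of Eq. (9): the spatial weight map and the mini-batch weighting are omitted, and `lam_ce` and `lam_dice` are placeholder weights, not the values used in the paper.

```python
import math


def cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over pixels."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def soft_dice_loss(probs, targets, eps=1e-7):
    """One minus the soft Dice overlap between probabilities and labels."""
    inter = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def dual_loss(probs, targets, lam_ce=1.0, lam_dice=1.0):
    """Weighted sum of both terms; the weights are placeholders."""
    return (lam_ce * cross_entropy(probs, targets)
            + lam_dice * soft_dice_loss(probs, targets))


targets = [1, 0, 1, 0]
good = dual_loss([0.9, 0.1, 0.8, 0.2], targets)
bad = dual_loss([0.1, 0.9, 0.2, 0.8], targets)
print(good < bad)
```

The cross-entropy term penalizes each pixel independently, while the Dice term depends on the overlap of the whole mask, which is why the two respond differently to boundary errors.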

Dice score
CE 0.96 (0.02) 0.91 (0.07) 0.94 (0.02) 0.88 (0.06) 0.87 (0.03) 0.88 (0.03) 0.91 (0.04)
sW-CE 0.96 (0.02) 0.91 (0.06) 0.92 (0.09) 0.85 (0.17) 0.89 (0.03) 0.89 (0.03) 0.90 (0.07)
D 0.96 (0.02) 0.91 (0.09) 0.95 (0.02) 0.88 (0.07) 0.89 (0.03) 0.90 (0.03) 0.91 (0.04)
mW-D 0.96 (0.02) 0.90 (0.08) 0.95 (0.01) 0.87 (0.06) 0.87 (0.02) 0.89 (0.03) 0.90 (0.04)
CE+D 0.96 (0.02) 0.90 (0.08) 0.93 (0.04) 0.86 (0.11) 0.88 (0.03) 0.88 (0.04) 0.90 (0.05)
sW-CE + mW-D 0.96 (0.02) 0.90 (0.08) 0.95 (0.02) 0.87 (0.08) 0.89 (0.03) 0.89 (0.03) 0.91 (0.04)
Hausdorff distance (mm)
CE 2.98 (2.93) 4.73 (3.78) 4.81 (1.94) 6.73 (3.51) 3.57 (2.55) 7.90 (7.12) 5.12 (3.64)
sW-CE 3.82 (4.01) 4.04 (2.51) 5.09 (2.62) 6.19 (3.93) 4.46 (2.82) 4.85 (2.83) 4.74 (3.12)
D 4.70 (6.58) 7.82 (9.72) 4.82 (1.79) 6.41 (3.26) 4.59 (4.93) 6.16 (5.44) 5.75 (5.29)
mW-D 4.49 (8.35) 7.45 (9.01) 10.14 (8.10) 9.62 (6.65) 6.47 (7.38) 9.54 (10.37) 7.95 (8.31)
CE+D 7.09 (10.24) 9.47 (10.86) 4.85 (2.08) 6.93 (2.88) 7.67 (8.90) 11.56 (9.27) 7.93 (7.37)
sW-CE + mW-D 4.42 (6.39) 6.51 (6.40) 4.20 (1.81) 7.10 (2.95) 4.48 (4.52) 5.87 (4.35) 5.43 (4.40)
Table 3: Evaluation of segmentation results for different loss functions. Note:- CE: cross-entropy loss, D: dice loss, sW-CE : cross-entropy loss with weighting scheme based on spatial weight map, mW-D: dice loss with weighting scheme based on mini-batch. The values are provided as mean (standard deviation).

3.1.5 Evaluating the effect of ROI cropping and post-processing

For evaluating the effect of not using ROI-cropped images, the proposed network was trained on the input images resized to (zero-padding or center-cropping was done to ensure this image dimension). Table 4 compares the effect of using ROI cropping and post-processing on segmentation results. Even though the non-ROI based technique resulted in a better Dice score, it had a higher Hausdorff distance; this was mainly because of false positives at basal and apical slices. As shown in Fig. 11, post-processing steps aided in removing false positives and outliers. Table 4 indicates that the performance of ROI vs. non-ROI was comparable after the post-processing steps. However, in the interest of reducing the GPU memory footprint and the time required for training and inference, the ROI based method was proposed.

Dice score
0.96 (0.02) 0.90 (0.08) 0.94 (0.04) 0.84 (0.13) 0.88 (0.03) 0.88 (0.03) 0.90 (0.05)
0.96 (0.02) 0.90 (0.08) 0.95 (0.02) 0.87 (0.08) 0.89 (0.03) 0.89 (0.03) 0.91 (0.04)
0.96 (0.02) 0.91 (0.07) 0.93 (0.03) 0.84 (0.10) 0.89 (0.02) 0.90 (0.03) 0.91 (0.04)
0.96 (0.02) 0.91 (0.07) 0.94 (0.03) 0.86 (0.08) 0.90 (0.02) 0.90 (0.02) 0.91 (0.04)
Hausdorff distance (mm)
8.04 (11.26) 15.28 (13.45) 11.80 (19.17) 13.34 (11.92) 11.31 (12.84) 17.52 (14.10) 12.88 (13.79)
4.42 (6.39) 6.51 (6.40) 4.20 (1.81) 7.10 (2.95) 4.48 (4.52) 5.87 (4.35) 5.43 (4.40)
2.97 (3.15) 9.69 (16.88) 26.90 (42.44) 21.49 (35.86) 22.11 (34.04) 34.55 (44.34) 19.62 (29.42)
2.97 (3.15) 3.92 (2.71) 4.67 (1.96) 7.05 (4.29) 3.72 (2.58) 6.09 (6.22) 4.74 (3.48)
Table 4: Evaluation results with and without ROI cropping and post-processing. The values are provided as mean (standard deviation). ROI - Region of Interest, PP - Post-processing.
(a) Before LCCA
(b) After LCCA
(c) Before MBHF
(d) After MBHF
Figure 11: The sequence of post-processing steps applied to eliminate false positives and outliers in the predictions. Largest connected component analysis (LCCA) retained only the largest connected structure and discarded the rest, as seen in (a) and (b). The morphological binary hole filling (MBHF) operation eliminated outliers, as seen in (c) and (d).
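
The two post-processing steps can be sketched on a single binary mask in plain Python. The sketch assumes 4-connectivity and one foreground class; the text does not state which connectivity or which implementation was used.

```python
from collections import deque

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity (an assumption)


def flood(mask, start, value, seen):
    """Collect the connected region of `value` pixels that contains `start`."""
    h, w = len(mask), len(mask[0])
    queue, region = deque([start]), [start]
    seen.add(start)
    while queue:
        r, c = queue.popleft()
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w and (nr, nc) not in seen
                    and mask[nr][nc] == value):
                seen.add((nr, nc))
                region.append((nr, nc))
                queue.append((nr, nc))
    return region


def regions(mask, value):
    """Return all connected regions whose pixels equal `value`."""
    seen, found = set(), []
    for r in range(len(mask)):
        for c in range(len(mask[0])):
            if mask[r][c] == value and (r, c) not in seen:
                found.append(flood(mask, (r, c), value, seen))
    return found


def keep_largest_component(mask):
    """LCCA: keep only the largest foreground component."""
    out = [[0] * len(mask[0]) for _ in mask]
    comps = regions(mask, 1)
    if comps:
        for r, c in max(comps, key=len):
            out[r][c] = 1
    return out


def fill_holes(mask):
    """MBHF: fill background regions that do not touch the image border."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for region in regions(mask, 0):
        if not any(r in (0, h - 1) or c in (0, w - 1) for r, c in region):
            for r, c in region:
                out[r][c] = 1
    return out


mask = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
cleaned = fill_holes(keep_largest_component(mask))
for row in cleaned:
    print(row)
```

In this toy mask the isolated pixel in the top-left corner is discarded and the hole inside the ring is filled. Library routines such as `scipy.ndimage.label` and `scipy.ndimage.binary_fill_holes` provide the same operations for real volumes.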

3.1.6 Evaluating the effect of data augmentation scheme

Data augmentation is done to artificially increase the training set and to prevent the network from over-fitting on the training set. To analyze the effect of data augmentation on the learning process of the proposed network, two separate models were trained, with and without data augmentation. For the model which incorporated the data-augmentation scheme, the training batch comprised a mixture of the original dataset and randomly generated on-the-fly augmented data, which included: (i) rotation: random angle between and degrees, (ii) translation x-axis: random shift between and mm, (iii) translation y-axis: random shift between and mm, (iv) rescaling: random zoom factor between and , (v) adding Gaussian noise with zero mean and 0.01 standard deviation and (vi) elastic deformations using a dense deformation field obtained through a grid of control-points and B-spline interpolation. Figure 12 summarizes the learning curves. For both models the validation loss decreased consistently as the training loss went down, indicating little over-fitting on the training data. Closer observation of the validation curves revealed that the model trained with data augmentation showed a minor improvement in segmentation performance on the validation set, which was also corroborated on the held-out test set.

Figure 12: Comparison of the learning curves of the DFCN-C with and without data augmentation. (a) Loss curves, (b) Dice score curves for MYO class.

3.1.7 Comparing performance and learning curves with other baseline models

For analyzing and benchmarking the proposed network DFCN-C (F=36, P=3, k=12), we constructed 3 baseline models, namely: (i) Modified U-NET (Ronneberger et al. (2015)) starting with 32 initial feature maps and 3 max-pooling layers (F=32, P=3), (ii) DFCN-A (Jégou et al. (2017)) (F=32, P=3, k=12), (iii) DFCN-B (F=32, P=3, k=12). The architecture of DFCN-B was similar to DFCN-C, the only exception being the initial layer, which had a single CNN branch that learnt 32 filters. All the models were trained in the same manner.

Figure 13 summarizes and compares the learning process of DFCN-C with the other three baseline models. In U-NET, the training loss decreased while the validation loss increased as the training progressed. Such patterns indicate that the network tends to over-fit on smaller datasets. Moreover, the number of parameters and the GPU memory usage were highest for U-NET. For all the DFCN based architectures, the validation loss consistently decreased as the training loss decreased, showing the least tendency to over-fit. Amongst the DFCN variants, the validation curves of DFCN-C showed faster convergence and better segmentation scores than those of DFCN-B, corroborating the effectiveness of the proposed multi-scale feature extraction and feature fusion. When compared to DFCN-A, the number of parameters and the GPU memory usage were lower for both DFCN-B and DFCN-C. Table 5 compares the results of the proposed DFCN-C architecture against the baseline models.

Figure 13: Comparison of the learning curves of the DFCN-A, DFCN-B, DFCN-C (proposed) and U-NET. (a) Training loss curves, (b) Dice score curves for MYO class during training, (c) Validation loss curves, (d) Dice score curves for MYO class during validation.
Dice score (last column: number of parameters)
U-NET 0.94 (0.02) 0.90 (0.06) 0.90 (0.04) 0.82 (0.07) 0.85 (0.05) 0.86 (0.04) 0.88 (0.05) 1,551k
DFCN-A 0.96 (0.02) 0.91 (0.08) 0.94 (0.02) 0.88 (0.08) 0.89 (0.03) 0.90 (0.03) 0.91 (0.04) 435k
DFCN-B 0.96 (0.02) 0.90 (0.08) 0.92 (0.05) 0.84 (0.17) 0.88 (0.04) 0.88 (0.05) 0.90 (0.07) 360k
DFCN-C 0.96 (0.02) 0.90 (0.08) 0.95 (0.02) 0.87 (0.08) 0.89 (0.03) 0.89 (0.03) 0.91 (0.04) 371k
Hausdorff distance in mm (last column: GPU memory usage)
U-NET 10.99 (12.29) 11.90 (10.56) 12.56 (8.39) 12.67 (9.42) 9.64 (6.85) 12.22 (8.08) 11.66 (9.26) 3GB
DFCN-A 6.85 (10.45) 4.00 (2.73) 4.59 (2.09) 5.24 (1.67) 8.16 (8.95) 4.65 (3.17) 5.58 (4.84) 2GB
DFCN-B 8.28 (10.24) 6.68 (7.37) 5.19 (2.12) 7.88 (4.91) 8.15 (7.51) 9.18 (7.19) 7.56 (6.56) 1GB
DFCN-C 4.42 (6.39) 6.51 (6.40) 4.20 (1.81) 7.10 (2.95) 4.48 (4.52) 5.87 (4.35) 5.43 (4.40) 1GB
Table 5: Evaluation results for baseline models and DFCN-C. The values are provided as mean (standard deviation). The GPU memory usage shown in the table is for input images of dimension and a batch size of .

3.1.8 Visualization of initial layer kernels, feature maps and posterior probabilities

Figure 14 visualizes intermediate feature maps of the trained DFCN-C network on ACDC-2017 dataset. Visualization of initial layer’s learned kernels were not intuitively interpretable, but their corresponding feature maps showed distinct pattern for different kernel sizes. In the feature maps the edges appeared sharp indicating attention on smaller regions and one of its kernels learnt the identity transformation. Whereas, feature maps of larger kernels had blurred out edges indicating attention over larger region. The posterior probability maps of all the four classes had sharp contours and were not ambiguous.

Figure 14: The figure illustrates the feature maps of the trained model. (a) Input image fed to the network, (b) posterior probability maps after soft-max output, (c) the final prediction of labels, (d) - (f) visualization of the initial layer’s kernels- , and , (g) - (i) filter responses to the input image (a).

3.1.9 Classifier selection and feature importance study for cardiac disease diagnosis

(a) 5-Class Feature Importance
(b) 2-Class Feature Importance
Figure 17: Feature importance ranking by Random Forest classifier for two different classification tasks. The green bars are the feature importances of the forest, along with their inter-trees variability (standard deviation). The features highlighted in red color indicate hand-crafted myocardial wall thickness features at ES phase. (a) shows the feature importance for 5-class task, it can be seen that highlighted features have been given low importance, (b) shows the feature importance for the 2-class task, clearly it can be seen that the highlighted features have been ranked higher.

In order to validate the hypothesis that myocardial features alone were sufficient for distinguishing between MINF and DCM, a feature importance study was done using a Random Forest classifier trained to classify between MINF and DCM cases using all the features listed in Table 1. Fig. 17 (a) and (b) compare the feature importance ranking by Random Forest when trained to classify all five groups vs. only DCM and MINF, respectively. Accordingly, the expert classifier was trained on myocardial features at the End Systole phase only. Table 6 lists the classifiers evaluated for the first-stage and the expert-stage classification tasks. Only classifiers with accuracy scores above were selected for the ensemble.

Classifier 5-class 2-class
LR 0.94 (0.06) 0.82 (0.06)
RF 0.96 (0.02) 0.85 (0.09)
GNB 0.96 (0.04) 0.82 (0.06)
XGB 0.93 (0.04) 0.88 (0.11)
SVM 0.95 (0.04) 0.85 (0.09)
MLP 0.97 (0.02) 0.97 (0.05)
K-NN 0.91 (0.04) 0.85 (0.09)
Vote 0.97 (0.04) 0.93 (0.06)
Table 6: The classifiers evaluated and their corresponding five-fold cross-validation accuracy scores. The classifiers were evaluated for both the first stage of the ensemble (5-class) and expert discrimination (2-class). The values are provided as mean (standard deviation). LR- Logistic Regression, RF- Random Forest with 1000 trees, GNB- Gaussian Naive Bayes, XGB- Extreme Gradient Boosting with 1000 trees, SVM- Support Vector Machine with Radial Basis kernel, MLP- Multi-layer perceptron with 2 hidden layers of 100 neurons each, K-NN- 5-Nearest Neighbors, Vote- Voting classifier based on SVM, MLP, NB & RF.

3.2 LV-2011 dataset

The training dataset comprised 100 patients ( 2D images) and was randomly split into for training, validation and testing.

3.2.1 Network architecture

The network architecture used for training on the LV-2011 dataset was as per section 3.1.1. The network in Fig. 6 was modified to reflect the number of classes (n=2) in the final classification layer.

3.2.2 Training

The model was trained for 50 epochs as described in section 3.1.2. Additionally the data-augmentation scheme included random vertical and horizontal flips.

4 Results

4.1 Performance evaluation on ACDC challenge

In order to gauge performance on the test set held out by the challenge organizers, clinical metrics such as the average ejection fraction (EF) error, the average left ventricle (LV) and right ventricle (RV) systolic and diastolic volume errors, and the average myocardium (MYO) mass error were used. For the geometrical metrics, the Dice score and the Hausdorff distance for all 3 regions at the ED and ES phases were evaluated. For the cardiac disease diagnosis, the metric used was accuracy.
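
As an illustration of how a clinical metric follows from a segmentation, the sketch below converts voxel counts to volumes and computes the ejection fraction with the standard definition EF = (EDV - ESV) / EDV. The voxel counts and the voxel spacing are hypothetical values chosen for the example.

```python
def volume_ml(num_voxels, spacing_mm):
    """Volume in mL from a voxel count and (x, y, z) voxel spacing in mm."""
    sx, sy, sz = spacing_mm
    return num_voxels * sx * sy * sz / 1000.0  # mm^3 -> mL


def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction in percent from end-diastolic and end-systolic volumes."""
    if edv_ml <= 0:
        raise ValueError("EDV must be positive")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


spacing = (1.25, 1.25, 10.0)    # hypothetical in-plane spacing and slice thickness
edv = volume_ml(8000, spacing)  # hypothetical LV voxel count at ED
esv = volume_ml(3200, spacing)  # hypothetical LV voxel count at ES
ef = ejection_fraction(edv, esv)
print(edv, esv, ef)
```

Because the volumes are sums over all slices, errors in a few small apical slices change the volume, and therefore the EF, only slightly.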

4.1.1 Segmentation results

EF cor.
EF std.
Vol. ED
Vol. ED
Vol. ED
Left ventricle
1 Isensee et al. (2017) 0.968 0.931 7.384 6.905 0.991 0.178 3.058 0.997 2.668 5.726
2 Ours 0.964 0.917 8.129 8.968 0.989 -0.548 3.422 0.997 0.576 5.501
3 Yeonggul Jang 0.959 0.921 7.737 7.116 0.989 -0.330 3.281 0.993 -0.440 8.701
4 Baumgartner et al. (2017) 0.963 0.911 6.526 9.170 0.988 0.568 3.398 0.995 1.436 7.610
Right ventricle
1 Isensee et al. (2017) 0.946 0.899 10.123 12.146 0.901 -2.724 6.203 0.988 4.404 10.823
2 Zotti et al. (2017) 0.941 0.882 10.318 14.053 0.872 -2.228 6.847 0.991 -3.722 9.255
3 Ours 0.935 0.879 13.994 13.930 0.858 -2.246 6.953 0.982 -2.896 12.650
4 Baumgartner et al. (2017) 0.932 0.883 12.670 14.691 0.851 1.218 7.314 0.977 -2.290 15.153
Myocardium
1 Isensee et al. (2017) 0.902 0.919 8.720 8.672 0.985 -3.842 9.153 0.989 -4.834 7.576
2 Ours 0.889 0.898 9.841 12.582 0.979 -2.572 11.037 0.990 -2.873 7.463
3 Baumgartner et al. (2017) 0.892 0.901 8.703 10.637 0.983 -9.602 9.932 0.982 -6.861 9.818
4 Patravali et al. (2017) 0.882 0.897 9.757 11.256 0.986 -4.464 9.067 0.989 -11.586 8.093
Table 7: Comparisons with different approaches on cardiac segmentation. The evaluations of Left ventricle, Right Ventricle and Myocardium are listed in top, middle and bottom, respectively. Our proposed method had an overall ranking of 2. The values provided are mean (standard deviation). DC- Dice score, HD- Hausdorff distance, cor- correlation
Figure 18: Segmentation results at both ED& ES phases of cardiac cycles on a subset of ACDC training set reserved for testing. The columns from left to right indicate: the input images, segmentations generated by the model and their associated ground-truths. The rows from top to bottom indicate: short axis slices of the heart at basal, mid and apex. In all figures the colors red, green and blue indicate RV, MYO and LV respectively. The model did erroneous segmentation for RV in the basal slice (e) and the Myocardium was over-segmented in the apical slice (n).

Table 7 summarizes our segmentation results along with the other top three performing methods. Our method was ranked 2nd for LV and MYO and 3rd for RV. Our method relied on automated localization of the LV center and cropping a patch of size around it. So, in cases of abnormally large RV, the model under-segmented the RV region when its area extended beyond the patch size. This, together with the irregular shape of the RV when compared to the LV, led to a dip in Dice score and a higher Hausdorff distance for RV, as seen in the MICCAI 2017 leader-board (Table 7). Figure 18 shows the segmentation results produced by our method at the ED and ES phases of the cardiac cycle on the held-out portion of the training dataset reserved for testing. Our network produced accurate results on most of the slices, but sometimes gave erroneous segmentations at basal and apical slices. All the top performing methods were based on CNNs, and the top performing method (Isensee et al. (2017)) employed an ensembling and label-fusion strategy to combine the results of multiple models based on 2D and 3D U-NET inspired architectures.

4.1.2 Automated disease diagnosis results

Table 8 summarizes our automated cardiac disease diagnosis results along with other methods. Our 2-stage disease classification approach along with hand-crafted features for myocardial wall thickness helped in surpassing other methods. Other methods mostly relied on cardiac volumetric features and single stage classification model for automated cardiac diagnosis.

Rank Method Accuracy
1 Ours 1
2 Isensee et al. (2017) 0.92
3 Wolterink et al. (2017) 0.86
4 Lisa Koch 0.78
Table 8: Comparisons with different approaches on automated cardiac disease diagnosis. Our accuracy score reported in the table is from post-MICCAI leader board.

4.2 Performance evaluation on LV-2011 Challenge

The challenge organizers evaluated our segmentation results on those images of the final validation set for which reference consensus contours CS* (Suinesiaputra et al. (2014)) were available. The organizers categorized individual images of the final validation set into apex, mid and basal slices. The reported metrics were Jaccard index, Dice score, Sensitivity, Specificity, Accuracy, Positive Predictive Value and Negative Predictive Value (see Appendix A).

4.2.1 Segmentation results

Method FA Jaccard Sensitivity Specificity PPV NPV
AU 0.84 (0.17) 0.89 (0.13) 0.96 (0.06) 0.91 (0.13) 0.95 (0.06)
CNR 0.77 (0.11) 0.88 (0.09) 0.95 (0.04) 0.86 (0.11) 0.96 (0.02)
FCN 0.74 (0.13) 0.83 (0.12) 0.96 (0.03) 0.86 (0.10) 0.95 (0.03)
Ours 0.74 (0.15) 0.84 (0.16) 0.96 (0.03) 0.87 (0.10) 0.95 (0.03)
AO 0.74 (0.16) 0.88 (0.15) 0.91 (0.06) 0.82 (0.12) 0.94 (0.06)
SCR 0.69 (0.23) 0.74 (0.23) 0.96 (0.05) 0.87 (0.16) 0.89 (0.09)
DS 0.64 (0.18) 0.80 (0.17) 0.86 (0.08) 0.74 (0.15) 0.90 (0.08)
INR 0.43 (0.10) 0.89 (0.17) 0.56 (0.15) 0.50 (0.10) 0.93 (0.09)
Table 9: Comparison of segmentation performance with other published approaches on LV2011 validation set using the consensus contours. AU (Li et al. (2010)), AO (Fahmy et al. (2011)), SCR (Jolly et al. (2011)), DS, and INR (Margeta et al. (2011)) values were taken from Table 2 of (Suinesiaputra et al. (2014) ). FCN values are taken from Table 3 of ( Tran (2016) ) and CNR regression was taken from Table 3 of Tan et al. (2017). Values are provided as mean (standard deviation) and in descending order by Jaccard index. FA- Fully Automated
Figure 19: From Right to Left: Input, Prediction and Ground Truth for basal, mid and apical slices. The segmentation results are for a subset of LV-2011 training set reserved for testing. The indicates the ROI center detected by Fourier-Circular Hough Transform approach. The red circle represents the best fitting circle across all slices (Myocardium contours of the mid-slices are more circular in shape and hence the best fitting Hough circle radii mostly encompass them). The blue bounding box indicates the patch cropped around ROI center for feeding the segmentation network. Some of the basal slices of the training data had ground truths with partial Myocardium. Even though we trained our model with such slices, our model showed the ability to generalize well.

Table 9 compares the results between our proposed approach and other published results using LV-2011 dataset. Our Jaccard index (mean standard deviation) was , , for the apex, mid and base slices respectively. The errors were mostly concentrated in apical slices. For most of the evaluation measures including Jaccard index, our approach was on par with other fully automated published methods. The AU method used manual guide-point modeling and required human expert approval for all slices and frames. The CNR (Tan et al. (2017)) method used manual input for identifying basal and apical slices and used convolutional regression approach to trace Endocardium and Epicardium contours from the estimated LV center and their network had about million parameters. When compared to FCN (Tran (2016)) our network performance was on par with it. But, in contrast to our network architecture and training regime, FCN had about million parameters and used of the dataset for training. Figure 19 illustrates Myocardium prediction at three different slice levels on a subset of training set reserved for testing.

4.3 Model generalization across different data distributions and effect of sparse annotations

Table 10 compares the Myocardium segmentation performance on the LV-2011 final validation set between models trained on the LV-2011 and ACDC datasets. The ACDC-2017 model was trained to segment LV, RV and MYO at only the ED and ES frames, using about one-fifteenth of the training images used for the LV-2011 model (1,352 vs. 20,360). Despite this, it was able to generalize segmentation to all cardiac frames. The results corroborate the model’s effectiveness with sparse annotations and its generalization across datasets arising from completely different data distributions.

Dataset Training images Jaccard Dice Accuracy Sensitivity Specificity PPV NPV
LV-2011 20,360 0.74 (0.15) 0.84 (0.14) 0.93 (0.04) 0.84 (0.16) 0.96 (0.03) 0.87 (0.10) 0.95 (0.03)
ACDC-2017 1,352 0.71 (0.13) 0.82 (0.11) 0.92 (0.04) 0.81 (0.15) 0.91 (0.06) 0.82 (0.12) 0.94 (0.06)
Table 10: Comparison of segmentation performance across different data distributions and effect of sparse annotation. The table compares the myocardium segmentation performance of model trained on only ACDC dataset against model trained on LV-2011 dataset. Both the models were evaluated on LV-2011 final validation set. The values indicate mean (standard deviation)

5 Discussion and conclusion

In this paper, we demonstrated the utility and efficacy of fully convolutional neural network architecture based on Residual DenseNets with Inception architecture in the initial layer for cardiac MR segmentation. Rather than using ensembles or cascade of networks for segmentation, we showed that a single efficient network trained end-to-end has the potential to learn to segment the Left & Right Ventricles and Myocardium for all the short-axis slices in all cardiac phases. Our approach achieved near state-of-the-art segmentation results on two benchmark cardiac MR datasets which exhibited variability seen in cardiac anatomical & functional characteristics across multiple clinical institutions, scanners, populations and heart pathology. Comprehensive evaluations based on multiple metrics revealed that our entire pipeline starting from classical computer vision based techniques for ROI cropping to CNN based segmentation was robust and hence yielded higher segmentation performance. For a volume of input dimension it took about 3 seconds for ROI detection and 7 seconds for network inference.

When compared to most of the other published CNN based approaches (Lieman-Sifry et al. (2017), Tran (2016), Tan et al. (2017)) for cardiac MR short-axis segmentation our model required least number of trainable parameters ( million, an order of 10 fold reduction when compared to standard U-NET based architectures). Our ablation studies indicated that FCN / U-NET based architectures showed higher tendencies to over-fit on small datasets and in absence of proper training strategy like data-augmentation the networks failed to generalize well. Our network’s connectivity pattern ensured optimal performance with least model capacity and showed better generalization on small datasets even without employing data-augmentation. By incorporating residual type long skip and short-cut connections in the up-sampling path we overcame the memory explosion seen in FCN based on DenseNets. For multi-scale processing of images, we introduced the idea of incorporating inception style parallelism in the segmentation network. Though we limited the inception structure to only first layer, this could be extended to deeper layers in a computationally efficient manner. The proposed weighting scheme in the dual objective function combined the benefits of both vanilla cross-entropy loss and dice loss.

The main limitation of our model lies in its inability to segment cardiac structures in extremely difficult slices of the heart, such as the apical and basal regions. Segmentation errors on apical slices have very minor impact on the overall volume computation, and hence precise segmentation of all the slices was not always necessary for cardiac disease diagnosis. Our 2D-DFCN model could be extended to 3-D for volumetric segmentation tasks. Lastly, we present a fully automated framework which does both cardiac segmentation and precise disease diagnosis, and which has potential for usage in clinical applications.

Appendix A Evaluation Metrics

A brief overview of the main metrics reported in the literature used for comparative purposes is listed in this section. Let P and G be the set of voxels enclosed by the predicted and ground truth contours delineating the object class in a medical volume respectively. The following evaluation metrics were used to assess the quality of automated segmentation methods using the ground truth as reference:

  1. Dice overlap coefficient is a metric used for assessing the quality of segmentation maps. It measures how similar the predicted label maps are to the ground truth. The Dice score varies from zero to one (in case of perfect overlap).

    \[ DICE = \frac{2\,|P \cap G|}{|P| + |G|} = \frac{2\,TP}{2\,TP + FP + FN} \]

    where $|\cdot|$ denotes the cardinality of the set, $TP$ & $FP$ indicate True and False Positives, $TN$ & $FN$ :- True and False Negatives.

    The dice coefficient between two binary volumes can be written as:

    \[ DICE = \frac{2 \sum_{i}^{N} p_i g_i}{\sum_{i}^{N} p_i^2 + \sum_{i}^{N} g_i^2} \]

    where the sums run over the $N$ voxels, of the predicted binary segmentation volume $p_i \in P$ and the ground truth binary volume $g_i \in G$.

  2. Jaccard index (also known as Intersection over Union) is an overlap measure used for comparing the similarity and diversity between two sets. The Jaccard index varies from zero to one (representing perfect correspondence).

    \[ Jaccard = \frac{|P \cap G|}{|P \cup G|} = \frac{TP}{TP + FP + FN} \]

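  3. Sensitivity (TPR), Specificity (SPC), Positive Predictive Value (PPV) and Negative Predictive Value (NPV) are defined as:

    \[ TPR = \frac{TP}{TP + FN}, \quad SPC = \frac{TN}{TN + FP}, \quad PPV = \frac{TP}{TP + FP}, \quad NPV = \frac{TN}{TN + FN} \]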

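  4. Hausdorff distance is a symmetric measure of distance between two contours and is defined as:

    \[ H(P, G) = \max\left( \max_{p \in P} \min_{g \in G} d(p, g),\; \max_{g \in G} \min_{p \in P} d(p, g) \right) \]

    where $p$ and $g$ here denote points on the predicted and ground truth contours and $d(p, g)$ is the Euclidean distance between them. A high Hausdorff value implies that the two contours do not closely match. The Hausdorff distance is computed in millimeters, with the spatial resolution obtained from the DICOM tag Pixel Spacing.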
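
The metrics listed above can be sketched in plain Python using their standard definitions. The masks are flat 0/1 lists, the contours are lists of (row, column) points, and `spacing` is a hypothetical pixel spacing in mm. The sketch assumes non-empty masks, so that no denominator is zero.

```python
import math


def overlap_metrics(pred, truth):
    """Confusion-count metrics for two flat binary masks of equal length."""
    tp = sum(1 for p, g in zip(pred, truth) if p and g)
    fp = sum(1 for p, g in zip(pred, truth) if p and not g)
    fn = sum(1 for p, g in zip(pred, truth) if not p and g)
    tn = sum(1 for p, g in zip(pred, truth) if not p and not g)
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "jaccard": tp / (tp + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def hausdorff(contour_a, contour_b, spacing=(1.0, 1.0)):
    """Symmetric Hausdorff distance between two point lists, in mm."""
    def dist(p, q):
        return math.hypot((p[0] - q[0]) * spacing[0], (p[1] - q[1]) * spacing[1])

    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(dist(p, q) for q in dst) for p in src)

    return max(directed(contour_a, contour_b), directed(contour_b, contour_a))


scores = overlap_metrics([1, 1, 1, 0, 0, 0, 0, 0], [0, 1, 1, 1, 0, 0, 0, 0])
hd = hausdorff([(0, 0), (0, 1)], [(0, 0), (3, 4)])
print(scores, hd)
```

For a single pair of masks the two overlap scores are related by Jaccard = Dice / (2 - Dice), so they rank individual segmentations identically; averages over many images need not satisfy this relation exactly.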

Appendix B Region of Interest Detection

b.1 Fourier Analysis

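The discrete Fourier transform Y of an N-D array X is defined as

\[ Y_{k_1, k_2, \ldots, k_N} = \sum_{n_1=0}^{m_1-1} \omega_{m_1}^{k_1 n_1} \sum_{n_2=0}^{m_2-1} \omega_{m_2}^{k_2 n_2} \cdots \sum_{n_N=0}^{m_N-1} \omega_{m_N}^{k_N n_N} X_{n_1, n_2, \ldots, n_N} \tag{20} \]

Each dimension has length $m_l$ for $l = 1, 2, \ldots, N$, and $\omega_{m_l} = e^{-2\pi i / m_l}$ are complex roots of unity where $i$ is the imaginary unit. It can be seen that the N-D Fourier transform Eq.(20) of an N-D array is equivalent to computing the 1-D transform along each dimension of the N-D array.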

The short-axis cardiac MR images of a slice were taken across entire cardiac cycle and these sequence of images can be treated as 2-dimensional signal varying over time (2D+T signal - ). The structures pertaining to heart like the Myocardium and ventricles show significant changes due to heart beat motion. Hence, by taking 3-D Fourier Transform along time axis and analyzing the fundamental Harmonic (also called H1 component, where H stands for Hilbert Space) it was possible to determine pixel regions corresponding to ROI which show strongest response to cardiac frequency.

The first harmonic of the 3-D FFT was transformed back into the original signal’s (spatial) domain using a 2-D inverse FFT. This transformation led to a complex-valued signal. Since the original signal was real, the phase component was ignored and only the magnitude of the H1 component (see Figure 20) was retained. The H1 components were estimated for all the slices of the heart starting from base to apex and stacked to form a 3-D volume. The noise present throughout this whole volume was reduced by discarding pixel values which were less than 1% of the maximum pixel intensity in the whole volume.
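
A minimal pure-Python sketch of the H1 computation follows. It relies on the separability of the transform: the spatial part of the 3-D FFT is undone by the 2-D inverse FFT, so the H1 image equals the 1-D DFT along time, evaluated at frequency index 1, for every pixel. The toy input and the function names are illustrative.

```python
import cmath
import math


def h1_magnitude(frames):
    """Per-pixel magnitude of the first temporal harmonic.

    `frames` is a list of T images (lists of rows) of one short-axis slice.
    """
    t_len = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    # Twiddle factors of the DFT for temporal frequency index 1.
    twiddle = [cmath.exp(-2j * cmath.pi * t / t_len) for t in range(t_len)]
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            coeff = sum(frames[t][r][c] * twiddle[t] for t in range(t_len))
            out[r][c] = abs(coeff)  # keep magnitude, drop phase
    return out


def suppress_noise(volume, fraction=0.01):
    """Zero values below `fraction` of the maximum over the stacked H1 volume."""
    peak = max(v for img in volume for row in img for v in row)
    return [[[v if v >= fraction * peak else 0.0 for v in row] for row in img]
            for img in volume]


# Toy slice with two pixels: one static, one pulsating once per cycle.
frames = [[[5.0, 1.0 + math.cos(2 * math.pi * t / 8)]] for t in range(8)]
h1 = h1_magnitude(frames)
clean = suppress_noise([h1])
print(h1, clean)
```

The static pixel has a negligible H1 response and is zeroed by the 1% threshold, while the pixel that varies at the cardiac frequency is retained.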

Figure 20: The figure shows an example of H1 component image got from a series of MR images of a cardiac short axis slice. It can be seen that most of the chest cavity excluding heart and some adjacent structures have disappeared. If we consider a pixel just outside the LV blood pool at the end diastole, it goes from being bright when it was inside the blood pool to dark when it was in Myocardium at End Systole, because the region containing blood was contracting inwards. The pixel gets brighter again as the heart approaches end diastole. The said pixel’s intensity variation will resemble a waveform having frequency same as the heartbeat, hence the H1 component captures those structures of the heart which were responsible for heart beat.

b.2 Circular Hough Transform

The LV Myocardium wall resembles a circular ring which contracts and expands during the cardiac cycle. The pixel regions whose intensity varied because of this movement were captured by the H1 component (seen as bright regions in the image). On applying Canny edge detection to these H1 component images, two concentric circles were seen, which approximate the myocardial wall boundaries at the End Diastole and End Systole phases. Hence, the localization of the left ventricle center was done using a Gaussian kernel-based Circular Hough Transform approach (see Figure 21).

Figure 21: A Gaussian kernel-based Circular Hough Transform approach was used for left ventricle (LV) localization. The steps involved in ROI detection were: (a) Canny edge detection was performed on each short-axis slice's H1 component image, (b) for each of the edge maps, the Hough circles for a range of radii were found, (c) for each of the edge maps, only the $N$ highest scoring Hough circles were retained, where $N$ was a hyper-parameter, (d) for each of the retained circles, votes were cast using a Gaussian kernel that models the uncertainty associated with the circle's center. This approach makes the transform more robust to the detection of spurious circles (in the figure, the LV center's likelihood surface is overlaid on a slice; the red and purple regions indicate high and low likelihood of the LV center, respectively), (e) the maximum of the LV likelihood surface was selected as the center of the ROI and a square patch of fixed size was cropped.
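Steps (d) and (e) of the caption can be sketched as below. This is an illustration only: it assumes the candidate circle centers and their Hough scores have already been obtained from steps (a) to (c), and the kernel width `sigma`, the function names, and the border handling of the crop are assumptions rather than details from the original method.

```python
import numpy as np


def lv_center_from_circles(shape, centers, scores, sigma=5.0):
    """Cast Gaussian votes around candidate circle centers and return the peak.

    shape   : (H, W) of the slice
    centers : iterable of (row, col) circle centers from a Hough transform
    scores  : one vote weight per center (e.g. the Hough accumulator value)
    sigma   : assumed width of the Gaussian modelling center uncertainty
    """
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    likelihood = np.zeros(shape, dtype=float)
    for (cy, cx), score in zip(centers, scores):
        dist2 = (yy - cy) ** 2 + (xx - cx) ** 2
        likelihood += score * np.exp(-dist2 / (2.0 * sigma ** 2))
    # The maximum of the likelihood surface is taken as the ROI center.
    row, col = np.unravel_index(np.argmax(likelihood), shape)
    return (int(row), int(col)), likelihood


def crop_roi(image, center, size):
    """Crop a size x size patch around `center`, shifted to stay inside the image.

    Assumes `size` does not exceed either image dimension.
    """
    half = size // 2
    y0 = int(np.clip(center[0] - half, 0, image.shape[0] - size))
    x0 = int(np.clip(center[1] - half, 0, image.shape[1] - size))
    return image[y0:y0 + size, x0:x0 + size]
```

Because nearby votes reinforce each other, a cluster of mutually consistent circles outweighs an isolated spurious circle even when each individual score is the same.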

Appendix C Overview of CNN connectivity pattern variants

c.1 Overview of DenseNets

DenseNets are built from dense blocks and pooling operations, where each dense block (DB) is an iterative concatenation of previous feature maps whose sizes match. A layer in a dense block is a composition of Batch Normalization (BN) (Ioffe and Szegedy (2015)), a non-linearity (activation function), convolution and dropout (Srivastava et al. (2014)). Each layer outputs $k$ feature maps, where $k$, referred to as the growth rate parameter, is typically set to a small value (e.g. $k=8$). Thus, the number of feature maps in DenseNets grows linearly with the depth. For each layer in a DenseNet, the feature maps of all preceding layers of matching spatial resolution are used as inputs, and its own feature maps are passed on to subsequent layers. The output of the $l^{th}$ layer is defined as:

$$x_l = H_l([x_{l-1}, x_{l-2}, \ldots, x_0])$$

where $x_l$ represents the feature maps at the $l^{th}$ layer and $[\ldots]$ represents the concatenation operation. In our case, $H$ is the layer comprising Batch Normalization (BN), followed by Exponential Linear Unit (ELU) (Clevert et al. (2015)), a convolution and dropout. A Transition Down (TD) layer is introduced for reducing the spatial dimension of the feature maps, which is accomplished by using a convolution (depth preserving) followed by a max-pooling operation. This kind of connectivity pattern has the following advantages:

  • It ensures that the error signal can be back-propagated to earlier layers more directly. This acts as a kind of implicit deep supervision, as earlier layers can get more direct supervision from the final classification layer.

  • Higher parameter and computation efficiency is achieved than in a normal ConvNet. In a normal ConvNet, the number of parameters is proportional to the square of the number of channels ($C$) produced at the output of each layer (i.e. $O(C \times C)$); in DenseNets, however, the number of parameters is proportional to $l \times k \times k$, where $l$ is the layer index, and we usually set $k$ much smaller than $C$, so the number of parameters in each layer of the DenseNet is much smaller than that in a normal ConvNet.

  • It ensures maximum feature reuse, as the features fed to each layer are a consolidation of the features from all the preceding layers, and this leads to learning features which are more diversified and pattern rich.

  • DenseNets have been shown to be well suited even when the training data is minimal. This is because the connectivity pattern in DenseNets ensures that both low and high complexity features are maintained across the network. Hence, the final classification layer uses features from all complexity levels, which ensures smooth decision boundaries. In a normal ConvNet, by contrast, the final classification layer builds on top of the last convolution layers, which are mostly complex high-level features composed of many non-linear transformations.
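The dense connectivity pattern can be illustrated with a small NumPy sketch. It is not the network used in this work: the stand-in layer `toy_layer` (ELU followed by a random 1x1 convolution) is an assumption that omits Batch Normalization, spatial convolution and dropout, and serves only to show how each layer consumes the concatenation of all earlier feature maps and adds $k$ new ones.

```python
import numpy as np


def elu(x, alpha=1.0):
    """Exponential Linear Unit."""
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))


def toy_layer(x, k, rng):
    """Hypothetical stand-in for H_l: ELU followed by a random 1x1 convolution.

    x has shape (H, W, C_in); the result has shape (H, W, k).
    """
    weights = rng.standard_normal((x.shape[-1], k)) * 0.1
    return elu(x) @ weights


def dense_block(x, num_layers, k, seed=0):
    """Each layer sees the concatenation of the block input and all earlier outputs."""
    rng = np.random.default_rng(seed)
    features = [x]
    for _ in range(num_layers):
        layer_input = np.concatenate(features, axis=-1)
        features.append(toy_layer(layer_input, k, rng))
    # The block output has C_in + num_layers * k channels (linear growth).
    return np.concatenate(features, axis=-1)
```

For example, a block input with 4 channels passed through 5 layers with $k=8$ produces 4 + 5 × 8 = 44 channels, with the original input preserved as the first 4.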

c.2 Overview of Residual Networks

Figure 22: Residual Learning: A building block

Residual Networks (ResNets) are designed to ease the training of very deep networks by introducing an identity mapping (short-cut connection) between the input ($x$) and the output of a layer ($H(x)$), performing an element-wise summation of the non-linear transformation introduced by the layer and its input. Referring to Figure 22, the residual block is reformulated as $H(x) = F(x) + x$, which consists of the residual function $F(x)$ and the input $x$. The idea here is that if the non-linear transformation can approximate the complicated function $H(x)$, then it is possible for it to learn the approximate residual function $F(x)$.
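A minimal sketch of the residual formulation follows, with a hypothetical two-layer transformation standing in for the residual function. The names `residual_block` and `make_residual_fn` and the use of dense matrices instead of convolutions are illustrative assumptions.

```python
import numpy as np


def residual_block(x, residual_fn):
    """Element-wise sum of the learned transformation and the block input."""
    return residual_fn(x) + x


def make_residual_fn(w1, w2):
    """Hypothetical residual function: linear map, ReLU, linear map.

    w1 has shape (D, M) and w2 has shape (M, D) so the output matches x.
    """
    def residual_fn(x):
        return np.maximum(x @ w1, 0.0) @ w2
    return residual_fn
```

When the weights of the residual function are zero, the block reduces to the identity mapping, which is one way to see why adding such blocks does not make the mapping harder to represent.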

c.3 Overview of Inception Architectures

The Inception modules are parallel sub-networks (Figure 23 (a)) introduced in the GoogLeNet architecture for the ILSVRC 2014 competition. These modules are stacked upon each other, with occasional max-pooling layers with stride 2 to reduce the spatial dimension. The $1 \times 1$ convolutions allowed dimension reduction, thereby making the architecture computationally efficient. These modules were used only in the higher layers, whereas the lower layers maintained the traditional convolution architecture because of technical constraints (Szegedy et al. (2015)).

For the task of semantic segmentation, we proposed to use a modified version of the inception module (Figure 23(b)) only in the first layer; however, this could be extended to higher layers. The inception architecture design allows the visual information to be processed at various scales and then aggregated, so that the next stage can abstract features from the different scales simultaneously. The ratio of convolutions of different kernel sizes could be skewed, as larger kernels have larger spatial coverage and can capture higher abstractions.

Figure 23: The Inception module introduces parallel paths with different receptive field sizes by making use of multiple filters of different sizes, e.g. $1 \times 1$, $3 \times 3$, $5 \times 5$ convolutions and a max-pooling layer. These feature maps are concatenated at the end. These operations are meant to capture sparse patterns of correlations in the stack of feature maps. The figures show: (a) the naive version of the inception module, (b) the modified version of the inception module, obtained by excluding max-pooling and introducing a larger kernel to increase the receptive field. For the task of semantic segmentation, a small kernel helps in detecting small target regions, whereas a larger kernel contributes not only to detecting larger target regions but also to eliminating false positive regions that have properties similar to the target of interest.
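The parallel multi-scale structure can be sketched in NumPy as below. This is a simplified, single-channel illustration: the function names are hypothetical, and the kernels are supplied by the caller rather than learned.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_same(image, kernel):
    """Zero-padded 2-D cross-correlation that preserves the image size (odd kernels)."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    windows = sliding_window_view(padded, (kh, kw))  # shape (H, W, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def multiscale_module(image, kernels):
    """Apply kernels of different sizes in parallel and stack the responses as channels."""
    return np.stack([conv2d_same(image, k) for k in kernels], axis=-1)
```

Each output channel sees the same input through a different receptive field, so a subsequent layer can combine fine and coarse context at every pixel.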

Appendix D Cardiac Disease Classification

d.1 Myocardial Wall Thickness Variation Profile Features

Figure 24: The figure illustrates the procedure adopted for estimating the myocardial wall thickness at each short-axis slice. (a) shows the Myocardium segmentation, where the red cross-bar indicates the wall thickness at that particular location, (b) shows the result of the binary hole-filling operation on (a), (c) shows the left ventricle cavity obtained by performing the image subtraction operation between (a) and (b), (d) & (e) show the result of Canny edge detection performed on (b) and (c) to obtain the interior and exterior contours.

We adopted the following procedure for estimating the myocardial wall thickness variation profile features:

  1. The myocardial segmentation mask was subjected to Canny edge detection to detect the interior and exterior contours. Binary morphological operations like hole-filling and erosion were applied to ensure that the contours were one pixel wide. Figure 24 illustrates the procedure adopted for finding the contours.

  2. Let $I$ and $E$ be the sets of pixels corresponding to the interior and exterior contours. Then the Myocardial Wall Thickness (MWT) is the set of shortest Euclidean distance ($d$) measures from a pixel in the interior contour $I$ to any pixel in the exterior contour $E$. Formally, the MWT for a Short-Axis (SA) slice is given by:

     $$\mathrm{MWT}_{SA} = \left\{ \min_{e \in E} d(i, e) \;:\; i \in I \right\}$$

  3. The mean and standard deviation of the MWT were estimated for each SA slice of the heart in the ED and ES phases.

  4. From the above measurements, features were derived for quantifying the MWT variation profile at the ED and ES phases. The idea was to quantify how smoothly the average MWT varied along the Long-Axis (LA), how uniform the MWT was within each SA slice, and whether this uniformity was preserved across the slices along the LA. The following hand-crafted features were estimated from the per-slice MWT at each cardiac phase:

    • Maximum of the mean myocardial wall thickness seen across slices in LA at ES phase.

    • Standard deviation of the mean myocardial wall thickness seen across slices in LA at ES phase.

    • Mean of the standard deviation of the myocardial wall thickness seen across slices in LA at ES phase.

    • Standard deviation of the standard deviation of the myocardial wall thickness seen across slices in LA at ES phase.

    • A similar set of four features was estimated for the ED phase.
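The procedure above can be sketched as follows, assuming the interior and exterior contours of each slice have already been extracted as arrays of pixel coordinates. The function names and dictionary keys are illustrative, distances are in pixel units, and the population standard deviation (NumPy default) is an assumption.

```python
import numpy as np


def wall_thickness(interior, exterior):
    """Shortest Euclidean distance from each interior contour pixel to the exterior contour.

    interior : array of shape (N, 2), exterior : array of shape (M, 2)
    """
    diff = interior[:, None, :] - exterior[None, :, :]
    distances = np.sqrt((diff ** 2).sum(axis=-1))  # shape (N, M)
    return distances.min(axis=1)


def mwt_profile_features(slices):
    """Summarize the per-slice MWT statistics across the long axis for one cardiac phase.

    slices : list of (interior, exterior) contour pairs, one per SA slice
    """
    means, stds = [], []
    for interior, exterior in slices:
        thickness = wall_thickness(interior, exterior)
        means.append(thickness.mean())
        stds.append(thickness.std())
    means, stds = np.array(means), np.array(stds)
    return {
        "max_of_mean": means.max(),
        "std_of_mean": means.std(),
        "mean_of_std": stds.mean(),
        "std_of_std": stds.std(),
    }
```

Calling `mwt_profile_features` once with the ES contours and once with the ED contours would give the two sets of four features described in the list.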

We observed that all classifiers in the first stage of the ensemble had difficulty in distinguishing between the DCM and MINF disease groups and often misclassified them. Figure 27 illustrates the feature importance ranking of the Random Forest classifier (Liaw et al. (2002)) and its confusion matrix on the validation set. The feature importance of the Random Forest classifier suggested that it was giving very low importance to the myocardial wall thickness variation profile features and more importance to volumetric features like EF. In both the MINF and DCM disease groups the EF of the LV is very low, which could be the reason for the confusion.

Figure 27: (a) Confusion matrix of the Random Forest predictions on the held-out validation set for cardiac disease classification. Rows correspond to the predicted class and columns to the target class. (b) Feature importance of all the extracted features according to the Random Forest classifier. The importance of the features for the disease classification task was evaluated from the individual trees of the Random Forest. The green bars are the feature importances of the forest, along with their inter-tree variability (standard deviation).

Clinically, the myocardial wall thickness and its variation profile are the key discriminators in distinguishing between MINF and DCM (Karamitsos et al. (2009)). The myocardial wall thickness peaks during the End Systole (ES) phase and is minimal during the End Diastole (ED) phase of the cardiac cycle. The myocardial wall thickness changes smoothly in Normal cases. In patients with MINF, the wall thickness variation profile is not smooth in both Short-Axis (SA) and Long-Axis (LA) views, whereas in DCM cases the wall is extremely thin. Figure 28 illustrates the myocardial wall thickness variation in Normal, DCM and MINF cases.

Figure 28: The figure shows the short-axis cardiac segmentation of Normal, DCM and MINF cases in axial ((a)-(c)) and sagittal ((d)-(f)) views at the End Systole phase. The segmentation labels for RV, LV and MYO are red, yellow and white, respectively. For the Normal case, the myocardial wall thickness is uniform throughout, as seen in the axial view, and its variation along the long axis is also smooth. For DCM, the myocardial wall is extremely thin when compared to Normal. For MINF, certain sections of the Myocardium wall are extremely thin when compared to the rest; hence the thickness is non-uniform, and the variation along the long axis is also rough.

Figure 30: Confusion matrix of the Expert classifier (MLP) predictions for the DCM vs. MINF classification task on the held-out validation set. Rows correspond to the predicted class and columns to the target class. The MLP accuracy is 100%.



  • Abadi et al. (2016) Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., et al. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 2016.
  • Albá et al. (2018) Albá, X., Lekadir, K., Pereañez, M., Medrano-Gracia, P., Young, A.A., Frangi, A.F.. Automatic initialization and quality control of large-scale cardiac mri segmentations. Medical Image Analysis 2018;43:129–141.
  • Avendi et al. (2016) Avendi, M., Kheradvar, A., Jafarkhani, H.. A combined deep-learning and deformable-model approach to fully automatic segmentation of the left ventricle in cardiac mri. Medical image analysis 2016;30:108–119.
  • Ayed et al. (2008) Ayed, I.B., Lu, Y., Li, S., Ross, I.. Left ventricle tracking using overlap priors. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2008. p. 1025–1033.
  • Bai et al. (2015) Bai, W., Shi, W., Ledig, C., Rueckert, D.. Multi-atlas segmentation with augmented features for cardiac mr images. Medical image analysis 2015;19(1):98–109.
  • Baumgartner et al. (2017) Baumgartner, C.F., Koch, L.M., Pollefeys, M., Konukoglu, E.. An exploration of 2d and 3d deep learning techniques for cardiac mr image segmentation. arXiv preprint arXiv:1709.04496 2017.
  • Ben-Cohen et al. (2015) Ben-Cohen, A., Klang, E., Diamant, I., Rozendorn, N., Amitai, M.M., Greenspan, H.. Automated method for detection and segmentation of liver metastatic lesions in follow-up ct examinations. Journal of Medical Imaging 2015;2(3):034502–034502.
  • (8) Bernard, O., Lalande, A., Zotti, C., Cervenansky, F., Humbert, O., Jodoin, P.M.. Automated cardiac diagnosis challenge (acdc) miccai challenge 2017 in conjunction with the stacom workshop. URL:; accessed: 16- Nov- 2017.
  • Billet et al. (2009) Billet, F., Sermesant, M., Delingette, H., Ayache, N.. Cardiac motion recovery and boundary conditions estimation by coupling an electromechanical model and cine-mri data. Functional Imaging and Modeling of the Heart 2009;:376–385.
  • (10) Booz Allen Hamilton Inc, Kaggle. Second annual data science bowl. URL:; accessed: 06- Dec- 2017.
  • Boykov and Jolly (2000) Boykov, Y., Jolly, M.P.. Interactive organ segmentation using graph cuts. In: MICCAI. Springer; volume 1935; 2000. p. 276–286.
  • Christ et al. (2016) Christ, P.F., Elshaer, M.E.A., Ettlinger, F., Tatavarty, S., Bickel, M., Bilic, P., Rempfler, M., Armbruster, M., Hofmann, F., D’Anastasi, M., et al. Automatic liver and lesion segmentation in ct using cascaded fully convolutional neural networks and 3d conditional random fields. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2016. p. 415–423.
  • Clevert et al. (2015) Clevert, D.A., Unterthiner, T., Hochreiter, S.. Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289 2015.
  • Cocosco et al. (2008) Cocosco, C.A., Niessen, W.J., Netsch, T., Vonken, E.j., Lund, G., Stork, A., Viergever, M.A.. Automatic image-driven segmentation of the ventricles in cardiac cine mri. Journal of Magnetic Resonance Imaging 2008;28(2):366–374.
  • Cordero-Grande et al. (2011) Cordero-Grande, L., Vegas-Sánchez-Ferrero, G., Casaseca-de-la Higuera, P., San-Román-Calvar, J.A., Revilla-Orodea, A., Martín-Fernández, M., Alberola-López, C.. Unsupervised 4d myocardium segmentation with a markov random field based deformable model. Medical image analysis 2011;15(3):283–301.
  • Cousty et al. (2010) Cousty, J., Najman, L., Couprie, M., Clément-Guinaudeau, S., Goissen, T., Garot, J.. Segmentation of 4d cardiac mri: Automated method based on spatio-temporal watershed cuts. Image and Vision Computing 2010;28(8):1229–1243.
  • Deng and Du (2008) Deng, X., Du, G.. 3d segmentation in the clinic: a grand challenge ii-liver tumor segmentation. In: MICCAI Workshop. 2008. .
  • Dou et al. (2016) Dou, Q., Chen, H., Jin, Y., Yu, L., Qin, J., Heng, P.A.. 3d deeply supervised network for automatic liver segmentation from ct volumes. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2016. p. 149–157.
  • Duda and Hart (1972) Duda, R.O., Hart, P.E.. Use of the hough transformation to detect lines and curves in pictures. Communications of the ACM 1972;15(1):11–15.
  • El Berbari et al. (2007) El Berbari, R., Bloch, I., Redheuil, A., Angelini, E., Mousseaux, E., Frouin, F., Herment, A.. An automated myocardial segmentation in cardiac mri. In: Engineering in Medicine and Biology Society, 2007. EMBS 2007. 29th Annual International Conference of the IEEE. IEEE; 2007. p. 4508–4511.
  • Eslami et al. (2013) Eslami, A., Karamalis, A., Katouzian, A., Navab, N.. Segmentation by retrieval with guided random walks: application to left ventricle segmentation in mri. Medical image analysis 2013;17(2):236–253.
  • Fahmy et al. (2011) Fahmy, A.S., Al-Agamy, A.O., Khalifa, A.. Myocardial segmentation using contour-constrained optical flow tracking. In: International Workshop on Statistical Atlases and Computational Models of the Heart. Springer; 2011. p. 120–128.
  • Fradkin et al. (2008) Fradkin, M., Ciofolo, C., Mory, B., Hautvast, G., Breeuwer, M.. Comprehensive segmentation of cine cardiac mr images. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2008. p. 178–185.
  • Frangi et al. (2001) Frangi, A.F., Niessen, W.J., Viergever, M.A.. Three-dimensional modeling for functional analysis of cardiac images, a review. IEEE transactions on medical imaging 2001;20(1):2–5.
  • Grosgeorge et al. (2011) Grosgeorge, D., Petitjean, C., Caudron, J., Fares, J., Dacher, J.N.. Automatic cardiac ventricle segmentation in mr images: a validation study. International journal of computer assisted radiology and surgery 2011;6(5):573–581.
  • Havaei et al. (2017) Havaei, M., Davy, A., Warde-Farley, D., Biard, A., Courville, A., Bengio, Y., Pal, C., Jodoin, P.M., Larochelle, H.. Brain tumor segmentation with deep neural networks. Medical image analysis 2017;35:18–31.
  • He et al. (2015) He, K., Zhang, X., Ren, S., Sun, J.. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE international conference on computer vision. 2015. p. 1026–1034.
  • He et al. (2016) He, K., Zhang, X., Ren, S., Sun, J.. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. p. 770–778.
  • Huang et al. (2016) Huang, G., Liu, Z., Weinberger, K.Q., van der Maaten, L.. Densely connected convolutional networks. arXiv preprint arXiv:1608.06993 2016.
  • Ioffe and Szegedy (2015) Ioffe, S., Szegedy, C.. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning. 2015. p. 448–456.
  • Isensee et al. (2017) Isensee, F., Jaeger, P., Full, P.M., Wolf, I., Engelhardt, S., Maier-Hein, K.H.. Automatic cardiac disease assessment on cine-mri via time-series segmentation and domain specific features. arXiv preprint arXiv:1707.00587 2017.
  • Jégou et al. (2017) Jégou, S., Drozdzal, M., Vazquez, D., Romero, A., Bengio, Y.. The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation. In: Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on. IEEE; 2017. p. 1175–1183.
  • Jolly et al. (2011) Jolly, M.P., Guetter, C., Lu, X., Xue, H., Guehring, J.. Automatic segmentation of the myocardium in cine mr images using deformable registration. In: STACOM. Springer; 2011. p. 98–108.
  • Kamnitsas et al. (2017) Kamnitsas, K., Ledig, C., Newcombe, V.F., Simpson, J.P., Kane, A.D., Menon, D.K., Rueckert, D., Glocker, B.. Efficient multi-scale 3d cnn with fully connected crf for accurate brain lesion segmentation. Medical image analysis 2017;36:61–78.
  • Karamitsos et al. (2009) Karamitsos, T.D., Francis, J.M., Myerson, S., Selvanayagam, J.B., Neubauer, S.. The role of cardiovascular magnetic resonance imaging in heart failure. Journal of the American College of Cardiology 2009;54(15):1407–1424.
  • Katouzian et al. (2006) Katouzian, A., Prakash, A., Konofagou, E.. A new automated technique for left-and right-ventricular segmentation in magnetic resonance imaging. In: Engineering in Medicine and Biology Society, 2006. EMBS’06. 28th Annual International Conference of the IEEE. IEEE; 2006. p. 3074–3077.
  • Kaus et al. (2004) Kaus, M.R., Von Berg, J., Weese, J., Niessen, W., Pekar, V.. Automated segmentation of the left ventricle in cardiac mri. Medical image analysis 2004;8(3):245–254.
  • Kingma and Ba (2014) Kingma, D., Ba, J.. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 2014.
  • (39) Korshunova, I., Burms, J., Degrave, J., Dambre, J.. Diagnosing heart diseases with deep neural networks. URL:; accessed: 01- Nov- 2017.
  • Krizhevsky et al. (2012) Krizhevsky, A., Sutskever, I., Hinton, G.E.. Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems. 2012. p. 1097–1105.
  • LeCun et al. (1998) LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.. Gradient-based learning applied to document recognition. Proceedings of the IEEE 1998;86(11):2278–2324.
  • Li et al. (2010) Li, B., Liu, Y., Occleshaw, C.J., Cowan, B.R., Young, A.A.. In-line automated tracking for ventricular function with magnetic resonance imaging. JACC: Cardiovascular Imaging 2010;3(8):860–866.
  • Liaw et al. (2002) Liaw, A., Wiener, M., et al. Classification and regression by randomforest. R news 2002;2(3):18–22.
  • Lieman-Sifry et al. (2017) Lieman-Sifry, J., Le, M., Lau, F., Sall, S., Golden, D.. Fastventricle: Cardiac segmentation with enet. In: International Conference on Functional Imaging and Modeling of the Heart. Springer; 2017. p. 127–138.
  • Lin et al. (2013) Lin, M., Chen, Q., Yan, S.. Network in network. arXiv preprint arXiv:1312.4400 2013.
  • Lin et al. (2006a) Lin, X., Cowan, B., Young, A.. Automated detection of the left ventricle from 4d mr images: validation using large clinical datasets. Advances in Image and Video Technology 2006a;:218–227.
  • Lin et al. (2006b) Lin, X., Cowan, B., Young, A.. Model-based graph cut method for segmentation of the left ventricle. In: Engineering in Medicine and Biology Society, 2005. IEEE-EMBS 2005. 27th Annual International Conference of the. IEEE; 2006b. p. 3059–3062.
  • Lorenzo-Valdés et al. (2004) Lorenzo-Valdés, M., Sanchez-Ortiz, G.I., Elkington, A.G., Mohiaddin, R.H., Rueckert, D.. Segmentation of 4d cardiac mr images using a probabilistic atlas and the em algorithm. Medical Image Analysis 2004;8(3):255–265.
  • Lötjönen et al. (2004) Lötjönen, J., Kivistö, S., Koikkalainen, J., Smutek, D., Lauerma, K.. Statistical shape model of atria, ventricles and epicardium from short-and long-axis mr images. Medical image analysis 2004;8(3):371–386.
  • Lu et al. (2009) Lu, Y., Radau, P., Connelly, K., Dick, A., Wright, G.A.. Segmentation of left ventricle in cardiac cine mri: An automatic image-driven method. In: International Conference on Functional Imaging and Modeling of the Heart. Springer; 2009. p. 339–347.
  • Lynch et al. (2006) Lynch, M., Ghita, O., Whelan, P.F.. Automatic segmentation of the left ventricle cavity and myocardium in mri data. Computers in biology and medicine 2006;36(4):389–407.
  • Lynch et al. (2008) Lynch, M., Ghita, O., Whelan, P.F.. Segmentation of the left ventricle of the heart in 3-d+ t mri data using an optimized nonrigid temporal model. IEEE Transactions on Medical Imaging 2008;27(2):195–203.
  • Margeta et al. (2011) Margeta, J., Geremia, E., Criminisi, A., Ayache, N.. Layered spatio-temporal forests for left ventricle segmentation from 4d cardiac mri data. In: International Workshop on Statistical Atlases and Computational Models of the Heart. Springer; 2011. p. 109–119.
  • Miller et al. (2013) Miller, C.A., Jordan, P., Borg, A., Argyle, R., Clark, D., Pearce, K., Schmitt, M.. Quantification of left ventricular indices from ssfp cine imaging: Impact of real-world variability in analysis methodology and utility of geometric modeling. Journal of Magnetic Resonance Imaging 2013;37(5):1213–1222.
  • Milletari et al. (2016) Milletari, F., Navab, N., Ahmadi, S.A.. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In: 3D Vision (3DV), 2016 Fourth International Conference on. IEEE; 2016. p. 565–571.
  • Mitchell et al. (2001) Mitchell, S.C., Lelieveldt, B.P., Van Der Geest, R.J., Bosch, H.G., Reiver, J., Sonka, M.. Multistage hybrid active appearance model matching: segmentation of left and right ventricles in cardiac mr images. IEEE Transactions on medical imaging 2001;20(5):415–423.
  • Nambakhsh et al. (2013) Nambakhsh, C.M., Yuan, J., Punithakumar, K., Goela, A., Rajchl, M., Peters, T.M., Ayed, I.B.. Left ventricle segmentation in mri via convex relaxed distribution matching. Medical image analysis 2013;17(8):1010–1024.
  • Ordas et al. (2007) Ordas, S., Oubel, E., Leta, R., Carreras, F., Frangi, A.F.. A statistical shape model of the heart and its application to model-based segmentation. Prog Biomed Opt Imaging Proc SPIE, 2007;6511:K65111.
  • Paragios (2003) Paragios, N.. A level set approach for shape-driven segmentation and tracking of the left ventricle. IEEE transactions on medical imaging 2003;22(6):773–776.
  • Patravali et al. (2017) Patravali, J., Jain, S., Chilamkurthy, S.. 2d-3d fully convolutional neural networks for cardiac mr segmentation. arXiv preprint arXiv:1707.09813 2017.
  • Pednekar et al. (2006) Pednekar, A., Kurkure, U., Muthupillai, R., Flamm, S., Kakadiaris, I.A.. Automated left ventricular segmentation in cardiac mri. IEEE Transactions on Biomedical Engineering 2006;53(7):1425–1428.
  • Peng et al. (2016) Peng, P., Lekadir, K., Gooya, A., Shao, L., Petersen, S.E., Frangi, A.F.. A review of heart chamber segmentation for structural and functional analysis using cardiac magnetic resonance imaging. Magnetic Resonance Materials in Physics, Biology and Medicine 2016;29(2):155–195.
  • Pereira et al. (2016) Pereira, S., Pinto, A., Alves, V., Silva, C.A.. Brain tumor segmentation using convolutional neural networks in mri images. IEEE transactions on medical imaging 2016;35(5):1240–1251.
  • Petitjean and Dacher (2011) Petitjean, C., Dacher, J.N.. A review of segmentation methods in short axis cardiac mr images. Medical image analysis 2011;15(2):169–184.
  • Petitjean et al. (2015) Petitjean, C., Zuluaga, M.A., Bai, W., Dacher, J.N., Grosgeorge, D., Caudron, J., Ruan, S., Ayed, I.B., Cardoso, M.J., Chen, H.C., et al. Right ventricle segmentation from cardiac mri: a collation study. Medical image analysis 2015;19(1):187–202.
  • Queirós et al. (2015) Queirós, S., Vilaca, J., Morais, P., Fonseca, J., D’hooge, J., Barbosa, D.. Fast left ventricle tracking in cmr images using localized anatomical affine optical flow. In: Proceedings of the SPIE Medical Imaging Meeting. 2015. .
  • Radau et al. (2009) Radau, P., Lu, Y., Connelly, K., Paul, G., Dick, A., Wright, G.. Evaluation framework for algorithms segmenting short axis cardiac mri. The MIDAS Journal-Cardiac MR Left Ventricle Segmentation Challenge 2009;49.
  • Ronneberger et al. (2015) Ronneberger, O., Fischer, P., Brox, T.. U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2015. p. 234–241.
  • Shelhamer et al. (2017) Shelhamer, E., Long, J., Darrell, T.. Fully convolutional networks for semantic segmentation. IEEE transactions on pattern analysis and machine intelligence 2017;39(4):640–651.
  • Simonyan and Zisserman (2014) Simonyan, K., Zisserman, A.. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 2014.
  • Srivastava et al. (2014) Srivastava, N., Hinton, G.E., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.. Dropout: a simple way to prevent neural networks from overfitting. Journal of machine learning research 2014;15(1):1929–1958.
  • Suinesiaputra et al. (2014) Suinesiaputra, A., Cowan, B.R., Al-Agamy, A.O., Elattar, M.A., Ayache, N., Fahmy, A.S., Khalifa, A.M., Medrano-Gracia, P., Jolly, M.P., Kadish, A.H., et al. A collaborative resource to build consensus for automated left ventricular segmentation of cardiac mr images. Medical image analysis 2014;18(1):50–62.
  • Szegedy et al. (2015) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.. Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2015. p. 1–9.
  • Tan et al. (2017) Tan, L.K., Liew, Y.M., Lim, E., McLaughlin, R.A.. Convolutional neural network regression for short-axis left ventricle segmentation in cardiac cine mr sequences. Medical Image Analysis 2017;39:78–86.
  • Tavakoli and Amini (2013) Tavakoli, V., Amini, A.A.. A survey of shaped-based registration and segmentation techniques for cardiac images. Computer Vision and Image Understanding 2013;117(9):966–989.
  • Tran (2016) Tran, P.V.. A fully convolutional neural network for cardiac segmentation in short-axis mri. arXiv preprint arXiv:1604.00494 2016.
  • Tsadok et al. (2013) Tsadok, Y., Petrank, Y., Sarvari, S., Edvardsen, T., Adam, D.. Automatic segmentation of cardiac mri cines validated for long axis views. Computerized Medical Imaging and Graphics 2013;37(7):500–511.
  • Üzümcü et al. (2006) Üzümcü, M., van der Geest, R.J., Swingen, C., Reiber, J.H., Lelieveldt, B.P.. Time continuous tracking and segmentation of cardiovascular magnetic resonance images using multidimensional dynamic programming. Investigative radiology 2006;41(1):52–62.
  • Van Assen et al. (2006) Van Assen, H.C., Danilouchkine, M.G., Frangi, A.F., Ordás, S., Westenberg, J.J., Reiber, J.H., Lelieveldt, B.P.. Spasm: a 3d-asm for segmentation of sparse and arbitrarily oriented cardiac mri data. Medical Image Analysis 2006;10(2):286–303.
  • Wolterink et al. (2017) Wolterink, J.M., Leiner, T., Viergever, M.A., Isgum, I.. Automatic segmentation and disease classification using cardiac cine mr images. arXiv preprint arXiv:1708.01141 2017.
  • Zhang et al. (2010) Zhang, H., Wahle, A., Johnson, R.K., Scholz, T.D., Sonka, M.. 4-d cardiac mr image analysis: left and right ventricular morphology and function. IEEE transactions on medical imaging 2010;29(2):350–364.
  • Zhu et al. (2010) Zhu, Y., Papademetris, X., Sinusas, A.J., Duncan, J.S.. Segmentation of the left ventricle from cardiac mr images using a subject-specific dynamical model. IEEE Transactions on Medical Imaging 2010;29(3):669–687.
  • Zotti et al. (2017) Zotti, C., Luo, Z., Humbert, O., Lalande, A., Jodoin, P.M.. Gridnet with automatic shape prior registration for automatic mri cardiac segmentation. arXiv preprint arXiv:1705.08943 2017.