Cardiovascular disease (CVD), especially coronary heart disease (CHD), is one of the leading causes of death in the western world. It has been demonstrated that adding the coronary artery calcium score to traditional risk factors significantly improves risk classification for the prediction of CHD events.
In clinical practice, coronary calcium is usually detected manually by expert operators, with the aid of semi-automatic software, on ECG-triggered non-contrast-enhanced CT images, i.e. images acquired without contrast medium. Typically, the identified lesions are characterized by intensity values above a standard threshold of 130 HU and belong to coronary artery structures. Subsequently, the Agatston score is computed from all detected objects to define the corresponding cardiac risk. Even though this manual approach represents the clinical standard, it remains a time-consuming and operator-dependent task.
In recent years, remarkable progress has been made in automatic image recognition, primarily due to a class of deep learning algorithms, namely convolutional neural networks (CNNs or ConvNets), which have shown great performance in computer vision tasks such as object detection and image analysis. The main advantage of a CNN lies in its deep architecture: a series of layers extracts from raw data a set of discriminative features, not designed by human engineers, which are then used for tasks such as classification or image captioning.
In this work we exploited a CNN to develop an automatic calcium scoring system able to identify true coronary calcifications and discard other lesions on ECG-gated basal CT acquisitions. The performance of the CNN-based automatic method for coronary calcium scoring was evaluated and compared with reference standard calcium score values obtained with manual annotations.
2 Materials and Methods
2.1 Data description
The acquisition protocol for calcium scoring involved ECG-triggered non-contrast-enhanced CT images with a slice thickness of 3.0 mm, acquired with a tube voltage of 120 kVp and with different in-plane fields of view according to the dimensions of the patient's heart.
This study included 152 exams coming from a screening study (MHELP), randomly divided as follows. The first 33 CT scans were used to define a cardiac atlas for the extraction of the region of the heart. The remaining volumes were used for the neural network and were split into a set of 45 exams to train the CNN from scratch, a set of 18 exams to optimize the network's hyperparameters and a set of 56 exams to evaluate the predictions of the system.
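The partition described above can be illustrated with a short sketch. The subset sizes (33, 45, 18, 56) come from the text; the function name, the subset labels and the fixed random seed are assumptions made for this example, not details of the original study.

```python
import random


def split_exams(exam_ids, sizes=(33, 45, 18, 56), seed=0):
    """Shuffle exam identifiers and cut them into four disjoint subsets."""
    exam_ids = list(exam_ids)
    if sum(sizes) != len(exam_ids):
        raise ValueError("subset sizes must add up to the number of exams")
    # The seed is an assumption, used only to make the example reproducible.
    random.Random(seed).shuffle(exam_ids)
    names = ("atlas", "train", "validation", "test")  # hypothetical labels
    subsets, start = {}, 0
    for name, size in zip(names, sizes):
        subsets[name] = exam_ids[start:start + size]
        start += size
    return subsets


subsets = split_exams(range(152))
print({name: len(ids) for name, ids in subsets.items()})
```

Keeping the four subsets disjoint ensures that no exam used for the atlas, for training or for hyperparameter tuning also appears in the test set.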
As reference, manual annotations of coronary calcifications were provided by an expert operator, with the Agatston score calculated for each patient according to [3],

$$AS = sc \cdot \sum_i A_i \, w_i$$

where $A_i$ is the area (mm$^2$) of the i-th coronary lesion, $w_i$ is a density factor depending on the largest intensity pixel in that plaque (130-199 HU: $w_i$=1, 200-299 HU: $w_i$=2, 300-399 HU: $w_i$=3, $\geq$400 HU: $w_i$=4), and $sc$ represents a slice correction factor linked to the slice thickness of the acquisition.
The final score was then used to categorize the patient's risk into five classes: 0: no evidence (class A), 1-10: minimal (class B), 11-100: mild (class C), 101-400: moderate (class D), >400: severe (class E).
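As an illustration of the scoring rule and risk classes described above, the following minimal sketch computes an Agatston score from a list of lesions and maps it to a class. The function names and the example lesions are hypothetical. The slice correction factor is assumed here to be the slice thickness divided by 3 mm, a common convention that equals 1 for the 3.0 mm acquisitions used in this study, and the treatment of non-integer scores between class boundaries is likewise an assumption.

```python
def density_factor(max_hu):
    """Weight derived from the largest intensity (HU) found in a lesion."""
    if max_hu < 130:
        return 0  # below the calcium threshold: no contribution
    if max_hu < 200:
        return 1
    if max_hu < 300:
        return 2
    if max_hu < 400:
        return 3
    return 4


def agatston_score(lesions, slice_thickness_mm=3.0):
    """lesions: iterable of (area_mm2, max_hu) pairs, one per lesion."""
    # Assumption: correction factor = slice thickness / 3 mm.
    slice_correction = slice_thickness_mm / 3.0
    return slice_correction * sum(
        area * density_factor(max_hu) for area, max_hu in lesions
    )


def risk_class(score):
    """Map a score to classes A-E; boundary handling is an assumption."""
    if score <= 0:
        return "A"
    if score <= 10:
        return "B"
    if score <= 100:
        return "C"
    if score <= 400:
        return "D"
    return "E"


example_lesions = [(4.0, 150), (2.5, 320), (1.0, 450)]  # made-up values
score = agatston_score(example_lesions)
print(score, risk_class(score))  # 15.5 C
```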
2.2 Image processing
A preliminary processing step was performed on all CT scans to reduce the computational load the CNN would have to deal with. This stage involved the definition of a cardiac atlas to be registered with the patients' volumes, in order to restrict the identification of candidate lesions to the region of the heart.
For the creation of the cardiac atlas we based our approach on the image registration method proposed by Avants et al. [8], which mainly relies on a multi-resolution strategy and iteratively aligns the volumes by means of affine and diffeomorphic registrations. To obtain a cardiac region of interest (ROI) for each patient, a single manual segmentation of the heart was performed on the final atlas. The resulting structure was then registered to all the CT volumes to remove everything outside the cardiac ROI. Finally, to define all the possible candidate coronary lesions, a threshold of 130 HU was applied to the segmented CT scan. The detected candidates were thus limited to true coronary calcifications, aortic and valve lesions, noise and possible image artifacts.
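The final thresholding step can be sketched as follows, assuming the registered heart ROI is already available as a boolean mask with the same shape as the CT volume. The function name and the toy volume are hypothetical, and the use of "greater than or equal to 130 HU" (rather than strictly greater) is an assumption consistent with the 130-199 HU density class.

```python
import numpy as np

CALCIUM_THRESHOLD_HU = 130


def candidate_voxels(ct_hu, heart_mask, threshold=CALCIUM_THRESHOLD_HU):
    """Return (z, y, x) indices of voxels inside the heart ROI at or above the threshold."""
    ct_hu = np.asarray(ct_hu)
    heart_mask = np.asarray(heart_mask, dtype=bool)
    if ct_hu.shape != heart_mask.shape:
        raise ValueError("CT volume and heart mask must have the same shape")
    return np.argwhere(heart_mask & (ct_hu >= threshold))


# Toy volume: soft tissue everywhere, a few bright voxels.
ct = np.full((2, 4, 4), -50)
mask = np.zeros((2, 4, 4), dtype=bool)
mask[:, 1:3, 1:3] = True   # hypothetical heart ROI
ct[0, 1, 1] = 250          # bright voxel inside the ROI: kept
ct[1, 1, 2] = 130          # exactly at threshold inside the ROI: kept
ct[0, 2, 2] = 129          # just below threshold: discarded
ct[1, 0, 3] = 500          # bright voxel outside the ROI (e.g. a rib): discarded

voxels = candidate_voxels(ct, mask)
print(voxels.tolist())  # [[0, 1, 1], [1, 1, 2]]
```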
2.3 Patch extraction and CNN design
A specific classifier based on convolutional neural networks was implemented to discriminate among the extracted candidate lesions.
The designed ConvNet was fed with two-dimensional patches, taken only from the axial slices of each scan and centered at the candidate pixel to be classified as belonging or not to a coronary calcification. Since the original CT acquisitions were characterized by a wide range of in-plane resolutions (from 0.33 mm to 0.51 mm), all the volumes were resampled to a 0.5 mm resolution in the x-y plane. Nearest-neighbor interpolation was used to avoid creating new intensity values, which could affect the density factor in the Agatston score formula. Pixel spacing along the z-axis was left at the routinely used 3 mm. After that, a patch of 25.5 x 25.5 mm (51 x 51 pixels) was cropped around each pixel belonging to the candidate lesions detected by the previous thresholding operation.
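A minimal sketch of these two steps is given below: nearest-neighbor in-plane resampling to 0.5 mm and cropping of a 51 x 51 patch centered on a candidate pixel. The function names are hypothetical, square pixels are assumed, and padding the slice border with -1000 HU (air) is an assumption, since the text does not say how candidates close to the image border were handled.

```python
import numpy as np


def resample_nearest(slice_hu, spacing_mm, target_mm=0.5):
    """Nearest-neighbor in-plane resampling: only existing HU values are copied."""
    slice_hu = np.asarray(slice_hu)
    scale = target_mm / spacing_mm
    out_rows = max(1, int(round(slice_hu.shape[0] * spacing_mm / target_mm)))
    out_cols = max(1, int(round(slice_hu.shape[1] * spacing_mm / target_mm)))
    # Map the centre of each output pixel back to a source pixel index.
    rows = np.minimum(((np.arange(out_rows) + 0.5) * scale).astype(int),
                      slice_hu.shape[0] - 1)
    cols = np.minimum(((np.arange(out_cols) + 0.5) * scale).astype(int),
                      slice_hu.shape[1] - 1)
    return slice_hu[np.ix_(rows, cols)]


def extract_patch(slice_hu, row, col, size=51, pad_value=-1000):
    """Crop a size x size patch centred at (row, col), padding beyond the border."""
    half = size // 2
    padded = np.pad(slice_hu, half, mode="constant", constant_values=pad_value)
    # After padding, the original pixel (row, col) sits at (row + half, col + half).
    return padded[row:row + size, col:col + size]


original = np.arange(100 * 100).reshape(100, 100)
resampled = resample_nearest(original, spacing_mm=0.33)
patch = extract_patch(original, row=10, col=20)
print(resampled.shape, patch.shape)  # (66, 66) (51, 51)
```

Because nearest-neighbor resampling only copies existing values, the maximum intensity of a lesion, and therefore its density factor, cannot be inflated or smoothed by interpolation.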
Due to the high imbalance between positive and negative samples, the ConvNet was trained with balanced batches. In addition, in each batch half of the negative cases were patches centered at aortic calcifications, which represented the main source of false positives when the heart is taken as the region of interest. In both the training and test phases, all patches were normalized by mean value and standard deviation to allow a more robust training and a better prediction.
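The batch composition described above can be sketched as follows. The function name, the batch size and the use of sampling with replacement are assumptions made for illustration; the text only states that batches were balanced and that half of the negatives were aortic calcifications.

```python
import random


def balanced_batch(positives, aortic_negatives, other_negatives,
                   batch_size=64, rng=None):
    """Draw a batch with 50% positives, 25% aortic and 25% other negatives."""
    rng = rng or random.Random()
    n_pos = batch_size // 2
    n_neg = batch_size - n_pos
    n_aortic = n_neg // 2
    n_other = n_neg - n_aortic
    # random.choices samples with replacement, so small pools are acceptable.
    batch = (
        [(sample, 1) for sample in rng.choices(positives, k=n_pos)]
        + [(sample, 0) for sample in rng.choices(aortic_negatives, k=n_aortic)]
        + [(sample, 0) for sample in rng.choices(other_negatives, k=n_other)]
    )
    rng.shuffle(batch)
    return batch


# Placeholder identifiers stand in for real patches.
batch = balanced_batch(["p1", "p2"], ["a1"], ["n1", "n2", "n3"],
                       batch_size=8, rng=random.Random(0))
print(sum(label for _, label in batch), len(batch))  # 4 8
```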
The architecture of the convolutional network used in this study was inspired by previously published CNN designs and was implemented using the Theano framework [12]. The feed-forward neural network consisted of seven convolutional layers (Fig. 1). Each layer was characterized by 16 kernels of size 3x3, with the exception of the last one, in which 32 kernels were applied. All convolutions were computed in valid mode, i.e. only the part of the full convolution result obtained without padding the input was retained, which avoided boundary effects. A 2x2 max-pooling after each of the first two convolutional layers reduced the number of network parameters and introduced spatial invariance. Finally, a single fully connected layer with 64 units was connected to the output of the ConvNet through a softmax function.
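The effect of valid convolutions and pooling on the feature-map size can be checked with a few lines of arithmetic. This sketch assumes that a 2x2 max-pooling follows each of the first two convolutional layers and that pooling rounds down when the size is odd; neither detail is stated explicitly in the text, and the function name is hypothetical.

```python
def feature_map_sizes(input_size=51, n_conv=7, kernel=3, pool=2,
                      pooled_layers=(1, 2)):
    """Spatial size after each convolutional layer (and its pooling, if any)."""
    sizes = []
    size = input_size
    for layer in range(1, n_conv + 1):
        size = size - kernel + 1      # valid convolution shrinks by kernel - 1
        if layer in pooled_layers:
            size = size // pool       # non-overlapping pooling, rounded down
        sizes.append(size)
    return sizes


print(feature_map_sizes())  # [24, 11, 9, 7, 5, 3, 1]
```

Under these assumptions a 51 x 51 patch is reduced to a single 1 x 1 position with 32 feature maps after the seventh layer, which is consistent with a patch-wise classifier that outputs one label for the central pixel.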
Rectified linear unit (ReLU) activations [13] and uniform He initialization for weights [14] were used throughout. The learning rate was set to 0.001, and the weight updates were computed using Stochastic Gradient Descent (SGD) with the Adagrad optimizer. To prevent network overfitting, two regularization strategies were employed: Dropout with a probability of 0.5 was applied to the fully connected layer, and early stopping was applied during the validation phase.
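For reference, a single Adagrad update with the learning rate quoted above can be written as a small sketch. This is the textbook form of the rule, with a hypothetical function name and an assumed stabilizing constant `eps`; framework implementations differ in where `eps` is placed, and the exact variant used in the study is not specified.

```python
import math


def adagrad_step(weights, grads, accum, learning_rate=0.001, eps=1e-6):
    """One Adagrad update: per-weight step scaled by accumulated squared gradients."""
    new_weights, new_accum = [], []
    for w, g, a in zip(weights, grads, accum):
        a = a + g * g                                   # running sum of g^2
        new_accum.append(a)
        new_weights.append(w - learning_rate * g / (math.sqrt(a) + eps))
    return new_weights, new_accum


weights, accum = adagrad_step([1.0], [2.0], [0.0])
print(weights, accum)
```

Weights that repeatedly receive large gradients accumulate a large denominator and therefore take progressively smaller steps.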
For better data management and retrieval, the extracted patches were stored in a database together with the coordinates of the central pixel and its label, assigned by an expert operator.
2.4 Evaluation metrics
Sensitivity, or true positive rate, was computed on a single scan as the number of true positives divided by the total number of positives. The result was then averaged over the entire test set. To evaluate the number of false positives, which represented the main source of error in the network performance, positive predictive value (PPV) and specificity were employed.
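The three statistics follow directly from the confusion counts, as in the sketch below. The function names and the example counts are hypothetical, and skipping scans that contain no positives when averaging sensitivity is an assumption, since the text does not say how such scans were treated.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return (sensitivity, PPV, specificity) from confusion counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, ppv, specificity


def mean_sensitivity(per_scan_counts):
    """Average per-scan sensitivity over scans that contain positives.

    per_scan_counts: iterable of (tp, fn) pairs, one per scan.
    """
    values = [tp / (tp + fn) for tp, fn in per_scan_counts if tp + fn > 0]
    return sum(values) / len(values) if values else float("nan")


sens, ppv, spec = detection_metrics(tp=9, fp=1, tn=19, fn=1)
avg_sens = mean_sensitivity([(9, 1), (4, 0), (0, 0)])
print(sens, ppv, spec, avg_sens)
```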
In addition to the above statistics, used to evaluate the ability of the proposed method to correctly classify individual coronary lesions, the Agatston score was computed by the system to quantify the amount of coronary artery calcium. The agreement between the automatic prediction and the expert operator's evaluation in determining the Agatston-based risk class was quantified by a linearly weighted Cohen's $\kappa$. To compare the predicted Agatston scores with the manually defined ones, we computed the Pearson coefficient and generated a Bland-Altman plot with a 95% confidence interval (Fig. 2).
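The agreement measures can be sketched in plain Python as follows. The function names and the example data are made up for illustration. The weighted $\kappa$ uses disagreement weights proportional to the distance between classes, and the Bland-Altman limits are taken as the mean difference plus or minus 1.96 sample standard deviations, which is the usual convention and is assumed here.

```python
import statistics


def linear_weighted_kappa(rater_a, rater_b, categories="ABCDE"):
    """Cohen's kappa with linear weights |i - j| between ordered categories."""
    index = {c: i for i, c in enumerate(categories)}
    k, n = len(categories), len(rater_a)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    dis_obs = sum(abs(i - j) * observed[i][j]
                  for i in range(k) for j in range(k))
    dis_exp = sum(abs(i - j) * row[i] * col[j]
                  for i in range(k) for j in range(k))
    if dis_exp == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - dis_obs / dis_exp


def bland_altman_limits(predicted, reference):
    """Return (bias, lower limit, upper limit) of the differences."""
    diffs = [p - r for p, r in zip(predicted, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


manual_classes = list("ABCDE")
auto_classes = list("BBCDE")   # one patient shifted by one class
kappa = linear_weighted_kappa(manual_classes, auto_classes)
bias, lower, upper = bland_altman_limits([10, 20, 30], [12, 18, 30])
print(round(kappa, 3), bias, round(lower, 2), round(upper, 2))
```

With linear weights, a prediction that is off by one class is penalized less than one that is off by several classes, which suits ordered risk categories.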
The presented method yielded a good per-scan sensitivity (91.24%) in detecting coronary lesions, and reached high specificity (95.37%) with a satisfactory PPV (90.50%) in discriminating false coronary lesions from real ones.
The Agatston-based risk assessment of 56 patients was calculated and compared with the manual annotations provided by an expert operator, considered as the ground truth reference. Our automatic method achieved 91.1% risk categorization accuracy with a linearly weighted Cohen's $\kappa$ value of 0.879 and a Pearson coefficient of 0.983 (Fig. 2).
The system was trained on a single NVIDIA Tesla 2070 GPU for about 1200 epochs, taking 3000 s per epoch. On the same GPU, prediction was fast, with an average classification time of 0.78 ± 0.12 s per CT volume.
The proposed method, exploiting the predictive power of CNNs, provides an automatic system for the quantification of calcium score.
As we observe in Fig. 2, the network had a better accuracy for the first four classes compared to the last one. However, a lower prediction sensitivity did not imply a wrong risk classification for the cases belonging to the last class. In this context, the patch undersampling to 0.5 mm of in-plane resolution adopted in this work did not negatively affect the Agatston-based risk assessment.
Restricting the analysis to the cardiac region, instead of processing the entire image, helped to reduce false positive errors, which are typically generated on the ribs and in some cases on the descending aorta. Moreover, the definition of a limited region of interest guaranteed a faster prediction on a single volume, thanks to the reduced number of pixels to be classified.
The classification task was assigned to a CNN whose architecture differs slightly from the design that inspired it. In our CNN the presence of two max-pooling layers prevented undersegmentation errors in some coronary lesions. Interestingly, the network was able to distinguish between an ascending aorta lesion and a proximal coronary one (Fig. 3). This is probably due to the strategy adopted in the definition of the training set, where half of the negative cases were aortic lesions. As a result of this specific choice of negative candidates, instead of a completely random selection, the network learned to better recognize and discard most of the false positives coming from the ascending aorta.
Although the proposed CNN was trained with a small sample, the results obtained highlight that this method can effectively be used for the automatic segmentation and classification of coronary calcifications and can potentially achieve improved results in larger case studies.
In this study an automatic system based on a convolutional neural network was successfully applied to the automatic segmentation and classification of coronary calcifications from ECG-gated non-contrast-enhanced CT acquisitions.
Conflict of Interest
The authors declare that they have no conflict of interest.
[1] Palmieri L, Donfrancesco C, Lo Noce C, et al. Il progetto CUORE: 15 anni di attività per la prevenzione e la riduzione del rischio cardiovascolare. Not Ist Super Sanità. 2013;26:3–8.
[2] Polonsky Tamar S, McClelland Robyn L, Jorgensen Neal W, et al. Coronary artery calcium score and risk classification for coronary heart disease prediction. JAMA. 2010;303:1610–1616.
[3] Agatston Arthur S, Janowitz Warren R, Hildner Frank J, Zusmer Noel R, Viamonte Manuel, Detrano Robert. Quantification of coronary artery calcium using ultrafast computed tomography. Journal of the American College of Cardiology. 1990;15:827–832.
[4] LeCun Yann, Bottou Léon, Bengio Yoshua, Haffner Patrick. Gradient-based learning applied to document recognition. Proceedings of the IEEE. 1998;86:2278–2324.
[5] Krizhevsky Alex, Sutskever Ilya, Hinton Geoffrey E. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems:1097–1105. 2012.
[6] LeCun Yann, Bengio Yoshua, Hinton Geoffrey. Deep learning. Nature. 2015;521:436–444.
[7] Pastormerlo Luigi E, Maffei Stefano, Latta Daniele Della, et al. N-terminal pro-B-type natriuretic peptide is a marker of vascular remodelling and subclinical atherosclerosis in asymptomatic hypertensives. European Journal of Preventive Cardiology. 2016;23:366–376.
[8] Avants Brian B, Epstein Charles L, Grossman Murray, Gee James C. Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Medical Image Analysis. 2008;12:26–41.
[9] Prasoon Adhish, Petersen Kersten, Igel Christian, Lauze François, Dam Erik, Nielsen Mads. Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. In International Conference on Medical Image Computing and Computer-Assisted Intervention:246–253. Springer. 2013.
[10] Long Jonathan, Shelhamer Evan, Darrell Trevor. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition:3431–3440. 2015.
[11] Wolterink Jelmer M, Leiner Tim, Vos Bob D, Hamersvelt Robbert W, Viergever Max A, Išgum Ivana. Automatic coronary artery calcium scoring in cardiac CT angiography using paired convolutional neural networks. Medical Image Analysis. 2016;34:123–136.
[12] Team The Theano Development, Al-Rfou Rami, Alain Guillaume, et al. Theano: A Python framework for fast computation of mathematical expressions. arXiv preprint arXiv:1605.02688. 2016.
[13] Glorot Xavier, Bordes Antoine, Bengio Yoshua. Deep sparse rectifier neural networks. In AISTATS. 2011;15:275.
[14] He Kaiming, Zhang Xiangyu, Ren Shaoqing, Sun Jian. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision:1026–1034. 2015.