Direct Automatic Coronary Calcium Scoring in Cardiac and Chest CT

by Bob D. de Vos, et al.

Cardiovascular disease (CVD) is the global leading cause of death. A strong risk factor for CVD events is the amount of coronary artery calcium (CAC). To meet demands of the increasing interest in quantification of CAC, i.e. coronary calcium scoring, especially as an unrequested finding for screening and research, automatic methods have been proposed. Current automatic calcium scoring methods are relatively computationally expensive and only provide scores for one type of CT. To address this, we propose a computationally efficient method that employs two ConvNets: the first performs registration to align the fields of view of input CTs and the second performs direct regression of the calcium score, thereby circumventing time-consuming intermediate CAC segmentation. Optional decision feedback provides insight in the regions that contributed to the calcium score. Experiments were performed using 903 cardiac CT and 1,687 chest CT scans. The method predicted calcium scores in less than 0.3 s. Intra-class correlation coefficient between predicted and manual calcium scores was 0.98 for both cardiac and chest CT. The method showed almost perfect agreement between automatic and manual CVD risk categorization in both datasets, with a linearly weighted Cohen's kappa of 0.95 in cardiac CT and 0.93 in chest CT. Performance is similar to that of state-of-the-art methods, but the proposed method is hundreds of times faster. By providing visual feedback, insight is given in the decision process, making it readily implementable in clinical and research settings.





I Introduction

Cardiovascular disease (CVD) is the global leading cause of death[1]. To reduce the burden of cardiovascular disease the World Health Organization underlines the need for early detection and treatment of individuals with CVD or those who are at high cardiovascular risk due to the presence of one or more risk factors [2]. A strong and independent risk factor for CVD events, e.g. myocardial infarction, is the quantity of coronary artery calcium (CAC) [3, 4, 5]. Quantification of CAC, i.e. calcium scoring, is typically performed in dedicated non-contrast-enhanced ECG-synchronized cardiac CT scans[4]. Alternatively, calcium scoring can be performed in other non-contrast-enhanced CTs that visualize the heart; e.g. in low-dose CT attenuation correction scans acquired in hybrid PET/CT and SPECT/CT [6, 7], or in radiation therapy planning CTs of breast cancer patients [8]. Furthermore, it has been shown that calcium scoring in lung screening low-dose chest CT scans is a predictor for all-cause mortality [9, 10]. In fact, in the National Lung Screening Trial (NLST) CVD was the leading cause of mortality [11]. Thus, CAC quantification, especially as an unrequested finding, has garnered much attention.

Clinically, calcium scoring is performed by experts who manually identify CAC in CT image slices. This is a tedious process of finding and selecting high density voxels in the coronary arteries—commonly defined as two or more connected voxels above 130 Hounsfield Units (HU). In scans not dedicated to calcium scoring this can be particularly cumbersome because of high noise, low resolution, and motion artifacts. Subsequently, when lesions are identified, region growing is used to fully segment the calcified lesions. Finally, after all CAC lesions have been segmented, CAC is quantified using the Agatston score [12]. The Agatston score takes into account the lesion area and the weighted maximum density of the lesion. This score can be used to stratify patients into risk categories [13].

The additional cost involved with manual calcium scoring makes the process prohibitive in settings where it is not the primary request. To simplify the task, qualitative stratification into CVD risk groups was proposed [14, 10]. Qualitative calcium scoring is faster and it demonstrates good inter-rater agreement. However, such an analysis still demands experts who closely inspect the scans. With the ever-increasing amount of scans and the increasing interest in calcium scoring, especially as an unrequested finding, the use of fully-automatic methods might be the preferred direction.

Several automatic methods have been introduced for calcium scoring in non-contrast-enhanced CT, ranging from rule-based approaches [15, 16], to the better performing conventional machine learning approaches [17, 18, 19, 20] and recent deep learning approaches [21, 22, 23, 24]. The main difficulty in automatic calcium scoring is to differentiate CAC from other dense structures. Obviously, CAC exclusively resides in the walls of the coronary arteries, thus most of the automatic methods exploit this prior knowledge.

Išgum et al. [17] introduced the first method for automatic calcium scoring in chest CT. CAC lesions were described with features and subsequently classified using a two-stage classification approach of k-nearest neighbor and support vector classification. Besides texture, size, and shape features, the location features were highly important for CAC identification. Location features were determined by registering an input image to an atlas image and by extracting the location features from a map of a priori spatial probabilities of CAC. The probability map was created from known CAC locations in 237 chest CTs that were registered to a single previously chosen atlas image. Shahzad et al. [18] used a similar machine learning approach for calcium scoring in cardiac CT, but they employed pair-wise deformable image registration to ten atlases that encoded the coronary arteries. The atlases were made from 85 contrast enhanced CT angiography scans with annotated coronary arteries. The methods of Išgum et al. [17] and Shahzad et al. [18] relied on feature selection methods to reduce dimensionality. Wolterink et al. [19] circumvented feature selection by using an extremely randomized trees classifier. Their method also depended on location features that were obtained by deformable image registration of ten atlases with encoded coronary arteries, but these were obtained from non-contrast-enhanced CTs. Durlak et al. [20] combined the principles of the methods described above: they employed a random forest and an a priori probability map of coronary artery locations, made from coronary arteries automatically extracted from cardiac CT angiography images. Instead of using time-consuming deformable image registration to align input images and atlas images, they achieved a speed-up by using affine registration. Similarly, other methods employed information from CTA to aid calcium scoring in cardiac CT. These methods were specifically designed for the coronary calcium score (orCaScore) challenge, and employed rule-based image analysis or conventional machine learning.


Most recently proposed methods employ deep learning methods for automatic calcium scoring, in particular convolutional neural networks (ConvNets). ConvNets are known for their automatic feature extracting capabilities and alleviate the need for handcrafting features. Wolterink et al. [22] used ConvNets to classify CAC in cardiac CT angiography scans. All voxels were classified using a pair of ConvNets. One ConvNet identified voxels likely to be CAC and discarded the majority of non-CAC-like voxels such as lung and fatty tissue. The other ConvNet more precisely discriminated between CAC and CAC-like negatives. In the method of Lessmann et al. [23] a single ConvNet was used that classified candidate CAC lesions in lung screening chest CTs. To simplify the classification tasks, both these deep learning methods used an additional ConvNet that localized the heart with a bounding box [26]. More recently, the method of Lessmann et al. [24] fully exploited the feature extraction capabilities of ConvNets without dedicated localization methods. They employed two sequential ConvNets to classify CAC as well as aortic valve, mitral valve, and aorta calcifications in chest CT. The first ConvNet identified candidate calcifications based on their location, and the second ConvNet refined the classification results by reducing false positive errors.

Fig. 1: In a typical automatic calcium scoring workflow, CAC is first identified and subsequently quantified. The proposed method uses ConvNet regression to quantify CAC in image slices directly.

While all aforementioned methods use different strategies, they all follow a workflow similar to current clinical calcium scoring: CAC is first identified and thereafter quantified. The automatic methods show high accuracy, but often at considerable computational cost. Employing these methods on large datasets would require dedicated servers. To alleviate computational cost, we propose a workflow that circumvents intermediate identification and that performs direct quantification (see Figure 1). Direct quantification has proven to be useful for atrial and ventricle volume quantification [27, 28, 29]. Furthermore, attempts are being made to use it for calcium scoring. In our preliminary study we presented a direct calcium scoring method that uses 2-D ConvNet regression [30, 31]. The method performs direct calcium scoring in extracted image slices from bounding boxes cropped around the heart. In a recently proposed method, Cano-Espinosa et al. used a 3-D regression ConvNet for direct calcium scoring in downsampled CT volumes also cropped around the heart. However, their method could not be used in 14% of the scans, because heart localization failed. Furthermore, previously proposed automatic calcium scoring methods are dedicated to either cardiac CT or chest CT. These methods required retraining for application in other types of CT [8, 32].

We present an automatic method that performs real-time direct calcium scoring in different types of non-contrast-enhanced CT. Unlike previous methods that focused on a single type of CT, the proposed method is able to perform calcium scoring directly in multiple types of CT by using an unsupervised deep learning atlas-registration method to align their fields of view (FOVs). For this we employ two ConvNets: one for atlas-registration and one for calcium scoring, as shown in Figure 2. The atlas-registration ConvNet makes the FOV of input CT images alike using Deep Learning Image Registration (DLIR) [33, 34] further developed to facilitate atlas-registration. Subsequently, a calcium scoring ConvNet predicts the calcium score in image slices mimicking clinical calcium scoring with the Agatston score. When desired, decision feedback can be queried for every slice with a predicted calcium score. For this purpose, a visual attention heatmap accurately reveals the regions that contributed to the calcium score. The method provides robust and accurate predictions of calcium scores and it is computationally efficient, obtaining an Agatston score in less than 0.3 s in cardiac and chest CT.

II Data

This study included two datasets used in previous studies that presented automatic coronary calcium scoring in cardiac CT [19] and in chest CT [24]. To allow a direct comparison of methods, the original training, validation, and test set distributions were used.

II-A Cardiac CT

The set of 903 cardiac CT scans (age range: 18 to 88 years, 31% women) originates from a set of routinely acquired scans for clinical calcium scoring of the University Medical Center Utrecht, Utrecht, The Netherlands. The need for informed consent was waived by the local Medical Research Ethics Committee. Scans were acquired with a 256-detector row Philips Brilliance iCT scanner (tube voltage 120 kVp, tube current 55 mAs) during a single breath-hold, with ECG-triggering and without contrast enhancement. The images were reconstructed to 3 mm slice thickness and slice increment with in-plane resolution ranging from 0.29 mm to 0.49 mm, depending on patient size. The dataset was divided into 237 scans for training, 136 scans for validation, and 530 scans were in the hold-out test set only used for final evaluation.

II-B Chest CT

The set of 1,687 chest CT scans (age range: 43 to 74 years, 39% women) originates from a set of 6,000 available baseline scans from the National Lung Screening Trial (NLST) [11]. All scans were acquired during inspiratory breath-hold without contrast enhancement. Scans were acquired in 31 different hospitals with 120 or 140 kVp tube voltage and 30-160 mAs tube current. Axial image slices were reconstructed with varying kernels, varying slice thickness (1.00-3.00 mm), varying slice increments (0.63-3.00 mm), and with varying in-plane resolutions (0.49-0.98 mm per voxel). In our study, scans with fewer than 100 slices or slices thicker than 3.00 mm were not considered, because they were not adequate for calcium scoring. Furthermore, the scans were resampled to 3.00 mm slice thickness and 1.50 mm slice increment to make the scans suitable for calcium scoring [35]. The dataset was divided into 1,012 scans for training, 169 scans for validation, and 506 scans for the hold-out test set that was only used for final evaluation.

Dataset      Set            I     II    III     IV      V
Cardiac CT   Training     120     14     33     29     41
Cardiac CT   Validation    68     14     28     15     11
Cardiac CT   Test         260     49     89     70     62
Chest CT     Training     272     76    207    205    252
Chest CT     Validation    39     14     46     30     40
Chest CT     Test         128     42     99    112    125
TABLE I: Number of scans per CVD risk category for training, validation, and test sets. CVD risk categorization is based on the total Agatston score per scan: I: very low (<1), II: low ([1, 10)), III: moderate ([10, 100)), IV: moderately high ([100, 400)), V: high (≥400).

II-C Reference standard

The reference standard was defined by experts who manually identified CAC lesions in the scans. CAC lesions were segmented following a standard procedure: region growing was used to select 26-connected voxels ≥130 HU. In the chest CTs with low radiation dose this procedure could lead to faulty segmentations (i.e. leakage) because of excessive noise. In such cases annotations were manually corrected by voxel painting [24]. Agatston scores were calculated in each axial slice for training. Total Agatston scores for each scan were calculated for final evaluation. Additionally, each subject was assigned to one of five CVD risk categories [13] based on the Agatston score: very low: <1; low: [1, 10); moderate: [10, 100); high: [100, 400); very high: ≥400. Table I provides an overview of the number of scans per risk category per dataset.
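The categorization rule above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function and constant names are hypothetical, and the bounds assume the half-open intervals listed in the text, with categories labeled I to V as in Table I.

```python
from bisect import bisect_right

# Lower bounds of categories II-V (assumed from the intervals in the text);
# scores below 1 fall into category I.
CATEGORY_BOUNDS = [1, 10, 100, 400]
CATEGORY_LABELS = ["I", "II", "III", "IV", "V"]


def cvd_risk_category(agatston_score: float) -> str:
    """Map a total Agatston score to one of the five CVD risk categories."""
    if agatston_score < 0:
        raise ValueError("Agatston score cannot be negative")
    # bisect_right counts how many lower bounds are <= the score,
    # which is exactly the index of the category.
    return CATEGORY_LABELS[bisect_right(CATEGORY_BOUNDS, agatston_score)]
```

Because the intervals are half-open, a score of exactly 10 falls in category III and a score of exactly 400 in category V.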

III Methods

The method employs two ConvNets in sequence (Figure 2). The first ConvNet registers input CTs to a cardiac CT atlas image. The second ConvNet performs calcium scoring. When desired, visual feedback can be queried for image slices with a score. For this purpose an attention heatmap reveals the regions that contributed to the calcium score.

Fig. 2: Schematics of the proposed method. Input CTs of varying FOV are first aligned using an atlas-registration ConvNet. Subsequently, a calcium scoring ConvNet is used for direct calcium scoring in image slices. Finally, decision feedback can be visualized when desired.

III-A Atlas-registration strategy

An atlas-registration ConvNet ensures that all input images have a similar FOV and resemble a cardiac CT. The ConvNet is trained with a modified version of our framework for Deep Learning Image Registration (DLIR) [33]. The DLIR framework uses an end-to-end unsupervised approach that trains a ConvNet for image registration. Similar to a conventional intensity-based image registration framework it exploits optimization of an image similarity metric. Figure 3 shows the schematics of training an atlas-registration ConvNet using the atlas image as a static fixed image. The task of the ConvNet is to analyze moving images and predict the transformation parameters that warp the moving images to the atlas image. Image similarity between the atlas and the warped image is used for backpropagation during training. By optimizing image similarity (e.g. minimizing negative cross correlation) with gradient descent, the atlas-registration ConvNet learns the registration task in an unsupervised manner. After training, the ConvNet can register unseen moving images in one shot.
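To make the similarity loss concrete, the following is a minimal sketch of a negative cross correlation between a fixed (atlas) image and a warped moving image. It assumes the normalized variant of cross correlation and represents images as flat lists of intensities; both choices, and the function name, are illustrative assumptions rather than details confirmed by the text.

```python
import math


def negative_ncc(fixed, warped):
    """Negative normalized cross correlation of two equally sized images.

    Both images are flat sequences of voxel intensities. The value is -1
    for perfectly (linearly) correlated images, so minimizing it aligns
    the warped image with the fixed image.
    """
    if len(fixed) != len(warped) or len(fixed) == 0:
        raise ValueError("images must be non-empty and of equal size")
    mean_f = sum(fixed) / len(fixed)
    mean_w = sum(warped) / len(warped)
    cov = sum((f - mean_f) * (w - mean_w) for f, w in zip(fixed, warped))
    var_f = sum((f - mean_f) ** 2 for f in fixed)
    var_w = sum((w - mean_w) ** 2 for w in warped)
    denom = math.sqrt(var_f * var_w)
    if denom == 0.0:
        return 0.0  # a constant image carries no alignment information
    return -cov / denom
```

In an actual training loop this quantity would be computed on tensors inside an automatic differentiation framework so that its gradient can be backpropagated to the ConvNet weights.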

A cardiac atlas-image is created using an iterative inter-subject registration strategy [36]. With this strategy an initial atlas image is made by averaging multiple images. The atlas image is iteratively refined by registering the individual images to the atlas. Subsequently, the final atlas image is used to train the atlas-registration ConvNets for cardiac and chest CT alignment used for subsequent calcium scoring.

Fig. 3: DLIR framework used to train a registration ConvNet. During a forward pass (indicated by the thick blue arrow) the registration ConvNet analyzes moving images and outputs transformation parameters. The transformation parameters are used by the interpolator to warp the moving image. During a backward pass (indicated by the thick red arrow) an image similarity loss (i.e. dissimilarity) is determined between the warped image and a fixed template image, and the resulting loss is backpropagated through the ConvNet. The ConvNet is trained in multiple iterations of forward and backward passes, with mini-batch stochastic gradient descent. Once the ConvNet has been trained for registration it can take a moving image as its input and it can output registration parameters in one pass, thus non-iteratively.

III-B Atlas-registration ConvNet training

For registration we propose a global 3-D rigid registration model with six degrees of freedom (shown in Figure 4). The model allows translations in any direction, but rotations are restricted to the axial ($z$) axis. Furthermore, scaling in the axial plane is isotropic and independent from scaling along the axial axis. These restrictions preserve the relation of reference Agatston scores that are defined on the original (unregistered) axial image slices. This facilitates training of the subsequent calcium scoring ConvNet.

Fig. 4: Rigid transformation model used to train the registration ConvNet. The six degrees of freedom allow translation in any direction, rotation around the axial axis, and uniform scaling in the axial plane independent from scaling along the axial direction. By constraining the registration to the proposed transformation model, we can trivially exploit the model parameters for selection and warping of axial slices that are presented to the calcium scoring ConvNet.

We use a computationally efficient ConvNet architecture that is listed in Table II. For fast analysis, images are downsampled close to 3 mm isotropic voxel dimensions; i.e. 6×6×1 downsampling for cardiac CT, and 6×6×2 downsampling for chest CT using average pooling. The ConvNet has three alternating layers of 3×3×3 convolutions and 2×2×2 average pooling and those are followed by two layers of 3×3×3 convolution. To facilitate a fixed output, global average pooling is applied before connection with two fully connected layers. The final output layer has six nodes, one for each transformation parameter. Throughout the network exponential linear units are used for activation, except in the output nodes. Three output nodes are unconstrained translation parameters ($t_x$, $t_y$, $t_z$), the rotation parameter ($\theta$) is constrained to a bounded range with a hyperbolic tangent, and the two scaling parameters ($s_{xy}$, $s_z$) are likewise constrained to a bounded range of scaling factors with a hyperbolic tangent. These output parameters constitute a 3-D transformation matrix.
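As an illustration of how six network outputs could be turned into such a matrix, the sketch below builds a homogeneous 4×4 matrix from the described degrees of freedom: three translations, one rotation about the axial axis, isotropic in-plane scaling, and independent axial scaling. The bounds for rotation and scaling, the ordering of the outputs, and all names are hypothetical assumptions; the text does not state them, and the matrix layout is a plausible reading of the description rather than the authors' exact formulation.

```python
import math

# Hypothetical bounds; the actual limits used for training are not stated here.
MAX_ROTATION = math.pi    # rotation constrained to [-pi, pi] (assumption)
SCALE_RANGE = (0.5, 1.5)  # scaling factors constrained to [0.5, 1.5] (assumption)


def bounded(raw: float, low: float, high: float) -> float:
    """Squash an unconstrained network output into [low, high] with tanh."""
    return low + 0.5 * (math.tanh(raw) + 1.0) * (high - low)


def transformation_matrix(raw_outputs):
    """Build a 4x4 homogeneous matrix from six raw network outputs.

    Assumed output order: t_x, t_y, t_z, rotation, in-plane scale, axial scale.
    """
    t_x, t_y, t_z, raw_theta, raw_sxy, raw_sz = raw_outputs
    theta = bounded(raw_theta, -MAX_ROTATION, MAX_ROTATION)
    s_xy = bounded(raw_sxy, *SCALE_RANGE)
    s_z = bounded(raw_sz, *SCALE_RANGE)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [
        [s_xy * cos_t, -s_xy * sin_t, 0.0, t_x],  # rotation and scaling act in-plane
        [s_xy * sin_t, s_xy * cos_t, 0.0, t_y],
        [0.0, 0.0, s_z, t_z],                     # axial axis: scaling and shift only
        [0.0, 0.0, 0.0, 1.0],
    ]
```

With these assumed bounds, all-zero raw outputs give the identity transform, which is a convenient starting point for training. Because the third row and column only carry axial scaling and translation, axial slices stay axial after the transform.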

III-C Atlas-registration ConvNet inference

We train an atlas-registration ConvNet for 3-D registration, but we use it for slice selection and 2-D warping. As a consequence, correspondence is guaranteed between warped axial slices and the per-slice calcium scores. Axial image slices are extracted from the original image over a range along the axial axis that depends on the depth of the atlas image. These slices are resampled using bi-linear interpolation to a 256×256 grid with a 2-D transformation matrix.

Atlas-Registration ConvNet | Calcium Scoring ConvNet
512×512×N 3-D input | 256×256 2-D input
6×6×{1,2} Avg. Pooling | 224×224 cropping
32 * 3×3×3 Convolutions | 32 * 3×3 Convolutions
32 * 2×2×2 Avg. Pooling | 32 * 2×2 Max Pooling
32 * 3×3×3 Convolutions | 32 * 3×3 Convolutions
32 * 2×2×2 Avg. Pooling | 32 * 2×2 Max Pooling
32 * 3×3×3 Convolutions | 32 * 3×3 Convolutions
32 * 2×2×2 Avg. Pooling | 32 * 2×2 Max Pooling
32 * 3×3×3 Convolutions | 32 * 3×3 Convolutions
32 * 3×3×3 Convolutions | 32 * 2×2 Max Pooling
Global Avg. Pooling | 32 * 3×3 Convolutions
 | 32 * 2×2 Max Pooling
 | 32 * 3×3 Convolutions
 | 32 * 2×2 Max Pooling
64 Fully Connected Nodes | 64 Fully Connected Nodes
64 Fully Connected Nodes | 64 Fully Connected Nodes
6 Output Nodes | 1 Output Node
TABLE II: Efficient ConvNet architectures were used for atlas-registration as well as calcium scoring.

III-D Calcium scoring ConvNet

The calcium scoring ConvNet employs direct regression to predict an Agatston score from input axial image slices. The choice of 2-D ConvNets, in favor of 3-D ConvNets, is based on the number of samples that are available for training. There are more image slices available than image volumes. Furthermore, 2-D image analysis mimics clinical calculation of the Agatston calcium score that is performed in 2-D axial slices:

$$\text{Agatston}(V) = \sum_{I \in V} \sum_{l \in I} A(l)\, w(l),$$

where $l$ is a 2-D CAC lesion in a slice $I$ of a CT volume $V$. $A(l)$ is the area of the lesion. The weighted intensity $w(l)$ is based on the maximum radio-density in HU of a 2-D lesion in the following manner: 1 for [130, 200), 2 for [200, 300), 3 for [300, 400), and 4 for ≥400. The Agatston score is corrected when image slices are overlapping, thus when slice increment is not equal to slice thickness [37].

Agatston scores are dependent on the CAC lesion area. Given that input images have different voxel sizes, we chose to simplify the prediction task by determining a pseudo-Agatston score. This score is obtained by cancelling out the axial pixel dimensions, the slice increment, and the slice thickness of the original Agatston score. The resulting target is the product of the number of voxels in a lesion $n(l)$, the predicted slice scaling factor $s$, and the weighted intensity $w(l)$:

$$\text{pseudo-Agatston}(l) = n(l)\, s\, w(l).$$
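The two scores can be contrasted with a short sketch. It assumes that each 2-D lesion in a slice is summarized as a hypothetical `(n_voxels, max_hu)` pair, that the per-slice value is the sum over lesions, and it leaves out the correction for overlapping slices [37]; the function names are illustrative only.

```python
def density_weight(max_hu: float) -> int:
    """Agatston density weight from the maximum HU value of a 2-D lesion."""
    if max_hu < 130:
        return 0  # below the calcium threshold: not counted
    if max_hu < 200:
        return 1
    if max_hu < 300:
        return 2
    if max_hu < 400:
        return 3
    return 4


def agatston_slice_score(lesions, pixel_area_mm2: float) -> float:
    """Per-slice Agatston score: lesion area times density weight, summed.

    `lesions` is a list of (n_voxels, max_hu) pairs for one axial slice.
    """
    return sum(n * pixel_area_mm2 * density_weight(hu) for n, hu in lesions)


def pseudo_agatston_slice_score(lesions, slice_scaling_factor: float) -> float:
    """Per-slice pseudo-Agatston score: voxel count instead of physical area,
    multiplied by the predicted slice scaling factor and the density weight."""
    return sum(n * slice_scaling_factor * density_weight(hu) for n, hu in lesions)
```

For example, a slice with a 10-voxel lesion peaking at 250 HU and a 4-voxel lesion peaking at 450 HU, at 0.25 mm² per pixel, scores 10·0.25·2 + 4·0.25·4 = 9.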

The calcium scoring ConvNet uses an efficient architecture that is listed in Table II. It analyzes random image croppings of 224×224 pixels during training and center croppings during application. It has alternating layers of 3×3 convolutions and 2×2 max pooling, followed by two fully connected layers, and an output layer of one node. Throughout the network batch normalization [38] is used and exponential linear units are used for activation [39]. The final output node has a linear output to facilitate continuous prediction. However, given that clinically used CVD risk categories are exponentially increasing, the task of the calcium scoring ConvNet was modified to learn a log-transform of the pseudo-Agatston score, with the predicted score compared against the reference pseudo-Agatston score.

The log-transform induces relatively high penalties for erroneous low calcium score predictions, and relatively low penalties for erroneous high calcium score predictions. Consequently, higher precision is forced for lower calcium burden, which is favorable for CVD risk stratification. During application of the calcium scoring ConvNet, the predicted outputs are converted to the original Agatston scores.
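The effect of a log-transformed target can be illustrated with a small sketch. The exact transform and loss are not recoverable from the text, so the code assumes a `log(1 + y)` transform (which maps a zero score to zero) and a squared error; both are stand-ins chosen for illustration, and the function names are hypothetical.

```python
import math


def to_log_target(pseudo_agatston: float) -> float:
    """Assumed training target: log(1 + score), so a zero score maps to 0."""
    return math.log1p(pseudo_agatston)


def from_log_prediction(predicted_log: float) -> float:
    """Invert the assumed transform at application time, clipping at zero."""
    return max(0.0, math.expm1(predicted_log))


def squared_log_error(predicted_log: float, reference_score: float) -> float:
    """Assumed loss: squared difference in log space."""
    return (predicted_log - to_log_target(reference_score)) ** 2


# The same absolute error of 10 is penalized far more at a low calcium burden:
low_burden = squared_log_error(to_log_target(10.0), 0.0)         # 0 predicted as 10
high_burden = squared_log_error(to_log_target(1010.0), 1000.0)   # 1000 predicted as 1010
```

Under these assumptions `low_burden` is several orders of magnitude larger than `high_burden`, which matches the stated motivation of forcing higher precision where the risk category boundaries lie close together.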

III-E Decision feedback

By employing regression of calcium scores, we circumvent time-consuming intermediate segmentation. On the other hand, it may be desirable to visualize regions in image slices that contributed to the calcium score. Inspired by the study of Zeiler and Fergus [40], we provide such visualization by using a deConvNet. The deConvNet uses the same operations of filtering and pooling as a ConvNet, but in reverse order from output to input. The reverse operations map the activities back to the input pixel space, and show which input patterns originally contributed to the activations in the feature maps. To obtain a smooth visual attention heatmap, the deConvNet is applied until the third convolutional layer, by taking the absolute value per feature of this layer, and by summing these features along the feature map dimension to get a 2-D matrix. Using third order interpolation we obtain a smooth map that can be superimposed on the image slice as a heatmap. This resulting heatmap visualizes attention by highlighting the regions that contributed to the Agatston score.
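The aggregation step described above (absolute value per feature, then summation along the feature dimension) can be sketched as follows. The deConvNet itself and the third order interpolation are omitted; the input is assumed to be a list of equally sized 2-D feature maps given as nested lists, and the function name is hypothetical.

```python
def attention_map(feature_maps):
    """Collapse a stack of 2-D feature maps into one 2-D attention matrix.

    Takes the absolute value of every feature and sums over the feature
    dimension, so both strongly positive and strongly negative responses
    count as attention.
    """
    rows = len(feature_maps[0])
    cols = len(feature_maps[0][0])
    return [
        [sum(abs(fm[r][c]) for fm in feature_maps) for c in range(cols)]
        for r in range(rows)
    ]
```

The resulting low-resolution matrix would then be upsampled to the slice size before being overlaid as a heatmap.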

IV Evaluation

Automatically predicted per-subject Agatston scores were compared with manually determined reference scores. Evaluations were performed on the hold-out test sets which were not used during method development. Two-way mixed intra-class correlation coefficient (ICC) for absolute agreement was computed and Bland-Altman analysis was performed to evaluate bias between predicted and reference Agatston scores. In addition, for each subject, CVD risk category was determined based on the Agatston score as defined in section II-C. Agreement between predicted and reference CVD risk categories was determined using accuracy and Cohen’s linearly weighted kappa (κ).

V Experiments and results

In this section we evaluate the atlas-registration ConvNet, the calcium scoring ConvNet, and the quality of decision feedback. In addition, we will evaluate whether the calcium scoring ConvNet requires to be trained on all data, or whether it can be trained on one dataset and applied to the other. Finally, we will compare state-of-the-art automatic calcium scoring methods with the proposed method. All experiments were performed with Theano [41], Lasagne [42], and OpenCV [43] on an Intel Xeon E5-1620 3.60 GHz CPU with an NVIDIA Titan X GPU.

V-A Atlas-registration ConvNet

Fig. 5: Cross-sectional views of the generated atlas images and the average images that illustrate registration performance. An initial atlas image (a) was made from 237 cardiac CTs that were aligned with their geometric centers. This atlas image was used to train an atlas-registration ConvNet to obtain a refined atlas image (d). Finally, this refined atlas was used to train the atlas-registration ConvNets for cardiac and chest CT FOV alignment. To illustrate the performance of the resulting ConvNets we provide average images of the cardiac CT and chest CT test sets before registration in (b) and (c), and after registration in (e) and (f). For each example we show from top to bottom center slices of axial, coronal, and sagittal views.

Figure 5(a) shows the initial atlas image that was created by aligning all cardiac training images using their geometric centroids. We chose the median dimensions and voxel sizes of all the cardiac training images to define the atlas image space. The atlas can be iteratively refined, but given the constraints of the global registration model used here, only one update was sufficient. The final atlas image, shown in Figure 5(d), was used to train the atlas-registration ConvNets for cardiac and chest CT alignment. Thus, in total three ConvNet instances were trained: one to create an atlas image, one for cardiac CT alignment, and one for chest CT alignment. All ConvNets were trained in 15,000 iterations with mini-batches containing 32 randomly selected images. Training took about 40 hours per ConvNet. Adam [44] was used with a learning rate of 0.001 for mini-batch gradient descent. To illustrate performance of the atlas-registration ConvNets, Figure 5 shows images before and after registration. Figure 5(b) shows the average image of the 530 cardiac CT images from the test set before registration and Figure 5(e) shows these images after registration. Similarly, Figure 5(c) shows an average image of the 506 chest CTs before registration and Figure 5(f) shows these after registration. Note the similarity of the registered images with the refined atlas image shown in Figure 5(d).

Quantitative evaluation of registration results revealed that registration erroneously cropped CAC out of the selected slices. Between one and four image slices containing CAC were not selected in three cardiac CTs and three chest CTs. Upon closer inspection, two of the chest CTs had calcifications in the aortic arch and descending aorta incorrectly labeled as CAC in the reference, thereby affecting CVD risk categorization. Nevertheless, these annotations were left uncorrected in further analysis to facilitate a fair comparison with previously developed methods. The registration errors did not have an adverse effect on CVD risk categorization in the other cases.

V-B Calcium scoring ConvNet

The calcium scoring ConvNet was trained in 150,000 iterations using Adam [44]. Training took 21 hours with 100 image slices per mini-batch randomly selected from the registered image slices taken from the cardiac and chest CT training sets. High imbalance between the minority of slices with a calcium score and the majority of slices with zero calcium score prevented convergence during ConvNet training. To ensure convergence, the numbers of image slices with CAC (Agatston score > 0) and without CAC (Agatston score = 0) were balanced during training. To prevent bias, training continued on the full imbalanced training set after 10,000 iterations. Additionally, we ensured stable convergence by decreasing the learning rate to 10% of its previous value every 50,000 iterations.
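The training schedule described above can be written down compactly. In this sketch the iteration counts and the decay factor come from the text, while the initial learning rate for the calcium scoring ConvNet is an assumption (the value of 0.001 is only stated for the atlas-registration ConvNets), and the function names are hypothetical.

```python
TOTAL_ITERATIONS = 150_000
BALANCED_ITERATIONS = 10_000   # balanced sampling of CAC / non-CAC slices
LR_DECAY_INTERVAL = 50_000     # iterations between learning-rate decreases
LR_DECAY_FACTOR = 0.1          # "10% of its previous value"
INITIAL_LR = 1e-3              # assumption: not stated for this ConvNet


def learning_rate(iteration: int) -> float:
    """Step-wise decayed learning rate at a given training iteration."""
    return INITIAL_LR * LR_DECAY_FACTOR ** (iteration // LR_DECAY_INTERVAL)


def use_balanced_sampling(iteration: int) -> bool:
    """Balanced mini-batches early on, the full imbalanced set afterwards."""
    return iteration < BALANCED_ITERATIONS
```

Under the assumed initial value, the learning rate is 1e-3 for the first 50,000 iterations, 1e-4 for the next 50,000, and 1e-5 for the final 50,000.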

Fig. 6: Bland-Altman plots showing agreement between predicted and reference per subject Agatston scores in the (a) cardiac CT and (b) chest CT datasets. Limits of agreement are ±1.96 SD; the positive biases in both datasets are mainly caused by overestimations of the higher Agatston scores.


Cardiac CT (rows: reference category, columns: predicted category)
           I     II    III     IV      V
I        259      0      1      0      0
II         9     36      4      0      0
III        2      3     82      2      0
IV         0      1      2     65      2
V          0      0      0     11     51

Chest CT (rows: reference category, columns: predicted category)
           I     II    III     IV      V
I        118      6      4      0      0
II         8     29      5      0      0
III        3      8     85      3      0
IV         1      1      7     99      4
V          0      0      0      3    122

TABLE III: Confusion matrices showing agreement in CVD risk categorization based on the total Agatston scores: I: very low (<1), II: low ([1, 10)), III: moderate ([10, 100)), IV: moderately high ([100, 400)), V: high (≥400). The method is evaluated separately on the test sets of cardiac CTs and chest CTs. The corresponding linearly weighted κ is 0.95 for cardiac CT and 0.93 for chest CT.

After training, the test sets were used to evaluate the calcium scoring ConvNet. Per-subject scores show high intraclass correlation coefficients (ICC); the ICC for cardiac CT and chest CT were both 0.98 with 95% confidence intervals of 0.98 to 0.99. Slight positive bias in cardiac and chest CT is visualized with the Bland-Altman plots shown in Figure 6. This was mainly caused by overestimations of the higher Agatston scores. However, this was not noticeable in CVD risk stratification. Table III shows confusion matrices of predicted risk categories vs. the manual reference standard. In cardiac CT calcium scoring only four scans were two categories off, and in chest CT calcium scoring eight scans were two categories off. The scan that was three categories off was a scan with incorrectly annotated aorta calcium, as discussed in the previous section. Nonetheless, overall agreement was almost perfect [45] with Cohen’s linearly weighted κ values of 0.95 in cardiac CT and 0.93 in chest CT. Accuracy in CVD risk categorization was 0.93 for cardiac CT and 0.90 for chest CT. Because efficient network architectures are used, the method is able to achieve high speed when used on a single CPU core: within 5 s a score for cardiac CT is obtained and within 11 s a score for chest CT is obtained. When using a GPU, calcium scoring can be performed in real-time. Including image registration and image resampling, a calcium score for cardiac CT is obtained in less than 0.15 s and for chest CT in less than 0.30 s.
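The agreement figures can be recomputed from the confusion matrices of Table III. The sketch below assumes that rows hold the reference categories and columns the predicted categories (the row sums match the test-set counts in Table I) and uses the standard definition of linearly weighted kappa with disagreement weights |i − j|; the function and variable names are illustrative.

```python
CARDIAC = [
    [259, 0, 1, 0, 0],
    [9, 36, 4, 0, 0],
    [2, 3, 82, 2, 0],
    [0, 1, 2, 65, 2],
    [0, 0, 0, 11, 51],
]
CHEST = [
    [118, 6, 4, 0, 0],
    [8, 29, 5, 0, 0],
    [3, 8, 85, 3, 0],
    [1, 1, 7, 99, 4],
    [0, 0, 0, 3, 122],
]


def accuracy(cm):
    """Fraction of scans on the diagonal of the confusion matrix."""
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total


def linear_weighted_kappa(cm):
    """Cohen's kappa with linear disagreement weights |i - j|."""
    k = len(cm)
    total = sum(map(sum, cm))
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # Observed and chance-expected weighted disagreement.
    observed = sum(abs(i - j) * cm[i][j] for i in range(k) for j in range(k))
    expected = sum(
        abs(i - j) * row_sums[i] * col_sums[j] for i in range(k) for j in range(k)
    ) / total
    return 1.0 - observed / expected
```

Rounded to two decimals, these functions give κ = 0.95 and accuracy 0.93 for cardiac CT, and κ = 0.93 and accuracy 0.90 for chest CT, in line with the values reported above.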

V-C Decision feedback

Decision feedback visualizes attention of the calcium scoring ConvNet. This feedback informs an end-user about the regions that contributed to the calcium score. Figure 7 shows examples of such feedback. The feedback helps an expert to quickly navigate and evaluate the image slices containing CAC.

(a) Predicted: 12 – Reference: 12
(b) Predicted: 124 – Reference: 124
(c) Predicted: 383 – Reference: 385
(d) Predicted: 229 – Reference: 230
(e) Predicted: 1,021 – Reference: 1,013
(f) Predicted: 436 – Reference: 437
Fig. 7: Examples of decision feedback in cardiac CT (left) and chest CT (right). In each example, the center sagittal slice is shown on the left with the predicted per-slice Agatston scores plotted above it. Image slices selected for further evaluation by the registration ConvNet are indicated by the solid red lines. The axial slice having the highest Agatston score is indicated with the dashed white line and is shown in the middle. The right image shows the registered slice with the resulting decision feedback superimposed as a heatmap. In both cardiac and chest CT, decision feedback shows that the method correctly focuses on large and small calcifications in the left coronary arteries, as shown in (a) and (b). Note that it is not fooled by other calcifications. Also, right coronary arteries are correctly identified, as shown in (c) and (d). Even in scans with extensive calcifications, the method correctly focuses on the different locations of CAC, as shown in (e) and (f).

We propose visual feedback as an optional qualitative tool, but we have performed a quantitative analysis to provide insight into its accuracy. To obtain quantitative results we analyzed heatmaps for slices with predicted calcium scores. The heatmaps were warped to the original image spaces by using the inverse transformation matrices. The values of the heatmaps were scaled between 0 and 1 to mimic probability maps for CAC candidate voxels. CAC candidates were defined as high-density 26-connected voxels with a volume between 1.5 and 1,500 mm³ [19]. For evaluation of these maps we performed precision-recall analysis (Figure 8). We defined an optimal threshold by selecting the maximum F1 (i.e. Dice) score on the validation set. Table IV shows the scores obtained with the selected threshold on the test sets. The results show that detection performance on the test sets is as accurate as on the validation set.
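The threshold selection described above can be sketched as follows. This is an illustrative example under stated assumptions rather than the evaluation code used in the paper: it treats each CAC candidate as one sample with a heatmap value and a binary reference label, and the helper names and toy values are hypothetical.

```python
def min_max_scale(values):
    """Scale values linearly to the range [0, 1], as done for the heatmaps."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def precision_recall_f1(scores, labels, threshold):
    """Precision, recall, and F1 (Dice) when scores >= threshold count as CAC."""
    tp = fp = fn = 0
    for score, is_cac in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_cac:
            tp += 1
        elif predicted and not is_cac:
            fp += 1
        elif not predicted and is_cac:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def best_f1_threshold(scores, labels, candidates):
    """Return the candidate threshold with the highest F1 on validation data."""
    return max(candidates, key=lambda t: precision_recall_f1(scores, labels, t)[2])


# Toy validation data (hypothetical): raw heatmap values and reference labels.
raw = [0.2, 1.0, 3.0, 4.0, 0.5, 5.0]
labels = [0, 0, 1, 1, 0, 1]
scores = min_max_scale(raw)
threshold = best_f1_threshold(scores, labels, [i / 100 for i in range(1, 100)])
print(threshold, precision_recall_f1(scores, labels, threshold))
```

The threshold found on the validation data would then be applied unchanged to the test data, which keeps the test-set scores free of threshold tuning.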

Fig. 8: Precision-recall curve of CAC segmentation using the obtained visual feedback heatmaps. The analysis is performed on the validation set to obtain an optimal threshold for evaluation. The optimal F1 score was 0.81 at a threshold of 0.27. Final results for quantitative evaluation of visualization feedback are shown in Table IV.

Additionally, decision feedback aided our analysis by clarifying incorrect calcium scores. Decision feedback revealed that the largest CVD miscategorizations were not caused by incorrect quantification but by incorrect recognition of CAC. Figure 9 shows six examples of the largest miscategorizations made by the calcium scoring ConvNet. The majority of errors were made in identification of calcifications near the coronary artery ostia. Calcifications near the ostia can be partly in the aorta and partly in the coronary artery. These calcifications are difficult to distinguish, especially when no information of neighboring slices is available.

Cardiac CT Chest CT
Precision 0.77 0.78
Recall 0.85 0.86
Accuracy 0.99 0.99
F1 (Dice) score 0.81 0.82
TABLE IV: Quantitative evaluation of visual feedback. Evaluation was performed by segmenting CAC lesions with the visualization feedback. An optimal threshold was selected using precision-recall analysis on the validation data shown in Figure 8. Final results show that visualization by the heatmap is as accurate on the test set as on the validation set.

(a) 9/14 – 0/0

(b) 0/6 – 114/259

(c) 0/0 – 14/14

(d) 11/12 – 0/0

(e) 10/21 – 0/0

(f) 5/13 – 0/0
Fig. 9: Examples of the largest errors, in terms of CVD risk categorization, made in cardiac CT (a-c) and in chest CT (d-f). Each image shows the axial slice most illustrative for the error. For image slices with a predicted calcium score, the heatmap is also provided. The captions show predicted slice calcium score / predicted total calcium score – reference slice calcium score / reference total calcium score. In (a) a pacemaker lead, affected by a motion artifact, was incorrectly quantified as CAC. CAC near the coronary ostia was not quantified in (b), which had an incorrect reference annotation, and in (c). In (d) infrequently occurring calcification of the pericardium was quantified as CAC. In (e) and (f) calcifications near the coronary ostia were incorrectly quantified as CAC.

V-D Influence of training data and registration

For clinical application it would be useful to know whether the method needs training data from both datasets or whether data from one set would suffice, and whether atlas-registration is required. Thus, we performed experiments using different combinations of training data with and without atlas-registration, as listed in Table V. The calcium scoring ConvNets were trained with either cardiac CT images, chest CT images, or a combination thereof. To balance cardiac and chest CT data, a subset of chest CT images was created by taking images from 237 randomly selected subjects and by removing every other slice in the chest CT images. Additionally, the histograms shown in Figure 10 provide insight into the distribution of calcium amount in the training data. Note that the chest CT subset has a distribution very similar to that of the cardiac CT training set.
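The construction of the balanced chest CT subset can be sketched as below. This is a minimal illustration and not the authors' preprocessing code: the dictionary layout, the function name, and the fixed seed are assumptions, and which of the two alternating slices is kept is an arbitrary choice here.

```python
import random


def make_balanced_subset(scans, n_subjects, seed=0):
    """Pick n_subjects at random and keep every other slice of each scan.

    scans: dict mapping a subject id to its list of axial slices (hypothetical layout).
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    chosen = rng.sample(sorted(scans), n_subjects)
    # Slicing with a step of 2 removes every other slice, halving the slice count.
    return {subject: scans[subject][::2] for subject in chosen}


# Toy example: 10 subjects with 8 slices each, reduced to 4 subjects with 4 slices each.
toy_scans = {f"subject_{i:02d}": list(range(8)) for i in range(10)}
subset = make_balanced_subset(toy_scans, n_subjects=4)
print({subject: len(slices) for subject, slices in subset.items()})
```

In the experiment above, `n_subjects` would correspond to the 237 randomly selected chest CT subjects.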

Fig. 10: Histograms of per-slice Agatston scores of the registered training datasets. Note that the Agatston scores shown here are not corrected by the correction factor; see Section III-D for the application of this correction factor to the Agatston score.

The best performance was achieved using atlas-registration with a calcium scoring ConvNet trained on all cardiac and chest CT images. Lower scores are found when a calcium scoring ConvNet is only trained with cardiac CT or the subset of chest CTs. However, combining the two datasets increased the scores notably, giving a performance close to the ConvNet trained with all images. Furthermore, the results show that atlas-registration facilitated training on one type of data and high performance on the other: the ConvNet trained with the full set of chest CTs achieved a high performance on the cardiac CT test images that was very close to the best results.

Trained on | CTs | Slices | Fraction CAC | Evaluated on cardiac CT: κ, Acc., ICC | Evaluated on chest CT: κ, Acc., ICC
Non-registered:
Cardiac CT | 237 | 10,468 | 10.4% | 0.92, 0.89, 0.89 | 0.46, 0.41, 0.24
Chest CT | 1,012 | 211,353 | 6.6% | 0.48, 0.59, 0.24 | 0.91, 0.86, 0.93
Cardiac + Chest CT | 1,239 | 221,821 | 6.7% | 0.90, 0.86, 0.87 | 0.92, 0.88, 0.94
Registered:
Cardiac CT | 237 | 10,016 | 10.9% | 0.92, 0.88, 0.97 | 0.86, 0.79, 0.90
Chest CT subset | 237 | 11,716 | 14.8% | 0.91, 0.86, 0.95 | 0.90, 0.85, 0.93
Cardiac + Chest CT subset | 574 | 21,732 | 13.0% | 0.94, 0.92, 0.99 | 0.91, 0.88, 0.97
Chest CT | 1,012 | 100,379 | 13.8% | 0.94, 0.91, 0.98 | 0.93, 0.89, 0.98
Cardiac + Chest CT | 1,239 | 110,395 | 13.5% | 0.95, 0.93, 0.98 | 0.93, 0.90, 0.98
TABLE V: Comparison of direct calcium scoring experiments using various datasets with and without atlas-registration. The number of CTs in the training set as well as the number of slices are given for each experiment. Additionally, we provide the fraction of slices containing CAC. The results indicate that a calcium scoring ConvNet can be trained on one type of data and evaluated on another.

V-E Comparison with other methods

Table VI shows a comparison with other state-of-the-art calcium scoring methods by Wolterink et al. [19] and Lessmann et al. [24] using the same datasets. The proposed method achieves similar performance compared to these methods, but it is hundreds of times faster. Even when run on a single core of a CPU, the method achieves high speed. Additionally, we list results from other direct calcium scoring methods by González et al. [15] and Cano-Espinosa et al. [46] using chest CT data from the COPDGene study [47]. We provide similar performance metrics to give an indication, but please note that a direct comparison between these methods and ours was not possible.

Data Correlation A B C Execution time
Source Number ICC acc. acc. acc. CPU GPU
Cardiac CT
Wolterink et al.[19] UMCU 530 0.96 0.95 0.91 20 min
Proposed method UMCU 530 0.97 0.99 0.95 0.93 0.95 0.96 0.94 0.93 5 s 0.15 s
Chest CT
Cano-Espinosa et al. [46] COPDGene 1,000 0.93 0.80 0.76
Lessmann et al. [24] NLST 506 0.91 0.91 7 min
Proposed method NLST 506 0.98 0.97 0.93 0.90 0.92 0.91 0.93 0.90 11 s 0.30 s
TABLE VI: Results of state-of-the-art automatic calcium scoring methods in cardiac CT and chest CT, direct calcium scoring methods in chest CT, and our proposed method. For each study, the number of scans used for evaluation is given. To allow better comparison, similar statistics are reported as described in the other studies: ICC quantifies agreement between automatic scores and manual reference scores, and correlation is reported with Pearson's correlation coefficient. Additionally, linearly weighted κ and accuracy are reported for three different stratifications into risk categories: (A) five categories as used in [17] and [19] {, , , , }; (B) four categories as used in [24] {, , , }; and (C) five categories as used in [15] and [46] {, , , }. Methods evaluated on identical datasets can be compared directly. The methods by Wolterink et al. [19], Lessmann et al. [24], and our proposed method were evaluated on systems with an Intel Xeon E5-1620 CPU, 32 GB of internal memory, and an NVIDIA Titan X GPU.

V-F Performance on orCaScore data

We evaluated our method on data from the orCaScore challenge [25]. This challenge provides data to evaluate a method for coronary calcium scoring. The data consists of non-contrast-enhanced ECG-triggered cardiac CT acquired on CT scanners from four different vendors in four different hospitals. Training data is provided, but we evaluated our method on the test set of 40 patients without retraining. Table VII shows the obtained confusion matrix and lists the results of dedicated cardiac CT calcium scoring methods that competed in the challenge. Given that our method does not differentiate between locations of CAC, we only provide total calcium scoring results.



I 8 0 0 0
II 0 12 0 0
III 0 0 8 0
IV 0 0 1 11
Method κ Acc. ICC
A[18] 0.88 0.85 0.97
B[25] 0.98 0.98 0.99
C[25] 0.96 0.95 0.98
D[25] 0.80 0.80 0.60
E[19] 1.00 1.00 0.99
Ours 0.98 0.98 0.98
TABLE VII: Results of the proposed method on orCaScore challenge data. Left: The confusion matrix shows agreement in CVD risk categorization based on the total Agatston scores: I: , II: , III: , IV:. The corresponding linearly weighted κ is shown below the confusion matrix. Right: Comparison with other methods evaluated in the challenge [25].

V-G Per-artery calcium scores

Routine coronary artery calcium scoring is typically performed per artery. Currently, only total coronary calcium scores are reported and used for CVD risk prediction. For research purposes, per-artery calcium scores might provide interesting additional information. Hence, we evaluated performance of the proposed method for per-artery calcium scoring, i.e. scoring in the LAD, LCX, and RCA. We chose to combine CAC scores in the LM and LAD, since it is difficult, if not impossible, to differentiate them in chest CT scans. The direct scoring ConvNet was adapted by changing the number of output nodes from one to three. Similar to the experiment described in Section V-B, training started with a balanced set of image slices with and without calcium scores for the first 10,000 iterations and continued with the full set of image slices thereafter. Additionally, each mini-batch had at least three image slices containing each type of arterial calcification. Risk categories are clinically not defined for per-artery calcium scores, but they are obtained for total calcium scores by summation of per-artery scores. The results are listed in Table VIII.
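The mini-batch constraint and the summation into a total score can be sketched as follows. This is one possible sampling scheme that satisfies the constraint, not necessarily the one used for training: the data layout (a list of per-slice dictionaries), the function names, and the sampling strategy are all assumptions.

```python
import random

ARTERIES = ("LAD", "LCX", "RCA")  # LM calcium is counted with the LAD, as in the text


def sample_minibatch(slices, batch_size, min_per_artery=3, rng=None):
    """Return slice indices with at least min_per_artery positive slices per artery.

    slices: list of dicts with a reference score per artery (hypothetical layout).
    """
    rng = rng or random.Random()
    chosen = set()
    for artery in ARTERIES:
        positives = [i for i, s in enumerate(slices) if s[artery] > 0]
        # Guarantee the minimum number of slices containing this type of calcification.
        chosen.update(rng.sample(positives, min_per_artery))
    if len(chosen) > batch_size:
        raise ValueError("batch_size is too small for the per-artery minimum")
    # Fill the rest of the batch from all remaining slices, with or without CAC.
    remaining = [i for i in range(len(slices)) if i not in chosen]
    batch = list(chosen) + rng.sample(remaining, batch_size - len(chosen))
    rng.shuffle(batch)
    return batch


def total_score(per_artery):
    """Total calcium score obtained by summation of the per-artery scores."""
    return sum(per_artery[artery] for artery in ARTERIES)


# Toy data: 40 slices, each with calcium in at most one artery.
toy = [
    {"LAD": 5.0 if i % 4 == 0 else 0.0,
     "LCX": 4.0 if i % 4 == 1 else 0.0,
     "RCA": 3.0 if i % 4 == 2 else 0.0}
    for i in range(40)
]
batch = sample_minibatch(toy, batch_size=16, rng=random.Random(1))
print(len(batch), total_score({"LAD": 10.0, "LCX": 2.5, "RCA": 0.0}))
```

Risk categorization would then be applied to the summed score, since categories are only defined for total calcium scores.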

Data | Per-artery ICC: LAD, LCX, RCA | Total scores: κ, Acc., ICC
Cardiac CT | 0.93, 0.88, 0.97 | 0.94, 0.91, 0.97
Chest CT | 0.91, 0.80, 0.98 | 0.92, 0.88, 0.96
TABLE VIII: Intraclass correlation coefficient (ICC) for per-artery calcium scores. Since CVD risk categories are not defined for per-artery scores, CVD risk categorization was evaluated with linearly weighted κ and accuracy (Acc.) on the total calcium scores obtained by summation.

VI Discussion

We have presented a method for automatic coronary calcium scoring in cardiac CT and chest CT. The method uses an atlas-registration ConvNet to align FOVs making input images alike. The atlas-registration ConvNet is trained for 3-D registration, but its rigid model is constrained to enable 2-D slice selection and 2-D image warping. Selected and warped input image slices are presented to a calcium scoring ConvNet that directly predicts the Agatston score in these slices. The method circumvents time-consuming CAC segmentation. To provide decision feedback, a visual attention heatmap can be generated that shows the regions in an image contributing to the calcium score. The method achieves excellent agreement for calcium score prediction for CVD risk categorization compared to manual calcium scoring. The method achieves similar performance compared to state-of-the-art methods, but achieves it hundreds of times faster.

In preliminary experiments we found that only a small ConvNet architecture was able to learn direct calcium scoring. Large ConvNet architectures were unstable and failed to converge during training. By limiting the degrees of freedom of a ConvNet, i.e. by using a small architecture, we were able to train a ConvNet that learned to differentiate coronary calcification from other types of calcification, e.g. aorta calcification, pericardium calcification, and heart valve calcification.

To simplify the problem we extracted bounding boxes around the heart in our preliminary work [30, 31]. However, this was a supervised method that classified presence of the heart in image slices. In case of noisy images, consecutive image slices could have discontinuous predictions. Discontinuous predictions resulted in an incorrect bounding box extracting a partial heart. For atlas-registration used in our current work this is not an issue.

The atlas-registration ConvNets were highly successful in pre-alignment of input CTs, i.e. in slice selection and image warping. Only 4 out of 1,036 test images had slices containing CAC that were missed by erroneous slice selection. Erroneous slice selection was likely caused by incorrect focus of the atlas-registration ConvNet on high contrast areas like the diaphragm. A mask drawn around the heart might steer focus of the ConvNet and might increase registration performance. Alternatively, a simple adjustment could be made by padding slice selection with some slices. Nonetheless, the errors caused by registration had negligible impact on calcium scoring and did not affect CVD risk categorization. Calcium scoring is better with atlas-registration than without it. Moreover, registration allows training and application of direct calcium scoring on datasets with different FOVs.
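The padding adjustment mentioned above could look like the following sketch. The function name and the margin of two slices are illustrative assumptions; the text does not specify how many slices would be added.

```python
def pad_slice_selection(first, last, n_slices, margin=2):
    """Widen the inclusive slice range [first, last] by margin slices on both sides.

    The result is clipped to the valid slice indices 0 .. n_slices - 1.
    """
    return max(0, first - margin), min(n_slices - 1, last + margin)


# A selection of slices 10..40 in a 60-slice scan, padded by 3 slices per side.
print(pad_slice_selection(10, 40, n_slices=60, margin=3))
```

A wider margin lowers the chance of missing CAC at the borders of the selection, at the cost of scoring a few more slices per scan.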

In general, accuracy of predicted Agatston scores was high, although Bland-Altman analysis showed that the method underestimated the scores of subjects with high Agatston scores. In fact, this was by design, because the method estimates a log-transformed Agatston score, which induces relatively low precision for higher scores and high precision for lower scores. Because the clinically used CVD risk categories are based on exponentially increasing Agatston scores, it is more important to differentiate between subjects at low to moderate risk than to differentiate between subjects at high risk. Thus, we imposed this higher precision on lower Agatston scores. Still, the largest CVD miscategorizations were found in the lower risk categories. Miscategorization was predominantly caused by incorrect identification of CAC and aortic calcifications near the coronary artery ostia. Even manual classification of these calcifications can be very difficult when they spread from the aorta through an ostium into the coronary artery. It often involves inspecting multiple adjacent slices in 3-D. Thus, performance of the method might be improved by exploiting additional 3-D information in future work. Additionally, performance might improve by increasing input image resolution. The current resolution was chosen based on the majority of chest CT images, being roughly half the resolution of cardiac CTs. Nevertheless, even though all cardiac CTs were downsampled, a high performance was obtained in these CTs.
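The effect of regressing a log-transformed score can be illustrated numerically. The exact transform is defined in Section III-D and is not reproduced here, so the sketch below assumes log(1 + score) purely for illustration; the function names are hypothetical.

```python
import math


def to_log(score):
    """Assumed transform: log(1 + score), which keeps a score of 0 at 0."""
    return math.log1p(score)


def from_log(value):
    """Inverse of the assumed transform."""
    return math.expm1(value)


def absolute_error_for_log_error(score, log_error):
    """Absolute score error caused by a fixed error in the log domain."""
    return from_log(to_log(score) + log_error) - score


# The same log-domain error of 0.1 at a low and at a high Agatston score.
for score in (10, 100, 1000):
    print(score, absolute_error_for_log_error(score, 0.1))
```

With this assumed transform, a fixed log-domain error translates into an absolute error proportional to (1 + score), so precision is highest for low scores, in line with the reasoning above.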

The proposed method shows near perfect agreement in CVD risk categorization compared to manual calcium scoring, even when trained with a relatively low number of scans from a single dataset. Interestingly, training on one type of data allowed the model to be applied to the other type of data without any modifications or transfer learning. However, we found that a model trained on only chest CT led to better results than a model trained only on cardiac CT. One potential reason for this may be the distribution of CAC in the datasets: the population of ex-heavy smokers typically has more CAC [9] than the population undergoing calcium scoring cardiac CT. However, Figure 10 shows that the distribution of CAC in equally sized datasets of cardiac CT and chest CT is similar. An alternative reason could be the presence of motion artifacts, which are nearly absent in ECG-synchronized cardiac CT, but abundant in non-ECG-synchronized chest CT. Therefore, a model trained on chest CT may be more robust to such artifacts. While our experiments indicated that the cardiac and chest CT datasets supplement each other, a calcium scoring ConvNet trained with only chest CTs almost matched the performance of the best performing ConvNet. Additionally, we have shown that the method obtained near perfect CVD risk categorization results on cardiac CTs from the orCaScore challenge. The method did not require retraining on representative data from the different hospitals and vendors. Having a single system that can handle potentially any CT scan that visualizes the heart would be very practical in a routine radiology setting. In future work we will investigate whether the method could be readily applied to other types of CT, without requiring retraining or fine-tuning.

Additionally, we have shown that the method can provide per-artery calcium scores. While this is not required for CVD risk categorization, it might be interesting for clinical research. In terms of ICC [48], per-artery calcium scoring achieved good reliability in the LCX and excellent reliability in the LAD and the RCA. In addition, determination of CVD risk using combined per-artery scores led to almost perfect agreement [45]. Nevertheless, performance was slightly better when total calcium was directly determined. This difference in performance may be a consequence of increased complexity of the per-artery scoring task while using the same number of samples for training.

The proposed method can obtain a calcium score hundreds of times faster than previously proposed methods. This is mainly due to one-shot (i.e. non-iterative) registration and direct quantification using regression. The direct calcium scoring method circumvents time-consuming intermediate segmentation. The method might also be suitable for e.g. determination of volume, (pseudo-)mass, or number of CAC; and for quantification of other lesions or diverse anatomical structures. However, the benefit of using a segmentation approach over direct scoring is that it provides immediate insight to the end-user. We mitigate this shortcoming of direct scoring by providing decision feedback with a visual attention heatmap. In this way valuable feedback is still provided whenever an end-user requires it.

VII Conclusion

We have presented an automatic method for direct calcium scoring in cardiac CT and chest CT. The method employs two ConvNets: one for atlas-registration, which aligns the FOV of input images to an atlas image made from cardiac CTs, and one for direct calcium scoring of input image slices using regression. The method achieves robust and accurate predictions of calcium scores in real-time. By providing visual feedback, insight is given into the decision process, making it readily implementable in clinical and research settings.


  • [1] GBD 2015 Mortality and Causes of Death Collaborators, “Global, regional, and national life expectancy, all-cause mortality, and cause-specific mortality for 249 causes of death, 1980-2015: a systematic analysis for the global burden of disease study 2015,” Lancet, vol. 388, no. 10053, pp. 1459–1544, Oct 2016.
  • [2] World Health Organization, “Cardiovascular diseases (CVDs) [fact sheet].”
  • [3] J. Yeboah, R. McClelland, T. Polonsky, et al., “Comparison of novel risk markers for improvement in cardiovascular risk assessment in intermediate-risk individuals,” JAMA, vol. 308, no. 8, pp. 788–795, 2012.
  • [4] H. S. Hecht, “Coronary artery calcium scanning: Past, present, and future,” JACC: Cardiovascular Imaging, vol. 8, no. 5, pp. 579 – 596, 2015.
  • [5] H. S. Hecht, P. Cronin, M. J. Blaha, M. J. Budoff, E. A. Kazerooni, J. Narula, D. Yankelevitz, and S. Abbara, “2016 scct/str guidelines for coronary artery calcium scoring of noncontrast noncardiac chest ct scans: A report of the society of cardiovascular computed tomography and society of thoracic radiology,” Journal of Thoracic Imaging, vol. 32, no. 5, p. W54–W66, 2017.
  • [6] A. J. Einstein, L. L. Johnson, S. Bokhari, J. Son, R. C. Thompson, T. M. Bateman, S. W. Hayes, and D. S. Berman, “Agreement of visual estimation of coronary artery calcium from low-dose ct attenuation correction scans in hybrid PET/CT and SPECT/CT with standard agatston score,” Journal of the American College of Cardiology, vol. 56, no. 23, pp. 1914–1921, Nov 2010.
  • [7] I. Mylonas, M. Kazmi, L. Fuller, R. A. deKemp, Y. Yam, L. Chen, R. S. Beanlands, and B. J. W. Chow, “Measuring coronary artery calcification using positron emission tomography-computed tomography attenuation correction images,” European Heart Journal Cardiovascular Imaging, vol. 13, no. 9, pp. 786–792, Sep 2012.
  • [8] S. A. M. Gernaat, I. Išgum, B. D. de Vos, R. A. P. Takx, D. A. Young-Afat, N. Rijnberg, D. E. Grobbee, Y. van der Graaf, P. A. de Jong, T. Leiner, et al., “Automatic coronary artery calcium scoring on radiotherapy planning CT scans of breast cancer patients: Reproducibility and association with traditional cardiovascular risk factors,” PLOS ONE, vol. 11, no. 12, p. e0167925, Dec 2016.
  • [9] P. C. Jacobs, M. Prokop, Y. van der Graaf, M. J. Gondrie, K. J. Janssen, H. J. de Koning, I. Išgum, R. J. van Klaveren, M. Oudkerk, B. van Ginneken, and W. P. Mali, “Comparing coronary artery calcium and thoracic aorta calcium for prediction of all-cause mortality and cardiovascular events on low-dose non-gated computed tomography in a high-risk population of heavy smokers.” Atherosclerosis, vol. 209, no. 2, pp. 455–462, 2010.
  • [10] C. Chiles, F. Duan, G. W. Gladish, J. G. Ravenel, S. G. Baginski, B. S. Snyder, S. DeMello, S. S. Desjardins, R. F. Munden, and NLST Study Team, “Association of coronary artery calcification and mortality in the national lung screening trial: A comparison of three scoring methods,” Radiology, vol. 276, no. 1, pp. 82–90, 2015.
  • [11] The National Lung Screening Trial Research Team, “Reduced lung-cancer mortality with low-dose computed tomographic screening,” New England Journal of Medicine, vol. 365, no. 5, pp. 395–409, 2011.
  • [12] A. S. Agatston, W. R. Janowitz, F. J. Hildner, N. R. Zusmer, M. Viamonte, and R. Detrano, “Quantification of coronary artery calcium using ultrafast computed tomography,” Journal of the American College of Cardiology, vol. 15, no. 4, pp. 827–832, 1990.
  • [13] J. A. Rumberger, B. H. Brundage, D. J. Rader, and G. Kondos, “Electron beam computed tomographic coronary calcium scanning: A review and guidelines for use in asymptomatic persons,” Mayo Clinic Proceedings, vol. 74, no. 3, pp. 243–252, Mar 1999.
  • [14] J. Shemesh, C. I. Henschke, D. Shaham, R. Yip, A. O. Farooqi, M. D. Cham, D. I. McCauley, M. Chen, J. P. Smith, D. M. Libby, et al., “Ordinal scoring of coronary artery calcifications on low-dose CT scans of the chest is predictive of death from cardiovascular disease,” Radiology, vol. 257, no. 2, pp. 541–548, Nov 2010.
  • [15] G. González, G. R. Washko, and R. S. J. Estépar, “Automated agatston score computation in a large dataset of non ECG-gated chest computed tomography,” in 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI), Apr 2016, pp. 53–57.
  • [16] Y. Xie, S. Liu, A. Miller, J. A. Miller, S. Markowitz, A. Akhund, and A. P. Reeves, “Coronary artery calcification identification and labeling in low-dose chest CT images,” in Proceedings of SPIE, vol. 10134, 2017, pp. 10 134 – 10 134 – 8.
  • [17] I. Išgum, M. Prokop, M. Niemeijer, M. A. Viergever, and B. van Ginneken, “Automatic coronary calcium scoring in low-dose chest computed tomography,” IEEE Transactions on Medical Imaging, vol. 31, no. 12, pp. 2322–2334, 2012.
  • [18] R. Shahzad, T. van Walsum, M. Schaap, A. Rossi, S. Klein, A. C. Weustink, P. J. de Feyter, L. J. van Vliet, and W. J. Niessen, “Vessel specific coronary artery calcium scoring: an automatic system,” Academic Radiology, vol. 20, no. 1, pp. 1–9, Jan 2013.
  • [19] J. M. Wolterink, T. Leiner, R. A. P. Takx, M. A. Viergever, and I. Išgum, “Automatic coronary calcium scoring in non-contrast-enhanced ECG-triggered cardiac CT with ambiguity detection,” IEEE Transactions on Medical Imaging, vol. 34, no. 9, pp. 1867–1878, Sep 2015.
  • [20] F. Durlak, M. Wels, C. Schwemmer, M. Sühling, S. Steidl, and A. Maier, Growing a Random Forest with Fuzzy Spatial Features for Fully Automatic Artery-Specific Coronary Calcium Scoring, ser. Lecture Notes in Computer Science.   Springer, Cham, Sep 2017, pp. 27–35.
  • [21] J. M. Wolterink, T. Leiner, M. A. Viergever, and I. Išgum, “Automatic coronary calcium scoring in cardiac CT angiography using convolutional neural networks,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. Frangi, Eds.   Cham: Springer International Publishing, 2015, pp. 589–596.
  • [22] J. M. Wolterink, T. Leiner, B. D. de Vos, R. W. van Hamersvelt, M. A. Viergever, and I. Išgum, “Automatic coronary artery calcium scoring in cardiac CT angiography using paired convolutional neural networks,” Medical Image Analysis, vol. 34, pp. 123–136, Dec 2016.
  • [23] N. Lessmann, I. Išgum, A. A. A. Setio, B. D. de Vos, F. Ciompi, P. A. de Jong, M. Oudkerk, W. P. T. M. Mali, M. A. Viergever, and B. van Ginneken, “Deep convolutional neural networks for automatic coronary calcium scoring in a screening study with low-dose chest CT,” in Proceedings of SPIE, G. D. Tourassi and S. G. Armato, Eds., vol. 9785, Mar 2016, p. 978511.
  • [24] N. Lessmann, B. v. Ginneken, M. Zreik, P. A. de Jong, B. D. de Vos, M. A. Viergever, and I. Išgum, “Automatic calcium scoring in low-dose chest CT using deep neural networks with dilated convolutions,” IEEE Transactions on Medical Imaging, vol. 37, no. 2, pp. 615–625, Feb 2018.
  • [25] J. M. Wolterink, T. Leiner, B. D. de Vos, J.-L. Coatrieux, B. M. Kelm, S. Kondo, R. A. Salgado, R. Shahzad, H. Shu, M. Snoeren, R. A. P. Takx, L. J. van Vliet, T. van Walsum, T. P. Willems, G. Yang, Y. Zheng, M. A. Viergever, and I. Išgum, “An evaluation of automatic coronary artery calcium scoring methods with cardiac ct using the orcascore framework,” Medical Physics, vol. 43, no. 5, pp. 2361–2373, 2016.
  • [26] B. D. de Vos, J. M. Wolterink, P. A. de Jong, T. Leiner, M. A. Viergever, and I. Išgum, “Convnet-based localization of anatomical structures in 3-d medical images,” IEEE Transactions on Medical Imaging, vol. 36, no. 7, pp. 1470–1481, July 2017.
  • [27] M. A. Hussain, A. Amir-Khalili, G. Hamarneh, and R. Abugharbieh, “Segmentation-free kidney localization and volume estimation using aggregated orthogonal decision cnns,” in Medical Image Computing and Computer-Assisted Intervention - MICCAI 2017, M. Descoteaux, L. Maier-Hein, A. Franz, P. Jannin, D. L. Collins, and S. Duchesne, Eds.   Cham: Springer International Publishing, 2017, pp. 612–620.
  • [28] X. Zhen, H. Zhang, A. Islam, M. Bhaduri, I. Chan, and S. Li, “Direct and simultaneous estimation of cardiac four chamber volumes by multioutput sparse regression,” Medical Image Analysis, vol. 36, pp. 184 – 196, 2017.
  • [29] W. Xue, G. Brahm, S. Pandey, S. Leung, and S. Li, “Full left ventricle quantification via deep multitask relationships learning,” Medical Image Analysis, vol. 43, pp. 54 – 65, 2018.
  • [30] B. D. de Vos, N. Lessmann, P. A. de Jong, M. A. Viergever, and I. Išgum, “Direct coronary artery calcium scoring in low-dose chest CT using deep learning analysis,” The Radiological Society of North America’s Annual Meeting, 2017.
  • [31] B. D. de Vos, N. Lessmann, P. A. de Jong, M. A. Viergever, and I. Išgum, “Direct and real-time cardiovascular risk prediction,” arXiv:1712.02982 [cs], 2017.
  • [32] I. Išgum, B. D. de Vos, J. M. Wolterink, D. Dey, D. S. Berman, M. Rubeaux, T. Leiner, and P. J. Slomka, “Automatic determination of cardiovascular risk by CT attenuation correction maps in Rb-82 PET/CT,” Journal of Nuclear Cardiology, Apr 2017.
  • [33] B. D. de Vos, F. F. Berendsen, M. A. Viergever, M. Staring, and I. Išgum, “End–to–end unsupervised deformable image registration with a convolutional neural network,” in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: Third International Workshop, DLMIA 2017, and 7th International Workshop, ML-CDS 2017, Held in Conjunction with MICCAI 2017, Québec City, QC, Canada, September 14, Proceedings.   Cham: Springer International Publishing, 2017, pp. 204–212.
  • [34] B. D. de Vos, F. F. Berendsen, M. A. Viergever, H. Sokooti, M. Staring, and I. Išgum, “A deep learning framework for unsupervised affine and deformable image registration,” Medical Image Analysis, 2018.
  • [35] A. Rutten, I. Išgum, and M. Prokop, “Calcium scoring with prospectively ECG-triggered CT: using overlapping datasets generated with MPR decreases inter-scan variability,” European Journal of Radiology, vol. 80, no. 1, pp. 83–88, Oct 2011.
  • [36] C. Jongen, J. P. W. Pluim, P. J. Nederkoorn, M. A. Viergever, and W. J. Niessen, “Construction and evaluation of an average CT brain image for inter-subject registration,” Computers in Biology and Medicine, vol. 34, no. 8, pp. 647–662, Dec 2004.
  • [37] B. Ohnesorge, T. Flohr, R. Fischbach, A. Kopp, A. Knez, S. Schröder, U. Schöpf, A. Crispin, E. Klotz, M. Reiser, et al., “Reproducibility of coronary calcium quantification in repeat examinations with retrospectively ECG-gated multisection spiral CT,” European Radiology, vol. 12, no. 6, pp. 1532–1540, Jun 2002.
  • [38] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning, Jun 2015, pp. 448–456.
  • [39] D.-A. Clevert, T. Unterthiner, and S. Hochreiter, “Fast and accurate deep network learning by exponential linear units (elus),” in International Conference on Machine Learning, 2016, arXiv: 1511.07289.
  • [40] M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in Computer Vision - European Conference on Computer Vision 2014, D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, Eds., vol. 8689.   Springer International Publishing, 2014, pp. 818–833.
  • [41] Theano Development Team, “Theano: A Python framework for fast computation of mathematical expressions,” arXiv e–prints, vol. abs/1605.02688, 2016.
  • [42] S. Dieleman, J. Schlüter, C. Raffel, E. Olson, S. K. Sønderby, D. Nouri, D. Maturana, M. Thoma, E. Battenberg, J. Kelly, J. D. Fauw, M. Heilman, D. M. de Almeida, B. McFee, H. Weideman, G. Takács, P. de Rivaz, J. Crall, G. Sanders, K. Rasul, C. Liu, G. French, and J. Degrave, “Lasagne: First release.” 2015.
  • [43] G. Bradski, “The OpenCV Library,” Dr. Dobb’s Journal of Software Tools, 2000.
  • [44] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in International Conference on Learning Representation, 2015.
  • [45] M. L. McHugh, “Interrater reliability: the kappa statistic,” Biochem Med (Zagreb), vol. 22, no. 3, pp. 276–282, 2012.
  • [46] C. Cano-Espinosa, G. González, G. R. Washko, M. Cazorla, and R. S. J. Estépar, “Automated agatston score computation in non-ECG gated CT scans using deep learning,” in Proceedings of SPIE, vol. 10574, 2018, pp. 10 574 – 10 574 – 6.
  • [47] E. A. Regan, J. E. Hokanson, J. R. Murphy, B. Make, D. A. Lynch, T. H. Beaty, D. Curran-Everett, E. K. Silverman, and J. D. Crapo, “Genetic epidemiology of COPD (COPDGene) study design,” COPD, vol. 7, no. 1, pp. 32–43, Feb 2010.
  • [48] T. K. Koo and M. Y. Li, “A guideline of selecting and reporting intraclass correlation coefficients for reliability research,” J Chiropr Med, vol. 15, no. 2, pp. 155–163, 2016.