Cardiovascular disease (CVD) is the leading cause of death globally. Coronary artery calcium (CAC) indicates presence of CVD, and CAC burden has been shown to be a strong and independent predictor of cardiovascular events (CVEs). Quantifying CAC (calcium scoring) is standardly performed in dedicated ECG-triggered cardiac CT scans without contrast enhancement. However, calcium scoring can also be performed in non-dedicated scans visualizing the heart, e.g. chest CT scans. CAC burden determined in low-dose chest CT scans acquired in lung cancer screening studies has also been shown to predict all-cause mortality. In addition, the two largest lung screening trials, the National Lung Screening Trial (NLST) and the Dutch-Belgian lung cancer screening trial (NELSON), demonstrated that not lung cancer but CVD is the leading cause of mortality in populations of heavy smokers [2, 6, 8]. Hence, determining the calcium score as an unrequested finding in lung cancer screening scans has recently gained attention.
In clinical routine, coronary artery calcifications are scored manually. This requires an expert to identify CAC lesions. Especially in non-dedicated chest CT scans this is a cumbersome process. Lack of ECG-triggering results in artifacts caused by cardiac motion, which affect the appearance of coronary calcifications. Low radiation dose results in high noise levels, which make it hard to identify small calcifications. In addition, subjects undergoing screening are at high risk of CVD due to a history of heavy smoking and thus often present with many calcified lesions, requiring substantial manual interaction in scoring.
To allow fast identification of subjects at risk of CVEs in lung cancer screening settings at low cost, qualitative instead of quantitative scoring has been proposed, categorizing subjects into none, mild, moderate, or heavy risk groups. Even though simple and fast, such analysis requires experts to inspect the scans. To overcome this limitation, several automatic calcium scoring methods for chest CT have been proposed [5, 9, 12]. These methods first select high-density voxels that are assumed to contain calcium, using the clinically accepted threshold of 130 HU. The selected voxels are then classified by supervised machine learning methods to separate CAC from other calcifications. Išgum et al. proposed a method that used a map providing a priori probabilities for the spatial appearance of CAC. Thereafter, a two-stage classification employing nearest neighbor and support vector classifiers was designed to identify coronary calcifications using texture, size, and spatial features that were derived from the spatial a priori probability map. Xie et al. identified CAC lesions as connected groups of high-intensity voxels in a roughly segmented heart that was found by segmentations of the lungs, bone, aorta and fatty tissue. Lessmann et al. identified coronary artery calcifications using a convolutional neural network (ConvNet) that analyzed high-density voxels in a bounding box around the heart.
We propose a method that determines the calcium score directly from the CT scan and circumvents intermediate segmentation of coronary calcifications. The method first localizes the heart using a ConvNet for classification and subsequently quantifies CAC burden in axial image slices directly using a ConvNet for regression. The method allows direct and real-time quantification of cardiovascular risk in lung cancer screening participants.
A subset of 1,546 chest CT scans (911 men, age range: 54–74) was randomly selected from a set of 6,000 available baseline scans from the NLST. Scans were selected with a 1:2 ratio of deaths:survivors to match the overall ratio in our data set. All scans were acquired during inspiratory breath-hold without contrast enhancement. Scans were acquired in 31 different hospitals with 120 or 140 kVp tube voltage and 30–160 mAs tube current. Axial image slices were reconstructed with varying kernels, varying thickness (1.00–3.00 mm), varying increments (0.63–3.00 mm), and varying in-plane resolutions (0.49–0.98 mm per voxel). In our study, scans with fewer than 100 slices or with slices thicker than 3 mm were not considered, because they were not adequate for calcium scoring. Furthermore, the scans were resampled to 3.00 mm slice thickness and 1.50 mm slice increment to make them suitable for calcium scoring.
The reference standard was defined by one of three observers who manually identified CAC lesions in the scans. Following the clinical procedure, CAC lesions were defined as 26-connected components containing voxels above the 130 HU threshold with a minimum size of 1.5 mm³. For each subject, Agatston and volume calcium scores were computed. These scores were used as the reference standard for development, training, and evaluation of the method. Based on the Agatston score, each subject was assigned to one of five cardiovascular risk groups: very low (<1), low (1–10), moderate (10–100), moderately high (100–400), and high (>400).
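The assignment of a subject to a risk group can be sketched as a simple lookup on the total Agatston score. This is a minimal illustration only: the function name is hypothetical, and the handling of scores that fall exactly on a boundary (1, 10, 100, 400) is an assumption, since the text gives only the ranges.

```python
def risk_category(agatston_score: float) -> str:
    """Map a total Agatston score to one of the five risk groups.

    Boundary handling is an assumption: a score exactly on a boundary
    is placed in the lower of the two adjacent groups, except for 1,
    which is placed in "low".
    """
    if agatston_score < 1:
        return "very low"
    if agatston_score <= 10:
        return "low"
    if agatston_score <= 100:
        return "moderate"
    if agatston_score <= 400:
        return "moderately high"
    return "high"


if __name__ == "__main__":
    for score in (0.0, 5.0, 56.0, 250.0, 1200.0):
        print(score, "->", risk_category(score))
```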
We propose a ConvNet design to directly quantify CAC from axial image slices using regression. This mimics clinical analysis, where CAC burden is quantified in axial slices with the Agatston score for CVD risk categorization [4, 1]. The score is determined as follows: in each slice, the area of each CAC lesion is multiplied by a weight factor that is defined by the maximum CT number in the lesion (1: 130–200 HU, 2: 200–300 HU, 3: 300–400 HU, 4: ≥400 HU). The total patient Agatston score is obtained by summing the CAC lesion scores over all slices. Alternatively, CAC can be quantified with the CAC volume score (i.e. the total volume of all CAC lesions in the scan). The volume score has been shown to be more reproducible and, although it has no clinical implications, it is often calculated.
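The per-slice scoring rule described above can be written out as a short sketch. It assumes that lesions have already been identified and that each lesion in a slice is summarized by its area in mm² and its maximum CT number in HU; the function names and the input layout are hypothetical and serve only to illustrate the arithmetic.

```python
def agatston_weight(max_hu: float) -> int:
    """Weight factor defined by the maximum CT number in a lesion."""
    if max_hu < 130:
        return 0  # below the calcium threshold, not scored
    if max_hu < 200:
        return 1
    if max_hu < 300:
        return 2
    if max_hu < 400:
        return 3
    return 4


def slice_agatston(lesions):
    """Score one axial slice; lesions is a list of (area_mm2, max_hu)."""
    return sum(area * agatston_weight(max_hu) for area, max_hu in lesions)


def total_agatston(slices):
    """Total score: sum of the lesion scores over all slices."""
    return sum(slice_agatston(lesions) for lesions in slices)


def volume_score(n_voxels: int, voxel_volume_mm3: float) -> float:
    """Volume score: total volume of all CAC voxels in the scan."""
    return n_voxels * voxel_volume_mm3


if __name__ == "__main__":
    # Two slices: one lesion in the first, two lesions in the second.
    example = [[(4.0, 250.0)], [(2.5, 150.0), (3.0, 450.0)]]
    print(total_agatston(example))  # 4.0*2 + 2.5*1 + 3.0*4
```

In this formulation the per-slice value returned by `slice_agatston` corresponds to the quantity that a per-slice regression target would represent.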
To facilitate analysis within the heart region only, a bounding box around the heart is first extracted using a method described by De Vos et al. This method uses three independent ConvNets that each determine presence of the heart in axial, coronal, or sagittal image slices. The combination of these per-slice probabilities yields a 3D bounding box around the heart.
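One way to combine per-slice probabilities into a 3D bounding box is sketched below. The rule used here (threshold each probability profile at 0.5 and keep the longest contiguous run of slices) is an assumption made for illustration and is not necessarily the rule of the cited localization method; all names are hypothetical.

```python
def longest_run(probs, threshold=0.5):
    """Return (start, stop) of the longest contiguous run of slices whose
    probability exceeds the threshold; stop is exclusive.
    Returns None if no slice exceeds the threshold."""
    best = None
    start = None
    # A sentinel below any threshold closes a run that reaches the last slice.
    for i, p in enumerate(list(probs) + [float("-inf")]):
        if p > threshold and start is None:
            start = i
        elif p <= threshold and start is not None:
            if best is None or i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


def heart_bounding_box(axial_probs, coronal_probs, sagittal_probs, threshold=0.5):
    """Combine the three per-orientation profiles into one 3D box.

    Axial slices give the z extent, coronal slices the y extent and
    sagittal slices the x extent. Returns None if any extent is missing.
    """
    profiles = (axial_probs, coronal_probs, sagittal_probs)
    extents = [longest_run(p, threshold) for p in profiles]
    if any(extent is None for extent in extents):
        return None
    return {"z": extents[0], "y": extents[1], "x": extents[2]}


if __name__ == "__main__":
    box = heart_bounding_box(
        [0.1, 0.9, 0.8, 0.2, 0.7], [0.6, 0.7, 0.9], [0.0, 0.0, 0.95]
    )
    print(box)
```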
Axial image slices cropped to the cardiac bounding box are used as input for a dedicated regression ConvNet (Figure 1). The ConvNet contains six layers of 3×3 convolution kernels with 2×2 max pooling, two fully connected layers, and thereafter a single output node. Batch normalization is applied after each layer, with exponential linear units used for activation. To enable correct computation of the area or volume of calcified lesions, voxel dimensions are appended to the first fully connected layer. The regression ConvNet is trained to output either an Agatston or a CAC volume score.
To train the ConvNet, 1,158 images were randomly selected. Among them 772 were used to design and 386 to validate the network. An independent set of 388 images was selected to test the method and was not used during method development in any way.
Input image slices were first cropped to the cardiac bounding box and were thereafter zero-padded to obtain a fixed input size for the regression ConvNet.
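The cropping and padding step can be illustrated with a small sketch on a slice stored as a list of rows. The fixed input size, the padding value and the placement of the crop in the top-left corner are assumptions made for this example; the function name is hypothetical.

```python
def crop_and_pad(slice_2d, box, size, fill=0):
    """Crop a 2D slice to box and zero-pad it to a fixed size.

    slice_2d: list of rows (each a list of pixel values)
    box:      (y0, y1, x0, x1), stop indices exclusive
    size:     (height, width) of the fixed network input (assumed values)
    The crop is placed in the top-left corner; this placement is an assumption.
    """
    y0, y1, x0, x1 = box
    height, width = size
    cropped = [row[x0:x1] for row in slice_2d[y0:y1]]
    if len(cropped) > height or (cropped and len(cropped[0]) > width):
        # A crop larger than the fixed input cannot be represented.
        raise ValueError("cropped region exceeds the fixed input size")
    padded = [list(row) + [fill] * (width - len(row)) for row in cropped]
    padded += [[fill] * width for _ in range(height - len(padded))]
    return padded


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    for row in crop_and_pad(image, (0, 2, 1, 3), (3, 4)):
        print(row)
```

Raising an error for oversized crops is one possible design choice; it makes explicit that a heart larger than the fixed input size cannot be processed without changing the input dimensions.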
The regression ConvNet was trained in 50 epochs with mini-batches of 100 randomly ordered image slices. After every epoch the performance was evaluated on the validation set, and the optimal network was chosen based on the training and the validation errors, which were defined as the mean of absolute differences between target and predicted values. Adam was used as the optimizer with a learning rate of , a first-moment exponential decay rate of 0.9, and a second-moment exponential decay rate of 0.999.
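For reference, the error measure and a single Adam update for one scalar parameter can be sketched as follows. The decay rates default to the values stated above; the learning rate passed in the example and the small constant `eps` are placeholders chosen for illustration and are not the values used in the experiments.

```python
import math


def mean_absolute_error(targets, predictions):
    """Mean of absolute differences between target and predicted values."""
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)


def adam_step(param, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update of a scalar parameter.

    m, v: running first and second moment estimates
    t:    1-based step counter, used for bias correction
    lr:   learning rate (placeholder; supplied by the caller)
    """
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    m_hat = m / (1.0 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)  # bias-corrected second moment
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


if __name__ == "__main__":
    print(mean_absolute_error([0.0, 10.0, 100.0], [1.0, 8.0, 100.0]))
    print(adam_step(param=1.0, grad=2.0, m=0.0, v=0.0, t=1, lr=0.01))
```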
Four experiments were performed for both Agatston and volume scoring. In the first experiment, a regression ConvNet was trained with image slices as input and the per-slice manually determined CAC score as the target regression value. The ConvNet was trained with separate kernels per convolutional layer. In the second experiment, the same setup was used but with shared kernels (shared weights but separate biases per kernel) to limit the number of parameters. In the third experiment, the network with separate kernels was trained again, but with log-transformed target regression values computed from the reference CAC scores. In the final experiment, the same log-transformed target regression values were used with shared kernels across layers. The log transform was performed to optimize the networks for CVD risk stratification. Given that the total per-patient Agatston scores are non-linearly stratified into CVD risk categories, more importance was assigned to lower Agatston scores: the log transform induces higher penalties for errors at lower values, thus forcing higher precision for lower calcium burden.
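The effect of a log transform on the training penalty can be illustrated with a small numerical sketch. The exact form of the transform is an assumption here: the example uses log(1 + y), which is one common choice that remains defined for a score of zero, and the function names are hypothetical.

```python
import math


def log_transform(score: float) -> float:
    """Assumed transform log(1 + y) of a reference CAC score y."""
    return math.log1p(score)


def inverse_log_transform(value: float) -> float:
    """Map a prediction in log space back to a CAC score."""
    return math.expm1(value)


if __name__ == "__main__":
    # The same absolute error of 5 score units, at a low and at a high score.
    low_penalty = abs(log_transform(5.0) - log_transform(0.0))
    high_penalty = abs(log_transform(405.0) - log_transform(400.0))
    print(f"penalty at low score:  {low_penalty:.4f}")
    print(f"penalty at high score: {high_penalty:.4f}")
```

Under this assumed transform, an error of five score units near zero is penalized far more heavily than the same error near 400, which is the behavior the log-transformed experiments rely on for risk stratification.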
The total subject Agatston and volume scores obtained automatically by the ConvNet regressor were compared with manually determined scores. A two-way intraclass correlation coefficient (ICC) for absolute agreement with 95% confidence interval was computed. In addition, for each subject, a CVD risk category was determined based on the Agatston score. Agreement between reference and automatic CVD risk categorization was assessed using accuracy and Cohen's linearly weighted kappa.
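Accuracy and the linearly weighted kappa can be computed from paired category labels as sketched below. Risk categories are assumed to be encoded as integers 0 to 4 in order of increasing risk; the function names are hypothetical, and the ICC is not covered by this sketch.

```python
def accuracy(reference, predicted):
    """Fraction of subjects assigned to the same category."""
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


def linear_weighted_kappa(reference, predicted, n_categories):
    """Cohen's kappa with linear disagreement weights |i - j| / (k - 1).

    Undefined (division by zero) when the expected disagreement is zero,
    e.g. when both raters place every subject in the same single category.
    """
    n = len(reference)
    observed = [[0.0] * n_categories for _ in range(n_categories)]
    for r, p in zip(reference, predicted):
        observed[r][p] += 1.0 / n
    ref_marginal = [sum(row) for row in observed]
    pred_marginal = [
        sum(observed[i][j] for i in range(n_categories))
        for j in range(n_categories)
    ]
    disagreement_observed = 0.0
    disagreement_expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = abs(i - j) / (n_categories - 1)
            disagreement_observed += weight * observed[i][j]
            disagreement_expected += weight * ref_marginal[i] * pred_marginal[j]
    return 1.0 - disagreement_observed / disagreement_expected


if __name__ == "__main__":
    reference = [0, 1, 2, 3, 4]
    predicted = [0, 1, 2, 3, 3]
    print(accuracy(reference, predicted))
    print(linear_weighted_kappa(reference, predicted, 5))
```

With linear weights, a disagreement between adjacent categories is penalized less than a disagreement between distant ones, which suits ordered risk groups.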
6 Results and Discussion
Six scans were excluded from the experiments (3 scans from the training, 2 from the validation and 1 from the test set) because heart localization failed in one scan, and in the other scans the heart was larger than the allowed maximum size of the input. The training set contained 47,007 image slices.
Table 1 lists ICC values for Agatston and CAC volume scores for each experiment. The results show that the best performance was achieved when weights were not shared between kernels across layers and no transformation of the target values was performed. Furthermore, a slightly better performance was achieved in prediction of Agatston than of CAC volume scores. Given that the Agatston score depends on size (area) and on maximum intensity of calcifications, while the volume score depends on size (number of voxels) only, this is not a surprising result. Namely, ConvNets perform the analysis by determining texture filters.
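Table 1: ICC with 95% confidence interval between automatically and manually determined Agatston and CAC volume scores for each experiment.

| Experiment | Shared kernels | Log-transformed target | Agatston score | Volume score |
|---|---|---|---|---|
| i | | | 0.98 (0.97–0.98) | 0.97 (0.96–0.97) |
| ii | ✓ | | 0.97 (0.97–0.98) | 0.96 (0.95–0.97) |
| iii | | ✓ | 0.96 (0.95–0.97) | 0.95 (0.94–0.96) |
| iv | ✓ | ✓ | 0.94 (0.93–0.95) | 0.93 (0.92–0.95) |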
Table 2 lists accuracy and Cohen’s linearly weighted kappa showing agreement in CVD risk category assignment between manually defined and automatically predicted Agatston scores. These results also show that the best performance was achieved when weights were not shared between kernels across layers. However, unlike in CAC score prediction, better results were achieved when the target Agatston scores were log transformed. As hypothesized, log transformation increased precision and accuracy of low valued Agatston scores and therefore of risk categorization.
To illustrate that the proposed method obtains results by detecting CAC, transposed convolutions were performed to obtain feature maps back to the third convolutional layer. A joint response of the network was plotted as a heatmap (Figure 2) to serve as an indication of the area of interest that determined the predicted CAC score.
Earlier studies performing automatic calcium scoring in cardiac or chest scans first identified calcified lesions and subsequently computed calcium scores. Several methods developed for CAC scoring in cardiac CT scans reported accurate quantification but are typically not directly applicable for calcium scoring in chest CT. To the best of our knowledge, three methods were developed for CAC scoring in chest CT scans acquired in lung cancer screening settings. The studies evaluated performance of their methods differently and it is therefore not possible to directly compare their performance. Xie et al. analyzed a set of 41 scans from the NLST and reported 59% agreement in the assignment of subjects to four CVD risk categories (0–10, 10–100, 100–400, >400). Išgum et al. and Lessmann et al. performed calcium scoring in scans from the NELSON trial and reported agreement in the assignment of subjects to five CVD risk groups (<1, 1–10, 10–100, 100–400, >400), with high accuracies of 82% and 84%, respectively. However, both studies analyzed scans that were acquired with a standardized protocol. In this study, scans from 31 hospitals were used, acquired on 31 different scanners with a variety of protocols. This resulted in a very diverse set of chest scans.
Training the networks took approximately 7 hours per network, but testing took less than 2 seconds per scan on a state-of-the-art GPU, including localization of the heart.
The ConvNet predicting the calcium score analyzed axial slices cropped to the bounding box around the heart. Although the method to generate these bounding boxes was robust, as determined by visual inspection, in future work we aim to investigate performance of the method on full axial slices. Furthermore, the method has been evaluated using chest CT scans only. In future work we will evaluate its application to dedicated cardiac scans and other scans that visualize the heart.
A method for direct determination of coronary artery calcium scores from CT images has been presented. Unlike previous methods, this method allows for real-time quantification of coronary calcium burden without the need for identification or segmentation of individual coronary artery calcifications. The results demonstrate that the method is able to assign subjects undergoing lung cancer screening to CVD risk groups with high accuracy.
The authors thank the National Cancer Institute for access to NCI’s data collected by the National Lung Screening Trial. The statements contained herein are solely those of the authors and do not represent or imply concurrence or endorsement by NCI.
- [1] Agatston, A.S., Janowitz, W.R., Hildner, F.J., Zusmer, N.R., Viamonte, M., Detrano, R.: Quantification of coronary artery calcium using ultrafast computed tomography. J. Am. Coll. Cardiol. 15(4), 827–832 (1990)
- [2] Chiles, C., Duan, F., Gladish, G.W., Ravenel, J.G., Baginski, S.G., Snyder, B.S., DeMello, S., Desjardins, S.S., Munden, R.F., the NLST Study Team, F.: Association of coronary artery calcification and mortality in the national lung screening trial: A comparison of three scoring methods. Radiology 276(1), 82–90 (2015)
- [3] De Vos, B.D., Wolterink, J.M., De Jong, P.A., Viergever, M.A., Isgum, I.: 2D image classification for 3D anatomy localization; employing deep convolutional neural networks. In: SPIE Medical Imaging (February 2016)
- [4] Detrano, R., Guerci, A.D., Carr, J.J., Bild, D.E., Burke, G., Folsom, A.R., Liu, K., Shea, S., Szklo, M., Bluemke, D.A., et al.: Coronary calcium as a predictor of coronary events in four racial or ethnic groups. N. Engl. J. Med. 358(13), 1336–1345 (2008)
- [5] Isgum, I., Prokop, M., Niemeijer, M., Viergever, M.A., van Ginneken, B.: Automatic coronary calcium scoring in low-dose chest computed tomography. IEEE Trans Med Imaging 31(12), 2322–2334 (2012)
- [6] Jacobs, P.C., Prokop, M., van der Graaf, Y., Gondrie, M.J., Janssen, K.J., de Koning, H.J., Isgum, I., van Klaveren, R.J., Oudkerk, M., van Ginneken, B., Mali, W.P.: Comparing coronary artery calcium and thoracic aorta calcium for prediction of all-cause mortality and cardiovascular events on low-dose non-gated computed tomography in a high-risk population of heavy smokers. Atherosclerosis 209(2), 455–462 (2010)
- [7] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: International Conference on Learning Representation (2015)
- [8] van Klaveren, R.J., Oudkerk, M., Prokop, M., Scholten, E.T., Nackaerts, K., Vernhout, R., van Iersel, C.A., van den Bergh, K.A.M., van ’t Westeinde, S., van der Aalst, C., et al.: Management of lung nodules detected by volume CT scanning. N Engl J Med 361(23), 2221–2229 (2009)
- [9] Lessmann, N., Isgum, I., Setio, A., De Vos, B., Ciompi, F., de Jong, P., Oudkerk, M., Mali, W., Viergever, M., van Ginneken, B.: Deep convolutional neural networks for automatic coronary calcium scoring in a screening study with low-dose chest CT. In: SPIE Medical Imaging. pp. 9785–36 (2016)
- [10] Rutten, A., Isgum, I., Prokop, M.: Calcium scoring with prospectively ECG-triggered CT: Using overlapping datasets generated with MPR decreases inter-scan variability. Eur J Radiol 80(1), 83–88 (2010)
- [11] Wolterink, J.M., Leiner, T., De Vos, B.D., Coatrieux, J.L., Kelm, B.M., Kondo, S., Salgado, R.A., Shahzad, R., Shu, H., Snoeren, M., Takx, R.A.P., van Vliet, L.J., van Walsum, T., Willems, T.P., Yang, G., Zheng, Y., Viergever, M.A., Išgum, I.: An evaluation of automatic coronary artery calcium scoring methods with cardiac ct using the orcascore framework. Medical Physics 43(5), 2361–2373 (2016)
- [12] Xie, Y., Cham, M.D., Henschke, C., Yankelevitz, D., Reeves, A.P.: Automated coronary artery calcification detection on low-dose chest CT images. In: Proc. SPIE Med. Imag. vol. 9035, p. 90350F (2014)
- [13] Yeboah, J., McClelland, R.L., Polonsky, T.S., Burke, G.L., Sibley, C.T., O’Leary, D., Carr, J.J., Goff, D.C., Greenland, P., Herrington, D.M.: Comparison of novel risk markers for improvement in cardiovascular risk assessment in intermediate-risk individuals. JAMA 308(8), 788–795 (2012)