Automatic Calcium Scoring in Cardiac and Chest CT Using DenseRAUnet

by   Jiechao Ma, et al.

Cardiovascular disease (CVD) is a common and serious threat to human health, characterized by high prevalence, disability and mortality. The amount of coronary artery calcification (CAC) is an effective factor for CVD risk evaluation. Conventionally, CAC is quantified using ECG-synchronized cardiac CT but rarely from general chest CT scans. However, compared with ECG-synchronized cardiac CT, chest CT is more prevalent and economical in clinical practice. To address this, we propose an automatic method based on Dense U-Net to segment coronary calcium pixels on both types of CT scans. Our contribution is two-fold. First, we propose a novel network called DenseRAUnet, which takes advantage of Dense U-Net, ResNet and atrous convolutions. We demonstrate the robustness and generalizability of our model by training it exclusively on chest CT while testing it on both types of CT scans. Second, we design a loss function combining a bootstrap loss with an IoU loss to balance foreground and background classes. DenseRAUnet is trained in a 2.5D fashion and tested on a private dataset consisting of 144 scans. Results show an F1-score of 0.75, with an accuracy of 0.83 in predicting cardiovascular disease risk.


1 Introduction

Cardiovascular disease (CVD) has become one of the diseases with the highest mortality, and the amount of coronary artery calcification (CAC) is a strong indicator of CVD risk [1]. In clinical practice, CAC is quantified by the Agatston score on dedicated cardiac CT scans, in which an expert manually identifies CAC lesions.

To assist medical professionals, previous work based on classical machine learning has attempted to design computer-aided (CAD) methods for computing the CAC score. Durlak et al. [2] applied an atlas-based feature approach in combination with a random forest classifier, which is used to incorporate fuzzy spatial knowledge from offline data. Isgum et al. [3] employed a nearest neighbor classifier directly, as well as a two-stage classification with nearest neighbor and support vector machine classifiers. Further related approaches can be found in [4, 5, 6].

In recent years, convolutional neural networks (CNNs) have shown great success in data-driven computer vision, especially in image classification tasks. Fully convolutional networks (FCNs), an extension of CNNs, have likewise obtained state-of-the-art performance on segmentation problems. In the context of medical image segmentation, specifically cardiac calcification segmentation, algorithms based on deep learning have shown promise. Wolterink et al. [7] first attempted to apply CNNs to CAC scoring in contrast-enhanced cardiac CT, with a two-stage structure in which only one stage used deep learning. More recently, some works used a two-stage deep learning structure [8, 9], with the first stage identifying CAC-suspected voxels and the second stage identifying CAC more precisely. Shadmi et al. [10] employed Dense-FCN, a design different from the two-stage methods, to segment the lesions directly in cardiac CT. However, all of the automatic CAC scoring approaches above are designed for either cardiac or chest CT only.

More recently, performing multiple screenings in one CT session has become a trend in clinical practice. Huang et al. [11] presented an automatic method with two CNNs that directly computes the CAC score in both cardiac and chest CT scans. In addition, according to Wolterink et al. [9], 2.5D input has a great advantage over 3D input in CAC scoring, as the number of parameters is greatly reduced while spatial information is retained. Both Lessmann et al. and Wolterink et al. [8, 9] used 2.5D ConvNets that combine features from three identical 2D ConvStacks with shared weights, each processing an input patch from a different orthogonal viewing direction (axial, sagittal and coronal). To our knowledge, no previous work has applied the efficient 2.5D FCN architecture to multiple types of non-enhanced CT.

In this work, we propose an automatic method for CAC scoring on both ECG-synchronized cardiac CT and chest CT. Unlike methods that require two cascaded networks to calculate the CAC score, our network directly segments the calcified voxels, from which the CAC score is obtained. We also adopt a 2.5D patch input to reduce the computational overhead of 3D input. Instead of feeding patches from the axial, sagittal and coronal directions as in previous 2.5D methods [8, 9], our network takes 9-channel stacks of images and predicts the segmentation of the center slice from the corresponding 2D label. We applied our method to a private dataset composed of 44 cardiac CT scans and 805 chest CT scans. In comparison with experts’ manual annotations, our algorithm achieved competitive results.

2 Materials and Methods

2.1 Data

A dataset of 849 CT scans, consisting of 805 chest CT scans and 44 cardiac CT scans, was collected from several medical centers in China. The scans were acquired on CT scanners from Philips, GE and Siemens. Each CT scan contains a sequence of thin-section slices (slice spacing ranging from 1.0 to 3.0 mm). CAC lesions were manually labeled by three experienced radiologists from different centers.

We additionally collected 144 CT scans as a test set, comprising chest CT scans and cardiac CT scans from medical centers in China. To evaluate network performance, lesions in the test set were delineated by experienced radiologists.

2.2 Data preprocessing

Since the collected dataset contains chest CT scans and cardiac CT scans of various sizes, we process all images as follows. First, we resize all CT images to pixel resolution. Second, we randomly crop and then resize images to , where maximum size of the cropped image is . We then select nine consecutive processed slices as the input of our network; the label for such an input is the ground-truth label of its middle slice. We also process all ground-truth labels, altering a pixel's label when its corresponding CT value is lower than 130 HU.
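The slice stacking and label clean-up described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function names are hypothetical, "altering" a label is assumed to mean resetting it to background, and slices without a full set of eight neighbours are simply skipped (padding would be an equally plausible choice). Resizing and random cropping are omitted.

```python
import numpy as np

HU_THRESHOLD = 130  # calcium threshold in Hounsfield units, as stated in the text
NUM_SLICES = 9      # consecutive slices per network input, as stated in the text


def clean_labels(volume_hu, labels):
    """Reset labels to background where the CT value is below the threshold.

    Assumption: "altering" a label means setting it to 0 (background).
    """
    cleaned = labels.copy()
    cleaned[volume_hu < HU_THRESHOLD] = 0
    return cleaned


def make_stacks(volume_hu, labels):
    """Yield (9-slice input, middle-slice label) pairs from one scan.

    volume_hu and labels have shape (num_slices, height, width).
    Slices too close to either end of the scan are skipped.
    """
    half = NUM_SLICES // 2
    for center in range(half, volume_hu.shape[0] - half):
        stack = volume_hu[center - half:center + half + 1]
        yield stack, labels[center]


# Toy scan: 12 slices of 4x4 pixels with random HU values and labels.
rng = np.random.default_rng(0)
volume = rng.integers(-1000, 1000, size=(12, 4, 4))
labels = clean_labels(volume, rng.integers(0, 2, size=(12, 4, 4)))
pairs = list(make_stacks(volume, labels))
```

Each input keeps the slice axis as its channel axis, so the network sees a 9-channel 2D image while predicting a single 2D mask.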

2.3 DenseRAUnet for segmentation

We propose a novel FCN architecture based on dense U-Net for calcification segmentation, called DenseRAUnet. The network consists of two main components: (1) a basic network for feature extraction, and (2) three task-specific sub-network structures: the Residual Atrous Unit (RAU), the scSE block and the Extra Dense Block (EDB). Fig. 1 depicts our proposed DenseRAUnet.

Figure 1: The overall structure of DenseUnet and details in Residual Atrous Unit

The basic network is an encoder-decoder architecture, similar to dense U-Net. We adopt a backbone network (DenseNet-121) as the encoder sub-network. The decoder sub-network consists of three decoder modules. Each decoder module is an upsampling block followed by an scSE block, where the upsampling block contains a deconvolution layer and two convolution layers, which are followed by a Batch Normalization (BN) layer and a ReLU activation function.

Residual Atrous Unit. Accurately segmenting calcified areas of various sizes may require different combinations of local and global information, so we consider a simple skip connection insufficient for this complex segmentation task. Inspired by ASPP [12] and the idea of Inception [13], we design a lateral connection called the Residual Atrous Unit (RAU). This module is a residual block that captures multi-scale information by combining several convolutional layers with different dilation rates in parallel. As shown in Fig. 1, each RAU concatenates three dilated convolution layers with dilation rates of 2, 4, and 8.
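To make the structure concrete, here is a small NumPy sketch of a residual block with three parallel dilated branches. Only the dilation rates (2, 4, 8), the concatenation and the residual connection come from the description above; the 3x3 kernel size, the ReLU after each branch, the 1x1 fusion convolution and all function names are assumptions made for illustration.

```python
import numpy as np


def dilated_conv3x3(x, weight, dilation):
    """3x3 convolution with the given dilation and zero 'same' padding.

    x: (C_in, H, W); weight: (C_out, C_in, 3, 3); returns (C_out, H, W).
    """
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (dilation, dilation), (dilation, dilation)))
    out = np.zeros((weight.shape[0], h, w))
    for ky in range(3):
        for kx in range(3):
            # Shifted view of the input seen by kernel tap (ky, kx).
            patch = padded[:, ky * dilation:ky * dilation + h,
                           kx * dilation:kx * dilation + w]
            out += np.einsum("oc,chw->ohw", weight[:, :, ky, kx], patch)
    return out


def residual_atrous_unit(x, branch_weights, fuse_weight, dilations=(2, 4, 8)):
    """Parallel dilated branches, concatenated, fused by a 1x1 conv, plus skip."""
    branches = [np.maximum(dilated_conv3x3(x, w, d), 0.0)    # assumed ReLU
                for w, d in zip(branch_weights, dilations)]
    merged = np.concatenate(branches, axis=0)                # channel concat
    fused = np.einsum("oc,chw->ohw", fuse_weight, merged)    # assumed 1x1 conv
    return x + fused                                         # residual connection


# Toy feature map with 4 channels; each branch produces 2 channels.
rng = np.random.default_rng(0)
channels, mid = 4, 2
x = rng.standard_normal((channels, 16, 16))
branch_weights = [rng.standard_normal((mid, channels, 3, 3)) * 0.1
                  for _ in range(3)]
fuse_weight = rng.standard_normal((channels, 3 * mid)) * 0.1
y = residual_atrous_unit(x, branch_weights, fuse_weight)
```

With a 3x3 kernel, a dilation of 8 makes one layer cover a 17x17 window, which is how the unit sees large and small calcifications at the same depth.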

scSE block. To take full advantage of local and global information, we add an scSE block, introduced in [14], to the decoder sub-network; it recalibrates the feature maps separately along the channel and spatial dimensions.
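As a rough illustration of this kind of recalibration, the NumPy sketch below applies a channel gate and a spatial gate to one feature map and adds the two results. The weight shapes, the reduction ratio of 2, the bias-free layers and the function names are assumptions for the example, not details taken from this paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def scse(x, w_reduce, w_expand, w_spatial):
    """Concurrent spatial and channel squeeze-and-excitation on one feature map.

    x: (C, H, W); w_reduce: (C // r, C); w_expand: (C, C // r); w_spatial: (C,).
    """
    # Channel gate: global average pool, bottleneck, sigmoid per channel.
    squeezed = x.mean(axis=(1, 2))                                   # (C,)
    channel_gate = sigmoid(w_expand @ np.maximum(w_reduce @ squeezed, 0.0))
    channel_out = x * channel_gate[:, None, None]
    # Spatial gate: 1x1 convolution to a single map, sigmoid per pixel.
    spatial_gate = sigmoid(np.einsum("c,chw->hw", w_spatial, x))     # (H, W)
    spatial_out = x * spatial_gate[None, :, :]
    # Assumed combination: element-wise sum of the two recalibrated maps.
    return channel_out + spatial_out


rng = np.random.default_rng(0)
features = rng.standard_normal((4, 8, 8))
recalibrated = scse(features,
                    rng.standard_normal((2, 4)),
                    rng.standard_normal((4, 2)),
                    rng.standard_normal(4))
```

The output has the same shape as the input, so the block can follow any upsampling block in the decoder without changing the layers around it.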

Extra Dense Block. To avoid wasting the image features extracted from the input images, we insert an Extra Dense Block (EDB) in the first skip connection. Such a block makes better use of shallow information, which does not yet represent the input image in a high-dimensional space, by adding more nonlinearity to the first long connection.

2.4 Loss Function

Inter-class imbalance is a common problem when using deep learning methods for image segmentation, and even more so in medical image segmentation. To address it, we propose a new loss function that combines a Bootstrap Loss and an IoU Loss.

Bootstrap Loss. When we train an FCN, even though the images are cropped, there may be thousands of labeled pixels to predict. However, many of them are easily distinguishable, and continuing to learn from these pixels does not improve model performance. In the context of medical image segmentation, most such pixels are marked as background. For this reason, we design a weighted bootstrap loss, which not only forces the network to focus on hard pixels but also balances positive and negative pixels during training.

Suppose there is only one processed image per mini-batch and that the label space contains only two categories. For each pixel to be predicted, the loss uses the ground-truth label of that pixel and the predicted probability that the pixel belongs to each category. An indicator function, equal to one when its condition holds and zero otherwise, selects the pixels that contribute to the loss. In other words, we keep all positive pixels and drop negative pixels when they are too easy for the current model, i.e. when their predicted probability is greater than a threshold. In practice, we want positive and negative pixels to be balanced, hence we add two trade-off coefficients that weight the positive and negative terms.
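The behaviour described above can be sketched in NumPy as a cross-entropy-style loss that keeps every positive pixel and only the hard negatives. This is one plausible reading, not the paper's exact formula: the names `threshold`, `alpha` and `beta`, their default values, and the normalisation by the number of retained pixels are all assumptions.

```python
import numpy as np


def weighted_bootstrap_loss(prob_fg, labels, threshold=0.9,
                            alpha=1.0, beta=1.0, eps=1e-7):
    """Cross-entropy over all positives and over hard negatives only.

    prob_fg: predicted foreground probability per pixel.
    labels:  ground-truth labels per pixel (1 = calcium, 0 = background).
    A negative pixel counts as "easy" and is dropped when its predicted
    background probability exceeds `threshold`.
    """
    prob_fg = np.clip(np.asarray(prob_fg, dtype=float), eps, 1.0 - eps)
    positive = np.asarray(labels) == 1
    hard_negative = ~positive & ((1.0 - prob_fg) <= threshold)

    positive_loss = -np.log(prob_fg[positive]).sum()
    negative_loss = -np.log(1.0 - prob_fg[hard_negative]).sum()

    # Assumed normalisation: average over the pixels that were kept.
    kept = positive.sum() + hard_negative.sum()
    return (alpha * positive_loss + beta * negative_loss) / max(kept, 1)


# One positive pixel, one easy negative (dropped), one hard negative (kept).
loss = weighted_bootstrap_loss([0.5, 0.05, 0.5], [1, 0, 0])
```

Raising `alpha` relative to `beta` shifts the emphasis toward the rare calcium pixels, which is the balancing role the trade-off coefficients play in the text.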

IoU Loss. The bootstrap loss is similar to the cross-entropy loss: it focuses on the prediction for each individual pixel and ignores the relationship between adjacent ones. To better capture the lesion boundary, we add an IoU term to the loss function, which exploits this relationship. To keep the two losses on the same order of magnitude, we use an exponential form of the IoU, computed over all pixels to predict from the predicted probability and the ground-truth label of each pixel.
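A differentiable ("soft") IoU over predicted probabilities can be sketched as below. The exponential rescaling mentioned in the text is not reproduced here; the sketch swaps in the plain complement 1 - IoU as a stand-in, and the function names and the smoothing constant `eps` are assumptions.

```python
import numpy as np


def soft_iou(prob_fg, labels, eps=1e-7):
    """Soft intersection-over-union between probabilities and binary labels."""
    prob_fg = np.asarray(prob_fg, dtype=float)
    labels = np.asarray(labels, dtype=float)
    intersection = (prob_fg * labels).sum()
    union = prob_fg.sum() + labels.sum() - intersection
    # eps keeps the ratio defined when both prediction and label are empty.
    return (intersection + eps) / (union + eps)


def iou_loss(prob_fg, labels):
    """Stand-in loss: 1 - soft IoU (not the exponential form used in the paper)."""
    return 1.0 - soft_iou(prob_fg, labels)


# One of two foreground pixels is found and one background pixel is wrongly hit.
example_iou = soft_iou([1.0, 0.0, 1.0, 0.0], [1, 1, 0, 0])
```

Because every pixel enters both the intersection and the union, a single misplaced boundary pixel changes the loss for the whole lesion, unlike a per-pixel cross-entropy term.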

2.5 Post-processing

The final segmentation result of the network is obtained by applying a predefined threshold (here set to 0.5), and each lesion segmented by the network is considered a calcification candidate. Each candidate is then classified as CAC by thresholding at 130 HU and performing connected-component analysis. Since the CT slice thickness is mostly 1 mm, the final Agatston score for the whole volume is calculated with a corrected formula that accumulates, over every CT slice of the volume and every selected lesion in that slice, a contribution determined by the lesion's weighted intensity, the lesion area, and the slice spacing (mm).
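The post-processing steps can be sketched end to end as follows. The density weights (1 to 4 for peak attenuation bands starting at 130, 200, 300 and 400 HU) are the standard Agatston weights; the rescaling by slice spacing divided by 3 mm is an assumed form of the correction, and 4-connectivity, the function names and the absence of a minimum lesion size are likewise assumptions.

```python
from collections import deque

import numpy as np


def density_weight(max_hu):
    """Standard Agatston weight from a lesion's peak attenuation in HU."""
    if max_hu >= 400:
        return 4
    if max_hu >= 300:
        return 3
    if max_hu >= 200:
        return 2
    return 1 if max_hu >= 130 else 0


def lesions_in_slice(mask):
    """Yield 4-connected components of a 2D boolean mask as (row, col) lists."""
    seen = np.zeros(mask.shape, dtype=bool)
    for start in zip(*np.nonzero(mask)):
        if seen[start]:
            continue
        seen[start] = True
        queue, component = deque([start]), []
        while queue:
            r, c = queue.popleft()
            component.append((r, c))
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                inside = 0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                if inside and mask[nr, nc] and not seen[nr, nc]:
                    seen[nr, nc] = True
                    queue.append((nr, nc))
        yield component


def agatston_score(volume_hu, segmentation, pixel_area_mm2, slice_spacing_mm):
    """Sum weight * area over all lesions in all slices, then rescale."""
    total = 0.0
    for slice_hu, slice_mask in zip(volume_hu, segmentation):
        candidates = slice_mask.astype(bool) & (slice_hu >= 130)
        for component in lesions_in_slice(candidates):
            rows, cols = zip(*component)
            peak = slice_hu[list(rows), list(cols)].max()
            total += density_weight(peak) * len(component) * pixel_area_mm2
    # Assumed correction: rescale to the 3 mm slices of the classic protocol.
    return total * slice_spacing_mm / 3.0
```

Without such a rescaling, a lesion imaged with 1 mm slices would be counted roughly three times as often as in a 3 mm protocol and the score would be inflated accordingly.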

3 Experiments and Results

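Evaluation Metric. We evaluate the pixel-level segmentation performance of the network by the F1 score, the harmonic mean of precision and recall:

$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
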
We also define the CAC Rate as the proportion of patients whose CVD risk level is correctly predicted without post-processing, and the CAC filter Rate as the corresponding proportion with post-processing.
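Both metrics are simple to compute; the sketch below shows a pixel-level F1 score from two binary masks and a patient-level rate from counts. The function names are hypothetical, and returning 1.0 when both masks are empty is a convention chosen for the example.

```python
import numpy as np


def f1_score(pred_mask, true_mask):
    """Pixel-level F1 score between a predicted and a ground-truth mask."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    tp = int(np.sum(pred & true))
    fp = int(np.sum(pred & ~true))
    fn = int(np.sum(~pred & true))
    denominator = 2 * tp + fp + fn
    # Convention for this sketch: two empty masks count as a perfect match.
    return 2 * tp / denominator if denominator else 1.0


def correct_rate(num_correct, num_patients):
    """Proportion of patients whose risk level was predicted correctly."""
    return num_correct / num_patients


# 120 of 144 patients correct corresponds to a rate of about 0.83.
rate = round(correct_rate(120, 144), 2)
```

The form 2TP / (2TP + FP + FN) is algebraically the same as the harmonic mean of precision and recall, but avoids dividing by zero when there are no predicted positives.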

Implementation details. All networks in our experiments were trained from scratch with Gaussian weight initialization. During training, each mini-batch consisted of one processed image, and we trained for 25 epochs. For fast convergence, we employed the SGD optimizer with a momentum of 0.9. The initial learning rate was 0.001 and was decayed by a factor of 0.99 every 2000 iterations. The parameters of the loss function were set experimentally. We implemented all experiments with the deep learning toolkit MXNet and trained on an NVIDIA GTX 1080 GPU.
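The learning-rate schedule can be written as a step decay. The sketch reads the description as "multiply the rate by 0.99 after every 2000 iterations", which is an interpretation, and the function name is hypothetical.

```python
def learning_rate(iteration, base_lr=0.001, decay=0.99, step=2000):
    """Step-decayed learning rate at a given training iteration."""
    return base_lr * decay ** (iteration // step)


# Rate at the start of training and after the first three decay steps.
schedule = [learning_rate(i) for i in (0, 2000, 4000, 6000)]
```

With a batch size of one image, the number of decay steps per epoch depends directly on how many processed images an epoch contains.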

Ablation experiments. We use “Dense U-net & Bootstrap Loss” as the baseline for all experiments. To evaluate the effectiveness of the individual components of our method, we conducted ablation experiments. First, using the bootstrap loss, we compare the contribution of the three modules in the network. Second, we study the effect of the two loss functions by training our network with them.

Basic network   Bootstrap   RAU   EDB   scSE   IoU   F1-Score
Dense U-Net                                          0.65
Ours                                                 0.75
Table 1: Comparison of the performance of the basic network using different tricks

Tricks                      CAC No.   CAC filter No.   Patients   CAC Rate   CAC filter Rate
Dense U-net                 101       99               144        0.70       0.69
Dense U-net+RAU             104       111              144        0.72       0.77
Dense U-net+RAU+EDB         109       115              144        0.76       0.80
Dense U-net+RAU+EDB+scSE    109       117              144        0.76       0.81
Our proposed method         113       120              144        0.78       0.83
Table 2: Quantitative results of CAC Rate and CAC filter Rate for patients. Patients is the total number of patients; CAC No. and CAC filter No. are the numbers of patients whose risk level was predicted correctly by the model alone and by the model with post-processing, respectively.

Table 1 lists the F1 scores of Dense U-Net with different tricks, and Table 2 reports the CAC performance of the corresponding network architectures. All of the tricks increase the CAC Rate and CAC filter Rate relative to the baseline. Adding the RAU gives the most significant improvement: in Table 2 it raises the CAC filter Rate from 0.69 to 0.77, the largest single gain among the modules. Comparing the results in Table 1, our method yields the best performance, with an F1 score of 0.75 against 0.65 for the baseline. From Table 2 we can also conclude that post-processing according to the definition of the CAC score is important: for every configuration except the baseline, the CAC filter Rate exceeds the CAC Rate.

4 Conclusion

This paper proposed a deep learning-based algorithm for automatic calcium scoring. Our method consists of two core elements: (1) a novel fully convolutional network, DenseRAUnet, and (2) a loss function combining a bootstrap loss and an IoU loss. We trained our network in a 2.5D-patch fashion to reduce the number of parameters while preserving spatial information. Although trained solely on chest CT, our model achieved competitive and robust performance on both chest CT and cardiac CT, the latter having significantly higher resolution and smaller slice spacing than the training data, thanks to the residual atrous unit, which enlarges the receptive field with downward compatibility. We aim to further explore and extend our method to other medical image analysis challenges in future work.

Figure 2: Segmentation results on chest CT (top) and cardiac CT (bottom). From left to right: the segmentation result of our model without post-processing, the result with post-processing, and the ground truth.


  • [1] John A Rumberger, Bruce H Brundage, Daniel J Rader, and George Kondos. Electron beam computed tomographic coronary calcium scanning: a review and guidelines for use in asymptomatic persons. In Mayo Clinic Proceedings, volume 74, pages 243–252. Elsevier, 1999.
  • [2] Felix Durlak, Michael Wels, Chris Schwemmer, Michael Sühling, Stefan Steidl, and Andreas Maier. Growing a random forest with fuzzy spatial features for fully automatic artery-specific coronary calcium scoring. In International Workshop on Machine Learning in Medical Imaging, pages 27–35. Springer, 2017.
  • [3] Ivana Isgum, Mathias Prokop, Meindert Niemeijer, Max A Viergever, and Bram Van Ginneken. Automatic coronary calcium scoring in low-dose chest computed tomography. IEEE transactions on medical imaging, 31(12):2322–2334, 2012.
  • [4] Uday Kurkure, Deepak R Chittajallu, Gerd Brunner, Yen H Le, and Ioannis A Kakadiaris. A supervised classification-based method for coronary calcium detection in non-contrast ct. The international journal of cardiovascular imaging, 26(7):817–828, 2010.
  • [5] Rahil Shahzad, Theo van Walsum, Michiel Schaap, Alexia Rossi, Stefan Klein, Annick C Weustink, Pim J de Feyter, Lucas J van Vliet, and Wiro J Niessen. Vessel specific coronary artery calcium scoring: an automatic system. Academic radiology, 20(1):1–9, 2013.
  • [6] Jelmer M Wolterink, Tim Leiner, Richard AP Takx, Max A Viergever, and Ivana Išgum. An automatic machine learning system for coronary calcium scoring in clinical non-contrast enhanced, ecg-triggered cardiac ct. In Medical Imaging 2014: Computer-Aided Diagnosis, volume 9035, page 90350E. International Society for Optics and Photonics, 2014.
  • [7] Jelmer M Wolterink, Tim Leiner, Max A Viergever, and Ivana Išgum. Automatic coronary calcium scoring in cardiac ct angiography using convolutional neural networks. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 589–596. Springer, 2015.
  • [8] Nikolas Lessmann, Bram van Ginneken, Majd Zreik, Pim A de Jong, Bob D de Vos, Max A Viergever, and Ivana Išgum. Automatic calcium scoring in low-dose chest ct using deep neural networks with dilated convolutions. IEEE transactions on medical imaging, 37(2):615–625, 2018.
  • [9] Jelmer M Wolterink, Tim Leiner, Bob D de Vos, Robbert W van Hamersvelt, Max A Viergever, and Ivana Išgum. Automatic coronary artery calcium scoring in cardiac ct angiography using paired convolutional neural networks. Medical image analysis, 34:123–136, 2016.
  • [10] Ran Shadmi, Victoria Mazo, Orna Bregman-Amitai, and Eldad Elnekave. Fully-convolutional deep-learning based system for coronary calcium score prediction from non-contrast chest ct. In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), pages 24–28. IEEE, 2018.
  • [11] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4700–4708, 2017.
  • [12] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence, 40(4):834–848, 2018.
  • [13] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015.
  • [14] Abhijit Guha Roy, Nassir Navab, and Christian Wachinger. Concurrent spatial and channel ‘squeeze & excitation’in fully convolutional networks. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 421–429. Springer, 2018.