Automatic calcium scoring in low-dose chest CT using deep neural networks with dilated convolutions

11/01/2017, by Nikolas Lessmann et al., UMC Utrecht

Heavy smokers undergoing screening with low-dose chest CT are affected by cardiovascular disease as much as by lung cancer. Low-dose chest CT scans acquired in screening enable quantification of atherosclerotic calcifications and thus enable identification of subjects at increased cardiovascular risk. This paper presents a method for automatic detection of coronary artery, thoracic aorta and cardiac valve calcifications in low-dose chest CT using two consecutive convolutional neural networks. The first network identifies and labels potential calcifications according to their anatomical location and the second network identifies true calcifications among the detected candidates. This method was trained and evaluated on a set of 1744 CT scans from the National Lung Screening Trial. To determine whether any reconstruction or only images reconstructed with soft tissue filters can be used for calcification detection, we evaluated the method on soft and medium/sharp filter reconstructions separately. On soft filter reconstructions, the method achieved F1 scores of 0.89, 0.89, 0.67, and 0.55 for coronary artery, thoracic aorta, aortic valve and mitral valve calcifications, respectively. On sharp filter reconstructions, the F1 scores were 0.84, 0.81, 0.64, and 0.66, respectively. Linearly weighted kappa coefficients for risk category assignment based on per subject coronary artery calcium were 0.91 and 0.90 for soft and sharp filter reconstructions, respectively. These results demonstrate that the presented method enables reliable automatic cardiovascular risk assessment in all low-dose chest CT scans acquired for lung cancer screening.


I Introduction

Screening with low-dose chest CT has been found effective in reducing mortality from lung cancer in current or former heavy smokers[1]. However, smoking is not only a major risk factor for lung cancer, but also for cardiovascular disease (CVD)[2]. The presence of CVD can be detected in CT scans by measuring the amount of coronary artery calcification (CAC), a strong and independent predictor of cardiovascular events. CAC is usually quantified in dedicated cardiac CT images and expressed as a calcium score[3]. Recent studies have shown that calcium scores are also able to predict cardiovascular events if quantified in low-dose chest CT for lung cancer screening[4, 5, 6]. Calcium scoring could therefore complement lung cancer screening programs to help identify subjects at elevated cardiovascular risk without the need for further imaging [7, 8]. However, manual calcium scoring in addition to lung screening would impose a considerable extra burden on screening programs due to the large number of subjects, the high average calcium burden in a high risk population, the suboptimal image quality of low-dose screening scans, and the lack of ECG synchronization leading to cardiac motion artifacts. Automatic calcium scoring could therefore be a viable alternative that would enable routine cardiovascular risk prediction from low-dose chest CT scans.

Automatic methods for CAC scoring have been developed mostly for dedicated non-contrast enhanced cardiac CT [9, 10, 11, 12, 13, 14, 15] or cardiac CT angiography (CTA) scans [16, 17, 18, 19, 20, 21, 22]. Only a few methods have been developed specifically for coronary calcium scoring in chest CT [23, 24, 25, 26].

In non-contrast cardiac CT scans, the coronary arteries are visible only when calcified or embedded in fat. Automatic scoring methods therefore typically rely on segmentation or rough localization of larger structures such as the heart and the aorta to derive a region of interest [12, 13] or to derive spatial features for classification of candidate lesions using machine learning [9, 10, 11]. Other methods that use machine learning derive spatial features from segmentations of the coronary arteries, which are obtained by registration of the non-contrast scan with a CTA-based atlas[14, 15]. Commonly used features besides spatial features are texture, lesion volume and shape. Spatial features are consistently reported to be most important.

In cardiac CTA scans, the coronary arteries are well visible due to the arterial contrast enhancement. Automatic scoring methods therefore typically perform a segmentation of the coronary artery tree. The segmentation is used to detect calcifications by searching for strong intensity gradients along the segmented vessel [16, 17, 18, 19] because calcifications are typically brighter than the contrast enhanced lumen of the vessel. In a similar approach, Mittal et al.[20] employ a classifier to detect calcifications based on features that describe the texture along the vessel. In contrast to such approaches, Wolterink et al.[15] proposed a method without prior segmentation of the coronary arteries. The method uses a convolutional neural network (CNN) in combination with simple spatial features based on image coordinates to classify candidate voxels in the image. To reduce the false positive rate of the CNN, connected groups of voxels that were detected by the CNN are reclassified by a random forest classifier. In their later publication [22], a second CNN replaces this step.

In non-contrast chest CT scans, segmentation of the coronary artery tree is not feasible due to the lack of contrast enhancement and due to cardiac motion artifacts caused by the lack of ECG synchronization. Automatic scoring methods therefore typically rely on other means to identify a region of interest, similar to methods for CAC detection in non-contrast cardiac CT. Išgum et al.[23] obtain spatial features from a coronary calcium probability atlas and use these in combination with volume and texture features in a multi-classifier approach. Xie et al.[24] and González et al.[26] segment or roughly localize the heart and identify coronary calcifications in the detected region of interest based on decision rules. In our preliminary work, we proposed to use a CNN to classify candidate voxels within a bounding box around the heart [25].

In addition to coronary calcifications, thoracic aorta calcifications (TAC) and calcifications of the cardiac valves have been related to cardiovascular risk[27, 28, 29]. For calcium scoring in the thoracic aorta in chest CT, few automatic methods have been published. These methods first segment the aorta and then detect calcifications either with decision rules that use auxiliary segmentations of the trachea and spine [30, 31] or with machine learning [32]. The machine learning approach uses kNN classifiers with features similar to those used for coronary calcium detection: various spatial features derived from the segmentation of the aorta, the volume of the potential calcification, and texture features. For detection and quantification of cardiac valve calcifications, no automatic methods have been published.

We propose an automatic system for concurrent detection of CAC, TAC and cardiac valve calcifications in low-dose chest CT. These calcifications likely show different aspects of atherosclerotic disease and their quantities can potentially complement each other in detecting the presence of CVD and in predicting cardiovascular events. In contrast to simple combination of the output of multiple systems, e.g., one system for CAC and another for TAC detection, a single method for concurrent detection avoids ambiguous results.

Furthermore, we propose to label each voxel separately according to the affected vessel rather than labeling calcified lesions, i.e., connected voxels above a certain intensity value. Lesion labeling is the standard approach in clinically used commercial software and in most previous automatic methods. However, voxel labeling allows separation of single lesions that extend over multiple vascular beds, e.g., into the aorta and a coronary artery. This is important because calcifications in different arteries carry different prognostic value [28].

Next, we propose to use a CNN to directly identify potential calcifications within the entire image, without the need to explicitly segment or localize anatomical structures. The majority of methods in the literature for CAC or TAC detection rely on segmentation to restrict the region of interest or to infer spatial features, as these have been found important for accurate calcium detection. Instead of providing context information through segmentations, the CNN has to infer this context directly from the image. We therefore use a CNN with a particularly large receptive field, which is achieved by using an architecture based on dilated convolutions. This large receptive field furthermore enables the network to label candidates based on their anatomical location. Similar to [22], we subsequently employ a second CNN to identify calcifications among the candidates identified and labeled by the first CNN. The main contribution is therefore a network architecture tailored specifically to the problem of calcium detection in low-dose chest CT scans.

Finally, we evaluated our method on a large and diverse dataset from the largest lung cancer screening trial with low-dose chest CT to date. Most other methods for automatic CAC or TAC scoring have been evaluated only on relatively small and homogeneous datasets. However, a method that is applied clinically will face images acquired with a multitude of scanner models and reconstructed with a wide range of reconstruction algorithms. In this work, scans were therefore selected such that a wide variety of screening sites, scanner models and reconstruction algorithms are present.

II Dataset

General Electric (GE): LightSpeed 16, LightSpeed Pro 16, LightSpeed Ultra, LightSpeed Plus, LightSpeed QX/i, Discovery QX/i, HiSpeed QX/i. Reconstruction filters: Standard (soft tissue); Bone, Lung (medium/sharp)
Siemens: Sensation 4, Sensation 16, Volume Zoom. Reconstruction filters: B30f (soft tissue); B50f, B80f (medium/sharp)
Philips: MX8000, MX8000 IDT. Reconstruction filters: C (soft tissue); D (medium/sharp)
Toshiba: Aquilion. Reconstruction filters: FC10 (soft tissue); FC51 (medium/sharp)
TABLE I: Overview of scanner models and reconstruction filters
Fig. 1: Segmentation of a TAC lesion in a noisy scan (left) using 3D region growing (center) and fully manual segmentation (right). Region growing here leads to segmentation of a large amount of noise together with the calcium.
Fig. 2: Overview of the proposed calcium detection method. Two CNNs are applied consecutively to first identify and label candidates (CNN1), and to finally identify true calcifications among the candidates (CNN2). The size of the receptive fields of the networks are indicated by green dotted areas.

For training and evaluation, we used low-dose chest CT scans acquired in the National Lung Screening Trial (NLST). The NLST was a large lung cancer screening trial in the United States that enrolled current or former heavy smokers aged 55 to 74[1]. To develop and evaluate the proposed automatic calcium scoring method on a diverse data set, we selected scans from available baseline scans by randomly sampling from the scans acquired with the 25 most common imaging settings with respect to scanner model and reconstruction algorithm. Specifically, 100 scans were selected for each of the 10 most common settings, and up to 50 for each of the 15 next most common settings.

The selected scans were acquired in 31 medical centers on 13 different scanner models from four major CT scanner vendors (Table I). The scans were acquired at breath-hold after inspiration in helical scanning mode without contrast enhancement and without ECG synchronization. Tube voltage was set to 120 kVp, or 140 kVp for large subjects (). In-plane resolution ranged from to , slice thickness from to and slice spacing from to . Since calcium scoring is typically performed on thick slices, we reconstructed thick axial slices with slice spacing from all scans.

To establish a reference standard, calcifications were manually labeled in all scans. Scans were distributed among four trained observers and one radiologist with extensive experience in calcium scoring. To measure interobserver agreement, a subset of 100 scans (four scans from each of the 25 combinations of scanner model and reconstruction algorithm) was annotated by two of the trained observers and the radiologist. Manual calcium annotation usually requires the observer to select only a single voxel per lesion. The lesion is then automatically segmented with region growing using the standard intensity threshold of . In low-dose scans, however, intensity based region growing often leads to large amounts of noise being segmented with the calcium (Figure 1). Moreover, it can lead to the spine and ribs being segmented together with calcium, or calcifications in arteries branching off the aorta being segmented together with calcium in the aorta. The observers therefore marked calcifications voxel-by-voxel () in the coronary arteries, the aorta and the aortic and mitral valves, including the annulus. Coronary calcifications were labeled as either left anterior descending artery (LAD), left circumflex artery (LCX) or right coronary artery (RCA). The left main coronary artery was considered part of the LAD because the two are difficult to distinguish on ungated scans. Motion artifacts caused by calcifications were annotated as calcifications because an exact separation of true calcification and artifact is often not possible. Depending on the amount of calcification and the image quality, the annotation effort varied from 5–10 minutes for images with soft reconstruction and little calcium to 60–90 minutes for images with sharp reconstruction and/or large amounts of calcium.
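To illustrate why intensity-based region growing is problematic in these noisy low-dose scans, the following minimal Python sketch implements threshold-based 3D region growing from a single seed voxel. The commonly used 130 HU calcium threshold and the 26-connectivity are assumptions for illustration; they are not restated in the text above.

```python
import numpy as np
from scipy import ndimage

def grow_lesion(volume_hu, seed, threshold=130):
    """Grow a lesion from one manually selected seed voxel by collecting all
    voxels that exceed the intensity threshold and are connected to the seed.

    volume_hu : 3D numpy array of CT intensities in HU
    seed      : (z, y, x) index of the selected voxel
    threshold : assumed calcium threshold in HU (130 HU used for illustration)
    """
    assert volume_hu[seed] >= threshold, "seed voxel must exceed the threshold"
    mask = volume_hu >= threshold
    # Label all 26-connected components above the threshold ...
    labels, _ = ndimage.label(mask, structure=np.ones((3, 3, 3)))
    # ... and keep only the component that contains the seed voxel. In noisy
    # scans this component may also contain noise, ribs or adjacent vessels.
    return labels == labels[seed]
```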

III Method

The proposed method for automatic detection of CAC, TAC and calcifications of the aortic and mitral valves consists of two steps. Each step uses a CNN to classify voxels in the image. The first CNN (CNN1) has a large receptive field to be able to detect calcium based on the anatomical context and to label calcium voxels according to their anatomical location. The second CNN (CNN2) has a smaller receptive field and discards false positives based on local image information. Only voxels that CNN1 considers calcium are classified by CNN2 as either true-positive or false-positive (Figure 2).
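As a high-level illustration of this two-stage pipeline, the sketch below wires the stages together. Here `cnn1` and `cnn2` are hypothetical callables standing in for the trained networks, and the 130 HU candidate threshold is an assumption for illustration.

```python
import numpy as np

def detect_calcifications(volume_hu, cnn1, cnn2, threshold=130):
    """Two-stage voxel classification (sketch)."""
    # Candidate voxels: all voxels above the calcium threshold.
    candidates = np.argwhere(volume_hu >= threshold)       # (N, 3) indices

    # Stage 1: CNN1 assigns an anatomical label (LAD, LCX, RCA, aorta,
    # aortic valve, mitral valve) or background (0) to every candidate.
    labels = cnn1(volume_hu, candidates)                    # (N,) int labels
    positives = candidates[labels > 0]

    # Stage 2: CNN2 re-examines the positively labeled voxels and discards
    # false positives based on local image information.
    keep = cnn2(volume_hu, positives).astype(bool)          # (M,) booleans

    label_map = np.zeros(volume_hu.shape, dtype=np.uint8)
    label_map[tuple(positives[keep].T)] = labels[labels > 0][keep]
    return label_map
```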

III-A First stage network (CNN1)

CNN1 classifies all voxels in the image that exceed the standard calcium threshold of . The number of voxels that need to be classified in each scan is therefore high. Classification voxel-by-voxel with a sliding window approach would be highly inefficient as many identical convolutional operations would be repeated unnecessarily. CNN1 is therefore constructed as a purely convolutional network [33, 34], i.e., with all layers implemented as convolutions, which allows arbitrary-sized inputs so that entire slices or volumes can be classified at once (Figure 3). CNN1 classifies voxels as either LAD (including the left main coronary artery), LCX, RCA, TAC, aortic valve calcification, mitral valve calcification or background.

Previous publications showed that spatial information is particularly important for calcium detection. To allow CNN1 to infer spatial information from the image area covered by its receptive field, its receptive field needs to be relatively large. However, CNNs with large receptive fields, such as very deep networks or networks with large convolution kernels, often suffer from overfitting due to large numbers of trainable parameters. To allow for a large receptive field while keeping the number of trainable parameters low, we rely on dilated convolutions, which are based on convolution kernels with spacing between their elements. By stacking convolutions with exponentially growing dilation, the receptive field of the network grows exponentially while the number of trainable parameters only grows linearly[35]. At the same time, the network still includes all information within its receptive field in the analysis.
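The short sketch below illustrates how the receptive field of a stack of 3×3 convolutions grows when the dilation factor grows exponentially, compared with an undilated stack of the same depth (and hence the same number of trainable parameters). The dilation factors used here are illustrative, not the exact factors of CNN1.

```python
def receptive_field(dilations, kernel_size=3):
    """Receptive field (in pixels) of a stack of convolutions with the given
    dilation factors; each layer adds (kernel_size - 1) * dilation pixels."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

# Exponentially growing dilations vs. an undilated stack of the same depth.
print(receptive_field([1, 1, 2, 4, 8, 16, 32, 1]))  # large receptive field
print(receptive_field([1] * 8))                      # same depth, much smaller field
```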

The architecture of CNN1 is similar to the network proposed by Yu et al.[35], but extends it from 2D inputs to three orthogonal 2D inputs (often referred to as 2.5D) [36]. We chose 2.5D inputs over 3D inputs because of the previously reported superior performance for calcium scoring [22]. The receptive field of CNN1 is pixels, which corresponds to roughly a quarter of an axial slice.

The input of CNN1 is a set of three orthogonal patches, which always intersect in a single voxel regardless of the patch size. This contradicts the idea of purely convolutional networks, as larger inputs then do not lead to larger outputs. However, the processing of 2.5D input patches can be divided into 2D subtasks by processing the axial, sagittal and coronal inputs independently[22]. Each 2D input is first separately processed with the subnetwork for the respective orientation, allowing us to obtain a feature representation per patch in an efficient manner. The remaining layers of the network are applied to the concatenated feature vectors from all three orientations to obtain posterior probabilities for each voxel.
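A minimal sketch of how three orthogonal 2.5D patches centred on one candidate voxel could be extracted from the volume. The patch size of 65 pixels is an assumption for illustration, not the receptive field reported for CNN1.

```python
import numpy as np

def orthogonal_patches(volume, voxel, size=65):
    """Extract axial, coronal and sagittal patches centred on one voxel.
    size should be odd; 65 is an illustrative value."""
    z, y, x = voxel
    r = size // 2
    pad = np.pad(volume, r, mode="constant")   # pad so border voxels work too
    z, y, x = z + r, y + r, x + r              # indices in the padded volume
    axial    = pad[z, y - r:y + r + 1, x - r:x + r + 1]
    coronal  = pad[z - r:z + r + 1, y, x - r:x + r + 1]
    sagittal = pad[z - r:z + r + 1, y - r:y + r + 1, x]
    return axial, coronal, sagittal
```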

Inspired by the concept of deep supervision[37], each subnetwork has an auxiliary softmax output layer (Figure 3). These are used during training to enable learning from input patches larger than the receptive field, i.e., learning from multiple labeled pixels per patch. Using these auxiliary output layers, one auxiliary loss term is defined per orientation. An additional loss term is defined on the output of the entire network for only the voxel in the intersection of the three orthogonal input patches. These loss terms are combined into an overall loss term as a weighted sum controlled by a weight factor.
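A hedged sketch of how such a deeply supervised loss could be combined. The convex combination and the default value of the weight factor `lam` are assumptions; the exact formula was lost in extraction.

```python
def combined_loss(loss_final, loss_ax, loss_sag, loss_cor, lam=0.5):
    """Combine the loss of the full network with the three auxiliary losses.
    The weighting scheme shown here is an assumed convex combination."""
    return lam * loss_final + (1.0 - lam) * (loss_ax + loss_sag + loss_cor) / 3.0
```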

After training, the auxiliary output layers are used together with the output layer of the entire network for classification. The posterior probabilities from the three auxiliary output layers are combined with the posterior probabilities of the full network by computing a weighted average of the probabilities.
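The corresponding inference-time fusion could look like the sketch below. The weights are placeholders chosen so that the main output and the three auxiliary outputs contribute equally overall, which is an assumption consistent with the averaging remark in Section V but not a restatement of the published weights.

```python
import numpy as np

def fused_posterior(p_final, p_ax, p_sag, p_cor, weights=(0.5, 1/6, 1/6, 1/6)):
    """Weighted average of the posteriors of the main and auxiliary outputs.
    The weights are illustrative assumptions."""
    w = np.asarray(weights, dtype=float)
    stacked = np.stack([p_final, p_ax, p_sag, p_cor])
    return np.tensordot(w / w.sum(), stacked, axes=1)
```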

Fig. 3: Architecture of CNN1. The three orthogonal patches are analyzed by subnetworks with identical structure (bottom). The seven output classes are LAD, LCX, RCA, TAC, aortic valve, mitral valve and background. Convolutional layers are shown as boxes with filter size (top) and dilation factor (bottom). All convolutional layers consist of 32 filters; only the final layer before the output consists of 128 filters.

III-B Second stage network (CNN2)

CNN1 detects potential calcifications based on appearance and spatial context, and furthermore determines whether the calcification is located in the coronary arteries (LAD including left main, LCX or RCA), the aorta, or the aortic or mitral valve. However, metal artifacts, image noise or other high intensity structures, such as parts of the spine in direct proximity to the aorta, can result in false positive voxel detections.

CNN2 refines the output of CNN1 by distinguishing between true calcifications and false positive voxels with similar appearance and location. In contrast to CNN1, CNN2 therefore does not need to focus on the spatial context but can focus on local information and finer details. CNN2 does not use dilated convolutions, but instead non-dilated convolutions with max-pooling between convolutions. CNN2 is not purely convolutional like CNN1, as it only needs to analyze a limited number of voxels. CNN2 analyzes 2.5D inputs and has a receptive field of pixels. In contrast to the multi-class output of CNN1, the output of CNN2 is binary, as its purpose is false positive reduction and not categorization of the detected calcifications. The architecture of CNN2 (Figure 4) is motivated by our preliminary work on coronary calcium scoring[25], in which a similar network achieved good performance when the problem was restricted to a region of interest. In this work, CNN1 detects and labels candidate voxels, i.e., objects of interest, instead of a region of interest.
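For illustration, a possible 2.5D, two-class architecture in this spirit is sketched below using tf.keras (the original implementation used Theano/Lasagne). The patch size, filter counts and dense layer width are assumptions, and the batch normalization mentioned in Section V is omitted for brevity.

```python
import tensorflow as tf
from tensorflow.keras import layers

def subnetwork(patch_size=25):
    """One 2D subnetwork; layer sizes are illustrative, not the published ones."""
    inp = layers.Input(shape=(patch_size, patch_size, 1))
    x = layers.Conv2D(32, 3, activation="elu")(inp)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(32, 3, activation="elu")(x)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Flatten()(x)
    return tf.keras.Model(inp, x)

def build_cnn2(patch_size=25):
    """Three orthogonal patches -> concatenated features -> binary output."""
    inputs = [layers.Input(shape=(patch_size, patch_size, 1)) for _ in range(3)]
    features = [subnetwork(patch_size)(i) for i in inputs]
    x = layers.Concatenate()(features)
    x = layers.Dense(128, activation="elu")(x)
    out = layers.Dense(2, activation="softmax")(x)  # calcium vs. background
    return tf.keras.Model(inputs, out)
```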

Fig. 4: Architecture of CNN2. The three orthogonal patches are analyzed by subnetworks with identical structure (bottom). The two output classes are calcium and background. Convolutional layers are shown as boxes with filter size (top) and number of filters (bottom). MP denotes max-pooling layers with the specified pooling region. FC denotes fully-connected (dense) layers with the specified number of units.

IV Evaluation


Fig. 5: Comparison of low-dose chest CT scans reconstructed with soft filter kernel (left half of each image) and sharp filter kernel (right half of each image). The figures on the right show the same scans as the figures on the left, but with voxels highlighted in red, indicating all voxels above the standard calcium threshold.

Calcium scoring is normally performed in images with soft tissue reconstruction. However, lung cancer screening data also includes images reconstructed with sharper filter kernels, in which edges but also noise appear more prominent (Figure 5). To evaluate whether our method needs to be trained with images that are reconstructed with parameters recommended for calcium scoring (soft tissue filter kernels) or whether it can be trained with all images acquired in the screening (see Table I), two pairs of CNN1 and CNN2 were trained: one pair using soft reconstructions only and the other pair using both soft and sharp reconstructions. Furthermore, to evaluate whether calcium detection can be performed in images reconstructed using soft and sharp filter kernels, and to evaluate which of the two training settings leads to best performance, we evaluated our method separately on soft and sharp reconstructions.

To evaluate the performance of the method, calcifications were quantified per subject and per label using volume and Agatston scores. Agatston scores were normalized to account for overlapping slices [38]. The agreement between automatically and manually determined calcium volumes was assessed using sensitivity, average false positive volume per scan and the F1 score. Interobserver agreement was assessed using the same metrics by comparing the annotations of the second and third observers to the radiologist as reference.
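A minimal sketch of these volume-based metrics, assuming per-scan true positive, false positive and false negative calcium volumes have already been computed.

```python
def detection_metrics(tp_volume, fp_volume, fn_volume):
    """Sensitivity, false positive volume and F1 score from per-scan
    true positive, false positive and false negative calcium volumes."""
    sensitivity = tp_volume / (tp_volume + fn_volume)
    precision = tp_volume / (tp_volume + fp_volume)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, fp_volume, f1
```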

Additionally, we evaluated cardiovascular risk classification. Each subject was assigned one of four risk categories (I–IV: 0–10, 11–100, 101–1000, >1000) based on their total CAC Agatston score. Reliability of the risk category assignment was assessed using the linearly weighted kappa coefficient.
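For illustration, risk categories and the linearly weighted kappa could be computed as sketched below; `cohen_kappa_score` from scikit-learn is used here purely as an example implementation of the weighted kappa.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

def risk_category(agatston):
    """Assign a risk category I-IV based on the total CAC Agatston score,
    using the cut-offs 0-10, 11-100, 101-1000 and >1000."""
    return int(np.digitize(agatston, bins=[10, 100, 1000], right=True)) + 1

# Linearly weighted kappa between reference and automatic category assignments:
# kappa = cohen_kappa_score(reference_categories, automatic_categories,
#                           weights="linear")
```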

Soft reconstructions Sharp reconstructions
CAC LAD LCX RCA CAC LAD LCX RCA
Reference standard
Scans with any calcification ()
Calcium volume / scan ()
Second observer
Sensitivity ()
False positive volume / scan ()
F1 score calcium volume
Third observer
Sensitivity ()
False positive volume / scan ()
F1 score calcium volume
Trained on soft reconstructions only
Sensitivity ()
False positive volume / scan ()
F1 score calcium volume
Trained on soft and sharp reconstructions
Sensitivity ()
False positive volume / scan ()
F1 score calcium volume
subset of 100 scans
TABLE II: Manual and automatic CAC scoring performance on 310 soft and 196 sharp reconstructions. Reference standard refers to the amount of calcium manually identified by the observers. The performance of a second and third observer on a subset of 100 scans is additionally reported. The performance of the automatic method is reported when the networks were trained on soft reconstructions and when they were trained on soft and sharp reconstructions.
Soft reconstructions Sharp reconstructions
Automatic Automatic
Reference I II III IV I II III IV
I
II
III
IV
Trained on soft reconstructions only
Soft reconstructions Sharp reconstructions
Automatic Automatic
Reference I II III IV I II III IV
I
II
III
IV
Trained on soft and sharp reconstructions
TABLE III: Agreement in risk categorization of subjects based on their total CAC Agatston score (I: 0–10, II: 11–100, III: 101–1000, IV: >1000) between the manual reference standard and automatically determined scores. The agreement is reported separately for soft and sharp reconstructions. The table on the left specifies the agreement when the automatic method was trained on soft reconstructions only, and the table on the right when the automatic method was trained on soft and sharp reconstructions.

V Experiments and Results

The image quality of 57 scans () was considered inadequate for manual annotation due to severe metal artifacts or excessive image noise. The remaining scans with manual reference standard were divided into subsets for training ( = 1012 scans), validation ( = 169 scans) and testing ( = 506 scans). These were scans of different participants as sometimes multiple reconstructions of the same baseline scan were included. The division into subsets was random, but all scans of the same participant were assigned to the same subset. Among the scans, were reconstructed with soft filters and with sharp filters. Among the scans with multiple annotations, were reconstructed with soft filters and with sharp filters.

The areas covered by the receptive fields of both networks are not necessarily comparable across scans due to different resolutions. We therefore resampled all scans in-plane with bilinear interpolation to the average in-plane resolution across the dataset. Resampling was only performed in-plane because the slice spacing was already standardized before manual annotation (see Section II). Both networks CNN1 and CNN2 thus analyzed images at a standardized resolution. The predicted label maps were finally resampled to the original resolution using nearest neighbor interpolation.
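A minimal sketch of this resampling step using scipy; the target in-plane voxel size is passed as a parameter because its exact value is not restated here.

```python
import numpy as np
from scipy import ndimage

def resample_inplane(volume, spacing_yx, target_yx):
    """Resample each axial slice to the target in-plane voxel size with
    bilinear interpolation (order=1); the slice direction is left untouched."""
    zoom = (1.0, spacing_yx[0] / target_yx[0], spacing_yx[1] / target_yx[1])
    return ndimage.zoom(volume, zoom, order=1)

# Predicted label maps can be mapped back to the original grid with
# nearest-neighbour interpolation (order=0) to keep the labels discrete, e.g.
# ndimage.zoom(label_map, 1.0 / np.array(zoom), order=0)
```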

The two networks CNN1 and CNN2 were trained sequentially. CNN1 was trained with high density voxels in the training scans. CNN2 was trained with high density voxels classified as any type of calcification by CNN1. The validation set was used to ensure there was no substantial overfitting and to determine convergence of the networks. We trained both networks on balanced minibatches, which consisted half of randomly selected calcium voxels of any class and half of randomly selected background voxels. Both networks used exponential linear units[39] as activation function. Adam[40] was used as optimizer and the categorical cross-entropy as loss function. CNN1 was trained on input patches larger than its receptive field, and CNN2 on smaller input patches. During training of both networks, Dropout[41] and L2 weight decay were used for regularization. Since CNN2 has many more trainable parameters, we added batch normalization[42] between all layers to provide additional regularization. For both networks, the output class was determined as the class with highest activation. For CNN1, the output layers were weighted such that the combination corresponds to averaging between the normal and the auxiliary outputs.
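A sketch of the balanced minibatch sampling described above; the flat voxel index arrays, sampling with replacement, and the batch size of 64 (mentioned later in Section V-E) are illustrative choices.

```python
import numpy as np

def balanced_minibatch(calcium_idx, background_idx, batch_size=64, rng=None):
    """Draw a minibatch consisting half of randomly selected calcium voxels
    (of any class) and half of randomly selected background voxels.

    calcium_idx, background_idx : 1D arrays of flat voxel indices
    """
    rng = rng or np.random.default_rng()
    half = batch_size // 2
    pos = rng.choice(calcium_idx, size=half, replace=True)
    neg = rng.choice(background_idx, size=half, replace=True)
    return np.concatenate([pos, neg])
```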

The networks were implemented using the Theano[43] and Lasagne[44] frameworks and trained on NVIDIA Titan X GPUs. The total computation time is 5–7 minutes, depending on the size of the image volume and the number of candidate objects in the image. With our non-optimized implementation, CNN1 needs on average minutes to scan an entire image, and CNN2 needs on average seconds to classify the detected candidate voxels.

V-A Detection of CAC

The performance of automatic CAC detection was evaluated based on scores per artery and per subject. Per artery and per subject sensitivities, average false positive volumes and F1 scores for CAC detection are listed in Table II. Examples of detected calcifications are shown in Figure 6.

In images reconstructed with soft filter kernels, the automatic method detected more than of the calcium in LAD and RCA, and in LCX. Training on soft and sharp reconstructions compared to only soft reconstructions led to similar performance for CAC with F1 scores of and . In images reconstructed with sharp filter kernels, the F1 score for CAC increased from to when sharp reconstructions were added to the training data.

Risk categories derived from per-subject CAC scores agreed with the manual reference annotation in images reconstructed with soft filter kernels in of the subjects when trained only on soft reconstructions and in when sharp reconstructions were added to the training data. In images reconstructed with sharp filter kernels, agreement increased from to when sharp reconstructions were added to the training data. Confusion matrices for the risk category assignment are shown in Table III.

Interobserver agreement was high for per-subject CAC in soft reconstructions, with F1 scores of and for the second and third observer, respectively. Interobserver agreement was overall slightly lower in sharp reconstructions than in soft reconstructions. For risk category assignment, the linearly weighted kappa was and for the second and third observer, respectively, in soft reconstructions, and and in sharp reconstructions.

V-B Detection of TAC

Reconstruction filter Soft Sharp
Reference standard
Scans with any calcium ()
Calcium volume / scan ()
Second observer
Sensitivity ()
False positive volume / scan ()
F1 score calcium volume
Third observer
Sensitivity ()
False positive volume / scan ()
F1 score calcium volume
Trained on soft reconstructions only
Sensitivity ()
False positive volume / scan ()
F1 score calcium volume
Trained on soft and sharp reconstructions
Sensitivity ()
False positive volume / scan ()
F1 score calcium volume
subset of 100 scans
TABLE IV: Manual and automatic TAC scoring performance on 310 soft and 196 sharp reconstructions. Reference standard refers to the amount of calcium manually identified by the observers. The performance of a second and third observer on a subset of 100 scans is additionally reported. The performance of the automatic method is reported when the networks were trained on soft reconstructions and when they were trained on soft and sharp reconstructions.

The observers identified TAC in the majority of scans, in of (). The detection performance of the automatic method in terms of sensitivity, average false positive volume and F1 score is listed in Table IV. In images reconstructed with soft filter kernels, the automatic method achieved an F1 score of regardless of whether only soft reconstructions or both soft and sharp reconstructions were used for training. In images reconstructed with sharp filter kernels, the F1 score increased from to when sharp reconstructions were added to the training data. Interobserver agreement was high for TAC and, overall, the observers had higher sensitivity than the automatic method, but also higher average false positive volume.

V-C Detection of cardiac valve calcifications

Soft reconstructions Sharp reconstructions
Aortic valve Mitral valve Aortic valve Mitral valve
Reference standard
Scans with any calcium ()
Calcium volume / scan ()
Second observer
Sensitivity ()
False positive volume / scan ()
F1 score calcium volume
Third observer
Sensitivity ()
False positive volume / scan ()
F1 score calcium volume
Trained on soft reconstructions only
Sensitivity ()
False positive volume / scan ()
F1 score calcium volume
Trained on soft and sharp reconstructions
Sensitivity ()
False positive volume / scan ()
F1 score calcium volume
subset of 100 scans
TABLE V: Performance of manual and automatic scoring of aortic and mitral valve calcifications, reported separately for 310 soft and 196 sharp reconstructions. Reference standard refers to the amount of calcium manually identified by the observers. The performance of a second and third observer on a subset of 100 scans is additionally reported. The performance of the automatic method is reported when the networks were trained on soft reconstructions and when they were trained on soft and sharp reconstructions.

Cardiac valve calcifications were infrequently identified by the observers, aortic valve calcifications in 92 of 506 scans () and mitral valve calcifications in 58 of 506 scans (). The detection performance of the automatic method in terms of sensitivity, average false positive volume and F1 score is listed in Table V.

The automatic method achieved lower performance for detection of cardiac valve calcifications than for detection of CAC and TAC. However, as for CAC and TAC, adding sharp reconstructions to the training data improved F1 scores on sharp reconstructions, from to for aortic valve calcifications and from to for mitral valve calcifications. Interobserver agreement in terms of F1 score was particularly low in images reconstructed with sharp filter kernels, with F1 scores of / and / for aortic and mitral valve calcifications, respectively.

V-D Single vs. two-stage performance

To evaluate the contribution of CNN2 to the overall performance, we compared the performance of CNN1 followed by CNN2 against CNN1 alone. Trained using only images with soft reconstruction kernels, CNN1 achieved a sensitivity of with an average false-positive volume of on soft reconstructions, and sensitivity with average false-positive volume on sharp reconstructions. Reclassification of the positive detections by CNN2 reduced the sensitivity by and , but at the same time reduced the average false-positive volume per scan by and in soft and sharp reconstructions, respectively.

When images with sharp reconstruction kernels were added to the training data, CNN1 achieved a sensitivity of at an average false-positive volume of in soft reconstructions. In sharp reconstructions, the sensitivity was at average false-positive volume. Reclassification by CNN2 reduced the sensitivity by and , but at the same time reduced the average false-positive volume per scan by and in soft and sharp reconstructions, respectively.

V-E Effect of receptive field size

To evaluate the influence of the size of the receptive field of CNN1 on the detection performance, we trained networks with various maximum dilation factors. A larger maximum dilation factor results in a larger receptive field and also a deeper network (Figure 3; [35]). For receptive field sizes of , , and , CNN1 achieved F1 scores of , , and , respectively, in images reconstructed with soft reconstruction kernels. In images with sharp reconstruction kernels, the F1 scores were , , and . The largest network had to be trained on smaller batches of samples due to hardware limitations (32 instead of 64), which made the network more difficult to train.

V-F Effect of auxiliary output layers and loss terms

We additionally evaluated whether the auxiliary softmax output layers of CNN1 with the corresponding auxiliary loss terms had a positive effect on training time and performance. We observed that the network learned more slowly with the auxiliary loss terms. However, the detection performance improved. The performance of CNN1 with and without auxiliary output layers and loss terms was compared using training images reconstructed with both soft and sharp reconstruction kernels. In images reconstructed with soft filter kernels, F1 scores for calcium detection (binary, disregarding the label) were without auxiliary outputs and with auxiliary outputs. In images reconstructed with sharp filter kernels, the F1 scores were without auxiliary outputs and with auxiliary outputs.

V-G Comparison with other methods

The number of other publications on automatic calcium scoring in low-dose chest CT is low. No other method has been published that concurrently detects CAC, TAC and cardiac valve calcifications, and no methods have been published for detection of cardiac valve calcifications at all. Hence, the results of the proposed combined system can only be compared with methods that address the simpler tasks of detecting either only CAC or only TAC.

For automatic CAC scoring in low-dose chest CT, Išgum et al.[23] reported a sensitivity of at false positive volume per scan in scans. In the same scans, Lessmann et al.[25] achieved a sensitivity of at an average false positive volume of . However, the dataset on which these methods were tested was much less diverse than the dataset used in this paper. All scans were acquired in the same hospital with CT scanners from a single vendor and reconstructed using a single soft filter kernel. Furthermore, the average CAC burden per subject was considerably lower ( vs. ). We therefore evaluated the better performing method[25] on the current test data after retraining with the current training data. The method described in [25] classifies individual voxels, but originally the average posterior probabilities across connected voxels were calculated to classify lesions rather than individual voxels. This averaging was now omitted to allow for a comparison on voxel level. The best performance was achieved on soft reconstructions, using both soft and sharp reconstructions for training: the average F1 score per scan for CAC detection was ( CI: ), mostly attributed to many false positive detections. In comparison, the proposed method achieved an average F1 score of ( CI: ) on the same test data using the same training data. This difference was statistically significant (paired samples t-test).
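For reference, a paired comparison of per-scan F1 scores could be run as sketched below; the score arrays are hypothetical placeholders, not values from the paper.

```python
import numpy as np
from scipy import stats

# Hypothetical per-scan F1 scores for two methods evaluated on the same scans.
f1_proposed = np.array([0.91, 0.88, 0.95, 0.79, 0.90])
f1_baseline = np.array([0.72, 0.65, 0.81, 0.58, 0.70])

# Paired samples t-test over per-scan scores.
t_stat, p_value = stats.ttest_rel(f1_proposed, f1_baseline)
print(t_stat, p_value)
```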

Other methods for automatic CAC scoring were only evaluated in terms of their correlations with manual scores: Xie et al.[24] performed linear regression with CAC Agatston scores in scans and reported . González et al.[26] reported a Pearson correlation coefficient of for CAC Agatston scores in scans.

For automatic TAC scoring in low-dose chest CT, Išgum et al.[32] reported a sensitivity of at false positive volume per scan in scans. Kurugol et al.[30] reported a sensitivity of and a positive predictive value of for TAC volume in scans. In comparison, we reported here a sensitivity of at an average false positive volume of and a positive predictive value of in scans reconstructed with soft filter kernels. Xie et al.[31] reported only the correlation with manual scores as after performing linear regression for TAC volume in scans.

Fig. 6: Example cases overlaid with color coded automatic detection results. The color scheme is as follows: red=LAD including LM, green=LCX, blue=RCA, yellow=TAC, purple=aortic valve calcification, orange=mitral valve calcification. Examples of (a)–(f) correctly detected and labeled calcifications, (g) and (h) correctly labeled calcifications of which a few voxels were missed, (i) an artificial aortic valve that was labeled as calcium, (j) a metal artifact next to the aortic wall that was partially labeled as calcium, (k) a false positive detection in proximity of the aorta, (l) a correctly detected calcification in LCX that was incorrectly labeled as LAD.

V-H Voxel-level vs. lesion-level annotation

The manual reference annotation was performed voxel-by-voxel to enable annotation of scans with poor image quality. To assess how much voxel-level annotation differs from the standard lesion-level annotation, we converted the voxel-level annotations into lesion-level annotations using 3D region growing with the standard calcium threshold (). Lesions that contained voxels with different labels were labeled using majority voting. Scans in which this conversion increased the calcium volume by more than a factor of five were excluded. These were of the scans with soft reconstruction kernel and of the scans with sharp reconstruction kernel. In the remaining scans, the overall Agatston score increased on average by 85 in soft reconstructions and by 155 in sharp reconstructions. This clearly indicates that in low-dose CT scans, lesion-level annotation leads to an overestimation of the calcium score.
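A sketch of this voxel-to-lesion conversion, assuming integer voxel labels (0 for background) and the commonly used 130 HU calcium threshold; the 26-connectivity is also an assumption.

```python
import numpy as np
from scipy import ndimage

def lesion_level_labels(volume_hu, voxel_labels, threshold=130):
    """Convert voxel-level annotations into lesion-level annotations:
    connected components above the calcium threshold, each assigned the
    majority label of the annotated voxels it contains."""
    lesions, n = ndimage.label(volume_hu >= threshold,
                               structure=np.ones((3, 3, 3)))
    out = np.zeros_like(voxel_labels)
    for lesion_id in range(1, n + 1):
        in_lesion = lesions == lesion_id
        labels = voxel_labels[in_lesion]
        labels = labels[labels > 0]           # only annotated voxels vote
        if labels.size:                       # skip lesions without annotation
            out[in_lesion] = np.bincount(labels).argmax()
    return out
```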

VI Discussion

We proposed a method for automatic detection of CAC subdivided into LAD, LCX and RCA calcifications, TAC and cardiac valve calcifications in low-dose chest CT. The method is the first that detects these calcifications concurrently. The approach is based on two consecutive CNNs: The first CNN uses stacked dilated convolutions to facilitate a large receptive field, which enables identification and spatial labeling of high density voxels. The second CNN discards false positive detections of the first CNN.

For CAC and TAC detection, the method achieved a performance close to the level of interobserver agreement. The method was furthermore able to separate calcifications in the coronary arteries into LAD, LCX and RCA calcifications (Figure 6 (f)). Both the method and the observers were more successful in identifying LAD and RCA calcifications than LCX calcifications. The course of the LCX is particularly difficult to follow in non-contrast scans. Hence, LCX calcifications can be difficult to differentiate from LM and LAD calcifications (Figure 6 (l)), as well as from those in the mitral valve. In comparison to CAC and TAC, calcifications of the aortic and mitral valves were less common in our dataset. Performance of the automatic method for valve calcifications was below its performance for CAC and TAC detection. However, this is also a difficult task for experts. The observers especially disagreed on mitral valve calcifications, which is in line with findings of previous studies [45]. The disagreement is mainly caused by confusion with LCX calcifications and the lack of soft tissue contrast in the mitral valve region. For the aortic valve, confusion with TAC was the most common cause of disagreement.

False positive detections were mostly caused by mislabeling of calcifications with respect to their location (e.g., LAD and LCX), low-dose and motion artifacts, and other calcifications such as calcified lymph nodes or calcifications in other vessels (Figure 6 (i)–(k)). False positive detections outside the heart and the aorta occurred infrequently and usually in proximity to the heart or the aorta. This demonstrates that CNN1 was able to implicitly learn to recognize the typical spatial context of calcifications in the image. The individual evaluation of CNN1 additionally showed that CNN2 substantially contributes to reducing false-positive detections while maintaining a high sensitivity. However, future work could aim at unifying the two networks into a single network.

False negative detections were sometimes partially misclassified lesions (Figure 6 (h)–(i)). Partial misclassification can occur because the method performs voxel classification rather than the standardly used lesion classification. Even though voxel labeling occasionally causes partial misclassification of calcifications, it enables splitting of calcifications that are contained in more than one arterial bed, such as those partly located in the aorta and partly in the coronary arteries (Figure 6 (f)). Assigning a calcification that is partially contained in the aorta to the coronary artery could affect cardiovascular risk categorization. Similarly, assigning LM calcifications to the aorta would result in missing high risk lesions. To the best of our knowledge, this is the first method enabling splitting of the calcifications according to their arterial bed.

Other methods for calcification detection often first detect a region of interest in the image. The proposed method is able to omit this step and instead searches the entire image for calcifications without the need for any preprocessing steps. Moreover, the method does not require explicit spatial features, even though these have been reported in the literature as crucial for automatic calcium scoring. The results demonstrate that a CNN with dilated convolutions is able to recognize the spatial context in three orthogonal 2D patches.

In contrast, other commonly used network architectures have various shortcomings that make them less suited for calcium scoring in low-dose chest CT. For example, U-net[46] is not well suited for sparse problems because it can process 3D volumes only in smaller tiles, most of which would not contain any calcium voxels. Residual networks[47] and the similar densely connected networks[48] use pooling to increase their receptive field, which is not compatible with the idea of purely convolutional networks. Classifying all voxels in a typical 3D chest CT volume with these networks would be inefficient and time-consuming. HoughNets[49] are based on the idea of enforcing learned shape priors, which is useful for segmentation of structures with relatively homogeneous shape. However, the shape of calcifications is rather heterogeneous, especially if scans are distorted by cardiac motion. Spatial transformer networks[50] address alignment issues, but chest CT scans are fairly standardized: the scanned subject typically lies on the back, and the FOV of the reconstructed image is standardly configured using the outer body contour or the ribs and the apex/base of the lungs as landmarks.

A particular strength of this paper is the large, diverse and realistic dataset of low-dose chest CT scans from the NLST that we used for training and evaluation. Even though this data is challenging due to low radiation dose, the lack of ECG-synchronization and the high diversity of image acquisition parameters, the automatic method achieved good detection performance and high agreement in risk categorization. The separate evaluation in images reconstructed with soft and sharp filter kernels additionally demonstrates that the performance on soft reconstructions does not suffer when sharp reconstructions are added to the training data. This indicates that the networks were able to generalize to both types of reconstructions.

The high reliability of the risk categorization indicates that this method can be used for cardiovascular risk assessment in lung cancer screening. While standardized risk categories are defined for CAC scores, TAC and cardiac valve calcium scores are currently not commonly used for cardiovascular risk assessment. Automatic scoring of these calcifications enables evaluation of their predictive value using available large datasets from lung screening trials, or other screening trials with CT imaging visualizing the heart.

Acknowledgment

We are grateful to the United States National Cancer Institute (NCI) for providing access to NCI’s data collected by the National Lung Screening Trial. The statements contained in this paper are solely ours and do not represent or imply concurrence or endorsement by NCI. To enable other researchers to request the same data and perform a comparison with the results presented here, NCI has been provided with the list of scans used in this study.

References

  • [1] The National Lung Screening Trial Research Team, “Reduced lung-cancer mortality with low-dose computed tomographic screening,” New England Journal of Medicine, vol. 365, pp. 395–409, 2011.
  • [2] S. Vollset, A. Tverdal, and H. Gjessing, “Smoking and deaths between 40 and 70 years of age in women and men,” Annals of Internal Medicine, vol. 144, pp. 381–389, 2006.
  • [3] H. S. Hecht, “Coronary artery calcium scanning: Past, present, and future,” JACC: Cardiovascular Imaging, vol. 8, pp. 579–596, 2015.
  • [4] J. Shemesh, C. I. Henschke, D. Shaham, R. Yip, A. O. Farooqi, M. D. Cham, D. I. McCauley, M. Chen, J. P. Smith, D. M. Libby, M. W. Pasmantier, and D. F. Yankelevitz, “Ordinal scoring of coronary artery calcifications on low-dose CT scans of the chest is predictive of death from cardiovascular disease,” Radiology, vol. 257, pp. 541–548, 2010.
  • [5] P. C. Jacobs, M. J. A. Gondrie, Y. van der Graaf, H. J. de Koning, I. Isgum, B. van Ginneken, and W. P. Th. M. Mali, “Coronary artery calcium can predict all-cause mortality and cardiovascular events on low-dose CT screening for lung cancer,” American Journal of Roentgenology, vol. 198, pp. 505–511, 2012.
  • [6] C. Chiles, F. Duan, G. W. Gladish, J. G. Ravenel, S. G. Baginski, B. S. Snyder, S. DeMello, S. S. Desjardins, and R. F. Munden, “Association of coronary artery calcification and mortality in the National Lung Screening Trial: A comparison of three scoring methods,” Radiology, vol. 276, pp. 82–90, 2015.
  • [7] O. M. Mets, R. Vliegenthart, M. J. Gondrie, M. A. Viergever, M. Oudkerk, H. J. de Koning, W. P. Mali, M. Prokop, R. J. van Klaveren, Y. van der Graaf, C. F. Buckens, P. Zanen, J.-W. J. Lammers, H. J. Groen, I. Isgum, and P. A. de Jong, “Lung cancer screening CT-based prediction of cardiovascular events,” JACC: Cardiovascular Imaging, vol. 6, pp. 899–907, 2013.
  • [8] H. S. Hecht, C. Henschke, D. Yankelevitz, V. Fuster, and J. Narula, “Combined detection of coronary artery disease and lung cancer,” European Heart Journal, vol. 35, pp. 2792–96, 2014.
  • [9] I. Išgum, A. Rutten, M. Prokop, and B. van Ginneken, “Detection of coronary calcifications from computed tomography scans for automated risk assessment of coronary artery disease,” Medical Physics, vol. 34, pp. 1450–61, 2007.
  • [10] U. Kurkure, D. R. Chittajallu, G. Brunner, Y. H. Le, and I. A. Kakadiaris, “A supervised classification-based method for coronary calcium detection in non-contrast CT,” International Journal of Cardiovascular Imaging, vol. 26, pp. 817–828, 2010.
  • [11] G. Brunner, D. R. Chittajallu, U. Kurkure, and I. A. Kakadiaris, “Toward the automatic detection of coronary artery calcification in non-contrast computed tomography data,” International Journal of Cardiovascular Imaging, vol. 26, pp. 829–838, 2010.
  • [12] J. Wu, G. Ferns, J. Giles, and E. Lewis, “A fully automated multi-modal computer aided diagnosis approach to coronary calcium scoring of msct images,” in SPIE Medical Imaging, vol. 8314, 2012, p. 83142Y.
  • [13] X. Ding, P. J. Slomka, M. Diaz-Zamudio, G. Germano, D. S. Berman, D. Terzopoulos, and D. Dey, “Automated coronary artery calcium scoring from non-contrast CT using a patient-specific algorithm,” in SPIE Medical Imaging, vol. 9413, 2015, p. 94132U.
  • [14] R. Shahzad, T. van Walsum, M. Schaap, A. Rossi, S. Klein, A. C. Weustink, P. J. de Feyter, L. J. van Vliet, and W. J. Niessen, “Vessel specific coronary artery calcium scoring: an automatic system,” Academic Radiology, vol. 20, pp. 1–9, 2013.
  • [15] J. M. Wolterink, T. Leiner, R. A. P. Takx, M. A. Viergever, and I. Isgum, “Automatic coronary calcium scoring in non-contrast-enhanced ECG-triggered cardiac CT with ambiguity detection,” IEEE Transactions on Medical Imaging, vol. 34, pp. 1867–78, 2015.
  • [16] D. Dey, V. Y. Cheng, P. J. Slomka, R. Nakazato, A. Ramesh, S. Gurudevan, G. Germano, and D. S. Berman, “Automated 3-dimensional quantification of noncalcified and calcified coronary plaque from coronary CT angiography,” Journal of Cardiovascular Computer Tomography, vol. 3, pp. 372–382, 2009.
  • [17] S. Wesarg, M. F. Khan, and E. A. Firle, “Localizing calcifications in cardiac CT data sets using a new vessel segmentation approach,” Journal of Digital Imaging, vol. 19, pp. 249–257, 2006.
  • [18] D. Eilot and R. Goldenberg, “Fully automatic model-based calcium segmentation and scoring in coronary CT angiography,” International Journal of Computer Assisted Radiology and Surgery, vol. 9, pp. 595–608, 2014.
  • [19] W. Ahmed, M. A. de Graaf, A. Broersen, P. H. Kitslaar, E. Oost, J. Dijkstra, J. J. Bax, J. H. Reiber, and A. J. Scholte, “Automatic detection and quantification of the agatston coronary artery calcium score on contrast computed tomography angiography,” International Journal of Cardiovascular Imaging, vol. 31, pp. 151–161, 2015.
  • [20] S. Mittal, Y. Zheng, B. Georgescu, F. Vega-Higuera, S. K. Zhou, P. Meer, and D. Comaniciu, “Fast automatic detection of calcified coronary lesions in 3d cardiac CT images,” in International Workshop on Machine Learning in Medical Imaging, ser. LNCS.   Springer, 2010, vol. 6357, pp. 1–9.
  • [21] J. M. Wolterink, T. Leiner, M. A. Viergever, and I. Išgum, “Automatic coronary calcium scoring in cardiac CT angiography using convolutional neural networks,” in MICCAI, ser. LNCS.   Springer, 2015, vol. 9349, pp. 589–596.
  • [22] J. M. Wolterink, T. Leiner, B. D. de Vos, R. W. van Hamersvelt, M. A. Viergever, and I. Išgum, “Automatic coronary artery calcium scoring in cardiac CT angiography using paired convolutional neural networks,” Medical Image Analysis, vol. 34, pp. 123–136, 2016.
  • [23] I. Išgum, M. Prokop, M. Niemeijer, M. A. Viergever, and B. van Ginneken, “Automatic coronary calcium scoring in low-dose chest computed tomography,” IEEE Transactions on Medical Imaging, vol. 31, pp. 2322–34, 2012.
  • [24] Y. Xie, M. D. Cham, C. Henschke, D. Yankelevitz, and A. P. Reeves, “Automated coronary artery calcification detection on low-dose chest CT images,” in SPIE Medical Imaging, vol. 9035, 2014, p. 90350F.
  • [25] N. Lessmann, I. Išgum, A. A. A. Setio, B. D. de Vos, F. Ciompi, P. A. de Jong, M. Oudkerk, W. P. Th. M. Mali, M. A. Viergever, and B. van Ginneken, “Deep convolutional neural networks for automatic coronary calcium scoring in a screening study with low-dose chest CT,” in SPIE Medical Imaging, vol. 9785, 2016, p. 978511.
  • [26] G. González, G. R. Washko, and R. S. J. Estépar, “Automated agatston score computation in a large dataset of non ECG-gated chest computed tomography,” in IEEE 13th International Symposium on Biomedical Imaging (ISBI), 2016, pp. 53–57.
  • [27] G. H. Tison et al., “Multisite extracoronary calcification indicates increased risk of coronary heart disease and all-cause mortality: The Multi-Ethnic Study of Atherosclerosis,” Journal of Cardiovascular Computer Tomography, vol. 9, pp. 406–414, 2015.
  • [28] P. C. Jacobs et al., “Comparing coronary artery calcium and thoracic aorta calcium for prediction of all-cause mortality and cardiovascular events on low-dose non-gated computed tomography in a high-risk population of heavy smokers,” Atherosclerosis, vol. 209, pp. 455–462, 2010.
  • [29] M. J. Willemink et al., “Prognostic value of heart valve calcifications for cardiovascular events in a lung cancer screening population,” International Journal of Cardiovascular Imaging, vol. 31, pp. 1243–49, 2015.
  • [30] S. Kurugol, R. S. J. Estépar, J. Ross, and G. R. Washko, “Aorta segmentation with a 3D level set approach and quantification of aortic calcifications in non-contrast chest CT,” in 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2012, pp. 2343–46.
  • [31] Y. Xie, Y. M. Htwe, J. Padgett, C. Henschke, D. Yankelevitz, and A. P. Reeves, “Automated aortic calcification detection in low-dose chest CT images,” in SPIE Medical Imaging, 2014, p. 90350P.
  • [32] I. Išgum, A. Rutten, M. Prokop, M. Staring, S. Klein, J. P. Pluim, M. A. Viergever, and B. van Ginneken, “Automated aortic calcium scoring on low-dose chest computed tomography,” Medical Physics, vol. 37, pp. 714–723, 2010.
  • [33] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–40.
  • [34] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller, “Striving for simplicity: The all convolutional net,” in Proceedings of ICLR, 2015.
  • [35] F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” in Proceedings of ICLR, 2016.
  • [36] A. Prasoon, K. Petersen, C. Igel, F. Lauze, E. Dam, and M. Nielsen, “Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network,” in MICCAI, ser. LNCS.   Springer, 2013, vol. 8150, pp. 246–253.
  • [37] C.-Y. Lee, S. Xie, P. W. Gallagher, Z. Zhang, and Z. Tu, “Deeply-supervised nets,” in Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS), 2015.
  • [38] B. Ohnesorge et al., “Reproducibility of coronary calcium quantification in repeat examinations with retrospectively ECG-gated multisection spiral CT,” European Radiology, vol. 12, pp. 1532–40, 2002.
  • [39] D. Clevert, T. Unterthiner, and S. Hochreiter, “Fast and accurate deep network learning by exponential linear units (ELUs),” in Proceedings of ICLR, 2016.
  • [40] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv:1412.6980, 2014.
  • [41] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, pp. 1929–58, 2014.
  • [42] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proceedings of the 32nd International Conference on Machine Learning, 2015.
  • [43] Theano Development Team, “Theano: A Python framework for fast computation of mathematical expressions,” arXiv:1605.02688, 2016.
  • [44] S. Dieleman, J. Schlüter, C. Raffel, E. Olson, S. K. Sønderby, D. Nouri et al., “Lasagne: First release.” Aug. 2015. [Online]. Available: http://dx.doi.org/10.5281/zenodo.27878
  • [45] R. W. van Hamersvelt et al., “Cardiac valve calcifications on low-dose unenhanced ungated chest computed tomography: inter-observer and inter-examination reliability, agreement and variability,” European Radiology, vol. 24, pp. 1557–64, 2014.
  • [46] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in MICCAI, ser. LNCS.   Springer, 2015, vol. 9351, pp. 234–241.
  • [47] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” arXiv:1512.03385, 2015.
  • [48] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten, “Densely connected convolutional networks,” in Conference on Computer Vision and Pattern Recognition, 2017.
  • [49] F. Milletari et al., “Hough-CNN: Deep learning for segmentation of deep brain regions in MRI and ultrasound,” Computer Vision and Image Understanding, 2017, epub ahead of print.
  • [50] M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu, “Spatial transformer networks,” in Advances in Neural Information Processing Systems, vol. 28, 2015.