Death rates attributed to lung cancer are three times higher than those for any other cancer in the United States. Diagnosis of this pathology is informed by the presence of malignant pulmonary nodules that appear in thoracic computed tomography (CT) images. There is a current trend toward regular monitoring programs for high-risk groups using methods such as low-dose CT. Such programs have been proposed to help catch the pathology in its early stages, where, in developed countries, diagnosis dramatically increases the 5-year patient survival rate to 63-75%. Radiologists tasked with locating and classifying pulmonary nodules would likely see a dramatic increase in workload as such protocols become widespread. Fast and accurate automated lung nodule detection methods would then improve lung image evaluation throughput and objectivity by assisting radiologists in their assessment.
One of the major challenges in designing effective automated lung nodule detection methods is the massively unbalanced nature of the data. For example, over the entire Lung Image Database Consortium image collection (LIDC-IDRI) [2, 3, 5], less than 1% of image voxels contain positive nodule examples. The class imbalance problem has received wide attention in the machine learning and data mining communities, where typical solutions include class over- and under-sampling, weighted losses, and posterior probability recalibration. Sampling schemes have been studied in medical imaging classification and segmentation, whereas loss function adjustments were key to other reported results. In Computer-Aided Detection (CADe) applications, specialized knowledge can be used, such as limiting the domain of detection to the lung only (requiring a lung masking model), or training a highly sensitive candidate nodule screening model and then refining predictions by cascading false positive reduction stages [19, 13]. A common theme across these approaches is that they tend to be problem-dependent, and sizable efforts must often be expended to find the balancing technique yielding the best performance.
This paper proposes a generic approach to tackling class imbalance by adapting, online during training, the distribution of majority and minority class examples, in the spirit of curriculum learning. Curriculum Adaptive Sampling for Extreme Data imbalance (CASED) is a novel sampling curriculum that allows a 3D fully convolutional network (FCN) to yield segmentations of high enough quality that detection becomes a mere consequence. In contrast to approaches where an off-the-shelf segmentation model or FCN is trained only to provide candidates to a second, independently trained convolutional neural network (CNN) for classification, CASED combines curriculum learning and adaptive data sampling in a way that makes the second classifier redundant. This is achieved by allowing the FCN to first learn how to distinguish nodules from their immediate surroundings, while continuously introducing training examples that the model has trouble classifying. This approach yields a surprisingly minimalist solution to the lung nodule detection problem that tops the LUNA16 challenge leader-board with a score of 88.35%. Furthermore, weakly-supervised training, with only a point and radius provided for each training nodule, yields results competitive with those of full segmentation.
CASED adheres to the observation that the solution to object detection is fully contained in the solution to object segmentation. That is, given an ideal segmentation, a determination of the location, extent, and identity of an imaged object becomes trivial. However, training a model to yield even acceptable medical image segmentations is a considerably harder task than detection, for two main reasons. First, manual segmentation of training data is a laborious and expensive endeavour. And second, the model must be able to describe the complex variations of texture ranging over the extent of a given object and its surroundings. Fortunately, the first problem is less significant here, as large datasets of annotated lung CT scans are available; however, robustness to weakly labeled data is important. Regarding the second problem, recent work on FCNs (e.g. FCN-8s for natural images, U-Net for biomedical images) has shown that their ability to model multi-scale context over finite image regions makes them ideal candidates for medical image segmentation problems. It behooves one to ask, then, why it has not yet been shown, in the context of lung nodule detection, that FCNs alone are a competitive solution to this problem. We hypothesize the answer lies in the extreme data imbalance associated with the problem, which has not yet been sufficiently addressed. In the following we present CASED as an approach to overcome this issue.
One of the more attractive properties of FCNs is their ability to handle images of arbitrary size. This feature allows us to reduce data imbalance by training on small image patches where the output stride of the model contains at least one positive nodule voxel. As one would start teaching a child to read the alphabet by restricting their gaze to a large letter A, the model first learns how to represent nodules given only their immediate surroundings. An important consequence of training the FCN on image patches is that we are able to randomize training examples across both patient images and also image regions. Training only on patches that contain nodule examples will result in an extremely sensitive model but with low specificity because it would not learn how to represent the majority of the input image space. Therefore, a curriculum is introduced where the proportion of training patches that contain nodules to those that do not is decreased according to a schedule that tends toward the data distribution as the number of training examples seen approaches infinity.
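Such a schedule can be sketched as a nodule-patch proportion that starts at 1 (every sampled patch contains a nodule) and decays toward the proportion under the data distribution. The exponential form, the decay rate, and the 1% figure below are illustrative assumptions, not the paper's exact schedule:

```python
import math

def nodule_patch_fraction(t, p_data=0.01, decay=1e-4):
    """Fraction of mini-batch patches drawn from the nodule-patch generator
    at iteration t. Starts at 1.0 and decays toward p_data, an assumed
    fraction of nodule patches under the raw data distribution."""
    return p_data + (1.0 - p_data) * math.exp(-decay * t)
```

Any monotone schedule converging to the data distribution would satisfy the curriculum described above; the exponential is merely a convenient choice.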
When trained using this curriculum with random sampling of background patches, the FCN generally converges to a solution that still produces systematic and predictable false positives. Furthermore, the vast majority of voxels in typical lung images are correctly and confidently predicted as non-nodule, so random sampling would be far more likely to show examples that have little to no effect on loss optimization. Hence, we introduce a sampling strategy that favours training examples for which prediction using recent model parameters produces false results, an instance of hard negative mining (HNM).
Figure 1 shows a flowchart of the CASED framework. Let X be the training set of patches. Patch generators are shown in red boxes. The generators p(x) and p⁺(x) represent distributions over the set of all patches and the set of patches that contain nodules, respectively. FCN models are shown in blue boxes, where the training model shares its weights with a predictor that is run in parallel for the purposes of HNM. The green boxes represent samplers with distributions that vary with the mini-batch iteration t. The sampler q_t selects patches based on both p(x) and the training loss; the function specifying q_t must take values on [0, 1] and converge to p as t → ∞. The sampler s_t defines the curriculum and chooses between q_t and p⁺ according to a mixing coefficient π_t that depends on t, takes values on [0, 1], and converges to 0 as t → ∞. The distribution governing the sampler is given by

s_t(x) = π_t · p⁺(x) + (1 − π_t) · q_t(x),

where p⁺(x) ∝ p(x) if x contains a nodule, and p⁺(x) = 0 otherwise. In the limit, as t goes to infinity, s_t converges to a uniform distribution over X, which makes CASED a valid curriculum.
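A minimal sketch of this sampling scheme follows. The geometric decay of the mixing coefficient and the simple hard-negative preference rule are illustrative assumptions; the text does not pin the scheme down to this level of detail:

```python
import random

def cased_sample(t, nodule_patches, background_patches, is_false_positive,
                 mixing=lambda t: 0.999 ** t):
    """Draw one training patch at mini-batch iteration t.

    With probability mixing(t) (the mixing coefficient, decaying toward 0),
    sample from the nodule-patch generator; otherwise sample a background
    patch, preferring those on which the current model produced a false
    positive (hard negative mining)."""
    if random.random() < mixing(t):
        return random.choice(nodule_patches)
    # HNM: restrict to background patches the recent model got wrong, if any
    hard = [p for p, fp in zip(background_patches, is_false_positive) if fp]
    return random.choice(hard if hard else background_patches)
```

As the mixing coefficient decays, training shifts from nodule-centred patches toward the (hard-negative-weighted) background distribution, matching the limiting behaviour described above.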
3 Data and Implementation
We study CASED as applied to the task of lung nodule detection using the publicly available LIDC image collection [2, 3, 5]. The LIDC contains 1010 patients and a total of 1018 clinical thoracic CT scans. Each scan has been analyzed through a two-phase nodule annotation process by four expert radiologists. In the first phase, each radiologist independently marks nodules as belonging to one of three classes (nodule ≥ 3mm, nodule < 3mm, and non-nodule ≥ 3mm), where the measurement refers to a nodule's diameter. In the second phase, each expert can refine their annotations after seeing the anonymous annotations of the other three radiologists. The LIDC contains 2635 nodules annotated in this way, and there are 142 cases that either contain no detected nodules or only nodules < 3mm.
Figure 2 illustrates the model used. The model comprises three distinct components: (1) a downstream feature extraction path, (2) an upstream feature pooling path, and (3) a linear pixel classifier. In the downstream path, we use layers of "convolution" and "pooling". Each layer effectively encodes a progressively larger neighbourhood of the input image as we go deeper. In the upstream path, we use "convolution" and "strided transposed convolution" layers. Multi-scale features extracted in the downstream path are combined to provide pixel-level features in the input image space. Finally, the linear pixel classifier uses a simple "sigmoid" layer to provide per-pixel prediction of nodule or non-nodule.
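To make the "progressively larger neighbourhood" claim concrete, the effective receptive field and output stride of a conv/pool stack can be computed from kernel sizes and strides alone. The three-stage configuration below is an assumed illustration, not the paper's exact architecture (though three pooling stages are consistent with the divisibility-by-8 requirement noted at test time):

```python
def receptive_field(layers):
    """Effective receptive field and output stride of a stack of layers,
    each given as (kernel_size, stride). Each layer grows the input
    neighbourhood that one output voxel depends on."""
    rf, stride = 1, 1
    for k, s in layers:
        rf += (k - 1) * stride  # new kernel taps, spaced by current stride
        stride *= s             # downsampling compounds multiplicatively
    return rf, stride

# Assumed example: three stages of conv(3, stride 1) followed by pool(2, stride 2)
stages = [(3, 1), (2, 2)] * 3
```

For this assumed stack, `receptive_field(stages)` gives a 22-voxel receptive field and an output stride of 8.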
CASED training requires minimal data preprocessing. For a given CT scan, image intensities are transformed to Hounsfield units and linearly rescaled. The scan is then resized to 1.25mm isotropic voxels. For training, binary segmentation maps are built from the expert annotations listed in the provided XML files and are also transformed into the 1.25mm isotropic space. The binary segmentation maps are nodule-wise refined to only label as nodule those voxels that correspond to the intersection of all available annotations. For example, if a nodule only has an annotation from one rater, that annotation is used; however, if a nodule has annotations from multiple raters the intersection of those annotations is used.
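The preprocessing steps can be sketched as follows. The Hounsfield-unit clipping window is an assumption for illustration; the text only states that intensities are linearly rescaled and resampled to 1.25mm isotropic voxels:

```python
import numpy as np
from scipy.ndimage import zoom

def preprocess(volume_hu, spacing_mm, target_mm=1.25,
               hu_window=(-1000.0, 400.0)):
    """Clip a CT volume (already in Hounsfield units) to an assumed window,
    linearly rescale to [0, 1], and resample to isotropic target_mm voxels."""
    lo, hi = hu_window
    vol = np.clip(volume_hu.astype(np.float32), lo, hi)
    vol = (vol - lo) / (hi - lo)                   # linear rescale to [0, 1]
    factors = [s / target_mm for s in spacing_mm]  # per-axis zoom factors
    return zoom(vol, factors, order=1)             # linear interpolation
```

The same resampling would be applied to the binary segmentation maps (with nearest-neighbour interpolation) so that labels and images share the 1.25mm isotropic space.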
Training is done by optimizing voxel-wise binary cross-entropy over each prediction patch, given a larger image patch as input. Nodule patches are defined as those for which there is a labeled nodule voxel within the output stride. All other patches are called background. The curriculum is initialized so that every sampled patch contains a nodule, and the nodule-patch proportion is decayed after each mini-batch iteration. Finally, "background" patches are sampled based on whether they contain a false positive prediction under recent model parameters.
At test time, an equally minimalist approach to postprocessing is required. Given a test image, the model outputs a soft segmentation map estimating the probability that a given voxel belongs to the nodule class. This map is thresholded, giving a binary segmentation on which connected component analysis is performed to yield candidate nodules. The center of mass and average value of the segmentation map over each candidate are found to yield a list of point and confidence predictions. The points are finally transformed back into the native image space. Because the model is fully convolutional, the input size at test time need only be divisible by 8. Given sufficient GPU memory, the entire CT scan can be passed as input without tiling, and full prediction takes only a few seconds.
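This postprocessing can be sketched in a few lines. The 0.5 threshold is an illustrative assumption; the text does not specify the operating point:

```python
import numpy as np
from scipy import ndimage

def candidates_from_probmap(prob, threshold=0.5):
    """Turn a soft segmentation map into (center, confidence) candidates:
    threshold, label connected components, then report each component's
    center of mass and mean probability."""
    binary = prob > threshold
    labels, n = ndimage.label(binary)
    if n == 0:
        return []
    ids = list(range(1, n + 1))
    centers = ndimage.center_of_mass(binary, labels, ids)
    confidences = ndimage.mean(prob, labels, ids)
    return list(zip(centers, confidences))
```

Each returned center would still need to be mapped from the 1.25mm isotropic grid back into the native image space, as described above.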
4 Experiments and Results
We evaluate the CASED framework using the 2016 Lung Nodule Analysis Challenge (LUNA16) 10-fold cross-validation split. Each fold contains 88-89 CT scans. The reference standard for LUNA16 consists of all nodules ≥ 3mm that have been detected by at least three of four raters. Evaluation is based on the detection sensitivity at various false positive rates per scan. A detailed explanation of the evaluation can be found on the LUNA16 website.
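The core of this evaluation (sensitivity at fixed false-positive rates per scan, averaged into a summary score) can be sketched as below. This is a simplification under stated assumptions: it pools detections over all scans and ignores the per-nodule hit-merging that the official LUNA16 script performs:

```python
def froc_sensitivities(detections, n_nodules, n_scans,
                       fp_rates=(0.125, 0.25, 0.5, 1, 2, 4, 8)):
    """Sensitivity at fixed false positives per scan. `detections` is a
    list of (confidence, is_true) pairs pooled over all scans; candidates
    are admitted in decreasing confidence order until the false-positive
    budget for each rate is exhausted."""
    dets = sorted(detections, key=lambda d: d[0], reverse=True)
    out = []
    for rate in fp_rates:
        max_fp = rate * n_scans
        tp = fp = 0
        for conf, is_true in dets:
            if not is_true and fp + 1 > max_fp:
                break  # next false positive would exceed this rate's budget
            if is_true:
                tp += 1
            else:
                fp += 1
        out.append(tp / n_nodules)
    return out
```

Averaging the sensitivities over the seven operating points yields a single score of the kind reported on the leader-board.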
Sensitivity at fixed false positives per scan (rightmost column: average over the seven operating points):

|Method|0.125|0.25|0.5|1|2|4|8|Average|
|---|---|---|---|---|---|---|---|---|
|ETROCAD [18, 15]|0.250|0.522|0.651|0.752|0.811|0.856|0.887|0.676|
|M5LCAD [11, 15]|0.306|0.360|0.540|0.691|0.762|0.797|0.798|0.608|
For each test fold, we train on eight and validate on one of the remaining folds. We also use model ensembling to improve the reliability of the results. Finally, we repeat the experiment using spherical segmentations defined by the location and radius of each nodule instead of the reference annotations (CASED-Sphere).
Table 1 summarizes the results of these experiments for the lung nodule detection task and provides a comparison to the results of other methods submitted to the LUNA16 leader board. The CASED learning framework shows an 8.9% relative increase in average sensitivity over the best published results for a given model, ZNET. The free-response receiver operating characteristic (FROC) curve for CASED appears in Figure 3. Finally, we demonstrate robustness to segmentation quality by showing that a 3.8% relative increase over ZNET is achieved with CASED-Sphere.
5 Conclusions
This paper proposes CASED, a new curriculum sampling algorithm for the highly class imbalanced problems that are endemic in medical imaging applications. We demonstrate that CASED is a robust learning framework for training deep lung nodule detection models. Evaluated on the LUNA16 challenge, we achieve the current state-of-the-art leader-board performance with an average sensitivity score of 88.35%. Since the CASED algorithm makes no assumptions about image modality, it can be applied to any arbitrarily large dataset in which the unbalanced nature of the data poses major problems for designing automated methods.
-  Lung nodule analysis 2016. https://luna16.grand-challenge.org, [accessed 22/02/2017]
-  Armato, S.G., McLennan, G., et al.: The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A Completed Reference Database of Lung Nodules on CT Scans. Medical Physics 38(2), 915–931 (2011), http://dx.doi.org/10.1118/1.3528204
-  Armato, S.G., McLennan, G., et al.: Data From LIDC-IDRI. The Cancer Imaging Archive. (2015), http://doi.org/10.7937/K9/TCIA.2015.LO9QL9SX
-  Bengio, Y., Louradour, J., et al.: Curriculum learning. In: Proceedings of the 26th annual international conference on machine learning. pp. 41–48. ACM (2009)
-  Clark, K., Vendt, B., et al.: The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository. Journal of Digital Imaging 26(6), 1045–1057 (2013), http://dx.doi.org/10.1007/s10278-013-9622-7
-  Diederich, S., Lentschig, M., et al.: Detection of pulmonary nodules at spiral CT: comparison of maximum intensity projection sliding slabs and single-image reporting. European Radiology 11(8), 1345–1350 (2001), http://dx.doi.org/10.1007/s003300000787
-  Dubey, R., Zhou, J., et al.: Analysis of sampling techniques for imbalanced data: An ADNI study. Neuroimage 87, 220–241 (Feb 2014)
-  Havaei, M., Davy, A., et al.: Brain tumor segmentation with deep neural networks. Medical Image Analysis 35, 18–31 (2017), http://www.sciencedirect.com/science/article/pii/S1361841516300330
-  He, H., Garcia, E.A.: Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering 21(9), 1263–1284 (Sept 2009)
-  Lopez Torres, E., Fiorina, E., et al.: Large scale validation of the M5L lung CAD on heterogeneous CT datasets. Medical Physics 42(4), 1477–1489 (2015), http://dx.doi.org/10.1118/1.4907970
-  Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional Networks for Biomedical Image Segmentation, pp. 234–241. Springer International Publishing, Cham (2015), http://dx.doi.org/10.1007/978-3-319-24574-4_28
-  Setio, A.A.A., Jacobs, C., et al.: Automatic detection of large pulmonary solid nodules in thoracic CT images. Medical Physics 42(10), 5642–5653 (2015), http://dx.doi.org/10.1118/1.4929562
-  Setio, A.A.A., Ciompi, F., et al.: Pulmonary nodule detection in CT images: false positive reduction using multi-view convolutional networks. IEEE transactions on medical imaging 35(5), 1160–1169 (2016)
-  Setio, A.A.A., Traverso, A., et al.: Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the luna16 challenge. arXiv preprint arXiv:1612.08012 (2016)
-  Siegel, R., Naishadham, D., Jemal, A.: Cancer statistics, 2013. CA: A Cancer Journal for Clinicians 63(1), 11–30 (2013), http://dx.doi.org/10.3322/caac.21166
-  Sung, K.K., Poggio, T.: Example-based learning for view-based human face detection. IEEE Trans. Pattern Anal. Mach. Intell. 20(1), 39–51 (Jan 1998), http://dx.doi.org/10.1109/34.655648
-  Tan, M., Deklerck, R., et al.: A novel computer-aided lung nodule detection system for CT images. Medical Physics 38(10), 5630–5645 (2011), http://dx.doi.org/10.1118/1.3633941
-  Valente, I.R.S., Cortez, P.C., et al.: Automatic 3D pulmonary nodule detection in CT images: a survey. Computer methods and programs in biomedicine 124, 91–107 (2016)