One of the most significant obstacles to the development of deep learning-based computer-aided diagnosis (CAD) platforms in radiology is the need for large, annotated medical image datasets. Particularly for 3D imaging modalities such as computed tomography (CT), it is often prohibitively onerous for radiologists to provide enough manual annotations to train deep models, so training on a large dataset of annotated samples is rarely feasible in practice. One domain where this limitation applies is the detection of emphysema, a disease associated with shortness of breath and elevated cancer risk. Emphysema often manifests as ruptured air sacs within only a portion of the lung volume. Its varied appearance on CT makes it challenging to train a model to detect emphysema from volumetric imaging data with binary diagnostic labels alone.
A commonly used approach to enable learning in the absence of precise labels is multiple instance learning (MIL). In MIL, sets of samples are grouped into labeled bags, wherein a positive label signifies the presence of positive samples within a bag. Previous work has successfully leveraged an MIL framework for detection of emphysema and a variety of other lung diseases on CT. MIL using a hand-crafted feature-based classifier to evaluate a number of 2D patches from the lung has been shown to identify emphysema [1, 2] and other lung diseases [3]. More recently, Bortsova et al. [4] reported success in grading emphysema by summarizing the output of a convolutional neural network (CNN) over a number of 2D patches using a proportional approach similar to MIL.
A disadvantage of MIL-based approaches is that they fail to retain relationships between samples. For example, while effective at summarizing information from a number of samples, MIL does not retain the spatial relationship between samples drawn from an image. Furthermore, the efficacy of MIL is dependent on the pooling approach utilized to summarize predictions across the bag: a parameter which can substantially influence the instances in which a model will succeed or fail. For instance, a max pooling-based approach considers the single sample most strongly associated with disease without incorporating any information from the bag’s other samples. Meanwhile, a mean pooling of predictions within a bag might miss a disease diagnosis present in only a few samples.
Recurrent neural networks, such as long short term memory (LSTM), excel at identifying relationships between correlated samples, such as in pattern recognition across time series data. Convolutional long short term memory (Conv-LSTM) [5] extends this capability to spatial data by making the operations of an LSTM convolutional. Conv-LSTM has been highly effective in characterizing changes in image patterns over time, such as in video classification [6] and gesture recognition [7]. Rather than identifying spatiotemporal patterns from time series image data, we propose the use of Conv-LSTM to “scan” through an imaging volume for disease presence without the need for expert annotations of diseased regions. In contrast to an MIL-based approach, our framework allows the detection of emphysema-associated image patterns on and between slices as it processes the image volume. The network stores emphysema-associated image patterns across multiple bidirectional passes through a volume, and outputs a final set of features characterizing the entire volume without requiring a potentially reductive bag pooling operation. Our approach can make efficient use of weak but readily accessible image labels (e.g. binary diagnosis of emphysema positive or negative) for abnormality detection within image volumes.
2.1 Dataset and processing
A total of 8794 non-contrast CT volumes from 6648 unique participants enrolled in the National Lung Screening Trial (NLST) [8, 9] were utilized. We used 3807 CT volumes from 2789 participants who were diagnosed with emphysema across the 3 years of study as positive samples and 4987 CT volumes from 3859 participants who were not diagnosed with emphysema in any of the 3 years as negative samples. 75% of these scans, with a balanced distribution of emphysema positive and negative patients, were utilized for model training. 4197 volumes from 3166 patients were used to directly learn model parameters, while 2434 volumes from 1319 patients were used to tune hyper-parameters and evaluate performance in order to choose the best-performing model. The remaining 2163 volumes (578 emphysema positive, 1585 emphysema negative), each from a unique patient, were held out for independent testing. Volumes were resized to 128x128x35, corresponding to an average slice spacing of 9 mm.
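Because many participants contribute more than one scan, a split like the one above has to be made at the patient level so that no patient appears in more than one partition. The following is a minimal sketch of such a grouped split, not the authors' actual procedure: the record layout `(patient_id, volume_id, label)`, the function name `split_by_patient`, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict

def split_by_patient(records, test_fraction=0.25, seed=0):
    """Assign whole patients to train or test so no patient spans both.

    records: iterable of (patient_id, volume_id, label) tuples
             (hypothetical layout chosen for this sketch).
    """
    by_patient = defaultdict(list)
    for patient_id, volume_id, label in records:
        by_patient[patient_id].append((patient_id, volume_id, label))

    # Shuffle patients, not volumes, then cut the list once.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_test = int(round(test_fraction * len(patients)))
    test_patients = set(patients[:n_test])

    train, test = [], []
    for pid in patients:
        (test if pid in test_patients else train).extend(by_patient[pid])
    return train, test

# Toy data: 8 patients with one to three yearly scans each.
records = [(p, f"{p}-y{y}", p % 2) for p in range(8) for y in range(1 + p % 3)]
train, test = split_by_patient(records)
```

The same function can be applied a second time to the training portion to carve out a validation set. Keeping a single volume per held-out patient, as in the test set described above, would need an extra filtering step that this sketch does not include.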
2.2 Convolutional Long Short Term Memory (Conv-LSTM)
The architecture comprises four units, each including convolution operations applied separately to each individual slice and a Conv-LSTM to process the volume slice by slice. Two 3x3 convolutional layers with batch normalization are followed by max-pooling. The outputs of the convolutional layers for each slice are then processed sequentially, in forward or reverse order, by the Conv-LSTM layer, which outputs a set of features obtained through convolutional operations on the input slice as well as previous slices within the volume. All layers within a unit share the same number of filters and process the volume in either ascending or descending order. The four convolutional units have the following dimensionality and directionality: Ascending 1: 32 filters, Descending 1: 32 filters, Ascending 2: 64 filters, Descending 2: 64 filters. The final Conv-LSTM layer outputs a single set of features, which summarizes the network’s findings after processing the full imaging volume multiple times. A fully-connected layer with sigmoid activation then computes the probability of emphysema. The network, depicted in Fig. 1, comprises a total of 901,000 parameters. All models were trained for 50 epochs or until performance on the validation set ceased to improve.
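The slice-by-slice scan described above can be sketched in plain NumPy. This is a toy illustration rather than the network itself: it omits the per-slice convolutional layers, batch normalization and max-pooling, uses a peephole-free gate formulation, and runs with random weights on small hypothetical sizes (16x16 slices, 4 filters). The helper names `conv2d_same` and `conv_lstm_scan` are illustrative only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_same(x, w):
    """Zero-padded 2D cross-correlation. x: (H, W, Cin), w: (k, k, Cin, Cout)."""
    pad = w.shape[0] // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    windows = sliding_window_view(xp, w.shape[:2], axis=(0, 1))  # (H, W, Cin, k, k)
    return np.einsum("hwcij,ijco->hwo", windows, w)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv_lstm_scan(slices, w_x, w_h, b, reverse=False):
    """One Conv-LSTM pass over slices of shape (S, H, W, Cin).

    w_x: (k, k, Cin, 4F), w_h: (k, k, F, 4F), b: (4F,).
    Returns the hidden state at every slice, shape (S, H, W, F).
    """
    n_slices, height, width, _ = slices.shape
    n_filters = w_h.shape[2]
    h = np.zeros((height, width, n_filters))  # hidden state
    c = np.zeros((height, width, n_filters))  # cell state
    outputs = np.zeros((n_slices, height, width, n_filters))
    order = range(n_slices - 1, -1, -1) if reverse else range(n_slices)
    for s in order:
        # All four gates come from convolutions of the current slice
        # and of the hidden state carried over from earlier slices.
        gates = conv2d_same(slices[s], w_x) + conv2d_same(h, w_h) + b
        i, f, o, g = np.split(gates, 4, axis=-1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs[s] = h
    return outputs

rng = np.random.default_rng(0)
F = 4                                        # toy filter count
volume = rng.normal(size=(35, 16, 16, 1))    # 35 slices, toy in-plane size

def init(c_in):
    return rng.normal(scale=0.1, size=(3, 3, c_in, 4 * F))

# Ascending pass, then a descending pass over the ascending pass's output.
ascending = conv_lstm_scan(volume, init(1), init(F), np.zeros(4 * F))
descending = conv_lstm_scan(ascending, init(F), init(F), np.zeros(4 * F), reverse=True)

# The descending pass finishes at slice 0, so that state has seen the whole volume.
volume_features = descending[0]
w_dense = rng.normal(scale=0.01, size=volume_features.size)
probability = sigmoid(volume_features.ravel() @ w_dense)
```

Only the last hidden state of the final pass feeds the dense layer, which is why no bag-level pooling step is needed in this design.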
2.3 Comparison Experiments
Multiple Instance Learning:
We implemented an MIL-based network in which each slice of the CT volume was treated as a sample from a bag. To this end, we implemented a purely convolutional architecture resembling that of Fig. 1, with additional single-slice convolutional layers replacing the Conv-LSTM layers, to analyze each slice. Several methods of summarizing predictions across the entire volume into a single bag probability were explored. The overall probability, $P$, for a bag containing $N$ samples with individual probabilities of emphysema, $p_n$, can be computed via the following approaches:
Max Pooling: $P = \max_n(p_n)$
Mean Pooling: $P = \frac{1}{N}\sum_{n=1}^{N} p_n$
Product Pooling: $P = 1 - \prod_{n=1}^{N}(1 - p_n)$
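The three pooling rules can be sketched in NumPy to show how differently they treat a bag in which only one slice looks diseased. The function names and the example probabilities are illustrative. Product pooling is written here in the noisy-OR form (one minus the product of the complements), which is an assumption about the intended rule.

```python
import numpy as np

def max_pooling(p):
    """Bag probability taken from the single most suspicious slice."""
    return float(np.max(p))

def mean_pooling(p):
    """Bag probability as the average over all slices."""
    return float(np.mean(p))

def product_pooling(p):
    """Noisy-OR (assumed form): probability that at least one slice is
    positive if slices are treated as independent."""
    return float(1.0 - np.prod(1.0 - np.asarray(p)))

# Hypothetical per-slice outputs: one strongly positive slice among five.
slice_probs = np.array([0.05, 0.10, 0.90, 0.08, 0.05])

p_max = max_pooling(slice_probs)
p_mean = mean_pooling(slice_probs)
p_prod = product_pooling(slice_probs)
```

On this toy bag, mean pooling dilutes the single confident slice while max pooling ignores the other four, which is the trade-off described earlier for pooling-based MIL.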
Conv-LSTM was also compared with a 3D CNN whose structure mirrors that of the 2D CNN used with MIL, but with a single dense layer and no pooling operation on the final convolutional layer. The number of kernels in each comparison model was increased so that its parameter count was roughly equivalent to that of our Conv-LSTM framework, ensuring a fair comparison (Table 1).
3 Results and Conclusions
Conv-LSTM demonstrated strong performance in the identification of emphysema when trained using only weakly annotated imaging volumes, achieving an AUC of 0.82. It outperformed a CNN with MIL regardless of pooling strategy (Max pooling: AUC=0.69, Mean pooling: AUC=0.70, Product pooling: AUC=0.76). At the optimal operating point corresponding to the Youden Index [10], our model achieved sensitivity and specificity of 0.77 and 0.74, respectively. Results for all evaluated models in the testing set are shown in Table 1.
| Model | Kernels | Parameters | AUC | | | |
|---|---|---|---|---|---|---|
| MIL - Max Pooling | 64 | 1,011,393 | 0.69 | 0.59 | 0.68 | 0.63 |
| MIL - Mean Pooling | 64 | 1,011,393 | 0.70 | 0.76 | 0.57 | 0.66 |
| MIL - Product Pooling | 64 | 1,011,393 | 0.76 | 0.61 | 0.79 | 0.69 |
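The Youden index is J = sensitivity + specificity - 1, and the optimal operating point is the threshold that maximizes it. For the Conv-LSTM operating point reported above, J = 0.77 + 0.74 - 1 = 0.51. The following is a minimal sketch of the threshold search, assuming per-volume scores and binary labels are available. The function name and the toy data are hypothetical and are not taken from the study.

```python
import numpy as np

def youden_operating_point(labels, scores):
    """Return (threshold, sensitivity, specificity) maximizing
    J = sensitivity + specificity - 1 over all observed score values."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    best_j = -np.inf
    best = None
    for t in np.unique(scores):          # candidate thresholds, ascending
        predicted = scores >= t
        sens = float(np.mean(predicted[labels]))     # true positive rate
        spec = float(np.mean(~predicted[~labels]))   # true negative rate
        j = sens + spec - 1.0
        if j > best_j:                   # ties keep the lowest threshold
            best_j = j
            best = (float(t), sens, spec)
    return best

# Toy scores: six negatives and five positives with one overlapping pair.
labels = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
scores = [0.1, 0.15, 0.2, 0.3, 0.35, 0.4, 0.6, 0.65, 0.7, 0.8, 0.9]
threshold, sens, spec = youden_operating_point(labels, scores)
```

Because J weights sensitivity and specificity equally, the chosen threshold does not depend on the class balance of the test set.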
Significantly, our approach requires no time-consuming annotation or manual processing of imaging data. Our framework allows training for disease detection from simple binary diagnostic labels, even if the disease is localized to only a fraction of the image. Therefore, our network can be readily trained from information easily attainable by mining radiology reports in an automated fashion. This capability significantly expands the pool of volumetric imaging data that can be practically utilized for such an application, and could allow easy retraining and fine-tuning of an algorithm when it is applied at a new hospital. Beyond emphysema, this approach is applicable to other disease or abnormality detection problems where the availability of volumetric imaging data exceeds the capacity of radiologists to provide manually delineated ground truth, but labels can be easily mined from radiology reports.
- (1) Ørting, S., et al.: Detecting emphysema with multiple instance learning. 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018) (2018)
- (2) Peña, I., et al.: Automatic emphysema detection using weakly labeled HRCT lung images. PLoS ONE (2018)
- (3) Cheplygina, V., et al.: Classification of COPD with multiple instance learning. 2014 22nd IEEE International Conference on Pattern Recognition (2014)
- (4) Bortsova, G., et al.: Deep learning from label proportions for emphysema quantification. Medical Image Computing and Computer Assisted Intervention - MICCAI 2018 (2018)
- (5) Shi, X., et al.: Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Neural Information Processing Systems (NIPS) (2015)
- (6) Yue-Hei Ng, J., et al.: Beyond short snippets: Deep networks for video classification. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
- (7) Pigou, L., et al.: Beyond temporal pooling: Recurrence and temporal convolutions for gesture recognition in video. International Journal of Computer Vision (2018)
- (8) National Lung Screening Trial Research Team: Reduced lung-cancer mortality with low-dose computed tomographic screening. New England Journal of Medicine (2011)
- (9) National Lung Screening Trial Research Team: The national lung screening trial: overview and study design. Radiology (2011)
- (10) Youden, W.: Index for rating diagnostic tests. Cancer (1950)