Automatically classifying findings of interest within chest radiographs remains a challenging task. Systems that can perform this task accurately have several use-cases. In particular, chest x-rays are the most commonly ordered imaging study for pulmonary disorders (Raoof et al., 2012) and, given the sheer volume of images being produced, an automated system that provides secondary reads for radiologists could help ensure that important findings are not missed. Moreover, in regions where access to trained radiologists is limited, an automated system that can accurately detect thoracic diseases from chest x-rays would be of great value. Finally, in fast-paced care settings such as the emergency department and intensive care unit, clinicians may not have time to wait for the results of a radiology report to become available. A system that can automatically flag potentially lethal conditions (e.g. complications from mechanical ventilation leading to pneumothorax (Chen et al., 2002)) could allow care providers to respond to emergency situations sooner.
In this work, we train deep convolutional neural networks to automatically classify findings in chest x-ray images. We make use of the MIMIC-CXR dataset, which has been made available via a limited release and is intended for future public dissemination. MIMIC-CXR is the largest available set of frontal and lateral chest radiographs compared to previously released datasets, consisting of 473,064 chest x-rays in DICOM format collected from 63,478 patients. In addition, 206,574 radiology reports that correspond to the CXR images are also available. Consistent with recent previous works on automated chest x-ray analysis (Yao et al., 2017; Rajpurkar et al., 2017; Guan et al., 2018; Kumar et al., 2017; Baltruschat et al., 2018), we focus on recognizing 14 thoracic disease categories: atelectasis, cardiomegaly, consolidation, edema, effusion, emphysema, fibrosis, hernia, infiltration, mass, nodule, pleural thickening, pneumonia and pneumothorax. Unlike previous works, we train separate models based on the view position of the radiograph. In particular, separate CNN models are trained for posteroanterior (PA), anteroposterior (AP) and lateral view position CXR images. Furthermore, consistent with how standard chest examinations take place (Gay et al., 2013), we present a novel DualNet architecture that accepts as input both a frontal and a lateral chest x-ray taken from a patient during a radiographic study. An overview of the DualNet architecture is graphically depicted in Fig. 1. The frontal and lateral CXR inputs are processed by separate convolutional neural networks and their outputs are combined into a fully connected layer to make a final classification. We compare the DualNet architecture to baseline CNN architectures that process PA, AP and lateral inputs separately and show that processing both frontal and lateral inputs simultaneously improves classification performance.
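The two-branch design described above can be sketched in PyTorch (the framework used in this work). Note that the tiny convolutional branches below are illustrative stand-ins for the DenseNet-121 branches used in the actual models; only the overall structure (two independent branches, pooled features concatenated into one fully connected classifier) reflects the architecture.

```python
import torch
import torch.nn as nn

class DualNet(nn.Module):
    """Minimal sketch of the DualNet idea: separate convolutional branches
    for the frontal and lateral views, whose globally pooled features are
    concatenated and mapped to 14 finding outputs by a fully connected layer."""
    def __init__(self, num_classes=14, feat=32):
        super().__init__()
        def branch():
            # Illustrative small branch; the paper uses DenseNet-121 here.
            return nn.Sequential(
                nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, feat, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),  # global average pooling
            )
        self.frontal = branch()
        self.lateral = branch()
        self.classifier = nn.Linear(2 * feat, num_classes)

    def forward(self, frontal, lateral):
        f = self.frontal(frontal).flatten(1)
        l = self.lateral(lateral).flatten(1)
        # Logits; a sigmoid is applied per output for multi-label prediction.
        return self.classifier(torch.cat([f, l], dim=1))

model = DualNet()
logits = model(torch.randn(2, 1, 64, 64), torch.randn(2, 1, 64, 64))
print(logits.shape)  # torch.Size([2, 14])
```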
2 Related Work
While computer-aided diagnosis in chest radiography has been studied for many years (Van Ginneken, 2001; Van Ginneken et al., 2001), the public release of the ChestX-ray14 dataset (Wang et al., 2017) has resulted in many recent works that attempt to classify thorax diseases from frontal chest x-rays (Yao et al., 2017; Rajpurkar et al., 2017; Guan et al., 2018; Kumar et al., 2017; Baltruschat et al., 2018). Perhaps the best known of these works is that of Rajpurkar et al. (2017), in which the authors present the ChexNet system. In addition to evaluating how well ChexNet classifies the 14 thorax diseases labeled within the ChestX-ray14 dataset, Rajpurkar et al. (2017) extracted a subset of 420 images and evaluated the performance of a binary ChexNet model trained to detect pneumonia. They compare the performance of ChexNet to that of four practicing academic radiologists and show that ChexNet achieved a higher score than the radiologists' average.
A recent work by Guan et al. (2018) uses the ChestX-ray14 dataset to train attention-guided networks (AG-CNN), which consist of a standard global branch CNN that processes the full CXR image, as well as a local branch CNN that processes the result of applying a CAM-like saliency map (Zhou et al., 2016) for region localization. When both branches are fine-tuned together, they show improved performance over utilizing each branch individually. Guan et al. (2018) present improved results over those of Rajpurkar et al. (2017); however, they perform a random 70/10/20 train/validation/test set split that does not prevent the same subject from appearing in multiple sets.
Apart from the ChestX-ray14 dataset, smaller previously released public datasets have also been used to construct CXR classifiers. The Japanese Society of Radiological Technology (JSRT) dataset (Shiraishi et al., 2000) is a small frontal CXR dataset that contains normal images, as well as radiographs exhibiting malignant and benign lung nodules. In (Gordienko et al., 2017), the authors train small convolutional neural networks, both with and without lung segmentation, to classify whether lung nodules are present within a CXR image. They apply their method to the JSRT dataset, as well as to BSE-JSRT (Van Ginneken et al., 2006), a version of the JSRT dataset in which bone shadows have been eliminated. Due to the limited size of the JSRT dataset (247 images), the training/validation set results presented in (Gordienko et al., 2017) exhibit significant overfitting; however, the authors do show improved performance when using the BSE-JSRT dataset with bone shadows removed.
Finally, the Indiana chest X-ray collection consists of 8,121 images and 3,996 corresponding radiology reports (Demner-Fushman et al., 2015). In (Islam et al., 2017), the authors combined the Indiana chest X-ray collection with the JSRT dataset, as well as the Shenzhen dataset (Jaeger et al., 2014). They compare the performance of several CNN architectures such as AlexNet (Krizhevsky et al., 2012), VGG (Simonyan and Zisserman, 2014) and ResNet (He et al., 2015) and show that an ensemble of models leads to classification performance improvement.
While a wide range of works continue to become available that apply deep convolutional networks to chest x-ray images, a few common drawbacks limit their applicability for constructing automated CXR classifiers. In particular, approaches trained on the ChestX-ray14 dataset accept 8-bit grayscale PNG images as model inputs. This is a limited dynamic range compared to DICOM formats, which typically encode medical image pixel and voxel data at 12-bit depth or greater. In addition, the released PNG images in the ChestX-ray14 dataset were resized to 1024x1024 pixel width and height. This resizing was performed without maintaining the original aspect ratio and could introduce distortion into the images. Several of the works severely downsample input images to match the dimensions required by pre-trained classifiers (Rajpurkar et al., 2017; Guan et al., 2018). Moreover, the mentioned recent works (Yao et al., 2017; Rajpurkar et al., 2017; Guan et al., 2018; Kumar et al., 2017; Baltruschat et al., 2018) make no distinction between PA and AP chest x-ray view positions. This can be a problem for some findings, such as cardiomegaly, which can only be accurately assessed in PA images, as the AP view will exaggerate the heart silhouette due to magnification. Finally, the above works focus solely on the evaluation of frontal chest x-rays, whereas the lateral view reveals lung areas that are hidden in the frontal view (Raoof et al., 2012). The lateral view can be especially useful in detecting lower-lobe lung disease, pleural effusions, and anterior mediastinal masses (Ahmad, 2001) and is hence routinely taken into account by practicing radiologists.
The contributions of this work are as follows:
We train deep convolutional neural networks to recognize multiple common thorax diseases on the largest collection of chest radiographs released to date – the MIMIC-CXR dataset.
We describe and evaluate CNN models for processing frontal, as well as lateral chest x-rays, which have received less attention from previous research efforts. Furthermore, we develop distinct models for anteroposterior and posteroanterior frontal view types.
We introduce a novel DualNet architecture, which simultaneously processes a patient’s frontal and lateral chest x-rays and demonstrate its usefulness in improving performance against baseline classifiers.
4 Model architectures
To classify thorax diseases in chest x-ray inputs, we train separate models based on the type of view position that was used to acquire the image. In particular, three networks are trained specifically for posteroanterior (PA), anteroposterior (AP) and lateral (Lateral) view types. Furthermore, we introduce a new network architecture, referred to as DualNet, that accepts paired frontal and lateral CXR inputs. We train two types of DualNet architectures, one for PA-Lateral pairs and one for AP-Lateral pairs. A schematic of the DualNet architecture is shown in Fig. 1.
For each model type, the baseline CNN architecture used is a modified version of DenseNet-121 (Huang et al., 2017). The original DenseNet-121 model is altered by replacing the typical 3-channel (RGB) input layer with 1-channel (grayscale) input. Four DenseNet blocks are applied using a growth rate of 32 channels per layer.
To allow the handling of different image input sizes, we perform a global average pooling operation on the final convolutional layer before a fully connected layer maps the resulting 1x1 feature maps to 14 output classes. As each input image could potentially contain multiple findings, the task of classifying thorax diseases in chest x-rays is treated as a multi-class, multi-label problem. As such, a sigmoid operation is applied to each of the 14 outputs and a binary cross-entropy loss function is used to train each network.
5 Data processing
5.1 Data splitting
The MIMIC-CXR dataset was split into separate training, validation and test sets. 20% of the images were held out as test data. The remaining 80% of the data was then split between the training (70%) and validation (10%) sets. The dataset was split by subject to ensure a distinct set of subjects existed in the training, validation and test datasets, i.e. no subject was present in more than one dataset group. The data splits were then verified to ensure that the prevalence of view types and thoracic disease categories was consistent between groups. Table 1 shows the overall number of training, validation and test set images used to train and evaluate separate PA, AP and lateral view models. In addition, Table 2 depicts the number of radiological studies where frontal and lateral pairs of images were available to train PA-Lateral and AP-Lateral DualNet models.
|View pair|Train|Validation|Test|
|PA & Lateral|59,500|8,414|17,146|
|AP & Lateral|15,372|2,186|4,257|
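The subject-wise 70/10/20 split described above can be sketched as follows; the proportions come from the text, while the random seed and data layout are illustrative assumptions:

```python
import random

def split_by_subject(subject_ids, seed=0):
    """Sketch of a subject-wise 70/10/20 split: shuffle the unique subjects,
    then assign every image of a subject to exactly one partition, so that
    no subject appears in more than one of train/validation/test."""
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n = len(subjects)
    n_train, n_val = int(0.7 * n), int(0.1 * n)
    train = set(subjects[:n_train])
    val = set(subjects[n_train:n_train + n_val])
    test = set(subjects[n_train + n_val:])
    return train, val, test

# Example with 100 hypothetical subject ids.
train, val, test = split_by_subject([f"p{i}" for i in range(100)])
assert not (train & val or train & test or val & test)  # disjoint by subject
```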
5.2 Data transformations
A series of transformation operations were applied to the data before model training. First, nearest-neighbor interpolation was used to scale images to a specified dimension while maintaining the original aspect ratio. Images were then cropped to enforce equal width and height. In our experiments, we used an image width and height of 512 x 512 pixels. Pixels were then normalized to the range [0, 1] by dividing by the maximum grayscale value.
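The preprocessing pipeline can be sketched in NumPy as below. The rescale-then-center-crop order and the normalization by the maximum grayscale value are reasonable readings of the description above; the index-based nearest-neighbor resize is one simple way to implement that interpolation.

```python
import numpy as np

def preprocess(img, size=512):
    """Sketch: nearest-neighbor rescale so the shorter side equals `size`
    (preserving aspect ratio), center-crop to size x size, then normalize
    pixels to [0, 1]. `img` is a 2-D array of grayscale pixel values."""
    h, w = img.shape
    scale = size / min(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    # Nearest-neighbor interpolation via integer index lookup.
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = img[rows][:, cols]
    # Center-crop to enforce equal width and height.
    top = (new_h - size) // 2
    left = (new_w - size) // 2
    cropped = resized[top:top + size, left:left + size]
    return cropped.astype(np.float32) / cropped.max()

# Simulated 12-bit DICOM pixel data (values in [0, 4095]).
out = preprocess(np.random.randint(0, 4096, (2500, 2000)))
print(out.shape)  # (512, 512)
```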
5.3 Instance Labeling
Chest x-ray images were labeled from their corresponding radiology reports using NegBio (Peng et al., 2017; Wang et al., 2017). NegBio maps each report into a collection of Unified Medical Language System (UMLS) concept ids. Reports were first mapped to an initial set of 46 UMLS concept ids. A further mapping subsequently assigned these 46 concepts to the 14 thoracic disease types commonly used in previous works (Wang et al., 2017; Rajpurkar et al., 2017). As such, more specific concept ids were mapped to their more general class, e.g. C0546333 (right pneumothorax) and C0546334 (left pneumothorax) were both mapped to the general concept, pneumothorax. (Note: emphysema did not occur in the MIMIC-CXR dataset and was therefore excluded, giving a total of 13 CXR findings.)
To assign final labels, only annotations from the ‘Findings’ and ‘Impression’ sections of radiology reports were used. Furthermore, a positive label was only assigned if the concept was not identified as being negated or uncertain by NegBio. An individual image could be assigned positive labels for multiple of the 13 common thoracic disease types. Images where none of the 13 disease types were identified were labeled as ‘No Finding’. Table 3 presents the ordered per class prevalence for each of the final 14 class labels.
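The labeling logic above can be sketched as follows. The two pneumothorax concept ids come from the text; the remaining map entries and the `mentions` structure (one tuple per NegBio concept mention, with negation/uncertainty flags) are illustrative assumptions, not NegBio's actual output format.

```python
# Collapse specific UMLS concept ids onto the common finding classes.
CONCEPT_TO_FINDING = {
    "C0546333": "pneumothorax",  # right pneumothorax (from the text)
    "C0546334": "pneumothorax",  # left pneumothorax (from the text)
    # ... remaining entries of the 46-concept mapping omitted ...
}

def label_report(mentions):
    """mentions: iterable of (concept_id, negated, uncertain) tuples drawn
    from the 'Findings'/'Impression' sections of one report. A positive
    label is kept only when the mention is neither negated nor uncertain;
    a report with no positive finding becomes 'No Finding'."""
    labels = set()
    for cid, negated, uncertain in mentions:
        if negated or uncertain:
            continue
        finding = CONCEPT_TO_FINDING.get(cid)
        if finding:
            labels.add(finding)
    return sorted(labels) or ["No Finding"]

print(label_report([("C0546333", False, False)]))  # ['pneumothorax']
print(label_report([("C0546334", True, False)]))   # ['No Finding']
```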
6 Model training
Baseline PA, AP and Lateral models were seeded using pre-trained ImageNet (Russakovsky et al., 2015) weights. No data augmentation took place and Adam optimization (Kingma and Ba, 2014) was used with a cyclic learning rate (Smith, 2017). An initial learning rate range test established learning rate boundaries of 0.001 – 0.02, within which the learning rate fluctuates during training. The Triangular2 policy (Smith, 2017) was used to control how learning rate fluctuations are altered over time. Finally, stratified mini-batch sampling was employed to ensure each mini-batch maintained overall class prevalence during training. PyTorch was used for model development and models were trained using data parallelism over 8 Nvidia Titan Xp GPUs.
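The optimizer setup can be sketched with PyTorch's built-in cyclical scheduler. The 0.001 – 0.02 bounds and the triangular2 policy come from the text; `step_size_up` and the toy model are illustrative assumptions.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import CyclicLR

# Adam with a cyclical learning rate oscillating between the bounds found
# by the range test; "triangular2" halves the cycle amplitude each cycle.
model = torch.nn.Linear(10, 14)  # stand-in for the DenseNet model
optimizer = Adam(model.parameters(), lr=0.001)
scheduler = CyclicLR(optimizer, base_lr=0.001, max_lr=0.02,
                     step_size_up=2000, mode="triangular2",
                     cycle_momentum=False)  # Adam has no momentum parameter

for _ in range(10):      # one scheduler step per mini-batch
    optimizer.step()     # (loss.backward() would precede this in training)
    scheduler.step()
lr_now = optimizer.param_groups[0]["lr"]  # slightly above base_lr by now
```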
7.1 Frontal and Lateral Model Evaluation
Table 4 presents the per class AUC results calculated on the held-out test set for each of the PA, AP and Lateral models described. It can be seen that recognition performance for each of the findings varies by view type. Compared with AP and lateral views, PA models result in larger AUC values for atelectasis, cardiomegaly, fibrosis, infiltrates and pleural thickening. For frontal view types, the PA model achieves a larger average AUC (0.702), compared to the AP model (0.655). This difference is likely due to differences in the clinical settings in which these images are acquired. In particular, AP images are typically obtained in the intensive care unit. Apart from heart and lung anatomy, other internal or external non-anatomical objects are likely to be present in CXR images taken from ICU patients, including items such as endotracheal and nasogastric tubes, peripherally inserted central catheter lines, as well as other medical devices such as electrodes and cardiac pacemakers (Hunter et al., 2004).
Table 4 also shows the benefit of the lateral model, which achieves a larger average AUC (0.706) compared to PA and AP frontal models. Per class, the Lateral model results in larger AUC values for the following findings: consolidation, edema, effusion, hernia, mass, pneumonia and pneumothorax.
7.2 DualNet Architecture Evaluation
To evaluate the DualNet architecture, Table 5 compares the per class AUC results for the subset of radiological studies in the MIMIC-CXR dataset where both frontal and lateral images were obtained. Both PA & Lateral, as well as AP & Lateral combinations are considered. First, we evaluate the performance of applying the separately trained frontal and lateral models to the images from these studies. We compare the individual model performance to the jointly trained DualNet architecture. For PA & Lateral studies, the DualNet architecture performs better than applying separately trained models in 12 out of 14 cases (note that for pleural thickening, the improvement extends beyond the scale shown in Table 5). For radiology studies that obtained AP & Lateral radiographs, the DualNet architecture outperforms individual model classification for 10 of the 14 classes (highlighted in bold).
Overall, it can be seen that average AUC is greater for DualNet classifiers, compared to individually trained classifiers. For PA & Lateral studies, DualNet achieves an average AUC of 0.721 compared to 0.690 for individually trained classifiers. For AP & Lateral studies, DualNet achieves an average AUC of 0.668 compared to 0.637 for individually trained classifiers.
|Finding|Individual PA+Lateral|DualNet PA+Lateral|Individual AP+Lateral|DualNet AP+Lateral|
8 Conclusions and Future Work
We have presented a collection of deep convolutional neural networks trained on the largest released dataset of chest x-ray images – the MIMIC-CXR dataset. We evaluated our models on the task of recognizing the presence or absence of common thorax diseases. Separate models were trained to assess frontal, as well as lateral CXR inputs, and a novel DualNet architecture was introduced that emulates routine clinical practice by taking into account both view types simultaneously. In future work, we plan to overcome several limitations of the current approach. First, several improvements could be made to our CNN training procedure, including the addition of techniques known to improve image-based classification performance, such as data augmentation and pixel normalization. More importantly, our models, as currently described, only consider radiograph pixel information when making a classification decision. For several conditions (e.g. pneumonia), careful consideration of a patient’s history and current clinical record is required to make an accurate final assessment. In future work we plan to incorporate this information within our model architecture. Finally, while automated radiograph analysis has many potential benefits, further consideration must be given to how these systems can best fit into clinical practice to aid workflow and be helpful for clinicians and the care of their patients.
The authors gratefully acknowledge Alistair Johnson from the MIT Laboratory for Computational Physiology for making the MIMIC-CXR dataset available, as well as Yifan Peng from the Biomedical Text Mining Group, Computational Biology Branch, NIH, for supplying the NegBio library.
- Ahmad (2001) Naveed Ahmad. Mastering ap and lateral positioning for chest x-ray. https://www.auntminnie.com/index.aspx?sec=ser&sub=def&pag=dis&ItemID=52189, 2001. Accessed: 2018-04-15.
- Baltruschat et al. (2018) Ivo M Baltruschat, Hannes Nickisch, Michael Grass, Tobias Knopp, and Axel Saalbach. Comparison of deep learning approaches for multi-label chest x-ray classification. arXiv preprint arXiv:1803.02315, 2018.
- Chen et al. (2002) Kuan-Yu Chen, Jih-Shuin Jerng, Wei-Yu Liao, Liang-Wen Ding, Lu-Cheng Kuo, Jann-Yuan Wang, and Pan-Chyr Yang. Pneumothorax in the icu: patient outcomes and prognostic factors. Chest, 122(2):678–683, 2002.
- Demner-Fushman et al. (2015) Dina Demner-Fushman, Marc D Kohli, Marc B Rosenman, Sonya E Shooshan, Laritza Rodriguez, Sameer Antani, George R Thoma, and Clement J McDonald. Preparing a collection of radiology examinations for distribution and retrieval. Journal of the American Medical Informatics Association, 23(2):304–310, 2015.
- Gay et al. (2013) Spencer B. Gay, Juan Olazagasti, Jack W. Higginbotham, Atul Gupta, Alex Wurm, and Jonathan Nguyen. Positioning. https://www.med-ed.virginia.edu/courses/rad/cxr/technique1chest.html, 2013. Accessed: 2018-04-15.
- Gordienko et al. (2017) Yu Gordienko, Peng Gang, Jiang Hui, Wei Zeng, Yu Kochura, O Alienin, O Rokovyi, and S Stirenko. Deep learning with lung segmentation and bone shadow exclusion techniques for chest x-ray analysis of lung cancer. arXiv preprint arXiv:1712.07632, 2017.
- Guan et al. (2018) Qingji Guan, Yaping Huang, Zhun Zhong, Zhedong Zheng, Liang Zheng, and Yi Yang. Diagnose like a radiologist: Attention guided convolutional neural network for thorax disease classification. arXiv preprint arXiv:1801.09927, 2018.
- He et al. (2015) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385, 2015.
- Huang et al. (2017) Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. doi: 10.1109/CVPR.2017.243.
- Hunter et al. (2004) Tim B Hunter, Mihra S Taljanovic, Pei H Tsau, William G Berger, and James R Standen. Medical devices of the chest. Radiographics, 24(6):1725–1746, 2004.
- Islam et al. (2017) Mohammad Tariqul Islam, Md Abdul Aowal, Ahmed Tahseen Minhaz, and Khalid Ashraf. Abnormality detection and localization in chest x-rays using deep convolutional neural networks. arXiv preprint arXiv:1705.09850, 2017.
- Jaeger et al. (2014) Stefan Jaeger, Sema Candemir, Sameer Antani, Yì-Xiáng J Wáng, Pu-Xuan Lu, and George Thoma. Two public chest x-ray datasets for computer-aided screening of pulmonary diseases. Quantitative imaging in medicine and surgery, 4(6):475, 2014.
- Kingma and Ba (2014) Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Krizhevsky et al. (2012) Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
- Kumar et al. (2017) Pulkit Kumar, Monika Grewal, and Muktabh Mayank Srivastava. Boosted cascaded convnets for multilabel classification of thoracic diseases in chest radiographs. arXiv preprint arXiv:1711.08760, 2017.
- Peng et al. (2017) Yifan Peng, Xiaosong Wang, Le Lu, Mohammadhadi Bagheri, Ronald Summers, and Zhiyong Lu. Negbio: a high-performance tool for negation and uncertainty detection in radiology reports. arXiv preprint arXiv:1712.05898, 2017.
- Rajpurkar et al. (2017) Pranav Rajpurkar, Jeremy Irvin, Kaylie Zhu, Brandon Yang, Hershel Mehta, Tony Duan, Daisy Ding, Aarti Bagul, Curtis Langlotz, Katie Shpanskaya, et al. Chexnet: Radiologist-level pneumonia detection on chest x-rays with deep learning. arXiv preprint arXiv:1711.05225, 2017.
- Raoof et al. (2012) Suhail Raoof, David Feigin, Arthur Sung, Sabiha Raoof, Lavanya Irugulpati, and Edward C Rosenow. Interpretation of plain chest roentgenogram. Chest, 141(2):545–558, 2012.
- Russakovsky et al. (2015) Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015. doi: 10.1007/s11263-015-0816-y.
- Shiraishi et al. (2000) Junji Shiraishi, Shigehiko Katsuragawa, Junpei Ikezoe, Tsuneo Matsumoto, Takeshi Kobayashi, Ken-ichi Komatsu, Mitate Matsui, Hiroshi Fujita, Yoshie Kodera, and Kunio Doi. Development of a digital image database for chest radiographs with and without a lung nodule: receiver operating characteristic analysis of radiologists’ detection of pulmonary nodules. American Journal of Roentgenology, 174(1):71–74, 2000.
- Simonyan and Zisserman (2014) Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Smith (2017) Leslie N Smith. Cyclical learning rates for training neural networks. In Applications of Computer Vision (WACV), 2017 IEEE Winter Conference on, pages 464–472. IEEE, 2017.
- Van Ginneken (2001) Bram Van Ginneken. Computer-aided diagnosis in chest radiography. PhD thesis, Utrecht University, 2001.
- Van Ginneken et al. (2001) Bram Van Ginneken, BM Ter Haar Romeny, and Max A Viergever. Computer-aided diagnosis in chest radiography: a survey. IEEE Transactions on medical imaging, 20(12):1228–1241, 2001.
- Van Ginneken et al. (2006) Bram Van Ginneken, Mikkel B Stegmann, and Marco Loog. Segmentation of anatomical structures in chest radiographs using supervised methods: a comparative study on a public database. Medical image analysis, 10(1):19–40, 2006.
- Wang et al. (2017) Xiaosong Wang, Yifan Peng, Le Lu, Zhiyong Lu, Mohammadhadi Bagheri, and Ronald M Summers. Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3462–3471. IEEE, 2017.
- Yao et al. (2017) Li Yao, Eric Poblenz, Dmitry Dagunts, Ben Covington, Devon Bernard, and Kevin Lyman. Learning to diagnose from scratch by exploiting dependencies among labels. arXiv preprint arXiv:1710.10501, 2017.
- Zhou et al. (2016) Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on, pages 2921–2929. IEEE, 2016.