Chest X-ray is the most common medical imaging exam, with over 35 million taken every year in the US alone. Chest X-rays allow for inexpensive screening of several pathologies including masses, pulmonary nodules, effusions, cardiac abnormalities and pneumothorax. Due to increasing workload pressures, many radiologists today have to read more than 100 X-ray studies daily. Therefore, automated tools trained to predict the risk of specific abnormalities given a particular X-ray image have the potential to support the reading workflow of the radiologist. Such a system could be used to enhance the confidence of the radiologist or to prioritize the reading list so that critical cases are read first.
Due to the recent availability of a large-scale data set, several works have been proposed to automatically detect abnormalities in chest X-rays. The only peer-reviewed published work is by Wang et al., which evaluated the performance of four standard Convolutional Neural Network (CNN) architectures (AlexNet, VGGNet, GoogLeNet and ResNet). The following papers, which are not peer-reviewed, can be found on arXiv. One of them uses a slightly modified DenseNet architecture. Yao et al. utilized a variant of DenseNet and Long Short-Term Memory (LSTM) networks to exploit the dependencies between abnormalities. Guan et al. proposed an attention-guided CNN whereby disease-specific regions are first estimated before focusing the classification task on a reduced field of view. However, most of the current work on arXiv reports results obtained by splitting the data randomly for training, validation and testing [6, 5], which is problematic because the average image count per patient in the ChestX-Ray14 data set is 3.6. Thus the same patient is likely to appear in both the training and the test set. Additionally, there is significant variability in classification performance between splits due to the class imbalance, which makes performance comparisons problematic. The only prior work with publicly released patient-wise splits is that of Wang et al.
In this paper, we propose a location-aware Dense Network (DNetLoc) to detect pathologies in chest X-ray images. We incorporate the spatial information of chest X-ray pathologies and exploit high-resolution X-ray data effectively by utilizing high-resolution images during training and testing. Moreover, we benchmark our method on the largest data set reported in the community, with 86,876 patients and 297,541 images, utilizing both the ChestX-Ray14 and PLCO data sets. In addition, we propose a new benchmarking set-up on this data set, including published patient-wise training and test splits, so that the performance of future algorithms can be compared effectively on the largest public chest X-ray data set. We achieve the best performance reported on the existing ChestX-Ray14 benchmarking data set where both patient-wise train and test splits are published.
The ChestX-Ray14 data set contains 30,805 patients and 112,120 chest X-ray images. Each image is stored with 8-bit gray-scale values. The corresponding report includes 14 pathology classes.
In the PLCO data set, there are 185,421 images from 56,071 patients. Each original image is stored with 16-bit gray-scale values. We choose the 12 most prevalent pathology labels, 5 of which also carry spatial information. The details of this spatial information are described in Section 3.2.
Across both data sets, there are 6 labels which share the same name. However, in our experiments we avoid combining the images of similar labels, as we cannot guarantee that the label definitions are the same. Additionally, we assume there is no patient overlap between the two data sets.
The pathology labels are highly imbalanced. This is clearly illustrated in Fig. 1, which displays the total number of images across all pathologies in the 2 data sets. This poses a challenge to any learning algorithm.
3.1 Multi-label Setup
We use a variant of DenseNet with 121 layers [9]. At first we focus on the ChestX-Ray14 dataset. The labels form a $C$-dimensional vector $l = [l_1, l_2, \ldots, l_C]$ with $C = 14$, where each binary entry $l_c$ represents either the absence (0) or the presence (1) of a pathology. As this is a multi-label problem, we treat all labels independently during classification by defining $C$ binary cross-entropy loss functions. As the data set is highly imbalanced, we incorporate additional weights within the loss functions, based on the label frequency within each batch:

$$\mathcal{L}_c(X, l_c) = -\left( w_P \, l_c \log p(l_c = 1 \mid X) + w_N \, (1 - l_c) \log\bigl(1 - p(l_c = 1 \mid X)\bigr) \right),$$

where $w_P = \frac{P_c + N_c}{P_c}$ and $w_N = \frac{P_c + N_c}{N_c}$, with $P_c$ and $N_c$ indicating the number of presence and absence samples, respectively.
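The batch-wise weighting can be illustrated with a minimal sketch in plain Python. It assumes inverse-frequency weights computed per class from the labels of the current batch, clamps empty counts to 1 to avoid division by zero, and averages over the batch; the function names `batch_weights`, `weighted_bce` and `multi_label_loss` are hypothetical and not taken from the original implementation.

```python
import math


def batch_weights(labels):
    """Return (w_pos, w_neg) for one class from the 0/1 labels of a batch.

    Assumption: w_pos = (P + N) / P and w_neg = (P + N) / N, where P and N
    are the presence and absence counts in the batch. A count of zero is
    clamped to 1 to avoid division by zero (also an assumption).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    total = pos + neg
    return total / max(pos, 1), total / max(neg, 1)


def weighted_bce(probs, labels, eps=1e-7):
    """Mean weighted binary cross entropy for one class over a batch."""
    w_pos, w_neg = batch_weights(labels)
    loss = 0.0
    for p, l in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip for numerical stability
        loss -= w_pos * l * math.log(p) + w_neg * (1 - l) * math.log(1.0 - p)
    return loss / len(labels)


def multi_label_loss(prob_rows, label_rows):
    """Sum the C independent per-class losses (rows: samples, columns: classes)."""
    num_classes = len(label_rows[0])
    return sum(
        weighted_bce([row[c] for row in prob_rows], [row[c] for row in label_rows])
        for c in range(num_classes)
    )
```

With a batch containing one positive and three negative samples for a class, the positive term is weighted by 4 and the negative term by 4/3, so the rare positive sample is not drowned out by the majority class.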
During training, we use a batch size of 128. Larger batch sizes increase the probability that a batch contains samples of each class and increase the scale of the presence and absence weights. The original images are normalized based on the ImageNet pre-trained model with 3 input channels. We enlarge the global average pooling layer before the final layer. The Adam optimizer is used with an adaptive learning rate: the learning rate is reduced by a factor of 10 when the validation loss plateaus.
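The plateau-based learning rate schedule can be sketched as follows. This is an illustration only: the class name `PlateauScheduler`, the `patience` parameter and the starting value used in the usage note are assumptions, since the text only states that the rate is reduced 10x when the validation loss plateaus.

```python
class PlateauScheduler:
    """Divide the learning rate by `factor` when the validation loss stops improving.

    Assumption: a plateau is declared after more than `patience` consecutive
    epochs without a new best validation loss.
    """

    def __init__(self, lr, factor=10.0, patience=2):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Update the schedule with the latest validation loss and return the rate."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr /= self.factor
                self.bad_epochs = 0
        return self.lr
```

For example, `PlateauScheduler(lr=1e-3, patience=1)` keeps its (hypothetical) starting rate while the validation loss improves and divides it by 10 after two consecutive epochs without improvement.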
3.2 Leveraging High-Resolution Images and Spatial Knowledge
Two strided convolutional layers, each with 3 filters and a stride of 2, are added as the first layers to effectively exploit the high-resolution chest X-ray images. The filter weights of both layers are initialized to perform a Gaussian down-sampling operation. The high-resolution images are fed directly into these layers as the network input.
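To make the Gaussian down-sampling initialization concrete, the sketch below builds a normalized Gaussian kernel and applies it with a stride of 2 to a single-channel image. The kernel size of 3, the value of sigma, the absence of padding and the helper names `gaussian_kernel` and `strided_conv2d` are assumptions made for illustration; in the network these values would only serve as initial filter weights that are further updated during training.

```python
import math


def gaussian_kernel(size=3, sigma=1.0):
    """Return a size x size Gaussian kernel whose entries sum to 1."""
    half = size // 2
    raw = [
        [
            math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2.0 * sigma ** 2))
            for x in range(size)
        ]
        for y in range(size)
    ]
    total = sum(sum(row) for row in raw)
    return [[value / total for value in row] for row in raw]


def strided_conv2d(image, kernel, stride=2):
    """Valid (unpadded) 2D correlation of a single-channel image with a stride."""
    k = len(kernel)
    out = []
    for i in range(0, len(image) - k + 1, stride):
        row = []
        for j in range(0, len(image[0]) - k + 1, stride):
            row.append(
                sum(
                    kernel[a][b] * image[i + a][j + b]
                    for a in range(k)
                    for b in range(k)
                )
            )
        out.append(row)
    return out
```

Because the kernel sums to 1, a constant image keeps its intensity after each layer, and stacking two such layers reduces the resolution by roughly a factor of 4 per side before the image reaches the DenseNet layers.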
In contrast to the ChestX-Ray14 data set, the PLCO data set includes consistent spatial location labels for many pathologies. We include 12 pathology labels of the PLCO data set in our experiments (see Fig. 1, right side). Location information is available for 5 pathologies. It specifies the side (right lung, left lung) and a finer localization within each lung (divided into equal fifths), and includes an additional label for diffuse disease. The exact position of multiple and diffuse diseases is not provided.
Therefore, we create 9 additional classes: 6 for the lobe position (five equally split parts plus a “wildcard” label for multiple diseases; e.g., if the image contains nodules in multiple lung parts, only this label is present), 2 for the lung side (left and right), and 1 for diffuse disease spread over multiple lung parts. Fig. 2 illustrates the label definition based on spatial information.
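The 9 location classes can be encoded as a binary vector, as in the following sketch. The class order, the numbering of the fifths, the input format of `(side, fifth)` tuples and the rule that both side labels may be set when findings occur in both lungs are assumptions made for illustration; only the split into 6 position, 2 side and 1 diffuse class comes from the text.

```python
# Hypothetical class order: 5 position fifths, the "multiple" wildcard,
# the two lung sides, and the diffuse-disease label.
LOCATION_CLASSES = [
    "fifth_1", "fifth_2", "fifth_3", "fifth_4", "fifth_5",
    "multiple", "left", "right", "diffuse",
]


def encode_location(findings, diffuse=False):
    """Encode findings given as (side, fifth) tuples into a 9-dimensional 0/1 vector.

    side is "left" or "right", fifth is an integer from 1 to 5.
    If findings fall into more than one lung part, only the "multiple"
    wildcard is set among the position labels.
    """
    label = dict.fromkeys(LOCATION_CLASSES, 0)
    parts = set()
    for side, fifth in findings:
        if side not in ("left", "right") or fifth not in (1, 2, 3, 4, 5):
            raise ValueError("invalid finding: %r" % ((side, fifth),))
        parts.add((side, fifth))
        label[side] = 1
    if len(parts) > 1:
        label["multiple"] = 1
    elif len(parts) == 1:
        label["fifth_%d" % next(iter(parts))[1]] = 1
    if diffuse:
        label["diffuse"] = 1
    return [label[name] for name in LOCATION_CLASSES]
```

A single nodule in the second fifth of the left lung sets only `fifth_2` and `left`, whereas nodules in two different lung parts set the `multiple` wildcard instead of individual position labels.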
The spatial location labels are trained as independent binary classes with cross-entropy loss functions. The number of location labels present for an image depends on the number of diseases that carry location information.
3.3 Dataset Pooling
We combine the ChestX-Ray14 and the PLCO datasets. The training and validation sets include images from both data sets. Several classes share similar class labels. However, we do not know whether both data sets were created based on the same label definitions. We therefore treat the labels independently and create separate classes. We normalize the brightness and contrast of the PLCO images by applying histogram normalization. All images are normalized based on the mean and the standard deviation to match the ImageNet definition. Each batch contains images from both data sets.
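Combining both datasets ($C = 35$), we compute the loss function

$$\mathcal{L} = \sum_{c=1}^{C} \delta_c \, \mathcal{L}_c(X, l_c),$$

where $\mathcal{L}_c$ is the weighted binary cross-entropy loss of class $c$ from Section 3.1 and $\delta_c$ is either 0 or 1, depending on which dataset the image comes from and whether the spatial information exists.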
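The role of the 0/1 indicator can be sketched as a per-image availability mask over the 35 classes. The class layout (14 ChestX-Ray14 labels, then 12 PLCO pathology labels, then 9 PLCO location labels), the source names and the function names `availability_mask` and `pooled_loss` are assumptions made for illustration.

```python
NUM_CXR14 = 14           # ChestX-Ray14 pathology labels
NUM_PLCO_PATHOLOGY = 12  # PLCO pathology labels
NUM_PLCO_LOCATION = 9    # PLCO location labels


def availability_mask(source, has_location=False):
    """Return the 35-dimensional 0/1 indicator for one image.

    Assumed class order: ChestX-Ray14 labels, PLCO pathology labels,
    PLCO location labels.
    """
    if source == "chestxray14":
        return [1] * NUM_CXR14 + [0] * (NUM_PLCO_PATHOLOGY + NUM_PLCO_LOCATION)
    if source == "plco":
        loc = 1 if has_location else 0
        return [0] * NUM_CXR14 + [1] * NUM_PLCO_PATHOLOGY + [loc] * NUM_PLCO_LOCATION
    raise ValueError("unknown source: %r" % (source,))


def pooled_loss(class_losses, mask):
    """Sum the per-class losses, keeping only classes whose label is available."""
    if len(class_losses) != len(mask):
        raise ValueError("class_losses and mask must have the same length")
    return sum(delta * loss for delta, loss in zip(mask, class_losses))
```

A ChestX-Ray14 image thus contributes to 14 loss terms, a PLCO image without location information to 12, and a PLCO image with location information to 21, so classes without labels for a given image produce no gradient.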
3.4 Global Architecture
Overall, we create a location-aware Dense Network that adaptively deals with label availability during training. The final network predicts 35 labels: 14 from the ChestX-Ray14 dataset and 21 from the PLCO dataset. Fig. 3 illustrates the architecture of the network (DNetLoc).
4 Experimental Results
The ChestX-Ray14 dataset contains an average of 3.6 images per patient and PLCO 3.3 images per patient. Thus, there is a high probability that the same patient appears in all 3 subsets if a random image-wise split is used. This paper uses only patient-wise splits. All splits used in this paper are published on GitHub. For all experiments we separate the data as follows: 70% for training, 10% for validation, and 20% for testing.
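A patient-wise split can be sketched as follows: patients, not images, are shuffled and partitioned, and every image then follows its patient. The function name `patient_wise_split`, the input format (a mapping from image identifier to patient identifier), the fixed seed and the choice to apply the 70/10/20 fractions to patients rather than images are assumptions made for illustration.

```python
import random


def patient_wise_split(image_to_patient, fractions=(0.7, 0.1, 0.2), seed=0):
    """Split images into train/val/test so that no patient spans two subsets.

    image_to_patient maps an image identifier to a patient identifier.
    The fractions are applied to the number of patients (an assumption).
    """
    patients = sorted(set(image_to_patient.values()))
    random.Random(seed).shuffle(patients)  # deterministic for a given seed
    n_train = int(round(fractions[0] * len(patients)))
    n_val = int(round(fractions[1] * len(patients)))
    groups = {
        "train": set(patients[:n_train]),
        "val": set(patients[n_train:n_train + n_val]),
        "test": set(patients[n_train + n_val:]),
    }
    split = {name: [] for name in groups}
    for image, patient in sorted(image_to_patient.items()):
        for name, members in groups.items():
            if patient in members:
                split[name].append(image)
    return split
```

Since each patient is assigned to exactly one group before the images are distributed, follow-up exams of a training patient cannot leak into the validation or test set.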
Below we present our experimental results. First, we show the state-of-the-art results on the ChestX-Ray14 dataset, following the official patient-wise split. Then, we present the results on the PLCO data set, illustrating the value of using location information and data pooling.
|Pathology||Wang et al.||Our DNet||Our DNet|
Table 1 shows the best AUC scores obtained on the ChestX-Ray14 dataset using the official test set. Our network increases the mean AUC score by over 5% compared to the previous work. We observed several limitations of the official split, in which the training and test sets have different characteristics. This may be due either to the large label inconsistency or to the fact that there are on average 3 times more images per patient in the test set than in the training set. We therefore computed several random patient-wise splits, each of which led to better performance than the official split. Detailed performance for the novel benchmarking patient-wise split is shown in Table 1 and in Fig. 4 (left).
Overall, significant label variance across follow-up exams is noticeable in the ChestX-Ray14 data set. This might be because many follow-ups are performed with a specific question in mind, e.g., whether a pneumothorax has disappeared. Thus, other abnormalities are not labeled repeatedly and consistently in follow-up studies. As the ChestX-Ray14 labels are generated from reports, this would introduce incomplete labeling for many follow-ups.
|Pathology||Our DNet||Our DNetLoc|
|Bone/Soft Tissue Lesion||0.853||0.845|
Finally, we evaluate our method on the PLCO data. The results in Table 2 show that using location information and high-resolution images improves the classification accuracy for most pathologies. For the subset of pathologies where location information is provided (marked in bold), the performance increases by an average of 2.3%. Moreover, the training time was reduced by a factor of 2 when location information is used. For the PLCO data set we reach a final mean AUC score of 87.4%. Fig. 4 shows the performance of our method for both the ChestX-Ray14 and the PLCO test sets.
We presented a novel method based on location-aware Dense Networks to classify pathologies in chest X-ray images, effectively exploiting high-resolution data and incorporating spatial information of pathologies to improve the classification accuracy. We showed that for pathologies where location information is present, the classification accuracy improved significantly. The algorithm is trained and validated on the largest chest X-ray data set, containing 86,876 patients and 297,541 images. Our system has the potential to support the current high-throughput reading workflow of radiologists by allowing them to gain more confidence by asking an AI system for a second opinion, or by flagging “critical” patients for closer examination. In addition, we have shown the limitations in the validation strategy of previous works. We propose a novel set-up using the largest public data set and provide patient-wise splits, which will facilitate a principled benchmark for future methods for abnormality detection in chest X-ray imaging.
Disclaimer: This feature is based on research, and is not commercially available. Due to regulatory reasons, its future availability cannot be guaranteed.
- Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., Summers, R.: ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: Proc. CVPR. (2017) 3462–71
- Kamel, S.I., Levin, D.C., Parker, L., Rao, V.M.: Utilization trends in noncardiac thoracic imaging, 2002-2014. JACR 14(3) (2017) 337–42
- He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. CVPR. (2016) 770–778
- Rajpurkar, P., Irvin, J., Zhu, K., Yang, B., Mehta, H., Duan, T., Ding, D., Bagul, A., Langlotz, C., Shpanskaya, K., et al.: CheXNet: Radiologist-level pneumonia detection on chest X-rays with deep learning. arXiv:1711.05225 (2017)
- Yao, L., Poblenz, E., Dagunts, D., Covington, B., Bernard, D., Lyman, K.: Learning to diagnose from scratch by exploiting dependencies among labels. arXiv:1710.10501 (2017)
- Guan, Q., Huang, Y., Zhong, Z., Zheng, Z., Zheng, L., Yang, Y.: Diagnose like a radiologist: Attention guided convolutional neural network for thorax disease classification. ArXiv e-prints (January 2016)
- Gohagan, J.K., Prorok, P.C., Hayes, R.B., Kramer, B.S.: The Prostate, Lung, Colorectal and Ovarian (PLCO) cancer screening trial of the National Cancer Institute: history, organization, and status. Controlled Clinical Trials 21(6) (2000) 251S–272S
- Huang, G., Zhang, L., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. ArXiv e-prints (August 2016)
- Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet large scale visual recognition challenge. IJCV 115(3) (December 2015) 211–252
- Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. ArXiv e-prints (December 2014)
- ***: Patient-wise splits used in this work will be published on GitHub before Sep. 2018