1 Introduction
Deep learning methods have shown great promise in medical diagnosis [3]. In particular, after the release of the NIH ChestX-ray dataset, deep learning based methods achieved chest X-ray classification performance that was previously unprecedented at large scale [12, 8, 5, 13]. Despite these promising accomplishments, the previously proposed methods for chest X-ray classification do not show adequate feature interpretability, whereas similar methods achieve higher feature interpretability on computer vision datasets [14]. Feature interpretability is critical in the clinical setting, as it helps explain why a diagnostic decision is made by the classifier [1, 7]. It is therefore also critical that methods proposed for medical image diagnosis achieve high scores on interpretability metrics.
Several works consider the interpretability of neural network classifiers. Rajpurkar et al. [8] propose a classification model for Pneumonia detection on the NIH ChestX-ray14 dataset and visualize Class Activation Maps (CAMs) [14] to show the interpretability of the features used for Pneumonia prediction. They also propose a multilabel classification model with high classification accuracy for all classes in the dataset, but do not evaluate its interpretability. Wang et al. [12] propose a unified weakly supervised multilabel image classification and disease localization framework. The localization, which is based on CAMs, serves as a metric for evaluating the interpretability of the features. Their proposed weighted cross entropy scheme pushes the model to learn interpretable features on the NIH ChestX-ray dataset, which is highly imbalanced in terms of positive and negative examples. However, the localization results do not keep pace with the classification results. Li et al. [5] improve the localization on the same dataset significantly by incorporating the annotated bounding boxes into training. Although this is an intuitive way to push the model to learn explainable features, the lack of annotated data hinders the use of this approach. Biffi et al. [1] propose a method based on convolutional generative neural networks for designing models with interpretable features, and apply it to the classification of cardiovascular diseases. The method enforces interpretability by design and limits the architecture of the neural network to generative ones only. Therefore, an approach that enforces interpretability on all high performing models, and not just on a specific architecture, is desirable.
It is postulated that feature representations learned using robust training capture salient data characteristics [11]. In this work, we improve the interpretability of state-of-the-art neural network classifiers via adversarially robust optimization. The work tries to steer the models toward learning features that are more semantically relevant to the pathologies in the classification problem. First, we propose a baseline neural network classifier based on the state of the art. Then we replace its loss with an adversarially robust loss and measure the improvement in terms of interpretability and classification accuracy. To evaluate the feature interpretability of the proposed solution, its localization accuracy is measured. Moreover, CAMs and saliency maps [10] are presented for visual evaluation.
2 Methodology
2.1 Baseline Model
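Given a database $X = \{x^{1}, \dots, x^{N}\}$ of X-ray images and their corresponding labels $Y = \{y^{1}, \dots, y^{N}\}$, where $x^{i} \in \mathbb{R}^{H \times W}$ and $y^{i} \in \{0, 1\}^{C}$ with $C$ being the number of classes, our aim is to train a model $f(x; \theta) = \hat{y}$, where $\hat{y}$ is the predicted label and $\theta$ denotes the parameters of the model. The loss to be minimized is the binary cross entropy loss, which for each input example is defined as:
$$L(x, y; \theta) = -\sum_{c=1}^{C} \left[ \beta \, y_{c} \log \hat{y}_{c} + (1 - y_{c}) \log (1 - \hat{y}_{c}) \right] \tag{1}$$
where $\beta$ is a weighting factor to balance the positive labels, defined as the ratio of the number of negative labels to the number of positive labels in a batch.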
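The weighted loss of Eq. 1 can be illustrated with a minimal pure-Python sketch. The function names `positive_weight` and `weighted_bce`, the variable name `beta`, the clamping constant, and the fallback for batches without positive labels are assumptions made for this illustration and are not taken from the original implementation.

```python
import math

def positive_weight(batch_labels):
    """Ratio of negative to positive labels in a batch (the weighting factor)."""
    positives = sum(sum(row) for row in batch_labels)
    total = sum(len(row) for row in batch_labels)
    negatives = total - positives
    # Assumption: fall back to 1.0 when the batch contains no positive label.
    return negatives / positives if positives > 0 else 1.0

def weighted_bce(y_true, y_pred, beta, eps=1e-7):
    """Weighted binary cross entropy for one example, summed over classes."""
    loss = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp so that log() stays finite
        loss -= beta * y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss

batch = [[1, 0, 0, 0], [0, 0, 0, 0]]   # two examples, four classes
beta = positive_weight(batch)          # 7 negative labels / 1 positive label
loss = weighted_bce(batch[0], [0.5, 0.5, 0.5, 0.5], beta)
print(beta, loss)
```

In this toy batch the single positive label is weighted seven times more heavily than each negative label, so the first example contributes ten times log 2 to the loss instead of four times log 2 in the unweighted case.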
2.2 Adversarially Robust Optimization
This work aims at improving the learned feature representations of neural network classifiers by training models that are robust against adversarial examples. We view adversarial examples and robustness from the perspective of optimization. Given the loss $L(x, y; \theta)$ formulated in Eq. 1, adversarial examples are perturbed inputs that try to maximize the loss
$$\max_{\delta \in \Delta} L(x + \delta, y; \theta) \tag{2}$$
where $\delta$ is the perturbation and $\Delta$ defines the allowed perturbation [2]. In order to make models robust against these adversarial examples, the loss is modified to a min-max problem so that it incorporates robustness as an objective [6]:
$$\min_{\theta} \mathbb{E}_{(x, y)} \left[ \max_{\delta \in \Delta} L(x + \delta, y; \theta) \right] \tag{3}$$
The approach for solving the optimization problem is to repeatedly find input perturbations by solving the inner maximization, and then update the model parameters to reduce the loss on these perturbed inputs. It is not necessary for the inner maximization to be solved exactly, and an approximate lower bound could lead to a reasonable solution for the min-max problem [6]. Our purpose for robust optimization is not only to achieve high performance against adversarial attacks, but also to steer the model towards learning more interpretable features. We choose a perturbation set $\Delta$ and an optimization method for Eq. 2 that are computationally reasonable, and we show that the model still learns robust features.
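The alternating procedure described above can be sketched on a toy problem. The sketch below assumes a single-label logistic model with analytic gradients in place of the CNN, an unweighted cross entropy, and a one-step sign-gradient (FGSM-style [2]) approximation of the inner maximization; all function names and numeric values are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def bce(w, b, x, y):
    p = predict(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def sign(v):
    return (v > 0) - (v < 0)

def fgsm_perturb(w, b, x, y, epsilon):
    """One-step approximate inner maximization: x + epsilon * sign(dL/dx)."""
    p = predict(w, b, x)
    grad_x = [(p - y) * wi for wi in w]  # analytic gradient of the loss w.r.t. x
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad_x)]

def robust_sgd_step(w, b, x, y, epsilon, lr):
    """Outer minimization: one SGD step on the loss at the perturbed input."""
    x_adv = fgsm_perturb(w, b, x, y, epsilon)
    p = predict(w, b, x_adv)
    w_new = [wi - lr * (p - y) * xi for wi, xi in zip(w, x_adv)]
    b_new = b - lr * (p - y)
    return w_new, b_new, x_adv

w, b = [0.5, -0.25], 0.0
x, y = [1.0, 2.0], 1
w_new, b_new, x_adv = robust_sgd_step(w, b, x, y, epsilon=0.1, lr=0.01)
print(bce(w, b, x, y), bce(w, b, x_adv, y), bce(w_new, b_new, x_adv, y))
```

The perturbed input stays within a distance of `epsilon` per coordinate from the original input, its loss under the old parameters is higher than the clean loss, and the parameter update then lowers the loss on that perturbed input.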
3 Experiments
3.1 Dataset
We evaluate our method on the NIH ChestX-ray14 dataset [12], which is the largest publicly available chest X-ray dataset to date and includes 112,120 frontal-view X-ray images of 30,805 unique patients at $1024 \times 1024$ resolution. Each image has fourteen labels associated with it, each corresponding to a common thoracic pathology. We use the train/test split provided with the dataset in its latest update, in which 25,596 images of the entire dataset are in the test set. The rest are split into training (90%) and validation (10%) sets. In the test set, 880 images have bounding box annotations of at least one pathology. Annotations exist for only 8 of the 14 pathologies.
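For orientation, the split sizes implied by these numbers can be worked out as follows. The rounding rule (training share rounded down, validation takes the remainder) is an assumption, and the sketch says nothing about how individual images or patients are assigned to the two sets.

```python
TOTAL_IMAGES = 112_120
TEST_IMAGES = 25_596

remaining = TOTAL_IMAGES - TEST_IMAGES
# Assumption: the 90% share is rounded down and validation takes the remainder.
train_images = remaining * 9 // 10
val_images = remaining - train_images

print(remaining, train_images, val_images)
```

Under this assumption, roughly 77.9k images are used for training and roughly 8.7k for validation.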
3.2 Baseline Model
In previous works [12, 8, 5], the feature maps of the last layer are of low resolution, and they are used for generating the CAMs. These low-resolution CAMs are not able to localize pathologies that are small in size, such as Nodule. Hence, it is intuitive to modify the CNN to have larger feature maps in the last layer. We therefore adopt a densely connected convolutional neural network [4], DenseNet121, and remove denseblocks 3 and 4 and their corresponding transition layers (2 and 3) in order to get higher resolution feature maps in the last layer. This network serves as our baseline model; its classification accuracy and interpretability evaluation are shown in Fig. 2 and Fig. 3, respectively.
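The effect of removing the later blocks on CAM resolution can be estimated with a small calculation. The sketch assumes the standard DenseNet121 layout [4], in which the initial convolution, the initial pooling layer, and each transition layer halve the spatial size; the stage names and the input side of 512 pixels are hypothetical, since the input resolution is not stated in this section.

```python
# Spatial downsampling factors of the stride-2 stages in DenseNet121.
FULL_DENSENET121 = {
    "conv0": 2, "pool0": 2,
    "transition1": 2, "transition2": 2, "transition3": 2,
}
# The modified baseline drops denseblocks 3 and 4 with transitions 2 and 3.
TRUNCATED = {name: factor for name, factor in FULL_DENSENET121.items()
             if name not in ("transition2", "transition3")}

def total_stride(stages):
    """Product of the downsampling factors of all stages."""
    stride = 1
    for factor in stages.values():
        stride *= factor
    return stride

def feature_map_side(input_side, stages):
    """Side length of the last feature map for a square input."""
    return input_side // total_stride(stages)

INPUT_SIDE = 512  # hypothetical input resolution
full_side = feature_map_side(INPUT_SIDE, FULL_DENSENET121)   # 16
truncated_side = feature_map_side(INPUT_SIDE, TRUNCATED)     # 64
print(full_side, truncated_side)
```

Under these assumptions the truncated network yields feature maps, and therefore CAMs, with four times the side length of those of the full network, which is what makes small findings such as Nodule easier to localize.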
In all our experiments, the networks are trained using stochastic gradient descent with a learning rate of 0.01 and a momentum of 0.9. We use a batch size of 32 and use neither weight decay nor dropout. The training is continued until the validation loss (Eq. 1) diverges, and the model with the smallest validation loss is used.
3.3 Interpretability Evaluation
Weakly supervised localization accuracy is measured for each classification model and is used as a proxy for evaluating the interpretability of the classification model. Localization is evaluated using the intersection over union (IoU) between the thresholded CAM and the bounding box. The localization is considered correct when the IoU is greater than a certain threshold T(IoU). Localization accuracy is calculated for several values of T(IoU).
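A minimal sketch of this metric is given below. It assumes that the thresholded CAM and the bounding box are both available as binary masks of the same size and that boxes are given as (x, y, width, height) in pixels; the function names and the toy values are hypothetical.

```python
def mask_iou(mask_a, mask_b):
    """IoU between two binary masks given as equally sized lists of rows."""
    intersection = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    return intersection / union if union else 0.0

def box_to_mask(box, height, width):
    """Rasterize a box (x, y, w, h) in pixel units into a binary mask."""
    x0, y0, w, h = box
    return [[1 if (x0 <= c < x0 + w and y0 <= r < y0 + h) else 0
             for c in range(width)] for r in range(height)]

def localization_accuracy(ious, t_iou):
    """Fraction of images whose IoU exceeds the threshold T(IoU)."""
    return sum(1 for v in ious if v > t_iou) / len(ious)

cam_mask = [[0, 0, 0, 0],
            [0, 1, 1, 0],
            [0, 1, 1, 1],
            [0, 0, 0, 0]]
box_mask = box_to_mask((1, 1, 2, 2), height=4, width=4)
iou = mask_iou(cam_mask, box_mask)  # 4 shared pixels / 5 pixels in the union
accuracies = {t: localization_accuracy([iou, 0.2, 0.05], t)
              for t in (0.1, 0.3, 0.5)}
print(iou, accuracies)
```

Reporting the accuracy for several values of T(IoU), as in the dictionary above, shows how quickly the localization quality degrades as the overlap requirement becomes stricter.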
Following the approach proposed by Li et al. [5], we do not generate bounding boxes from the thresholded CAMs. We only scale the CAMs to a fixed range (we use [0, 255]) and threshold them by value. This evaluation does not depend on further post-processing or bounding box generation approaches, and thus directly evaluates the feature maps. We choose the thresholding value separately for each class based on its resulting localization performance on a validation set. The validation set is selected from the 880 annotated images in the test set: 20% of these images are used for finding the thresholding value, and we perform 5-fold cross validation for a correct evaluation.
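The scaling and thresholding step can be sketched as follows. The helper names, the candidate threshold values, and the pixel-agreement score are assumptions for illustration; in the evaluation described above the score would be the localization performance of a class on the validation images.

```python
def scale_to_255(cam):
    """Linearly rescale a CAM (list of rows) to the range [0, 255]."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in cam]  # constant map: no activation
    return [[255.0 * (v - lo) / span for v in row] for row in cam]

def threshold_cam(cam, value):
    """Binary mask of the pixels whose scaled activation is at least `value`."""
    return [[1 if v >= value else 0 for v in row] for row in scale_to_255(cam)]

def pick_threshold(candidates, score_fn):
    """Return the candidate with the best validation score (first one on ties)."""
    return max(candidates, key=score_fn)

cam = [[0.0, 0.2, 0.1],
       [0.1, 1.0, 0.6],
       [0.0, 0.4, 0.2]]
mask = threshold_cam(cam, value=128)

# Toy stand-in for a validation annotation; used only to score the candidates.
target = [[0, 0, 0], [0, 1, 1], [0, 1, 0]]

def agreement(value):
    pred = threshold_cam(cam, value)
    return sum(p == t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))

best_value = pick_threshold([64, 128, 192], agreement)
print(mask, best_value)
```

Because no bounding box is fitted to the mask, the result depends only on the activation values themselves, which is the property the evaluation relies on.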
3.4 Adversarially Robust Optimization
The min-max optimization in Eq. 3 is solved iteratively by solving the inner maximization and then the outer minimization; hence, a method that requires several update steps for the inner maximization makes the solution computationally expensive. Many methods have been proposed in the literature for finding approximate solutions to the inner maximization problem. We use the FGSM method [2], as it requires only one update step to find a local maximum. If we start solving the min-max problem (Eq. 3) from a network not yet trained on the dataset, the adversarial examples make it hard for the network to learn the features of the dataset. Therefore, we initially train the network without the adversarial loss and, after convergence, continue training with the adversarial loss in Eq. 3. We observed that it also helps training convergence not to perturb all examples during training. Hence, we perturb only half [11] of the input examples in each epoch during training with the adversarial loss (Eq. 3). The amount of perturbation allowed for the FGSM method is defined by $\epsilon$. FGSM finds a local maximum of Eq. 2 limited by the allowed perturbation set $\Delta$. Several values of $\epsilon$ are chosen in the experiments in order to see the effect of the amount of perturbation during training on the learned features of the network (Fig. 4). A single value of $\epsilon$ is used for the rest of the experiments.
4 Results and Discussion
In this section and in Figs. 2 and 3, we refer to the work of Wang et al. [12] as the NIH method, Rajpurkar et al. [8] as CheXNet, Li et al. [5] as Supervised, and our method based on adversarially robust optimization as the Robust method.
4.1 Baseline Model vs. State Of the Art
The CheXNet method achieves the highest AUC (Fig. 2). However, it has the lowest localization accuracy (Fig. 3), indicating that the model's learned features do not align well with the pathologies. The high AUC and lack of interpretable features of CheXNet can be attributed to its unweighted binary cross entropy loss, in which the imbalance between positive and negative examples is ignored (refer to the supplementary materials).
The Supervised method uses 80% of annotated images during training, hence it achieves the highest localization accuracy. We report it as an upper bound for the localization accuracy, and it cannot be fairly compared with other methods since they are only trained on labels. However, our baseline surpasses the Supervised method in Nodule localization as it generates higher resolution CAMs.
For localization accuracy, the NIH method uses a different evaluation procedure than ours (Section 3.3): it relies on an ad hoc CAM thresholding and bounding box generation approach. Therefore, in order to compare our baseline fairly with NIH, we implemented the NIH method (using ResNet50 without the transition layer) and evaluated it using the procedure in Section 3.3; the result is shown as NIH* in Fig. 3.
4.2 Robust Model vs. Baseline
Quantitative Evaluation: Our Robust model (trained with the perturbation budget selected in Section 3.4) shows improved localization accuracy for Cardiomegaly, Pneumonia, and Infiltration, lower accuracy for the Nodule class, and comparable (though still higher) results for the rest. Nevertheless, while quantitative results provide a means of measuring the feature interpretability of the model on the entire test set, visually explainable features may still be essential for the clinical community.
Visual Evaluation: The robust model yields significantly more interpretable gradients with respect to the input image, as seen in the saliency maps in Fig. 1. These are vanilla saliency maps [10]: the gradients are only clipped at three standard deviations and scaled to [0, 1], and no further processing (e.g. smoothing) is performed. The effect of increasing the amount of perturbation during adversarially robust optimization on visual interpretability is presented in Fig. 4. It can be seen that increasing the amount of perturbation during training steers the model toward focusing on the most salient feature of the image. An interesting direction for future research is to study the effects of other perturbation sets, such as rotations, on feature interpretability.
5 Conclusion
In this work, we demonstrated that adversarially robust optimization improves the feature interpretability of neural network classifiers both quantitatively and visually. Saliency maps of our adversarially trained models show significantly more interpretable features. The method does not depend on the neural network architecture or on the dataset. We also demonstrated that evaluating a model using classification accuracy alone is not reliable, since the high accuracy of a model could be due to its reliance on features that are not relevant to the pathologies.
References
 [1] Biffi, C., Oktay, O., Tarroni, G., Bai, W., De Marvao, A., Doumou, G., Rajchl, M., Bedair, R., Prasad, S., Cook, S., et al.: Learning interpretable anatomical features through deep generative models: Application to cardiac remodeling. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 464–471. Springer (2018)
 [2] Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)
 [3] Greenspan, H., Van Ginneken, B., Summers, R.M.: Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique. IEEE Transactions on Medical Imaging 35(5), 1153–1159 (2016)

 [4] Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4700–4708 (2017)
 [5] Li, Z., Wang, C., Han, M., Xue, Y., Wei, W., Li, L.J., Fei-Fei, L.: Thoracic disease identification and localization with limited supervision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 8290–8299 (2018)
 [6] Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)
 [7] Miotto, R., Wang, F., Wang, S., Jiang, X., Dudley, J.T.: Deep learning for healthcare: review, opportunities and challenges. Briefings in bioinformatics 19(6), 1236–1246 (2017)
 [8] Rajpurkar, P., Irvin, J., Zhu, K., Yang, B., Mehta, H., Duan, T., Ding, D., Bagul, A., Langlotz, C., Shpanskaya, K., et al.: CheXNet: Radiologist-level pneumonia detection on chest X-rays with deep learning. arXiv preprint arXiv:1711.05225 (2017)
 [9] Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: Visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 618–626 (2017)
 [10] Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034 (2013)

 [11] Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., Madry, A.: Robustness may be at odds with accuracy. stat 1050, 11 (2018)
 [12] Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., Summers, R.M.: ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2097–2106 (2017)
 [13] Yao, L., Poblenz, E., Dagunts, D., Covington, B., Bernard, D., Lyman, K.: Learning to diagnose from scratch by exploiting dependencies among labels. arXiv preprint arXiv:1710.10501 (2017)

 [14] Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2921–2929 (2016)