Benchmarking Differentially Private Residual Networks for Medical Imagery

05/27/2020 ∙ by Sahib Singh, et al.

Hospitals and other medical institutions often hold vast amounts of medical data that could provide significant value if used to advance research. However, this data is sensitive in nature and, largely because of privacy concerns, is not readily available for research. In this paper, we measure the performance of a deep neural network on differentially private image datasets pertaining to pneumonia. We analyze the trade-off between the model's accuracy and the scale of perturbation applied to the images. Knowing how the model's accuracy varies across perturbation levels in differentially private medical images is useful in such settings. This work is particularly relevant given the coronavirus pandemic, as pneumonia has become an even greater concern owing to its role as a potentially deadly complication of COVID-19.




1 Introduction

Pneumonia is an infection that causes inflammation in the alveoli of the lungs and can be caused by numerous infectious agents. It is often a complication of an existing infection. Infectious agents include coronaviruses such as SARS‑CoV‑2, influenza viruses, and various bacterial species (Howley, 2020). According to the Centers for Disease Control and Prevention, pneumonia causes about 250,000 hospitalizations and about 50,000 deaths every year (Howley, 2020). A patient's condition and immune response vary with their own health and physiology, as well as with the characteristics of the infectious agent (Howley, 2020). Key contributing factors include:

  • Pre-existing comorbidities and the general state of health of the infected individual.

  • The virulence of the infectious organism.

  • The level of exposure to the infectious agent. Closer proximity significantly increases both the chance of infection and the risk of severe illness, because more of the infectious agent is inhaled through various mediums (Howley, 2020).

The percentage of deaths attributed to pneumonia and influenza in the United States is 8.2%, exceeding the epidemic threshold of 7.2% (Centers for Disease Control and Prevention, 2020). Recently, deaths due to pneumonia have increased sharply owing to the worldwide spread of SARS‑CoV‑2 and COVID-19. Given these global circumstances, rapidly building and evaluating models to track, diagnose, or support the treatment and mitigation of COVID-19 is critical. Because of this urgency and the inherently sensitive nature of medical data, it is vital to train and evaluate models while obscuring critical personal information in the data corpus. The field of differential privacy addresses these constraints through various methods, including the direct obfuscation of data (Dwork et al., 2006a, b). In this work, we analyze the impact of differentially private datasets on the performance of a popular image classification model, ResNet (He et al., 2016; Szegedy et al., 2017). We compare model performance on the Chest X-Ray dataset (Kermany et al., 2018) across different levels of perturbation, each of which keeps the images differentially private. This analysis aims to help medical professionals better understand the trade-off between accuracy and data privacy. It may also serve as a reference for deciding how much sensitive information must be protected while keeping the data useful for research.

This text is organized as follows. First, we introduce the differential privacy methods relevant to our findings. We then cover the experimental design and the analysis of results. Finally, we discuss the significance of these findings and potential future directions.

We also acknowledge similar earlier work done in a more theoretical setting (Mireshghallah et al., 2020; Fan, 2019). Our paper builds on this work and applies it to the healthcare domain, with particular relevance to medical research.

2 Methods

In this section we discuss the fundamental privacy preserving concepts used throughout the paper.

2.1 Differential Privacy (DP). (Dwork et al., 2006a, b)

The central idea in differential privacy is the introduction of randomized noise to ensure privacy through plausible deniability. Based on this idea, for $\epsilon \ge 0$, an algorithm $A$ is understood to satisfy $\epsilon$-Differential Privacy if and only if, for any pair of datasets $D$ and $D'$ that differ in only one element, the following statement holds for every possible output $t$:

$$\Pr[A(D) = t] \le e^{\epsilon} \Pr[A(D') = t]$$

Here $D$ and $D'$ are datasets differing by at most one element, and $\Pr[A(D) = t]$ denotes the probability that $t$ is the output of $A$. This setting approximates the effect of individual opt-outs by limiting how much the inclusion of any one individual's data can change the output.
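To make the definition concrete, the following is a minimal sketch, not taken from the paper, that checks the inequality above for an algorithm with a finite set of outputs. The function name `satisfies_pure_dp` and the dictionary representation of the output distributions $\Pr[A(D)=t]$ and $\Pr[A(D')=t]$ are illustrative assumptions. Because $D$ and $D'$ play symmetric roles, the check is applied in both directions.

```python
import math
from typing import Dict, Hashable


def satisfies_pure_dp(p_d: Dict[Hashable, float],
                      p_d_prime: Dict[Hashable, float],
                      epsilon: float,
                      tol: float = 1e-12) -> bool:
    """Check the epsilon-DP inequality for two neighbouring datasets D and D'.

    p_d[t] is Pr[A(D) = t] and p_d_prime[t] is Pr[A(D') = t]; outputs absent
    from a mapping are treated as having probability 0. `tol` is a small
    relative tolerance that absorbs floating-point rounding in exp(epsilon).
    """
    bound = math.exp(epsilon) * (1.0 + tol)
    for t in set(p_d) | set(p_d_prime):
        a = p_d.get(t, 0.0)
        b = p_d_prime.get(t, 0.0)
        # D and D' are interchangeable neighbours, so check both directions.
        if a > bound * b or b > bound * a:
            return False
    return True


# Usage: a hypothetical mechanism that keeps the true answer with probability 0.75.
p_d = {0: 0.75, 1: 0.25}        # output distribution on D
p_d_prime = {0: 0.25, 1: 0.75}  # output distribution on D'
print(satisfies_pure_dp(p_d, p_d_prime, math.log(3)))  # True
print(satisfies_pure_dp(p_d, p_d_prime, math.log(2)))  # False
```

The probability ratio between the two datasets is at most 3 for every output. As a result, this toy mechanism passes the check for $\epsilon = \ln 3$ but fails it for any smaller $\epsilon$, such as $\ln 2$.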

However, one major limitation of this kind of differential privacy is that data owners must trust a central authority, i.e. the database maintainer, to ensure their privacy. Hence, to obtain a stronger privacy guarantee, we use the concept of Local Differential Privacy (LDP) (Bebensee, 2019). We say that an algorithm $A$ satisfies $\epsilon$-Local Differential Privacy, where $\epsilon > 0$, if and only if for any pair of inputs $v$ and $v'$:

$$\Pr[A(v) = y] \le e^{\epsilon} \Pr[A(v') = y] \quad \forall y \in \mathrm{Range}(A)$$

For $\epsilon$-LDP, the privacy loss is captured by $\epsilon$. Having $\epsilon = 0$ ensures perfect privacy: since $e^{0} = 1$, the output distributions for any two inputs must be identical. On the other hand, $\epsilon = \infty$ provides no privacy guarantee. The choice of $\epsilon$ is therefore important, as the bound on privacy risk grows with $e^{\epsilon}$.
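Randomized response is the textbook example of an $\epsilon$-LDP mechanism. It is not the mechanism used in this paper, which perturbs images with Laplace noise, but it shows how the definition constrains a single data owner's report. The sketch below assumes one private bit per individual, and its function names are hypothetical. The mechanism reports the true bit with probability $e^{\epsilon}/(1+e^{\epsilon})$. This makes the worst-case ratio between the output probabilities for the two possible inputs exactly $e^{\epsilon}$.

```python
import math
import random


def randomized_response(bit: int, epsilon: float, rng: random.Random) -> int:
    """Report `bit` truthfully with probability e^eps / (1 + e^eps), else flip it."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def output_distribution(bit: int, epsilon: float) -> dict:
    """Exact probability of each reported value for a given private bit."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return {bit: p_truth, 1 - bit: 1.0 - p_truth}


def worst_case_ratio(epsilon: float) -> float:
    """max over outputs y of Pr[A(0) = y] / Pr[A(1) = y]; e^eps up to rounding."""
    p0 = output_distribution(0, epsilon)
    p1 = output_distribution(1, epsilon)
    return max(p0[y] / p1[y] for y in (0, 1))


rng = random.Random(0)
reports = [randomized_response(1, math.log(3), rng) for _ in range(10_000)]
print(sum(reports) / len(reports))    # close to 0.75
print(worst_case_ratio(math.log(3)))  # approximately 3.0
```

Because each individual randomizes their own bit before sharing it, no central authority ever sees the true value. This is the trust property that motivates LDP in the text above.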

2.2 Laplace Distribution. (Dwork et al., 2014)

The Laplace distribution, also known as the double-exponential distribution, is a symmetric version of the exponential distribution. The distribution centered at 0 (i.e. $\mu = 0$) with scale $b$ has the following probability density function:

$$\mathrm{Lap}(x \mid b) = \frac{1}{2b} \exp\left(-\frac{|x|}{b}\right)$$

The variance of this distribution is $\sigma^2 = 2b^2$.
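The density and variance above can be checked numerically. The following sketch uses only the Python standard library; this is an assumption, since the paper's own code is not shown. It samples $\mathrm{Lap}(0, b)$ as the difference of two independent exponential variables with mean $b$, which has exactly the density given above.

```python
import math
import random


def laplace_pdf(x: float, b: float) -> float:
    """Density of the zero-mean Laplace distribution with scale b."""
    return math.exp(-abs(x) / b) / (2.0 * b)


def sample_laplace(b: float, rng: random.Random) -> float:
    """Draw from Lap(0, b) as the difference of two i.i.d. exponentials with mean b."""
    # random.expovariate takes the rate lambda = 1 / mean.
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


b = 2.0
rng = random.Random(42)
samples = [sample_laplace(b, rng) for _ in range(200_000)]
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(laplace_pdf(0.0, b))  # 0.25, i.e. 1 / (2b)
print(variance)             # close to 2 * b**2 = 8.0
```

The difference-of-exponentials construction avoids the edge case in a naive inverse-CDF sampler, where a uniform draw of exactly 0 would require taking the logarithm of 0.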


2.3 Laplace Mechanism. (Dwork et al., 2006b, 2014)

The Laplace Mechanism independently perturbs each coordinate of the output with Laplace noise (drawn from a Laplace distribution with mean zero) scaled to the sensitivity of the function.

Given $\epsilon > 0$ and a target function $f$, the Laplace Mechanism is the randomizing algorithm

$$A_f(D) = f(D) + x$$

where $x$ is a random variable drawn from the Laplace distribution $\mathrm{Lap}(\Delta f / \epsilon)$, corresponding to the perturbation. Here $\Delta f$ is the global sensitivity of $f$, defined as

$$\Delta f = \max_{D, D'} \lVert f(D) - f(D') \rVert_1$$

over all dataset pairs $(D, D')$ that differ in only one element.
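A minimal sketch of the mechanism follows, assuming $f(D)$ has already been computed as a list of numbers. The names `laplace_mechanism` and `log_density_ratio` are illustrative. The second function shows, in one dimension, the reasoning step behind the privacy guarantee. For noisy outputs centered at $f(D)$ and $f(D')$, the log ratio of the two Laplace densities at any point $t$ is at most $|f(D) - f(D')| / b \le \Delta f / b = \epsilon$, by the triangle inequality.

```python
import random
from typing import List, Sequence


def laplace_mechanism(f_d: Sequence[float], sensitivity: float,
                      epsilon: float, rng: random.Random) -> List[float]:
    """Return f(D) + x with independent Lap(0, sensitivity / epsilon) noise per coordinate.

    `f_d` is the already-computed query answer f(D); `sensitivity` is the
    global L1 sensitivity Delta f of the query.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must both be positive")
    b = sensitivity / epsilon  # Laplace scale
    # Difference of two i.i.d. exponentials with mean b is Lap(0, b).
    return [v + rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b) for v in f_d]


def log_density_ratio(t: float, f_d: float, f_d_prime: float, b: float) -> float:
    """ln(p(t | centre f(D)) / p(t | centre f(D'))) for 1-D Laplace noise of scale b."""
    return (abs(t - f_d_prime) - abs(t - f_d)) / b


# Counting query: neighbouring datasets change the count by at most 1 (Delta f = 1).
epsilon = 0.5
b = 1.0 / epsilon
worst = max(log_density_ratio(t / 10, 10.0, 11.0, b) for t in range(-100, 300))
print(worst <= epsilon + 1e-12)                                        # True
print(laplace_mechanism([10.0, 3.0], 1.0, epsilon, random.Random(7)))  # noisy answers
```

Larger $\Delta f$ or smaller $\epsilon$ both increase the scale $b$. This is why stronger privacy, meaning a smaller $\epsilon$, comes at the cost of noisier outputs.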

3 Experimental Results

The experiments discussed in this section used an 18-layer Residual Network (ResNet) pre-trained to convergence on the ImageNet task. ResNets share many ideas with the popular VGG architecture (Simonyan and Zisserman, 2014; Szegedy et al., 2017), but use significantly fewer filters and have lower overall complexity. They use identity connections between sets of layers to address the common problem of gradient signals vanishing during backpropagation in very deep networks (He et al., 2016). The experimental setup consisted of training the selected model on an image classification task built from the Chest X-Ray dataset (Kermany et al., 2018). This dataset consists of approximately 5,800 chest radiographs. Chest radiography is used by medical specialists to confirm pneumonia and other conditions, although it is rarely the sole basis for a diagnosis. Radiographs taken at different times, such as before and during an illness, are often useful to physicians during diagnosis. In general, these images form an important part of an often multi-stage diagnostic process.

The dataset was used in a binary classification setting, with each sample labeled as either Normal or Pneumonia. The original Chest X-Ray dataset was used directly in this experiment, along with 3 other versions generated by adding different perturbations to the images. These alternate, differentially private datasets were generated by drawing random samples from the Laplace Mechanism described in (Dwork et al., 2006b), with $\mu = 0$ and varying levels of scale $b$.

These perturbations were added directly to the input images to create noisy representations for subsequent training. Each of the 4 datasets was used in a separate experiment to train the ResNet-18 model to convergence. Before training, the input images were resized to a fixed pixel resolution and normalized so that pixel values lie in $[0, 1]$. Therefore, the function $f$ of the Laplace Mechanism (Section 2.3) is the identity function, and its sensitivity is $\Delta f = 1$.
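The dataset-generation step can be sketched as follows. The sketch assumes a grayscale image represented as nested lists of pixel values in $[0, 1]$; the paper's actual preprocessing code and image size are not given. Each pixel receives independent $\mathrm{Lap}(0, b)$ noise, so with $\Delta f = 1$ a scale $b$ corresponds to $\epsilon = 1/b$ per pixel. The helper names and the scales in the usage lines are hypothetical and are not the values used in the experiments. No clipping back to $[0, 1]$ is applied, because the text does not mention any.

```python
import random
from typing import Dict, Iterable, List

Image = List[List[float]]  # hypothetical: rows of grayscale pixels in [0, 1]


def epsilon_for_scale(b: float, sensitivity: float = 1.0) -> float:
    """Invert b = Delta f / epsilon; with Delta f = 1 this is simply 1 / b."""
    return sensitivity / b


def perturb_image(image: Image, b: float, rng: random.Random) -> Image:
    """Add independent Lap(0, b) noise to every pixel; values are not clipped."""
    return [[p + rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b) for p in row]
            for row in image]


def make_perturbed_versions(image: Image, scales: Iterable[float],
                            seed: int = 0) -> Dict[float, Image]:
    """Build one noisy copy of `image` per scale b, reproducibly from `seed`."""
    rng = random.Random(seed)
    return {b: perturb_image(image, b, rng) for b in scales}


image = [[0.5] * 4 for _ in range(3)]  # toy 3x4 image, not real X-ray data
versions = make_perturbed_versions(image, scales=(1.0, 2.0, 4.0), seed=0)
for b, noisy in versions.items():
    print(b, epsilon_for_scale(b), noisy[0][:2])
```

Seeding a single `random.Random` instance makes each perturbed dataset reproducible, while the original image is left unmodified.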

Figure 1: Accuracy vs. training epochs for ResNet-18.

All experiments were carried out using Python 3.8.2 and PyTorch 1.4.0. We trained separate instances of the pre-trained ResNet-18 model on these 4 image datasets for 120 epochs each, and saved the best model from each run for analysis. We then examined how accuracy trades off against the scale of perturbation applied to the images. Best model accuracies on both the train and test sets are included in Table 1, and the learning curves over 120 epochs of training are shown in Figure 1. A validation set was used to tune training over the full set of epochs and is excluded from the figures for clarity.

Several insights emerge from Table 1 and Figure 1. First, the results support the intuition that accuracy diminishes as the scale of perturbation $b$ increases, and this relationship holds throughout training across all 120 epochs. Second, the training and test curves behave differently relative to each other at different perturbation levels. The best trade-off between model bias and variance appears at $b = 2$. The models trained at the other perturbation levels also seem to generalize better than the model trained on the original dataset, which overfits the training data. This may highlight the potential of differential privacy methods to improve generalization to unseen data, which may be useful in certain settings.

4 Conclusion And Future Directions

In this paper, we provide an empirical evaluation of a computationally tractable 18-layer ResNet on a medically relevant classification task using the publicly available Chest X-Ray dataset. This is an effective approach in many medical contexts, where the underlying interactions are complex enough to resist simple analysis. We find that model accuracy varies with the amount of differentially private noise added, and we highlight the inherent trade-offs in choosing a perturbation level. We also highlight interesting behavior in model performance on unseen data as a function of perturbation level. Finally, this work demonstrates the usefulness of transfer learning in privacy-preserving scenarios, as a pre-trained ResNet achieved good performance on a new classification task across various perturbation levels.

There are several useful directions for benchmarking privacy-preserving methods on both medical and non-medical tasks where data is sensitive but progress is critical. Our future work on this topic will include profiling additional ML models, both neural-network-based and otherwise, across a variety of privacy-preserving settings, including Federated Learning. We also intend to understand and improve the use of transfer learning in settings where privacy is a necessary consideration. Another research direction we intend to pursue is the effect of various perturbation levels on entire neural network topologies, such as those generated through meta-learning methods like neural architecture search (Elsken et al., 2018). Furthermore, other data modalities, including audio, video, text, tabular data, and various forms of cyber data, could be benchmarked in a similar way.


References

  • B. Bebensee (2019) Local differential privacy: a tutorial. arXiv preprint arXiv:1907.11908.
  • Centers for Disease Control and Prevention (CDC) (2020) COVIDView week 13. Technical report.
  • C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor (2006a) Our data, ourselves: privacy via distributed noise generation. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pp. 486–503.
  • C. Dwork, F. McSherry, K. Nissim, and A. Smith (2006b) Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pp. 265–284.
  • C. Dwork, A. Roth, et al. (2014) The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9 (3–4), pp. 211–407.
  • T. Elsken, J. H. Metzen, and F. Hutter (2018) Neural architecture search: a survey. arXiv preprint arXiv:1808.05377.
  • L. Fan (2019) Differential privacy for image publication.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
  • E. K. Howley (2020) What is coronavirus pneumonia? Technical report, US News and World Report.
  • D. S. Kermany, M. Goldbaum, W. Cai, C. C. Valentim, H. Liang, S. L. Baxter, A. McKeown, G. Yang, X. Wu, F. Yan, et al. (2018) Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 172 (5), pp. 1122–1131.
  • F. Mireshghallah, M. Taram, A. Jalali, A. T. Elthakeb, D. Tullsen, and H. Esmaeilzadeh (2020) A principled approach to learning stochastic representations for privacy in deep neural inference. arXiv preprint arXiv:2003.12154.
  • K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
  • C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi (2017) Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence.