The availability of high-quality public clinical data sets (Johnson et al., 2016; Pollard et al., 2018) has greatly accelerated research into the use of machine learning for the development of clinical decision support tools. However, the majority of clinical data remain in private silos and are broadly unavailable for research due to concerns over patient privacy, inhibiting the collaborative development of high-fidelity predictive models across institutions. Additionally, standard de-identification protocols provide limited safety guarantees against sophisticated re-identification attacks (El Emam et al., 2011; Gkoulalas-Divanis et al., 2014; Kleppner and Sharp, 2009). Furthermore, patient privacy may be violated even when no raw data are shared with downstream parties, as trained machine learning models are susceptible to membership inference (Shokri et al., 2017), model inversion (Fredrikson et al., 2015), and training data extraction (Carlini et al., 2018) attacks.
In line with recent work (Beaulieu-Jones et al., 2018; Vepakomma et al., 2018b), we investigate the extent to which several hospitals can collaboratively train clinical risk prediction models with formal privacy guarantees without sharing data. In particular, we employ federated averaging (McMahan et al., 2017) and differentially private stochastic gradient descent (DP-SGD; Abadi et al., 2016; McMahan et al., 2017, 2018) to train models for in-hospital mortality and prolonged length of stay prediction across thirty-one hospitals in the eICU Collaborative Research Database (eICU-CRD; Pollard et al., 2018).
1.1 Federated Learning
Federated learning (McMahan et al., 2017) is a general technique for decentralized optimization across a collection of entities without sharing data, typically employed to train machine learning models on mobile devices. In the variant known as federated averaging, each entity trains a local model for a fixed number of epochs over its local training data and transfers the resulting weights to a central server. The server returns the average of the weights to each entity, and the process repeats. This satisfies an intuitive notion of privacy, since no entity shares data with the central server or with any other entity. However, federated learning alone provides no formal accounting of the privacy cost incurred by communicating local model weights to the central server.
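The federated averaging loop described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes a logistic regression model, full-batch gradient descent as the local optimizer, and in-memory `(X, y)` arrays per entity; `local_train` and `federated_averaging` are hypothetical names. Following the description above, the server takes an unweighted mean of the local weights (the original algorithm of McMahan et al. (2017) weights each entity by its number of training examples).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def local_train(w, X, y, lr=0.1, epochs=1):
    """Run `epochs` full-batch gradient steps of logistic regression on one
    entity's local data, starting from the shared weights w."""
    w = w.copy()
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
    return w


def federated_averaging(datasets, n_features, rounds=50, lr=0.1, epochs=1):
    """Each round, every entity trains locally from the current shared weights;
    the server averages the returned weights and sends the average back.
    Only weights cross the entity boundary, never (X, y)."""
    w_global = np.zeros(n_features)
    for _ in range(rounds):
        local_weights = [local_train(w_global, X, y, lr, epochs) for X, y in datasets]
        w_global = np.mean(local_weights, axis=0)  # unweighted average (assumption)
    return w_global
```

With equally sized local data sets and a single local gradient step per round, one round is exactly one gradient step on the pooled data; with more local steps or unequal data set sizes the two procedures diverge.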
1.2 Differential Privacy
Formally, a randomized algorithm $\mathcal{M}: \mathcal{D} \rightarrow \mathcal{R}$ with domain $\mathcal{D}$ and range $\mathcal{R}$ satisfies $(\epsilon, \delta)$-differential privacy (Dwork et al., 2014) if for any two adjacent data sets $d, d' \in \mathcal{D}$ and for any subset of outputs $S \subseteq \mathcal{R}$,

$$\Pr[\mathcal{M}(d) \in S] \leq e^{\epsilon} \Pr[\mathcal{M}(d') \in S] + \delta,$$

where adjacent data sets are those that differ by adding, removing, or modifying the data for one record. This formulation can be informally interpreted as guaranteeing that the inclusion of any single record does not change the probability distribution over learned model weights by more than a factor of $e^{\epsilon}$, where $\delta$ bounds the probability that this guarantee fails. Notably, this notion allows us to bound and quantify the ability of an adversary to determine whether a record belonged to the training data set, regardless of the adversary's access to auxiliary information (Dwork et al., 2014).
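As a toy illustration of the definition (not a mechanism used in this work), the sketch below exhaustively checks the inequality for randomized response on a single binary record, for example a mortality label: the true bit is released with probability 3/4 and flipped otherwise. The two adjacent data sets differ by modifying that one record. Because the largest ratio between output probabilities is 0.75/0.25 = 3, the mechanism should satisfy $(\ln 3, 0)$-differential privacy but not $(\ln 2, 0)$-differential privacy. The function names are hypothetical.

```python
import itertools
import math


def randomized_response_probs(true_bit, p_truth=0.75):
    """Output distribution of randomized response on one binary record:
    report the true bit with probability p_truth, otherwise flip it."""
    return {true_bit: p_truth, 1 - true_bit: 1.0 - p_truth}


def satisfies_dp(probs_d, probs_d_prime, epsilon, delta=0.0, tol=1e-12):
    """Check Pr[M(d) in S] <= e^epsilon * Pr[M(d') in S] + delta for every
    subset S of the (finite) output space."""
    outputs = sorted(probs_d)
    for size in range(len(outputs) + 1):
        for subset in itertools.combinations(outputs, size):
            p = sum(probs_d[o] for o in subset)
            q = sum(probs_d_prime[o] for o in subset)
            if p > math.exp(epsilon) * q + delta + tol:
                return False
    return True


if __name__ == "__main__":
    rr0, rr1 = randomized_response_probs(0), randomized_response_probs(1)
    # Adjacency is symmetric, so check both orderings of the two data sets.
    print(satisfies_dp(rr0, rr1, math.log(3)) and satisfies_dp(rr1, rr0, math.log(3)))
    print(satisfies_dp(rr0, rr1, math.log(2)))
```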
In practice, stochastic gradient descent can be made differentially private by clipping each record-level gradient to a maximum $L_2$ norm $C$ and adding Gaussian noise with standard deviation $\sigma C$ to the sum of the clipped gradients over a batch of training data before averaging, where $\sigma$ is the noise multiplier (McMahan et al., 2018). The privacy loss over the procedure may then be accounted for with the moments accountant (Abadi et al., 2016; McMahan et al., 2018) and Rényi differential privacy (Mironov, 2017). In this setting, for a fixed $\delta$, the privacy cost $\epsilon$ of a training procedure is fully specified by the noise multiplier $\sigma$, the ratio of the batch size to the training set size, and the number of training steps (McMahan et al., 2018). McMahan et al. (2017) demonstrate that federated learning can be straightforwardly formulated to support differentially private training by using DP-SGD as the local optimization algorithm.
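A minimal NumPy sketch of one DP-SGD update, assuming a logistic regression model and a fixed batch size $B$ used as the averaging denominator. It shows only the clip, sum, noise, and average steps; it performs no privacy accounting and does not reproduce TensorFlow Privacy (`dp_sgd_step` is a hypothetical name).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def dp_sgd_step(w, X_batch, y_batch, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update for logistic regression.

    1. Compute one gradient per record (rows of per_example_grads).
    2. Rescale each row so its L2 norm is at most clip_norm (C).
    3. Sum, add N(0, (noise_multiplier * clip_norm)^2) noise per coordinate,
       and divide by the batch size.
    """
    residual = sigmoid(X_batch @ w) - y_batch          # shape (B,)
    per_example_grads = residual[:, None] * X_batch     # shape (B, d)
    norms = np.linalg.norm(per_example_grads, axis=1)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale[:, None]
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / len(y_batch)
    return w - lr * noisy_mean
```

Clipping bounds the contribution of any single record to the summed gradient by $C$, which is why the noise scale is tied to $C$ through the multiplier $\sigma$; with `noise_multiplier=0` and a very large `clip_norm`, the step reduces to an ordinary mini-batch gradient step.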
1.3 Related Work
Our work is most similar to Beaulieu-Jones et al. (2018), who also investigate decentralized and differentially private machine learning for mortality prediction in the eICU-CRD, but use cyclical weight transfer (Chang et al., 2018) rather than federated averaging for distributed optimization. Another related technique is split learning Gupta and Raskar (2018); Vepakomma et al. (2018a, ), in which the layers of a neural network are partitioned across several entities, enabling learning across entities that may contribute different data modalities without exposing the raw data or the local network architecture. As an alternative, recent work (Beaulieu-Jones et al., 2019; Xie et al., 2018) has proposed the use of differentially private generative models to publicly release synthetic data with privacy guarantees.
All experiments are based on data derived from the eICU Collaborative Research Database (Pollard et al., 2018), a freely and publicly available intensive care database containing data from 139,367 unique patients admitted between 2014 and 2015 to 208 unique hospitals. Each patient may have one or more recorded hospital admissions, each composed of one or more ICU stays.
We make predictions 24 hours into hospital admissions that last at least 24 hours. We assign binary outcome labels for in-hospital mortality and prolonged length of stay if the patient dies during the remainder of the hospital admission or if the admission lasts longer than 7 days, respectively.
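The labeling rule can be written as a small function. The `Admission` fields are hypothetical and do not correspond to eICU-CRD column names; lengths of stay are assumed to be measured in hours from hospital admission. Because the prediction is made at 24 hours and only admissions lasting at least 24 hours are kept, any in-hospital death necessarily falls in the remainder of the admission.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

PREDICTION_TIME_HOURS = 24.0
PROLONGED_STAY_HOURS = 7 * 24.0


@dataclass
class Admission:
    # Hypothetical fields for illustration, not eICU-CRD column names.
    length_of_stay_hours: float
    died_in_hospital: bool


def assign_labels(adm: Admission) -> Optional[Tuple[int, int]]:
    """Return (in_hospital_mortality, prolonged_length_of_stay) as 0/1 labels,
    or None for admissions shorter than 24 hours, which are excluded."""
    if adm.length_of_stay_hours < PREDICTION_TIME_HOURS:
        return None
    mortality = int(adm.died_in_hospital)
    prolonged = int(adm.length_of_stay_hours > PROLONGED_STAY_HOURS)
    return mortality, prolonged
```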
To construct a training set for supervised learning, we first partition the set of admissions by hospital and then split the data within each hospital by patient such that 80%, 10%, and 10% of the patients are used for training, validation, and testing, respectively. We allow for multiple hospital admissions per patient, but no patient exists in more than one partition within the same hospital. We retain all hospitals with more than 1,000 hospital admissions in their corresponding training sets. This procedure produces a cohort of 65,509 labeled hospital admissions across 31 unique hospitals. The incidence of in-hospital mortality and prolonged length of stay in the aggregate population is 7.3% and 34.4%, respectively.
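A pure-Python sketch of the hospital-then-patient split and the hospital filter. It assumes admissions arrive as `(hospital_id, patient_id, admission_id)` tuples and that the 80/10/10 proportions are applied to a shuffled list of patients with floor rounding; the function name, seed, and rounding are assumptions, not details of the original pipeline.

```python
import random
from collections import defaultdict


def split_by_hospital_and_patient(admissions, seed=0, min_train_admissions=1000):
    """Split admissions per hospital so that each patient's admissions at a
    hospital all fall in one partition, then drop small hospitals.

    admissions: iterable of (hospital_id, patient_id, admission_id) tuples.
    Returns {hospital_id: {"train": [...], "validation": [...], "test": [...]}}.
    """
    by_hospital = defaultdict(list)
    for hospital_id, patient_id, admission_id in admissions:
        by_hospital[hospital_id].append((patient_id, admission_id))

    rng = random.Random(seed)
    splits = {}
    for hospital_id, rows in sorted(by_hospital.items()):
        # Shuffle patients, not admissions, so a patient never spans partitions.
        patients = sorted({patient_id for patient_id, _ in rows})
        rng.shuffle(patients)
        n_train = int(0.8 * len(patients))
        n_val = int(0.1 * len(patients))
        partition_of = {}
        for i, patient_id in enumerate(patients):
            if i < n_train:
                partition_of[patient_id] = "train"
            elif i < n_train + n_val:
                partition_of[patient_id] = "validation"
            else:
                partition_of[patient_id] = "test"

        parts = {"train": [], "validation": [], "test": []}
        for patient_id, admission_id in rows:
            parts[partition_of[patient_id]].append(admission_id)

        # Keep only hospitals with more than min_train_admissions training admissions.
        if len(parts["train"]) > min_train_admissions:
            splits[hospital_id] = parts
    return splits
```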
We construct a feature representation as a function of data recorded within each hospital stay up to 24 hours into the stay. We extract all lab orders, lab results, medication orders, diagnoses, and active treatments, as well as the patient age at admission, gender, ethnicity, unit type, and admission source. Lab results and age are binned into three and four bins, respectively. We aggregate over time, assigning a one for each feature if it is observed anywhere in the admission prior to 24 hours and a zero otherwise.
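The binning and time aggregation can be sketched as follows. The feature-name encoding (for example `"lab:lactate:bin2"`) and all bin edges are hypothetical placeholders; the text specifies only the number of bins (three for lab results, four for age), not their boundaries.

```python
import bisect

# Hypothetical interior bin edges: 3 edges give 4 age bins, 2 edges give 3 lab bins.
AGE_EDGES = [45.0, 65.0, 80.0]
LAB_EDGES = {"lactate": [0.5, 2.0]}  # placeholder thresholds, not from this work


def bin_value(value, edges):
    """Index of the bin containing value, given ascending interior edges."""
    return bisect.bisect_right(edges, value)


def build_feature_vector(events, vocabulary, cutoff_hours=24.0):
    """events: (time_hours, feature_name) pairs for one admission, with any
    binning already encoded in feature_name. Returns a 0/1 list aligned with
    vocabulary: 1 if the feature is observed at any time before the cutoff."""
    observed = {name for time_hours, name in events if time_hours < cutoff_hours}
    return [1 if name in observed else 0 for name in vocabulary]


if __name__ == "__main__":
    events = [
        (0.0, f"age:bin{bin_value(72, AGE_EDGES)}"),
        (3.5, f"lab:lactate:bin{bin_value(3.1, LAB_EDGES['lactate'])}"),
        (30.0, "med:heparin"),  # recorded after 24 hours, so ignored
    ]
    vocabulary = ["age:bin2", "lab:lactate:bin2", "med:heparin"]
    print(build_feature_vector(events, vocabulary))
```

Collapsing each feature to a single indicator discards timing and counts within the first 24 hours, which keeps the representation a sparse binary vector suitable for logistic regression and small feedforward networks.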
For all supervised learning tasks, we consider only logistic regression and feedforward networks with one hidden layer. We perform model selection on the basis of the area under the receiver operating characteristic curve (AUC-ROC) evaluated on the corresponding validation set following a grid search over relevant hyperparameters. Model performance is reported as the 95% confidence interval of the AUC-ROC on the corresponding test set, derived via DeLong's method (DeLong et al., 1988). We similarly derive confidence intervals for the difference in the AUC-ROC between models to facilitate model comparisons; this procedure takes into account the correlation between the predictions that two models make on the same test set. We use the Adam optimizer (Kingma and Ba, 2014) in all cases.
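DeLong's method can be implemented directly from its structural components. The sketch below is a straightforward $O(mn)$ NumPy version for two models scored on the same test set with $m$ positives and $n$ negatives (at least two of each); it is not the implementation used in this work, and the function names are hypothetical. The off-diagonal covariance term is what accounts for the correlated predictions of the two models.

```python
import numpy as np

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def _placements(pos, neg):
    """Structural components: for each positive, the fraction of negatives it
    outranks (ties count 1/2); for each negative, the fraction of positives
    that outrank it."""
    psi = (pos[:, None] > neg[None, :]).astype(float) + 0.5 * (pos[:, None] == neg[None, :])
    return psi.mean(axis=1), psi.mean(axis=0)  # V10 (length m), V01 (length n)


def delong(scores_a, scores_b, labels):
    """Return (aucs, cov, ci): the two AUC-ROCs, their 2x2 DeLong covariance,
    and a 95% confidence interval for AUC(a) - AUC(b)."""
    labels = np.asarray(labels).astype(bool)
    v10, v01 = [], []
    for s in (np.asarray(scores_a, dtype=float), np.asarray(scores_b, dtype=float)):
        a, b = _placements(s[labels], s[~labels])
        v10.append(a)
        v01.append(b)
    v10, v01 = np.array(v10), np.array(v01)      # shapes (2, m) and (2, n)
    aucs = v10.mean(axis=1)
    m, n = v10.shape[1], v01.shape[1]
    cov = np.cov(v10) / m + np.cov(v01) / n       # covariance of the two AUC estimates
    diff = aucs[0] - aucs[1]
    var_diff = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]
    half_width = Z_95 * np.sqrt(max(var_diff, 0.0))
    return aucs, cov, (diff - half_width, diff + half_width)
```

A 95% interval for a single model's AUC-ROC follows from the same output as `aucs[k] ± Z_95 * sqrt(cov[k, k])`.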
2.1 Experimental Design
We conduct a series of experiments designed to evaluate the relative benefits of centralized and federated learning, and the associated privacy costs, over learning using only local data at each hospital. We evaluate the following experimental conditions:
Local training with no collaboration. We identify a high-performing model for each hospital using only data from that hospital, following a grid search over the learning rate, the batch size, and, for feedforward networks, the hidden layer size.
Centralized training. We simulate the setting where all of the records are available in a central repository, select the best global model on the basis of its performance on the aggregated records, and evaluate that model on the local data from each hospital.
Centralized training with differential privacy. We modify the centralized training procedure to use DP-SGD for optimization (McMahan et al., 2018). Here we additionally search over the discrete grid [0.1, 1, 10] for both the noise multiplier $\sigma$ and the gradient clipping threshold $C$. We assess privacy in terms of the $\epsilon$ that results from training with a fixed $\delta$.
Federated learning. We employ the federated averaging algorithm described in McMahan et al. (2017). For each round of federated learning, we conduct one epoch of training using the local data at each hospital and then synchronize the weights across all hospitals with an average. We maintain a record of the local performance at each hospital over the federated learning procedure and perform local model selection on the basis of the best validation AUC-ROC observed over the procedure. Model selection for the best federated hyperparameters is determined on the basis of the best mean local validation AUC-ROC across hospitals.
Federated learning with differential privacy. We repeat the federated averaging experiment as previously described, but use DP-SGD as the local optimizer at each hospital, similar to the algorithm described in McMahan et al. (2017). We experiment both with fixed global DP-SGD hyperparameters and with local hyperparameters selected independently at each hospital. For the local hyperparameter search at each hospital, we use , , and selected log-uniformly from , performing model selection on the basis of the DP-SGD hyperparameters that maximize local AUC-ROC in ten epochs of training without any collaboration. We then perform federated learning for ten rounds with the selected local DP-SGD hyperparameters.
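The two-level selection rule used for federated learning (best round per hospital, then best configuration by mean local validation AUC-ROC) can be sketched as below. The nested layout `val_auc[config][round][hospital]` and the function name are hypothetical; in practice the per-round AUCs would be recorded during training.

```python
def select_federated_config(val_auc):
    """Return (best_config, best_round_per_hospital, best_mean_auc).

    val_auc maps each hyperparameter configuration to a list with one entry
    per federated round; each entry maps hospital -> AUC-ROC on that
    hospital's validation set after the round.
    """
    best = (None, None, float("-inf"))
    for config, rounds in val_auc.items():
        hospitals = list(rounds[0])
        # Local model selection: the round with the best validation AUC at each
        # hospital (earliest round on ties).
        chosen = {}
        for h in hospitals:
            history = [round_aucs[h] for round_aucs in rounds]
            chosen[h] = history.index(max(history))
        # Global criterion: mean of the per-hospital best validation AUCs.
        mean_auc = sum(rounds[chosen[h]][h] for h in hospitals) / len(hospitals)
        if mean_auc > best[2]:
            best = (config, chosen, mean_auc)
    return best
```

Because every hospital's round is chosen on its own validation data, different hospitals may end up deploying weights from different rounds of the same federated run.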
3 Results and Discussion
Prior to experimenting with differentially private training, we aimed to establish the efficacy of federated learning relative to centralized and local learning. We find that although federated learning frequently improves on local learning and often attains an AUC-ROC comparable to that of centralized learning, the improvements are often too small to be statistically significant on the basis of the 95% confidence interval for the difference in AUC-ROC between the central or federated model and the corresponding local model (Table 1). In particular, centralized and federated learning improve on local learning for prediction of prolonged length of stay at thirteen and twelve hospitals, respectively, whereas centralized and federated learning benefit mortality prediction at only seven and five hospitals, respectively.
When the records from all hospitals are aggregated for differentially private centralized training, it is feasible to attain relatively strong privacy guarantees for appropriate choices of the noise multiplier and clipping threshold, with a relatively minor reduction in validation AUC-ROC at the end of training (Figure 1; prolonged length of stay 0.763 vs. 0.73; mortality 0.876 vs. 0.832). When performing federated learning in a differentially private manner, we find that, even with DP-SGD hyperparameters selected on the basis of local training, the models derived from differentially private federated learning often perform poorly in terms of both AUC-ROC and $\epsilon$, and that this effect is exacerbated for mortality prediction (Table S1). A practical tuning strategy for differentially private federated averaging could likely be identified with further experimentation, but it is unclear whether such a strategy would generalize to similar data sets and prediction tasks. This is problematic for both this and related work, because the privacy cost of the hyperparameter search itself is not reflected in the reported $\epsilon$, and neglecting to account for the privacy cost of model selection produces optimistic underestimates of the total privacy cost (Liu and Talwar, 2018; Chaudhuri and Vinterbo, 2013). In future work, it is of interest to conduct controlled experiments that directly compare our approach to cyclical weight transfer (Beaulieu-Jones et al., 2018) and split learning Gupta and Raskar (2018); Vepakomma et al. (2018a, ) to gain insight into the relative efficacy of differentially private federated averaging over these alternatives.
We thank Michaela Hardt and Abhradeep Thakurta for valuable mentorship and feedback. We further thank Steve Chien and all contributors to the TensorFlow Privacy project for enabling this work.
- Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS ’16, New York, NY, USA, pp. 308–318.
- Privacy-preserving generative deep neural networks support clinical data sharing. Circulation: Cardiovascular Quality and Outcomes 12 (7), pp. e005122.
- Privacy-Preserving Distributed Deep Learning for Clinical Data.
- The Secret Sharer: Measuring Unintended Neural Network Memorization & Extracting Secrets.
- Distributed deep learning networks among institutions for medical imaging. Journal of the American Medical Informatics Association: JAMIA 25 (8), pp. 945–954.
- A stability-based validation procedure for differentially private machine learning. In Advances in Neural Information Processing Systems, pp. 2652–2660.
- Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44 (3), pp. 837–845.
- The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9 (3–4), pp. 211–407.
- A systematic review of re-identification attacks on health data. PLoS ONE 6 (12), pp. e28071.
- Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS ’15, New York, NY, USA, pp. 1322–1333.
- Publishing data from electronic health records while preserving privacy: a survey of algorithms. Journal of Biomedical Informatics 50, pp. 4–19.
- Distributed learning of deep neural network over multiple agents. Journal of Network and Computer Applications 116, pp. 1–8.
- MIMIC-III, a freely accessible critical care database. Scientific Data 3, pp. 160035.
- Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- Committee on ensuring the utility and integrity of research data in a digital age. National Academy of Sciences, pp. 4.
- Private Selection from Private Candidates.
- Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, A. Singh and J. Zhu (Eds.), Proceedings of Machine Learning Research, Vol. 54, Fort Lauderdale, FL, USA, pp. 1273–1282.
- A General Approach to Adding Differential Privacy to Iterative Training Procedures.
- Learning Differentially Private Recurrent Language Models.
- Rényi differential privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pp. 263–275.
- The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Scientific Data 5.
- Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 3–18.
- Reducing leakage in distributed deep learning for sensitive health data.
- Split learning for health: Distributed deep learning without sharing raw patient data.
- No Peek: A Survey of private distributed deep learning.
- Differentially Private Generative Adversarial Network.