FADL: Federated-Autonomous Deep Learning for Distributed Electronic Health Record

11/28/2018 ∙ by Dianbo Liu, et al. ∙ Harvard University

Electronic health record (EHR) data are collected by individual institutions and often stored across locations in silos. Getting access to these data is difficult and slow due to security, privacy, regulatory, and operational issues. We show, using ICU data from 58 different hospitals, that machine learning models to predict patient mortality can be trained efficiently without moving health data out of their silos, using a distributed machine learning strategy. We propose a new method, called Federated-Autonomous Deep Learning (FADL), that trains part of the model using all data sources in a distributed manner and other parts using data from specific data sources. We observed that FADL outperforms the traditional federated learning strategy and conclude that the balance between global and local training is an important factor to consider when designing distributed machine learning methods, especially in healthcare.







1 Introduction

Electronic Health Record (EHR) data, patient-generated health data from mobile devices, and other health-related information are valuable for improving health outcomes, especially for precision medicine [kohane2015ten; beam2016translating]. However, there are many challenges in utilizing these data efficiently. One of them is data access. Healthcare records are stored in different locations and data silos, including but not limited to hospitals, pharmacies, payors, and personal devices [Goldstein2017OpportunitiesReview; mohammed2010centralized; bhatt2017internet; islam2015internet]. Traditionally, healthcare data distributed across sites were centralized in a database for analysis [liebowitz2017actionable; holzinger2016machine; hashem2015rise]. However, healthcare data transfers are complex because of strict regulations and the sensitivity of the data [mandel2016smart]. These hurdles not only make data utilization expensive but also slow down information flow in healthcare, where timely updates are often important.

The process of using supervised machine learning for data analysis can be roughly divided into model training, where some datasets are used to optimize the model parameters, and prediction, where a trained model is used to make predictions on unseen data [bishop2012pattern]. The motivation for federated or distributed machine learning is to train algorithms on different data sources in a distributed manner and aggregate the learned models (Figure 1) [konevcny2016federated; McMahan2016Communication-EfficientData; konecny2016federated]. In this paradigm, algorithms that can learn from parts of the data are sent to each of the data sources for distributed training. Parameters of all the locally trained models are then sent back to the analyzer to build a new ensembled model. This cycle repeats for a certain number of iterations. The machine learning model can be designed in such a way that individual-level patient data cannot be retrieved from the model. Through this federated information flow, data-providing nodes retain health data within their institutional walls. We used hospital ICU data as an example to demonstrate how federated machine learning can train models using data unevenly distributed across multiple sources, and propose a new method, called Federated-Autonomous Deep Learning (FADL), that balances global and local training.

Figure 1: Federated machine learning allows machine learning models to be trained from multiple data sources without moving the data.

2 Dataset, study cohort and methods

The eICU Collaborative Research Database was developed by the Philips eICU program [pollard2018eicu] and populated with data from 208 critical care units throughout the continental United States for patients admitted in 2014 and 2015. We included data from 58 hospitals comprising 126,489 ICU admissions with discharge status information (alive or expired) (Table 1). We developed a model that takes as input the medications administered in the first 24 hours and predicts mortality during the ICU admission. The cohort was administered 1,400 different medicines in total. Binary indicators of whether a patient took each of these medicines in the first 24 hours after admission were used as input features. 5.5% of patients died during their stay. We used 70% of the ICU admissions as the training set, 10% as the validation set, and 20% as the test set.
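As a sketch of this feature construction (the drug names and function names below are illustrative, not from the study), each admission can be encoded as a binary vector over the formulary:

```python
import numpy as np

def build_features(admissions, formulary):
    """Binary matrix: one row per ICU admission, one column per drug.

    admissions: list of sets of drug names given in the first 24 hours.
    formulary: ordered list of all drug names (1,400 in the study).
    """
    index = {drug: j for j, drug in enumerate(formulary)}
    X = np.zeros((len(admissions), len(formulary)), dtype=np.float32)
    for i, drugs in enumerate(admissions):
        for d in drugs:
            X[i, index[d]] = 1.0
    return X

formulary = ["aspirin", "heparin", "propofol"]
X = build_features([{"aspirin"}, {"heparin", "propofol"}], formulary)
# X is a 2 x 3 binary matrix with one row per admission
```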

Information                                          | Alive       | Deceased
Age 0-18                                             | 4269        | 403
Age 18-60                                            | 48777       | 1945
Age >60                                              | 66867       | 4558
Female                                               | 54286       | 3094
Male                                                 | 65286       | 3789
Length of stay in hrs, mean (std)                    | 65.5 (92.6) | 98.3 (162.5)
Number of drugs started in first 24 hrs, mean (std)  | 12.9 (10.0) | 13.6 (9.2)
Table 1: Study cohort (count by admission)

2.1 Neural network model for ICU mortality prediction

In order to make a binary prediction of mortality for each ICU admission, we built a 3-layer fully connected artificial neural network with 500, 100, and 1 neurons in the corresponding layers, using ReLU activation at the hidden layers and sigmoid activation at the output layer. Cross-entropy was used as the loss function for training. Patient mortality was used as the binary label, with 0 for alive and 1 for deceased. Each layer was L2-regularized with $\lambda = 0.01$. The first training configuration, representing an upper bound, trains a centralized model in the traditional way, as if the ICU data from all 58 hospitals could be moved to the same centralized database.
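A minimal NumPy sketch of this architecture's forward pass (the paper does not specify an implementation; only the layer sizes and activations above are taken from the text, and the initialization scheme here is an assumption):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(d_in, sizes=(500, 100, 1), seed=0):
    """Weights and biases for a 3-layer fully connected network."""
    rng = np.random.default_rng(seed)
    dims = (d_in, *sizes)
    return [(rng.normal(0.0, 0.01, (dims[i], dims[i + 1])), np.zeros(dims[i + 1]))
            for i in range(len(sizes))]

def forward(params, X):
    """Hidden layers use ReLU; the output layer uses sigmoid."""
    h = X
    for W, b in params[:-1]:
        h = relu(h @ W + b)
    W, b = params[-1]
    return sigmoid(h @ W + b)  # predicted mortality probability

params = init_params(d_in=1400)          # 1,400 binary drug features
p = forward(params, np.zeros((4, 1400)))  # 4 admissions with no drugs recorded
# with all-zero inputs and zero biases, every output is sigmoid(0) = 0.5
```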

Figure 2: Federated-Autonomous Deep Learning (FADL). Part of the neural network was trained globally in a federated manner using all data sources; the other parts were specialized to each data source.

2.2 Federated learning on distributed hospital data

Next, to mimic a real-world medical setting, we assume all the ICU data from each of the 58 hospitals stay at the local hospital. To train the deep learning model, we sent out models with identical parameters to all simulated hospital nodes. The models were trained locally within each hospital using only data from that hospital. Parameters of the models were then sent back to the analyzer for aggregation by averaging the parameters weighted by sample size [konevcny2016federated; McMahan2016Communication-EfficientData; konecny2016federated]. After model aggregation, the updated model was sent out to all hospitals again to repeat the global training cycle. Formally, the weight update is specified by:
$$ w_t = \sum_{k=1}^{K} \frac{n_k}{n} \, w_t^{k} $$

where $w_t$ is the combined parameter vector, $K$ is the number of data sources, $n_k$ is the number of admissions in the $k$-th data source, $n = \sum_{k=1}^{K} n_k$ is the total number of admissions, and $w_t^{k}$ are the parameters learned locally from the $k$-th data source. $t$ is the global cycle number in the range $[1, T]$. The objective function (cross-entropy) for this federated learning algorithm is:

$$ \min_{w} \; \sum_{k=1}^{K} \frac{n_k}{n} \cdot \frac{1}{n_k} \sum_{i \in P_k} \ell\big(y_i, f(x_i; w)\big) $$

where $x_i$ is the feature vector with dimension 1,400, $y_i$ is the binary mortality label, $P_k$ is the set of admissions in the $k$-th data source, and $f$ is the neural network model.
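The sample-size-weighted aggregation step can be sketched as follows (the parameter layout as a list of arrays is an illustrative choice, not from the paper):

```python
import numpy as np

def federated_average(local_params, n_k):
    """Average per-source parameter lists, weighted by sample counts n_k.

    local_params: one list of arrays per data source (same shapes across sources).
    n_k: number of admissions contributed by each source.
    """
    n = sum(n_k)
    return [sum((nk / n) * params[i] for params, nk in zip(local_params, n_k))
            for i in range(len(local_params[0]))]

# Two sources with one weight matrix each: 100 and 300 admissions.
w_a = [np.full((2, 2), 1.0)]
w_b = [np.full((2, 2), 5.0)]
w = federated_average([w_a, w_b], [100, 300])
# weights are 0.25 and 0.75, so every entry is 0.25*1 + 0.75*5 = 4.0
```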

2.3 Federated-Autonomous learning

From the objective function in Equation 2, it can be intuitively understood that the aim of the original federated learning algorithm is to minimize the average classification error across all the data sources (hospitals) using information from all sources simultaneously. However, when there are a large number of data sources holding different amounts of data with different properties, it may be difficult to balance what the model learns globally with locally relevant information from each data source. To tackle this challenge, we propose the FADL strategy, in which the first half of the neural network is trained globally using data from all sources and the second half is trained locally to specialize in each data source (Figure 2). The procedure is given in Algorithm 1. The same number of global cycles and local epochs were used for FADL and original federated learning. It is worth emphasizing that both global and local training were conducted in a distributed manner; therefore, no data aggregation is needed.

3 Results

As the ICU mortality data are highly imbalanced, we used both AUCROC and AUCPR as measures of accuracy [buckland1994relationship; davis2006relationship]. When we conducted model training in a centralized manner, the model was trained for 30 epochs with batch size 100. The model achieved an AUCROC of 0.79 and an AUCPR of 0.21 on the test data (Table 2).
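AUCROC can be computed directly from score ranks via the Mann-Whitney U identity; a self-contained sketch (not the authors' evaluation code):

```python
import numpy as np

def auc_roc(y_true, scores):
    """AUCROC via the rank-sum (Mann-Whitney U) identity."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    # Tied scores share the average of their ranks.
    for s in np.unique(scores):
        mask = scores == s
        ranks[mask] = ranks[mask].mean()
    pos = y_true == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    u = ranks[pos].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

auc = auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
# one positive is outranked by one negative, so AUC = 3/4 = 0.75
```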

When assuming data are stored in a distributed manner and cannot be moved for centralized model training, we first trained the neural network using original federated learning. The same model architecture was used for federated learning as for centralized learning. The model was trained for 20 global cycles involving all 58 data sources. In each global cycle, the model was trained on each data source for 5 epochs before aggregation. The model trained using original federated learning achieved an AUCROC of 0.75 and an AUCPR of 0.16. Next, we trained the model using the FADL strategy. FADL consists of two stages. In the first stage, all layers of the model were trained in a distributed manner across all sources, as in federated learning, for 10 global cycles with 5 local epochs per cycle. In the second stage, the parameters of the first neural network layer were fixed and only the 2nd and 3rd layers were trained for 50 epochs on each data source, generating 58 different models specialized for different hospitals. The first layer of the 58 models is identical, while the 2nd and 3rd layers differ among models. When predicting mortality, each specialized model was used for its corresponding hospital. Models trained using FADL achieved an AUCROC of 0.79 and an AUCPR of 0.23, which is similar to our centralized learning model and superior to the federated learning model (Table 2).

Training method                           | AUCROC | AUCPR
Centralized learning                      | 0.79   | 0.21
Original federated learning               | 0.75   | 0.16
Federated autonomous deep learning (FADL) | 0.79   | 0.23
Table 2: Performance of centralized, original federated, and Federated-Autonomous learning
1: procedure FADL(X, y, T1, T2)            ▷ features, labels, stage-1 and stage-2 lengths
2:     Initialize weights w_0 of the NN model
3:     for t in [1, T1] do                 ▷ First stage
4:         for k in [1, K] do in parallel  ▷ K is the number of data sources
5:             train locally and obtain w_t^k
6:         w_t ← Σ_{k=1}^{K} (n_k / n) w_t^k   ▷ aggregate
7:     Freeze the first layer of the neural network   ▷ Second stage
8:     for k in [1, K] do in parallel
9:         w^k ← w_{T1}                    ▷ the trained model from the first stage
10:        Train the 2nd and 3rd layers of w^k on data source k for T2 epochs
11:    return w^1, ..., w^K
Algorithm 1: Federated-Autonomous learning algorithm
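The two-stage structure can be exercised end to end in a toy simulation. This sketch replaces the neural network with a logistic-regression model, stands in for "freeze the first layer" by zeroing the first coordinate's gradient, and uses made-up data; only the stage-1/stage-2 control flow mirrors Algorithm 1:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_sgd(w, X, y, epochs, lr=0.1, freeze=None):
    """Gradient descent on logistic loss; `freeze` masks out fixed coordinates."""
    w = w.copy()
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        if freeze is not None:
            grad[freeze] = 0.0  # frozen block stays at its stage-1 value
        w -= lr * grad
    return w

def fadl(sources, d, t1=10, t2=50, frozen=slice(0, 1)):
    """Stage 1: federated averaging across all sources.
    Stage 2: per-source specialization with the shared block frozen."""
    n_k = [len(y) for _, y in sources]
    n = sum(n_k)
    w = np.zeros(d)
    for _ in range(t1):  # stage 1: global cycles with 5 local epochs each
        local = [local_sgd(w, X, y, epochs=5) for X, y in sources]
        w = sum((nk / n) * wk for wk, nk in zip(local, n_k))
    # stage 2: one specialized model per source
    return [local_sgd(w, X, y, epochs=t2, freeze=frozen) for X, y in sources]

rng = np.random.default_rng(0)
sources = [(rng.normal(size=(50, 3)), rng.integers(0, 2, 50).astype(float))
           for _ in range(3)]
models = fadl(sources, d=3)
# three specialized models that share the frozen first coordinate
```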

4 Conclusion

We proposed a distributed neural network training method that balances global model training, which utilizes all data sources, with local specialization, which trains part of the model specifically on one data source. We showed that our FADL strategy outperformed traditional federated learning and had accuracy similar to centralized learning. The balance between global and local learning is an important factor to consider when designing distributed machine learning methods, especially on health data [liu2017deepfacelift; Rudovic2018PersonalizedTherapy].


  • (1) Beam, Andrew L and Isaac S Kohane. “Translating artificial intelligence into clinical care,” JAMA, 316(22):2368–2369 (2016).
  • (2) Bhatt, Chintan, et al. “Internet of things and big data technologies for next generation healthcare,” (2017).
  • (3) Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006.
  • (4) Buckland, Michael and Fredric Gey. “The relationship between recall and precision,” Journal of the American society for information science, 45(1):12–19 (1994).
  • (5) Davis, Jesse and Mark Goadrich. “The relationship between Precision-Recall and ROC curves.” Proceedings of the 23rd international conference on Machine learning. 233–240. 2006.
  • (6) Goldstein, Benjamin A, et al. “Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review,” Journal of the American Medical Informatics Association (2017).
  • (7) Hashem, Ibrahim Abaker Targio, et al. “The rise of “big data” on cloud computing: Review and open research issues,” Information Systems, 47:98–115 (2015).
  • (8) Holzinger, Andreas. Machine Learning for Health Informatics: State-of-the-Art and Future Challenges, 9605. Springer, 2016.
  • (9) Islam, SM Riazul, et al. “The internet of things for health care: a comprehensive survey,” IEEE Access, 3:678–708 (2015).
  • (10) Kohane, Isaac S. “Ten things we have to do to achieve precision medicine,” Science, 349(6243):37–38 (2015).
  • (11) Konecnỳ, Jakub, et al. “Federated optimization: Distributed machine learning for on-device intelligence,” arXiv preprint arXiv:1610.02527 (2016).
  • (12) Konečnỳ, Jakub, et al. “Federated learning: Strategies for improving communication efficiency,” arXiv preprint arXiv:1610.05492 (2016).
  • (13) Liebowitz, Jay and Amanda Dawson. Actionable Intelligence in Healthcare. CRC Press, 2017.
  • (14) Liu, Dianbo, et al. “DeepFaceLIFT: Interpretable Personalized Models for Automatic Estimation of Self-Reported Pain,” Journal of Machine Learning Research, 66:1–16 (2017).
  • (15) Mandel, Joshua C, et al. “SMART on FHIR: a standards-based, interoperable apps platform for electronic health records,” Journal of the American Medical Informatics Association, 23(5):899–908 (2016).
  • (16) McMahan, H. Brendan, et al. “Communication-Efficient Learning of Deep Networks from Decentralized Data,” (2016).
  • (17) Mohammed, Noman, et al. “Centralized and distributed anonymization for high-dimensional healthcare data,” ACM Transactions on Knowledge Discovery from Data (TKDD), 4(4):18 (2010).
  • (18) Pollard, Tom J, et al. “The eICU Collaborative Research Database, a freely available multi-center database for critical care research,” Scientific data, 5 (2018).
  • (19) Rudovic, Ognjen, et al. “Personalized Machine Learning for Robot Perception of Affect and Engagement in Autism Therapy,” (2018).