Electronic Health Record (EHR) data, patient-generated health data from mobile devices, and other health-related information are valuable for improving health outcomes, especially for precision medicine kohane2015ten ; beam2016translating . However, there are many challenges in using these data efficiently. One of them is data access. Healthcare records are stored in different locations and data silos, including but not limited to hospitals, pharmacies, payors, and personal devices Goldstein2017OpportunitiesReview ; mohammed2010centralized ; bhatt2017internet ; islam2015internet . Traditionally, healthcare data distributed across sites are centralized in a single database for access and analysis liebowitz2017actionable ; holzinger2016machine ; hashem2015rise . However, healthcare data transfers are complex because of strict regulations and the sensitivity of the data mandel2016smart . These hurdles not only make data utilization expensive but also slow down information flow in healthcare, where timely updates are often important.
The process of using supervised machine learning for data analysis can be roughly divided into model training, where some datasets are used to optimize the model parameters, and prediction, where a trained model is used to make predictions on unseen data bishop2012pattern . The motivation for federated or distributed machine learning is to train algorithms on different data sources in a distributed manner and aggregate the learned models (Figure 1) konevcny2016federated ; McMahan2016Communication-EfficientData ; konecny2016federated . In this paradigm, algorithms that can learn from parts of the data are sent to each of the data sources for distributed training. The parameters of all the locally trained models are then sent back to the analyzer to build a new aggregated model. This cycle repeats for a set number of iterations. The machine learning model can be designed so that individual-level patient data cannot be retrieved from it. Through this federated information flow, data-providing nodes retain health data within their institutional walls. We used hospital ICU data as an example to demonstrate how federated machine learning can train models on data unevenly distributed across multiple sources, and we propose a new method called Federated-Autonomous Deep Learning (FADL) that balances global and local training.
2 Dataset, study cohort and methods
The eICU Collaborative Research Database was developed by the Philips eICU program pollard2018eicu and populated with data from 208 critical care units throughout the continental United States for patients admitted in 2014 and 2015. We included data from 58 hospitals comprising 126,489 ICU admissions with discharge status information (alive or expired) (Table 1). We developed a model that takes the medications administered in the first 24 hours as input to predict mortality during the ICU admission. The cohort received 1400 different medicines in total. Binary indicators of whether a patient took each of the medicines in the first 24 hours after admission were used as input features. Overall, 5.5% of patients died during the stay. We used 70% of the ICU admissions for training, 10% for validation, and 20% for testing.
|Length of stay in hours, mean (std)||65.5 (92.6)||98.3 (162.5)|
|Number of drugs started in the first 24 hours, mean (std)||12.9 (10.0)||13.6 (9.2)|
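The input encoding and data split described above can be sketched as follows. This is a minimal illustration rather than the preprocessing actually used: the drug names, the helper names `encode_admission` and `split_indices`, and the seeded random shuffle are assumptions, since the text does not state how admissions were assigned to the three sets.

```python
import random

def encode_admission(drugs_first_24h, vocabulary):
    """Return a 0/1 vector with one position per drug in the vocabulary."""
    taken = set(drugs_first_24h)
    return [1 if drug in taken else 0 for drug in vocabulary]

def split_indices(n, seed=0, train_frac=0.7, val_frac=0.1):
    """Shuffle admission indices and split them into train/validation/test."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])  # the remainder is the test set

# Hypothetical toy data; the real cohort has about 1400 distinct medicines.
vocabulary = ["aspirin", "heparin", "insulin", "propofol"]
admissions = [["heparin", "propofol"], ["aspirin"], []]
features = [encode_admission(a, vocabulary) for a in admissions]

train_idx, val_idx, test_idx = split_indices(100)
```

With the real data, each admission would become a vector of roughly 1400 binary entries, one per medicine observed in the cohort.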
2.1 Neural network model for ICU mortality prediction
To make a binary prediction of mortality for each ICU admission, we built a 3-layer fully connected artificial neural network with 500, 100, and 1 neurons in the respective layers, using ReLU activation in the hidden layers and sigmoid activation at the output layer. Cross-entropy was used as the loss function for training. Patient mortality was used as the binary label, with 0 for alive and 1 for deceased. Each layer was L2 regularized with a coefficient of 0.01. The first training configuration, representing an upper bound, trains a centralized model in the traditional way, as if the ICU data from all 58 hospitals could be moved to the same centralized database.
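A minimal NumPy sketch of this architecture and loss is given below, for illustration only. The initialization scheme, the choice to penalize weight matrices but not biases, the form of the penalty (coefficient times the sum of squared weights), and all function names are assumptions; the text does not specify the framework or these details.

```python
import numpy as np

def init_params(n_features, rng, sizes=(500, 100, 1)):
    """Create weights and biases for the 500-100-1 fully connected network."""
    params = []
    fan_in = n_features
    for fan_out in sizes:
        # He-style initialization (an assumption, not stated in the text).
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        params.append((W, np.zeros(fan_out)))
        fan_in = fan_out
    return params

def forward(params, X):
    """ReLU on the two hidden layers, sigmoid on the single output unit."""
    h = X
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)
    W, b = params[-1]
    logits = h @ W + b
    return 1.0 / (1.0 + np.exp(-logits[:, 0]))

def loss(params, X, y, l2=0.01, eps=1e-7):
    """Mean binary cross-entropy plus an L2 penalty on every weight matrix."""
    p = np.clip(forward(params, X), eps, 1.0 - eps)
    bce = -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    penalty = l2 * sum(np.sum(W ** 2) for W, _ in params)
    return bce + penalty

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(8, 1400)).astype(float)  # toy multi-hot inputs
y = rng.integers(0, 2, size=8).astype(float)          # toy labels, 1 = deceased
params = init_params(1400, rng)
probs = forward(params, X)                            # predicted mortality risk
```

Training would minimize `loss` with a gradient-based optimizer; the optimizer and learning rate are not given in the text and are left out here.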
2.2 Federated learning on distributed hospital data
Next, to mimic the real-world medical setting, we assume that the ICU data from each of the 58 hospitals stay at the local hospital. To train the deep learning model, we sent out models with identical parameters to all simulated hospital nodes. The models were trained locally within each hospital using only data from that hospital. The parameters of the models were then sent back to the analyzer for aggregation by averaging the parameters weighted by sample size konevcny2016federated ; McMahan2016Communication-EfficientData ; konecny2016federated . After model aggregation, the updated model was sent out to all hospitals again to repeat the global training cycle. Formally, the weight update is specified by:

$$w_{t+1} = \sum_{k=1}^{K} \frac{n_k}{n}\, w_{t+1}^{k} \tag{1}$$

where $w_{t+1}$ is the combined parameter, $K$ is the number of data sources, $n_k$ is the number of admissions in the $k$-th data source, $n$ is the total number of admissions, and $w_{t+1}^{k}$ are the parameters learned locally from the $k$-th data source. $t$ is the global cycle number in the range of $[1, T]$. The objective function (cross-entropy) for this federated learning algorithm is:

$$\min_{w} \; \sum_{k=1}^{K} \frac{n_k}{n}\, F_k(w), \qquad F_k(w) = -\frac{1}{n_k} \sum_{i=1}^{n_k} \Big[ y_i \log f(x_i; w) + (1 - y_i) \log\big(1 - f(x_i; w)\big) \Big] \tag{2}$$

where the inner sum runs over the admissions of the $k$-th data source, $x_i$ is the feature vector with dimension $d$, $y_i$ is the binary mortality label, and $f$ is the neural network model.
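The aggregation step, averaging locally trained parameters weighted by sample size, can be sketched as follows. The function name `federated_average` and the representation of a model as a list of NumPy arrays (one per layer) are assumptions made for illustration.

```python
import numpy as np

def federated_average(local_params, n_admissions):
    """Average per-layer parameters across sources, weighted by sample size.

    local_params: one entry per data source, each a list of arrays (one per layer).
    n_admissions: number of admissions held by each data source.
    """
    total = float(sum(n_admissions))
    n_layers = len(local_params[0])
    return [
        sum((n_k / total) * params[layer]
            for params, n_k in zip(local_params, n_admissions))
        for layer in range(n_layers)
    ]

# Three toy "hospitals", each holding a single-layer model.
hospital_a = [np.array([1.0, 1.0])]
hospital_b = [np.array([2.0, 2.0])]
hospital_c = [np.array([4.0, 4.0])]
merged = federated_average([hospital_a, hospital_b, hospital_c], [100, 300, 600])
# merged[0] is approximately [3.1, 3.1] = 0.1 * 1 + 0.3 * 2 + 0.6 * 4
```

Because the weights are proportional to sample size, hospitals with more admissions pull the aggregated model further toward their local solution.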
2.3 Federated-Autonomous learning
From the objective function in Equation 2, it can be seen that the aim of the original federated learning algorithm is to minimize the average classification error across all the data sources (hospitals), using information from all sources simultaneously. However, when there are many data sources with different amounts of data and different properties, it may be difficult to balance what the model learns globally with locally relevant information from each data source. To tackle this challenge, we propose a Federated-Autonomous Deep Learning (FADL) strategy in which the first half of the neural network is trained globally using data from all sources and the second half is trained locally to specialize in each data source (Figure 2). The procedure is given in Algorithm 1. The same number of global cycles and local epochs was used for FADL and original federated learning. It is worth emphasizing that both global and local training were conducted in a distributed manner; therefore, no data aggregation is needed.
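The two-stage procedure can be outlined in code as follows. This is an illustrative sketch under stated assumptions, not a reproduction of Algorithm 1: `toy_local_train` is a hypothetical stand-in for local gradient-based training (it simply moves each trainable layer halfway toward the mean of the local data), and the names `fadl` and `frozen_layers` are invented for the example.

```python
import numpy as np

def federated_average(local_params, sizes):
    """Sample-size-weighted average of per-layer parameters."""
    total = float(sum(sizes))
    return [sum((n / total) * p[i] for p, n in zip(local_params, sizes))
            for i in range(len(local_params[0]))]

def toy_local_train(params, data, frozen_layers=0):
    """Hypothetical stand-in for local epochs of SGD on one data source.

    Layers with index below `frozen_layers` are copied unchanged; the others
    move halfway toward the mean of the local data.
    """
    target = float(np.mean(data))
    return [p.copy() if i < frozen_layers else p + 0.5 * (target - p)
            for i, p in enumerate(params)]

def fadl(init_params, sources, local_train, global_cycles, frozen_layers=1):
    """Return one specialized model per data source."""
    sizes = [len(d) for d in sources]
    shared = init_params
    # Stage 1: federated training of all layers across all sources.
    for _ in range(global_cycles):
        local = [local_train(shared, d) for d in sources]
        shared = federated_average(local, sizes)
    # Stage 2: freeze the shared layers and fine-tune the rest locally.
    return [local_train(shared, d, frozen_layers=frozen_layers) for d in sources]

sources = [np.array([0.0, 0.0]), np.array([1.0, 1.0, 1.0, 1.0])]  # two toy sites
init = [np.zeros(2), np.zeros(2), np.zeros(1)]                    # three toy layers
models = fadl(init, sources, toy_local_train, global_cycles=10)
```

In this sketch only model parameters are exchanged with the aggregator in stage 1, and nothing leaves a site in stage 2, which mirrors the point that no data aggregation is needed.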
As the ICU mortality data are highly imbalanced, we used both AUCROC and AUCPR as measures of predictive performance buckland1994relationship ; davis2006relationship . When we trained the model in a centralized manner, the model was trained for 30 epochs with a batch size of 100. The model achieved an AUCROC of 0.79 and an AUCPR of 0.21 on the test data (Table 2).
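For reference, both metrics can be computed directly from labels and predicted risks, as in the sketch below. The function names are hypothetical, and AUCPR is estimated here by average precision, which is one common estimator and an assumption about how the reported values were obtained. The simple ranking in `average_precision` also assumes no tied scores.

```python
def auc_roc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recovered positive
    return total / n_pos

# Toy labels (1 = deceased) and predicted risks.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
roc = auc_roc(y_true, scores)            # 0.75
pr = average_precision(y_true, scores)   # 5/6, about 0.83
```

A classifier with no discriminative power has an expected AUCROC of 0.5 but an AUCPR close to the positive rate, about 0.055 for this cohort, which is why AUCPR values look low in absolute terms on imbalanced data.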
When assuming data are stored in a distributed manner and cannot be moved for centralized model training, we first trained the neural network using original federated learning. The same model architecture was used for federated learning as for centralized learning. The model was trained for 20 global cycles involving all 58 data sources. In each global cycle, the model was trained on each data source for 5 epochs before aggregation. The model trained using original federated learning achieved an AUCROC of 0.75 and an AUCPR of 0.16. Next, we trained the model using the FADL strategy. FADL consists of two stages. In the first stage, all layers of the model were trained in a distributed manner across all sources, as in federated learning, for 10 global cycles with 5 local epochs per cycle. In the second stage, the parameters of the first neural network layer were fixed and only the 2nd and 3rd layers were trained for 50 epochs on each data source, generating 58 different models specialized for different hospitals. The first layer of the 58 models is identical, and the 2nd and 3rd layers differ among models. When predicting mortality, each specialized model was used for its corresponding hospital. Models trained using FADL achieved an AUCROC of 0.79 and an AUCPR of 0.23, which is similar to our centralized learning model and superior to the federated learning model (Table 2).
|Method||AUCROC||AUCPR|
|Centralized learning||0.79||0.21|
|Original federated learning||0.75||0.16|
|Federated autonomous deep learning (FADL)||0.79||0.23|
We proposed a distributed neural network training method that balances global model training, which uses all data sources, with local specialization, which trains part of the model specifically on one data source. We showed that our FADL strategy outperformed traditional federated learning and had similar accuracy to centralized learning. The balance between global and local learning is an important factor to consider when designing distributed machine learning methods, especially on health data liu2017deepfacelift ; Rudovic2018PersonalizedTherapy .
References
- (1) Beam, Andrew L and Isaac S Kohane. “Translating artificial intelligence into clinical care,” JAMA, 316(22):2368–2369 (2016).
- (2) Bhatt, Chintan, et al. “Internet of things and big data technologies for next generation healthcare,” (2017).
- (3) Bishop, Christopher M. “Pattern recognition and machine learning, 2006,” 60(1):78–78 (2012).
- (4) Buckland, Michael and Fredric Gey. “The relationship between recall and precision,” Journal of the American society for information science, 45(1):12–19 (1994).
- (5) Davis, Jesse and Mark Goadrich. “The relationship between Precision-Recall and ROC curves.” Proceedings of the 23rd international conference on Machine learning. 233–240. 2006.
- (6) Goldstein, Benjamin A, et al. “Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review,” Journal of the American Medical Informatics Association (2017).
- (7) Hashem, Ibrahim Abaker Targio, et al. “The rise of “big data” on cloud computing: Review and open research issues,” Information Systems, 47:98–115 (2015).
- (8) Holzinger, Andreas. Machine Learning for Health Informatics: State-of-the-Art and Future Challenges, 9605. Springer, 2016.
- (9) Islam, SM Riazul, et al. “The internet of things for health care: a comprehensive survey,” IEEE Access, 3:678–708 (2015).
- (10) Kohane, Isaac S. “Ten things we have to do to achieve precision medicine,” Science, 349(6243):37–38 (2015).
- (11) Konečný, Jakub, et al. “Federated optimization: Distributed machine learning for on-device intelligence,” arXiv preprint arXiv:1610.02527 (2016).
- (12) Konečný, Jakub, et al. “Federated learning: Strategies for improving communication efficiency,” arXiv preprint arXiv:1610.05492 (2016).
- (13) Liebowitz, Jay and Amanda Dawson. Actionable Intelligence in Healthcare. CRC Press, 2017.
- (14) Liu, Dianbo, et al. “DeepFaceLIFT: Interpretable Personalized Models for Automatic Estimation of Self-Reported Pain,” Journal of Machine Learning Research, 66:1–16 (2017).
- (15) Mandel, Joshua C, et al. “SMART on FHIR: a standards-based, interoperable apps platform for electronic health records,” Journal of the American Medical Informatics Association, 23(5):899–908 (2016).
- (16) McMahan, H. Brendan, et al. “Communication-Efficient Learning of Deep Networks from Decentralized Data,” (2016).
- (17) Mohammed, Noman, et al. “Centralized and distributed anonymization for high-dimensional healthcare data,” ACM Transactions on Knowledge Discovery from Data (TKDD), 4(4):18 (2010).
- (18) Pollard, Tom J, et al. “The eICU Collaborative Research Database, a freely available multi-center database for critical care research,” Scientific data, 5 (2018).
- (19) Rudovic, Ognjen, et al. “Personalized Machine Learning for Robot Perception of Affect and Engagement in Autism Therapy,” (2018).