ISTHMUS: Secure, Scalable, Real-time and Robust Machine Learning Platform for Healthcare

09/29/2019 ∙ by Akshay Arora, et al.

In recent times, machine learning (ML) and artificial intelligence (AI) based systems have evolved and scaled across different industries such as finance, retail, insurance, energy utilities, etc. Among other things, they have been used to predict patterns of customer behavior, to generate pricing models, and to predict the return on investments. But the successes in deploying machine learning models at scale in those industries have not translated into the healthcare setting. There are multiple reasons why integrating ML models into healthcare has not been widely successful, but from a technical perspective, general-purpose commercial machine learning platforms are not a good fit for healthcare due to complexities in handling data quality issues, mandates to demonstrate clinical relevance, and a lack of ability to monitor performance in a highly regulated environment with stringent security and privacy needs. In this paper, we describe Isthmus, a turnkey, cloud-based platform which addresses the challenges above and reduces time to market for operationalizing ML/AI in healthcare. Towards the end, we describe three case studies which shed light on Isthmus capabilities. These include (1) supporting an end-to-end lifecycle of a model which predicts trauma survivability at hospital trauma centers, (2) bringing in and harmonizing data from disparate sources to create a community data platform for inferring population as well as patient level insights for Social Determinants of Health (SDoH), and (3) ingesting live-streaming data from various IoT sensors to build models, which can leverage real-time and longitudinal information to make advanced time-sensitive predictions.




1 Introduction

Scalable ML/AI driven solutions that are tightly integrated in the clinical decision support system [1] have the potential to improve outcomes, reduce cost, and increase efficiency in healthcare. While healthcare has made steady progress through the years in using data to make evidence-based clinical decisions through randomized clinical trials [2], it has lagged in adopting ML/AI driven solutions at scale. Developments over the last decade have made the use of Electronic Health Records (EHRs) widespread in healthcare organizations, which has resulted in an abundance of structured and unstructured clinical and non-clinical data [3], but the industry has struggled to generate value from this data. The technical reasons for this slow uptake include uneven data quality, privacy and confidentiality issues in building workflows, and a lack of interoperability between different systems [4]. To address these concerns and deploy ML predictive models in clinical workflows, we developed a turnkey ML platform called Isthmus to seamlessly develop, test, deploy, evaluate, and retrain predictive models.

Developed under clinicians’ guidance and oversight from Parkland Health & Hospital System’s stakeholders, Isthmus balances the domain-rich customization needed for building clinical, predictive models in healthcare with the need of having a turnkey, secure, HIPAA compliant solution. Consequently, the platform reduces time to market to integrate and update AI/ML models to improve the quality and safety of a patient’s hospital experience and drive costs down by impacting both clinical and operational outcomes.

2 Methodologies

2.1 Isthmus Framework

Our vision was to build an end-to-end machine learning framework which would make predictive model development, deployment, evaluation, and retraining seamless by reducing the time to market for integrating clinical predictive insights in clinical workflows to make them actionable. To achieve the goals above we built a cloud hosted, machine learning platform called “Isthmus.”

Isthmus’ end-to-end workflow reduces implementation time when multiple models are developed and deployed on it, by reusing the environment toolsets, technologies, and data/feature engineering pipelines to either train a model or score using an already deployed pre-trained model.

Figure 1: Isthmus Framework: End-to-end cloud hosted machine learning platform for healthcare

Part of the motivation for building such a flexible yet scalable and configurable framework was the curated set of data transformation techniques that data scientists must perform, such as imputation, categorical encoding of continuous variables, and aggregation of healthcare datasets, before building them into features to train a predictive model [5]. Our framework reuses these techniques as packages, which can be invoked in the deployment flow so that features are created consistently for model training and model scoring. This standardizes the training and deployment/scoring workflow, which helps in quickly learning, through prospective testing, which key components can trigger data or feature drift as the model runs in a real environment.
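The idea of one transformation package shared by training and scoring can be illustrated with a minimal Python sketch. The column names, the median imputation rule, and the age bands are assumptions made for illustration only; the paper does not specify which transformations Isthmus packages.

```python
from statistics import median


def fit_feature_spec(rows, numeric_cols):
    """Learn imputation defaults (here: medians) from the training rows."""
    spec = {}
    for col in numeric_cols:
        observed = [r[col] for r in rows if r.get(col) is not None]
        spec[col] = median(observed)
    return spec


def age_band(age):
    """Categorical encoding of a continuous variable (hypothetical bands)."""
    if age < 18:
        return "pediatric"
    if age < 65:
        return "adult"
    return "senior"


def build_features(row, spec):
    """Single entry point used for BOTH training and live scoring."""
    features = {}
    for col, default in spec.items():
        value = row.get(col)
        features[col] = default if value is None else value
    features["age_band"] = age_band(features["age"])
    return features


training_rows = [
    {"age": 30, "heart_rate": 80},
    {"age": 70, "heart_rate": None},
    {"age": 50, "heart_rate": 100},
]
spec = fit_feature_spec(training_rows, ["age", "heart_rate"])

# Training and scoring call the same function with the same spec.
train_features = [build_features(r, spec) for r in training_rows]
live_features = build_features({"age": 12, "heart_rate": None}, spec)
print(live_features)
```

Because the imputation defaults are learned once and passed to the same function in both paths, a live record with a missing value is filled exactly as it was during training.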

Doing this in the same controlled environment, which can ingest either historical or real-time data through the same APIs or secure connection, is key to productionizing models within healthcare providers’ and payers’ systems to drive value. To achieve this, we hosted the entire framework in a secure, HIPAA compliant cloud infrastructure so that it can be deployed as a turnkey solution, as outlined below.

2.2 Cloud Hosted Secure Healthcare Specific Turnkey Platform

Isthmus is hosted on a cloud-based infrastructure (Microsoft Azure Cloud Platform™) [6]. This infrastructure is set up with the functionality necessary for a robust, enterprise-grade software-as-a-service (SaaS) offering, including network security, data replication, disaster recovery, and fault tolerance. Cloud resources, both compute and storage, leverage economies of scale to keep technology costs at a realistic level without the need to maintain a large number of on-prem resources. Being cost-effective as well as scalable and configurable, Isthmus can be adopted by healthcare organizations and systems of all sizes.

2.2.1 Data Engineering Pipeline

Healthcare data by its very nature is highly complex, high dimensional, and of inconsistent quality. For this data to be useful, it needs a systematic data ingestion approach to collect, store, and integrate data-driven insights into clinical and operational processes. To ingest this multi-dimensional data quickly and at scale, we developed a configurable and flexible data ingestion pipeline, which enables the ingestion of all the relevant health data, including clinical data, claims data, Social Determinants of Health, and streaming IoT data. The data ingestion pipeline is also ready to ingest genomics data and high-quality diagnostic imaging data as they become available.

The data ingestion pipeline is based on a simplified architecture (Figure 2), which enables user defined transformations for real-time scoring, cleaning, and de-duplication without requiring additional middleware. The raw data is pulled via RESTful API calls to the EHR’s API servers or fetched at regular intervals using a secure file transfer process. Generally, these API servers are the hub for all API requests: they connect the EHR’s organizational users to the operational database management system and, upon a service request, stream near real-time data as a JSON response through the web service APIs.

Figure 2: Data Orchestration Engine

The clinical data mainly comprises patient demographics, encounter related information, laboratory test result components, medication orders, and laboratory vitals. These are pulled through APIs and parsed by a configurable data munging script, in which the JSON objects are transformed using user defined templates designed specifically for each API response and then aggregated into patient level JSON responses. The pipeline is fully automated and ingests the patient level data in batch mode, with the batch size and trigger schedule set by SLA requirements. On each trigger it pulls the data from the APIs and performs the desired transformation and filtering operations.
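As a rough illustration of the template-driven munging step, the following Python sketch maps each API response through a per-API template and merges the results by patient. The template layout, API names, and JSON field paths are all hypothetical; the actual Isthmus templates are not described in the paper.

```python
from collections import defaultdict

# Hypothetical templates: API name -> {output field: key path into the raw JSON}.
TEMPLATES = {
    "demographics": {"age": ("patient", "age"), "sex": ("patient", "sex")},
    "vitals": {"heart_rate": ("observation", "hr"), "sbp": ("observation", "sbp")},
}


def extract(record, path):
    """Follow a tuple of keys into nested dicts; return None if a key is missing."""
    for key in path:
        if not isinstance(record, dict) or key not in record:
            return None
        record = record[key]
    return record


def apply_template(api_name, response):
    """Transform one raw API response using the template for that API."""
    template = TEMPLATES[api_name]
    return {field: extract(response, path) for field, path in template.items()}


def aggregate_by_patient(responses):
    """Merge transformed responses into one record per patient.

    `responses` is an iterable of (api_name, patient_id, raw_json) tuples.
    """
    patients = defaultdict(dict)
    for api_name, patient_id, raw in responses:
        patients[patient_id].update(apply_template(api_name, raw))
    return dict(patients)


batch = [
    ("demographics", "p1", {"patient": {"age": 42, "sex": "F"}}),
    ("vitals", "p1", {"observation": {"hr": 88}}),
    ("vitals", "p2", {"observation": {"hr": 110, "sbp": 95}}),
]
patient_level = aggregate_by_patient(batch)
print(patient_level)
```

Missing fields surface as `None` rather than raising, which leaves the decision about how to fill them to the downstream imputation logic.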

This patient level raw JSON data is then preprocessed using imputation and filter logic, which transforms it into clinically relevant features. These features are fed to the machine learning models by a scoring logic script, which predicts the risk of the acute care condition based on the pre-trained model. The scoring script generates a score response that contains the transformed features and the identified risk level for the patient. These responses are aggregated in batch mode and, after cleaning, converted into SQL tables using a database operation script and loaded into the PostgreSQL database, where the data is stored securely and reliably. The raw JSON responses are pushed to the Azure Data Lake™ to preserve the raw patient level information for audit purposes.
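The impute, score, and persist steps described above might look roughly like the sketch below. Everything specific in it is an assumption: the logistic coefficients, the imputation defaults, the 0.5 risk threshold, and the table layout are invented for illustration, and the standard-library `sqlite3` module stands in for PostgreSQL so the example runs without a database server.

```python
import json
import math
import sqlite3

# Hypothetical pre-trained logistic model and imputation defaults.
MODEL = {"intercept": -4.0, "coef": {"heart_rate": 0.03, "lactate": 0.5}}
IMPUTE_DEFAULTS = {"heart_rate": 80.0, "lactate": 1.0}


def score_patient(patient_id, raw):
    """Impute missing inputs, score, and build the score response."""
    features = {}
    for name, default in IMPUTE_DEFAULTS.items():
        value = raw.get(name)
        features[name] = default if value is None else value
    z = MODEL["intercept"] + sum(MODEL["coef"][k] * v for k, v in features.items())
    risk = 1.0 / (1.0 + math.exp(-z))
    return {
        "patient_id": patient_id,
        "features": features,
        "risk": risk,
        "risk_level": "high" if risk >= 0.5 else "low",
    }


def persist(conn, responses):
    """Write a batch of score responses to a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scores "
        "(patient_id TEXT, risk REAL, risk_level TEXT, features TEXT)"
    )
    conn.executemany(
        "INSERT INTO scores VALUES (?, ?, ?, ?)",
        [
            (r["patient_id"], r["risk"], r["risk_level"], json.dumps(r["features"]))
            for r in responses
        ],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL database
batch = {
    "p1": {"heart_rate": 90, "lactate": None},
    "p2": {"heart_rate": 130, "lactate": 6.0},
}
responses = [score_patient(pid, raw) for pid, raw in batch.items()]
persist(conn, responses)
rows = conn.execute(
    "SELECT patient_id, risk_level FROM scores ORDER BY patient_id"
).fetchall()
print(rows)
```

Storing the transformed features next to the score keeps each prediction reproducible from the database alone, while the raw JSON would be archived separately for audit.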

2.2.2 Configurability, Extensibility and Experimentation

Predictive models in the healthcare industry are discrete in nature, especially for acute care systems [11], where we observe a high complexity of imputation logic and disparate features. The rest of the workflow has unified commonalities for all the models with identical evaluation patterns. If the configuration of these models is distributed and model-specific, it will add an overhead of maintaining and versioning multiple configurations for a given model as well as different environments. This will also increase the cost of additional infrastructure to manage different models.

The Isthmus platform provides a way of deploying and executing a model scoring workflow from a single code base, which supports multiple models and versions through a single configuration file, as shown in Figure 3. It is also designed to use a single infrastructure cluster to execute any number of scoring workflow pipelines in parallel and to automate the scoring process using Continuous Integration and Continuous Delivery (CI/CD) processes. Because the workflow is configuration driven, upgrading an existing model or serving a new model in the pipeline has a very short delivery cycle.

Figure 3: Configuration based workflow
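A configuration-driven workflow of this kind can be sketched in Python as follows. The configuration schema, model names, versions, and thresholds are hypothetical, and a thread pool stands in for the shared cluster; the paper does not publish the actual configuration format.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical single configuration file covering several models and versions.
CONFIG_JSON = """
{
  "models": {
    "trauma": {
      "active_version": "v2",
      "versions": {"v1": {"threshold": 0.5}, "v2": {"threshold": 0.4}}
    },
    "preterm_birth": {
      "active_version": "v1",
      "versions": {"v1": {"threshold": 0.3}}
    }
  }
}
"""


def resolve(config, model_name, version=None):
    """Look up the settings for a model, defaulting to its active version."""
    entry = config["models"][model_name]
    version = version or entry["active_version"]
    return {"model": model_name, "version": version, **entry["versions"][version]}


def run_pipeline(spec):
    """Placeholder for the shared ingest -> transform -> score code path."""
    return f"{spec['model']}:{spec['version']} scored with threshold {spec['threshold']}"


config = json.loads(CONFIG_JSON)
specs = [resolve(config, name) for name in config["models"]]

# One code base, many pipelines: run every configured model in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_pipeline, specs))
print(results)
```

In this arrangement, promoting a new model version amounts to changing `active_version` in the configuration rather than changing code.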

2.2.3 Security and governance

All data collected in the context of healthcare is subject to specific privacy and regulatory requirements. Depending upon the type of data, and the context of the data collection and use, it can be Personally Identifiable Information (PII) or Protected Health Information (PHI). This information is often the target of security breaches. This is another reason why every access, analysis, and appending (if any) of this data needs to be audited, monitored, and documented. In the US, compliance with the HIPAA privacy law was mandated among all entities which either create or access PHI. This, in addition to the mandatory business continuity safeguards to avert a possibility of the health system facing outages of care services, puts additional constraint on the traditional feature and data engineering pipelines in a machine learning system. Challenges with data security through traditional on-prem systems and support of life-critical applications are solved by the Isthmus platform by adopting a cloud-first strategy. Additionally, the following AzureTM enabled features utilized on Isthmus address the security concerns above.

Access Control
The Isthmus platform leverages unique cloud-based security policies, such as the Azure active directory-based service principal for access control as an identity to manage applications and hosted services on the cloud and handle sensitive information (PHI). This eliminates the need for user level login to the cloud applications. Role Based Access Control (RBAC) uses Active Directory policies for managing the authentication. Isthmus provides a single role-based access to multi-institutional EHR data.

Additionally, the platform provides a comprehensive, immutable log management service with easy access across deployed applications using Elasticsearch and a Kibana™ dashboard. This ensures a single point of reference for inspecting any application or system level logs in a responsible manner. Using app-insight notifications, the Isthmus platform provides real-time alerts for any configured event, including an exception in an application or missing data from the source API.

Server maintenance
In a traditional healthcare setting, on-premise infrastructure solutions are constrained by the cost and complexity of maintaining the hardware, firewall, and software licensing, and by the additional overhead of patch management software and its upgrades. Such infrastructure offers limited ability to automate patch management or to roll out access policies in a distributed project environment.

The Isthmus platform is engineered to automate these tasks through its cloud-based setup, with limited build and setup time. Moreover, prebuilt images developed on this platform can be customized based on application requirements. This makes our infrastructure scalable enough to meet the minimum performance requirements and enables the cloud applications to be easily migrated or replicated to another environment.

2.2.4 Data replication and fault tolerance

Data replication and fault tolerance with an on-premise infrastructure need enormous upfront investment. As processing and data volumes scale with utilization, it becomes challenging to manage business continuity needs. A distributed application such as a prediction model faces serious challenges from single points of failure, which make it substantially harder to keep the workflow running without interruption when one or more of its components fail. This calls for a fault-tolerant system design that prevents disruptions due to any single point of failure.

Figure 4: Disaster Recovery and Fault tolerance

The Isthmus application is engineered to overcome these shortcomings and can scale up and accelerate prediction model workloads to meet the needs of high-performance computing, low-latency, high-bandwidth network communication, and memory-intensive requirements. We adopted this cloud-based solution to resolve lift-and-shift problems such as infrastructure upgrades, scalability, and transfer and deployment at multiple locations, using automated processes and containerization. This has considerably reduced the cost of infrastructure and made it possible to migrate or deploy to cloud environments with minimal application level changes to the code, database, and data model architecture.

Isthmus applications have been developed with well-defined replication graphs and disaster recovery strategies for their database and support systems by imposing identical servers running in parallel replication with a mirrored backup of database and system level logs to ensure high levels of data availability. These applications are designed using the micro-services based architecture to reduce the redundancies from all the key components by performing similar activities in each workflow.
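To make the role of a mirrored replica concrete, here is a minimal Python sketch of a read that fails over from a primary to a mirror. The replica names and the use of `ConnectionError` as the failure signal are assumptions for illustration; the paper does not describe how Isthmus routes requests between replicas.

```python
def read_with_failover(replicas, query):
    """Try each replica in order and return the first successful result.

    `replicas` is a list of (name, fetch) pairs, where fetch(query) either
    returns rows or raises ConnectionError.
    """
    errors = []
    for name, fetch in replicas:
        try:
            return name, fetch(query)
        except ConnectionError as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all replicas failed: " + "; ".join(errors))


def failing_primary(query):
    """Simulated outage of the primary database server."""
    raise ConnectionError("primary unreachable")


def healthy_mirror(query):
    """Simulated mirror that holds the same data as the primary."""
    return [("p1", 0.31)]


served_by, rows = read_with_failover(
    [("primary", failing_primary), ("mirror", healthy_mirror)],
    "SELECT patient_id, risk FROM scores",
)
print(served_by, rows)
```

With identical servers kept in sync, the caller sees the same rows regardless of which replica answered, so a single server outage does not interrupt the workflow.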

2.2.5 Log management and alerting system

Logging is an essential component of any automated system, as it serves as a flashlight on the black box module deployed on a distributed cluster. On the Isthmus platform, logging information is generated in real time and helps validate the stability of the system through warning and debug logs. This log data is fed to a high scale analytical engine (Elasticsearch), which is a full-text search engine. The data is then integrated with a visualization dashboard such as Kibana™ and provides feeds to self-hosted web front-end applications using RESTful APIs. This visualization provides monitors and performance metrics based on application level logs of the automated pipeline for predictive and analytical applications. It also helps ensure quality delivery of the models served on this platform and enables quick debugging of any production outage.

For any automated production environment, a notification system is critical, given that no workflow or infrastructure is perfect. In addition to the log management system, a Slack based notification service is integrated with the Isthmus platform to deliver real-time alerts about the production pipeline. This service has been deployed in production and helps the engineering and data science teams stay fully aware of the live status of the pipeline and the patient risk scores. It has proved useful as the first line of action when an anomaly is detected. The notification system captures both infrastructure and application failures and exceptions, so that failed events can be acted on and remediated immediately.
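One common way to connect application logs to an alerting channel is a custom `logging.Handler`. The sketch below is an assumption about how such a hook could be wired, not the Isthmus implementation: the notifier is an injected callable (here a list's `append`), where a real deployment might post to a Slack webhook instead.

```python
import logging


class AlertHandler(logging.Handler):
    """Forward ERROR-and-above log records to a notifier callable."""

    def __init__(self, notify):
        super().__init__(level=logging.ERROR)
        self.notify = notify

    def emit(self, record):
        try:
            self.notify(self.format(record))
        except Exception:
            # Never let a failing notifier crash the pipeline itself.
            self.handleError(record)


sent = []  # stand-in for the messages delivered to the alert channel
logger = logging.getLogger("isthmus.pipeline")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False

handler = AlertHandler(sent.append)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
logger.addHandler(handler)

logger.debug("batch started")          # below ERROR: not alerted
logger.warning("slow API response")    # below ERROR: not alerted
try:
    raise ConnectionError("source API returned no data")
except ConnectionError:
    logger.exception("ingestion failed")  # ERROR with traceback: alerted

print(sent[0])
```

Debug and warning records would still flow to the regular log store through other handlers; only failures reach the alert channel, which keeps notifications actionable.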

2.3 Model Evaluation

The platform is designed to be a generic multipurpose data science engine. The flexible architecture of this platform has allowed the plug in and use of functional decision-making modules that can run asynchronously without disrupting the integrity of the system. The prediction service on the platform can be leveraged by the model evaluation service where real-time predictions can be interpreted by the models on the fly thereby making it extremely useful for the data scientists and clinicians (or stakeholders) to get actionable insights.

2.3.1 Model Interpretability

The data lake storage provides access to the historical data set while the model is running live in production. This data set is pushed to a model explainer script, which helps interpret the model predictions by extracting the top contributing features behind each prediction. Clinicians have found this extremely useful for making real-time decisions. It has also allowed them to provide the feedback needed to improve existing models or generate new ones.
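The paper does not say which explanation method the explainer script uses. As one simple possibility, for a linear model the contribution of each feature can be taken as its coefficient times the feature's deviation from a historical baseline, and the largest contributions reported. The coefficients, baseline values, and patient record below are invented for illustration.

```python
def top_contributors(coefficients, baseline, patient, k=3):
    """Rank features by |coefficient * (value - baseline)| for a linear model."""
    contributions = {
        name: coef * (patient[name] - baseline[name])
        for name, coef in coefficients.items()
    }
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


# Hypothetical model coefficients and historical baseline (e.g. cohort means).
coefficients = {"lactate": 0.5, "heart_rate": 0.03, "age": 0.01, "gcs": -0.3}
baseline = {"lactate": 1.0, "heart_rate": 80, "age": 50, "gcs": 15}
patient = {"lactate": 5.0, "heart_rate": 120, "age": 60, "gcs": 9}

top = top_contributors(coefficients, baseline, patient, k=3)
for name, value in top:
    print(f"{name}: {value:+.2f}")
```

Here the baseline plays the role of the historical data set: a feature matters for a given patient only to the extent that the patient differs from the typical historical case.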

2.3.2 Silent Mode Testing

Before the model goes live in production, the model is validated in a silent-mode testing environment where the model performance is evaluated by the team of data scientists and clinicians to ensure that the model gives meaningful results and is ready for integration into actual patient care workflows. The Isthmus platform provisions this silent mode testing framework to follow all the best practices for the model deployment.
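A silent-mode run can be pictured as the normal scoring path with the write-back to the patient chart switched off, followed by a comparison of the logged scores with observed outcomes. The sketch below assumes a hypothetical EMR client, a 0.5 decision threshold, and made-up scores and outcomes.

```python
class RecordingEMR:
    """Hypothetical EMR client that records every score written to a chart."""

    def __init__(self):
        self.written = []

    def write_score(self, patient_id, score):
        self.written.append((patient_id, score))


def deliver(scores, emr, silent, audit_log):
    """Always log the score; only write it to the chart when not silent."""
    for patient_id, score in scores.items():
        audit_log.append({"patient_id": patient_id, "score": score, "silent": silent})
        if not silent:
            emr.write_score(patient_id, score)


def silent_mode_report(audit_log, outcomes, threshold):
    """Compare logged predictions with observed outcomes."""
    tp = fp = fn = tn = 0
    for entry in audit_log:
        predicted = entry["score"] >= threshold
        actual = outcomes[entry["patient_id"]]
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "specificity": tn / (tn + fp) if tn + fp else None,
    }


emr = RecordingEMR()
audit_log = []
scores = {"p1": 0.9, "p2": 0.2, "p3": 0.7, "p4": 0.1}
deliver(scores, emr, silent=True, audit_log=audit_log)

outcomes = {"p1": True, "p2": False, "p3": False, "p4": False}
report = silent_mode_report(audit_log, outcomes, threshold=0.5)
print(emr.written, report)
```

Because nothing reaches the chart in silent mode, data scientists and clinicians can review the report before the same code path is switched to live delivery.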

2.4 Model Retraining and Development

In a real-time setting, it is important that the model remains stable and easily configurable. To address this, the platform supports retraining using the same data that was ingested for scoring through the APIs. The platform leverages this data to generate multiple versions of the model, which can be deployed by simply editing the model signature. It also lets data scientists perform statistical tests to keep these models up to date with incoming data streams.
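The paper does not name the statistical tests used to decide when a model needs retraining. One widely used check, shown here purely as an example, is the population stability index (PSI), which compares the binned distribution of a feature in training data against live data. The bin edges, sample values, and the 0.25 alert level (a common rule of thumb, not a value from the paper) are assumptions.

```python
import math


def psi(expected, actual, edges, eps=1e-6):
    """Population stability index between two samples over fixed bin edges."""

    def proportions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # Bin index = number of edges the value has reached or passed.
            counts[sum(v >= e for e in edges)] += 1
        total = len(values)
        # Floor at eps so empty bins do not produce log(0).
        return [max(c / total, eps) for c in counts]

    p = proportions(expected)
    q = proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


training_hr = [60, 70, 80, 90, 100, 110]
live_hr = [100, 110, 120, 130, 90, 105]
edges = [75, 95]  # hypothetical bins: <75, 75-95, >=95

same = psi(training_hr, training_hr, edges)
drift = psi(training_hr, live_hr, edges)
needs_retraining = drift > 0.25  # rule-of-thumb alert level (assumption)
print(round(same, 3), round(drift, 3), needs_retraining)
```

A check like this can run on each ingested batch, so that a shift in the incoming data prompts generation of a new model version from the same stored data.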

These features make this platform unique as the same infrastructure with few modifications can be used for model deployment and model development. In addition to the functionalities mentioned above, there is a re-usability of the code and interoperability of the system which makes this platform highly scalable. Thus, this Isthmus platform is an end-to-end system which encompasses all the software development processes into a robust function system.

2.5 Multi-disciplinary team

Healthcare is not the only highly regulated industry, but it is an industry with a strong ethical, legal, and moral underpinning to its regulations. Every clinical decision made by a physician, using either her judgement or the insights given to her by an ML system, can have significant legal, financial, and/or life impacting consequences. Against this backdrop, there has been seminal work [8] demonstrating how individual-level data can be affected by historical prejudices and hence engender systematic biases in the decisions of an ML system trained on it. Although there are many statistical methods to adjust for this unfairness, we believe that engaging clinical, operations, and financial experts, and other stakeholders, from the beginning is critical to ward off such biases and tune the algorithmic features to thresholds acceptable for the environment for which the model is being trained.

Figure 5: ML workflow with Clinical Decision Support System

Involving clinicians not only helps in understanding the intimate relationships between different features, but also gives a realistic understanding of issues related to physician cognitive overload [9], which helps improve the user experience and surface actionable insights. Rather than being a fire-and-forget exercise, healthcare model development for clinical interventions is iterative in nature and needs both retrospective and prospective testing [10,11]. To that effect, we adapted traditional statistical model development into a multi-faceted process, which combines pure statistical validation with clinical validation and brings in elements of clinical decision support systems.

3 Discussions and Results

3.1 Trauma case study (end-to-end framework)

Trauma surgeons work in a high-stress world of rapid decision-making. One of the most critical decisions they must make is whether a patient is stable enough to operate on, or whether the Trauma team needs to focus on providing life-saving care. Trauma surgeons must make this decision quickly and accurately, as operating could save or take the life of their patient. Traditionally, prediction of survivability in trauma patients has been based on risk assessment algorithms such as TRISS, RTS, and ISS [16]. However, all these algorithms depend on the Abbreviated Injury Scale (AIS), an anatomical scoring system that only considers a patient’s injuries on arrival. There is no standard system in place to track the trauma patient’s status continuously over time, which is necessary for clinical decision-making as the patient’s condition changes while in hospital.

In collaboration with the Parkland Trauma Center team, PCCI created a Trauma predictive model with the aim of monitoring and guiding clinical decision making during the first 12-72 hours of an inpatient stay. This model predicts mortality within the next 48 hours with 93% accuracy on test data. However, to impact clinical decision-making, PCCI’s Trauma predictive model needed a platform to bring the model to fruition, with predictions made every hour. Without integration into clinical workflows, PCCI’s Trauma model predictions would just add noise to provider decision-making. It was critical that the Trauma Score be easily accessible for each patient for whom the trauma surgeons needed to make a timely decision on how to intervene appropriately.

3.1.1 Isthmus as the Solution

Figure 6: Trauma Model Deployment Cycle

Every hour, up-to-the-second data is pulled directly from the EMR via an API and ingested securely into the Isthmus platform. In less than a minute, Isthmus cleans and transforms the API data, generates the Trauma Score, pushes it back to the EMR, and securely persists the data and results for later analysis. Providers can then view the score directly in the patient chart. Isthmus enables continuous, hourly predictions in the most critical window, 12-72 hours post-admission. Providers know they are always seeing a score based on up-to-date labs and vitals, so they can make decisions confidently. Using APIs and custom patient chart builds, the Trauma Score is delivered directly to the front page of the patient chart (an item called Storyboard that provides the most critical information for patient care in a column that persists as providers move through the patient chart). At a glance, providers can view and monitor the score, without a single click in the patient chart.
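The hourly run only needs to score patients who are inside the 12-72 hour post-admission window. A minimal Python sketch of that selection step is shown below; the inclusive window boundaries, the patient identifiers, and the timestamps are assumptions for illustration.

```python
from datetime import datetime, timedelta

WINDOW_START = timedelta(hours=12)
WINDOW_END = timedelta(hours=72)


def in_scoring_window(admitted_at, now):
    """True if the patient is 12-72 hours post-admission (inclusive, by assumption)."""
    elapsed = now - admitted_at
    return WINDOW_START <= elapsed <= WINDOW_END


def patients_to_score(admissions, now):
    """Return the IDs of patients due for a score in this hourly run."""
    return sorted(
        pid for pid, admitted_at in admissions.items()
        if in_scoring_window(admitted_at, now)
    )


now = datetime(2019, 9, 1, 12, 0)
admissions = {
    "p1": now - timedelta(hours=3),   # too early
    "p2": now - timedelta(hours=24),  # in window
    "p3": now - timedelta(hours=80),  # window has passed
    "p4": now - timedelta(hours=12),  # exactly at the start of the window
}
due = patients_to_score(admissions, now)
print(due)
```

A scheduler would call `patients_to_score` once per hour and hand the resulting list to the ingest, transform, and score steps.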

Isthmus has several abilities that have made it a perfect launchpad for the go-live of the Trauma model. These include its (1) ability to run near real-time predictive models end-to-end via data retrieval from APIs, (2) services to perform data cleaning, data transformations, feature engineering, and scoring as per the needs of the model, (3) functionality to integrate with EHRs to add the scores to the patient chart, (4) mechanism to securely persist the data of the scored patients for future analysis, and (5) capability to monitor and evaluate the model’s performance.

3.1.2 Trauma Score Deployed

The Trauma Score has been running live at Parkland Hospital for 1 month, as of the time of publication. It is currently in a clinical validation stage, with providers using the score to validate their own assumptions. So far, the Trauma Score has been generated for 134 patients, correctly predicting 4 out of 4 mortalities. Anecdotally, providers say the score matches their own predictions, formed after a thorough read of the patient chart. Providers have indicated that promising results so far mean that soon they will begin to trust the score and use it for decision-making. This will save them critical minutes, allowing them to focus on operating or life-saving care over reviewing patient charts. These critical minutes could be the difference between life and death for high-risk patients.

Footnote 1: CDI was supported in part through a grant from Community Council of Greater Dallas (CCGD) and was developed collaboratively with CCGD, DFW Hospital Council Foundation and University of Texas at Dallas.

3.2 Community Data Initiative (CDI) on Isthmus

The healthcare industry is converging around a common realization that understanding the full context of an individual and the community where s/he lives is critical to improving outcomes and lowering costs. Factors such as transportation access, safe neighborhoods, food availability, school performance, and economic opportunity have always been part of the conversation around healthcare but now, there is an urgency to quantify their impact on the health of individuals and communities. However, data for these social determinants of health is not easily accessible. Documenting social determinants of health is typically not part of the standard documentation workflow during an individual’s clinical encounter. And zip code level indicators are not specific enough to make meaningful interventions.

With help from partners across the Dallas community, we utilized the Isthmus platform to develop a block group level, longitudinal collection of about 60 key indicators that can be used to measure health, resiliency, and economic vibrancy of neighborhoods. We made these indicators available in an interactive format through a dashboard hosted on the same platform.

Figure 7: CWDI Infrastructure on Isthmus

By hosting everything on a single platform, we have been able to create a streamlined process for ingesting, cleaning, analyzing, and visualizing data. Through an automated pipeline using Apache NiFi, data is ingested into the blob storage, where data is cleaned according to predefined scripts. This cleaned data is eventually ingested to the PostgreSQL database management system, which is a free and open-source database. The dashboard tool pulls data from the PostgreSQL database management system.

To quantify the impact of CDI data on specific risk conditions, we evaluated its ability to predict the risk of pre-term birth when combined with clinical and claims data. This predictive model was developed at PCCI and uses social determinants of health (SDOH) features extracted from the CDI dataset. These SDOH features proved to be significant for making predictions with this model. Thus, the data from the CDI platform is helping the data scientists and machine learning engineers demonstrate a holistic view of factors that impact preterm birth. By including the data from external sources into a model that uses clinical data, the overall model performance has improved.

3.3 Internet-of-Things on Isthmus

The use of Internet of Things (IoT) applications in the healthcare industry is growing rapidly. The Isthmus platform has piloted the IoT data ingestion pipeline with indoor air quality sensors, where real-time data from these sensors’ APIs was ingested into the platform. This data was cleaned and transformed in batch mode using the data cleaning modules developed on the platform.

The configuration developed for the machine learning system was highly reusable, as the codebase had been made generic to reduce redundancies across multiple projects on the platform. The IoT data is stored and maintained in a PostgreSQL database on the platform with fault tolerance and disaster recovery functionalities. The future goal is to integrate this rich IoT data with the existing machine learning models as additional features, which we expect to improve the model predictions.

4 Conclusions

This paper introduced the Isthmus Platform, an end-to-end system for developing and deploying ML models. Using Isthmus, data scientists can use their familiar ML toolkits and libraries to create models, perform statistical tests, and deploy them.

We addressed the challenges of integrating ML models into application development and model sharing. The proposed architecture supports the sharing of pretrained models across different ML modules run-time environments. As illustrated by the case studies, Isthmus provides project level isolation and focuses on code reusability, rather than reinventing the development pieces. The platform demonstrated versatility in terms of serving a prediction service, ingesting IoT data, and integrating an SDOH feature engine.

In the future, we hope to (1) integrate real-time model training with streaming data by tuning the model coefficients on the fly and (2) perform A/B testing and multi-armed bandit testing to ensure model stability and reliability over time.

5 Acknowledgments

We are extremely grateful to Lyda Hill and the Parkland Foundation for providing financial support to design and build this platform to serve the needs of the New Parkland Hospital and its patients. We are also grateful to the leadership team at the Parkland Health and Hospital System for providing strategic guidance regarding how to best maximize the platform’s usefulness to Parkland and to the Parkland IT team for helping with various aspects of the platform build and integration with the EHR.

We would also like to express our deep gratitude to Dr. Manjula Julka and Dr. George Oliver, our clinician supervisors, for their patient guidance, enthusiastic encouragement, and useful critiques of this work. We would also like to thank Dr. Shelley Chang for providing her clinical expertise, advice, and assistance. Our grateful thanks are also extended to the PCCI leadership team for their support and encouragement throughout this journey. Finally, we would like to thank everyone on the Isthmus team for making Isthmus a success.

6 Disclaimer

Trademark products mentioned in this paper are properties of respective trademark owners.


  • [1] Jameson, J. L. Longo, D. L. Precision medicine–personalized, problematic, and promising. N. Engl. J. Med. 372, 2229–2234 (2015).
  • [2] Sterne JA1, White IR, Carlin JB, Spratt M, Royston P, Kenward MG, Wood AM, Carpenter JR. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls In BMJ 2009;338:b2393, 2009.
  • [3] Bates, D. W., Saria, S., Ohno-Machado, L., Shah, A. Escobar, G. Big data in health care: using analytics to identify and manage high-risk and high-cost patients. In Health Aff. 33, 1123–1131 (2014).
  • [4] Lukasz M. Mazur, Prithima R. Mosaly, Carlton Moore, et al. Association of the Usability of Electronic Health Records With Cognitive Workload and Performance Levels Among Physicians. In JAMA Netw Open. 2019;2(4):e191709. doi:10.1001/jamanetworkopen.2019.1709, 2019.
  • [5] Alvin Rajkomar, Eyal Oren, Kai Chen, Andrew M. Dai, Nissan Hajaj, Michaela Hardt, Peter J. Liu, Xiaobing Liu, Jake Marcus, Mimi Sun, et al. Scalable and accurate deep learning with electronic health records. In npj Digital Medicine (2018) 1:18; doi:10.1038/s41746-018-0029-1, 2018.
  • [6] Microsoft. Microsoft Azure for Research—Microsoft Research In Microsoft Research [Internet]. [cited 22 Sep 2017]. Available from:
  • [7] D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips Hidden Technical Debt in Machine Learning Systems In NIPS Paper 5656, 2015.
  • [8] Charles Safran, Meryl Bloomrosen, W. Edward Hammond, Steven Labkoff, Suzanne Markel-Fox, Paul C. Tang, Don E. Detmer, Toward a National Framework for the Secondary Use of Health Data: An American Medical Informatics Association White Paper In J Am Med Inform Assoc. 2007 Jan-Feb; 14(1): 1–9. doi: 10.1197/jamia.M2273, 2007.
  • [9] Oladimeji Farri, David S. Pieckiewicz, Ahmed S. Rahman, Terrence J. Adam, Serguei V. Pakhomov, and Genevieve B. Melton. A Qualitative Analysis of EHR Clinical Document Synthesis by Clinicians. In AMIA Annu Symp Proc. 2012; 2012: 1211–1220.
  • [10] Solon Barocas Andrew D. Selbst Big Data’s Disparate Impact In DOI:, 2014.
  • [11] Bret Nestor, Matthew B. A. McDermott. Rethinking clinical prediction: Why machine learning must consider year of care and feature aggregation. In 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada, 2018.
  • [12] Shuai Zhao, Manoop Talasila, Guy Jacobson, Cristian Borcea, Syed Anwar Aftab, and John F Murray. Packaging and Sharing Machine Learning Models via the Acumos AI Open Platform. In The 17th IEEE International Conference on Machine Learning and Applications (IEEE ICMLA’18), Orlando, Florida, USA, 2018.
  • [13] Parikh, R. B., Kakad, M., Bates, D. W. Integrating predictive analytics into high-value care: the dawn of precision delivery. In JAMA 315, 651–652 (2016).
  • [14] Krumholz, H. M. Big data and new knowledge in medicine: the thinking, training, and tools needed for a learning health system. In Health Aff. 33, 1163–1170 (2014).
  • [15] Goldstein, B. A., Navar, A. M., Pencina, M. J. Ioannidis, J. P. A. Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review. J. Am. Med. Inform. Assoc. 24, 198–208 (2017).
  • [16] Tohira, H., Jacobs, I., Mountain, D., Gibson, N. and Yeo, A. Systematic review of predictive performance of injury severity scoring tools. In Scandinavian Journal of Trauma, Resuscitation and Emergency Medicine, volume 20, Article number: 63, 2012.