Learning Patient Engagement in Care Management: Performance vs. Interpretability

06/19/2019 · Subhro Das, et al. · IBM

The health outcomes of high-need patients can be substantially influenced by the degree of patient engagement in their own care. The role of care managers includes enrolling patients into care programs and keeping them sufficiently engaged, so that patients can attain various program goals. The attainment of these goals is expected to improve the patients' health outcomes. In this paper, we present a real-world, data-driven method and a behavioral engagement scoring pipeline for scoring the engagement level of a patient in two regards: (1) their interest in enrolling in a relevant care program, and (2) their interest in and commitment to program goals. We use this score to predict a patient's propensity to respond (i.e., to a call for enrollment into a program, or to an assigned program goal). Using real-world care management data, we show that our scoring method successfully predicts patient engagement. We also show that we are able to provide interpretable insights to care managers, using prototypical patients as a point of reference, without sacrificing prediction performance.


1. Introduction

Care management is a patient-centered approach to population health that is “designed to assist patients and their support systems in managing medical conditions more effectively.” (Center for Health Care Strategies, 2007) The aims of care management decision support (CMDS) may include: (a) identifying populations with modifiable risks, (b) aligning care management services to population needs, and (c) identifying and training personnel to deliver care management services (Agency for Healthcare Research and Quality, 2015). The focus of our paper is on the first of these three aims. To improve the identification of populations with modifiable risks, the Agency for Healthcare Research and Quality (AHRQ) summarized a set of recommendations (Agency for Healthcare Research and Quality, 2015). Among the recommendations was that researchers should investigate (a) the benefit of care management services to different patient segments and (b) the parameters that affect modifiable risks.

With respect to the AHRQ recommendation for achieving the benefits of CMDS, understanding patient segments to drive patient engagement is an inextricable part of the equation for success. By patient engagement, we refer to “the actions individuals take to obtain the greatest benefit from the health care services available to them” (Michigan Care Management Resource Center Home, [n. d.]). In addition, it is essential to develop methods that can identify segment-differentiating parameters that affect modifiable risk factors. These risk factors contribute to a significant portion of the global disease burden, especially among patients with chronic conditions and those transitioning from one care setting to another (Tuomilehto et al., 2001; West et al., 1997; Brown et al., 2012). The return on investment of care management depends not only on how much the patient stands to gain from a clinical perspective, but also on how likely the patient is to actively engage in care management interventions.

At the same time, we recognize that, despite its high potential, the adoption of machine learning methods in CMDS scenarios has been slow. This is partly due to the gap between how humans and machines make decisions – the “black-box” nature of high-performing ML methods. To bridge this gap, there has been a recent push for more studies in model explainability in AI/ML to help human decision makers understand how insights were derived and how they can act on them (DARPA, [n. d.]; Caruana et al., 2015).

As these real-world applications need to work in production environments that often lack clearly defined data schemas, we further discuss how to incorporate the developed methods into a Behavioral Engagement Scoring (BES) pipeline (including dynamic feature engineering and an API) so that they can be applied to care management transaction records in a schema-agnostic fashion. The pipeline is expected to enhance decision support for care managers by helping them prioritize their efforts based on eventual outcomes and success, rather than just clinical health risk. Our methods are informed and validated using real-world care management records for patients who are either transitioning from “hospital to home” or are eligible for a chronic disease management program.

The rest of this paper is organized as follows. We first discuss how the current study is positioned within related work on CMDS applications and model interpretability. We then introduce the CMDS dataset used in this study, as well as the data-driven methods developed for two CMDS tasks: program enrollment and goal attainment. Finally, we discuss the trade-off between model interpretability and performance observed while developing machine learning models for identifying explainable engagement behavioral profiles and for personalized engagement scoring from real-world care management data.

2. Related Work

In this paper, we focus our investigation on implementing quantitative, data-driven ML methods to identify the patient segments most likely to engage in care management, and on decision support that explains to care managers why patients may or may not engage. While a few quantitative survey-based methods for measuring patient engagement exist (Graffigna and Lozza, 2015; Hibbard and Tusler, 2004), no prior studies have successfully measured patient engagement from real-world data.

Moreover, although prior studies have attempted to identify risk stratification and disease progression parameters that differentiate clinical risk and longer-term outcomes (Luo and Rumshisky, [n. d.]; Liu et al., 2018), to our knowledge, this is the first study to propose learning engagement strategies from data, based on quantifiable differences in patient engagement levels and on segment-differentiating parameters that affect modifiable risk factors.

To bridge the gap between human and machine understanding in the context of CMDS, this study also explores the potential effect of providing explainable insights on the performance of our models. Recent reviews (Hsueh et al., 2017; Lakkaraju et al., 2016) have shown that a majority of studies in model interpretability are tied to the optimization of certain model properties that are presumed to be beneficial for improving human understanding of the models. Among them, many studies focus on reducing model complexity. Examples include applying regularization operators to reduce the number of parameters (Feldman, 2000), restricting policy search to a subspace of simpler forms (Sridharan and Tesauro, 2002; Hu et al., 2017), bringing semantically similar items together (Dey et al., 2012), and re-training easier-to-understand models on the outputs of black-box models for model-agnostic explanation (Ribeiro et al., 2016).

To make AI/ML more “actionable” for health decision makers (either health professionals or patients themselves), human-computer interaction researchers have been conducting qualitative studies to identify interpretability-impeding confounders and understand individual differences (Robins et al., 1994). With the emergence of deep learning approaches in AI/ML, more studies are now learning patterns that can be represented in explicitly presentable formats (e.g., temporal visualization (Choi et al., 2016), natural language rationalization explanations (Lei et al., 2016)), as well as developing interactive tools to untangle models learned in a high-dimensional space (google, [n. d.]).

To further differentiate engagement strategies from CMDS data, in this paper we particularly focus on methods that account for case-based reasoning to improve the interpretability of clustering models, e.g., selecting prototypical cases to represent the learned clusters (Kim et al., 2014). In particular, we incorporate locally supervised metric learning (Sun et al., 2010) and prototypical case-based reasoning in a machine learning model to identify explainable engagement behavioral profiles and to produce personalized engagement scores.

The main contributions of our paper are as follows. First, we present a quantitative and personalized approach to identifying patients who are more likely to engage in care management, and demonstrate empirically, using real-world data, that our methods provide more accurate engagement behavior predictions than a “one-size-fits-all” population approach. Second, these insights are made explainable by identifying prototypical patients within a personalized patient segment; we show that, in our case, explainability does not come at the expense of model performance.

3. Data

3.1. Care Management Decision Support

For patients with complex care needs, it is important to coordinate across the patients’ caregivers and providers to account for differing advice received from clinicians, varying medications, and adverse drug events (Long, 2017). In practice, this is often achieved by implementing structured care programs, in which a predetermined set of rules is given to care managers to coordinate with patients and bridge care gaps between hospital and home. Care management history therefore captures the important transactions between care managers and patients during the care coordination process, and is an important and growing source of data for behavioral understanding.

The CMDS workflow from which this dataset was derived is depicted in Fig. 1. At the center of this figure is the Care Manager (e.g., a licensed nurse, social worker, or other certified specialist), who attempts to engage the patient, typically via telephone, and whose primary objective is to influence modifiable and prioritized risk factors, as identified by the patient’s engagement strategy. The Care Manager receives her assigned pool of patients from the Quality Director, whose primary objective is to align care manager skills with patient needs and to determine the appropriate care strategies. Finally, the Patient responds to the Care Manager’s feedback and coaching, and may provide his/her own input on the goals to be set and how to achieve them. The interactions between the care manager and patient are captured in both structured and unstructured formats. In the current study, we use only the structured data contained in the care management transaction records.

Figure 1. Care Management flow.

Figure 2. Program enrollment timeline.
Figure 3. Goal attainment timeline.

3.2. Care Management Records

We apply our method to the care program logs of a private, not-for-profit healthcare network, comprising 4,504 transition-of-care and 440 chronic care patient interactions over a 22-month period. These program engagement records were collected between December 2015 and October 2017. For each patient engagement timeline, we extracted 53 features, ranging from basic demographic information (e.g., age, gender), to the patient’s care program context (e.g., program experience, whether the patient enrolled in the program, days in the program, number of days until completion of the program), to the interactions between care managers and patients (e.g., the date of each recorded call).

We then prepared datasets for the two real-world tasks to which we aim to apply the BES pipeline for decision support: program enrollment (“ENROLL”) and goal attainment (“GOAL”).

3.3. Program Enrollment

Table 1 summarizes the CM records used to generate the ENROLL dataset from all patients, by enrollment status for each assigned care program. The assigned care programs include those for patients transitioning from hospital to home after discharge (“Transition”) and those involving a chronic disease management process (“Involve”).

Status              Involve (Chronic Care)   Transition
Completed Program   30.00%                   67.30%
DidNotEnroll         4.09%                    5.71%
Disenrolled         42.27%                   25.95%
Enrolled            23.64%                    1.04%

Table 1. Enrollment status of patients.

The structured CM transaction records capture the dates when a care program is assigned, started, and ended. These records also indicate whether the care program was completed with its goals attained, or ended prematurely. An assigned program takes 16 days to complete on average, and at most 279 days. The structured CM records of program enrollment also show that enrolling in a program takes one day on average, and at most 15 days. Fig. 4 summarizes program enrollment status by the day of the week on which the recorded call to the target patient was made. We observe that most enrollment decisions are made in calls placed at the start of the week.

Figure 4. Enrollment distribution over the week.

3.4. Goal Attainment

The GOAL dataset is composed of 28 different goals, which we classified into six focus areas: Educational (e.g., demonstrates understanding of post-discharge and diabetes education), Implementation (e.g., adequate functional and transportation support for healthy coping), Medications (e.g., adherence to medication regimen), Reducing Risks (e.g., resolving care gaps), Self Care (e.g., understands benefits of/demonstrates being physically active, healthy diet needs, failure of symptom management), and Other (e.g., effective care transition and management plans). Fig. 5 summarizes the goals assigned by age category.

Figure 5. Goal assignment across different focus areas.

Every patient has multiple goals to achieve: 90% of the patients have fewer than 3 goals, and 65% have fewer than 2 goals. Goal attainment status is indicated by a binary flag (i.e., 1 for meeting the goal and 0 otherwise). Figure 6 summarizes the goal attainment percentage (as indicated by the status shown in the CM records) for each goal focus area, i.e., the number of goals whose status is shown as ‘met’ divided by the total number of goals assigned for that focus area.

Focus Areas      Coaching  Coordination  Education  Referral  Screening  Tracking  Other  Total  Status Met
Educational           138            18        277       100         65         0     23    621      81.62%
Implementation          3           192          4         7          0         0      0    206      98.54%
Medications             7            96         30         7         90         0     10    240      84.91%
Reducing Risks          0             1          0       545         53         0      1    600      97.00%
Self-care             130             0        189        12        116         0     49    496      78.69%
Other                  29             0       2561        14         29        12     19   2644      99.19%
Total                 307           307       3061       685        353        12    102   4827
Figure 6. Goal attainment records of patients distributed across focus areas & intervention categories.

The interventions in each goal area are grouped into seven categories: Referral (e.g., referral to see a nutritionist for diabetes diet education), Education (e.g., educate patients on the importance of physical activity), Coordination (e.g., follow up with providers on refills), Screening (e.g., assess breathing symptoms), Coaching (e.g., provide a log for side effect recording), Tracking, and Other (including following up on provider treatment).

4. Behavioral Engagement Scoring Pipeline

In this paper, we aim to address two key questions: (1) What are the patient segments that lead to differences in the engagement benefits of CM services? (2) What drives differential behavioral responses in CMDS? To answer these questions, we develop a Behavioral Engagement Scoring (BES) pipeline to identify the engagement outcome-differentiating factors in care management. The task of engagement scoring serves as a key step in the pipeline to quantify patient engagement tendency for care plan personalization and downstream decision support.

Specifically, the BES pipeline is composed of four components: (1) dynamically extract behavioral features and outcomes from care management transaction records; (2) apply engagement outcome-driven feature transformation through locally supervised distance metric learning; (3) uncover distinctive patient segments and their behavioral profiles (including prototypical users) via hierarchical clustering; and (4) learn a BES scorer for each behavioral profile based on a Generalized Linear Model (GLM) to estimate the propensity to respond, e.g., whether a patient is inclined to enroll in a certain program, or to complete a goal, given the intervention assigned by his/her care manager.
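To make the flow concrete, the following minimal sketch outlines how the four components could fit together. The class and callable names are our own illustration of the described architecture, not the authors’ implementation.

```python
import numpy as np

class BESPipeline:
    """Skeleton of the four-stage BES pipeline (illustrative only)."""

    def __init__(self, feature_extractor, metric_learner, segmenter, scorer_factory):
        self.feature_extractor = feature_extractor  # (1) dynamic feature engineering
        self.metric_learner = metric_learner        # (2) locally supervised metric learning
        self.segmenter = segmenter                  # (3) hierarchical clustering into profiles
        self.scorer_factory = scorer_factory        # (4) per-segment GLM scorer

    def fit(self, records, outcomes):
        X = self.feature_extractor(records)   # raw transactions -> feature matrix
        W = self.metric_learner(X, outcomes)  # outcome-driven transformation matrix
        X_proj = X @ W                        # project into outcome-adjusted subspace
        segments = self.segmenter(X_proj)     # one segment label per patient
        self.W_, self.segments_ = W, segments
        # train one engagement scorer per behavioral profile
        self.scorers_ = {
            k: self.scorer_factory().fit(X_proj[segments == k],
                                         outcomes[segments == k])
            for k in np.unique(segments)
        }
        return self
```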

4.1. Dynamic Feature Engineering

In addition to the main research goal of answering the two key questions, another developmental goal of BES is to enable scalable and flexible engagement scoring over incoming provider data, so as to assist in devising engagement strategies for real-life CMDS tasks in a production environment. To support this goal, the dynamic feature engineering component provides a common data model and a standard run-time for a standards-based analytics environment, allowing feature generation rules to be written once and reused across different provider data sources.

Feature engineering is the process of converting raw data into explanatory factors. These factors are used in the BES pipeline to train engagement scoring models. A multitude of knowledge-based and data-driven approaches are available for feature generation. On the one hand, knowledge-based features can be generated from the literature on related topics. On the other hand, data-driven approaches convert elements of the raw data into features, and models are then trained to understand which features are the most important in gauging patient engagement level.

In this study, we adopt a hybrid knowledge-augmented, data-driven approach to feature engineering. To reduce the manual effort of feature generation and enable better model generalizability across CM data from different providers, we further automate the BES pipeline to generate features in a provider-agnostic fashion: a back-end logic module applies embedded knowledge-based rules, coded in configuration files against a universal schema, to convert raw data into features.
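As a hypothetical illustration of such configuration-driven, provider-agnostic feature rules, the sketch below maps universal feature names to provider-specific columns and converters. All field names are invented for the example; a real deployment would load the mapping from per-provider configuration files.

```python
# All field names below are invented for illustration.
FEATURE_RULES = {
    # universal feature name: (provider column, converter)
    "age":          ("patient_age", int),
    "program_days": ("days_in_program", float),
    "enrolled":     ("enrollment_status",
                     lambda s: 1 if s in ("Enrolled", "Completed Program") else 0),
}

def extract_features(record, rules=FEATURE_RULES):
    """Convert one raw transaction record (a dict keyed by provider
    columns) into universal features; a per-provider configuration
    supplies the column mapping, so the rules are written once."""
    return {name: conv(record[col])
            for name, (col, conv) in rules.items() if col in record}
```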

The dynamic feature generation process resulted in over 700 features. This component also employs an automatic feature selection procedure based on L1- and L2-regularization.
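A regularization-based selection step of this kind could be sketched with scikit-learn as follows; the elastic-net penalty combines the L1 and L2 terms mentioned above, and the hyperparameters are placeholders rather than the values used in the study.

```python
# Sketch: automatic feature selection via L1/L2 regularization.
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

def select_features(X, y):
    # elastic-net penalty combines L1 (sparsity) and L2 (stability)
    base = LogisticRegression(penalty="elasticnet", solver="saga",
                              l1_ratio=0.5, C=1.0, max_iter=5000)
    selector = SelectFromModel(base).fit(X, y)
    return selector.transform(X), selector.get_support()
```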

4.2. Engagement Outcome-driven Distance Learning

The primary motivation for learning an engagement outcome-driven distance metric is to project patient feature vectors onto a subspace in which patients with similar engagement outcomes are closer to each other, whereas those with opposite engagement outcomes are farther apart. In this transformed vector subspace, we can then identify cohesive behavioral profiles (using clustering) that drive differential patient responses.

In this study, we adapt the Locally Supervised Metric Learner (LSML) (Sun et al., 2010, 2012), which estimates engagement outcome-adjusted distances among patients in the newly transformed vector subspace. The patient features extracted using the protocols mentioned in Subsection 4.1 are represented as $x_i \in \mathcal{X}$, where $\mathcal{X} \subseteq \mathbb{R}^d$ is the vector space, with class labels $y_i \in \{-1, +1\}$. For the task of program enrollment, the outcome variable is $y_i = +1$ if the status is “Enrolled” or “Completed”, and $y_i = -1$ if the status is “DidNotEnroll” or “Disenrolled”. Similarly, for the task of goal attainment, $y_i = +1$ if the status is “Met”, and $y_i = -1$ otherwise. Considering there are $N$ program enrollment records and $d$-dimensional features, the feature matrix is $X \in \mathbb{R}^{N \times d}$. A similar operational definition also applies to goal attainment.
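The outcome encoding above can be stated directly in code; a small sketch (the status strings are assumed to match those in the CM records):

```python
# Sketch of the binary outcome encoding for the two tasks.
def enroll_label(status: str) -> int:
    # +1 for positive enrollment outcomes, -1 otherwise
    return +1 if status in ("Enrolled", "Completed Program") else -1

def goal_label(status: str) -> int:
    return +1 if status == "Met" else -1
```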

Here, we consider a generalized Mahalanobis distance $d_M$,

$$d_M(x_i, x_j) = \sqrt{(x_i - x_j)^\top M (x_i - x_j)}, \qquad (1)$$

where $M$ is a positive semi-definite matrix. We aim to minimize the following objective $J(M)$ over the matrix $M$:

$$J(M) = \sum_i \Big( \sum_{x_j \in \mathcal{N}_o(x_i)} d_M^2(x_i, x_j) \; - \sum_{x_k \in \mathcal{N}_e(x_i)} d_M^2(x_i, x_k) \Big), \qquad (2)$$

where $\mathcal{N}_o(x_i)$, the homogeneous neighborhood of $x_i$, is the set of nearest data points of $x_i$ with the same outcome, and $\mathcal{N}_e(x_i)$, the heterogeneous neighborhood of $x_i$, is the set of nearest data points of $x_i$ with the opposite outcome. Since $M$ is positive semi-definite and symmetric, it can be decomposed as $M = WW^\top$. The $W$ that minimizes (2) renders the data into the desired space, where records with similar outcomes are compact and those with opposite outcomes are distant,

$$W^* = \operatorname*{argmin}_{W} J(WW^\top). \qquad (3)$$

Refer to (Sun et al., 2012) for the complete LSML algorithm that derives $W$, the feature transformation matrix. We employ $W$ to obtain the projected feature set $\tilde{X}$,

$$\tilde{X} = X W. \qquad (4)$$

For the rest of the pipeline, we leverage the outcome-adjusted feature projection $\tilde{X}$ in (4) and the corresponding Mahalanobis distance (1) between patient vectors to learn patient segments and to estimate each patient’s propensity to respond to care managers’ interventions.
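For intuition, the sketch below implements a locally supervised metric learner in the spirit of (1)–(4): it accumulates scatter over homogeneous neighborhoods, subtracts scatter over heterogeneous neighborhoods, and takes the eigenvectors of the resulting matrix with the smallest eigenvalues as $W$ (the minimizer of the trace objective under an orthonormality constraint). This is a simplified reading, not the exact algorithm of Sun et al. (2012).

```python
import numpy as np

def lsml_transform(X, y, k=5, out_dim=10):
    """Pull same-outcome neighbors together, push opposite-outcome
    neighbors apart; a simplified sketch of LSML-style learning."""
    n, d = X.shape
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    np.fill_diagonal(D, np.inf)
    S = np.zeros((d, d))
    for i in range(n):
        same = np.where(y == y[i])[0]
        same = same[same != i]
        diff = np.where(y != y[i])[0]
        homo = same[np.argsort(D[i, same])[:k]]    # homogeneous neighborhood N_o
        hetero = diff[np.argsort(D[i, diff])[:k]]  # heterogeneous neighborhood N_e
        for j in homo:                             # add homogeneous scatter
            delta = (X[i] - X[j])[:, None]
            S += delta @ delta.T
        for j in hetero:                           # subtract heterogeneous scatter
            delta = (X[i] - X[j])[:, None]
            S -= delta @ delta.T
    # minimizing tr(W^T S W) with orthonormal W -> smallest eigenvectors of S
    _, vecs = np.linalg.eigh(S)                    # eigenvalues in ascending order
    W = vecs[:, :out_dim]
    return X @ W, W
```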

4.3. Learning Patient Segment-based Behavioral Profiles

Because we hypothesize that there exist patient segments within which patients tend to exhibit certain levels of similarity, we aim to capture that similarity in behavioral profiles and to understand engagement-indicative patterns with respect to the profile each patient fits. However, the optimal number of patient segments is not known a priori. With that objective and constraint in mind, hierarchical clustering (Friedman et al., 2001) is employed to identify patient segments based on the outcome-adjusted distances in the newly projected vector subspace, and to learn the key factors that drive differential engagement outcomes.

Among the variety of linkage methods for hierarchical clustering, we choose Complete Linkage, which computes the inter-segment distance between two segments, say $C_a$ and $C_b$, as the distance of their furthest pair,

$$d_{CL}(C_a, C_b) = \max_{x_i \in C_a,\; x_j \in C_b} d_M(x_i, x_j), \qquad (5)$$

and at each agglomeration step merges the pair of segments with the smallest inter-segment distance,

$$(a^*, b^*) = \operatorname*{argmin}_{a \neq b} d_{CL}(C_a, C_b). \qquad (6)$$

Experimentation with other linkage methods, e.g., Ward’s method, further confirms that Complete Linkage uncovers the patient segmentation with the highest engagement outcome-differentiating power (as measured by ANOVA scores across segments). The remaining challenge is to determine the number of segments. To this end, an automatic tuning algorithm, the Elbow method (Ketchen and Shook, 1996), is applied to compute the optimal number of segments. The Elbow method tracks the acceleration of distance growth among segments and thresholds the agglomeration at the point where the acceleration is highest. Our population is thus clustered into $K$ segments, $\{C_1, \ldots, C_K\}$, each capturing distinctive patterns that drive differential patient responses in engagement.
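A compact sketch of this segmentation step, using SciPy’s complete-linkage clustering together with the acceleration-based elbow heuristic described above (max_k is a placeholder cap on the number of segments, not a value from the paper):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def segment_patients(X_proj, max_k=15):
    Z = linkage(X_proj, method="complete")  # agglomerative, complete linkage
    # merge distances of the last max_k merges, ordered from 1 cluster up to max_k
    last = Z[-max_k:, 2][::-1]
    accel = np.diff(last, 2)                # second difference = "acceleration"
    k = int(np.argmax(accel)) + 2           # +2: two diffs shorten the array; k >= 2
    return fcluster(Z, t=k, criterion="maxclust"), k
```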

In addition, as we expect it to be easier to interpret patient needs from behavioral profiles by example, we propose to identify prototypical patients in each patient segment. The prototypical patient cases in each segment are expected to serve as examples that showcase the distinctive patterns in the segment’s behavioral profile, and will help interpret the engagement scores output by the BES pipeline. The prototypical patient cases $P_k$ are defined as the $p$ subjects with positive engagement outcome, i.e., $y_i = +1$, who are closest to the centroid of their patient segment $C_k$ in the engagement outcome-adjusted vector subspace,

$$P_k = \operatorname*{argmin}^{(p)}_{x_i \in C_k,\; y_i = +1} \; d_M(x_i, \mu_k), \qquad (7)$$

where $\mu_k$ is the centroid of patient segment $C_k$ and $\operatorname{argmin}^{(p)}$ selects the $p$ points with the smallest distances. In our analysis, we have chosen $p = 20$. The advantages of having a prototypical case-based component in the pipeline are illustrated in Figure 9 using a synthetic set of 2-D data.

The advantages of using prototypical patient cases include: (a) removing model training noise due to ambiguous cases near segment borders, and (b) improving computational efficiency, as it takes significantly less run-time to update models learned on a significantly reduced set of data.
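Prototype selection per Eq. (7) reduces to a nearest-to-centroid search among positive-outcome patients; a minimal sketch (Euclidean distance in the projected subspace stands in for the learned metric):

```python
import numpy as np

def prototypes(X_proj, y, segments, p=20):
    """Return the p positive-outcome patients closest to each
    segment centroid in the outcome-adjusted subspace."""
    protos = {}
    for k in np.unique(segments):
        members = np.where(segments == k)[0]
        centroid = X_proj[members].mean(axis=0)
        positives = members[y[members] == +1]            # engaged patients only
        dist = np.linalg.norm(X_proj[positives] - centroid, axis=1)
        protos[k] = positives[np.argsort(dist)[:p]]      # p closest to the centroid
    return protos
```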

4.4. Estimating Propensity to Respond

For each patient segment in the transformed vector subspace, we learn a separate generalized linear model (GLM) to compute engagement scores for each patient in that segment who has been assigned a program to enroll in or a goal to attain. The engagement scores estimate each patient’s propensity to respond to his/her care manager’s engagement calls or interventions. The GLM for each segment $C_k$ is represented by

$$s(\tilde{x}_i) = w_k^\top \tilde{x}_i, \qquad (8)$$

where the feature weights $w_k$ are computed by minimizing the least squared errors over all data points in segment $C_k$. Using the optimized feature weights for each segment, the propensity to respond is estimated for each patient based on his/her features. We then use the computed engagement scores to train a Support Vector Machine (SVM) classifier to predict the engagement outcome of each patient. The feature weights of the GLM also provide explainable insights specific to each patient segment and to the patients belonging to it.
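A minimal sketch of this per-segment scoring step, using a least-squares linear model for the GLM of (8) and an SVM trained on the resulting scores; the RBF kernel is our assumption, as the paper does not specify one.

```python
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVC

def fit_segment_scorer(X_seg, y_seg):
    """Fit one engagement scorer for a single patient segment."""
    glm = LinearRegression().fit(X_seg, y_seg)    # least-squares feature weights w_k
    scores = glm.predict(X_seg).reshape(-1, 1)    # propensity-to-respond estimates
    svm = SVC(kernel="rbf").fit(scores, y_seg)    # outcome classifier on the scores
    return glm, svm
```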

5. Results & Discussion

In this study, we introduce the Behavioral Engagement Scoring pipeline to gauge patient engagement level based on patient segmentation and to identify distinctive patterns driving differential responses to engagement. The pipeline is designed to (1) uncover patient segments that lead to differences in the engagement benefits of care management services, (2) identify behavioral profiles that drive differential engagement responses, and (3) enable scalable and flexible engagement scoring in a production environment for real-life care management tasks at each touch point.

The BES pipeline first segments patients based on the patterns exhibited during patient-CM interactions and on engagement outcomes. Our hypothesis is that although each feature contains only weak signals for differentiating overall engagement outcomes, when considered collectively within a segment, the combined feature sets can explain rich engagement behaviors for care planning.

Take the task of program enrollment as an example. The BES pipeline first identifies five patient segments from the ENROLL dataset, each of which is associated with a behavioral profile of distinctive engagement characteristics. Each segment is then exemplified by incorporating information about its prototypical patient cases (defined as the top 20 patient cases most representative of the identified segment). The BES pipeline also identifies segment-specific interaction patterns specific to program enrollment behaviors for further interpretation, and trains GLM models for predicting engagement outcomes. The same process is then repeated to train the BES pipeline for the task of goal attainment on the GOAL dataset.

5.1. Performance Evaluation on Engagement Outcome Prediction

Method                        Metric      Program Enrollment                  Goal Attainment
                                          Involve (Chronic Care)  Transition
Population-Based (BASELINE)   Accuracy    0.96                    0.96        0.89
                              Precision   0.95                    0.94        0.93
                              Recall      0.99                    1.00        0.95
                              F1          0.97                    0.97        0.94
Behavior Profile-Driven       Accuracy    0.93                    0.74        0.79
                              Precision   0.95                    0.94        0.92
                              Recall      0.96                    0.65        0.83
                              F1          0.95                    0.76        0.87
Prototypical User-Driven      Accuracy    0.93                    0.74        0.95
                              Precision   0.95                    0.93        0.99
                              Recall      0.96                    0.65        0.96
                              F1          0.95                    0.76        0.97

Table 2. Performance comparison with 5-fold cross-validation.

To evaluate the performance of the BES pipeline, the predicted engagement outcome of each patient task is compared with what actually happened as indicated in the care management transaction records. The results across all patient tasks are then aggregated to evaluate the overall performance of models in terms of precision, recall, accuracy and F1-score.

Two versions of the BES pipeline are evaluated to understand the trade-off between model performance and interpretability. The first, “Behavioral Profile-Driven”, trains engagement scoring models for each patient segment using all available data belonging to patients in that segment. The second, “Prototypical User-Driven”, trains models using only the prototypical patient cases. The performance of the two BES pipelines is also compared with the baseline condition, “Population-Based”, which trains an SVM classification model for engagement outcome prediction using all available data, without differentiating patient segments.

Performance evaluation with 5-fold cross-validation is shown in Table 2. The results show that the BES pipeline yields engagement response prediction models of high precision, which implies a high percentage of successful engagements when following the BES recommendations for prioritization. The pipeline predicts patient responses (“whether to engage”) for each type of engagement task with high precision (>90%).
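For reference, an evaluation protocol of this kind can be reproduced with scikit-learn’s cross-validation utilities; the sketch below computes the four metrics of Table 2 for a given dataset, using the baseline SVM classifier as an example.

```python
# Sketch of the 5-fold evaluation protocol (per-segment models omitted
# for brevity; metric names match Table 2).
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

def evaluate(X, y):
    scoring = ["accuracy", "precision", "recall", "f1"]
    cv = cross_validate(SVC(), X, y, cv=5, scoring=scoring)
    return {m: cv[f"test_{m}"].mean() for m in scoring}
```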

We also explore the potential effect of providing explainable insights on the performance of our models. The precision-based performance metrics of both the Behavioral Profile-Driven and Prototypical User-Driven versions are similar to or better than those of the BASELINE, i.e., engagement scoring models trained on the entire population. These results are encouraging, as our proposed BES solution produces more explainable insights based on patient segments and prototypical patient cases, without sacrificing model performance.

5.2. Drivers of Differential Patient Response

Figure 7. Interpretable engagement insights for Care Managers for shared decision making.

For each of the two care management tasks, we identify five behavioral profiles for tailoring care management strategies of patient engagement, and we surface insights for each target patient based on the behavioral profile of his/her most closely related patient segment. To achieve this, we analyze the feature weights of the model trained for each segment to pinpoint the drivers that contribute the most to engagement outcome prediction. These interpretations help Care Managers understand the rationale behind the predictions offered by the behavioral engagement scorer in the pipeline.

Figure 8. Feature weights that indicate patient response driver across the five different behavioral profiles for goal attainment.

Fig. 8 demonstrates the variability of differential patient response drivers across the different behavioral profiles for goal attainment. The feature rankings differ significantly when the population-level feature rankings are compared with those of the patient segments (as indicated by Spearman’s rank correlation coefficient). Most of the patient segments exhibit a more complex pattern of behavioral response than the population level.
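Such a ranking comparison can be computed with Spearman’s rank correlation over the model feature weights; a small sketch (the weight vectors are assumed to be aligned over the same features):

```python
import numpy as np
from scipy.stats import spearmanr

def ranking_similarity(pop_weights, seg_weights):
    """Low rho indicates the segment ranks features differently
    from the population-level model."""
    rho, pval = spearmanr(np.abs(pop_weights), np.abs(seg_weights))
    return rho, pval
```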

In some patient segments, we observe strong indicators of care manager influence, e.g., how long the care manager (CM) has been trying to help the patient attain a goal, or the number of attempts before goal attainment or before the CM decided to close a goal. In other patient segments, we observe that certain goals yield more positive engagement responses than others. For example, in the first and third segments, Self-care and Educational goals are more likely to be attained. The opposite is also observed in some segments: in the second segment, for instance, patients are less likely to attain Medication-related goals. Moreover, call context matters in some segments. For example, for patients in the fourth segment, calling on a Tuesday is more likely to yield better engagement responses.

It is worth noting that although we include age and gender among the features in the analysis, the results show that patient demographics matter less than expected, yielding only a neutral contribution to the modeling of engagement response prediction.

5.3. Interface and API for Shared Decision Making based on Engagement Insights

The BES pipeline also includes a web-based user interface that provides access to explainable insights about the gauged engagement level and the predicted response. In addition, the pipeline derives best-practice insights from the examples provided by the prototypical patient cases in each segment. Care managers can use the interface to learn about target patients and their related patient segments for prioritization and best-practice learning.

Fig. 7 shows the interactive tooling, based on a demo API customized for care managers, for deciding before the call which patients to call first for program enrollment and for goal attainment, and for understanding the reasons they might or might not engage.

Figure 9. Pipeline evaluation using prototypical patient cases.

6. Conclusion & Future Work

The main contributions of our paper are as follows. First, we present a quantitative and personalized approach to identifying patients who are more likely to engage in care management, and demonstrate empirically, using real-world data, that our methods provide more accurate engagement behavior predictions than a “one-size-fits-all” population-based approach. Second, these insights are made explainable by identifying prototypical patient cases within a personalized patient segment. Performance evaluation results show that, in our case, explainability does not come at the expense of model performance.

Analyzing observational transaction data from care management interaction logs with respect to clinical factors alone overlooks an important part of the equation leading to better outcomes. Segment-level incidence rates alone may result in biased estimates of the effect of interventions. Applying the BES pipeline, which properly adjusts for individual and patient segment information, enables a more accurate estimate of engagement effects and supports care management decision-making, including in shared decision-making scenarios. The quantification of heterogeneous engagement effects in patient segments goes beyond existing care quality metrics to add another perspective of behavioral understanding for providers using care management programs.

As for the issue of bridging the gap between human and machine decision making, simply optimizing model properties is not sufficient to warrant actions from health decision makers (Chen and Asch, 2017). More research is needed to identify additional evaluation metrics that can serve as proxy measures of performance in real-life user tasks (Karkar et al., 2015). This line of research can help evaluate how to make sense of models and analytical results in order to support decision makers’ actions, as well as how to validate the derived insights so as to automate decisions directly. This is truly interdisciplinary work, and it is expected to advance future milestones in developing tools for deployment and feedback processes, aligned with the need to generate real-world evidence on best practice.

References

  • Agency for Healthcare Research and Quality (2015) Agency for Healthcare Research and Quality. 2015. Implications for Medical Practice, Health Policy, and Health Services Research. Technical Report. Rockville, MD, USA.
  • Brown et al. (2012) Randall S Brown, Deborah Peikes, Greg Peterson, Jennifer Schore, and Carol M Razafindrakoto. 2012. Six features of Medicare coordinated care demonstration programs that cut hospital admissions of high-risk patients. Health Affairs 31, 6 (2012), 1156–1166.
  • Caruana et al. (2015) Rich Caruana, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, and Noemie Elhadad. 2015. Intelligible Models for HealthCare. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’15. ACM Press, New York, New York, USA, 1721–1730. https://doi.org/10.1145/2783258.2788613
  • Center for Health Care Strategies (2007) Center for Health Care Strategies. 2007. Care Management Definition and Framework. Technical Report.
  • Chen and Asch (2017) Jonathan H. Chen and Steven M. Asch. 2017. Machine Learning and Prediction in Medicine — Beyond the Peak of Inflated Expectations. New England Journal of Medicine 376, 26 (jun 2017), 2507–2509. https://doi.org/10.1056/NEJMp1702071
  • Choi et al. (2016) Edward Choi, Mohammad Taha Bahadori, Joshua A. Kulas, Andy Schuetz, Walter F. Stewart, and Jimeng Sun. 2016. RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism. (aug 2016). arXiv:1608.05745 http://arxiv.org/abs/1608.05745
  • DARPA ([n. d.]) DARPA. [n. d.]. DARPA Explainable AI Program. Retrieved January 28, 2019 from https://www.darpa.mil/program/explainable-artificial-intelligence
  • Dey et al. (2012) Sanjoy Dey, Kelvin Lim, Gowtham Atluri, Angus MacDonald, Michael Steinbach, and Vipin Kumar. 2012. A pattern mining based integrative framework for biomarker discovery. In Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine - BCB ’12. ACM Press, New York, New York, USA, 498–505. https://doi.org/10.1145/2382936.2383000
  • Feldman (2000) Jacob Feldman. 2000. Minimization of Boolean complexity in human concept learning. Nature 407, 6804 (oct 2000), 630–633. https://doi.org/10.1038/35036586
  • Friedman et al. (2001) Jerome Friedman, Trevor Hastie, and Robert Tibshirani. 2001. The Elements of Statistical Learning. Vol. 1. Springer Series in Statistics. Springer, New York, NY, USA.
  • google ([n. d.]) google. [n. d.]. Google AI experiment: high-dimensional space. Retrieved January 28, 2019 from https://experiments.withgoogle.com/ai/visualizing-high-dimensional-space
  • Graffigna and Lozza (2015) G. Graffigna, S. Barello, A. Bonanomi, and E. Lozza. 2015. Measuring patient engagement: development and psychometric properties of the Patient Health Engagement (PHE) Scale. Frontiers in Psychology 6, 274 (2015).
  • Hibbard and Tusler (2004) J. H. Hibbard, J. Stockard, E. R. Mahoney, and M. Tusler. 2004. Development of the Patient Activation Measure (PAM): conceptualizing and measuring activation in patients and consumers. Health Services Research 39, 4 Pt 1 (2004), 1005–1026.
  • Hsueh et al. (2017) Pei-Yun S. Hsueh, S. Dey, S. Das, and T. Wetter. 2017. Making sense of patient-generated health data for interpretable patient-centered care: The transition from ”More” to ”Better”. Vol. 245. https://doi.org/10.3233/978-1-61499-830-3-113
  • Hu et al. (2017) Xinyu Hu, Pei-Yun S Hsueh, Ching-Hua Chen, Keith M Diaz, Ying-Kuen K Cheung, and Min Qian. 2017. A First Step Towards Behavioral Coaching for Managing Stress: A Case Study on Optimal Policy Estimation with Multi-stage Threshold Q-learning. AMIA Annual Symposium Proceedings 2017 (2017), 930–939. http://www.ncbi.nlm.nih.gov/pubmed/29854160 http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=PMC5977571
  • Karkar et al. (2015) Ravi Karkar, Jasmine Zia, Roger Vilardaga, Sonali R Mishra, James Fogarty, Sean A Munson, and Julie A Kientz. 2015. A framework for self-experimentation in personalized health. Journal of the American Medical Informatics Association (2015).
  • Ketchen and Shook (1996) David J Ketchen and Christopher L Shook. 1996. The application of cluster analysis in strategic management research: an analysis and critique. Strategic Management Journal 17, 6 (1996), 441–458.
  • Kim et al. (2014) Been Kim, Cynthia Rudin, and Julie A. Shah. 2014. The Bayesian Case Model: A Generative Approach for Case-Based Reasoning and Prototype Classification. In Advances in Neural Information Processing Systems. 1952–1960.
  • Lakkaraju et al. (2016) Himabindu Lakkaraju, Stephen H. Bach, and Jure Leskovec. 2016. Interpretable Decision Sets. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’16. ACM Press, New York, New York, USA, 1675–1684. https://doi.org/10.1145/2939672.2939874
  • Lei et al. (2016) Tao Lei, Regina Barzilay, and Tommi Jaakkola. 2016. Rationalizing Neural Predictions. (jun 2016). arXiv:1606.04155 http://arxiv.org/abs/1606.04155
  • Liu et al. (2018) Bin Liu, Ying Li, Zhaonan Sun, Soumya Ghosh, and Kenney Ng. 2018. Early Prediction of Diabetes Complications from Electronic Health Records: A Multi-Task Survival Analysis Approach. AAAI (2018). https://www.semanticscholar.org/paper/Early-Prediction-of-Diabetes-Complications-from-A-Liu-Li/28dec33fc71b9139e7e1b6c4a1b32b2947c53176
  • Long (2017) Peter V Long. 2017. Effective Care for High-need Patients: Opportunities for Improving Outcomes, Value, and Health. National Academy Of Medicine.
  • Luo and Rumshisky ([n. d.]) Yen-Fu Luo and Anna Rumshisky. [n. d.]. Interpretable Topic Features for Post-ICU Mortality Prediction. http://www.cs.uml.edu/~arum/publications/YFLuo_AMIA_2016.pdf
  • Michigan Care Management Resource Center Home ([n. d.]) Michigan Care Management Resource Center Home. [n. d.]. Patient Engagement. Retrieved January 28, 2019 from https://micmrc.org/topics/patient-engagement-0
  • Ribeiro et al. (2016) Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. ”Why Should I Trust You?”. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’16. ACM Press, New York, New York, USA, 1135–1144. https://doi.org/10.1145/2939672.2939778
  • Robins et al. (1994) James M. Robins, Andrea Rotnitzky, and Lue Ping Zhao. 1994. Estimation of Regression Coefficients When Some Regressors Are Not Always Observed. J. Amer. Statist. Assoc. 89, 427 (sep 1994), 846. https://doi.org/10.2307/2290910
  • Sridharan and Tesauro (2002) Manu Sridharan and Gerald Tesauro. 2002. Multi-agent Q-learning and Regression Trees for Automated Pricing Decisions. Springer, Boston, MA, 217–234. https://doi.org/10.1007/978-1-4615-1107-6_11
  • Sun et al. (2010) Jimeng Sun, Daby Sow, Jianying Hu, and Shahram Ebadollahi. 2010. Localized supervised metric learning on temporal physiological data. In Pattern Recognition (ICPR), 2010 20th International Conference on. IEEE, 4149–4152.
  • Sun et al. (2012) Jimeng Sun, Fei Wang, Jianying Hu, and Shahram Edabollahi. 2012. Supervised patient similarity measure of heterogeneous patient records. ACM SIGKDD Explorations Newsletter 14, 1 (2012), 16–24.
  • Tuomilehto et al. (2001) Jaakko Tuomilehto, Jaana Lindström, Johan G Eriksson, Timo T Valle, Helena Hämäläinen, Pirjo Ilanne-Parikka, Sirkka Keinänen-Kiukaanniemi, Mauri Laakso, Anne Louheranta, Merja Rastas, et al. 2001. Prevention of type 2 diabetes mellitus by changes in lifestyle among subjects with impaired glucose tolerance. New England Journal of Medicine 344, 18 (2001), 1343–1350.
  • West et al. (1997) Jeffrey A West, Nancy H Miller, Kathleen M Parker, Deborah Senneca, Ghassan Ghandour, Mia Clark, George Greenwald, Robert S Heller, Michael B Fowler, and Robert F DeBusk. 1997. A comprehensive management system for heart failure improves clinical outcomes and reduces medical resource utilization. American Journal of Cardiology 79, 1 (1997), 58–63.