A Machine Learning System for Retaining Patients in HIV Care

06/01/2020 ∙ by Avishek Kumar, et al. ∙ ITAM ∙ Columbia University ∙ Carnegie Mellon University

Retaining persons living with HIV (PLWH) in medical care is paramount to preventing new transmissions of the virus and allowing PLWH to live normal and healthy lifespans. Maintaining regular appointments with an HIV provider and taking medication daily for a lifetime is exceedingly difficult. 51% of PLWH are non-adherent with their medications and eventually drop out of medical care. Current methods of re-linking individuals to care are reactive (applied after a patient has dropped out) and hence not very effective. We describe our system to predict who is most at risk of dropping out of care, for use by the University of Chicago HIV clinic and the Chicago Department of Public Health. Models were selected based on their predictive performance under resource constraints, stability over time, and fairness. Our system is applicable as a point-of-care system in a clinical setting as well as a batch prediction system to support regular interventions at the city level. Our model performs 3x better than the baseline for the clinical model and 2.3x better than the baseline for the city-wide model. The code has been released on GitHub, and we hope this methodology, particularly our focus on fairness, will be adopted by other clinics and public health agencies in order to curb the HIV epidemic.

1. Introduction

HIV has become one of the most devastating global pandemics in modern history, infecting over 75 million people in all parts of the planet and leading to 32 million deaths globally (1). In the City of Chicago, HIV diagnoses are 1.5x the national rate and only 36% of the HIV population is retained in medical care (CDPH, 2018). The state-of-the-art HIV treatment is antiretroviral therapy (ART). HIV-positive individuals who are retained in care and taking antiretroviral therapy are able to suppress the HIV viral level in their serum to undetectable levels, effectively eliminating the risk of transmitting HIV to others and allowing them to live normal lifespans (Gardner et al., 2011; HIV, 2019). For ART to be effective, a person must take medication every day and regularly see a doctor for the entirety of their life. If the majority of persons living with HIV (PLWH) were virally suppressed through ART, it would be possible to have functionally zero new HIV infections and end the HIV epidemic. The problem of combating HIV no longer lies solely in developing effective treatment. Now, the problem lies in keeping PLWH retained in care and virally suppressed for their lifetime. In response, there has been a call for new ideas and interventions to scale up participation in the HIV care continuum to achieve universal viral suppression.

The HIV care continuum describes the stages of care necessary to achieve viral suppression: 1) diagnosis of HIV; 2) linkage to care, meaning a link to medical care and prescription of ART; 3) retention in care, meaning attending medical appointments on a regular basis; and 4) viral suppression, meaning no detectable HIV (for Disease Control, 2018). Retention in care is not only important for the individual health of people living with HIV, but also for public health. Accordingly, retention in care and access to care are a critical pillar of public health agency plans to eliminate HIV transmission in the United States.

However, in the U.S., less than half of individuals living with HIV are retained in care. The causes for the low retention are multi-faceted. There are several state and federal programs, such as the Ryan White HIV/AIDS program, to provide funding for HIV care visits and medications. Despite these programs, many PLWH still do not regularly attend medical appointments. Many social, economic, and personal factors play a role in retention in care, including mental illness, substance use, insecure housing, poverty, neighborhood violence, and stigma (Cunningham et al., 2014; Giordano et al., 2009; Almirol et al., 2016; Giordano et al., 2005; Cook et al., 2007; Zuniga et al., 2016). Effective interventions take personalized approaches and include intensive case management, peer navigation, and multi-faceted outreach programs (Horstmann et al., 2010; Okeke et al., 2014; Gardner et al., 2014; Higa et al., 2012).

While these interventions are effective, they are also resource intensive. Further, most clinics and public health settings have limited resources at their disposal. It is estimated that 86% of PLWH in the U.S. are diagnosed as of 2015 (Pecoraro et al., 2013). However, only 49% of them are retained in care and 51% are virally suppressed (for Disease Control, 2018). Therefore, methods are needed to identify PLWH who are at the highest risk of falling out of medical care so that they can be prioritized for interventions that keep them in care.

Existing work on this problem has focused on two aspects: 1) using retrospective analysis to identify coarse, population-level subgroups at risk for dropping out of care, such as African-American men who have sex with other men (Mayer et al., 2014; Geng et al., 2010), and 2) understanding root causes and barriers to retention in care, such as mental illness, substance use, insufficient means of transportation, lack of insurance, homelessness, disruption of social and sexual networks, unemployment, and neighborhood characteristics (Aidala, 2005; Baillargeon et al., 2009; Grinstead Reznick et al., 2011; Hedrich et al., 2012; Cooper et al., 2016; McFadden et al., 2014; Mayer et al., 2014). These approaches are not actionable: while retrospective analysis is useful in describing at-risk groups, it is not useful in proactively targeting resources. Targeting interventions using coarse, group-level risk factors (e.g., men who have sex with other men) wastes scarce resources because it presumes that all members have uniform risk, neglecting individual circumstances and behaviors. In contrast, a more fine-grained machine learning approach can overcome these shortcomings.

1.1. Our Contribution

In this paper, we describe an HIV-specific machine learning platform built in collaboration with the Chicago Department of Public Health and the University of Chicago HIV Clinic that increases engagement in the HIV continuum by shifting from reactive interventions (waiting until a patient has dropped out of care, then attempting to locate and re-link them) to proactive interventions. Improved targeting of at-risk individuals can reduce the incidence and prevalence of HIV by keeping more PLWH in care and virally suppressed, eliminating HIV-transmission channels. The platform combines administrative data, surveillance data, electronic medical records, and domain knowledge from HIV experts in machine learning models that are scalable, adaptive, and produce patient-level predictions with associated risk factors for each prediction, to support proactive intervention. The system is designed to optimize and balance i) the different resource constraints of different settings, ii) stability in performance over time, and iii) fairness, to prevent biases against protected groups. The system is designed to support interventions both when a patient is at a clinic (at the time of an appointment) and through routine proactive outreach by a public health department, informing everyday treatment decisions (e.g., which clinic is best suited for an individual) and policy decisions (e.g., what types of programs lead to a successful intervention).

2. HIV Retention Problem

This work was done with two partners: The University of Chicago Medicine HIV clinic (UCM) and the Chicago Department of Public Health (CDPH) to 1) support two different use cases and deployment scenarios and 2) test the effectiveness on two different data sets.

2.1. HIV Clinic

Figure 1. Approximately 10% of all appointments at the UCM HIV clinic have no follow-up.

The University of Chicago Medicine HIV clinic (UCM) has a predominantly reactive approach to re-linking patients: staff attempt to contact patients by phone if they have not been seen in the last 12 months. Over the study period, patients attended at least one HIV care appointment, accounting for a total of 1200 visits. Of these appointments, between 8% and 12% were not followed by a subsequent appointment at least 90 days later within a 12-month period, indicating a lack of retention in care for that time period (Figure 1). Using another measure of engagement, access to care, approximately 10% of the appointments did not have a subsequent appointment within a six-month period, meaning that the patient did not access care within six months. Our system seeks to identify these patients before they drop out of care and prioritize them to keep them in medical care.

2.2. Department of Public Health

The Chicago Department of Public Health (CDPH) also currently takes a reactive approach to re-linking PLWH to care by identifying a set of people who have already been out of medical care for approximately 18 months. On average, 28% of patients (approximately 4000 persons) do not access HIV care within a 12-month period. A list of PLWH who have not been in care is then provided on a monthly basis to bridge-to-care workers, who do outreach to locate and re-link persons to care. The scope of the current work is to provide the bridge-to-care workers with monthly lists of PLWH at high risk of dropping out of care, before they drop out, using the data that is already being collected by CDPH.

3. Data Sources

3.1. HIV Clinic Data

The clinic cohort includes HIV-positive individuals 18 years of age and older who attended at least one medical appointment at the University of Chicago adult HIV care clinic between January 1, 2008 and May 31, 2015. For all eligible patients, the following data is available from the EMR (Electronic Medical Record): demographics, appointment history, insurance information, other medical conditions, medications, HIV care provider, substance use history, and laboratory test results. Laboratory test results collected included HIV viral load, lymphocyte subset data (e.g., CD4 count), sexually transmitted infection (STI) test results, and toxicology test results. Table 1 provides a list of the types of the data provided by the UChicago HIV Clinic.

| Data Type | Fields |
|---|---|
| Demographics | Age, Gender, Race, Address |
| Lab Tests | CD4, Viral Load |
| Appointment History | Number/Date of Appointments with all providers |
| Diagnoses | Psychiatric Illness, Opportunistic Infection, STD, Substance Abuse |
| Medications | All Medications Prescribed |

Table 1. Data provided by UChicago HIV Clinic

Patients’ addresses are geocoded and the travel distance and travel time to the clinic as well as the crime rate along the travel route are calculated (Ridgway et al., 2018). Using data from the American Community Survey (US Census Bureau), characteristics of a patient’s community at the census tract level are collected, specifically, the racial composition, fraction of population on Supplemental Nutrition Assistance Program, commute characteristics, and education levels (United States Census Bureau, ).

3.2. Chicago Department of Public Health Data

| Data Type | Fields |
|---|---|
| Demographics | Age, Gender, Race, Zip Code |
| Lab Tests | CD4 Count, Viral Load |
| Opportunistic Infections | Infection Type (e.g., Pneumonia, Tuberculosis, Cancer) |
| Transmission Risk | How a person was infected with HIV (e.g., Perinatal, Sexual Contact, Percutaneous) |

Table 2. List of Data provided by CDPH

The data used for the city-wide model is the data that the Chicago Department of Public Health routinely collects for epidemiological surveillance of PLWH in the city of Chicago. In order to protect the identity of the patients, the data used for building and testing models is deidentified: names are removed, location is used at the granularity of zip code, and dates are shifted by a predefined number of days. Results are reported based on the shifted dates.
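The deidentification steps described above can be illustrated with a minimal sketch. The offset value, the record layout, and the function names are hypothetical; the text does not say which offset CDPH used or whether it was applied per patient or globally. The sketch assumes a single global offset, which preserves the intervals between events.

```python
from datetime import date, timedelta

# Hypothetical offset; the real value is not given in the text.
SHIFT_DAYS = 9000


def shift_date(d: date, shift_days: int = SHIFT_DAYS) -> date:
    """Shift a date forward by a fixed number of days."""
    return d + timedelta(days=shift_days)


def deidentify(record: dict, shift_days: int = SHIFT_DAYS) -> dict:
    """Drop the name, keep location only as a zip code, and shift all dates."""
    return {
        "zip_code": record["zip_code"],
        "test_dates": [shift_date(d, shift_days) for d in record["test_dates"]],
    }


# Hypothetical example record.
record = {
    "name": "Jane Doe",
    "zip_code": "60637",
    "test_dates": [date(2015, 1, 10), date(2015, 7, 15)],
}
clean = deidentify(record)

# A constant shift leaves the gap between two lab tests unchanged.
gap_before = record["test_dates"][1] - record["test_dates"][0]
gap_after = clean["test_dates"][1] - clean["test_dates"][0]
```

Because every date moves by the same amount, time-based features such as days between tests are unaffected by the shift.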

The evaluated cohort consists of PLWH within the CDPH jurisdiction between the years 2041-2046 (dates were shifted to preserve anonymity) who currently live in the city of Chicago and have had a CD4 or Viral Load test in the last 12 months. Table 2 provides a summarized list of the types of data provided by CDPH.

Demographic data include age, gender, race, zip code, and HIV transmission category (e.g., Perinatal, Sexual Contact, Percutaneous). Lab test data contain viral load tests, CD4 level tests, opportunistic infections, and the date of each test. In HIV care, a viral load test measures the amount of virus present in a person’s blood serum. An HIV patient with a viral load below 200 copies of the virus per mL is considered virally suppressed, unable to spread the virus to others, and able to live a normal lifespan; viral suppression is therefore the goal of HIV care. A CD4 count is a blood test to measure the level of CD4 cells in the body. CD4 cells are a type of white blood cell that is important in the immune system. A person with a CD4 count lower than 200 cells/mm³ is considered immunosuppressed and given a diagnosis of AIDS.

3.2.1. Data Limitations

The data that CDPH collects has several limitations that are typical of what is available to public health departments. EMR data regarding patients’ diagnoses, medications, etc. may be inaccurate if providers do not accurately document and update patient data at each visit. Prior studies have shown wide variability in the accuracy of billing diagnoses and incomplete problem list documentation in the EMR. We attempted to limit inaccuracy due to poor documentation by incorporating multiple fields from the EMR. For example, patients with a history of substance abuse were detected not only by examining billing diagnoses for substance abuse, but also by collecting clinician-assigned diagnoses in the problem list, social history documentation of substance abuse, and toxicology screen results. Furthermore, certain factors that may have an important impact on retention in care, such as life stressors, social support, and child care or other responsibilities, may not be captured within structured fields of the EMR. In the future, we plan to incorporate natural language processing of unstructured clinical notes and case worker interviews into the model to detect these factors.

4. Methodology

We use our open-source machine learning toolkit, Triage (for Data Science and Public Policy, ), to build this system. Triage allows for rapid and iterative creation of end-to-end machine learning systems. Aequitas (Saleiro et al., 2018), a bias and fairness audit tool to inspect results of machine learning models for bias in order to make informed and equitable decisions, was used to measure fairness in models.

4.1. Feature Engineering

Feature design was guided by prior literature and our team's domain expertise in HIV retention. Factors previously shown to be associated with retention in HIV care are age, CD4 count, substance use, psychiatric illness, and history of prior visits. For each feature, measures were aggregated by time (e.g., number in the past six months, mean for the past year, etc.) or space (e.g., the number of assaults in the patient’s residential census tract in the past year). A range of values for time (6 months, 1 year, 3 years, all history) and space (by zip code and census tract) aggregations as well as different aggregation functions (mean, min, max, standard deviation) were calculated for each feature. Categorical variables (such as race and transmission category) were converted to dummy variables using one-hot encoding. Missing data was imputed, with the imputation method depending on the missing variable (e.g., a missing birth date resulted in an age assignment of the mean age of the population). An indicator flag for whether a feature was imputed was used as an additional feature, allowing the model to use missingness itself as a signal.

The HIV Clinic model used categories of features including demographics, diagnoses, location-based features, laboratory test results, medical visits, and specific providers seen, resulting in 800 features. The Department of Public Health model used categories of features including demographics, lab tests, opportunistic infections, transmission category, and diagnosis status, resulting in ~200 features.
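The temporal aggregations and imputation flags described in this section can be sketched in plain Python. This is an illustration only, not the Triage implementation: the function names, the window lengths in days, the feature naming scheme, and the sample CD4 values are all assumptions.

```python
from datetime import date, timedelta
from statistics import mean

# Hypothetical window lengths in days; None means "all history".
WINDOWS = {"6m": 182, "1y": 365, "3y": 1095, "all": None}


def window_features(values, as_of, name, windows=WINDOWS):
    """Aggregate (date, value) pairs observed strictly before `as_of`."""
    features = {}
    for label, days in windows.items():
        start = date.min if days is None else as_of - timedelta(days=days)
        in_window = [v for d, v in values if start <= d < as_of]
        features[f"{name}_{label}_count"] = len(in_window)
        for fn_name, fn in (("mean", mean), ("min", min), ("max", max)):
            key = f"{name}_{label}_{fn_name}"
            # None marks a missing aggregate to be imputed later.
            features[key] = fn(in_window) if in_window else None
    return features


def impute_with_flag(rows, key):
    """Fill missing values with the population mean and add an indicator flag."""
    fill = mean(r[key] for r in rows if r[key] is not None)
    out = []
    for r in rows:
        missing = r[key] is None
        new = dict(r)
        new[key] = fill if missing else r[key]
        new[f"{key}_imputed"] = int(missing)
        out.append(new)
    return out


# Hypothetical CD4 history for one patient, as of a prediction date.
cd4 = [(date(2014, 1, 5), 350.0), (date(2014, 11, 20), 420.0), (date(2015, 3, 1), 500.0)]
features = window_features(cd4, date(2015, 6, 1), "cd4")

# Hypothetical ages with one missing value.
imputed = impute_with_flag([{"age": 30.0}, {"age": None}, {"age": 50.0}], "age")
```

Only observations dated before the prediction date enter an aggregate, so the features never use information from the future.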

4.2. Retention and Access Labels

We looked at predicting two different types of outcomes based on discussions with our partners at CDPH and UCM: 1) retention in care and 2) access to care. Retention in care is defined as attending at least 2 HIV care visits greater than 90 days apart within a 12-month period (Mugavero et al., 2010). This definition of retention is defined by the Health Resources and Services Administration HIV/AIDS Bureau (HRSA HAB). While there is no true gold standard of retention in care, this definition has been shown to be correlated with patient health outcomes including HIV viral suppression (Mugavero et al., 2012). It is the definition of interest to the UCM HIV clinic. Access to care is defined as having a single HIV care visit within a 6 or 12 month period (Mugavero et al., 2012; for Disease Control, 2018). As the name indicates, this label predicts whether a patient will access HIV medical care within a 6 or 12 month period. This metric is used by public health departments for the purposes of surveillance (CDPH, 2018). The risk score predicted by the model can be used to inform and prioritize interventions to improve retention and access to care.
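The two label definitions above can be written down as a short sketch. The function names are hypothetical, and the horizons are approximated in days (365 for 12 months, 183 for 6 months), which is an assumption rather than something stated in the text.

```python
from datetime import date, timedelta


def accessed_care(visits, as_of, horizon_days=365):
    """True if at least one visit falls within the horizon after `as_of`."""
    end = as_of + timedelta(days=horizon_days)
    return any(as_of < v <= end for v in visits)


def retained_in_care(visits, as_of, horizon_days=365, min_gap_days=90):
    """True if two visits within the horizon are more than `min_gap_days` apart."""
    end = as_of + timedelta(days=horizon_days)
    in_window = sorted(v for v in visits if as_of < v <= end)
    if len(in_window) < 2:
        return False
    # Some pair is far enough apart exactly when the first and last visits are.
    return (in_window[-1] - in_window[0]).days > min_gap_days


as_of = date(2014, 1, 1)
spread_out = [date(2014, 2, 1), date(2014, 3, 15), date(2014, 9, 1)]
clustered = [date(2014, 2, 1), date(2014, 3, 15)]

retained_a = retained_in_care(spread_out, as_of)  # visits 212 days apart
retained_b = retained_in_care(clustered, as_of)   # visits only 42 days apart
access_b = accessed_care(clustered, as_of, horizon_days=183)
```

The second patient accesses care but does not meet the retention definition, which shows why the two labels can disagree for the same person. For modeling, the positive class would be the negation of each function, since the system predicts who will not be retained or will not access care.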

5. Modeling Approach and Results

The problem of identifying HIV patients at risk of dropping out of care was cast as a binary classification problem using a variety of labels that are of interest to CDPH and UCM. Training and test sets are created for every month (public health model) or year (clinic model) in order to mimic the business process of the public health department and the clinic, respectively. A variety of classification methods (Decision Trees, Logistic Regression, Random Forest, Gradient Boosted Decision Trees) were trained over a hyperparameter grid to develop models before performing model selection.

5.0.1. Temporal Cross Validation

Model selection was performed using temporal cross validation. Temporal cross validation was used instead of k-fold cross validation to account for serial correlation and temporal effects in the data and to accurately model the business process in deployment. Temporal cross validation also allows us to assess model stability over time, which is not possible in conventional k-fold cross validation. The data were divided into training and test sets split by time. For example, in the proactive outreach scenario (CDPH), if we are assessing the risk of an individual not accessing care within the next 12 months at the time we are selecting individuals to contact (the 1st of every month, for example), then the model is trained at the beginning of every month (e.g., January 1, 2018) using all the information available from the past data (the training set). The model can then predict on all individuals in the cohort for that month (the test set). This mimics how the model will be used in deployment and prevents information from leaking across time.
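A minimal sketch of monthly temporal splits follows. It is not the Triage implementation; the function names and the row layout are hypothetical. It also assumes that a training example is only usable once its 12-month label window has fully elapsed before the prediction date, which is one common way to avoid temporal leakage.

```python
from datetime import date, timedelta


def month_starts(first, last):
    """Yield the first day of each month from `first` to `last`, inclusive."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def temporal_splits(rows, split_dates, label_horizon_days=365):
    """Yield (split_date, train, test) with training labels observed before the split."""
    horizon = timedelta(days=label_horizon_days)
    for split_date in split_dates:
        # Train only on rows whose outcome window closed by the split date.
        train = [r for r in rows if r["as_of"] + horizon <= split_date]
        # Test on the cohort scored on the split date itself.
        test = [r for r in rows if r["as_of"] == split_date]
        if train and test:
            yield split_date, train, test


# Hypothetical data: one row per monthly prediction date.
rows = [{"as_of": d} for d in month_starts(date(2016, 1, 1), date(2018, 3, 1))]
splits = list(temporal_splits(rows, month_starts(date(2018, 1, 1), date(2018, 3, 1))))
train_sizes = [len(train) for _, train, _ in splits]
```

The training set grows with each later split date, mirroring how a deployed model would be retrained on an expanding history.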

5.0.2. Model Selection

Model performance was evaluated using precision with a population threshold based on the resource constraints of the setting. Rather than optimizing the model for AUC, an aggregate metric, model selection is done by locally optimizing the precision-recall space to tune the model to the resource constraints of the deployment setting, providing an accurate measure of performance of the deployed model.

In order to use retention resources efficiently, the system needs to minimize false positives, which minimizes wasting resources on patients who will not drop out of care. To prioritize a small number of individuals for intervention, models are evaluated on precision for the top k%, where k is determined by resource constraints; this ensures the model selected will minimize false positives within the intervention set. The final model was chosen for having consistently high performance over the last five time periods. Specifically, the model that was most frequently within 5 percentage points of the precision of the best possible model over the last five time periods was chosen (e.g., if the best possible precision for a time period was 0.7, all models above 0.65 precision were selected). This method of selection ensures both stability and performance in the final model deployed to the clinic or health department.
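The selection rule above can be sketched as follows. The function names, the tie-breaking behavior, and the precision values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
import math


def precision_at_top_k(scores, labels, k_percent):
    """Precision among the k% of individuals with the highest risk scores."""
    n_flagged = max(1, math.floor(len(scores) * k_percent / 100))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:n_flagged]) / n_flagged


def select_stable_model(precision_by_model, tolerance=0.05, last_n=5):
    """Pick the model most often within `tolerance` of the best model per period."""
    names = list(precision_by_model)
    n_periods = len(precision_by_model[names[0]])
    counts = {name: 0 for name in names}
    for t in range(max(0, n_periods - last_n), n_periods):
        best = max(precision_by_model[name][t] for name in names)
        for name in names:
            if precision_by_model[name][t] >= best - tolerance:
                counts[name] += 1
    # Ties go to the model listed first (an arbitrary choice for this sketch).
    return max(names, key=lambda name: counts[name]), counts


# Hypothetical scores and outcomes (1 = dropped out of care).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
p_at_20 = precision_at_top_k(scores, labels, 20)

# Hypothetical precision@k per time period for three candidate models.
precision_by_model = {
    "random_forest": [0.30, 0.28, 0.31, 0.29, 0.30],
    "logistic_regression": [0.29, 0.20, 0.33, 0.22, 0.31],
    "expert_rules": [0.20, 0.18, 0.15, 0.14, 0.12],
}
selected, counts = select_stable_model(precision_by_model)
```

In this made-up example the logistic regression has the single best period but falls outside the tolerance in two others, so the steadier random forest is selected.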

5.1. HIV Clinic Model

The clinic model is designed to make a prediction at the time of each patient’s HIV care appointment, replicating the workflow (and data available) in the clinic, where the patient arrives for their appointment and then receives a risk score. An intervention can then be initiated during their appointment. In the case of the HIV clinic, models were selected based on precision for the top 10% of risk scores. This value was chosen based on the intervention capacity of the HIV clinic. If the model were to be adopted by other clinics, the threshold can be adjusted to meet the resources of that clinic.

In the cohort used to validate the model, patients had to have attended at least one HIV care appointment. Of these appointments, between 8% and 12% were not followed by a subsequent appointment at least 90 days later within a 12-month period, indicating a lack of retention in care for that time period (Figure 1). Also, approximately 11% of these appointments did not have a subsequent appointment within a six-month period (access to care).

5.1.1. HIV Clinic Access to Care Model

Figure 2. Precision@10% of UCM Access to Care. The best performing model is a Random Forest based on its precision@10% and the stability of results over the last five time periods. The Random Forest model is 3x better than the baseline and 1.7x better than Expert Rules given by domain experts.

| Model | Precision@10% | Number of Appointments Correctly Flagged |
|---|---|---|
| Random Forest | 0.30 ± 0.05 | 21 |
| Logistic Regression | 0.29 ± 0.09 | 21 |
| Expert Rules | 0.18 ± 0.06 | 13 |
| Baseline | 0.11 ± 0.02 | 8 |

Table 3. Model Performance of UCM Access to Care Model

The HIV Clinic Access to Care Model predicts the risk of a patient not accessing care within six months. Models were evaluated based on stability over time and Precision@10%. The best performing model for predicting access to care is a Random Forest (1000 trees, 10 min samples/leaf, no max depth). The Random Forest model is 3x better than the baseline (prior), correctly flagging approximately 3x more appointments, and 1.7x better than Expert Rules provided by domain experts at the clinic. The Expert Rules are based on age, length of time on ART, substance abuse, and viral suppression. Figure 2 shows that a Logistic Regression model has similar performance to the Random Forest but is less consistent over time; therefore, the Random Forest model was selected. The performance of the expert rules is highly inconsistent and decays over time, indicating that the rules do not account for shifts in the data over time. The most predictive features of the UCM access model are appointment history and retention history, particularly the number of days between appointments and the number of completed appointments. This indicates that the history of appointments as well as the cadence with which a patient accesses care is important for remaining in medical care.

5.1.2. UCM Retention in Care Model

The HIV Clinic Retention in Care Model predicts the risk of a patient not being retained in care (not having 2 appointments more than 90 days apart within a 12-month period). This label has been shown to be correlated with effective care and is therefore of great interest to the UCM clinic. Models were evaluated based on stability over time and Precision@10%, matching the clinic’s capacity for intervention (150 appointments/year). The best performing UCM Retention model is a Random Forest (5000 estimators, max depth 5, 10 minimum samples split). The Random Forest model is 2x better than the baseline (prior) and 1.7x better than a simple Decision Tree and a previously published Expert Logistic Regression Model (Ridgway et al., 2018), leading to flagging 18 and 15 more appointments, respectively. The Random Forest model is also considerably more stable over time compared to all other models. The most predictive features of the model include the consecutive days a patient has been retained, the number of days between appointments, and the number of viral load tests where a patient has been virally suppressed. These features indicate that the history of retention and viral suppression are predictive of future retention in care.

Figure 3. Precision@10% of HIV Clinic Retention in Care Model. The best performing model is a Random Forest based on its precision@10% and the stability of results over the last five time periods. The Random Forest model is 2x better than the baseline and 1.7x better than a simple Decision Tree and a previously published Expert Logistic Regression Model.

| Model | Precision@10% | Number of Appointments Correctly Flagged |
|---|---|---|
| Random Forest | 0.25 ± 0.02 | 38 |
| Decision Tree | 0.15 ± 0.04 | 23 |
| Expert Logistic Regression | 0.14 ± 0.04 | 21 |
| Baseline | 0.13 ± 0.10 | 20 |

Table 4. Model Performance of UCM Clinic Retention in Care Model

5.1.3. Feature Importances

In order to sanity check the models and to help explain to clinicians at UCM and public health experts at CDPH what signals the machine learning models were picking up on, we generated feature importances for our selected models. The models for both retention and access to care rely on similar predictor variables, sharing 80% of the top 20 predictors. Behavioral features such as past history of retention in care and previous HIV care encounters were found to be most predictive in both access and retention models. The regression model found demographic features (race, ethnicity, days since diagnosis) to be the most predictive. The best Random Forest initially found demographic features to be important, but the models on later time periods found behavioral features to be more predictive than demographic features.

5.2. Health Department Retention Model

Figure 4. The Department of Public Health model for Access to Care is on average 2.5x better than the baseline and 1.5x better than simply ranking patients by their viral load.
Figure 5. Precision and Recall for Last Month of Access to Care 12 months: The Precision at 1% for the last time-split is 0.57, which results in identifying 92 of the 161 flagged individuals as people who will not access care. This is twice as many people as the baseline rate of 45 of 161.

| Model | Precision@1% | Number of People Correctly Flagged |
|---|---|---|
| Random Forest | 0.65 ± 0.13 | 107 |
| Ranking by Viral Load | 0.43 ± 0.06 | 71 |
| Baseline | 0.28 ± 0.02 | 46 |

Table 5. Model Performance of Department of Public Health Access Care within 12 months Model

CDPH is focused on flagging individuals who are at high risk of not accessing care within 12 months, i.e., whether a person living with HIV will see a doctor within a year from the date of prediction. In the current process, each month CDPH generates a list of approximately 100-150 patients who have already dropped out of care and then spends the month attempting to re-link those patients to medical care. The model is designed to mimic the current process by optimizing for precision for the top 1% (roughly 100-150 people/month).

The specific label used by the Department of Public Health model was the risk of a PLWH not accessing HIV medical care within the next 12 months from the date of prediction. The best model for access to care in 12 months is a Random Forest (1000 trees and max depth of 2) that is 2.3x better than the baseline (prior), meaning the model captures 2.3x as many people (107 people) as simply labeling everyone as out-of-care (46 people). The best performing model was also compared to a more realistic practice, ranking patients by their viral load. Viral load measures how much virus is present in a person’s serum. If a person is not virally suppressed, they are able to transmit the virus to others. The best model was 1.5x better than a simple viral load ranking, indicating that the viral load is not perfectly predictive of whether a patient will access care in the next 12 months. Table 5 summarises the performance of the Random Forest, Ranking, and Baseline.

Figure 5 shows the Precision and Recall for the top k% of last year’s CDPH 12-month access model. The figure can be used as a menu for forecasting the resources needed to achieve the desired results. For instance, CDPH currently has capacity for the top 1% of the HIV population, resulting in a precision@1% of 0.57 (flagging 92/161 people). If, say, CDPH wanted to intervene on half of all people who will not access care (a recall of 50%), they would need to intervene on approximately 30% of the population (5000 people) to achieve that goal. The precision-recall curve can provide a policy menu for understanding the type of results to expect given the amount of resources used for interventions.
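The idea of reading a precision-recall curve as a policy menu can be sketched as follows. The function names and the 20-person toy population are hypothetical and are not the CDPH data.

```python
def policy_menu(scores, labels):
    """One row per list size: (percent of population, precision, recall)."""
    ranked = [
        label
        for _, label in sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    ]
    total_positives = sum(ranked)
    menu, true_positives = [], 0
    for n_flagged, label in enumerate(ranked, start=1):
        true_positives += label
        menu.append(
            (
                100 * n_flagged / len(ranked),
                true_positives / n_flagged,
                true_positives / total_positives,
            )
        )
    return menu


def smallest_list_for_recall(menu, target_recall):
    """Smallest share of the population that reaches the target recall, if any."""
    for percent, precision, recall in menu:
        if recall >= target_recall:
            return percent, precision, recall
    return None


# Hypothetical population of 20 people, 6 of whom will not access care (label 1).
scores = list(range(20, 0, -1))
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0]

menu = policy_menu(scores, labels)
percent, precision, recall = smallest_list_for_recall(menu, target_recall=0.5)
```

In this toy case, reaching half of the people who will not access care requires contacting 20% of the population at a precision of 0.75. An agency could read its own trade-off off the menu in the same way.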

The top predictive features of the model are previous appointment history, viral load tests, and CD4 tests. Notably, demographics, transmission category, diagnosis status, and zip code were not as predictive in the model in the later years. This is typical in machine learning models where demographic variables are not as predictive as behavioral variables, especially as the amount of behavioral data increases.

6. Bias and Fairness of our Models

Machine learning models deployed in these two settings with many at-risk groups involved have the potential to disproportionately affect some sub-groups and exacerbate disparities. It is not sufficient to only select models based on their efficiency and effectiveness. For instance, if the rate of dropping out of care were the same for white and black HIV patients, but the model consistently selected white PLWH for intervention, the act of using the model for intervention would create a racial disparity. Models should be audited for biases as a part of the model selection process. A model should then be selected based on both its performance and fairness.

The goal of a system should be taken into account when deciding the methodology used for measuring bias (Rodolfa et al., 2020). This system prioritizes individuals for an assistive intervention to ensure they remain in medical care. A patient at risk for retention failure who does not receive an intervention loses an opportunity to have the underlying challenges that cause their risk addressed. In this deployment setting, where people are being flagged for an assistive intervention, bias is measured through metrics that measure disproportionate false negatives. Disproportionately failing to detect people of a certain group who are at risk for retention failure is more harmful than producing false positives, where resources are wasted, because it can create a disparity between groups. The system should, therefore, not disproportionately miss any at-risk groups, as this type of bias could exacerbate an already existing disparity or create new disparities. A false positive risk assessment carries less negative impact to the patient, though resource allocation can become more inefficient as interventions are wasted on patients who are falsely identified as high risk, leading to a trade-off between efficiency and fairness.

An ideal model in production would be both efficient at reaching individuals (as captured by precision) and have minimal bias in missing individuals. To measure bias, we calculate the False Omission Rate (FOR), the ratio of the number of false negatives to the number of negative predictions, for protected groups. Given the racial composition of the population of PLWH in Chicago, we focused our attention on auditing models for parity in FOR by race. We considered a model to be disparate if its FOR ratio of Black vs White is less than 0.9 or greater than 1.1 (indicated by the purple band in the figures). Models were audited for bias using Aequitas (Saleiro et al., 2018), a Python toolkit for auditing models for bias.
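The FOR parity check described above can be illustrated from first principles. This sketch does not use the Aequitas API; the function names and the toy data are hypothetical, and only the 0.9 to 1.1 parity band comes from the text.

```python
def false_omission_rate(flagged, outcomes):
    """False negatives divided by all negative predictions (people not flagged)."""
    not_flagged = [y for f, y in zip(flagged, outcomes) if not f]
    if not not_flagged:
        return None
    return sum(not_flagged) / len(not_flagged)


def for_by_group(flagged, outcomes, groups):
    """Compute the FOR separately for each group."""
    rates = {}
    for group in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == group]
        rates[group] = false_omission_rate(
            [flagged[i] for i in idx], [outcomes[i] for i in idx]
        )
    return rates


def for_disparity(rates, group, reference, band=(0.9, 1.1)):
    """Return the FOR ratio and whether it falls inside the parity band."""
    ratio = rates[group] / rates[reference]
    return ratio, band[0] <= ratio <= band[1]


# Hypothetical data: 10 people per group, 2 flagged in each (1 = dropped out).
groups = ["Black"] * 10 + ["White"] * 10
flagged = ([1, 1] + [0] * 8) * 2
outcomes = [1, 0, 1, 1, 1, 1, 0, 0, 0, 0] + [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

rates = for_by_group(flagged, outcomes, groups)
ratio, within_band = for_disparity(rates, "Black", "White")
```

Here the model misses 4 of 8 unflagged Black patients but only 2 of 8 unflagged White patients, giving a ratio of 2.0 that falls outside the parity band. A model with this pattern would be considered disparate under the criterion above.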

Figures 6, 7, and 8 capture the criteria for an optimal model: performance on the x-axis and bias on the y-axis. The purple band is the parity band, with FOR ratio between 0.9 and 1.1. The dark lines mark the 25th and 75th percentiles and the lighter lines mark the minimum and maximum bias/performance observed, representing the stability of both performance and bias.
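The spread shown in these figures can be summarized with a few order statistics. The following is a sketch under stated assumptions: the paper does not say how the percentiles were computed, so the inclusive quantile method, the function name `summarize_for_ratios`, and the example values are all hypothetical.

```python
import statistics


def summarize_for_ratios(ratios, low=0.9, high=1.1):
    """Summarize FOR ratios observed across temporal validation splits.

    Returns the minimum, quartiles, and maximum (the light and dark lines
    in the figures) plus the share of splits inside the parity band.
    Assumes at least two observations.
    """
    ordered = sorted(ratios)
    q25, median, q75 = statistics.quantiles(ordered, n=4, method="inclusive")
    in_band = sum(low <= r <= high for r in ordered)
    return {
        "min": ordered[0],
        "q25": q25,
        "median": median,
        "q75": q75,
        "max": ordered[-1],
        "share_in_parity_band": in_band / len(ordered),
    }


# Hypothetical FOR ratios, one per time split (not results from the paper).
split_ratios = [1.0, 1.1, 1.2, 1.3, 1.4]
summary = summarize_for_ratios(split_ratios)
print(summary)
```

The same summary applied to precision values gives the horizontal extent of each model in the bias vs performance plots.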

6.1. HIV Clinic Bias Audit

6.1.1. HIV Clinic Retention in Care Bias Analysis

Figure 6. Bias in HIV Clinic Retention Models (Top) False Omission Rate among Black vs White in the model (black) compared to the logistic regression model (red) and Decision Tree model. Notably, the bias appears to decrease in later time periods: the FOR ratio falls within the purple parity band more often in later years. (Bottom) FOR ratio compared to the precision@10%. All the models are on average biased, but the random forest model has the best performance and the models have less bias in the later time periods. In this setting, the model selected for deployment is the random forest model, since all other models have a similar FOR ratio (bias) but worse performance.

The selected model for retention in care had a FOR of 0.26 ± 0.16 for black patients compared to 0.31 ± 0.17 for white patients (Figure 6). The expert logistic regression model had a FOR of 0.27 ± 0.17 and 0.32 ± 0.17 for black and white patients respectively. Notably, as Figure 6 shows, the bias appears to decrease in later time periods, potentially indicating that the disparity is shrinking with time. The FOR ratios for these models are all similar, indicating that the best model to choose is the best performing one.

6.1.2. HIV Clinic Access to Care Bias Analysis

Figure 7. Bias in HIV Clinic Access Model (Top) False Omission Rate among Black vs White in the UCM access model (black) compared to the expert rules (red) and logistic regression (green) over time. While the Random Forest model has greater variance than the logistic regression or expert rules model, it falls within the parity band (purple) more often than the regression model. (Bottom) FOR for Black/African-American vs White compared to the precision@10%. An ideal model will be farther right, with higher precision, and fall within the parity band. The Random Forest model's average FOR falls within the parity band and has a higher precision@10% than the logistic regression model and expert rules, making it the chosen model.

The selected model for access to care in six months had a FOR of 0.24 ± 0.04 for black patients compared to 0.25 ± 0.08 for white patients (Figure 7). The expert rules model had a FOR of 0.26 ± 0.05 and 0.29 ± 0.08 for black and white patients respectively. Note that the FOR ratios are calculated over a relatively small sample (120 appointments/year are flagged as high risk), so this metric is susceptible to variation due to small population size. While the expert rules model has slightly less bias than the random forest, the trade-off in performance is high. This reduced performance, together with the instability in the performance of the expert rules model, makes the Random Forest the model of choice.

6.2. Health Department Bias Audit Results

Figure 8. Bias in Public Health Retention model (Top) FOR ratio of Black/African-American vs White in our models compared to the ranking metric. (Bottom) FOR ratio vs precision@1%. As with the HIV Clinic retention model, there is not a significant difference in bias among the models, indicating that the random forest, which performs best based on precision@1%, is the best choice of model for deployment. Note that for all the models, the FOR ratio of Black vs White is above 1.2, indicating that the models disproportionately miss Black patients more often than white patients.

The bias audit of the CDPH Access model found that the model was slightly biased towards missing more Black/African-American at-risk individuals (missing 20% more) than white individuals, but performed better than competing methods, including the expert rules. The selected model for access to care had a FOR of 0.30 ± 0.02 for black/African-American patients compared to a FOR of 0.24 ± 0.01 for white patients (Figure 8). This bias should be monitored before deployment to understand whether it is increasing or decreasing disparity in retention outcomes. Models were also audited for bias across gender (male versus female) and risk category (MSM vs non-MSM), and we did not find significant bias for those attributes. In addition, we tested across the model space and all models had similar or worse bias. For simplicity, we omitted these models from the figure. Given the similarity in bias across models, the best choice in this setting is to select the best performing model and conduct further tests to ensure that it does not result in unfair outcomes.

In this setting, bias auditing is an important part of model selection; akin to selecting a model optimized for performance, a model can be optimized for fairness constraints to prevent disparity among protected groups.
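One way to make this selection logic concrete is sketched below. It is an illustration under assumptions, not the authors' procedure: the `Candidate` class, the `tie_tolerance` parameter, and all numeric values are hypothetical. The rule prefers the most precise model inside the parity band and, when no model is inside the band, falls back to the most precise model among those whose disparity is close to the smallest one observed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    precision: float  # e.g. mean precision at the top k% over time splits
    for_ratio: float  # e.g. mean Black vs White FOR ratio over time splits


def disparity(candidate):
    """Distance of the FOR ratio from perfect parity (ratio of 1.0)."""
    return abs(candidate.for_ratio - 1.0)


def select_model(candidates, low=0.9, high=1.1, tie_tolerance=0.05):
    """Pick the best performing model subject to a fairness constraint."""
    in_band = [c for c in candidates if low <= c.for_ratio <= high]
    if in_band:
        pool = in_band
    else:
        # No model meets parity: keep models whose bias is roughly as small
        # as the least biased one, then choose on performance.
        smallest = min(disparity(c) for c in candidates)
        pool = [c for c in candidates if disparity(c) - smallest <= tie_tolerance]
    return max(pool, key=lambda c: c.precision)


# Illustrative values only; these are not results from the paper.
similar_bias = [
    Candidate("model_a", precision=0.60, for_ratio=1.25),
    Candidate("model_b", precision=0.40, for_ratio=1.24),
    Candidate("model_c", precision=0.30, for_ratio=1.27),
]
one_in_band = [
    Candidate("model_d", precision=0.50, for_ratio=1.05),
    Candidate("model_e", precision=0.60, for_ratio=1.30),
]
chosen_similar = select_model(similar_bias)
chosen_in_band = select_model(one_in_band)
print(chosen_similar.name, chosen_in_band.name)
```

A selection made through the fallback branch still violates parity, so it should be paired with continued monitoring of the disparity.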

7. Impact and Conclusions

This system demonstrates the potential of machine learning models to identify HIV patients who are at the highest risk of dropping out of medical care. The system can be used for point-of-care interventions in a clinic as well as proactive outreach by a public health department. It has been specifically implemented for the University of Chicago HIV clinic and the Chicago Department of Public Health. Moreover, this methodology facilitates model selection based on performance under resource constraints, stability of performance over time, and fairness. While most prior work regarding retention in care examines factors associated with retention at a single point in time, our model dynamically predicts retention longitudinally. Patients' appointment attendance patterns change over time, with patients often transitioning in and out of care (Lee et al., 2018). The system provides a risk score at the visit level and recalculates the score as new data becomes available.

Bias in models can have an unexpected long-term adverse impact on protected groups. To our knowledge, this is also the first use of bias auditing of predictive models in an HIV care setting. We hope this work will engender further work to understand how to mitigate the risk of exacerbating disparities in more than just the HIV care setting.

Our methodology jointly considers fairness, performance, and stability, allowing for greater control over the real-world impact of the model. It allows the clinic or public health department to balance these potentially competing goals. Other sites can replicate the process presented here for extracting electronic data and incorporating them into machine learning systems using the Triage framework (Center for Data Science and Public Policy) and our open source code (http://www.github.com/dssg).

Ethical Review of Study and Waiver of Consent

University of Chicago HIV Clinic: The UCM portion of this study was approved by the University of Chicago Institutional Review Board (IRB). The IRB waived the need for informed consent as part of the study approval. Research was carried out in accordance with the ethical standards in the Declaration of Helsinki.

Chicago Department of Public Health: The CDPH portion of this study was exempt from IRB approval.

Acknowledgements.
The authors would like to acknowledge funding from the NIH-funded Third Coast Center for AIDS Research (CFAR) (P30 AI117943), the Institute for Translational Medicine (UL1 TR000430), and the National Institutes of Health (K23MH121190-01).

References

  • [1] External Links: Link Cited by: §1.
  • A. Aidala (2005) Homelessness, housing instability and housing problems among persons living with HIV/AIDS. First National Housing and HIV/AIDS Research Summit, Emory University, Atlanta, GA. Cited by: §1.
  • E. A. Almirol, N. Lancki, J. Schmitt, R. Eavou, M. Taylor, D. Pitrak, and J. P. Ridgway (2016) HIV Care and Engagement: Demographics and Risk Factors associated with Retention and Viral Suppression in Chicago, IL. Poster. External Links: Link Cited by: §1.
  • J. Baillargeon, T. P. Giordano, J. D. Rich, Z. H. Wu, K. Wells, B. H. Pollock, and D. P. Paar (2009) Accessing antiretroviral therapy following release from prison. JAMA 301 (8), pp. 848–857 (eng). External Links: ISSN 1538-3598, Document Cited by: §1.
  • CDPH (2018) Chicago Department of Public Health HIV/STI Surveillance Report. Cited by: §1, §4.2.
  • J. A. Cook, D. D. Grey, J. K. Burke-Miller, M. H. Cohen, D. Vlahov, F. Kapadia, T. E. Wilson, R. Cook, R. Schwartz, E. Golub, K. Anastos, C. Ponath, L. Goparaju, and A. Levine (2007) Illicit Drug Use, Depression and their Association with Highly Active Antiretroviral Therapy in HIV-Positive Women. Drug and alcohol dependence 89 (1), pp. 74–81. External Links: ISSN 0376-8716, Link, Document Cited by: §1.
  • H. L. F. Cooper, S. Linton, M. E. Kelley, Z. Ross, M. E. Wolfe, Y. Chen, M. Zlotorzynska, J. Hunter-Jones, S. R. Friedman, D. C. Des Jarlais, B. Tempalski, E. DiNenno, D. Broz, C. Wejnert, G. Paz-Bailey, and National HIV Behavioral Surveillance Study Group (2016) Risk Environments, Race/Ethnicity, and HIV Status in a Large Sample of People Who Inject Drugs in the United States. PLoS ONE 11 (3), pp. e0150410. External Links: ISSN 1932-6203, Link, Document Cited by: §1.
  • C. O. Cunningham, J. Buck, F. M. Shaw, L. S. Spiegel, M. Heo, and B. D. Agins (2014) Factors associated with returning to HIV care after a gap in care in New York State. Journal of Acquired Immune Deficiency Syndromes (1999) 66 (4), pp. 419–427 (eng). External Links: ISSN 1944-7884, Document Cited by: §1.
  • [9] Center for Data Science and Public Policy, University of Chicago. Triage: Risk Modeling and Prediction. External Links: Link Cited by: §4, §7.
  • Centers for Disease Control and Prevention (2018) Understanding the HIV Care Continuum. External Links: Link Cited by: §1, §1, §4.2.
  • E. M. Gardner, M. P. McLees, J. F. Steiner, C. del Rio, and W. J. Burman (2011) The Spectrum of Engagement in HIV Care and its Relevance to Test-and-Treat Strategies for Prevention of HIV Infection. Clinical Infectious Diseases: An Official Publication of the Infectious Diseases Society of America 52 (6), pp. 793–800. External Links: ISSN 1058-4838, Link, Document Cited by: §1.
  • L. I. Gardner, T. P. Giordano, G. Marks, T. E. Wilson, J. A. Craw, M. Drainoni, J. C. Keruly, A. E. Rodriguez, F. Malitz, R. D. Moore, L. A. Bradley-Springer, S. Holman, C. E. Rose, S. Girde, M. Sullivan, L. R. Metsch, M. Saag, M. J. Mugavero, and Retention in Care Study Group (2014) Enhanced personal contact with HIV patients improves retention in primary care: a randomized trial in 6 US HIV clinics. Clinical Infectious Diseases: An Official Publication of the Infectious Diseases Society of America 59 (5), pp. 725–734 (eng). External Links: ISSN 1537-6591, Document Cited by: §1.
  • E. H. Geng, D. Nash, A. Kambugu, Y. Zhang, P. Braitstein, K. A. Christopoulos, W. Muyindike, M. B. Bwana, C. T. Yiannoutsos, M. L. Petersen, et al. (2010) Retention in care among HIV-infected patients in resource-limited settings: emerging insights and new directions. Current HIV/AIDS Reports 7 (4), pp. 234–244. Cited by: §1.
  • T. P. Giordano, C. Hartman, A. L. Gifford, L. I. Backus, and R. O. Morgan (2009) Predictors of retention in HIV care among a national cohort of US veterans. HIV clinical trials 10 (5), pp. 299–305 (eng). External Links: ISSN 1528-4336, Document Cited by: §1.
  • T. P. Giordano, F. Visnegarwala, A. C. White, C. L. Troisi, R. F. Frankowski, C. M. Hartman, and R. M. Grimes (2005) Patients referred to an urban HIV clinic frequently fail to establish care: factors predicting failure. AIDS care 17 (6), pp. 773–783 (eng). External Links: ISSN 0954-0121, Document Cited by: §1.
  • O. Grinstead Reznick, M. Comfort, K. McCartney, and T. B. Neilands (2011) Effectiveness of an HIV prevention program for women visiting their incarcerated partners: the HOME Project. AIDS and behavior 15 (2), pp. 365–375 (eng). External Links: ISSN 1573-3254, Document Cited by: §1.
  • D. Hedrich, P. Alves, M. Farrell, H. Stöver, L. Møller, and S. Mayet (2012) The effectiveness of opioid maintenance treatment in prison settings: a systematic review. Addiction (Abingdon, England) 107 (3), pp. 501–517 (eng). External Links: ISSN 1360-0443, Document Cited by: §1.
  • D. H. Higa, G. Marks, N. Crepaz, A. Liau, and C. M. Lyles (2012) Interventions to improve retention in HIV primary care: a systematic review of U.S. studies. Current HIV/AIDS reports 9 (4), pp. 313–325 (eng). External Links: ISSN 1548-3576, Document Cited by: §1.
  • T. L. HIV (2019) U=U taking off in 2017. The Lancet HIV 4 (11), pp. e475. External Links: Link, Document Cited by: §1.
  • E. Horstmann, J. Brown, F. Islam, J. Buck, and B. D. Agins (2010) Retaining HIV-infected patients in care: Where are we? Where do we go from here?. Clinical Infectious Diseases: An Official Publication of the Infectious Diseases Society of America 50 (5), pp. 752–761 (eng). External Links: ISSN 1537-6591, Document Cited by: §1.
  • H. Lee, X. K. Wu, B. L. Genberg, M. J. Mugavero, S. R. Cole, B. Lau, J. W. Hogan, and Centers for AIDS Research Network of Integrated Clinical Systems (CNICS) Investigators (2018) Beyond binary retention in HIV care: predictors of the dynamic processes of patient engagement, disengagement, and re-entry into care in a US clinical cohort. AIDS (London, England) 32 (15), pp. 2217–2225 (eng). External Links: ISSN 1473-5571, Document Cited by: §7.
  • K. H. Mayer, L. Wang, B. Koblin, S. Mannheimer, M. Magnus, C. del Rio, S. Buchbinder, L. Wilton, V. Cummings, C. C. Watson, E. Piwowar-Manning, C. Gaydos, S. H. Eshleman, W. Clarke, T. Liu, C. Mao, S. Griffith, D. Wheeler, and HPTN061 Protocol Team (2014) Concomitant socioeconomic, behavioral, and biological factors associated with the disproportionate HIV infection burden among Black men who have sex with men in 6 U.S. cities. PloS One 9 (1), pp. e87298 (eng). External Links: ISSN 1932-6203, Document Cited by: §1.
  • R. McFadden, A. Bouris, D. Voisin, N. Glick, and J. Schneider (2014) Dynamic social support networks of younger black men who have sex with men with new HIV infection. AIDS care 26 (10), pp. 1275–1282. External Links: ISSN 0954-0121, Link, Document Cited by: §1.
  • M. J. Mugavero, J. A. Davila, C. R. Nevin, and T. P. Giordano (2010) From access to engagement: measuring retention in outpatient HIV clinical care. AIDS patient care and STDs 24 (10), pp. 607–613 (eng). External Links: ISSN 1557-7449, Document Cited by: §4.2.
  • M. J. Mugavero, A. O. Westfall, A. Zinski, J. Davila, M. Drainoni, L. I. Gardner, J. C. Keruly, F. Malitz, G. Marks, L. Metsch, T. E. Wilson, T. P. Giordano, and Retention in Care (RIC) Study Group (2012) Measuring retention in HIV care: the elusive gold standard. Journal of Acquired Immune Deficiency Syndromes (1999) 61 (5), pp. 574–580 (eng). External Links: ISSN 1944-7884, Document Cited by: §4.2.
  • N. L. Okeke, J. Ostermann, and N. M. Thielman (2014) Enhancing linkage and retention in HIV care: a review of interventions for highly resourced and resource-poor settings. Current HIV/AIDS reports 11 (4), pp. 376–392 (eng). External Links: ISSN 1548-3576, Document Cited by: §1.
  • A. Pecoraro, C. Royer-Malvestuto, B. Rosenwasser, K. Moore, A. Howell, M. Ma, and G. E. Woody (2013) Factors contributing to dropping out from and returning to HIV treatment in an inner city primary care HIV clinic in the United States. AIDS care 25 (11), pp. 1399–1406 (eng). External Links: ISSN 1360-0451, Document Cited by: §1.
  • J. P. Ridgway, E. A. Almirol, A. Bender, A. Richardson, J. Schmitt, E. Friedman, N. Lancki, I. Leroux, N. Pieroni, J. Dehlin, and J. A. Schneider (2018) Which patients in the emergency department should receive preexposure prophylaxis? implementation of a predictive analytics approach. AIDS patient care and STDs 32 (5), pp. 202–207. External Links: Document, ISBN 1557-7449; 1087-2914, Link Cited by: §5.1.2.
  • J. P. Ridgway, E. A. Almirol, J. Schmitt, T. Schuble, and J. A. Schneider (2018) Travel Time to Clinic but not Neighborhood Crime Rate is Associated with Retention in Care Among HIV-Positive Patients. AIDS and behavior 22 (9), pp. 3003–3008 (eng). External Links: ISSN 1573-3254, Document Cited by: §3.1.
  • K. T. Rodolfa, E. Salomon, L. Haynes, I. H. Mendieta, J. Larson, and R. Ghani (2020) Case study: predictive fairness to reduce misdemeanor recidivism through social service interventions. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, FAT* ’20, New York, NY, USA, pp. 142–153. External Links: ISBN 9781450369367, Link, Document Cited by: §6.
  • P. Saleiro, B. Kuester, A. Stevens, A. Anisfeld, L. Hinkson, J. London, and R. Ghani (2018) Aequitas: A Bias and Fairness Audit Toolkit. arXiv:1811.05577 [cs]. Note: arXiv: 1811.05577 External Links: Link Cited by: §4, §6.
  • [32] United States Census Bureau. American Community Survey, 2008-2015 American Community Survey 5-year estimates. Note: http://www.census.gov/ Accessed Feb 22, 2018 Cited by: §3.1.
  • J. A. Zuniga, M. Yoo-Jeong, T. Dai, Y. Guo, and D. Waldrop-Valverde (2016) The Role of Depression in Retention in Care for Persons Living with HIV. AIDS patient care and STDs 30 (1), pp. 34–38 (eng). External Links: ISSN 1557-7449, Document Cited by: §1.