Racial and ethnic disparities in access to healthcare in the United States are well known and well documented. Health disparities are defined as differences in health outcomes and their causes among different groups of people. Health equity is achieved when everyone has the same opportunity to be as healthy as possible.
The types of health disparities in the US healthcare system are well understood, but their causes are complex [2, 3]: they include income, education, socio-economic conditions, neighborhood and community influence, public policy, and societal structure. Achieving health equity likewise requires a complex set of programs and interventions, and several public and private initiatives have tried to address this problem over the past decades.
Obermeyer and Mullainathan analyze the significant racial bias in an algorithm that drives health decisions for over 70 million people in the United States. They find that black patients with the highest health risk have significantly more chronic illnesses than white patients with the same risk score. A key observation they make is that hospitals and insurers treat healthcare expenses as a strong proxy for healthcare needs. This is an imperfect assumption, and it means that algorithms that predict expenses accurately are biased with respect to health. Our work illustrates a similar phenomenon in the publicly available, and nationally representative, Medical Expenditure Panel Survey (MEPS) dataset.
In this work, we focus on inequity in health indicators in terms of race in the US population. Using the MEPS data, we show that among the top decile of high-expense white and black patients, blacks have worse health indicators. We also demonstrate that this carries over to statistical machine learning models for care management that use MEPS data to predict whether an individual will incur high expense. Finally, we illustrate how simple bias mitigation methods [6, 7] can be used to make these prediction models fairer, such that blacks and whites in the predicted top decile show a smaller health disparity.
2 Description of MEPS data
The Medical Expenditure Panel Survey (MEPS) dataset is produced by the US Department of Health and Human Services. It is a collection of surveys of families and individuals, medical providers, and employers across the country. Datasets are available from 1996 onward and contain two major components: the household component and the insurance component. In this work we use the household component data, which contains detailed information on demographic characteristics, health conditions and status, healthcare utilization, access to care, health insurance coverage, income, employment, and charges and sources of payment.
A single panel consists of unique individuals interviewed in five rounds over two full calendar years. Every year, a new cohort is started so that there are two partially overlapping panels being conducted during any calendar year - rounds 1-3 of one panel overlap with rounds 3-5 of the previous panel. In any given dataset, each sample is weighted so that the total weight in each panel sums to the entire US civilian, non-institutionalized population.
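The role of the sample weights can be illustrated with a small sketch. The records and weights below are hypothetical toy values, not MEPS data, and the text does not state whether the reported statistics were computed with weights; the sketch only shows how a weighted mean differs from an unweighted one.

```python
def weighted_mean(values, weights):
    """Mean of `values` where record i represents `weights[i]` people."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical toy records: annual expense (dollars) and survey weight.
expenses = [1000.0, 5000.0, 20000.0]
weights = [3000.0, 1000.0, 1000.0]

unweighted = sum(expenses) / len(expenses)
weighted = weighted_mean(expenses, weights)
print(f"unweighted mean: {unweighted:.2f}, weighted mean: {weighted:.2f}")
```

Because the low-expense record stands for more people, the weighted mean is pulled below the unweighted one.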
3 Racial bias in the MEPS data
To explore the presence of racial bias in the MEPS data, we considered 2-year longitudinal data for the cohort initiated in 2015 (panel 20) and surveyed over five rounds during 2015-2016. We restricted the population to individuals who provided data during all five rounds, and indicated their ethnicity/race as non-Hispanic white, or non-Hispanic black. We did a similar analysis of another cohort started in 2014 and covering five rounds over 2014-2015 (panel 19).
One of the main attributes we studied was total healthcare expenditure. Healthcare expenditure, as a proxy for healthcare utilization, is increasingly used to identify patients for care and disease management, to reduce expenses, and to evaluate the effects of different policies [8, 4, 9, 10, 11]. On the modeling side, a significant amount of work has been done on predicting future healthcare expenses and identifying people likely to incur high medical expenditure, but little has been done to understand and quantify the effect of racial disparities.
Since one is typically interested in future healthcare expenditure, we specifically looked at total healthcare expenditure incurred by an individual in the second year (2016). We considered the entire population, as well as the top expenditure decile. The 10% threshold has been commonly used in prior work on building prospective high medical expenditure models [8, 12].
In addition to total medical expenditure, we also looked at direct healthcare utilization, as measured by visits to the emergency room (ER) and the number of inpatient nights (IP) during the second year. These outcomes have long been considered as alternatives to total healthcare expenditure and have been used to guide various expense and patient management programs, as somewhat better proxies for patient health than expenditure. We analyzed these outcome metrics with respect to various individual health metrics in the first year of the panel. Prior work has demonstrated that the count of chronic conditions, as well as self-assessed health and limitations [13, 9, 8], is significantly associated with future healthcare expenditures: a sicker patient typically has higher resultant medical expenditure and utilization.
To obtain a count of chronic conditions for an individual, we looked at the priority conditions: a set of conditions, including high blood pressure, diabetes, cancer, and stroke, that have been designated as such because of their frequency, expense, and importance to policy. We treated four heart-related conditions (coronary heart disease, angina, myocardial infarction, and other unspecified heart disease) as a single condition for the sake of our analysis. We also considered two self-assessed health status measures, one for perceived health status and one for perceived mental health status. Both measures were rated from 1 (excellent) to 5 (poor).
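The counting rule described above can be sketched as follows. The condition names are hypothetical labels chosen for illustration; the actual MEPS variable names are not given in the text.

```python
# Hypothetical condition labels; actual MEPS variable names are not given here.
HEART_CONDITIONS = {
    "coronary_heart_disease",
    "angina",
    "myocardial_infarction",
    "other_heart_disease",
}


def count_priority_conditions(diagnosed):
    """Count diagnosed priority conditions, treating all heart conditions as one."""
    diagnosed = set(diagnosed)
    non_heart = diagnosed - HEART_CONDITIONS
    has_heart = bool(diagnosed & HEART_CONDITIONS)
    return len(non_heart) + (1 if has_heart else 0)


# Two heart-related diagnoses collapse into a single condition.
patient = ["high_blood_pressure", "angina", "myocardial_infarction", "diabetes"]
n_conditions = count_priority_conditions(patient)
print(n_conditions)
```

Under this rule, a patient with two heart-related diagnoses, high blood pressure, and diabetes is counted as having three conditions rather than four.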
Tables 1 and 2 show the outcome and health indicator metrics for the entire population as well as for those in the top decile of healthcare expenditure during the second year. As expected, individuals in the top decile are significantly more expensive than the overall population. More importantly, there is substantial disparity across races, both overall and in the high-expense group. Overall, blacks incur substantially less expenditure, on average, than whites ($4K versus $5.9K), even though blacks are typically sicker than whites across almost all metrics. In the top expenditure group, blacks cost slightly more than whites but are considerably sicker, and the gap is wider than in the overall population, as measured both by outcomes (ER visits and IP nights) and by priority conditions and health status. This echoes the findings of Obermeyer and Mullainathan, in that blacks at the same risk level are typically much sicker than whites. Tables 3 and 4 show similar results for the Panel 19 (2014-2015) data, although blacks incurred substantially less expense than whites even in the high-expense group in that dataset.
More importantly, there is substantial disparate impact across races in the rate at which blacks and whites fall into the top expenditure decile. Only 7.1% of blacks incurred top-decile expense versus 10.7% of whites in 2015-2016, and only 6.8% of blacks versus 10.6% of whites in 2014-2015. Thus, not only are blacks sicker than whites at the same risk level, they are also less likely to incur high expenses. As we show in Section 4, this bias will likely be inherited by machine learning models designed to identify high-expense individuals.
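One common way to summarize such a difference in rates is as a ratio of the two selection rates. The sketch below applies that ratio to the percentages quoted above; the choice of this particular summary is an assumption made for illustration, since the text does not name a specific disparate impact statistic.

```python
def disparate_impact(rate_group_a, rate_group_b):
    """Ratio of two selection rates; 1.0 means both groups are selected equally often."""
    return rate_group_a / rate_group_b


# Share of each race in the top expenditure decile, as quoted in the text.
di_panel20 = disparate_impact(0.071, 0.107)  # blacks vs whites, 2015-2016
di_panel19 = disparate_impact(0.068, 0.106)  # blacks vs whites, 2014-2015

print(f"Panel 20 ratio: {di_panel20:.3f}")
print(f"Panel 19 ratio: {di_panel19:.3f}")
```

In both panels the ratio is roughly two thirds, meaning that blacks fall into the top decile at about two thirds the rate of whites.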
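Table 1: Second-year (2016) expenditure and utilization outcomes for the entire population and for the top decile of second-year expenditure (Panel 20, 2015-2016).

| Summary metric | Value |
|---|---|
| Average expense, entire population (both races) | $5.6K |
| Average expense, top decile (both races) | $34.9K |
| Top decile expense (both races) | $13.6K |
| % of whites in top decile | 10.7% |
| % of blacks in top decile | 7.1% |

| Metric | Entire population: White | Entire population: Black | Top decile: White | Top decile: Black |
|---|---|---|---|---|
| Average number of ER visits | 0.18 | 0.21 | 0.62 | 0.83 |
| Average number of IP nights | 0.33 | 0.45 | 2.61 | 4.91 |
| % with ER visits | 12.9% | 15.5% | 40.4% | 48% |
| % with IP nights | 6.8% | 6.8% | 44.7% | 54.3% |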
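Table 2: First-year (2015) health indicators for the entire population and for the top decile of second-year expenditure (Panel 20, 2015-2016).

| Metric | Entire population: White | Entire population: Black | Top decile: White | Top decile: Black |
|---|---|---|---|---|
| Average number of priority conditions | 1.97 | 1.8 | 3.5 | 3.8 |
| Average perceived physical health status | 2.08 | 2.23 | 2.84 | 3.01 |
| Average perceived mental health status | 1.82 | 1.85 | 2.27 | 2.48 |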
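Table 3: Second-year (2015) expenditure and utilization outcomes for the entire population and for the top decile of second-year expenditure (Panel 19, 2014-2015).

| Summary metric | Value |
|---|---|
| Average expense, entire population (both races) | $6.2K |
| Average expense, top decile (both races) | $40K |
| Top decile expense (both races) | $15K |
| % of whites in top decile | 10.6% |
| % of blacks in top decile | 6.8% |

| Metric | Entire population: White | Entire population: Black | Top decile: White | Top decile: Black |
|---|---|---|---|---|
| Average number of ER visits | 0.21 | 0.20 | 0.81 | 0.74 |
| Average number of IP nights | 0.5 | 0.4 | 4.07 | 4.40 |
| % with ER visits | 14.2% | 15% | 42.6% | 45.7% |
| % with IP nights | 7.1% | 6.8% | 48.1% | 54.4% |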
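Table 4: First-year (2014) health indicators for the entire population and for the top decile of second-year expenditure (Panel 19, 2014-2015).

| Metric | Entire population: White | Entire population: Black | Top decile: White | Top decile: Black |
|---|---|---|---|---|
| Average number of priority conditions | 2.0 | 1.8 | 3.69 | 3.74 |
| Average perceived physical health status | 2.12 | 2.19 | 2.98 | 3.09 |
| Average perceived mental health status | 1.84 | 1.85 | 2.24 | 2.34 |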
4 Predicting individuals that have high expected medical expenditure using the raw MEPS data
We built a logistic regression model to predict second-year total medical expenditure of individuals, based on their demographics as well as health-related attributes in the first year. Besides features such as age, gender, and race, we used features on diagnoses received for various priority conditions (high blood pressure, diabetes, heart disease, cancer, etc.) as well as the count of these chronic conditions, physical and mental health assessments, and limitations (such as cognitive, hearing, or vision limitations). We specifically left out certain features, such as prior-year healthcare expenditure, income, and employment status, that are known to be strong predictors of future healthcare expenditure [9, 11, 10]. Two reasons motivated this choice. First, from a true care management perspective, it makes sense to relate expected expenditure to factors that can be affected (e.g., chronic diseases) rather than to factors that are highly predictive (e.g., prior-year expenditure) but non-actionable. Second, such models, based on diagnosis-related attributes, have been shown to be at least as good at identifying individuals with high prospective expenditure as models based on prior expenditures.
We modeled this problem as a binary classification task, the objective being to predict whether an individual would be in the top decile of second-year expenditure. The training data for the model consisted of the 2014-2015 Panel 19 data: the model was trained to predict the top-decile members of 2015 healthcare expenditure based on 2014 demographic and health features. The learned model was then applied to the 2015-2016 Panel 20 data to predict the top 2016 expenditure individuals in the cohort based on their 2015 features. The balanced accuracy of the model on the test set was just shy of 73% (using a threshold obtained from the training data using cross validation). However, to enable a fair comparison with the raw data, the predicted model scores were sorted and only the individuals with scores in the top 10% were predicted to be high future expense persons.
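The two evaluation steps described above, flagging the top 10% of scores and computing balanced accuracy, can be sketched in plain Python. The scores and labels are hypothetical toy values, and the tie-breaking rule is an assumption; here the balanced accuracy is computed on the top-10% flags purely for illustration, whereas the figure reported in the text uses a cross-validated threshold.

```python
def flag_top_fraction(scores, fraction=0.10):
    """Return 0/1 flags marking the highest-scoring `fraction` of individuals."""
    n_flagged = max(1, round(len(scores) * fraction))
    # Indices sorted by descending score; ties keep their original order.
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    flagged = set(ranked[:n_flagged])
    return [1 if i in flagged else 0 for i in range(len(scores))]


def balanced_accuracy(y_true, y_pred):
    """Mean of the true positive rate and the true negative rate."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present in y_true")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


# Hypothetical model scores and true high-expense labels for ten individuals.
scores = [0.05, 0.40, 0.10, 0.90, 0.20, 0.15, 0.30, 0.25, 0.35, 0.02]
y_true = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0]

y_pred = flag_top_fraction(scores, fraction=0.10)
bal_acc = balanced_accuracy(y_true, y_pred)
print(y_pred, bal_acc)
```

Balanced accuracy is a natural choice here because only about one in ten individuals is positive, so plain accuracy would be dominated by the negative class.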
Table 5 shows some outcome characteristics for individuals predicted to be in the top decile of expenditure in the second year, based on demographics and health conditions in the first year. The first point to note, compared to Table 1, is that the racial disparity evidenced in the underlying dataset is picked up by the model as well: only 6.8% of blacks are predicted to be in the top decile versus 10.6% of whites. The second point of note is that the corresponding expenses are lower than in the underlying dataset - that is to be expected as features such as prior year expenditure and income which are strong predictors of future expense were explicitly excluded from the feature set. However, the average expense for these individuals is still substantial - around three times the average and higher than the top decile cutoff in the base data.
More importantly, as Table 6 shows, these individuals are much sicker than those in the top decile of expenditure in the underlying dataset. Nevertheless, blacks in this group are sicker, on average, than whites across the measured metrics. So, while the model as a whole is better suited for care management, because it focuses on individuals who are sicker rather than on bigger consumers of healthcare services, it still results in an unfair bias against blacks: not only are they underrepresented in the high-risk population, they also have to be sicker than whites to be included.
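Table 5: Second-year (2016) outcomes for individuals predicted to be in the top decile of expenditure by the model trained on the raw data.

| Metric | White | Black |
|---|---|---|
| % of race predicted to be high-expense | 10.7% | 6.8% |
| Average number of ER visits | 0.45 | 0.65 |
| Average number of IP nights | 1.32 | 2.58 |
| % with ER visits | 28.6% | 39.9% |
| % with IP nights | 20.9% | 23.2% |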
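Table 6: First-year (2015) health indicators for individuals predicted to be in the top decile of expenditure by the model trained on the raw data.

| Metric | White | Black |
|---|---|---|
| Average number of priority conditions | 4.89 | 5.18 |
| Average perceived physical health status | 3.55 | 3.90 |
| Average perceived mental health status | 2.53 | 2.99 |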
5 Predicting high expected medical expenditure from the MEPS data after bias mitigation
To mitigate the racial bias in the MEPS data, and consequently in the model learned from the data to predict high-expenditure individuals, we applied the data preprocessing technique Reweighing to the training data (2014-2015 Panel 19 MEPS data) to make it more equitable towards blacks. Reweighing works by assigning weights to the tuples in the training data so as to mitigate bias in the data with respect to the sensitive feature, race in our case, without changing the actual labels. The processed Panel 19 data was then used to learn a logistic regression model, which was applied to the Panel 20 data to predict the individuals expected to incur high medical expenditure in the second year (2016), following the same steps as outlined in Section 4. The balanced accuracy of the model trained using the reweighed data was a little more than 71% on the test set.
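As a rough illustration of the idea behind Reweighing, the sketch below uses the weighting rule from Kamiran and Calders, in which each tuple with sensitive value s and label y receives the weight P(s) P(y) / P(s, y). The data are hypothetical toy values, the function names are illustrative, and this is not the toolkit implementation used in the experiments.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight each tuple by P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(group, groups, labels, weights):
    """Weighted share of positive labels within one group."""
    num = sum(w for g, y, w in zip(groups, labels, weights) if g == group and y == 1)
    den = sum(w for g, w in zip(groups, weights) if g == group)
    return num / den


# Hypothetical toy training set: 6 white and 4 black individuals,
# with label 1 meaning "top decile of expenditure".
groups = ["white"] * 6 + ["black"] * 4
labels = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
rate_white = weighted_positive_rate("white", groups, labels, weights)
rate_black = weighted_positive_rate("black", groups, labels, weights)
print(rate_white, rate_black)
```

In this toy set the unweighted positive rates are 50% for whites and 25% for blacks; after weighting, both groups have the same weighted positive rate, equal to the overall rate of 40%, while the labels themselves are unchanged.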
The healthcare expenditure and utilization metrics for the predicted, high-expense individuals for the second year are shown in Table 7. The corresponding health metrics for those individuals during the first year are similarly shown in Table 8.
The first point to note is that the disparity in racial representation within the predicted high-expenditure group is much smaller than before: 11% of blacks are in this group against a slightly lower 9.9% of whites. This is consistent with the fact that blacks in this group are still sicker than whites on almost all metrics (ER visits, IP nights, and perceived health status). Importantly, though, the gap between blacks and whites has narrowed on almost every metric compared to before (Tables 5 and 6): whites are slightly sicker than before, while blacks are slightly less sick than before, on every metric.
Thus, using a simple bias mitigation technique, we were able to reduce racial bias in two respects: the rate at which the two races are represented in the high-risk group, and the gap between them on various measures of sickness.
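The comparison can be made concrete by computing the black-minus-white gap for each metric before and after mitigation. The values below are copied from Tables 5 to 8; the shortened metric labels and the use of a simple difference as the gap measure are choices made for this sketch.

```python
# (white, black) values before mitigation, copied from Tables 5 and 6.
before = {
    "Average ER visits": (0.45, 0.65),
    "Average IP nights": (1.32, 2.58),
    "% with ER visits": (28.6, 39.9),
    "% with IP nights": (20.9, 23.2),
    "Priority conditions": (4.89, 5.18),
    "Perceived physical health": (3.55, 3.90),
    "Perceived mental health": (2.53, 2.99),
}
# (white, black) values after Reweighing, copied from Tables 7 and 8.
after = {
    "Average ER visits": (0.48, 0.53),
    "Average IP nights": (1.41, 2.33),
    "% with ER visits": (29.9, 33.9),
    "% with IP nights": (21.9, 22.2),
    "Priority conditions": (4.96, 4.84),
    "Perceived physical health": (3.58, 3.62),
    "Perceived mental health": (2.57, 2.76),
}


def gap(pair):
    """Black value minus white value; positive means blacks score as sicker."""
    white, black = pair
    return black - white


gaps = {metric: (gap(before[metric]), gap(after[metric])) for metric in before}
narrowed = [m for m, (g_before, g_after) in gaps.items() if abs(g_after) < abs(g_before)]

for metric, (g_before, g_after) in gaps.items():
    print(f"{metric}: {g_before:+.2f} -> {g_after:+.2f}")
```

On these copied values the absolute gap is smaller after mitigation for each listed metric, and the gap in priority conditions changes sign, with whites averaging slightly more conditions than blacks.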
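Table 7: Second-year (2016) outcomes for individuals predicted to be in the top decile of expenditure by the model trained on the reweighed data.

| Metric | White | Black |
|---|---|---|
| % of race predicted to be high-expense | 9.9% | 11% |
| Average number of ER visits | 0.48 | 0.53 |
| Average number of IP nights | 1.41 | 2.33 |
| % with ER visits | 29.9% | 33.9% |
| % with IP nights | 21.9% | 22.2% |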
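Table 8: First-year (2015) health indicators for individuals predicted to be in the top decile of expenditure by the model trained on the reweighed data.

| Metric | White | Black |
|---|---|---|
| Average number of priority conditions | 4.96 | 4.84 |
| Average perceived physical health status | 3.58 | 3.62 |
| Average perceived mental health status | 2.57 | 2.76 |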
6 Conclusion and Future Work
The Medical Expenditure Panel Survey (MEPS) data is a set of publicly available, nationally representative surveys that provides one of the most complete pictures of the expense and utilization of healthcare for the civilian, non-institutionalized population of the United States. One common use of this data is to build predictive models of healthcare expenditure to guide decisions regarding care management, disease management, and cost management. However, none of this prior work has looked at the prevalence of racial bias as it relates to healthcare expenditure and its effect on these models. We show that the bias in the data is also picked up by the models, which results in significant bias against blacks. Namely, blacks are less likely than whites to be predicted as prospective high-expenditure patients, and hence less likely to be offered care management. Moreover, at the same level of predicted risk, blacks tend to be much sicker than whites. While this has been noted in other cases, the fact that MEPS is representative of the US healthcare system as a whole makes this finding even more significant. Furthermore, we show that simple bias mitigation techniques can reduce the bias in models substantially. While it is obviously preferable to use metrics directly related to health conditions and needs, rather than expense, to make decisions about patient and disease management, the fact that various parties remain focused on expense means that understanding the effect of bias and mitigating it in predictive models provides a fairer approach to modeling expenditure.
Nevertheless, this work is still preliminary, and much remains to be done. First, many of the studies involving MEPS data have pooled data from multiple panels to obtain larger, more robust samples. A similar extension of this work is planned. Second, two components of the MEPS data that have not been used are the medical conditions and event files, which provide further detailed information on medical conditions, such as prescriptions, as well as event-level details, such as diagnoses received during an ER visit or IP stay. This detailed data may provide further insight into how pervasive racial bias is across the entire healthcare space. Third, models are also commonly built to predict individuals expected to have high medical utilization, such as ER and IP visits, as an alternative to total healthcare expenditure, and are used to guide various cost and patient management programs. The presence of racial bias needs to be explored in such models as well.
- [1] Agency for Healthcare Research and Quality, “National Healthcare Quality and Disparities Reports,” https://www.ahrq.gov/research/findings/nhqrdr/index.html, Rockville, MD, USA, September 2019.
- [2] A. Nelson, “Unequal treatment: confronting racial and ethnic disparities in health care,” Journal of the National Medical Association, vol. 94, no. 8, p. 666, 2002.
- [3] Committee on Health Care Utilization and Adults with Disabilities, Health-Care Utilization as a Proxy in Disability Determination: 2, Factors That Affect Health-Care Utilization. National Academies of Sciences, Engineering, and Medicine; Health and Medicine Division; Board on Health Care Services, 2018. Available: https://www.ncbi.nlm.nih.gov/books/NBK500097/
- [4] Z. Obermeyer and S. Mullainathan, “Dissecting racial bias in an algorithm that guides health decisions for 70 million people,” in Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 2019, pp. 89–89.
- [5] Agency for Healthcare Research and Quality, “Medical Expenditure Panel Survey (MEPS),” https://www.ahrq.gov/data/meps.html, Rockville, MD, USA, August 2018.
- [6] F. Kamiran and T. Calders, “Data preprocessing techniques for classification without discrimination,” Knowledge and Information Systems, vol. 33, no. 1, pp. 1–33, Oct 2012. [Online]. Available: https://doi.org/10.1007/s10115-011-0463-8
- [7] R. K. E. Bellamy, K. Dey, M. Hind, S. C. Hoffman, S. Houde, K. Kannan, P. Lohia, J. Martino, S. Mehta, A. Mojsilovic, S. Nagar, K. N. Ramamurthy, J. T. Richards, D. Saha, P. Sattigeri, M. Singh, K. R. Varshney, and Y. Zhang, “AI Fairness 360: An extensible toolkit for detecting, understanding, and mitigating unwanted algorithmic bias,” CoRR, vol. abs/1810.01943, 2018. [Online]. Available: http://arxiv.org/abs/1810.01943
- [8] J. A. Fleishman and J. W. Cohen, “Using information on clinical conditions to predict high-cost patients,” Health Serv Res., vol. 45, no. 2, pp. 532–553, 2009.
- [9] L. R. Wherry, M. E. Burns, and L. J. Leininger, “Using self-reported health measures to predict high-need cases among medicaid-eligible adults,” Health Serv Res., vol. 49, no. 2, pp. 2147–2172, 2014.
- [10] S. Sushmita, S. Newman, J. Marquardt, P. Ram, V. Prasad, M. D. Cock, and A. Teredesai, “Population cost prediction on public healthcare datasets,” in Proceedings of the 5th International Conference on Digital Health 2015, 2015, pp. 87–94.
- [11] M. A. Morid, K. Kawamoto, T. Ault, J. Dorius, and S. Abdelrahman, “Supervised learning methods for predicting healthcare costs: Systematic literature review and empirical evaluation,” in AMIA Annu Symp Proc., 2017, pp. 1312–1321.
- [12] J. F. Farley, C. R. Harley, and J. W. Devine, “A comparison of comorbidity measurements to predict healthcare expenditures,” Am J Manag Care, vol. 12, no. 2, pp. 110–117, 2006.
- [13] A. S. Ash and R. P. Ellis, “Risk-adjusted payment and performance assessment for primary care,” Med Care., vol. 50, no. 8, pp. 643–653, 2012.
- [14] Agency for Healthcare Research and Quality, “Priority Conditions,” https://meps.ahrq.gov/data_stats/MEPS_topics.jsp?topicid=41Z-1, Rockville, MD, USA, September 2019.
- [15] A. S. Ash, Y. Zhao, R. P. Ellis, and M. S. Kramer, “Finding future high-cost cases: comparing prior cost versus diagnosis-based methods,” Health Serv Res., vol. 36, no. 6, pp. 194–206, 2001.