I Introduction
Opioids are drugs prescribed by health professionals to relieve patients' pain. Unfortunately, these drugs often lead to addiction, and this addiction has emerged as a full-blown epidemic in the United States. The last few years have seen an alarming increase in Opioid related deaths, with 63,600 lives lost in 2016 alone. In October 2017, the US government declared the epidemic a public health emergency [1]. Although a few health related companies and commercial firms have examined this issue using various available data sources, to the best of our knowledge the academic community has not yet engaged in research on this topic. Arguably, the study of the epidemic from a data analytics perspective is in its infancy. A significant amount of Opioid related data is available in the public domain, which gives the academic community an opportunity to analyze such data and recommend ways for public health authorities to mitigate the impact of the epidemic. In that vein, we collected publicly available data to analyze the important contributing factors of the epidemic. In particular, we examine the role of the individuals prescribing Opioid drugs in the spread of the epidemic. In addition, we examine whether the income level, age and educational level of various neighborhoods in a large US city are correlated with Opioid related incidences.
In the last few years, a small number of health and commercial companies have undertaken studies of Opioid related incidences using data analytic techniques. Blue Cross Blue Shield [2], for one, stated in their 2017 report that 21% of their commercially insured members filled at least one opioid prescription in 2015. Their data shows that the number of members with an opioid use disorder diagnosis grew over the seven-year period from 2010 to 2016. Their report also notes that women over 45 have higher Opioid overdose rates than men in the same age bracket, whereas men under 45 have higher overdose rates than women under 45. Finally, they report that Opioid overdose treatment rates are lower in the Southern states and in parts of the Midwest.
The Centers for Medicare and Medicaid Services [3], an agency of the U.S. Department of Health and Human Services (HHS), maintains a dataset of almost 24 million Opioid related prescriptions written by 1 million unique health professionals (prescribers) in the U.S. in 2014. The details of the data provided in these prescriptions are described in Section II. A small subset of this dataset, with 25,000 unique prescribers, is available on [4].
Data science researchers from IBM Research and experts at IBM Watson Health have recently begun applying data analytics and machine learning techniques to uncover new insights into the opioid problem [5]. Their effort is directed towards analyzing the relationship between the factors surrounding an initial opioid prescription and a subsequent diagnosis of addiction. The goal of this research is to identify causal factors that lead to an addiction diagnosis, taking into account all the variables associated with the initial prescription, such as opioid class, quantity, and related medical procedures and diagnoses. Other efforts in this direction include Mackey's study [6] of illegal online sales of prescription opioids, which uses Twitter data. Chary et al. [7] also analyzed Twitter data, with the goal of identifying the location of Opioid related Tweets.
In this paper, we analyze the important contributing factors of the epidemic using publicly available datasets. In particular, we attempt to answer the following questions:

Q1: Is there a correlation between the prescribers, prescriptions and opioid related deaths in U.S. states?

Q2: Which prescribers are likely to prescribe more than 10 Opioid related prescriptions in a year?

Q3: Is there a correlation between the income level and Opioid related incidences, in a neighborhood?

Q4: Is there a correlation between the age and Opioid related incidences, in a neighborhood?

Q5: Is there a correlation between the education level and Opioid related incidences, in a neighborhood?
Our analysis shows a moderate level of correlation between Opioid related incidences and both the number of prescribers and the number of prescriptions. Researchers from IBM [5] also examined the question “Which prescribers are likely to prescribe more than 10 Opioid related prescriptions in a year?” Using multiple machine learning algorithms, they reported prediction accuracies ranging from 60% to 84% [8]. Treating the IBM results as the benchmark, we used boosting algorithms to reach an accuracy of 85%, a multilayer perceptron to reach an accuracy of 89%, and a random forest classifier, which also had an accuracy of . Our perceptron model did not take into account the specialty of the prescribers, but produced a model with a better decision boundary. After analyzing the role of prescribers in the country as a whole, we examine the role of prescribers by state. We see a higher Opioid prescription rate in the southern states. We then analyze the Opioid prescription rate by prescriber specialty and illustrate the top 10 Opioid prescribing specialties in Fig. 4. Finally, our analysis found a small negative correlation between Opioid related deaths and each of income, age and education level, when considered separately. Our analysis is presented in Section III.
II Datasets for Analysis
To answer the questions listed in the previous section, we first collected data from multiple sources and then munged the collected data to create additional datasets. The details of our data collection and data munging are provided in Sections IIA and IIB.
IIA Data Collection
Our collected data comprises four different datasets $D_1$, $D_2$, $D_3$ and $D_4$. In the following we describe each one of them.
$D_1$: This is the U.S. Opiate Prescriptions/Overdoses dataset available on [4]. It comprises 25,000 unique prescribers across the U.S. and the prescriptions written by them in 2014. It is a subset of the dataset maintained by the Centers for Medicare and Medicaid Services [3], which contains almost 24 million Opioid related prescriptions written by 1 million unique health professionals (prescribers) in the U.S. in 2014. Each record in $D_1$ includes the National Provider Identifier number, provider state, gender, credentials and the number of Opioid related drugs prescribed by the provider (among a set of 250 different drugs). In addition, it indicates whether the provider wrote more or fewer than 10 Opioid related prescriptions in 2014. It may be noted that this determination is not made by summing the number of drugs prescribed by the provider, as multiple drugs may be prescribed on a single prescription.
$D_2$: This dataset is also collected from [4]. It contains the population of each of the 50 states and the number of Opioid related deaths in that state.
$D_3$: This is the Cincinnati Heroin Overdose dataset available on [9]. It is a subset of the Emergency Medical Services (EMS) dataset, in which each record contains detailed information about an incident that required an EMS dispatch, such as location, time, EMS response type, neighborhood, and others. $D_3$ is a subset of the EMS dataset in the sense that it contains information only about Heroin incidences, from July 2015 to the present. As of April 18, 2018, there were 5568 such incidences. It may be noted that heroin and opioid painkillers are extremely similar in terms of their chemical structure, mechanism of action and range of effects. Accordingly, for the purpose of this study, we treat Heroin data and other Opioid drug related data in a similar fashion.
IIB Data Processing and Munging
We process and munge data from our collected datasets $D_1$ through $D_4$ to create “secondary” datasets $D_5$, $D_6$ and $D_7$, which we use to answer the questions raised in Section I. In the following, we describe these three datasets:
$D_5$: This dataset is created by processing information available in $D_1$ and $D_2$. From $D_1$, we create a temporary dataset that contains the total number of prescribers and prescriptions written in each of the 50 states. This temporary dataset was joined with $D_2$ to create $D_5$, which contains the total number of prescribers, prescriptions and Opioid related deaths in each of the 50 states.
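The per-state aggregation and join described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the field names (`state`, `opioid_prescriptions`) and the sample rows are hypothetical and are not taken from the datasets themselves.

```python
from collections import defaultdict


def summarize_by_state(prescriber_rows):
    """Count prescribers and sum their prescriptions for each state."""
    summary = defaultdict(lambda: {"prescribers": 0, "prescriptions": 0})
    for row in prescriber_rows:
        entry = summary[row["state"]]
        entry["prescribers"] += 1  # one row per unique prescriber
        entry["prescriptions"] += row["opioid_prescriptions"]
    return dict(summary)


def join_with_deaths(state_summary, deaths_by_state):
    """Inner join on state: keep only states present in both inputs."""
    return {
        state: {**counts, "deaths": deaths_by_state[state]}
        for state, counts in state_summary.items()
        if state in deaths_by_state
    }


# Illustrative rows only (hypothetical values, not data from the paper).
rows = [
    {"state": "OH", "opioid_prescriptions": 40},
    {"state": "OH", "opioid_prescriptions": 15},
    {"state": "WV", "opioid_prescriptions": 60},
]
deaths = {"OH": 3, "WV": 5}

state_table = join_with_deaths(summarize_by_state(rows), deaths)
```

An inner join is assumed here, so a state missing from either input would be dropped rather than filled with a default.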
$D_6$: This dataset was created by processing information available in $D_3$, and it contains the number of Opioid related incidences in each of the 50 neighborhoods of Cincinnati.
$D_7$: This dataset was created by processing information available in datasets $D_6$ and $D_4$, and it contains the median income, median age, median education level and number of Opioid related incidences in each of the 50 neighborhoods of Cincinnati. It may be noted that $D_4$ provides the distribution of educational levels in each neighborhood. We define the median education level of a neighborhood in terms of the number of years its residents spend in school. In [11], the educational level is divided into 10 categories $E_1, \ldots, E_{10}$, where $E_1$ corresponds to None and $E_{10}$ corresponds to Doctorate. The categories $E_1, \ldots, E_{10}$ correspond to $y_1, \ldots, y_{10}$ years of education, with None implying 0 years of education and Doctorate implying 22 years. The precise definition is as follows. The median education level of a neighborhood is $y_k$ years if $k$ is the smallest integer such that $\sum_{i=1}^{k} p_i \geq 50$, where $p_1, \ldots, p_{10}$ represent the percentages of the neighborhood population with educational levels $E_1, \ldots, E_{10}$.
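The median education level defined above can be computed with a short cumulative scan. The sketch below assumes that the median is reached once the cumulative share hits 50 percent. The years-per-category list is hypothetical except for its endpoints (0 years for None and 22 for Doctorate, as stated in the text), and the sample shares are made up for illustration.

```python
def median_education_years(percentages, years):
    """Return the years of the first category whose cumulative share reaches 50%."""
    if len(percentages) != len(years):
        raise ValueError("percentages and years must have the same length")
    cumulative = 0.0
    for share, category_years in zip(percentages, years):
        cumulative += share
        if cumulative >= 50.0:  # assumed threshold for the median
            return category_years
    raise ValueError("percentages sum to less than 50")


# Hypothetical years for the 10 categories; only 0 and 22 come from the text.
YEARS = [0, 6, 9, 12, 13, 14, 16, 18, 20, 22]
# Hypothetical population shares (in percent) for one neighborhood.
shares = [1, 4, 10, 30, 20, 8, 15, 8, 2, 2]

median_years = median_education_years(shares, YEARS)
```

With these made-up shares the cumulative percentage first reaches 50 in the fifth category, so the function returns that category's value from `YEARS`.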
III Data Analysis Results
In this section, we present and discuss the results of our data analysis for the five questions raised in Section I.
IIIA Data Analysis for Q1
To answer this question, we first compute the partial correlation [13] between the number of prescribers in each state and the number of opioid deaths, controlling for the total number of prescriptions. Next, we compute the partial correlation between the number of prescriptions in each state and the number of opioid deaths, controlling for the total number of prescribers. Both correlations were computed using the data available in $D_5$. It may be noted that partial correlation is a measure of the linear relationship between two variables while controlling for the effect of a third variable. The results are presented in Table I.
Table I
                                     Number of Prescribers   Number of Prescriptions
Opiate Deaths (Partial Correlation)  0.4664                  0.3619
In Table I, we observe a moderate positive correlation between Opioid deaths and both the number of prescribers and the number of prescriptions. This implies that, as the number of prescribers and prescriptions increases, Opioid related deaths tend to increase as well.
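For readers who want to reproduce this kind of figure, a first-order partial correlation can be computed from three pairwise Pearson correlations. The sketch below uses the standard textbook formula and only the Python standard library; it is not necessarily the exact procedure used to produce Table I, and the sample values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def partial_correlation(x, y, z):
    """Correlation between x and y after controlling for a single variable z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical per-state values, chosen so the control is uncorrelated with both.
prescribers = [1, 2, 3, 4]
deaths = [2, 1, 4, 3]
prescriptions = [3, 1, 1, 3]

r_partial = partial_correlation(prescribers, deaths, prescriptions)
```

In this toy example the control variable is uncorrelated with the other two, so the partial correlation equals the plain Pearson correlation between `prescribers` and `deaths`. With real data the two values generally differ.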
IIIB Data Analysis for Q2
In the previous section, we analyzed the relationship between prescribers and Opioid-related deaths and noticed that Opioid prescribers play a significant role. Given this information, we try to predict whether a prescriber writes fewer or more than 10 opiate prescriptions in a calendar year, based on the data in $D_1$. This problem can be framed as a supervised classification task with two classes (class 1 and class 2, representing fewer and more than 10 prescriptions per calendar year, respectively).
As an initial step, we ran a series of boosting algorithms using data on the non-Opioid drugs and treating Gender, State and the prescribers’ Specialty as categorical variables. Our motivation was to predict whether a prescriber would write fewer or more than 10 Opioid prescriptions just by analyzing their pattern of issuing non-Opioid prescriptions.
XGBoost [15] gave a test accuracy of 81.8%. Using CatBoost [16], the accuracy increased to 84.7%. The CatBoost algorithm also provided a feature importance array, a partial list of which is depicted in Figure 1. The providers’ specialty has by far the most impact. As a follow-up, we looked into the providers that prescribed Opioids and their specialties.
We also implemented a multilayer perceptron (MLP) for this classification task. The perceptron model considered the State categorical variable along with the non-Opioid drugs. The MLP had three hidden layers of 500, 400 and 300 neurons, respectively. We used the Adam optimizer to minimize the loss function, with the learning rate set at 0.0001. This model produced a training accuracy of and a testing accuracy of . Finally, we implemented a random forest classifier with 200 trees. This model had an accuracy of . Due to the lack of available data, we could not perform a trend analysis and quantify the growth of Opioid related deaths by prescribers’ specialty.
Having analyzed the role of prescribers in the country as a whole, we examined the role of prescribers by state. Adjusting the Opioid related deaths in a state by the population of that state, we get the Opioid deaths per capita, which is shown in Fig. 2. We can see that West Virginia is the state worst affected by the ongoing Opioid epidemic. Fig. 3 illustrates the annual average number of Opioids prescribed, by state, among prescribers who prescribed more than 10 Opioids in a year. The figure plots the annual average values for those states that exceed 150 Opioid prescriptions.
We next examined the specialties of the prescribers in the country. We discovered that the specialty Addiction Medicine prescribed the most Opioid drugs on an annual average basis. This result is illustrated in Fig. 4, which plots the average annual Opioid prescription rate by specialty, for specialties where the number of such prescriptions exceeded 250. Addiction Medicine prescribers usually tend to patients who are addicted to alcohol, drugs, etc. These results give us an idea of the role of prescribers and their specialties, nationwide and by state, in the ongoing Opioid epidemic. They are only a starting point for mitigating the epidemic using data analytic techniques.
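The population adjustment behind the per-capita comparison of states can be sketched as below. Scaling to deaths per 100,000 residents is an assumption made here for readability, since the text only says "per capita". The state labels and counts are hypothetical.

```python
def deaths_per_100k(deaths_by_state, population_by_state):
    """Scale each state's death count to a rate per 100,000 residents."""
    rates = {}
    for state, deaths in deaths_by_state.items():
        population = population_by_state.get(state)
        if population:  # skip states with missing or zero population
            rates[state] = deaths * 100_000 / population
    return rates


# Hypothetical counts for two made-up states.
deaths = {"A": 50, "B": 30}
population = {"A": 1_000_000, "B": 200_000}

rates = deaths_per_100k(deaths, population)
worst_state = max(rates, key=rates.get)
```

In this example state "A" has more deaths in absolute terms, but state "B" has the higher rate once population is taken into account, which is why the adjustment matters when ranking states.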
IIIC Data Analysis for Q3
To answer this question, we compute the partial correlation [13] between the median income in a neighborhood and the number of Opioid related deaths in that neighborhood, controlling for the median age and median education level of the neighborhood. The correlation was computed using the data available in $D_7$. The result can be found in column 2 of Table II.
This result shows that the median income of a neighborhood has a very small negative correlation with the Opioid related deaths in that neighborhood. This suggests that Opioid addiction has spread to all income levels in the neighborhoods of Cincinnati.
Table II
                                     Median Income   Median Age   Median Education
Opiate Deaths (Partial Correlation)  -0.0576         -0.0789      -0.1516
IIID Data Analysis for Q4
To answer this question, we compute the partial correlation between the median age in a neighborhood and the number of Opioid related deaths in that neighborhood, controlling for the median income and median education level of the neighborhood. The correlation was computed using the data available in $D_7$. The result is presented in column 3 of Table II.
The partial correlation coefficient of median age and Opioid related deaths is similar to that of median income and Opioid related deaths. This small negative correlation suggests that the addiction is not concentrated in certain age groups, but affects individuals of all ages.
IIIE Data Analysis for Q5
To answer this question, we compute the partial correlation between the median education level in a neighborhood and the number of Opioid related deaths in that neighborhood, controlling for the median income and median age of the neighborhood. The correlation was computed using the data available in $D_7$. The result is presented in column 4 of Table II.
This result shows a weak negative partial correlation between median education and Opioid related deaths. This implies that, as the median education level increases, Opioid related deaths tend to decrease.
IIIF Joint Analysis for Q3, Q4 and Q5
To determine the joint effect of the median income, median age and median education level of the neighborhoods on Opioid related deaths, we computed the multiple correlation between these three predictor variables and the target variable, Opioid related deaths. We found a weak positive correlation, which implies that median income, median age and median education are related in a way that produces a counterintuitive result. To determine the precise impact of these three predictor variables on the target variable, we require more granular data. Such data would help us model these variables and calculate their impact on Opioid-related deaths.
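For reference, the multiple correlation coefficient can be written in terms of pairwise correlations. The expression below assumes the standard definition, with $x_1, x_2, x_3$ standing for median income, median age and median education and $y$ for Opioid related deaths. The text does not state which formulation or software was used, so these symbols are introduced here for illustration only.

```latex
\[
R^2 = \mathbf{c}^{\top} \mathbf{R}_{xx}^{-1} \mathbf{c},
\qquad
\mathbf{c} =
\begin{pmatrix} r_{x_1 y} \\ r_{x_2 y} \\ r_{x_3 y} \end{pmatrix},
\qquad
\mathbf{R}_{xx} =
\begin{pmatrix}
1 & r_{x_1 x_2} & r_{x_1 x_3} \\
r_{x_1 x_2} & 1 & r_{x_2 x_3} \\
r_{x_1 x_3} & r_{x_2 x_3} & 1
\end{pmatrix},
\qquad
0 \le R \le 1 .
\]
```

Under this definition $R$ is the correlation between the target and its best linear prediction from the three predictors, so it is non-negative by construction and its sign carries no information about the direction of the joint effect.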
IV Conclusions
First, we highlight the role of prescribers and prescription opiate drugs by analyzing their relationship with the number of Opioid related deaths in 2014. This analysis shows a moderate positive correlation between the number of Opioid related deaths in U.S. states and both the number of prescribers and the number of prescriptions. Second, our classification models report higher accuracy than the IBM benchmark scores. We analyzed the likelihood of a prescriber prescribing Opiate drugs by studying their pattern of issuing non-Opioid prescriptions. Third, we look at the neighborhoods of Cincinnati to observe the impact of income, age and education on Opioid related deaths in the city. We find that Opioid addiction affects individuals of all income and age levels and is not limited to one specific level. Finally, we observe that, as the educational level of a neighborhood increases, Opioid related deaths tend to decrease.
References
 [1] J. H. Davis, ”Trump Declares Opioid Crisis a ’Health Emergency’ but Requests No Funds”, https://www.nytimes.com/2017/10/26/us/politics/trumpopioidcrisis.html, 2017.
 [2] Blue Cross Blue Shield, ”https://www.bcbs.com/thehealthofamerica/reports/americasopioidepidemicanditseffectonthenationscommerciallyinsured”.
 [3] Centers for Medicare and Medicaid Services, ”https://www.cms.gov/ResearchStatisticsDataandSystems/StatisticsTrendsandReports/MedicareProviderChargeData/PartDPrescriber.html”.
 [4] Kaggle Dataset: ”U.S. Opiate Prescriptions/Overdoses”, https://www.kaggle.com/apryor6/usopiateprescriptions.
 [5] D. Wei, ”Combating the Opioid Epidemic with Machine Learning”, https://www.ibm.com/blogs/research/2017/08/combatingtheopioidepidemicwithmachinelearning/, 2017.
 [6] T.K. Mackey, J. Kalyanam, T. Katsuki, G. Lanckriet, ”Twitter-Based Detection of Illegal Online Sale of Prescription Opioid”, American Journal of Public Health, 2017.
 [7] M. Chary, N. Genes, C. Giraud-Carrier, C. Hanson, L.S. Nelson, A.F. Manini, ”Epidemiology from Tweets: Estimating Misuse of Prescription Opioids in the USA from Social Media”, Journal of Medical Toxicology, 13, 278-286, 2017.
 [8] IBM Opioid Github: ”https://github.com/IBM/predictopioidprescribers”, 2017.
 [9] City of Cincinnati, ”Heroin Overdoses”, https://insights.cincinnatioh.gov/stories/s/Heroin/dm3sep3u/.
 [10] CityData, ”http://www.citydata.com/city/CincinnatiOhio.html”.
 [11] Statistical Atlas, ”https://statisticalatlas.com/place/Ohio/Cincinnati/Overview”.
 [12] Point2Homes, ”https://www.point2homes.com/US/Neighborhood/OH/CincinnatiDemographics.html”.
 [13] Partial Correlation, https://www.unc.edu/courses/2008spring/psyc/270/001/partials.html.
 [14] K. Landis, ”Metroeast doctors bring opioid battle to emergency rooms”. 2018.
 [15] XGBoost, ”http://dmlc.cs.washington.edu/xgboost.html”.
 [16] CatBoost, ”https://catboost.yandex/”.