Platelet products are a vital component of patient treatment for bleeding problems, cancer, AIDS, hepatitis, kidney or liver diseases, traumatology and in surgeries such as cardiovascular surgery and organ transplants (kumar2015platelet). In addition, miscellaneous platelet usage and supply are associated with several factors such as patients with severe bleeding, trauma patients, aging population and emergence of a pandemic like COVID-19 (stanworth2020effects). The first two factors affect the uncertain demand pattern, while the latter two factors result in donor reduction. Platelet products have five to seven days shelf life before considering test and screening processes that typically last two days (fontaine2009improving). The extremely short shelf life along with the highly variable daily platelet usage makes platelet demand and supply management a highly challenging task, invoking a robust blood product demand and supply system.
To deal with high demand variation, hospitals usually hold excess inventory. However, holding surplus inventory makes platelet demand forecasting even more challenging for blood distribution centres. In particular, holding excessive platelet inventory in hospital blood banks not only results in a high wastage rate, but also prevents the blood suppliers from realizing the real underlying platelet demand. which in turn yields an inefficient demand forecasting system. This can result in the bullwhip effect, the increased variation in demand as a result of moving upstream in the supply chain (croson2006behavioral). This effect arises in a number of domains, including grocery supply chains (dejonckheere2004impact) as well as in healthcare service-oriented supply chains (samuel2010supply; rutherford2016evidence). The bullwhip effect is inevitable in a healthcare system, since for the blood products, specifically platelet products with extremely short shelf lives, hospitals tend to order more than their regular demand to ensure that they can meet the actual demand, as shortages must be avoided (they can put patients at risk). Apart from this bullwhip effect, there are many uncertainties faced by blood centres due to the high demand variation. Since platelet products are perishable, high inventory levels result in excessive wastage, something that could be mitigated with better demand forecasting. On the other hand, low inventory levels increase the risk of shortages resulting in urgent delivery costs. Accordingly, accurately forecasting the demand for blood products is a core requirement of a robust blood demand and supply management system.
Two organizations, CBS and Héma-Québec, are responsible for providing blood products and services in transfusion and transplantation for Canadian patients. The former operates within all Canadian provinces and territories excluding Québec, while the latter is in charge of the province of Québec. The current blood supply chain for CBS is an integrated network consisting of a regional CBS distribution centre and several hospitals, as illustrated in Fig. 1. Currently, there are nine regional blood centres operating for CBS, each covering the demand for several hospitals (cbs:2020). Hospitals request blood products from the regional blood centres for the next day, yet, the regional blood centres are not aware of the actual demands as each hospital has its own blood bank. Furthermore, recipients’ demographic and hospitals’ inventory management systems are not disclosed to CBS or the regional blood centres. This results in a high wastage rate for the hospitals (about 6% in Hamilton, Ontario) and CBS (about 15% and has seasonal variation), which may be partly due to the bullwhip effect.
This research is motivated by the platelet management problem confronted by CBS. There are several issues faced by CBS, making the platelet management problem difficult. As mentioned above, CBS currently does not have access to transfused patients’ demographics. Also, owing to the lack of information transmission between CBS and hospitals, the effect of clinical changes on platelet demand cannot be tracked by CBS. The need for robust and accurate platelet demand forecasting calls for transparency of information between CBS and hospitals. In this research, we forecast platelet demand to overcome these challenges.
We study a large clinical database with 61377 platelet transfusions for 47496 patients in hospitals in Hamilton, Ontario from 2010 to 2018. We analyse the database to extract trends and patterns, and find relations between the demand and clinical indicators. We find that there are three key issues that should be considered in the demand forecasting process: seasonality, the effect of clinical indicators on demand, and nonlinear correlations among these clinical indicators. Consequently, we progressively build four demand forecasting models (of increasing complexity) that address these issues. The proposed methods are applied on the data to determine the influence of demand history as well as clinical indicators on demand forecasting. The first two methods are univariate time series that only consider the demand history, while the remaining two methods, multivariate regression and machine learning, consider clinical indicators. These four methods are utilized to pursue the following goals: i) more precise platelet demand forecasting resulting in increased information transparency between CBS and hospitals, ii) reducing the bullwhip effect, as a consequence of effective demand forecasting; iii) investigating the impact of clinical indicators on the platelet demand; and iv) having a robust forecasting model for platelet demand which accounts for variation in multiple factors. The main contributions of this study are as follows:
We analyze the time series of platelet transfusion data by decomposing it into trend, seasonality and residuals, and detect meaningful patterns such as weekday/weekend and holiday effects that should be considered in any platelet demand predictor.
We utilize four different demand forecasting methods from univariate time series methods to multivariate methods including regression and machine learning. Since CBS has no access to recipients’ demographic data, our first method, ARIMA, only considers demand history for forecasting, while the second model, Prophet, includes seasonalities, trend changes and holiday effects. We found that these models have issues with respect to accuracy, in particular when a limited amount of data are available, accordingly we apply a lasso regression method to include clinical indicators for demand forecasting. Finally, LSTM networks are used for demand forecasting to explore the nonlinear dependencies among the clinical indicators and the demand.
We utilize clinical indicators in the demand forecasting process, and select those that are most impactful through a structural variable selection and regularization method called lasso regression. Results show that incorporating the clinical indicators in demand forecasting enhances the forecasting accuracy.
We investigate the effect of different amounts of data on the forecasting accuracy and model performance and provide a holistic evaluation and comparison for different forecasting methods to evaluate the effectiveness of these models for different data types, providing suggestions on using these robust demand forecasting strategies in different circumstances. Results show that by having a limited amount of data (two years in our case), multivariate models outperform the univariate models, whereas having a large amount of data (eight years in our case) results in the ARIMA model performing nearly as well as the multivariate methods.
The rest of this paper is organized as follows. Section 2 provides an initial exploration of the data. In Section 3 we provide a literature review of demand forecasting methods for blood products, with a focus on platelets. We provide the data description, model background, model development and evaluation of four different models for platelet demand forecasting in Section 4. In Section 5, a comparison of the models is provided, and finally, in Section 6, concluding remarks are provided, including a discussion of ongoing work for this problem.
2 Problem Definition
In this study, we consider a blood supply system consisting of CBS and four major hospitals operating in Hamilton, namely, Hamilton General Hospital, Juravinski Hospital, McMaster University Medical Centre (MUMC), and St. Joseph’s (STJ) Healthcare Hamilton. These hospital blood banks operate with a Transfusion Medicine (TM) laboratory team to manage blood product transfusions to patients. Platelet products are collected and produced at CBS and after testing for viruses and bacteria (a process which lasts two days), platelets are ready to be shipped to hospitals and transfused to patients. At the beginning of the day, hospitals receive platelet products that were ordered on the previous day, from CBS. In the case of shortages, hospitals can place expedited (same-day) orders at a higher cost. Prior to September 2017, platelets had five days of shelf life, while after this date, the shelf life of platelets was increased to seven days. After passing the shelf life, platelet products are expired and discarded. Given the high shortage and wastage costs, forecasting short-term demand for platelets is of particular value.
In order to propose a short-term demand forecasting model for CBS, we first explore the data for identifying temporal (daily/monthly) patterns that can inform our demand forecasting techniques. This initial analysis ranges from 2010/01/01 to 2018/12/31. An initial observation is that the demand is highly variable, with a transfused daily average of 17.90 units and a standard deviation of 7.05 units.
In terms of specific temporal patterns, we first looked into the daily platelet usage patterns by year, month and day. There are no specific yearly patterns for platelet transfusions. Fig. 2
(fig:MeanDailyUnits(b)) illustrates the daily platelet usage pattern by month. It indicates that the platelet usage varies by month, the lowest number of transfusions is in January while the highest number of transfusions is in July (January = mean [sd(standard deviation)]: 17.35 [6.43], July = mean [sd]: 20.38 [8.03]). It is worth noticing that in July, the number of outliers (points outside of the whiskers in the boxplot) is also higher in comparison to other months. Outliers are defined as points with values greater than the third quartile plus 1.5 times the interquartile range or less than the first quartile minus 1.5 times the interquartile range. As we can see from Fig.2(fig:MeanDailyUnits(b)), there is significant monthly seasonality (one-way ANOVA, F = 3.94, P value <0.001).
The mean daily platelet usage by day of the week is plotted in Fig. 2
(fig:MeanDailyUnits(c)). As we can see, there is a significant difference in the mean daily platelet usage when comparing weekdays to weekends (weekday = mean [sd]: 21.20 [6.22], weekend = mean [sd]: 12.37 [4.60], Mann-Whitney U test: P value = 0.04). The P value is considered as statistically significant (as it is less than 0.05). One reason for the lower platelet usage on weekends is that prophylactic platelet transfusions to cancer patients on Fridays ensure that platelet counts remain sufficiently high over the weekend. This pattern of practice (prophylactic use) does not tend to occur in other patient populations (trauma, surgery, etc.) since these patients get platelets as a direct consequence of bleeding. Furthermore, during the weekdays, Tuesday has the highest number of transfusions (mean [sd]: 21.68 [6.10]), while Monday has the lowest number of transfusions (mean [sd]: 20.04 [6.53]). Consequently, there is a clear weekday/weekend effect. This reflects that day of the week should be considered in any daily demand estimator.
Fig. 3 compares the mean daily units transfused and the mean daily units received from January 1, 2010 to December 31, 2018. Reviewing the figures, it can be concluded that generally, the range of the daily units received is greater than the range of total transfused units in a given, day, month, or year which suggests higher variability for the total number of received units (standard deviation of 9.33) compared to transfused units (standard deviation of 7.04). Indeed, this has its roots in the bullwhip effect, that is hospitals tend to order more than their actual demand. We see an opportunity to better coordinate supply (number of units received) with demand (number of units transfused) through the development of a daily demand predictor.
Fig. 3(fig:MeanDailyTR(a)) shows that the number of received units is more than the transfused units over the years. However, there does not appear to be any pattern between the number of received and transfused units, suggesting that there is no yearly impact on platelet demand. Fig. 3(fig:MeanDailyTR(b)) indicates that there is a near-uniform pattern for the number of received and transfused units by month. Finally, in Fig. 3(fig:MeanDailyTR(c)), it can be noted that on Mondays the number of received units tends to be significantly larger than for the weekend. It is also noticeable that on weekends the number of received and transfused units clearly differ from the weekdays due to lower staffing levels over the weekends. On Saturday, on average, the number of transfused units is lower than the number of received units, but on Sunday, the total number of received units drops considerably, resulting in a large gap between the number of received and transfused units. This appears to be compensated for by the total number of received units on Monday. This is an additional effect that could be mitigated by better coordination between supply and demand.
We additionally explored applying the Seasonal and Trend decomposition using Loess (STL) model to decompose the time series data into trend, seasonality, and residuals. Fig. 4 depicts the time series decomposition ranging from 2010 to 2018. As we can see from Fig. 4, there is an upward trend for data in this period, and it becomes stationary from 2016 onwards. As a result, we will train our models in two scenarios, one with the whole dataset and the other by using the data ranging from 2016 to 2018.
3 Literature Review
There is a limited literature on platelet demand forecasting; most investigates univariate time series methods. In these studies, forecasts are based solely on previous demand values, without considering other features that may affect the demand. frankfurter1974management develop transfusion forecasting models using Exponential Smoothing (ES) methods for a blood collection and distribution centre in New York. critchfield1985automatic develop models for forecasting platelet usage in a blood centre using several time series methods including Moving Average (MA), Winter’s method and ES. silva2012decision develop a Box-Jenkins Seasonal Autoregressive Integrated Moving Average (BJ-SARIMA) model to forecast weekly demand for blood components in hospitals. Their proposed method, SARIMA, is based on a Box-Jenkins approach that takes into consideration the seasonal and nonseasonal characteristics of time series data. Later, silva2013demand extend their model by developing an automatic procedure for demand forecasting while also changing the level of the model from hospital level to regional blood centre in order to help managers use the model directly. kumari2016efficient propose a blood inventory management model for the daily supply of platelets focusing on reducing platelet shortages. Three time series methods, namely MA, Weighted Moving Average (WMA) and ES are used to forecast the demand, and are evaluated based on shortages. fortsch2016reducing test various approaches to predict blood demand such as Naïve, moving average, exponential smoothing, and multiplicative Time Series Decomposition (TSD), among them a Box-Jenkins (ARMA) approach, which uses an autoregressive moving average model, results in the highest prediction accuracy. They forecast total blood demand as well as individual blood type demands, a feature that differentiates their method. lestari2017forecasting apply four models to forecast blood components demand including moving average, weighted moving average, exponential smoothing, exponential smoothing with trend, and select the best method for their data based on the minimum error between forecasts and the actual values. volken2018red use generalized additive regression and time-series models with exponential smoothing to predict future whole blood donation and RBC transfusion trends.
Several recent studies take clinically-related indicators into consideration. drackley2012forecasting estimate long-term blood demand for Ontario, Canada based on previous transfusions’ age and sex-specific patterns. They forecast blood supply and demand for Ontario by considering demand and supply patterns, and demographic forecasts, with the assumption of fixed patterns and rates over time. khaldi2017artificial
apply Artificial Neural Networks (ANNs) to forecast the monthly demand for three blood components, red blood cells (RBCs), platelets and plasma for a case study in Morocco.guan2017big propose an optimization ordering strategy in which they forecast the platelet demand for several days into the future and build an optimal ordering policy based on the predicted demand, concentrating on minimizing the wastage. Their main focus is on an optimal ordering policy and they integrate their demand model in the inventory management problem, meaning that they do not try to precisely forecast the platelet demand. li2020decision
develop a hybrid model consisting of seasonal and trend decomposition using Loess (STL) time series and eXtreme Gradient Boosting (XGBoost) for RBC demand forecasting and incorporate it in an inventory management problem.
In this study, we utilize multiple demand forecasting methods, including univariate analysis (time series methods) and multivariate analysis (regression and machine learning methods), and evaluate the performance of these models for platelet demand forecasting. Moreover, in contrast with earlier studies, we explore the value gained from including a range of clinical indicators for platelet demand forecasting. More specifically, we consider clinical indicators, consisting of laboratory test results, patient characteristics and hospital census data as well as statistical indicators, including the previous week’s platelet usage and previous day’s received units. In addition to the linear effects of the clinical indicators, we study the nonlinear effect of these clinical indictors in our choice of machine learning model. Results indicate that platelet demand is dependent on the clinical indicators and therefore including them in the demand forecasting process enhances the accuracy. Moreover, we explore the effect of having different amounts of data on the accuracy of the forecasting methods. To the best of our knowledge, this study is the first that utilizes and evaluates different demand forecasting methodologies from univariate time series to multivariate models for platelet products and explores the effect of the amount of available data on these approaches.
4 Demand Forecasting
4.1 Data Description
The data in this study are constructed by processing CBS shipping data and the TRUST (Transfusion Research for Utilization, Surveillance and Tracking) database at the McMaster Centre for Transfusion Research (MCTR) for platelet transfusion in Hamilton hospitals. The study is approved by the Canadian Blood Services Research Ethics Board and the Hamilton Integrated Research Ethics Board.
The dataset ranges from 2010 to 2018 and consists of 61377 transfusions for 47496 patients in Hamilton, 48190 transfusions to inpatients and 13187 to outpatients. It is high dimensional, with more than 100 variables that can be divided into four main groups: 1. the transfused product specifications such as product name and type, received date, expiry date, 2. patient characteristics such as age, gender, patient ABO Rh blood type, 3. the transfusion location such as intensive care, cardiovascular surgery, hematology, and 4. available laboratory test results for each patient such as platelet count, hemoglobin level, creatinine level, and red cell distribution width.
The data for analysis are processed in two steps: in the first step, we have the granular data that includes all of the platelet transfusions. Each row contains product-related information, the recipient’s characteristics, hospital location information and lab tests. In the second step, a daily-aggregated dataset is constructed with each row containing the daily product, patient-related and location information along with the aggregated lab tests. In the daily aggregated dataset, 170 processed variables are included using straightforward statistical transformations (e.g. mean, min, max, sum).
Additionally, we add new variables such as the number of platelet transfusions in the previous day and previous week, the number of received units in the previous day, and day of the week. Table 1 gives the set of variables that are used in this study along with their descriptions. These variables are selected by a lasso regression model (tibshirani1996regression) which is explained in detail in Section 4.4. Since the ABO Rh blood type compatibility between the patient and the product is not necessary (although preferable) for platelet transfusion, it is not considered as a model variable.
|abnormal_ALP||Number of patients with abnormal alkaline phosphatase|
|abnormal_MPV||Number of patients with abnormal mean platelet volume|
|abnormal_hematocrit||Number of patients with abnormal hematocrit|
|abnormal_PO2||Number of patients with abnormal partial pressure of oxygen|
|abnormal_creatinine||Number of patients with abnormal creatinine|
|abnormal_INR||Number of patients with abnormal international normalized ratio|
|abnormal_MCHb||Number of patients with abnormal mean corpuscular hemoglobin|
|abnormal_MCHb_conc||Number of patients with abnormal mean corpuscular hemoglobin concentration|
|abnormal_hb||Number of patients with abnormal hemoglobin|
|abnormal_mcv||Number of patients with abnormal mean corpuscular volume|
|abnormal_plt||Number of patients with abnormal platelet count|
|abnormal_redcellwidth||Number of patients with abnormal red cell distribution width|
|abnormal_wbc||Number of patients with abnormal white cell count|
|abnormal_ALC||Number of patients with abnormal absolute lymphocyte count|
|location_GeneralMedicine||Number of patients in general medicine|
|location_Hematology||Number of patients in hematology|
|location_IntensiveCare||Number of patients in intensive care|
|location_CardiovascularSurgery||Number of patients in cardiovascular surgery|
|location_Pediatric||Number of patients in pediatrics|
|Monday||Indicating the day of the week|
|Tuesday||Indicating the day of the week|
|Wednesday||Indicating the day of the week|
|Thursday||Indicating the day of the week|
|Friday||Indicating the day of the week|
|Saturday||Indicating the day of the week|
|Sunday||Indicating the day of the week|
|lastWeek_Usage||Number of units transfused in the previous week|
|yesterday_Usage||Number of platelet units transfused in the previous day|
|yesterday_ReceivedUnits||Number of units received by the hospital in the previous day|
One of the data characteristics is that data variables, in particular abnormal laboratory test results, are highly correlated, as shown in Fig. 5. These high correlations give rise to some challenges when data variables are considered in the demand forecasting process, as will be seen in Section 4.4.
As we can see from Table 1
, data variables have different ranges, and hence are standardized by removing the mean and scaling to unit variance by, where is the variable, is its mean and is its standard deviation. All data processing and analysis and model implementations are carried out using the Python 3.7 programming language. We implement four models and train them for two scenarios. In the first scenario, we train each model for two years of data from 2016 to 2017 and we use data from 2018 for testing. In the second scenario, each model is trained with data from 2010 to 2017 for eight years, and is tested with 2018 data. We use Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE) to evaluate model performance.
4.2 The ARIMA model
An autoregressive integrated moving average model consists of three components, an autoregressive (AR) component that considers a linear combination of lagged values as the predictors, a moving average (MA) component of past forecast errors (white noise), and an integrated component where differencing is applied on the data to make it stationary. Letbe the observations over time period ; the time series can be written as:
An ARIMA model assumes that the value of an observation is a linear function of a number of previous past observations and stochastic terms. The stochastic term, , is independent of previous observations and is identically distributed for each
, following a normal distribution with zero mean. The ARIMA model can be written as:
where is a constant, and are model parameters in which and , and are the model orders and define the number of autoregressive terms and moving average terms, respectively.
In order to apply the ARIMA model on a time series, one should first make it stationary. We use the Augmented Dickey-Fuller (ADF) test (cheung1995lag) to examine our time series data for stationarity. Given a stationary time series, in the next step, model parameters are estimated such that the error is minimized. We estimate the parameters using the Hyndman-Khandakar algorithm which uses a stepwise search for a combination of and to minimize AIC (Akaike information criterion) (sakamoto1986akaike). Having the model fitted based on the estimated parameters, the next step is to evaluate it for which we calculate RSME and MAPE errors and also examine the model residuals through ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function) plots. By using the ARIMA model discussed above, we train the model for two years: 2016 and 2017. Fig. 6 depicts the actual demand and the predicted demand from the ARIMA model. The RMSE and MAPE for the predictions are 8.13 and 46.32%, respectively. As we can see from Fig. 6, the ARIMA model cannot capture demand variations and its forecasts are close to the mean daily units transfused (18.67).
To perform a deeper investigation of the high forecasting error, we examine the model residuals, ACF and PACF, to find any possible correlation that remains in the differentiated time series. Fig. 7(fig:Res(a)) shows that the residuals are normally distributed with zero mean, meaning that no trend exists for the model residuals. The ACF plot (Fig. 7(fig:ACF(b))) represents the coefficients of correlation between a value and its lag, and the PACF (Fig. 7(fig:PACF(c))) plot illustrates the conditional correlation between the value and its lag. As we can see from Fig. 7, there is an autocorrelation and partial autocorrelation at time seven (and multiples of seven) due to weekly seasonality that is not incorporated in the model.
In the next step, we train an ARIMA model with the whole dataset for eight years from 2010 to 2017, and use one differentiation to induce stationarity in the data. Fig. 8 shows the result of demand forecasting for 2018 with an ARIMA model trained for the previous eight years. As we can see, forecasts are improved notably with RMSE of 5.72 and MAPE of 28.96%.
The results show that ARIMA has a significant improvement when moving from two years to eight years worth of data. Since ARIMA’s forecasts are only based on the previous demands, and the seasonality in data has not changed significantly during the eight years, the model parameters, and , are more robust for longer time series data resulting in more accurate forecasts. The performance of the ARIMA model trained with two years and eight years of data are presented in Table 2.
As the results indicate, when a limited amount of data are available, the ARIMA model has a high forecast error not only because its forecasts are solely based on the previous demands, but also due to the fact that it cannot capture the seasonality in data. Since seasonality is one of the primary features of our time series data, in the next section, we include seasonality directly in the forecasting process.
|ARIMA trained for 2 years||6.77||33.44%||8.13||46.32%|
|ARIMA trained for 8 years||5.86||30.49%||5.72||28.96%|
4.3 Prophet model
Prophet is a time series model introduced by taylor2018forecasting that considers common features of business time series: trends, seasonality, holiday effects and outliers. Let be the trend function which shows the long-term pattern of data, be the seasonality which captures the periodic fluctuations in data such as weekly, monthly or yearly patterns, and finally be the non-periodic holiday effect. These features are combined through a generalized additive model (GAM) (hastie1987generalized), and the Prophet time series model can be written as:
The normally distributed error
is added to model the residuals. The Prophet model was developed for forecasting events created on Facebook and is implemented as an open source software package in both Python and R. Using the Prophet model described above, we construct a demand forecast model for our data. We consider weekly and yearly seasonality as observed in Fig.2. Moreover, we add Canadian holiday events through the built-in country-specific holiday method in Python. Fig. 9 illustrates the components of the forecasts for the training and testing period, the trend, holidays, weekly seasonality, and yearly seasonality. As we can see from Fig. 9, there is a downward trend from the beginning of 2016 to July 2017 and an upward trend from July 2017 to the end of 2018. Besides, almost all holidays have a negative effect on the model, except for July 1st. This means that the demand is lower than regular weekdays for almost all of the holidays, except for July 1st. However, when the holidays are removed from the model, the model performance decreases.
We can also see that there is weekly seasonality in which Wednesdays have the highest platelet usage while the weekends have the lowest usage, in qualitative agreement with Fig. 2. Moreover, the monthly seasonality, captured by Fourier series in the Prophet model, depicts three cycles: 1. January to May in which March has the highest demand while May has the lowest demand; 2. May to September in which the demand is highly variable. July has the highest demand in this cycle and the highest demand of all months while May has the lowest demand in the cycle and also the lowest demand of all months; 3. September to January with a slight variation in demand - November with the highest and January with the lowest demands.
|Prophet model trained for 2 years||5.83||26.05%||6.64||37.73%|
|Prophet model trained for 8 years||5.48||28.89%||5.99||30.93%|
The results of predictions with the Prophet model are illustrated in Fig. 10. The black dots are actual demand values, and the blue line shows the model forecasts with the 95% uncertainty interval for the forecasts. The Prophet model has an RMSE of 6.64 and MAPE of 37.73% for the forecasted values. By removing holidays from the model, there are slightly higher forecasting errors, RMSE of 6.84 and MAPE of 39.26%.
As a further step, the Prophet model is trained with data from 2010 to 2017 and tested with data in 2018, resulting in more accurate forecasts with test RMSE of 5.99 and test MAPE of 30.93%. Table 3 presents the performance of the Prophet model trained with two years and eight years of data. Unlike ARIMA, the Prophet model’s accuracy is not significantly affected as the amount of data increases.
Given that Prophet’s forecasting errors are high, in the next step we explore the effect of including clinical indicators in the forecasting process.
4.4 Lasso Regression
Lasso regression is a linear regression method that was popularized bytibshirani1996regression
to perform feature selection and regularization. We use lasso regression since it allows clinical indicators to be included in the demand forecast model. In order to do so, the lasso method puts a constraint on the sum of the absolute values of the model weights to be less than a certain value. This results in some of the variable weights being set to zero. In other words, it performs variable selection to reduce the complexity of the model, as well as improving the prediction accuracy. This not only results in interpretable models, but also tends to reduce overfitting. By considering the actual demand on day() as and the predicted demand on day as the product of the clinical indicators () and their corresponding weights ,where specifies the clinical indicator and , the lasso model is the solution to the following optimization problem:
The optimization problem defined in (4)-(5) chooses the weights, , that minimize the sum of squares of the errors between the actual values () and the predicted values, with a sparsity penalty () on the sum of the absolute values of the model weights. Constraint (5) forces some of the weights (that have a minor contribution to the estimate) to be zero. Variables that have non-zero weights are selected in the model. In this study, lasso regression is used as a variable selection method to find important clinical indicators for platelet demand. Subsequently, this information is used for demand forecasting. Lasso regression requires selection of the penalty weight . For our data, is chosen based on five-fold cross-validation.
In order to calculate the uncertainty of the resulting predictor, confidence intervals are calculated for the weights as well as the resulting demand forecasts. There are multiple methods for calculating the confidence interval for lasso regression; one of the most popular is the bootstrap method(efron1994introduction) which is a widely-used non-parametric method for estimating statistics such as mean and standard deviation of an estimate by sampling the data with replacement. It can also be used to estimate the accuracy of a machine learning method for forecasting. The bootstrap method is used in the experiments for calculating confidence intervals for the lasso regression forecasts and the weights.
Table 4 gives the selected variables that result in an RMSE of 5.52 and MAPE of 24.26% for the two years of data training set and RMSE of 5.93 and MAPE of 28.55% for the testing set. Considering the weights for the variables and their corresponding confidence intervals in Table 4, and based on (ranstam2012p), variables that have zero weights and confidence intervals that are symmetric around zero are candidates to be eliminated. As we can see from Table 4, abnormal plt has the highest weight. The variables abnormal_hb and abnormal_redcellwidth can be considered as two other important lab tests for forecasting the demand. Day of the week, last week’s platelet usage and yesterday’s platelet usage also have notable impact on the platelet demand. As we can see in Table 4, unexpectedly, some of the variables have a negative weight in the demand forecasting model. The reason is that, as we can see from Fig. 5, there are high correlations among the variables that result in interactions among the model predictors, which may cause multicollinearity issues. Specifically, the variables abnormal_hb, abnormal_INR, abnormal_hematocrit, and abnormal_MPV are correlated with abnormal_plt. The variables abnormal_hematocrit and abnormal_hb also have high correlations with most of the other abnormal laboratory test results.
|abnormal_ALP||-0.02||(-0.08 , 0.04)|
|abnormal_MPV||0.01||(-0.06 , 0.11)|
|abnormal_hematocrit||0.00||(-0.11 , 0.14)|
|abnormal_PO2||-0.11||(-0.19 , 0.00)|
|abnormal_creatinine||0.03||(-0.03 , 0.11)|
|abnormal_INR||0.06||(-0.02 , 0.22)|
|abnormal_MCHb||-0.03||(-0.10 , 0.04)|
|abnormal_MCHb_conc||-0.03||(-0.10 , 0.04)|
|abnormal_hb||0.05||(-0.04 , 0.19)|
|abnormal_mcv||-0.03||(-0.11 , 0.04)|
|abnormal_plt||0.23||(0.02 , 0.36)|
|abnormal_redcellwidth||0.07||(0.00 , 0.15)|
|abnormal_wbc||-0.02||(-0.09 , 0.03)|
|abnormal_ALC||0.01||(-0.05 , 0.08)|
|location_GeneralMedicine||-0.11||(-0.21 , 0.00)|
|location_Hematology||0.04||(-0.02 , 0.16)|
|location_IntensiveCare||0.05||(-0.01 , 0.15)|
|location_CardiovascularSurgery||0.04||(-0.03 , 0.11)|
|location_Pediatric||0.04||(-0.02 , 0.10)|
|Monday||0.07||(0.00 , 0.16)|
|Tuesday||0.07||(0.00 , 0.14)|
|Wednesday||0.00||(-0.04 , 0.07)|
|Thursday||0.01||(-0.03 , 0.09)|
|Friday||-0.39||(-0.46 , -0.31)|
|Saturday||-0.31||(-0.39 , -0.23)|
|Sunday||0.10||(0.03 , 0.18)|
|lastWeek_Usage||0.12||(0.05 , 0.19)|
|yesterday_Usage||0.10||(0.02 , 0.17)|
|yesterday_ReceivedUnits||0.06||(0.00 , 0.14)|
Fig. 11 depicts the actual and predicted demand for 2018. The green line is the predicted demand while the yellow and red lines are the limits of the confidence intervals. We use the bootstrap method for computing the confidence intervals, thus for a 95% confidence interval, the confidence interval lines indicate the range where 95% of the predictions for 1000 runs fall into. As illustrated in Fig. 11, there is not a large gap between these bounds. This small gap depicts the fact that the forecasted values have small variation for the 1000 runs.
It appears that the model does some degree of smoothing and thus cannot detect the sharp peaks. The reason is that regression models tend to be regressed on the expectation of the outcome, and are not good at capturing the extreme derivations from this expectation. However, as shown in Fig. 11, smoothing mostly occurs for the maxima rather than the minima. In other words, the model potentially has large errors when there is unexpected excess demand, for example in emergency situations.
Using the bootstrap method, confidence intervals can also be computed for the weights. As shown in Fig. 12, the weights have a wide range, so we see high values (abnormal_plt = 0.23) as well as low values (Friday = -0.39) for the lab tests and day of the week data. Overall, the range of the weights for the 95% confidence interval is narrow which implies that there are small margin of errors for the forecasted values.
The variables and their corresponding weights are given in Table 4. The weights for lab tests are high. This is consistent with the observation that the lab test results are significant indicators for platelet transfusion. The variables abnormal_plt, abnormal_hb, abnormal_ALC and abnormal_wbc have higher weights and consequently higher impact on platelet demand. For day of the week, Friday and Saturday have negative weights due to the fact that they cover the weekend (Friday: -0.39 and Saturday: -0.31). For hospital census data, except for location_GeneralMedicine, all the weights are in a similar range to the lab tests.
Similar to the other methods, the lasso model is also trained for eight years data, from 2010 to 2017, and tested for 2018 data. The results, presented in Table 5, indicate that there is not much difference for the lasso method when there is a large amount of data for training.
|Lasso for 2 years||5.52||24.26%||5.93||28.55%|
|Lasso for 8 years||5.49||29.85%||5.78||28.02%|
While our goal is to have the minimum forecasting error, we want our models to be interpretable. Since there are high correlations as well as potential nonlinear relationships among the model variables and the linear models are not able to handle them, some unexpected weights appear in the model output. Consequently, in the next section, we forecast platelet demand with a machine learning method, LSTM networks, to explore such issues.
4.5 LSTM Networks
LSTM networks are a class of recurrent neural networks (RNN) that were introduced by(hochreiter1997long)
and are capable of learning long-term dependencies in sequential data. In theory, RNNs should be capable of learning long-term dependencies, however they suffer from the so-called vanishing gradient problem. Consequently, LSTM networks are designed to resolve this issue. Since LSTM networks are able to work with sequential data with long-term dependencies, they have been widely used in various areas such as finance(fischer2018deep; qiu2020forecasting), medicine (wang2020deep; kim2019prediction; bouhamed2020covid), environment (le2019application; zhang2019short) and transportation (lv2014traffic)
for time series forecasting, pattern recognition, speech recognition(smagulova2019survey) and classification (zhou2019long).
We train an LSTM network for two years from 2016 to 2017 toward the minimum MSE and test it for one year, 2018. We perform grid search hyperparameter tuning to find the best parameters. Model input variables are selected via the lasso method discussed in the previous section. As the number of inputs increases, both the data variables that make data wide and the data rows that make data tall, LSTM performance tends to decrease because it is highly dependent on the input size. Moreover, wide data results in model overfitting(lai2018modeling). Having wide data, one can apply a feature selection method such as lasso to reduce the number of variables and regularize the input. However, hyperparameter tuning and training for a large amount of historical data are still time consuming, resulting in a tradeoff between accuracy and time and memory complexity.
We implemented the LSTM network using the TensorFlow package(abadi2016tensorflow). The LSTM network is trained by using the ADAM optimizer (kingma2014adam)
, and MSE (Mean Square Error) is used as the loss function for this optimizer. Forecasting results are presented in Fig.13 with RMSE of 4.28 and MAPE of 16.59 % for the training set and RMSE of 6.77 and MAPE of 28.03% for the testing set. Forecasting problems can have linear or nonlinear relationships among the model variables. Due to the fact that LSTM networks can work on both linear and nonlinear time series, and are able to capture the nonlinear dependencies, it can outperform linear regression models when there exists long term correlations in the time series. Based on the LSTM results, we conclude that long term correlations and nonlinearity are not major issues for our data since the LSTM model does not significantly outperform lasso regression.
We also train the LSTM with eight years of data. Despite the long time spent for hyperparameter tuning and training, its performance is similar to having two years of data meaning that the LSTM network has robust performance with different data volumes. Table 6 gives the performance of LSTM networks when trained with two years vs. eight years of data. As we can see from Table 6, having a larger amount of data reduces overfitting in the LSTM network.
|LSTM for 2 years||4.28||16.59%||6.77||28.03%|
|LSTM for 8 years||4.52||25.53%||6.65||28.52%|
5 Comparison and Discussion
In this section, we compare the methods and provide recommendations for using these methods in various scenarios. In Section 5.1 we compare the methods based on the first scenario (two years of data), in Section 5.2 we discuss the impact of an increased amount of data on the forecasting methods. In Section 5.3 we provide overall recommendations.
5.1 Univariate versus Multivariate Models
We have presented four different models for platelet demand forecasting that can be divided into two groups: univariate and multivariate. Univariate models, ARIMA and Prophet, forecast future demand based only on the demand history. The ARIMA model only considers a limited number of previous values, while the Prophet model incorporates the historical data, seasonality and holiday effects into the demand forecasting model and improves the forecasting accuracy by approximately 10% compared to ARIMA with two years of data. This highlights the impact of weekday/weekend and holiday effect in the platelet demand variation. As we discussed in Section 2, there is a weekday/weekend effect for platelet demand, which is not (directly) captured in the ARIMA model.
Multivariate models, on the other hand, incorporate clinical indicators as well as historical demand data for demand forecasting. We use lasso regression to select the dominant clinical indicators that affect the demand. Lasso regression examines the linear relationship among the clinical indicators and their influence on the demand. However, as presented in Fig. 5, there are several correlations among the clinical indicators. There may also be nonlinear relationships among these clinical indicators that cannot be captured by a linear regression model. These issues motivate us to use a machine learning approach, LSTM networks, that also accounts for the nonlinearities among these variables. Moreover, an LSTM network is capable of retaining the past information while forgetting some parts of the historical data. Table 7 summarizes the train and test errors for all the models. As we can see from the table, LSTM and lasso models outperform other methods, owing to the addition of the clinical indicators. However, one should be cautious using an LSTM model since it has higher time and memory complexity in comparison to other methods.
|Univariate Models||ARIMA trained for 2 years||6.77||33.44%||8.13||46.32%|
|Prophet trained for 2 years||5.83||26.05%||6.64||37.73%|
|ARIMA trained for 8 years||5.86||30.49%||5.72||28.96%|
|Prophet trained for 8 years||5.48||28.89%||5.99||30.93%|
|Multivariate Models||Lasso trained for 2 years||5.52||24.26%||5.93||28.55%|
|LSTM trained for 2 years||4.28||16.59%||6.77||28.03%|
|Lasso trained for 8 years||5.49||29.85%||5.78||28.02%|
|LSTM trained for 8 years||4.54||25.53%||6.65||28.52%|
5.2 Two Years versus Eight Years of Data
As discussed in Section 4, we train our models for two scenarios, with two years and eight years of data, respectively. Since there is no trend in the data from 2016 onwards (see Fig. 4), in the first scenario the models are trained for two years (2016 to 2017). With this amount of data, forecasts are not accurate for univariate time series approaches, and one needs to include the clinical indicators in the forecasting model (see Table 7). However, by dedicating eight years of data for training, the ARIMA model’s performance improves by approximately 20% and is very close to the multivariate methods. The Prophet model, on the other hand, is not significantly affected when more data are available. The reason is that, unlike ARIMA, information about seasonality and holiday effects are explicitly included in the model.
Both of the multivariate models result in small forecasting errors for two years of data for training, whereas they do not perform significantly better as the amount of data increases. This highlights the importance of including the clinical indicators in the forecasting process.
5.3 General Insights and Recommendations
In general, when there is access only to previous demand values, using a univariate model like Prophet that explicitly includes the seasonality in data is effective. In the case that several data variables are available, lasso regression and LSTM can forecast the demand with high accuracy. Considering the time and memory complexity, and interpretability of these models, lasso regression has lower time and memory complexity while it is also very interpretable. Moreover, training an LSTM network needs expertise in the machine learning area since poor training will cause low-precision results. However, it is also worth mentioning that the LSTM network is a robust learning model and is capable of learning the linear and nonlinear relationships among the model variables even in very short time series data (boulmaizimpact; lipton2015learning).
Having a sufficient amount of data, using a traditional method such as ARIMA is also reasonable. Specifically, when there is only access to the previous demand (as is currently the case for CBS) and adequate historical data are available, one can benefit from a simple univariate model like ARIMA. Univariate models are simpler than the multivariate models, and easier to interpret. Machine learning methods, and specifically deep learning methods, have been shown to be overkill in other areas where traditional methods can perform as well as them with lower cost.o2019deep
compare traditional computer vision methods with deep learning models. Although computer vision is a popular domain for the application of deep learning methods,o2019deep suggest that traditional methods are able to perform more efficiently than deep learning methods at lower cost and complexity. rathore2018malware compare different models for malware detection in which the best result is not for a deep learning method. Our observations are consistent with this line of thought.
In this study, we utilize two types of methods for platelet demand forecasting, univariate and multivariate methods. Univariate methods, ARIMA and Prophet, forecast platelet demand only by considering the historical demand information, while multivariate methods, lasso and LSTM, also consider clinical indicators. The high error rate for the univariate models motivates us to utilize clinical indicators to investigate their ability to improve the accuracy of forecasts. Results show that lasso regression and LSTM networks outperform the univariate methods when a limited amount of data are available. Moreover, since they include clinical predictors in the forecasting process, their results can aid in building a robust decision making and blood utilization system. On the other hand, when there is access to a sufficient amount of data, a simple univariate model such as ARIMA can work as well as the more complex methods.
Future extensions of this work will include: (i) proposing an optimal ordering policy based on the predicted demand over a planning horizon with ordering cost, wastage cost and shortage (same-day order) cost; (ii) further exploring the lasso regression approach to enhance variable selection, with a particular focus on interpretability (this not only will affect the lasso model itself, but also may improve LSTM forecasting accuracy since LSTM inputs are selected with the lasso regression); (iii) explore the generality of the results (outside of Hamilton).
This study was funded by the NSERC Discovery Grant program and Mitacs through the Accelerate Industrial Postdoc program (Grant Number: IT3639) in collaboration with Canadian Blood Services. The funding support from Canadian Blood Services was through the Blood Efficiency Accelerator program, funded by the federal government (Health Canada) and the provincial and territorial ministries of health. The views herein do not necessarily reflect the views of Canadian Blood Services or the federal, provincial, or territorial governments of Canada.
The authors thank Tom Courtney, Rick Trifunov, Dr. John Blake, and Masoud Nasari for providing information about blood collection, blood processing, and blood distribution at Canadian Blood Services. All final decisions regarding manuscript content were made by the authors.