A unified machine learning approach to time series forecasting applied to demand at emergency departments

07/13/2020 ∙ by Michaela A C Vollmer, et al. ∙ Imperial College London

There were 25.6 million attendances at Emergency Departments (EDs) in England in 2019 corresponding to an increase of 12 million attendances over the past ten years. The steadily rising demand at EDs creates a constant challenge to provide adequate quality of care while maintaining standards and productivity. Managing hospital demand effectively requires an adequate knowledge of the future rate of admission. Using 8 years of electronic admissions data from two major acute care hospitals in London, we develop a novel ensemble methodology that combines the outcomes of the best performing time series and machine learning approaches in order to make highly accurate forecasts of demand, 1, 3 and 7 days in the future. The two hospitals face an average daily demand of 208 and 106 attendances respectively and experience considerable volatility around this mean. However, our approach is able to predict attendances at these emergency departments one day in advance to within a mean absolute error of +/- 14 and +/- 10 patients, corresponding to a mean absolute percentage error of 6.8% and 8.6% respectively. We compare machine learning algorithms with more traditional linear models. We find that linear models often outperform machine learning methods and that the quality of our predictions for any of the forecasting horizons of 1, 3 or 7 days is comparable as measured in MAE. In addition to comparing and combining state-of-the-art forecasting methods to predict hospital demand, we consider two different hyperparameter tuning methods, enabling a faster deployment of our models without compromising performance. We believe our framework can readily be used to forecast a wide range of policy relevant indicators.


1 Background

In 2019 there were million attendances at emergency departments (EDs) in the UK, corresponding to patients attending every day [3]. The National Health Service (NHS) trusts across England are under very high pressure to maintain current standards and quality of care [29]. In fact, the rate of attendance has grown by % since 2018 and by % over the last years meaning that it is increasing at a rate faster than population growth thus putting high pressure on our health care system. Failure to make provisions for surges in demand can lead to overcrowding, which in turn has been linked to multiple adverse patient outcomes such as unfavourable patient satisfaction, poor quality of care and diseconomies of scale [27]. In order to decrease overcrowding, the NHS introduced a new operational standard in 2010, commonly-known as the “four hour target”, requiring that at least % of patients attending EDs should either be discharged, admitted or transferred within hours of arrival. However, this target has not been met since 2014 and failure rates have reached a new high in January 2020 with % of patients at EDs waiting longer than four hours despite the fact that the overall number of attendances was lower in January 2020 than in January 2019. In fact % of patients spent more than hours in EDs in 2019 compared to % last year and % five years ago, making 2019 the year with the worst annual performance on record [3]. A shortage of staff is not the predominant cause for long waiting times and low quality of care [31], rather, it is that the correct type of staffing is not matched with patient demand. This inefficiency in staffing can have substantial impacts such as the of elective surgeries that had to be cancelled for non-clinical reasons on the day the patient was due to arrive in 2019. These cancellations leave NHS hospital trusts with lost costs of surgeons, anaesthetists and nurses as well as surgical session time and theatre capacity. 
Moreover, in the same year, the percentage of patients who had not been treated within days of cancellation decreased from % in 2018 to % in 2019, but this still failed one of the NHS’s improvement objectives. In addition, NHS England advises that a % bed occupancy rate is the maximum safe level of occupancy and that trusts should try to keep bed occupancy below %. However, out of trusts recorded bed occupancy above %, trusts rates above % in the third quarter of 2019, eight trusts had occupancy above % and one trust recorded % bed occupancy (NHS SitRep).

Overall the NHS in England has spent around £ billion in 2018/19 on the delivery of health services [13]. A major fraction, % in the financial year 2016-2017 [9], of this spending is due to the million people employed by NHS hospital and community health services. Between February 2018 and 2019 the number of doctors alone rose by % and by % during the past five years [23]. This emphasises the fact that long waiting times may not simply result from a shortage of staff. Nonetheless, every year high agency and staffing costs are required to cover for staffing shortages. A key step to addressing the issue of staffing is the prediction of the rates of admission and the duration of stay at EDs (collectively referred to as “demand”). In particular, an adequate prediction of demand and a better understanding of the reasons for demand is fundamental to the delivery of high quality care.

1.1 Previous work

Despite its importance, methods for predicting rates of admission and understanding underlying dynamics have not been studied extensively in the literature. Existing methods have been limited to the application of classical time series forecasting methods. In an Australian study, Boyle et al [6] predicted monthly, daily and four hourly demand at EDs in Queensland. Taking public holidays into account as predictors, the authors used an autoregressive integrated moving average (ARIMA) model as well as regression and exponential smoothing methods to predict demand up to a mean absolute percentage error (MAPE) of % for daily admissions. McCarthy et al [28] predicted hourly presentations to American emergency departments while including several temporal, weather and patient factors on the number of hourly arrivals in order to characterise behaviour at EDs. Jones et al [24] used seasonal autoregressive integrated moving average (SARIMA), time series regression, exponential smoothing and artificial neural network models to forecast demand. The authors had access to two years of data and made predictions ranging from to days in advance for three hospitals in the USA. The authors found seasonal and weekly patterns to be most important for an accurate prediction and obtained a MAPE of % depending on the facility. Champion et al [12] performed an analysis of monthly demand at an emergency department in regional Victoria. They applied exponential smoothing and Box-Jenkins methods to five years’ worth of admissions data. Hoot et al [21] carried out a discrete event simulation in order to obtain forecasts for , , and hours into the future. The authors analysed waiting times, length of stay and bed occupancy.

1.2 Our contribution

We develop a novel, predictive framework to understand the temporal dynamics of hospital demand and we apply an exhaustive statistical analysis to daily presentations at EDs at St Mary’s and Charing Cross Hospitals, evaluating a range of standard time series and machine learning approaches and ultimately developing our own unique approach. In contrast to existing studies, we do not only focus on the application of time series algorithms in order to characterise demand but develop a generic procedure that allows us to compare and combine both time series and machine learning algorithms in order to obtain an informative, more appropriate and consistently accurate approach to the prediction of demand. Our models have the ability to be retrained regularly and efficiently and are therefore a powerful tool for online platforms and near real time prediction. Using novel data from electronic logging systems from eight years of daily presentations to EDs at St Mary’s and Charing Cross Hospitals in London, we construct a model that predicts the number of daily arrivals to both hospitals. Our analysis accounts for seasonal fluctuations, daily observed weather data and specific, pre-planned events indicated by staff at both EDs such as the yearly Notting Hill Carnival. Using our procedure can help the hospitals with the provision of the right staffing numbers and deploying resources in the most effective way.

2 Data sources

Figure 1: Attendances at EDs per day for 2011 to 2019. The horizontal lines are the average demands.

The St Mary’s and Charing Cross Hospitals are part of the Imperial College Healthcare NHS Trust, one of NHS hospital trusts in England. St Mary’s Hospital is the major acute care hospital for North West London housing a major trauma centre. Its ED has faced an average demand of patients (with a maximum of and a minimum of ) every day since 2011. Charing Cross Hospital includes the serious injuries centre for West London as well as a hyper acute stroke unit. On average there have been (with a maximum of and a minimum of ) daily attendances at the ED since 2011. For our analysis, we had access to electronic data records of the number of daily attendances at the EDs for both hospitals from 2011 to 2018 (see Figure 1). In order to investigate the demand dynamics, we also collected data on school [30] and bank holidays [20], as well as on the weather and Google search volume for the word “flu” [2, 19] (see the Appendix for more details). Finally, experienced staff at the EDs of both hospitals provided us with a list of specific known events in the locality that cause surges in demand (e.g. the Notting Hill Carnival, an annual festival taking place in the catchment area of the hospital).

3 Exploratory data analysis

Figure 2: Time series decompositions of attendances at both EDs for 2011 to 2019. Left: St Mary’s Hospital; right: Charing Cross Hospital. The first row of plots is the data, the second the trend, the third the seasonality and the fourth the random residuals.

A central aim of this paper is not only to predict hospital demand accurately but also to understand the factors driving it. Daily data is driven by a complex web of exogenous variables, many of which are related to seasonal patterns or trends. Both time series depicted in Figure 1 show a strong underlying trend. In the case of Charing Cross Hospital this trend is clearly upwards, while for St Mary’s Hospital it first goes downwards due to a change in the hospital’s infrastructure, see Figure 2. There is also clear seasonality: attendance at St Mary’s Hospital shows a very clear periodic pattern across months, with troughs in January, April and August and a rise in attendance during the winter months (likely due to increases in acute respiratory infections), see Figure 3.

Figure 3: Monthly attendance at St Mary’s Hospital, 2011-2018.
Figure 4: Monthly attendance at Charing Cross Hospital, 2011-2018.

Figure 3 also indicates that bank or school holidays, together with the flu season, have a strong influence on the number of ED admissions. It also shows that the flu season’s contribution to increased demand runs well into spring. While the monthly attendance at Charing Cross Hospital also shows some periodic behaviour, it is not as strong, see Figure 4. It is therefore useful to note that dynamics differ even between geographically close hospitals with overlapping catchments. Both series show clear day-of-week patterns, characterised by a strong autocorrelation with respect to their lagged values of order , see Figure 5. Mondays have the highest volume of attendances at both hospitals while attendance reaches its minimum during weekends. This finding is consistent with other studies on hospital demand [15, 14].

Figure 5: Day of the week effect for ED attendances at both hospitals 2011-2019.

4 Methods and algorithms

We focus on forecasting demand one, three, and seven days into the future. These particular forecasting intervals are relevant as they allow the hospitals to take action by using short term measures such as the cancellation of elective surgeries or the hiring of additional staff through agencies.

We use two different kinds of algorithms for our predictions: traditional time series and machine learning algorithms [26]. A discrete time series is a sequence of data points in chronological order divided into regular time intervals. The fundamental assumption behind both kinds of algorithms is that data points that are close to each other in time show a similar behaviour and that there is a dependency between data points at the same position of the time interval, e.g. same time of the year or same day of the week. Time series algorithms use both the chronology of the events and the specified interval in order to make inferences and split the time series into different linear components such as seasonality, trend and a residual. The residual, which is assumed to contain some correlative structure, is usually modelled using an autoregressive stochastic process or exponential decay where future values are predicted based on past values [26]. In contrast, machine learning algorithms specify a broad function class (such as trees or smooth curves) with sufficient capacity to learn complex functions. These algorithms learn from data, balancing function complexity with predictive accuracy. For both sets of algorithms we create a model (design) matrix containing explanatory variables. For the time series algorithms only lagged demand from previous time points was used. For the machine learning algorithms lagged demand values were used alongside other covariates (see Table 1). As predictors we have chosen past values such as demand on the previous day, on the same day of the previous week and the average of the past week, as well as indicators for bank holidays and school holidays. Moreover, we use data from some of the surrounding weather stations on precipitation, minimal and maximal temperature as covariates. Finally, we use search engine query data as a covariate, as it has proven to be a very efficient measure for the detection of influenza epidemics [18]. Table 1 shows a few rows of our model matrix.

Date 2014-04-10 2014-04-11 2014-04-12 2014-04-13
demand
month April April April April
yesterday
same day last week
average of previous week
time
bank holiday FALSE FALSE FALSE FALSE
school holiday FALSE FALSE TRUE TRUE
day of week Thursday Friday Saturday Sunday
precipitation on previous day
max temperature on previous day
min temperature on previous day
flu hits on google on previous day
Notting Hill carnival FALSE FALSE FALSE FALSE
Christmas FALSE FALSE FALSE FALSE
Table 1: Excerpt from the model matrix corresponding to St Mary’s Hospital. Explanatory covariates for the machine learning algorithms are listed.
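
The lagged predictors listed in Table 1 can be derived from the raw daily series alone. The following is a minimal sketch of that step, written in Python purely for illustration (the paper's references to caret, glmnet and ranger suggest the original analysis used R). The function name `build_lag_features`, the dictionary keys, the reading of "average of previous week" as the mean of the seven preceding days, and the toy demand values are all assumptions, not taken from the paper's data.

```python
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def build_lag_features(dates, demand):
    """Return one row of lagged predictors per day, starting at the 8th day.

    The first seven days are skipped because they lack a full week of history.
    """
    rows = []
    for t in range(7, len(demand)):
        previous_week = demand[t - 7:t]          # the seven days before day t
        rows.append({
            "date": dates[t],
            "demand": demand[t],
            "yesterday": demand[t - 1],
            "same_day_last_week": demand[t - 7],
            "average_of_previous_week": sum(previous_week) / 7,
            "day_of_week": DAY_NAMES[dates[t].weekday()],
        })
    return rows


# Toy series (made-up numbers) starting one week before the first date in Table 1.
start = date(2014, 4, 3)
dates = [start + timedelta(days=i) for i in range(10)]
demand = [200, 210, 190, 185, 220, 205, 198, 207, 212, 188]

rows = build_lag_features(dates, demand)
print(rows[0])
```

Holiday, weather and search-volume covariates would be joined onto these rows by date; they are omitted here because they come from external sources.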

Below, we summarise the time series and machine learning algorithms that we have considered (for detailed mathematical information we refer the reader to [22]):

4.1 Time series algorithms

  1. ARIMA - AutoRegressive Integrated Moving Average [22]
    An ARIMA model consists of three parts: The autoregressive component (AR) referring to the fact that the indicator of interest is regressed on its own previous values (i.e. current values of demand depend on past values of demand), the integrated (I) part representing one or several differencing steps to make the time series stationary and the moving average (MA) component indicating that the regression error is a linear combination of past error terms.

  2. ETS - Exponential smoothing methods [22] In general, exponential smoothing refers to forecasting methods which also regress on lagged values of the target variable but use exponentially decreasing weights for past observations. The ETS model we employ uses exponential smoothing for error, trend and seasonality.

  3. STLM - a seasonal decomposition of time series by LOESS (STL) STLM is another type of exponential smoothing model. The time series is decomposed into its seasonal components using LOESS (locally estimated scatterplot smoothing) before exponential smoothing is used to model the error and trend components of the time series. Finally the series is re-seasonalized.

  4. StructTS - Structural Time Series Model [22] A StructTS model is formulated directly in terms of unobserved components, such as trends, seasonality and exogenous factors that have a natural interpretation. The StructTS model forms a state space which makes it similar to the ARIMA and ETS models.
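
To make the phrase "exponentially decreasing weights" concrete, the sketch below implements simple exponential smoothing, the most basic member of the exponential smoothing family. It is an illustration only: it has no trend or seasonal component and is therefore not the ETS model used in the paper, and the smoothing parameter `alpha` and the toy history are assumed values.

```python
def simple_exponential_smoothing(series, alpha):
    """One-step-ahead forecast from simple exponential smoothing.

    The level is updated as level = alpha * y + (1 - alpha) * level, so
    (apart from the initial value) an observation k steps in the past
    carries weight alpha * (1 - alpha) ** k.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    level = series[0]                 # initialise the level at the first value
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level                      # forecast for the next, unseen time point


history = [200, 210, 190, 220]        # made-up daily attendances
forecast = simple_exponential_smoothing(history, alpha=0.5)
print(forecast)
```

A larger `alpha` makes the forecast react faster to recent days; with `alpha = 1` the forecast is simply the last observed value.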

4.2 Machine learning algorithms

  1. glmnet [17] is a penalised generalised linear model with built-in variable selection. The glmnet model is an extension of the generalised linear model in which bias (penalty/weight decay/regulariser) is introduced in the form of a mixture penalty consisting of the $\ell_1$ and $\ell_2$ norms of the parameters. The magnitude of this penalty is tuned to balance overfitting vs. underfitting, with the goal of reducing variance by introducing some bias.

  2. ranger [33] is a fast implementation of random forests [8]. Random forests are a bagged ensemble of decision trees: each tree is built on a bootstrapped sample of the data using random subsets of the covariates. Random forests prevent overfitting by averaging to reduce variance.

  3. Gradient Boosting Machines (gbm) [16] are a generalized boosted regression model. In contrast to random forests, which build a collection of independent decision trees, the decision trees created by gradient boosting machines depend additively on each other. Each new tree is added to the ensemble to reduce the residual error of the previous trees (the negative functional gradient). Hyperparameters are tuned to balance over- and underfitting.

  4. k-nearest neighbours (k-NN) [5] is a classic supervised learning algorithm based on the idea that data points that are close to each other in the space of covariates should have similar predictions. Hence, to make a prediction at a given location in covariate space, an average of the labels of the k nearest neighbours is taken. The disadvantage of this algorithm is that it slows down quickly once the volume of data points increases.
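
As an illustration of the k-NN idea described above, the following sketch averages the targets of the k closest training points under Euclidean distance. The function name `knn_predict`, the choice of two covariates (demand yesterday and the average of the previous week) and all numbers are assumptions made for the example.

```python
import math


def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points closest to `query`."""
    distances = [
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])   # nearest first
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Covariates: (demand yesterday, average of previous week); made-up values.
train_x = [(198, 201), (150, 160), (205, 199), (300, 280)]
train_y = [207, 155, 210, 290]

prediction = knn_predict(train_x, train_y, query=(200, 200), k=2)
print(prediction)
```

This brute-force version computes a distance to every training point for each prediction, which is one way to see why the method slows down as the data volume grows.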

4.3 Stacked Regression

All the above algorithms have strengths and weaknesses. It is therefore challenging to choose a single model for prediction. We thus create a consensus model by adopting stacked regression, a particularly effective ensemble approach. The idea of stacked regression is to combine the diversity and strengths of multiple algorithms into a single model with a better overall ability to generalise. We choose a linear stacking model subject to convex combination constraints [7]. Following [7, 32] we train a linear stacker on cross validation predictions of the individual time series and machine learning models. This procedure helps us avoid selecting models that overfit to the data. Stacking models is not only empirically motivated, but has a strong theoretical backing: it has been proven to perform asymptotically exactly as well (by some loss metric) as the best possible choice for the given data set among the family of weighted combinations of the estimators. Stacking has also been shown to work in a variety of settings [4].
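
A linear stacker under convex combination constraints chooses non-negative weights that sum to one so that the weighted average of the base models' out-of-sample predictions is as close as possible to the observed demand. The sketch below solves this least-squares problem by projected gradient descent. The paper does not state which solver was used, so the optimiser, the function names, the iteration count and the toy data are assumptions chosen for illustration.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_k >= 0, sum_k w_k = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, u_i in enumerate(u, start=1):
        cumulative += u_i
        candidate = (cumulative - 1.0) / i
        if u_i - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def fit_convex_stacker(predictions, y, iterations=5000):
    """Fit convex-combination weights by projected gradient descent.

    predictions[k][t] is the cross-validated forecast of model k for time t.
    The objective is the mean squared error of the weighted forecast.
    """
    n_models, n = len(predictions), len(y)
    # Step size 1/L, with L an upper bound on the gradient's Lipschitz constant.
    trace = sum(p * p for model in predictions for p in model)
    step = n / (2.0 * trace)
    w = [1.0 / n_models] * n_models
    for _ in range(iterations):
        residual = [
            sum(w[k] * predictions[k][t] for k in range(n_models)) - y[t]
            for t in range(n)
        ]
        gradient = [
            2.0 / n * sum(predictions[k][t] * residual[t] for t in range(n))
            for k in range(n_models)
        ]
        w = project_to_simplex(
            [w[k] - step * gradient[k] for k in range(n_models)]
        )
    return w


# Toy example: model A happens to be exact, model B is noisy.
y = [10.0, 20.0, 30.0, 40.0]
model_a = [10.0, 20.0, 30.0, 40.0]
model_b = [12.0, 18.0, 33.0, 37.0]
weights = fit_convex_stacker([model_a, model_b], y)
print(weights)
```

In this toy case the weights concentrate on the exact model. With real forecasts, where no single model dominates, several weights typically stay positive.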

5 Validation and evaluation

Developing an algorithm with the best possible forecast accuracy is the main goal of this study. However, care must be taken to ensure that these forecasts are not simply overfitting to noise in the data but are accurate and can truly forecast to unseen data. A key innovation of this paper is the development of a novel general purpose time series cross validation procedure to ensure that:

  1. all algorithms are evaluated fairly and equally,

  2. the same data is used in all algorithms, and

  3. forecast errors are completely blind to the held-out data (i.e. exactly as if the model was being used in a real forecasting setting).

Temporal or time series cross-validation [22] is a method to split the data into testing and training sets in order to account for temporal structure in the data. The main idea is that each test set only consists of a forecasting window of one day which lies one, three or seven days in the future while the corresponding training set consists of a number of observations prior to the forecasting window. The origin can either be fixed so that the length of the training window grows by one, three, or seven days for each new test set or it can move forward so that the training window size remains fixed. We employ the latter method.

Using temporal cross-validation for time series algorithms only requires splitting the data into a training and test set. Adapting the data so that the last days ( months) of data are held out for testing, days (roughly six years) were available for training which corresponds to a %/% split. We then applied temporal cross validation to each day in the test set (see Figure 6) using a rolling window so that all training sets consist of the same number of days.
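
The rolling-window scheme described above can be sketched as an index generator: every test set is a single day, and the fixed-size training window ends a full forecasting horizon before it. The function name `rolling_window_splits`, its parameters and the small sizes in the example are assumptions for illustration; the paper's actual window and test lengths are not reproduced here.

```python
def rolling_window_splits(n_obs, window, horizon, first_test):
    """Yield (train_indices, test_index) pairs for a rolling-origin scheme.

    Each test set is a single day; the training window always has `window`
    days and ends `horizon` days before the test day, so nothing from the
    forecast gap or the test day leaks into training.
    """
    for test in range(first_test, n_obs):
        train_end = test - horizon            # last training index (inclusive)
        train_start = train_end - window + 1
        if train_start < 0:
            continue                          # not enough history yet
        yield range(train_start, train_end + 1), test


# Toy setting: 12 days of data, 5-day training window, 3-day-ahead forecasts.
splits = list(rolling_window_splits(n_obs=12, window=5, horizon=3, first_test=8))
for train, test in splits:
    print(list(train), "->", test)
```

Setting `horizon` to 1, 3 or 7 gives the three forecasting horizons considered in the paper. A fixed-origin (expanding window) variant would keep `train_start` at 0 instead.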

Given the fairness and robustness of our cross validation scheme, we are confident that our results are robust to data shift and are valid beyond the time period for which we collected data.

Figure 6: Splitting the data into training, (one day) validation and test set.

5.1 Validation of hyperparameters

In order to allow for a fair comparison of the machine learning algorithms, the method for temporal cross-validation has to be adapted as most machine learning algorithms require tuning of their hyperparameters to balance over and under fitting. Therefore we also have to split the training data into a training and a validation set in order to choose the best set of hyperparameters which minimizes the error on the validation set. All machine learning algorithms listed in Section 4.2 were trained on years worth of data and their hyperparameters were compared on a validation set consisting of days. In our analysis we have used two different approaches to split the training set and to choose the hyperparameters. We will call these the batch and the online method.

5.1.1 The batch method

In order to choose the set of hyperparameters which minimizes the error on the validation set, we consider five different approaches. We choose the set of hyperparameters which minimize the error

  1. on the previous day,

  2. on the past days,

  3. over the whole validation period using an exponential moving average,

  4. averaged over the whole validation period, or

  5. according to caret’s built-in rules (see [25]).

For each of the five cases above, we choose the best set of hyperparameters for each day of the test set and refit all models on a daily basis. Of course, refitting every model for each day of the test set is computationally expensive. We therefore develop an online method, described below.
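
The first four selection rules listed above differ only in how a candidate's daily validation errors are summarised before the minimum is taken. The sketch below shows one possible reading of them; the fifth rule (caret's built-in selection) is not reproduced. The rule names, the window length `k`, the smoothing factor `alpha` and the error values are assumptions for illustration.

```python
def score(errors, rule, k=7, alpha=0.3):
    """Summarise one candidate's daily validation errors (oldest first)."""
    if rule == "yesterday":
        return errors[-1]
    if rule == "past_k_days":
        recent = errors[-k:]
        return sum(recent) / len(recent)
    if rule == "ema":
        value = errors[0]
        for e in errors[1:]:
            value = alpha * e + (1 - alpha) * value   # recent days weigh more
        return value
    if rule == "mean":
        return sum(errors) / len(errors)
    raise ValueError(f"unknown rule: {rule}")


def best_candidate(validation_errors, rule, **kwargs):
    """Return the hyperparameter label with the smallest summarised error."""
    return min(validation_errors,
               key=lambda name: score(validation_errors[name], rule, **kwargs))


# Made-up absolute errors for two hypothetical hyperparameter settings.
validation_errors = {
    "setting_a": [11.0, 11.0, 10.0, 18.0],
    "setting_b": [15.0, 14.0, 13.0, 9.0],
}
chosen = {rule: best_candidate(validation_errors, rule, k=2)
          for rule in ("yesterday", "past_k_days", "ema", "mean")}
print(chosen)
```

In this toy example the rules that emphasise recent days pick `setting_b`, while the plain average picks `setting_a`, which shows how the choice of rule can change the selected hyperparameters.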

5.1.2 The online method

In the batch method described above, all models are refitted daily, which takes significant computational power to run and might not be feasible for a deployed version of our methods in a hospital setting. Thus, we explore whether keeping the parameters fixed for longer periods, which yields significant savings in terms of computation, hurts performance. We refit each model over several testing periods of various sizes which are subsets of the test set of length days with a rolling origin. Our chosen testing periods are day, days, days, days, days and days long. That means that we validated the best set of hyperparameters for each algorithm for every testing period, and then rolled forwards. The final error rates are the result of the overall error on all predictions on the whole test set of length .
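
One reading of the online method is a walk-forward loop in which tuning and refitting happen only at the start of each testing period, and the resulting model is reused for every day inside that period. The sketch below follows that reading; `online_forecast`, the `fit` and `predict` callables, and the toy "model" (a mean of the last seven days) are placeholders introduced for illustration, not the paper's models.

```python
def online_forecast(series, test_start, period, fit, predict):
    """Walk through the test set, refitting only every `period` days.

    `fit(history)` returns a model; `predict(model, history)` returns the
    one-day-ahead forecast. Both are placeholders supplied by the caller.
    """
    forecasts, n_fits, model = [], 0, None
    for offset, t in enumerate(range(test_start, len(series))):
        history = series[:t]                 # only data observed before day t
        if offset % period == 0:             # start of a new testing period
            model = fit(history)
            n_fits += 1
        forecasts.append(predict(model, history))
    return forecasts, n_fits


def fit_last_week_mean(history):
    """Toy 'model': the mean of the most recent seven observations."""
    recent = history[-7:]
    return sum(recent) / len(recent)


def predict_constant(model, history):
    """Toy prediction: reuse the stored mean until the next refit."""
    return model


series = [float(200 + (i % 7)) for i in range(20)]   # made-up demand
forecasts, n_fits = online_forecast(
    series, test_start=10, period=5,
    fit=fit_last_week_mean, predict=predict_constant,
)
print(len(forecasts), n_fits)
```

With `period=1` the loop refits every day, which corresponds to the daily retraining of the batch method; longer periods trade a small amount of freshness for far fewer fits.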

Figure 7: Splitting the data set into training, validation and test set in case of the online method.

6 Results

We compare the mean absolute error (MAE) and mean absolute percentage error (MAPE) rates for all time series algorithms as well as for the batch and the online method as outlined in Sections 5.1.1 and 5.1.2. Finally, we compare our results with the stacked regression.
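
For reference, the two error measures can be computed as in the sketch below, using their standard definitions. MAE is expressed in attendances and MAPE in percent; the example values are made up.

```python
def mae(actual, predicted):
    """Mean absolute error, in the same units as the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(
        abs(a - p) / abs(a) for a, p in zip(actual, predicted)
    ) / len(actual)


actual = [200, 100]          # made-up daily attendances
predicted = [190, 110]
print(mae(actual, predicted), mape(actual, predicted))
```

The same absolute error of 10 attendances counts for 5% on a day with 200 attendances and 10% on a day with 100, which is why a hospital with lower average demand can have a lower MAE but a higher MAPE.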

St Mary’s Hospital: Charing Cross Hospital:
algorithm days MAE MAPE (in%) MAE MAPE (in %)
ARIMA
ETS
StructTS
STLM
Table 2: Error rates for all time series algorithms for St Mary’s and Charing Cross Hospital.

6.1 St Mary’s Hospital

The MAE of the time series algorithms ranges from to as shown in Table 2, with an MAPE ranging from to . Hence, although the time series algorithms are simpler and only based on the time series without using any other predictors, such as weather, bank or school holidays (see Table 1 for details on all predictors), they already give relatively accurate results. In the case of the machine learning algorithms, independent of the type of hyperparameter tuning we use, most of the results for the batch methods range between an MAE of to and an MAPE of , as shown in Table 3. Only the k-nearest neighbour algorithm performs worse despite different ways of tuning. Especially when using the online method, see Table 5, the linear models produce the best results for St Mary’s hospital with an MAPE of in the case of daily retraining of the generalized linear model.

Choosing the best set of hyperparameters based on:
algorithm | days | yesterday | the past days | exponential moving average | the average over the whole training set | caret
lm
gbm
glmnet
knn
rf
Table 3: MAE rates for all types of hyperparameter tuning predicting hospital demand 1, 3 or 7 days in advance in case of the batch method for St Mary’s hospital.
Choosing the best set of hyperparameters based on:
algorithm | days | yesterday | the past days | an exponential moving average | the average over the whole training set | caret
lm
gbm
glmnet
knn
rf
Table 4: MAE rates for all types of hyperparameter tuning predicting hospital demand 1, 3 or 7 days in advance in case of the batch method for Charing Cross hospital.
algorithm period MAE MAPE
lm
glmnet
gbm
lm
lm
glmnet
glmnet
gbm
lm
glmnet
glmnet
lm
glmnet
gbm
gbm
gbm
rf
glmnet
lm
rf
rf
rf
rf
rf
knn
knn
knn
knn
knn
knn
(a) St Mary’s Hospital
algorithm period MAE MAPE
lm
lm
glmnet
lm
gbm
glmnet
glmnet
gbm
gbm
lm
glmnet
lm
glmnet
lm
glmnet
rf
gbm
gbm
rf
gbm
rf
rf
rf
rf
knn
knn
knn
knn
knn
knn
(b) Charing Cross Hospital
Table 5: Results for the online method for both hospitals.

6.2 Charing Cross Hospital

The results for Charing Cross Hospital are similar, although the error rates are slightly higher. As shown in Table 2, most time series algorithms yield an MAE between 10.5 and 14.5 attendances, while the corresponding MAPE ranges from 8.5% to 12%. The best performance reached using the batch method is also an MAE of around 10.5, although the overall performance is a little better, see Table 4. In the case of the online method, the MAPE is as low as for the gradient boosting machine, the generalized linear model or a simple linear model retrained on a daily basis, as shown in Table 5.

6.3 Stacked Regression

In order to make use of the strengths of all algorithms, we applied a generalised linear model as well as a penalized regression to create an effective ensemble approach, see Figure 6 for details. The best performance was achieved using penalized regression with an MAE error rate of for St Mary’s Hospital and for Charing Cross Hospital.

6.4 Interpretation of results

In our analysis we consider a variety of predictors ranging from past values of ED attendance, to the weather forecast and to school and bank holidays. The question left to answer is therefore which of the predictors are actually important for making accurate predictions? Could some of them actually be redundant?

In machine learning, variable importance can be defined as the dependence between input and output variables and computed by permuting the values of a given predictor and calculating the error on a held-out set. This measure has drawbacks in the case of multicollinearity, which could suppress the importance of certain variables in some models. As shown in Figure 8, the most important variable for predicting demand is the average demand of the previous week (except for glmnet), followed by specific days and months. The importance of these variables varies considerably between algorithms but is largely consistent across them. To our knowledge, this is the first attempt to quantify the importance of variables in predicting demand. We believe that these importance percentages can be utilised as heuristics to help staff improve their prediction of demand in the absence of statistical analysis.
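
The permutation procedure described above can be sketched as follows: shuffle one predictor column in the held-out data, recompute the error, and report the average increase over several shuffles. This is a generic illustration rather than the exact routine behind Figure 8; the function name, the use of MAE as the error, the number of repeats and the toy model are assumptions.

```python
import random


def permutation_importance(predict, X, y, column, n_repeats=20, seed=0):
    """Average increase in held-out MAE after shuffling one column of X."""
    rng = random.Random(seed)

    def mae(rows):
        return sum(abs(predict(r) - t) for r, t in zip(rows, y)) / len(y)

    baseline = mae(X)
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in X]
        rng.shuffle(shuffled)
        permuted = [
            row[:column] + [value] + row[column + 1:]
            for row, value in zip(X, shuffled)
        ]
        increases.append(mae(permuted) - baseline)
    return sum(increases) / n_repeats


# Toy model that only uses the first covariate; held-out data are made up.
def toy_model(row):
    return row[0]


X_holdout = [[1, 9], [2, 8], [3, 7], [4, 6]]
y_holdout = [1, 2, 3, 4]

used = permutation_importance(toy_model, X_holdout, y_holdout, column=0)
ignored = permutation_importance(toy_model, X_holdout, y_holdout, column=1)
print(used, ignored)
```

A covariate the model ignores gets an importance of zero. When two covariates are strongly correlated, shuffling one of them can understate its importance, which is the multicollinearity caveat noted above.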

(a) lm
(b) glmnet
(c) gbm
(d) rf
Figure 8: Variable importance for the machine learning algorithms.

7 Conclusion

The results of our analysis highlight interesting statistical properties: simple linear methods like generalized linear models are often better than, or at least as good as, ensemble learning methods like gradient boosting or random forests. However, though sophisticated machine learning methods are not necessarily better than linear models, they improve the diversity of model predictions so that stacked predictions are more robust and accurate than any single model, including the best performing one. This largely confirms a ‘wisdom of the crowd’ rule for ensemble learning.

The online method we have created has the ability to provide accurate results with a very quick turnaround time: an average model only takes a couple of minutes to train and forecast. Running it over longer periods of time without retraining of the model is much faster and at the same time not significantly worse than tuning the hyperparameters on a daily basis.

The framework, analysis and methodology proposed in this paper are highly relevant from an operational viewpoint. To the authors’ knowledge, the majority of EDs in the UK do try to account for demand, but do so using ad hoc heuristics. Our approach provides some scientifically backed information to improve these heuristics, but more importantly provides a framework that is quick and easy to implement. Demand forecasts 1, 3 and 7 days ahead can be created at any time and updated easily, allowing statistically backed estimates to be used to inform hospital policy and practice. Our hope is that this paper will fill a knowledge gap and increase hospitals’ uptake of these methods. Given epidemics and disease outbreaks that can strain health systems, our approach can provide added precision to help EDs operate as efficiently as possible. The challenge will then be for teams in hospitals to implement more insight-driven ways of working, more flexible approaches to rostering staff and innovative ways of communicating with their local communities.

In an era of precision medicine, future work will undoubtedly focus on granular data sets including individual level patient data with diagnoses and testing information. This will not only help to improve performance of the predictions but also help to understand dependencies between seasonality, events and reasons for ED presentations. This may come at the cost of longer running times, but only an in-depth analysis of patients’ pathways, the rate of admission for different diagnoses and their corresponding bed occupancy will help improve hospital planning substantially.

8 Acknowledgements

We acknowledge the NIHR Imperial BRC, joint Centre funding from the UK Medical Research Council and Department for International Development. Grant reference: MR/R015600/1. The authors would like to thank Claire Hook (Director of Operational Performance) for helping understand the operational impact of this project. GC is supported in part by an NIHR research professorship. CC is funded by a National Institute for Health Research (NIHR) Career Development Fellowship (NIHR-CDF-2016-09-015) and NIHR North West London Applied Research Collaborative funding (NIHR200180). The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care.

9 Competing interests

The authors declare that they have no competing interests.

10 Appendix

For weather data we used minimum and maximum daily temperatures as reported by the Met Office [10, 11] from the following weather station:

  • ID St. James’s Park (postcode SW1A 1), straight line distance: miles/ km to St Mary’s hospital and miles/ km to Charing Cross Hospital.

Data on the daily precipitation in London is publicly available [1].

The number of Google searches for “flu” is also publicly available and can be downloaded from Google Trends [19]. Google provides data on the relative search volume and assigns a score between 1 and 100 to every time unit within the time frame specified by the user. However, the length of this time unit is set by Google, so the data within a long time frame may only show weekly or monthly searches for “flu”. We therefore adjust the data by combining daily data, downloaded in time frames of six months, with monthly data, downloaded over the past few years [2].
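The adjustment described above can be sketched as follows. This is only one plausible reading of the procedure, not necessarily the exact method of [2] or the authors' implementation: it assumes that each daily score is rescaled so that the mean of the daily scores in a calendar month matches the monthly score for that month. The function name `rescale_daily` and the input layout are hypothetical.

```python
from collections import defaultdict
from datetime import date


def rescale_daily(daily, monthly):
    """Rescale daily relative search scores using monthly scores.

    daily:   dict mapping date -> score, relative within its download window
    monthly: dict mapping (year, month) -> score, relative over the full period
    Returns a dict mapping date -> adjusted score.

    Assumption (not from the paper): within each month, daily scores are
    multiplied by monthly_score / mean(daily scores of that month).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, score in daily.items():
        key = (day.year, day.month)
        sums[key] += score
        counts[key] += 1

    adjusted = {}
    for day, score in daily.items():
        key = (day.year, day.month)
        mean = sums[key] / counts[key]
        # Guard against months with no search activity at all.
        factor = monthly[key] / mean if mean > 0 else 0.0
        adjusted[day] = score * factor
    return adjusted


if __name__ == "__main__":
    # Made-up illustrative numbers, not real Google Trends data.
    daily = {
        date(2019, 1, 1): 50, date(2019, 1, 2): 100,
        date(2019, 2, 1): 20, date(2019, 2, 2): 20,
    }
    monthly = {(2019, 1): 30, (2019, 2): 60}
    for day, value in sorted(rescale_daily(daily, monthly).items()):
        print(day.isoformat(), round(value, 2))
```

After rescaling, daily values from different download windows share the scale of the monthly series, so they can be concatenated into a single daily covariate.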

References

  • [1] Environment Agency. Daily areal rainfall calculations, June 2019. URL: https://data.london.gov.uk/dataset/daily-areal-rainfall.
  • [2] Franz B. Google trends: How to acquire daily data for broad time frames, May 2019. URL: https://medium.com/@bewerunge.franz/google-trends-how-to-acquire-daily-data-for-broad-time-frames-b6c6dfe200e6.
  • [3] Carl Baker. NHS key statistics, England, February 2020. (Number 7281), published February 2020.
  • [4] Samir Bhatt, Ewan Cameron, Seth R Flaxman, Daniel J Weiss, David L Smith, and Peter W Gething. Improved prediction accuracy for disease risk mapping using Gaussian Process stacked generalisation. Journal of the Royal Society, Interface, 14(134):20170520. URL: http://www.ncbi.nlm.nih.gov/pubmed/28931634, arXiv:1612.03278, doi:10.1098/rsif.2017.0520.
  • [5] Christopher M Bishop. Pattern Recognition and Machine Learning, volume 4. 2006. URL: http://www.library.wisc.edu/selectedtocs/bg0137.pdf, arXiv:0-387-31073-8, doi:10.1117/1.2819119.
  • [6] Justin Boyle, Melanie Jessup, Julia Crilly, David Green, James Lind, Marianne Wallis, Peter Miller, and Gerard Fitzgerald. Predicting emergency department admissions. Emerg Med J, 29(5):358–365, 2012.
  • [7] Leo Breiman. Stacked regressions. Machine Learning, 24(1):49–64, 1996.
  • [8] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001. doi:10.1023/A:1010933404324.
  • [9] Acute Care and Pensions & Employment Services /13710 Workforce/ Workforce/Pay. The NHS Pay Review Body (NHSPRB) review for 2018: Written evidence from the Department for Health and Social Care for England. Published January 2018.
  • [10] NCAS British Atmospheric Data Centre. Met Office (2006): MIDAS: UK daily rainfall data, 20 May 2019. URL: http://data.ceda.ac.uk/badc/ukmo-midas/data/RD/yearly_filesfrom.
  • [11] NCAS British Atmospheric Data Centre. Met Office (2006): MIDAS: UK daily temperature data, 20 May 2019. URL: http://data.ceda.ac.uk/badc/ukmo-midas/data/TD/yearly_files.
  • [12] Robert Champion, Leigh D Kinsman, Geraldine A Lee, Kevin A Masman, Elizabeth A May, Terence M Mills, Michael D Taylor, Paulett R Thomas, and Ruth J Williams. Forecasting emergency department presentations. Australian Health Review, 31(1):83–90, 2007.
  • [13] NHS England. Reference costs 2017/18: highlights, analysis and introduction to the data. November 2018. Accessed on 04.07.2019. URL: https://improvement.nhs.uk/documents/1972/1_-_Reference_costs_201718.pdf.
  • [14] NHS England and NHS Digital. Hospital accident and emergency activity 2017-18. published 13 September 2018. Accessed on 15.04.2019. URL: https://files.digital.nhs.uk/D3/CCB4FE/AE1718_%20Annual%20Summary.pdf.
  • [15] NHS England and NHS Digital. Accident and emergency attendances in England - 2008-2009, experimental statistics. published 26 January 2010. Accessed on 15.04.2019. URL: https://digital.nhs.uk/data-and-information/publications/statistical/hospital-accident--emergency-activity/accident-and-emergency-attendances-in-england-2008-2009-experimental-statistics.
  • [16] J H Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, pages 1189–1232, 2001.
  • [17] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1–22, 2010. URL: http://www.jstatsoft.org/v33/i01/.
  • [18] Jeremy Ginsberg, Matthew H Mohebbi, Rajan S Patel, Lynnette Brammer, Mark S Smolinski, and Larry Brilliant. Detecting influenza epidemics using search engine query data. Nature, 457(7232):1012, 2009.
  • [19] Google. Google Trends, May 2019. URL: https://trends.google.com/trends.
  • [20] gov.uk. Bank holidays, May 2019. URL: https://www.gov.uk/bank-holidays.
  • [21] Nathan R Hoot, Larry J LeBlanc, Ian Jones, Scott R Levin, Chuan Zhou, Cynthia S Gadd, and Dominik Aronsky. Forecasting emergency department crowding: a discrete event simulation. Annals of Emergency Medicine, 52(2):116–125, 2008.
  • [22] Rob J Hyndman and George Athanasopoulos. Forecasting: principles and practice. OTexts, 2018.
  • [23] Anita Charlesworth, Ian Seccombe, James Buchan, and Ben Gershlick. Falling short: the NHS workforce challenge, workforce profile and trends of the NHS in England. November 2019.
  • [24] Spencer S Jones, Alun Thomas, R Scott Evans, Shari J Welch, Peter J Haug, and Gregory L Snow. Forecasting daily patient volumes in the emergency department. Academic Emergency Medicine, 15(2):159–170, 2008.
  • [25] Max Kuhn. Building predictive models in R using the caret package. Journal of Statistical Software, 2008. doi:10.18637/jss.v028.i05.
  • [26] Spyros Makridakis, Evangelos Spiliotis, and Vassilios Assimakopoulos. The M4 Competition: 100,000 time series and 61 forecasting methods. International Journal of Forecasting, 36(1):54–74, 2020.
  • [27] Melissa L McCarthy. Overcrowding in emergency departments and adverse outcomes, 2011.
  • [28] Melissa L McCarthy, Scott L Zeger, Ru Ding, Dominik Aronsky, Nathan R Hoot, and Gabor D Kelen. The challenge of predicting demand for emergency department services. Academic Emergency Medicine, 15(4):337–346, 2008.
  • [29] House of Commons Library. NHS key statistics: England, May 2019. Briefing Paper 7281, May 2019.
  • [30] City of Westminster. School term and holiday dates, May 2019. URL: https://www.westminster.gov.uk/school-term-and-holiday-dates.
  • [31] Kate Silvester, Richard Lendon, Helen Bevan, Richard Steyn, and Paul Walley. Reducing waiting times in the NHS: is lack of capacity the problem? Clinician in Management, 12(3), 2004.
  • [32] David H Wolpert. Stacked generalization. Neural Networks, 5(2):241–259, 1992.
  • [33] Marvin N. Wright and Andreas Ziegler. ranger: A fast implementation of random forests for high dimensional data in C++ and R. Journal of Statistical Software, 77(1):1–17, 2017. doi:10.18637/jss.v077.i01.