1 Introduction
Italy has been, and partially still is, under pressure to properly manage the recent COVID19 epidemic emerged from China in December 2019. Its quick spread required a global response to prepare health systems worldwide. In its present form, COVID19 seems to have two very challenging characteristics (see also [1]): it is highly infectious and, despite having a benign course in the vast majority of patients, it requires hospital admission and even intensive care for a far from negligible proportion of infected.
In Italy, particularly in the two Regions of Lombardia and Veneto, the COVID19 infection emerged in February 2020 with a basic reproductive number between 3 and 4 [2]. At the beginning of the coronavirus outbreak, Italy was one of the countries with the lowest amount of acute care beds per person in Europe. The resources of the national health system were not designed to face a largescale epidemic. The national health system was equipped with a total number of approximately 5200 beds, that was substantially increased very quickly during the spread of the disease [3]. This reversed a trend that started years ago, especially after financial crises, according to which resources allocated to the national health system were progressively cut.
In general, ICUs are characterized by a rather low number of beds with high turnover. Most of patients stay in intensive care only for one or few days [4], even if some of them can stay months. However, ICU length of stay (LOS) due to COVID19 infection was rather long: a recent study [5] involving 1591 patients admitted in ICUs in Lombardia reported a median length of stay of 9 days (613 [95% CI]). Long LOS implied a slower turnover, and increased the risk of collapse of the national ICU system. Indeed, in the absence of measures to flatten the epidemic curve, the number of ICU beds available in Italy would have achieved very quickly 100% occupation, leaving several thousand patients short of the cures needed (see also [3] for a discussion).
It is now clear that measures like social distancing, use of masks, contact tracing, and a wide access to diagnostic testing coupled with isolation of positives can be very effective. Nevertheless, careful and reliable planning of resources can also aid substantially in controlling the consequences of the epidemics, and likely increase the likelihood of early diagnoses and better care. To respond to the looming threat of shortage of ICU beds, hospitals urgently need to establish and implement policies that more fairly allocate these scarce resources. If hospitals can plan in advance how many ICU beds shall be made available for the nearly following days, capacity can be increased (or decreased) to match the demand. This would avoid the ethical dilemma of severe triaging patients and not admitting those whose lives are not worth saving [6]. In this work we propose one statistical tool of this sort, which can be used to accurately forecast the ICU occupation for the next one to five days. Beds demand in intensive care depends on two factors: the number of COVID19 patients needing intensive care, and duration of their hospitalisation. Unfortunately, these data are not made available during the Italian outbreak. Public data include, anyway, daily ICU occupation by Region, which we use as a proof of concept.
Existing approaches to forecast ICU occupancy are mainly based on exponential models fitted to daily numbers of occupied critical care beds out of confirmed cases [7, 8] or on SIR (susceptible, infectious, recovered) models [9]. Both approaches are subject to certain limitations in our opinion. An obvious limitation of the former approach includes ascertainment bias due to the use of counts of confirmed cases, and inappropriate modeling assumptions (including for instance a Gaussian assumption for logcounts). A model of the SIR family may on the other hand be a very appropriate option. However, SIRbased models require the possibility to precisely estimate several characteristics of the epidemics which are still mostly unknown, and it is well known that changing even slightly the initial conditions can lead to very different results. This is even more difficult as the exposed population is only partially observed [10, 11, 12].
Our approach is based on optimally combining two forecasting methods. The first is based on Poisson mixed effects regression; the second one is a Regionspecific time series model for counts, taking into account timedependence over time. The count outcome is appropriately modeled as a Poisson conditionally on observed time trends and unobserved heterogeneity including dependence, as implied by random effects or by the autoregressive structure of the timeseries models, and the averaged predictions give an optimal balance between pooling information over different areas (which targets a low variance prediction) and adaptation at the specific area (which targets a low bias prediction).
The rest of the paper is as follows: in the next section we give a description of the available data. In Section 3 we describe our method for prediction of the nextday ICU occupancy. We mention here that we have validated our method during the outbreak, by repeatedly producing ICU predictions with the current data, and waiting for the official data for the next five days. An illustration of the performance of our approach is given in Section 4. Some concluding remarks are in Section 5.
An implementation of our approach for the Italian case is available within our Shiny app at: https://statgroup19.shinyapps.io/StatGroup19Eng.
2 Data
On January 31, 2020, the Italian Government declared the state of emergency for six months as a consequence of the health risk associated with COVID19 infection.
To inform citizens and make the collected data available, the Department of Civil Protection developed an interactive geographic dashboard (accessible at the addresses http://arcg.is/C1unv (desktop) and http : //arcg.is/081a51 (mobile)) and built a daily updated data githubrepository (CCBY4.0 license), with a large number of variables; among them the intensive care defined as number of hospitalized patients in Intensive Care Units (ICU) available at the Regional level (20 Italian Regions). In Figure 1 the time series of ICU admission from February 24 until April 16 , 2020 are reported and it appears that several sources of heterogeneity affect the data. The epidemic started at different times across Regions, starting from the North of the country and evolved very differently in each Region, leading to trajectories of different shapes. Population size is also very different across Italian Regions (Figure 2), which are also characterized by different economic and social structures. Moreover, a further heterogeneity source relates to the ICU capacity.
In Italy, the health system is Regionalbased, so in practice there are 20 different health systems and management choices, local authorities and local policy makers play a fundamental role in the definition of health services capacities, with very few national directives.
During the epidemic some of the northern Regions went beyond the capacity of the local system. Figure 3 reports the Regional rate of occupancy (occupancy over capacity^{1}^{1}1Notice that in figure 3 we use the capacity on April 16, 2020 that was augmented in the northern Regions to face the COVID19 emergency) during the epidemic. Regions are ordered geographically from North to South showing the northsouth gradient of the epidemic evolution mostly affecting the northern Regions. It is rather clear that the health systems of northern Regions were under pressure for a longer period and this reflects the difference in the spread of the epidemic. A Regional model that takes into account different sources of heterogeneity and the different timing in the spread of the epidemic is thus required.
3 Methods
In order to produce accurate forecasts we combined two methods, one based on joint modelling all Regions through a mixedeffects generalized linear regression model, and the other one based on separately modelling each Region as a nonstationary timeseries of counts, which uses an integervalued autoregression specification, with covariates. The first approach pools information over Regions, reducing variance of the final prediction; the second instead can be expected to have a lower bias. The final predictions are then averaged with weights chosen through a leavelastout method. Furthermore, in order to focus more on the recent trends, we always set
, that is, we use the last two weeks of data to make predictions and ignore previous measurements.3.1 Random effects modelling of longitudinal count data
We start assuming that the observed daily ICU admissions for Region at day ,
, are realizations of independent Poisson random variables
with parameter , . The interest is in modelling as a nonlinear function of time, taking into account unobserved heterogeneity and Regionspecific effects. A simple way to deal with these features is through a Generalised Linear Mixed Model (GLMM) [13] with linear predictor(1) 
where a canonical link has been adopted, the offset term accounts for different population exposures,
represents the vector of shared fixedeffects regression parameters, and
represents the random coefficients, i.e. the Regionspecific intercept and slopes, withThe observed counts are assumed independent given the threedimensional random vector . The likelihood function is obtained integrating out the in (2)
(2) 
As widely discussed in the literature, the integral in (2) has no analytical solution, and approximate methods should be used in order to estimate the parameters and . Here, a Laplace approximation has been used first of all to speed up computations. Adaptive Gaussian quadrature may represent a slightly more accurate option, at the expense of a higher computational burden. Other approaches can also be considered [14, 15].
Predictions are based on the posterior estimates of the random effects and the MLE of the fixedeffect parameters. Predictions intervals are found through nonparametric block bootstrap using 500 replicates. Block bootstrap involves resampling Regions, and once a Region is included its entire timeseries is used for model estimation of the resampled data. This is done to preserve the dependency structure of the data.
It is worth mentioning that the specific covariance structure among the random effects has been chosen on the grounds of model selection using BIC. In our data, the best covariance structure, which has been then used for all estimates and predictions, has turned out to be:
(3) 
3.2 Regionspecific integervalued autoregression modelling
At the second step, we fit and obtain predictions for Regional time series separately. In other words, 20 different models are fitted, as timeseries models for counts.
Let again be the daily number of ICU admissions and let denote an ()dimensional timevarying covariate vector, consisting of a polynomial specification of time; notice that this vector is Regionspecific, so different polinomial specifications can be selected for different Regions. We model
as a conditional Poisson distribution where the expectation
at time depends on both past counts and past covariates:(4) 
where the coefficients and represent the effecst of the expectation in the previous day and the number of ICU admissions in the previous day , respectively.
Note that the model defined by (4) belongs to the INGARCHX family [16, 17, 18]. An alternative approach would be to examine the loglinear INGARCH model of [19]. However, the linear model has significant advantages in terms of interpretation, since it allows for an additive decomposition of the expectation. Under the Poisson assumption, equations (4) corresponds to a GARCHtype model [20] for the conditional variance of the process; hence the name INGARCH has been used frequently in the literature (though there is some debate on this terminology, see e.g.[21]).
For each Region, we compare stationary, linear, quadratic and cubic trends (that is, ). We select the best model specification for each one separately, according to the Bayesian information criterion.
Parameters in equation (4) are estimated via conditional maximum quasilikelihood estimation, using the function tsglm in the tscount R
package. If the Poisson assumption holds true, then we obtain an ordinary ML estimator. Predictions and their confidence intervals are obtained as follows. The distribution of the 1stepahead prediction is not known analytically and it is thus approximated numerically by a parametric bootstrap procedure. Onestepahead prediction intervals can be straightforwardly obtained and are based on simulations of realizations from the fitted model; the approximated prediction intervals are obtained from the empirical
and quantiles of the boostrapbased predictions.3.3 Model averaging
Let denote the prediction obtained for the th Region ( for the Italian case) and time with the GLMM method, and the prediction obtained with the integer autoregressive method. The final prediction is
(5) 
for some . In order to estimate an optimal , we first repeat model estimation excluding for ; obtaining leavelastout predictions and ; and then we solve the optimization problem
It shall be here noted that for the first few days, when , we make the weighthomogeneity assumption for all . For reasons of simplicity final prediction intervals are obtained as the weighted average of the limits of prediction intervals for and . It is straightforward to use Jensen’s inequality to show that this conservatively guarantees the nominal level.
4 Results
The reliability and goodness of our approach can be assessed by checking the nextday performance as: (i) median absolute error over the twenty Italian Regions, (ii) mean relative error over the twenty Italian Regions, (iii) proportion of prediction intervals that do not contain the actually observed occupancy, (iv) proportion of observed occupacies above the upper limit of the prediction interval. For illustrative purposes we discuss in this section predictions related to the next day, for days from March, 17 to April, 27th.
The daily absolute error has a median of 4 beds, with first quartile 1 and third quartile 8. The daily relative error over the twenty Regions has first quartile 2%, median 5%, third quartile 12%. Its mean is 9.2%. For prediction intervals we used a nominal level of 99%. Out of the 840 intervals produced, 99.4% indeed contained the observed ICU occupation. Figure
4 summarizes the main results. We report a plot of the nextday performance for each day since March, 17. It can be seen that after the first two weeks, when data available were scarce and a unique weight was used, the absolute and relative errors decreased substantially. Furthermore, coverage of the prediction intervals was always very close to 100%, with only one occasion in which one Region reported an ICU occupation above the predicted upper limit. Overall, the approach is rather effective and needs just few data points to produce reasonable estimates.Just to corroborate the results, we provide an example of the forecasts shown daily at https://statgroup19.shinyapps.io/StatGroup19Eng. On April 9, we published the forecasts displayed in Table 1 on the web, along with 99% confidence intervals. On April 10, we checked how well the ensemble approach performs. Overall, the performance of the proposal is more than satisfactory. The observed values are all into the 99% confidence intervals, whose length is reasonable and ensure a good representation of the uncertainty surrounding the forecasts. The upper bound of the confidence interval can be used as the worst possible scenario for a specific day and resources should be allocated accordingly to guarantee optimal health services. To be fair, the confidence interval for Puglia is rather wide. However, this is not surprising, as patients were transferred from northern Regions to Puglia for a few days, and daily bursts in the time series of ICU admissions were observed.
Regions  Observed  Forecast  Confidence Interval () 

Abruzzo  53  57  48 – 91 
Basilicata  15  17  8 – 29 
Calabria  14  13  5 – 23 
Campania  90  87  60 – 107 
EmiliaRomagna  349  347  301 – 397 
Friuli Venezia Giulia  33  34  19 – 48 
Lazio  201  197  155 – 225 
Liguria  151  142  111 – 173 
Lombardia  1202  1202  1053 – 1305 
Marche  127  120  94 – 151 
Molise  4  4  0 – 11 
Piemonte  394  392  348 – 451 
Puglia  80  79  41 – 159 
Sardegna  26  24  12 – 38 
Sicilia  62  62  51 – 94 
Toscana  256  240  203 – 283 
Trentino Alto Adige  128  128  100 – 157 
Umbria  39  37  28 – 63 
Valle d’Aosta  16  18  9 – 32 
Veneto  257  259  216 – 299 
In the following plot (Figure 5), we provide a focus on some Regions, those whose health systems were under pressure for the longest time. The prediction intervals were slightly wide but reasonable, for instance for Lombardia, the most severely affected Region, the median length was 20% of the predicted occupancy. Median difference between upper limit of the prediction interval and total available beds in the Region is 4.6% in Veneto. We also report data and estimates for Piemonte. This is because the news overlooked the situation in Piemonte, mainly focusing on Lombardia and Veneto. However, while Veneto was more effective at containing the epidemic, the health system in Piemonte was (and partially still is) under pressure, with a decay in the ICU occupancy much slower than the other Regions.
5 Conclusions
We estimated the occupancy of ICU beds at the Regional level in Italy during the Covid19 outbreak. The resulting estimates are obtained as a combination of two approaches, which address different data features, i.e. heterogeneity and timedependence, merged via ensambling. The proposed approach is datadriven, no simulated scenarios are required, and it is based on rather simple modelling assumptions. We have discussed in this paper only oneday ahead predictions, but our approach can also be easily extended to threedays and fivedays horizons.
From the beginning of this emergency in Italy, the ICU bed capacity was identified as a major bottleneck, to be monitored to avoid the increase in the fatality rate. Accordingly, strong efforts have been dedicated to this issue from government and planners. Our approach was able to predict the demand of ICU beds since the very beginning, showing an improvement in its behavior as long as more data were available. These information and predictions were shown and freely available on a daily basis on the StatGroup19 Facebook page (https://www.facebook.com/StatGroup19100907671547894). A correct communication of the evolution of the epidemic under all its variables was crucial to properly inform the general public and avoid any unmotivated concern.
As a byproduct conclusion drawn from this analysis, we noticed that the Regional health systems, though under pressure, were able to properly satisfy the demand for critical care. This might be related to the restrictions imposed by the government, which effectively reduced the spread of the Covid19 in Regions with a low number of ICU beds.
References
 [1] Peeri, N.C., Shrestha,N., Rahman, S., Zaki, R., Tan, Z., Bibi, S., Baghbanzadeh, M., Aghamohammadi, N., Zhang, W and Haque, U. (2020). The SARS, MERS and novel coronavirus (COVID19) epidemics, the newest and biggest global health threats: what lessons have we learned? International Journal of Epidemiology dyaa033, https://doi.org/10.1093/ije/dyaa033.
 [2] Flaxman, S. , Mishra, S., Gandy, A., Unwin, H. J. T., Coupland, H., Mellan, T. A., Zhu, H., Berah, T., Eaton, J. W., Guzman, P. N. P., Schmit, N., Callizo, L., Imperial College COVID19 Response Team, Whittaker, C., Winskill, P., Xi, X., Ghani, A., Donnelly, C. A., Riley, S., Okell, L. C., Vollmer, M. A. C., Ferguson, N. M., and Bhatt, S. (2020). Estimating the number of infections and the impact of nonpharmaceutical interventions on COVID19 in European countries: technical description update. arXiv:2004.11342v1 [stat.AP].
 [3] Remuzzi, A. and Remuzzi G. (2020). COVID19 and Italy: what next? Lancet, 395, 1225–1228.
 [4] Diekmann, O., Heesterbeek, H. and Britton, T. (2013). Mathematical Tools for Understanding Infectious Disease Dynamics. Princeton University Press, Princeton.
 [5] Grasselli, G., Zangrillo, A., Zanella, A., Antonelli, M., Cabrini, L., Castelli, A., Cereda, D., Coluccello, A., Foti, G., Fumagalli, R., Iotti, G., Latronico, N., Lorini, L., Merler, S., Natalini, G., Piatti, A., Ranieri, M. V., Scandroglio, A. M., Storti, E., Cecconi, M., Pesenti, A., for the COVID19 Lombardy ICU Network (2020). Baseline Characteristics and Outcomes of 1591 Patients Infected With SARSCoV2 Admitted to ICUs of the Lombardy Region, Italy. JAMA, 323(16): 1574–1581. doi:10.1001/jama.2020.5394.
 [6] White, D.B. and Lo, B. (2020). A framework for rationing ventilators and critical care beds during the COVID19 pandemic. JAMA. Published online March 27, 2020. doi:10.1001/jama.2020.5046
 [7] Grasselli, G., Pesenti, A., Cecconi, M. (2020) Critical Care Utilization for the COVID19 Outbreak in Lombardy, Italy: Early Experience and Forecast During an Emergency Response. JAMA, 323: 1545–1546.
 [8] Sebastiani, G., Massa, M. and Riboli, E. Covid19 epidemic in Italy: evolution, projections and impact of government measures. European Journal of Epidemiology 35: 341–345.
 [9] Giordano, G., Blanchini, F., Bruno, R., Colaneri, P., Di Filippo, A., Di Matteo, A. and Colaneri, M. (2020). Modelling the COVID19 epidemic and implementation of populationwide interventions in Italy. Nature Medicine, to appear,DOI:10.1038/s4159102008837.
 [10] Böhning, D., Rocchetti, I., Maruotti, A. and Holling, H. (2020). Estimating the undetected infections in the Covid19 outbreak by harnessing capturerecapture methods. medRxiv 2020.04.20.20072629
 [11] Yue, M., Clapham, H. E., Cook, A. R. (2020). Estimating the Size of a COVID19 Epidemic from Surveillance Systems. Epidemiology, April 10, 2020  Volume Publish Ahead of Print  Issue  doi: 10.1097/EDE.0000000000001202
 [12] Chen, Y.C., Lu, P.E., Chang, C.S. (2020) A Timedependent SIR model for COVID19. arXiv preprint arXiv:2003.00122, 2020.
 [13] Breslow, N. E. and Clayton, D. G. (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association, 88, 9–25.
 [14] Zhang, H., Lu, N., Feng, C., Thurston, S. W., Xia, Y., Zhu, L. and Tu, X. M. (2011). On fitting generalized linear mixedeffects models for binary responses using different statistical packages. Statistics in Medicine, 30, 2562–2572.

[15]
Kim, Y., Choi, Y.K. and Emery, S. (2013). Logistic regression with multiple random effects: a simulation study of estimation methods and statistical packages.
The American Statistician, 67, 171–182.  [16] Fokianos, K., Rahbek, A. and Tjøstheim, D. (2009). Poisson autoregression. Journal of the American Statistical Association, 104, 1430–1439.
 [17] Agosto, A., Cavaliere, G., Kristensen, D. and Rahbek, A. (2016). Modeling corporate defaults: Poisson autoregressions with exogenous covariates (parx). Journal of Empirical Finance 38, 640–663.
 [18] Chen, C. W. and Lee, S. (2016). Generalized poisson autoregressive models for time series of counts. Computational Statistics & Data Analysis, 99, 51–67.
 [19] Fokianos, K. (2011). Some Recent Progress in Count Time Series. Statistics, 45, 49–58.
 [20] Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31, 307–327.
 [21] Tjøstheim, D. (2012). Some recent theory for autoregressive count time series. Test, 21, 413–438.
Comments
There are no comments yet.