Multivariate endemic-epidemic models with higher-order lags and an application to outbreak detection

by   Johannes Bracher, et al.
Universität Zürich

Multivariate time series models are an important tool for the analysis of routine public health surveillance data. We extend the endemic-epidemic framework, a class of multivariate autoregressive models for infectious disease counts, by including higher-order lags. Several parsimonious parameterizations motivated from serial interval distributions are explored. The added flexibility considerably improves the model fit and forecast performance in a case study on weekly norovirus and rotavirus counts in twelve districts of Berlin. New methodology based on periodically stationary moments is proposed and applied to investigate the goodness-of-fit of the different models. This motivates a novel approach to prospective multivariate outbreak detection as illustrated in a reanalysis of our case study.



There are no comments yet.


page 1

page 2

page 3

page 4


Forecasting Based on Surveillance Data

Forecasting the future course of epidemics has always been one of the ma...

An integer-valued time series model for multivariate surveillance

In recent days different types of surveillance data are becoming availab...

A marginal moment matching approach for fitting endemic-epidemic models to underreported disease surveillance counts

Count data are often subject to underreporting, especially in infectious...

A zero-inflated endemic-epidemic model with an application to measles time series in Germany

Count data with excessive zeros are often encountered when modelling inf...

Eliciting Disease Data from Wikipedia Articles

Traditional disease surveillance systems suffer from several disadvantag...

Number of Sign Changes: Segment of AR(1)

Let X_t denote a stationary first-order autoregressive process. Consider...

Tracking change-points in multivariate extremes

In this paper we devise a statistical method for tracking and modeling c...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Routine surveillance of seasonal infectious diseases produces multivariate time series of counts with characteristic features including seasonality, overdispersion and substantial auto- and cross-correlations. Models for such data can serve various purposes. They can help to understand mechanisms of spread, to predict future incidence or to inform outbreak detection systems. As Keeling and Rohani [1] point out, the different goals are often in conflict. While prediction requires high accuracy, transparency of models is more important to better understand disease spread. This trade-off has given rise to a wide spectrum of approaches [2]

, reaching from detailed individual-level over mechanistic to more statistical or empirical models and finally black-box machine learning approaches. In this article we build upon the endemic-epidemic framework (in the following: EE), an autoregressive model class for multivariate surveillance counts first introduced by Held et al

[3] and extended in a series of articles [4, 5, 6, 7]. The approach can be motivated from mechanistic arguments [8] and has been used in a number of studies to identify factors relevant for disease transmission [9, 10]. In other applications the focus was on accurate predictions for future disease incidence [11, 12]. For outbreak detection the model has so far only been used in a univariate setting [13].

In its current formulation and implementation in the R package surveillance [7] the EE framework is limited to models with one lag. Relaxing this assumption is interesting from a mechanistic perspective, as generation times (or serial intervals) may vary in reality and exceed one measurement interval [8]. But also in terms of forecasting this may be beneficial. In a recent article, Ray et al [14]

suggested a kernel conditional density estimation (KCDE) method to forecast univariate time series of counts. They used the EE model with lag one as a competitor in a case study on dengue fever in Puerto Rico, and found their method to give better results. However, both conceptually and computationally, KCDE seems difficult to extend to a multivariate setting, which is the primary focus of the EE class. It is therefore promising to integrate some characteristics of the KCDE method into the EE framework, in particular the fact that it bases forecasts on the last

observations rather than just one. To this end we suggest higher-order lag models based on parsimonious weighting schemes for the different autoregressive coefficients. These are inspired by integer-valued GARCH time series models [15] and discrete-time serial interval distributions used in infectious disease modelling [16].

Multivariate periodic models are an important special case within the EE framework and frequently applied in the absence of long-term temporal trends [4, 7, 9]. We therefore analyse such models more closely, taking into account the extension to higher-order lags. In particular we address their periodically stationary properties [17] and show how to compute periodically stationary moments. These give a concise description of the model behaviour during different parts of the year, enhancing interpretability and enabling new strategies for model assessment. We introduce visual tools and a new goodness-of-fit test to assess how well a model reflects the seasonal structure of the data. A well-fitting model can in turn be used to assess whether a new observation is out of the ordinary. We exploit this in new algorithms for uni- and multivariate outbreak detection.

The article is structured as follows. Section 2 introduces our case studies on two gastrointestinal diseases in the German state of Berlin. In Section 3.1, a brief overview on the EE model class is given before it is extended towards higher-order lag structures in Section 3.2. In Sections 3.3 and 3.4 we study the periodically stationary properties of the extended model class and propose ways of using them for model assessment. We then apply the model to the case studies on gastrointestinal diseases model 4, comparing the fit and predictive performance of different variants. In Section 5 the potential of periodically stationary moments in outbreak detection is addressed before a brief discussion in Section 6 concludes the article.

2 Case study: Surveillance of gastrointestinal diseases

Norovirus and rotavirus are common causes of viral gastroenteritis, with symptoms including diarrhoea, vomiting, fever and headaches. Both are highly transmissible with low infectious doses. They can be transmitted directly from person to person, but, due to their high environmental stability, also via contaminated food or items.[18] The average generation time is 3–4 days for norovirus and 5–6 days for rotavirus.[19] For rotavirus a vaccine has been available since 2006, while this is not the case for norovirus. In Germany both diseases are notifiable and the Robert Koch Institute, Germany’s national public health institute, makes weekly numbers of reported cases available ( We consider counts from the twelve districts of Berlin, Germany’s largest city, from week 2011-W01 through 2017-W52 (downloaded on 30 Aug 2018). Figure 1 shows weekly totals and the spatial distribution over the twelve districts. We also show counts from the districts of Pankow and Spandau, Berlin’s largest and smallest, respectively, which will serve for illustration throughout the article (see the Supplementary Material for the other districts). Both time series show stable seasonal patterns but no long-term trends. Norovirus typically peaks in the winter, rotavirus in late winter and spring. Note that parts of the norovirus data have been used in previous work [7, 12].

Figure 1: Weekly number of norovirus (left) and rotavirus (right) cases in Berlin, 2011–2017. Top row: Pooled over the twelve city districts. Second and third row: Districts of Pankow and Spandau. Plots for the remaining districts can be found in the Supplementary Material. Bottom row: Average number of cases per year and 100,000 inhabitants in the twelve districts.

3 Methodology

3.1 The endemic-epidemic model

The endemic-epidemic (EE) model, first introduced by Held et al [3] and extended in a series of subsequent papers [4, 5, 6, 7], is a statistical time series model for multivariate infectious disease surveillance counts. Given the past, the number of cases in unit and week

is assumed to follow a negative binomial distribution


with (conditional) mean and overdispersion parameter so that . A common simplification is to assume just one global overdispersion parameter, i.e.  for . Counts from different units and at time are assumed to be conditionally independent given .

The conditional mean in (1) is further decomposed into


where is referred to as the endemic component and represents the part of the incidence that cannot be explained by previous cases. For norovirus and rotavirus this includes for instance sporadic foodborne cases. The remaining autoregressive term in (2) is the epidemic component and describes how the incidence in region is linked to previous cases in units . The parameters and are constrained to be non-negative and modelled in a log-linear fashion, for instance with sine-cosine terms to account for seasonality [5]. Long-term temporal trends and covariates like meteorological conditions [8] or vaccination coverage [9] could also be included.

The coupling between units is described by normalized weights . Parsimonious parameterizations for different types of stratification exist, including the use of social contact matrices for spread across age groups [10] and power law weights to describe spatial spread [6]. In the latter case the weights are specified as


where is the path distance between the regions and (with ) and is a decay parameter to be estimated from the data. The power law formulation is motivated from human movement behaviour [20] and has been found to be an efficient way of capturing spatial dependence, yielding similar or even better results than complex approaches based on mobility data [21].

3.2 Higher-order lags

Formulation (2) is restrictive as it only takes into account cases from time , but ignores observations further back in the past. In the following we include higher-order lags, starting with the simple univariate formulation


where the autoregressive coefficients are again assumed to be non-negative. However, unconstrained maximum likelihood estimation of the coefficients may become unstable for larger [16]. A more parsimonious formulation can be obtained by keeping , but adding an autoregression on the past conditional mean to (4) [15]:

For this corresponds to an ARMA(1, 1)-type model with equivalent AR representation


where the weights


for are geometrically decaying. A combination of (4) and (5),


with normalized weights based on (6) is a promising formulation for diseases reported on a weekly basis. From a mechanistic perspective the weights can be interpreted as the discretized generation time or serial interval distribution, with (6

) corresponding to a geometric distribution. Another option


is the shifted Poisson distribution with weights


This may be more suitable for daily data or long generation times as, unlike (6), it does not force the first lag to be the most important. We moreover consider a simple version of (7) with (in the following denoted as ”AR2”), i.e.


Weights based on discretized versions of continuous distributions like the gamma distribution

[16] could also be examined. Turning to the multivariate setting we specify the EE model with higher-order lags as


so the weights are the same for all units. Here denotes the relevant observations prior to time . As before, a pre-specified number of lags is included and the lag weights are normalized to . A related formulation with pre-specified weights can be found in work by Wang et al [23].

We have extended the likelihood-based inference procedure from the surveillance package [4, 11] to allow for higher-order lags in the package hhh4addon (see Appendix A). The weighting parameter (or a suitable transformation) is estimated via a profile likelihood approach while for a given

optimization is based on a Newton-Raphson method with analytical formulae for gradients and Hessians. Once the maximum is found the full log-likelihood function can be differentiated numerically to obtain standard errors based on the observed Fisher information matrix of all unknown parameters. This way the uncertainty in

is accounted for correctly in all standard errors, despite the fact that profiling is used in the optimization.

3.3 Periodic stationarity

In many instances [4, 7, 9, 12] EE models take a periodic structure with cycles of length , i.e.  and for . They then fall into the large class of periodic autoregressive models. These have found applications in numerous disciplines, see Gardner et al[17] for an overview and Moriña et al[24] for an example from infectious disease epidemiology. A rich body of methodological literature exists; in particular periodic autoregressive moving average (PARMA) models have been studied in much detail, see for instance the works by Lund and Basawa [25, 26].

We now describe how to compute the periodically stationary moments of EE models, a more formal derivation with proofs is given in Appendix B. An

-dimensional vector process

is called periodically stationary in the mean with period if exists and holds for all . The process is called periodically stationary of second order if the same periodicity holds for the autocovariance functions , i.e.  for all

The periodically stationary variances are also denoted by


We first consider the univariate formulation


where are periodic with period . To relax the AR(1) assumption the lagged conditional mean is included in (13), leading to a periodic version of the negative binomial INGARCH(1, 1) model by Zhu [15]. As mentioned in Section 3.2, this is closely related to a geometric lag structure. To be in line with the existing literature we also allow for periodicity in and the overdispersion parameter . A similar periodic Poisson INGARCH model has been suggested by Bentarzi and Bentarzi [27], stationarity conditions for the negative binomial case are given in work by Aknouche et al [28]. Building on their results the periodically stationary moments for model (12)–(13) can be obtained. To simplify the notation we set for in the following. If , the process is periodically stationary in the mean with period and means


The process can thus be periodically stationary even if for some . During such periods the process can exhibit strong growth, as is needed to capture seasonal disease outbreaks. Under the stricter condition with the model is second-order periodically stationary with variances


and autocovariance function


where . For multivariate models with higher-order lags as in (10)–(11) we can still obtain the periodically stationary means analytically while this turns out to be difficult for the variance/covariance structure. However, it is possible to compute the latter numerically using a recursive relationship, see Appendix B.2 for details.

3.4 Model diagnostics using periodically stationary moments

A common way to check whether a model fits the data well is to consider conditional Pearson residuals


where are the observed counts. The residuals should be approximately uncorrelated with mean 0 and variance 1. We here suggest a complementary approach where instead of the conditional moments in (17) unconditional, i.e. periodically stationary moments and are used. These are properties purely of the fitted model and no data is re-used in their computation. They are useful to assess if a model has incorporated important properties of the data, such as the seasonal structure. In a first step the periodically stationary moments of a fitted model

can be compared visually to the empirical means and standard deviations of the available observations from a given unit

and calendar week :


Based on these quantities, marginal Pearson residuals


can be defined and used to identify important deviations. The

should approximately also have mean 0 and variance 1, but are usually correlated. A goodness-of-fit testing procedure can be constructed based on the test statistic


with . The covariance matrix is straightforward to obtain from the periodically stationary moments of the model . The , being empirical means of

practically independent and identically distributed random variables, follow approximately a normal distribution. Consequently, under the model

, follows approximately a distribution with degrees of freedom. This, however, ignores that we estimated from the data. A possible correction is to reduce the number of degrees of freedom to , but simulation studies indicate that this leads to a somewhat liberal testing procedure. This problem is aggravated as the distributions of the

may still be right-skewed, especially if the data cover only few cycles or if counts are low. We thus suggest a Monte Carlo approach

[29] to obtain the distribution of under model . To this end we simulate repeatedly from , re-fit the model to the simulated data and retain the corresponding values of .

4 Application to case studies

4.1 Model specifications

We now apply the higher-order lag models to the data on norovirus and rotavirus in the twelve districts of Berlin. To model the incidence in districts and weeks W2011-06 to W2017-52 () we use the formulation (10)–(11) with and the power law from equation (3) to describe spatial spread. The overdispersion parameter is shared across districts. For the weights we consider the three forms (6), (8), (9) as well as the simpler AR1 version where . To account for yearly seasonality sine and cosine waves with frequency are included into the endemic and the epidemic parameters:


The indicator accounts for reduced case counts during the Christmas holidays (calendar weeks 52 and 1). This effect, quantified by the coefficients and , is often attributed to changes in reporting behaviour or social contact patterns [7]. The intercepts are unit-specific to account for variation between the districts, including heterogeneity in population numbers. The parameters and for seasonality are shared across districts.

In order to explore the role of different model features we study different variants of this model:

  1. the full model as described above

  2. a model without seasonal terms in the epidemic parameters, i.e. setting in (22)

  3. a model without cross-lags, i.e. replacing (3) by and for

  4. a model without seasonal terms in the epidemic parameters and without cross-lags

In particular, this will allow us to assess the importance of higher-order lags compared to the other extensions of the baseline formulation (d) with AR1 structure.

4.2 Model fits

We fitted each combination of model specifications (a)–(d) and lag structure (AR1, AR2, geometric, Poisson) to both multivariate time series. A comparison via the Akaike information criterion (AIC; Table 1) indicates that for both norovirus and rotavirus the model performance improves with increasing complexity. The full model (a) performs best while the reference model (d) does worst. The higher-order lag versions consistently outperform their AR1 counterparts. Differences between the weighting schemes exist, but are less pronounced than the differences to the AR1 versions. The geometric formulation leads to the best results, followed by the Poisson and AR2 parametrizations. Interestingly, the inclusion of geometric lags into the baseline model (d) with AR1 lags leads to a more pronounced improvement than cross-lags or seasonality in the epidemic parameters (-146.2 vs. -129.0 vs. -41.3 for norovirus; -93.1 vs. -57.3 vs. -63.9 for rotavirus). This emphasizes the importance of including the information contained in observations from two or more weeks back in time. A very similar ordering of models has been identified in an assessment of the out-of-sample predictive performance (one-step-ahead forecasts) which can be found in the Supplementary Material.

Norovirus Rotavirus
Model version (a) (b) (c) (d) (a) (b) (c) (d)
AR1 -164.6 -129.0 -41.3 0.0 -119.6 -57.3 -63.9 0.0
AR2 -262.4 -225.0 -153.0 -111.5 -179.0 -119.6 -132.4 -68.4
geometric -279.1 -237.8 -191.1 -146.2 -206.9 -133.8 -170.6 -93.1
Poisson -275.9 -236.8 -178.5 -136.2 -196.0 -131.2 -154.7 -85.4
Table 1: Improvements in AIC relative to the baseline model (d) with AR1 lag structure. Results are shown for both norovirus and rotavirus and all combinations of model version and lag weighting scheme.
Figure 2: Estimated lag weights , total endemic component , decay parameter of the spatial power law and overdispersion parameter for the full model (a) with the four different lag schemes (AR1, AR2, geometric and Poisson). Top row: Norovirus. Bottom row: Rotavirus.

In the remainder of this section we focus on the full model (a) which performed best in terms of AIC. Figure 2 shows some characteristics of fits with the four different lag structures. The estimated lag weights , shown in the first column, are quite similar for the two diseases, despite the differences in average generation times mentioned in Section 2. The AR2 models assign a normalized weight of to the first lag, the geometric and Poisson versions slightly less (). The geometric version puts more weight on lags of order three or higher than the Poisson parametrization. As shown in the second column, moving to higher-order lags reduces the total endemic component . This is expected as part of the disease risk now explained by lags 2 to 5 was previously absorbed by the endemic component. Note that the lower values in calendar weeks and are due to the indicator from (21). The power law decay parameters increase when higher-order lags are included (third column), meaning that the model borrows slightly less information across regions. Finally, the overdispersion parameters are lower for the extended models, indicating less unexplained variability (fourth column).

The role of the different components and lags is best described by the decomposition of the fitted values, shown for model (a) with geometric lags in Figure 3. We display this exemplarily for Pankow, and refer to the Supplementary Material for the remaining districts. The epidemic component has a dominant role in the modelling of both diseases. The contributions from other districts (blue areas) play a larger role for norovirus than for rotavirus (compare also the in Figure 2

). The role of higher-order lags (shown in darker red and blue, respectively) is similar for both diseases. Overall the model seems to capture the dynamics in the time series well, with most of the observations falling between the 10% and 90% quantiles of the fitted distribution (dotted lines).

Figure 3: Fitted values of model (a) with geometric lags in the district of Pankow, decomposed into the endemic (grey) and epidemic component, along with observed values (dots). The epidemic component is further decomposed into contributions from the same region (red) and other regions (blue) as well as the different lags (light colours for first lag, darker for lags 2–5). Dotted lines show the 10% and 90% quantiles of the fitted negative binomial distributions. Note that the rotavirus time series contains an unusually high value in week 2015-W15.

4.3 Model diagnostics

Next, we assess the adequacy of our models using the methods discussed in Section 3.4, starting with an analysis of the conditional Pearson residuals (17). The residual autocorrelations look quite similar across models (a)–(d), but differ across lag structures. Figure 4 shows these differences for the full model (a) and the districts of Pankow and Spandau (plots for the other districts are in the Supplementary Material). The AR(1) model shows some residual autocorrelations at lags and while this is much less the case higher-order lag models. None of the models show significant residual autocorrelation at lag , i.e. a remaining seasonal pattern.

Figure 4: Autocorrelation of conditional Pearson residuals (17) for model (a) with different lag weighting schemes, applied to both diseases. The dotted lines are thresholds for significance at the 5% level. Results are shown for the districts of Pankow (left column) and Spandau (right column).
Figure 5: First and second row: Periodically stationary means and standard deviations of models (a)–(d) with geometric lags applied to norovirus. Dots represent the empirical moments and as in equation (18). Bottom row: Marginal Pearson residuals of models (a) and (b). Models (c) and (d) are omitted as the residuals are practically identical to those of (a) and (b), respectively. Results are shown for the districts of Pankow (left column) and Spandau (right column).

To complement this residual analysis we assess the periodically stationary moments of the fitted models. In terms of means and standard deviations the different lag weighting schemes lead only to minor differences, so that we focus on the differences between models (a)–(d) instead. Figure 5 shows moments from the four geometric lag models for norovirus and the districts of Pankow and Spandau, along with empirical estimates as in (18). For the first weeks of the year, i.e. towards the end of the season, there are some differences between the models and discrepancies with the observed data. However, variability is quite high during this part of the year so that some deviations should be expected. From week 15 onwards agreement with the data is good for all models. The empirical standard deviations look a bit more erratic, but are still in good agreement with the model-based ones. Models (a) and (c) imply slightly more variability than (b) and (d). The marginal Pearson residuals

fall mainly into a range between -2 and 2 with no strong outliers. Overall, the seasonality of the data appears to be reflected well in the models. This is also the case for the remaining regions, see the Supplementary Material.

The corresponding analysis for rotavirus is shown in Figure 6. The empirical moments and are less smooth than for norovirus. Again there is a certain divide between models (a) and (c) on the one hand and (b) and (d) on the other. During the high season, the means from models (a) and (c) are in better agreement with the observations than (b) and (d) which imply lower incidence. The implied variability, too, is lower for (b) and (d). The Pearson residuals show some quite large values, especially for models (b) and (d). These are not due to systematic differences between observed incidence and periodically stationary means, but due to single extreme events. For instance the large residual for week 15 in Pankow is due to the observation of 77 cases in 2013-W15 (highlighted in Figure 3). Seasonality in the rotavirus data thus seems more difficult to capture appropriately.

Figure 6: First and second row: Periodically stationary means and standard deviations of models (a)–(d) with geometric lags applied to rotavirus. Dots represent the empirical moments and as in equation (18). Bottom row: Marginal Pearson residuals of models (a) and (b). Models (c) and (d) are omitted as the residuals are practically identical to those of (a) and (b), respectively. Results are shown for the districts of Pankow (left column) and Spandau (right column).

To assess this more formally we applied the testing procedure from equation (20). The results, shown in Table 2, confirm the visual impression and indicate moderate to strong evidence for misspecification in all rotavirus models (-values of 0.022 or below). The norovirus models seem to be unproblematic. The -values shown here are based on simulation; approximate -values based on distributions lead to qualitatively similar conclusions and can be found in the Supplementary Material.

Norovirus Rotavirus
(a) (b) (c) (d) (a) (b) (c) (d)
AR1 0.51 0.51 0.57 0.45 0.005 0.001 0.006 0.001
AR2 0.70 0.63 0.71 0.57 0.016 0.001 0.007 0.001
geometric 0.68 0.47 0.67 0.47 0.018 0.001 0.011 0.001
Poisson 0.73 0.56 0.72 0.53 0.022 0.001 0.008 0.002
Table 2: Assessment of goodness of fit using the test statistic from (20), applied to norovirus and rotavirus and all combinations of model version and lag structure. The reference distribution of was obtained by simulation (999 iterations).

5 New strategies for prospective outbreak detection

In Section 4.3, we identified unusual events in the rotavirus data retrospectively using the periodically stationary moments of EE models. A natural question is whether this is also possible prospectively to detect outbreaks. We suggest different strategies to achieve this, inspired by the widely used Farrington-Noufaily (FN) algorithm [30, 31]. In the FN algorithm, thresholds are based on a univariate quasi-Poisson regression model which accounts for seasonality, but not for the autocorrelations typically found in infectious disease counts. We replace this by more complex models from the EE class.

5.1 Univariate outbreak detection

We first consider a detection algorithm which is based on a multivariate model, but assesses each time series separately, so that we refer to it as univariate outbreak detection. For application in week we start by fitting an EE model to data from the years preceding , with the most recent 26 weeks discarded. Subsequently, the new observations are compared to seasonal expectations constructed from this model. Our first approach is based on a third type of standardized residuals, defined as


where the denominator is a first-order Taylor approximation to the standard deviation of . As in the original Farrington et al [30] paper, the 2/3 power transformation is applied to induce symmetry in the distribution of the . An outbreak in district is flagged whenever exceeds a certain value . Setting and applying the Bonferroni correction for multiplicity gives a threshold of for . Alternatively, we approximate the unconditional distribution of by a negative binomial (NB) distribution with mean and variance , similar to the approach in Noufaily et al [31]. An alarm is flagged if exceeds the quantile of this distribution, in our case the 99.92% quantile. Note that our methods do not involve any conditioning on the preceding observations , i.e. we construct purely seasonal thresholds. This distinguishes our approach from e.g. Wang et al [32] and Rao and McCabe [33] where one-step-ahead forecast distributions are used.

Running the described procedure directly on the available historical data would likely lead to too high thresholds, as past outbreaks impact the model fit. In the FN algorithm a downweighting scheme based on Anscombe residuals is applied, but it is not clear how this translates to a multivariate autoregressive setting. We therefore adopt a different approach inspired by Wang et al [32] who refer to it as smoothing. Whenever an incoming observation exceeds the quantile of the approximated periodically stationary distribution, it is replaced by the one-step-ahead forecast , suitably rounded to the next integer, for all future model fits. For the historical data preceding the actual detection period this is done retrospectively.

We apply the two versions of the outbreak detection procedure (2/3 power transformation and NB approximation) to the norovirus and rotavirus counts of the years 2015–2017. We use model (a) with geometric lags and follow Noufaily et al [31] in setting . For comparison we applied the FN algorithm as implemented by Salmon et al [34] separately to each district. We chose the recommended half-window width , a threshold for downweighting of 2.58 and, as in the EE algorithm, with 26 weeks discarded. We ran the FN algorithm with both the 2/3 power transformation [30] and the negative binomial distribution [31], thus obtaining references which parallel our two approaches. The EE and the FN methods flag a similar total numbers of alarms, but agreement is not perfect, see Table 3 for a comparison. In both frameworks, using the power-transformation leads to some additional alerts compared to the negative binomial approximation.

(a) 2/3 power transformation (b) NB approximation
Norovirus Rotavirus Norovirus Rotavirus
no alarm alarm no alarm alarm no alarm alarm no alarm alarm
EE no alarm 1187 9 1184 7 1205 5 1192 5
alarm 11 17 5 28 9 5 6 21
Table 3: Concordance in alarms flagged by the endemic/epidemic and the FN procedures, (a) based on the 2/3 power transformation, (b) based on the negative binomial approximation.

Figure 7 visualizes the different procedures for rotavirus in three selected districts (plots for the other districts and norovirus are available in the Supplementary Material). During the off-season the thresholds of the FN algorithm are often higher (bottom row), in some instances so high that potentially interesting patterns as in the last quarter of 2017 in Charlottenburg-Wilmersdorf are only partly flagged. During the high season the thresholds from the EE method are higher in some instances. The threshold curves are generally smoother than in the FN method and, as a consequence of the shared parameters for seasonality, synchronized across districts. The systematic shifts between the curves observed in certain regions are partly due to the fact that in the EE model the overdispersion parameter is shared across regions while the FN procedure uses separate dispersion parameters.

Figure 7: Outbreak detection algorithms applied to rotavirus (three selected districts). Top row: EE algorithm based on 2/3 power-transformed residuals. Bottom row: EE (red) and FN (green) algorithms using the negative binomial approximation.

5.2 Multivariate outbreak detection

We now move on to an outbreak detection algorithm which can be applied jointly to all districts. A simple solution is to aggregate the data to a univariate time series by summing over the twelve districts and then apply the EE detection algorithm with (i.e. without Bonferroni correction). The result of such an approach (based on 2/3 power-transformed residuals) is shown in the top row of Figure 8. In contrast to the district-wise analysis, more outbreaks are flagged for norovirus than for rotavirus. This is due to the well-documented early onset of the 2016/17 norovirus season [35]. For rotavirus, hardly any aberrations are flagged at the aggregate level. Those identified at the district level are mostly absorbed in the overall noise. This includes pronounced patterns like the first two alerts in Charlottenburg-Wilmersdorf (weeks 2016-W08 and W11; compare Figure 7).

Figure 8: Top row: Outbreak detection using 2/3 power-transformed residuals , applied to aggregated counts of norovirus and rotavirus, respectively. Bottom row: Joint outbreak detection based on the test statistic from equation (24). The test statistic is shown on a square root scale. Grey bars indicate weeks with at least one district-level alert flagged by the EE algorithm with 2/3 power transformation.

To avoid this apparent loss of sensitivity it is desirable to maintain a truly multivariate approach. Various methods for multivariate outbreak detection have been suggested [36, 37, 38], see Allévius and Höhle[39] for a recent and comprehensive overview. However, these are usually based on quite simplistic reference models so that it is of interest to explore methods based on more realistic formulations within the EE class. The residual-based version from the previous section extends more easily to this task than the negative binomial approximation. A natural approach is to use a test statistic


with , as commonly used in scalar reduction approaches to surveillance [39]. If we assume that the follow approximately a standard normal distribution we get that . For and this leads to a (one-sided) threshold of 26.2. The bottom row of Figure 8 shows the values of over the course of the years 2016–2017. A much larger number of alerts (26 and 22 for norovirus and rotavirus, respectively) are flagged than for the aggregated data (as shown in the top row of Figure 8). Agreement with events detected at the district level (with the Bonferroni-corrected -level 0.0008; highlighted in grey) is remarkably good.

6 Discussion

In this article we extended the endemic-epidemic modelling framework towards higher-order lag structures. In case studies on two gastrointestinal diseases the three proposed parameterizations led to fairly similar results, with the geometric weighting scheme performing best. Compared to the AR1 formulation all variants consistently improved the model fits and forecasting performance. A potential limitation of our lag weighting procedure is that the parameter governing it is assumed to be time-constant and identical across regions. This is motivated from the assumption of similar serial interval distributions. There are several reasons why this may be too restrictive, for instance varying social contact patterns and meteorological conditions or different serial interval distributions of dominant strains in different seasons. However, while in principle it would be possible to allow for a time-varying , we think that this would likely lead to identifiability problems.

We then demonstrated how periodically stationary moments of our models can be evaluated and used in model assessment. The suggested methods helped identify problems in the models for rotavirus which were not visible from a conventional analysis of the (conditional) Pearson residuals. Lastly, we suggested a novel approach for outbreak detection which is also based on periodically stationary properties of EE models. It represents a natural way of obtaining reference distributions and thresholds from multivariate models which explicitly take into account the complex auto- and cross-correlation patterns in the data. Outbreak detection can then be based on considerably more realistic models than in the FN algorithm. Moreover, our approach allows to share strength across time series, leading to a more stable estimation of the seasonality and degree of variability. We suggested both a district-wise and a joint (scalar reduction) procedure to flag alarms. The latter showed good sensitivity for district-level anomalities. In an aggregate-level analysis which we conducted for comparison these patterns were often obscured by noise from the remaining districts. These results from our case study on norovirus and rotavirus are promising, but at the same time a number of questions remain before our method could be used in practice. A standardized procedure to select the model formulation would need to be established. This could be done by specifying a set of candidate models, including models as simple as the quasi-Poisson model underlying the FN algorithm, and selecting the best model based on AIC. Another question is how diseases with longterm trends in incidence could be handled. A potential solution would be to move from periodically stationary moments to e.g. 26-week-ahead forecasts, which are also almost independent of any conditioning values, but do not require periodicity. The next step in the development of the algorithms will be more extensive simulation and real-data studies involving comparisons to established methods [40].

Appendix A Software and reproducibility

Methods for model fitting and the computation of periodically stationary moments can be found in the R package hhh4addon which extends the functionality of the surveillance package [7]. It is available at and is easiest installed using the devtools package:

library(devtools); install_github("jbracher/hhh4addon", build_vignettes = TRUE)

The package also contains the data presented in Section 2 , which were obtained from the web platform of the Robert Koch Institute ( Codes to re-run the analyses from Section 4 can be found in the file example_analyses/noro_rota_berlin.R.

Appendix B Details on periodically stationarity properties from Section 3.3

b.1 The univariate model

We first consider the univariate model from (12)–(13), i.e. 


As mentioned in Section 3.3 we can build on work by Bentarzi and Bentarzi [27] and Aknouche et al [28] to obtain the periodically stationary moments of this model. The conditions for periodic stationary in the mean is while for second-order periodic stationarity the slightly stricter needs to be fulfilled (Aknouche et al [28], Proposition 3.1). To obtain the periodically stationary means we start by noting that and then simply iterate the conditional mean equation (26):

With this implies equation (14). Keep in mind the notational convention for .

For the periodically stationary variances consider

with . Iterating this recursion gives

and, as ,

Equation (15) now follows directly from


For the periodically stationary autocovariances first note that


for . Next consider


For , equation (29) becomes

The last step holds as . Using (27) to express through we obtain


For , we can use (28) in order to simplify (29) to


Equation (16) then follows directly from (30) and (31).

b.2 The multivariate model

In practice we are more interested in the multivariate version of our model, including higher order lags. Moving to a slightly more general formulation of the model (1)–(2), consider an -dimensional vector process with


The parameters and are constrained to be non-negative and periodic with period , i.e.  etc. For simplicity we assume that . To make computations easier in the following we introduce column vectors

which contain 1 as first entry followed by the stacked observations of time periods leading up to and including . The model parameters are arranged into square matrices

so that . The dispersion parameters are assembled in column vectors with leading zeros.

A necessary and sufficient condition for periodic stationarity in the mean of the process can be taken from work by Ula and Smadi [41] (we provide this in the Supplementary Material). If it is fulfilled the construction of the implies

for the periodically stationary means . Iterating this and using we get where . We obtain

as the eigenvector of

corresponding to an eigenvalue of 1, scaled so that the first element is 1.

Given that is second-order periodically stationary a similar recursive relationship exists for the uncentered second moments . With it can be shown that (see Section B.3 for the derivation)


Equation (35) just describes the evaluation of a quadratic form and subsequent manipulation of the last diagonal elements. It turns out to be quite challenging to solve this system analytically or find a condition under which a solution exists. We thus resort to solving it numerically:

Algorithm 1

Iterative computation of  –  Set to a starting value, e.g.   –  For until convergence or a maximum number of iterations:   –  Compute from (using (35))   –  For :     –  Compute from     –  If for : Return

This is an extension of an algorithm from Held et al [12] (Appendix A) where we only considered conditional moments and first order lags. The provided algorithm only allows to compute autocovariances at a range . If higher ranges are required one option is to artificially increase by adding autoregressive parameters with value 0 for the required number of lags.

b.3 Derivation of equation (35)

Assume a process of type (32)–(34) which is periodically stationary in the mean and consider