## 1 Introduction

At time of writing, it is five months into the COVID-19 pandemic. Despite millions of tests having been performed, there are still no results from statistically well founded sampling based testing programmes to establish basic epidemic quantities such as infection fatality rate and infection rates. In the absence of such direct data, epidemic management has to proceed on the basis of data produced largely as a side effect of the clinical response to the disease.

The purpose of this note is to point out that a very simple smooth additive model is able to satisfactorily represent country specific death rate data series, such as those shown in figure 1 and collated at
`https://www.worldometers.info/coronavirus/`

. Such a model allows resolution of inferential questions about the timing of the peak in the underlying death rate. Augmenting this model with information from Verity
et al. (2020)

on the distribution of times from symptom onset to death, a simple extension to the model also allows inference about the infection rate dynamics required to generate the deaths series. The advantage of such simple semi-parametric models is that, in contrast to detailed epidemic models, they make only modest assumptions about the data. They can therefore offer relatively clear insight into what the data are telling us, without the influence of strong modelling assumptions, thereby providing a useful complement to more heavily model based analyses.

## 2 A smooth additive model for the death rate series

Let denote the deaths reported on day . Assume that

follows a negative binomial distribution with mean

and variance

. Then letwhere is a smooth function of time, , measured in days, and is a zero mean cyclic smooth function of day of the week, , set up so that , where or 2 denotes order of derivative. is a known link function, either log or identity in this case.

and are represented using appropriate intermediate rank spline basis expansions, with associated cubic spline penalties (see e.g. Wood, 2017). represents the underlying death rate, while accounts for the strong weekly cycle seen in the data, presumably as a result of reporting artefacts and the well known weekly signal in hospital admissions. The model treats the weekly variability issue descriptively rather than mechanistically.

Model inference uses the empirical Bayes approach of Wood et al. (2016)

in which the smooth functions are estimated by penalized likelihood maximisation

(e.g. Green and Silverman, 1994), with the smoothing parameters and estimated by Laplace approximate marginal likelihood maximization. Writingfor the combined vector of basis coefficients for

and , the penalized log likelihood can be writtenwhere : and are known constant positive semi-definite matrices. Smoothing parameters, and , control the smoothness of and . Let be the maximizer of the penalized log likelihood, and its negative Hessian at . Viewing the penalty as being induced by an improper Gaussian prior, , is also the MAP estimate of . Furthermore in the large sample limit

(1) |

Writing the density in (1) as , and the joint density of and as , the Laplace approximation to the marginal likelihood for the smoothing parameters and is . Nested Newton iterations are used to find the values of maximizing and the corresponding (for details see Wood et al., 2016).

Given (1) credible intervals for are readily computed, but it is also straightforward to make inferences about when the peak in occurs. Simply simulate replicate coefficient vectors from (1) and find the day of occurrence of the peak for each corresponding underlying death rate function, . This approach requires a few lines of R code to implement, which are provided in the supplementary material.

The model fits to the reported deaths in the UK and Sweden are shown in figure 2 along with basic model checking plots. Figure 3

shows the posterior mode of the underlying death rates against day, with credible intervals, as well as illustrating the posterior distribution of the day of peak underlying death rate. The UK peak is around day 28, something which could have been confidently declared around day 40 with this modelling approach. For Sweden the peak is later and by the end of the plotted data it is not yet quite possible to rule out that the peak has not yet occurred, although the probability of this is

.## 3 Inferring the past fatal infection profile

It is also of interest to infer the sequence of past (ultimately fatal) infections required to produce the observed sequence of deaths. This is not simply a matter of lagging the death sequence by the average infection to death time, because the distribution of time from symptom onset to death is broad and asymmetric. Verity et al. (2020) show that the distribution of time from onset of symptoms to death for fatal cases can be modelled by a gamma density with mean 17.8 and variance 71.2 (s.d. 8.44). Note that this mean is longer than that often quoted in clinical papers and the media, because it corrects for the bias associated with seeing short infections before long ones at the start of an epidemic.

Let be the function describing the variation in the number of eventually fatal cases over time. Let be the square matrix such that if and 0 otherwise, where denotes the onset-to-death gamma density, given above. If and then , where is the expected number of deaths on day . Now can be represented using an intermediate rank spline, again with a cubic spline penalty. We can then employ exactly the model of the previous section but setting . The only difference is that we need to infer over a considerable period before the first death occurs. 40 days is clearly sufficient given the form of . In fact it makes sense to reduce this interval, after inspecting a pilot run, to avoid a lengthy initial period of zero fatal cases, consequent lack of identifiability of and poor MCMC mixing. For the UK data a 20 day initial period is more than sufficient. For stable inference it also makes sense to explicitly include in the death data the fact that no deaths were observed in this initial period. Note that is rank deficient, so inferring can be viewed as an inverse problem, only soluble because of the smoothness assumption on .

In principle this model could be estimated using the framework of Wood et al. (2016), but some non-trivial work would be required to set it up. In the current context, in which results are needed rapidly, it is sensible to take the approach requiring the least amount of bespoke code. The most straightforward implementation is to use Gibbs sampling via JAGS (Plummer, 2003; Plummer et al., 2006). Implementation is particularly straightforward if the automatic code template generation described in Wood (2016) is used, as implemented in mgcv function jagam. The JAGS code template can be generated for the model given in the previous section. A few lines of additional code are then sufficient to implement the modification described here. The only disadvantage of the approach is that stochastic simulation is slow. While the previous section’s model took less than 1/20th of a second to compute, a couple of hours of computing was required for a mean effective sample size of 3.5 thousand for the coefficients of (and even then the slowest mixing coefficient had an effective sample size of only 250).

Figure 4 (left) shows the posterior median estimate of the fatal infection rate against day for the UK. Also shown are 80% and 95% credible intervals, and a scaled bar chart summarizing the posterior distribution of the day of peak fatal infection. The day of UK lock down is shown as a vertical grey line. The alignment with the time axis requires an allowance for the incubation period of the disease, i.e. the time from infection to onset of symptoms. This period is 2 to 11 days for 95% of people, with a median of 5 days (Lauer et al., 2020). In the absence of more detailed information about the incubation distribution I simply shifted the inferred curve back 5 days.

Figure 4 (right) shows a model checking comparison with the death rate models of section 2. 100 death rate profiles were simulated by taking the median number of fatal infections each day and, for each infection, randomly generating the corresponding death day from the onset-to-death gamma model. These profiles are shown in grey. Overlaid is the death rate estimate from the section 2 model. Clearly the models are consistent with each other, but at its end the median infection curve may be over-optimistic about how far the infection rate has declined.

Given that the peak is estimated to occur before lock down, it is worth considering whether this could be an artefact resulting from a smooth model trying to accommodate a sudden drop in infections at lock down. There is obviously a question about how sudden this drop would be, given that lock down initially substantially increases contact with household members at the expense of everyone else, and in addition there had already been substantial social distancing before lock down. However sticking with checking the model itself, the inferred curve has strange properties if it contains artefacts from trying to match a very steep drop. The most notable is that the magnitude of the slope of the curve is actually lower on the first day of lock down, after the peak, than before the peak. The second is that the log scale squared second differences, which measure the contribution to the smoothing penalty, are low around lock down compared to earlier: again there is no indication of the model trying to smooth through a step.

A peak before lock down is consistent with the absence of a catastrophe in the data from Sweden, which did not lock down, and is not entirely implausible, given the level of social distancing already in place before UK lock down. Alternatively, it could be that the onset to death distribution from Verity et al. (2020) gives too long durations for the UK context. For example, this could happen if age strongly effects time from onset-to-death and the age structure of the UK cases differs substantially from the age structure of the cases used by Verity et al. (2020). Similarly within hospital transmission to already very ill patients might lead to shorter disease duration, and in turn to the peak being estimated as earlier than it was, but the cross infection rate would have to be quite high for this to be a substantial factor.

The very wide confidence interval at the end of the data period relates to the rank deficiency of the

matrix and the mild smoothing assumptions on used to deal with this. Increase in at the end of the time period is biologically implausible, given earlier decline and the measures still in place at the end of the period. This implausibility is not built into the model, in keeping with the desire for the analysis to be as data driven as possible. However, the wide interval does emphasise the desirability of disease surveillance via a statistically sound randomized testing program in order to reduce uncertainty. Such testing would be inexpensive in the context of the £5.5Billion per day UK economy being put on hold for weeks.## References

- Green and Silverman (1994) Green, P. J. and B. W. Silverman (1994). Nonparametric Regression and Generalized Linear Models. Chapman & Hall.
- Lauer et al. (2020) Lauer, S. A., K. H. Grantz, Q. Bi, F. K. Jones, Q. Zheng, H. R. Meredith, A. S. Azman, N. G. Reich, and J. Lessler (2020). The incubation period of coronavirus disease 2019 (covid-19) from publicly reported confirmed cases: estimation and application. Annals of internal medicine.
- Plummer (2003) Plummer, M. (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003)., pp. 20–22.
- Plummer et al. (2006) Plummer, M., N. Best, K. Cowles, and K. Vines (2006). coda: Convergence diagnosis and output analysis for MCMC. R News 6(1), 7–11.
- Verity et al. (2020) Verity, R., L. C. Okell, I. Dorigatti, P. Winskill, C. Whittaker, N. Imai, G. Cuomo-Dannenburg, H. Thompson, P. G. Walker, H. Fu, et al. (2020). Estimates of the severity of coronavirus disease 2019: a model-based analysis. The Lancet Infectious Diseases.
- Wood (2016) Wood, S. N. (2016). Just another Gibbs additive modeller: Interfacing JAGS and mgcv. Journal of Statistical Software 75(7).
- Wood (2017) Wood, S. N. (2017). Generalized Additive Models: An Introduction with R (2 ed.). Boca Raton, FL: CRC press.
- Wood et al. (2016) Wood, S. N., N. Pya, and B. Säfken (2016). Smoothing parameter and model selection for general smooth models (with discussion). Journal of the American Statistical Association 111, 1548–1575.

## JAGS code

The following is modified from the template produced by the jagam call in the R code.

model { theta ~ dunif(1,1000) eta <- X[,1:20] %*% b[1:20] ## linear predictor for case fw <- X[,21:25] %*% b[21:25] ## lp for week cycle for (i in 1:n) { muc[i] <- exp(eta[i]) } ## expected cases Z ~ dnorm(eta[1],.1) ## allow slight regularization at start mud <- B %*% muc ## expected deaths for (i in 1:n) { mu[i] <- exp(log(mud[i])+fw[i]) } ## model expected deaths for (i in 1:n) { y[i] ~ dnegbin(theta/(theta+mu[i]),theta) } ## response ## Parametric effect priors tau=1/56^2 for (i in 1:1) { b[i] ~ dnorm(0,0.00032) } ## prior for s(day)... for (i in c(2:19)) { b[i] ~ dnorm(0, lambda[1]) } for (i in c(20)) { b[i] ~ dnorm(0, lambda[2]) } ## prior for s(dow)... for (i in c(21:25)) { b[i] ~ dnorm(0, lambda[3]) } ## smoothing parameter priors... for (i in 1:3) { lambda[i] ~ dgamma(.05,.005) rho[i] <- log(lambda[i]) ## for monitoring } }

## R code

setwd("/foo/bar") ## directory for plots uk 0) lines(c(i,i),c(0,pt[i]),lwd=3,col="grey") ## plot } if (ps) dev.off() ########################################## ## Now the infection profile modelling... ########################################## library(rjags);library(mgcv) load.module("glm") ## improved samplers for GLMs often worth loading setwd("foo/bar") ## directory for plots and JAGS code uk1 quantile,probs=c(.025,.1,.9,.975)) ## get profile CIs day 0) lines(c(i-26,i-26),c(0,pt[i]/4),lwd=3) ## Sanity check the median infection profile. fi

Comments

There are no comments yet.