Rapidly evaluating lockdown strategies using spectral analysis: the cycles behind new daily COVID-19 cases and what happens after lockdown

04/16/2020 ∙ by Guy P. Nason, et al. ∙ Imperial College London 0

Spectral analysis characterises oscillatory time series behaviours such as cycles, but accurate estimation requires reasonable numbers of observations. Current COVID-19 time series for many countries are short: pre- and post-lockdown series are shorter still. Accurate estimation of potentially interesting cycles within such series seems beyond reach. We solve the problem of obtaining accurate estimates from short time series by using recent Bayesian spectral fusion methods. Here we show that transformed new daily COVID-19 cases for many countries generally contain three cycles operating at wavelengths of around 2.7, 4.1 and 6.7 days (weekly). We show that the shorter cycles are suppressed after lockdown. The pre- and post lockdown differences suggest that the weekly effect is at least partly due to non-epidemic factors, whereas the two shorter cycles seem intrinsic to the epidemic. Unconstrained, new cases grow exponentially, but the internal cyclic structure causes periodic falls in cases. This suggests that lockdown success might only be indicated by four or more daily falls in cases. Spectral learning for epidemic time series contributes to the understanding of the epidemic process, helping evaluate interventions and assists with forecasting. Spectral fusion is a general technique that is able to fuse spectra recorded at different sampling rates, which can be applied to a wide range of time series from many disciplines.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

Transformed series, the UK spectrum and fusing the world and Europe

We transformed the number of new daily COVID-19 cases by applying a signed log transform to the first differences of the new case time series (see Methods). The transformed number of new daily cases for 16 countries are shown in Figure 1 each showing a distorted noisy, but characteristic sinusoidal trace.

Figure 1: Number of daily cases on transformed scale for 16 countries. Left-to-right, top-to-bottom: UK, IT; FR, DE; ES, CH; BE, NL; AT, NO; US+CN; IR+CA; KR+AU. First country of pair in black, second is red.

The estimated log-spectrum for the UK transformed new daily cases is shown in Figure 2 and for all other countries we analysed in the Extended Data figures. Spectral estimates are commonly displayed on a logarithmic scale[16]

. Spectral peaks can be observed at wavelengths of 6.7, 3.2 and 2.3 days, respectively. Although the peaks are visible, the credible intervals indicate that there is a fairly large degree of uncertainty, because this time series contains 52 observations. A frequentist analysis, e.g. using the

spectrum function in R[16], produces a similar result, but with even wider confidence bands.

Figure 2: Bayesian log-spectral estimate of transformed UK new daily COVID-19 cases with 50% (dark blue shaded region) and 90% (light blue shaded region) credible intervals.

Similar spectral analyses for each country indicate three similar spectral peaks, although not always as well-defined nor in precisely the same location.

Figure 3 shows an estimate that is the result of coherently fusing spectra from 18 countries, giving an an effective sample size of 916 days. Here, the clear spectral peaks have narrow credible intervals, due to the large effective number of days afforded by using 18 countries together. The spectral peaks are located at wavelengths of 6.7, 4.1 and 2.7 days. The peak around 6.7 days is observed in the spectral plots for individual countries and we interpret it to be a weekly effect. Such a weekly effect could be produced by reporting artefacts (e.g. paperwork being delayed until Monday, or carried out differently at the weekend) or due to the behaviour differences of people at weekends. All countries analysed have a 5+2 working week/weekend pattern, although not necessarily the same days of the week (the actual days for a weekend are a phase effect, which does not effect the spectrum).

Figure 3: Bayesian log-spectral estimate of fusion of new daily COVID-19 cases for 18 countries with 50% (dark blue) and 90% (light blue) credible intervals. Countries included are the UK, Italy, France, Germany, Spain, Switzerland, Belgium, the Netherlands, Austria, Norway, the USA, China, Iran, Canada, South Korea, Australia, New Zealand and Sweden.

Clustering spectra and groups of countries with similar spectra

We next clustered our 18 countries based on their spectrum, by calculating a dissimilarity between the spectra for each pair of countries, and then performing both a hierarchical cluster analysis and multidimensional scaling on the dissimilarity matrix. The scaling solution indicated that only two dimensions were required to encapsulate 72% of variation in the data. Figure 

4 shows the resultant two-dimensional solution.

Figure 4: Multidimensional scaling solution of dissimilarity matrix generated by Euclidean distances calculated between two spectral estimates for each pair of countries. Countries’ position denoted by their two character ISO 3166-1 standard abbreviation. The ellipses group together countries that are clustered using the robust cluster stability method[15].

Attaching a meaning to the scaling axes in Figure 4 is not easy. We hypothesise that Axis 1 might indicate how badly a country has been perceived to have been affected by the virus with Australia, New Zealand and Sweden less so and those on the left of the plot considerably more so. However, Germany is the obvious anomaly to this interpretation as, currently, it has perhaps been perceived to have handled the crisis well so far.

Figure 5 show the spectral estimates for the three groups of countries identified in Figure 4, using the clustering techniques mentioned in Methods.

(a)
(b)
(c)
Figure 5: Bayesian log-spectral estimates and 50% and 90% credible intervals for (a) Group 1 countries: Spain, France, Italy, the US, the UK and Iran. Effective number of days=357; (b) Group 2 countries: Switzerland, Canada, Belgium and China. Effective number of days=229; (c) Group 3 countries: Norway, Australia and South Korea. Effective number of days=157.

The peak frequencies for each of these groups is listed in Table 1, which shows differences between them. However, each group possesses a possible weekly peak and higher-frequency peaks labelled a., of around three to four days, and b., around 2.6 days.

Peak Group 1 Group 2 Group 3
Weekly 6.48 7.27 6.59
a. 3.31 4.30 4.09
b. 2.52 2.77 2.70
Table 1: Spectral peaks for the three country groups in units of days. The peaks in the second and third rows have been arbitrarily labelled as peak a. and b.

Spectral changes after lockdown

Many countries experiencing the COVID-19 pandemic have instituted a lockdown procedure to dramatically reduce virus transmission. At the time of writing, these countries have observed new daily COVID-19 cases for between 43 and 54 days. We assume that, on average, it takes about seven days for the virus to incubate, for a person to seek attention and then be tested positive for the SARS-CoV-2 coronavirus. For each country, the number of days prior to and after the lockdown (including incubation time) are the UK (38, 14), Italy (26,28), France (23, 26), Germany (24, 25), Spain (22, 27), Switzerland (22, 25), Belgium (18, 25) and the Netherlands (19, 26). For some of these countries the lockdown was applied over a period of two of three days and we took the median of these as the lockdown start date.

The number of days before and after the lockdown are, in each case, too small to carry out anything other than the most simplistic time series to maintain statistical reliability. In particular, a spectral estimate in this situation would be subject to a high degree of uncertainty. However, Figure 6 shows our coherently fused spectral estimates[3] across these countries before and after the lockdown period, making use of 192 effective days prior to lockdown and 196 days afterwards.

Figure 6: Pre-lockdown Bayesian log-spectral estimate (red solid line) and 90% credible interval (red dashed lines). Post-lockdown log-spectral estimate (blue solid line) and 50% and 90% credible intervals.

The weekly peak is clearly visible in both estimates. The second and third peaks (labelled a. and b. in Table 1) are visible pre-lockdown, but have all but disappeared post-lockdown. The spectrum is flat in the location where peak a. was previously, and spectral power declines considerably, relatively, where peak b. was located previously.

This result is particularly interesting as it suggests that peaks a. and b. have been disrupted by the lockdown. The weekly effect seems relatively unchanged by the lockdown, indicating that perhaps it was strongly driven by non-epidemic effects, such as recording/paperwork or bureaucracy caused by weekends.

The post-lockdown spectrum is higher overall than the pre-lockdown spectrum, this is due to the larger variation associated with the larger number of cases identified during the progress of the epidemic. Our transformation suppresses this variation, but does not remove it entirely.

Forecasting daily cases

We have had varied success in forecasting daily cases using a sum of two time-modulated cosine waves model, described in Methods, and more research is required. We used the Nelder-Mead[17] optimisation routine built into R[16], with starting frequencies of and taken from our UK spectral estimate plots, and built the model on the transformed cases up to April 11th. After optimisation, the fitted model resulted in modified frequencies of , close to the starting frequencies (the other estimated parameters were ).

Figure 7: Number of transformed new UK COVID-19 cases up to and including April 11th (black line), fitted model (red line) with forecasts (green line).

Figure 7 shows transformed new daily UK cases, the model fit and forecasts. The model fit does not look too bad, many spectral peaks are being identified, but perhaps the amplitudes of them could be better matched.

The untransformed forecasts for April 12th, 13th, 14th and 15th were 5250, 5200, 6373 and 6164, all with approximate 95% confidence interval of

. The actual number of cases for April 12th turned out to be 5288. In this case, the forecast was good. However, the two-step ahead forecast of 5200 was wrong — the true value turned out to be 4342 on April 13th. We also used several stochastic forecasting methods based on autoregressive integrated moving average modelling and exponential smoothing, but nothing that we tried was particularly successful. The series is difficult as its amplitude/variance is not constant and we suspect that frequencies are changing over time (as, e.g., the lockdown plot Figure 

6 indicated).

However, rather than point forecasts, the general sinusoidal nature of the transformed cases suggests a further, perhaps more reasonable strategy. At this stage, the UK Government and media are looking expectantly at the daily case numbers to try and detect a sustained downward trend in cases. Excitement has been generated by a drop in cases two days in a row. This happened on April 5th with 5903 cases, followed by a drop to 3802 and then 3634 on April 6th and 7th and then, unfortunately, increasing to 5491 on the 8th. However, the general sinusoidal patten, with a wavelengths of about 2.7 and four days shows that we should only perhaps start believing that a downward turn is a downward trend after a sustained decrease of four days or more. However, caution needs to be applied here as there is no guarantee that the dynamics will remain unchanged.

Discussion

We analysed numbers of deaths using similar methods described here and found similar cycles. Although we have not carried out a detailed analysis, if the number of deaths process can be approximated by a linear system[1, 10] with the numbers of cases as input, then similar cycles are to be expected.

A time series with a fixed sampling rate and length has a minimum and maximum (Nyquist) frequency that can be observed.[1, 10] Although our spectral fusion methods[3] provide more accurate estimates of the spectrum in the normal range (equivalent to having a larger sample size), they can not provide information on frequencies outside of the normal range. To estimate lower frequencies, we would need a genuinely longer series and, for higher frequencies, we would require cases more frequently than once a day, which are arguably not really necessary for any practical purpose.

Our analyses assume approximate stationarity and linearity for the transformed series, which is unlikely to be exactly true in practice. For example, in the UK transformed case series in Figure 7, there are hints of the series oscillation speeding up over the last ten days. Practically speaking, changes in the testing regime, recording practices, the lockdowns or other interventions will change the dynamics of either the pandemic itself or recording of it. Ideally, it would be of interest to use methods for non-stationary time series[18, 19], but the current series available to us are far too short for such analyses.

Methods

All computations were executed in R[16] and packages that are mentioned specifically below.

COVID-19 New Daily Cases Transformation.

Let , for represent the number of new daily cases for days for country . The spectral dynamics of the number of daily cases for different countries are all countries masked by the well-known and characteristic exponential increases (and decrease, for those countries that locked down and have now passed their peak). Hence, we transform our number of daily cases series to reveal the spectral dynamics. After exploration[10] the following transform was used for all series , where the sign function is , if is positive or , if is negative, and for . The transform is easily inverted, which is essential for forecasting the number of daily cases.

Bayesian Spectral Estimation and Fusion: Regspec

We use the regspec[3, 20] Bayesian spectral estimation method with a neutral white noise prior with prior variance of and all default arguments, except for a smoothing parameter of , although the results are not sensitive to the latter. Regspec straightforwardly enables the production of spectral estimates using multiple data sets, with each having different lengths and produces coherent credible intervals to properly ascertain the uncertainty inherent in the estimation process. Regspec can also fuse spectra for multiple series recorded at different sampling rates, but we do not need to use this aspect of its functionality here as all our case time series are reported daily. However, if a country decided to release case numbers on some other sampling plan (e.g. every two days, or weekly) then Regspec would be able to fuse the spectral estimates as described here. Such a feature might be of use when dealing with reporting structures that are not equipped to provide daily reporting of cases or where weekly cases are thought to be more accurate. For example, this might apply to regions with fragile health or reporting systems or populations that are spread across widely dispersed geographical regions with poor communications.

Clustering of Spectra.

Although the number of cases transformed time series show similar spectral behaviour, it is possible to observe closer similarities within certain subgroups of countries. We used unsupervised clustering and scaling techniques[21, 22] to depict the relationship between different countries and suggest a clustering for them. First, for each country we produced a spectral estimate using regspec as mentioned above, and then formed a dissimilarity for each pair of countries by computing the Euclidean distance between their spectral values (using the dist function in R[16]). Classical multidimensional scaling was then used to produce an estimated configuration using the cmdscale function in R[16]. For clustering we use hierarchical cluster analysis on the dissimilarity matrix we computed. It is well-known that dendrograms are sensitive to the input dissimilarity matrix, so we used the clusterwise cluster stability assessment by resampling method to produce a stable clustering[15].

Forecasting

Given the form of the transformed new daily cases we propose a model, , that is the sum of two time-modulated cosine waves, , each with formula

(1)

where indexes the two waves and . Initial values for forecast model fitting we used , for

. For model evaluation we put more weight on getting later observations correct and use a residual weight vector

where and is the number of cases. For short term forecasting, we fit to the transformed daily cases by weighted least-squares using standard R[16] optimisation functions and then extrapolate , using recent weighted residuals to estimate the forecasting error.

Data Availability

The number of daily COVID-19 cases for countries can be found at the website of the European Centre for Disease Prevention and Control[2].

References

  • [1] Priestley, M.B. Spectral analysis and time series. (Academic Press, 1983).
  • [2] European Centre for Disease Prevention and Control, COVID-19 cases worldwide. https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide, April 14th 2020.
  • [3] Nason, G. P. et al. Should we sample a time series more frequently? decision support via multirate spectrum estimation (with discussion). J. Roy. Stat. Soc. A, 180, 353–407 (2016).
  • [4] Drewett, Z. We don’t know if coronavirus deaths will peak this week. https://metro.co.uk/2020/04/06/dont-know-coronavirus-deaths-will-peak-week-12517835/ (2020).
  • [5] Percival, D.B. & Walden, A.T. Spectra analysis for physical applications. (Cambridge University Press, 2010).
  • [6] Grenfell, B.T. et al. Travelling waves and spatial hierarchies in measles epidemics. Nature, 414, 716–723 (2001).
  • [7] Conlan, A.J.K. & Grenfell, B.T. Seasonality and the persistence and invasion of measles. Proc. Roy. Soc. B, 274, 1133–1141 (2007).
  • [8] Ferrari, M.J. et al. The dynamics of measles in sub-Saharan Africa. Nature, 451, 679–684 (2008).
  • [9] Benvenuto, D. Application of the ARIMA model on the COVID-19 epidemic dataset. Data in Brief, 29, 105340 (2020).
  • [10] Chatfield, C. The Analysis of Time Series: An Introduction, (Chapman and Hall/CRC Press, 2003).
  • [11] Hyndman, R.J. & Athanasopoulos, G. Forecasting: Principles and Practice, (OTexts, 2013).
  • [12] Nason, G.P. & Savchev, D. White noise testing using wavelets. Stat, 3, 351–362 (2014).
  • [13] Chowell, G. et al. A novel sub-epidemic modeling framework for short-term forecasting epidemic waves. BMC Medicine, 17, 164 (2019).
  • [14] Roosa, K. et al. Short-term forecasts of the COVID-19 epidemic in Guangdong and Zhejiang, China: February 13–23, 2020. J. Clin. Med., 9, 596 (2020).
  • [15] Hennig, C. Flexible Procedures for Clustering, R package version 2.2-5. https://CRAN.R-project.org/package=fpc (2020).
  • [16] R Core Team. R: a language and environment for statistical computing (R Foundation for Statistical Computing, 2020).
  • [17] Nelder, J.A. & Mead, R. A simplex algorithm for function minimization. Comput. J., 7, 308–313 (1965).
  • [18] Dahlhaus, R. Locally stationary processes. In Handbook of Statistics, Subba Rao, T., Subba Rao, S. & Rao, C. (eds), 351–413. (Elsevier, Amsterdam, 2012).
  • [19] Nason, G. P. A test for second-order stationarity and approximate confidence intervals for localized autocovariances for locally stationary time series. J. Roy. Stat. Soc. B, 75, 879–904 (2013).
  • [20] Powell, B. et al. Non-parametric Bayesian spectrum estimation for multirate data. R package version 2.4, https://CRAN.R-project.org/package=regspec (2016).
  • [21] Chatfield, C. & Collins, A.J.

    An Introduction to Multivariate Analysis

    (Chapman and Hall/CRC, 1980).
  • [22] Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning, (Springer, 2009).